Overview of IberAuTexTification at IberLEF 2024: Detection and Attribution of Machine-Generated Text on Languages of the Iberian Peninsula

Areg Mikael Sarvazyan, José Ángel González, Francisco Rangel, Paolo Rosso, Marc Franco-Salvador

Abstract


This paper presents an overview of the IberAuTexTification shared task, held as part of IberLEF 2024, the Iberian Languages Evaluation Forum workshop, within the framework of the SEPLN 2024 conference. IberAuTexTification extends our previous AuTexTification shared task in three dimensions: (i) more domains, (ii) more languages from the Iberian Peninsula, and (iii) more prominent LLMs. The shared task defines a multilingual, multi-domain, and multi-model setting consisting of two subtasks. In Subtask 1, participants must determine whether a text’s author is a human or a machine. In Subtask 2, participants must attribute a machine-generated text to the large language model that produced it. The IberAuTexTification dataset contains about 168,000 texts across six languages (English, Spanish, Portuguese, Catalan, Basque, and Galician) and seven domains (chat, news, literary, reviews, tweets, Wikipedia, and how-to articles). A total of 21 teams participated in the task, submitting 68 runs: 54 for Subtask 1 and 14 for Subtask 2. In this overview, we describe the IberAuTexTification task, the participating systems, and the results.
