A Discourse Marker Tagger for Spanish using Transformers

Ana García Toro, Jordi Porta Zamorano, Antonio Moreno-Sandoval


We present an automatic discourse marker (DM) tagger developed using manual annotation and machine learning. The tagger was built on a dataset of financial letters, on which human annotators following a dedicated annotation guide reached an inter-annotator agreement (IAA) of 0.897. Using the annotated dataset, we developed a prototype by fine-tuning a pre-trained Transformer model for the task, reaching an F1-score of 0.933. An evaluation of the tagger's results is included.
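
As an illustration of how an F1-score for a tagger of this kind can be computed, the following is a minimal sketch, not the evaluation code used in this work. It assumes a BIO encoding with the hypothetical labels `B-DM`, `I-DM` and `O`, and counts a predicted marker as correct only if its span matches a gold span exactly; the abstract does not state which tagging scheme or matching criterion was used. The function names and the example sentence are likewise invented for illustration.

```python
def extract_spans(tags):
    """Return the set of (start, end) DM spans encoded in a BIO tag sequence.

    `end` is exclusive. The labels B-DM / I-DM / O are an assumption made
    for this sketch, not the label set reported by the authors.
    """
    spans, start = set(), None
    for i, tag in enumerate(tags):
        # Any tag other than I-DM closes the span that is currently open.
        if tag != "I-DM" and start is not None:
            spans.add((start, i))
            start = None
        # B-DM opens a span; a stray I-DM with no open span is treated as one too.
        if tag == "B-DM" or (tag == "I-DM" and start is None):
            start = i
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def span_f1(gold, pred):
    """Exact-match span-level F1 between gold and predicted tag sequences."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: "Sin embargo , crecimos además ."
gold = ["B-DM", "I-DM", "O", "O", "B-DM", "O"]
pred = ["B-DM", "I-DM", "O", "O", "O", "O"]  # the second marker is missed
score = span_f1(gold, pred)
print(round(score, 3))
```

In this example the tagger finds one of the two gold markers and predicts no spurious ones, so precision is 1.0 and recall is 0.5. A token-level F1 would be an equally plausible reading of the reported figure and would generally give a different value on the same predictions.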
