Evaluating Galician language models for sentiment analysis on challenging linguistic phenomena

Anxo Alonso, Pablo Gamallo

Abstract


Sentiment analysis remains one of the most relevant tasks in NLP. However, low-resource languages lack sufficient datasets and models for this task. In this paper, we present a study of sentiment analysis in Galician that analyzes how linguistic phenomena influence the task. For this purpose, we developed Senti-Gal, a dataset of 998 sentences covering adversative, concessive and conditional constructions, diglossic phenomena, negation and irony. We evaluated seven models on Senti-Gal: a multilingual machine learning model, a multilingual decoder-only (generative) model, and five encoder-only models (three multilingual and two monolingual), all of them fine-tuned on a training dataset that we also developed. The results indicate that the best fine-tuned encoder-only models outperform the decoder-only model, that syntactic and pragmatic phenomena remain a challenge, and that monolingual and multilingual models perform similarly. We make Senti-Gal, the fine-tuned models, and the first Galician training corpus for sentiment analysis freely available.
