Lexical Complexity Assessment of Spanish in Ecuadorian Public Documents

Jenny Ortiz-Zambrano, César Espin-Riofrio, Arturo Montejo-Ráez

Abstract


This study assesses lexical complexity (LC) in texts from Ecuadorian public institutions using natural language processing (NLP) techniques. It compares several models and approaches on GovAIEc, a recently developed corpus of Ecuadorian government texts, and analyzes how incorporating linguistic features and varying the number of training epochs affect model performance. The study also proposes a practical solution: a web platform that helps readers understand the complex words in public documents, which often hinder the successful completion of bureaucratic processes. The aim is to improve interactions with government systems by promoting more efficient and comprehensible communication. The best performance, a mean absolute error (MAE) of 0.1551, was achieved with bert-base-spanish-wwm-uncased when linguistic features were combined with the model's encodings. The results indicate that linguistic features are essential to improving performance, suggesting that hybrid approaches are more effective than those based solely on deep learning.
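As a rough illustration of the hybrid idea described above, the following sketch concatenates a word's encoding with a few hand-crafted linguistic features and scores predictions with MAE. It is not the authors' implementation: the abstract does not list the linguistic features used, so the three features here (word length, vowel count, vowel ratio) are placeholders, and the function names `linguistic_features` and `hybrid_vector` are hypothetical. In a real pipeline the encoding would presumably come from bert-base-spanish-wwm-uncased; here it is simply a list of floats.

```python
from typing import Sequence


def linguistic_features(word: str) -> list[float]:
    """Placeholder hand-crafted features (assumed, not from the paper):
    word length, vowel count, and vowel ratio."""
    vowels = sum(ch in "aeiouáéíóúü" for ch in word.lower())
    ratio = vowels / len(word) if word else 0.0
    return [float(len(word)), float(vowels), ratio]


def hybrid_vector(encoding: Sequence[float], word: str) -> list[float]:
    """Concatenate a model encoding with the linguistic features,
    giving one input vector for a downstream regressor."""
    return list(encoding) + linguistic_features(word)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE: the average of |true - predicted| over all samples."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy two-dimensional encoding standing in for a real BERT embedding.
    vector = hybrid_vector([0.12, -0.40], "trámite")
    print(len(vector), vector)

    # Toy complexity scores in [0, 1]; these are made-up values.
    print(mean_absolute_error([0.2, 0.5, 0.9], [0.3, 0.4, 0.7]))
```

Because MAE is measured in the same units as the target, a reported value such as 0.1551 can be read directly as the average distance between predicted and annotated complexity scores.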
