A Supervised Central Unit Detector for Spanish

Kepa Bengoetxea, Mikel Iruskieta

Abstract


In this paper we present the first automatic detector of the Central Unit (CU) for Spanish scientific abstracts based on machine learning techniques. To build it, training and evaluation data were extracted from the RST Spanish Treebank, which is annotated under Rhetorical Structure Theory (RST). We use a bag-of-words representation with Naive Bayes and Support Vector Machine (SVM) classifiers to detect the central units of a text. Finally, we evaluate the performance of the classifiers and choose the best-performing one to build an automatic CU detector.
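The abstract does not describe the features or implementation in detail, so what follows is only a minimal sketch of the general idea, under stated assumptions. It assumes that each text is already segmented into discourse units, and that each training unit is labelled either "CU" or "other". It uses a hand-written multinomial Naive Bayes classifier with add-one smoothing over bag-of-words counts. The toy Spanish sentences, the labels, and the names `BagOfWordsNB` and `detect_central_unit` are hypothetical illustrations, not the authors' data or code. The rule of picking the single highest-scoring unit as the CU is also an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens, including Spanish accented letters
    return re.findall(r"[a-záéíóúñü]+", text.lower())


class BagOfWordsNB:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, units, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # number of training units per class
        self.word_counts = {c: Counter() for c in self.classes}
        for unit, label in zip(units, labels):
            self.word_counts[label].update(tokenize(unit))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, unit):
        # Log prior plus summed log likelihoods of each known token, per class
        n_units = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_units)
            for tok in tokenize(unit):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            scores[c] = score
        return scores

    def predict(self, unit):
        scores = self.log_scores(unit)
        return max(scores, key=scores.get)


def detect_central_unit(model, units):
    # Hypothetical rule: return the index of the unit with the highest CU-vs-other log-odds
    def log_odds(unit):
        s = model.log_scores(unit)
        return s["CU"] - s["other"]
    return max(range(len(units)), key=lambda i: log_odds(units[i]))


# Hypothetical toy training data (not taken from the RST Spanish Treebank)
train_units = [
    "En este trabajo presentamos un nuevo método",
    "En este artículo proponemos un sistema",
    "Los resultados muestran una mejora",
    "Los datos se extrajeron de un corpus",
    "Finalmente evaluamos el sistema",
]
train_labels = ["CU", "CU", "other", "other", "other"]

model = BagOfWordsNB().fit(train_units, train_labels)

units = [
    "Los textos científicos son importantes",
    "En este trabajo presentamos un detector automático",
    "Los resultados son buenos",
]
print(detect_central_unit(model, units))  # index of the unit predicted as the CU
```

An SVM variant would keep the same bag-of-words count vectors but replace the probabilistic scoring with a learned linear decision boundary; scikit-learn's `CountVectorizer` and `LinearSVC`, for example, provide both pieces, so the two classifiers can be compared on identical features.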
