Coreference Resolution for Morphologically Rich Languages. Adaptation of the Stanford System to Basque.

Ander Soraluze, Olatz Arregi, Xabier Arregi, Arantza Díaz de Ilarraza


This paper presents the adaptation of the Stanford coreference resolution system to Basque, an agglutinative, head-final, pro-drop language. The adapted system has been integrated into a global linguistic analysis pipeline, so that its input is raw Basque text that has been linguistically processed and annotated. We demonstrate that language-specific characteristics have a noteworthy effect on coreference resolution. For agglutinative languages, the use of morphosyntactic features substantially improves the system's performance, yielding a gain in CoNLL F1 of 5 points when automatic mentions are used and of 7.87 points when gold mentions are provided.
