Morphological segmentation for extracting Spanish-Nahuatl bilingual lexicon

Ximena Gutierrez-Vasques, Alfonso Medina-Urrea, Gerardo Sierra


The aim of this work is to extract word translation pairs from a small parallel corpus and to measure how much handling morphology improves this task. We focus on the language pair Spanish-Nahuatl; both languages are morphologically rich and typologically distant from each other. We generate semi-supervised morphological segmentation models and compare two approaches (estimation and association) for extracting bilingual correspondences. We show that taking into account typological properties of the languages, such as their morphology, helps to counteract the negative effect of working with a low-resource language.
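As a rough illustration of what an association-based approach to extracting bilingual correspondences can look like, the sketch below scores source-target token pairs by how often they co-occur in aligned sentences. It is not the authors' method: the Dice coefficient, the function names `dice_scores` and `best_translation`, and the toy corpus are all assumptions made for this example. In a morphology-aware setting, the tokens could be morphs produced by a segmentation model rather than full word forms.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs):
    """Score every (source, target) token pair with the Dice coefficient.

    pairs: list of (source_tokens, target_tokens) aligned sentences.
    Counts are per sentence, so repeated tokens in one sentence count once.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s, t = set(src), set(tgt)
        src_count.update(s)
        tgt_count.update(t)
        joint.update(product(s, t))
    return {
        (a, b): 2 * c / (src_count[a] + tgt_count[b])
        for (a, b), c in joint.items()
    }


def best_translation(scores, word):
    """Return the highest-scoring target token for a source token, or None."""
    candidates = {b: v for (a, b), v in scores.items() if a == word}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical toy corpus, invented for illustration only.
toy_corpus = [
    (["la", "casa"], ["in", "calli"]),
    (["el", "agua"], ["in", "atl"]),
    (["la", "casa", "y", "el", "agua"], ["in", "calli", "ihuan", "in", "atl"]),
]

scores = dice_scores(toy_corpus)
print(best_translation(scores, "casa"))
print(best_translation(scores, "agua"))
```

With so little data, frequent function words compete with the correct candidates, which is one reason a small corpus makes this task hard.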
