Morphosyntactic analysis and named entity classification in a Big Data environment

Pablo Gamallo, Juan Carlos Pichel, Marcos García, José Manuel Abuín, Tomás Fernández Pena


This article describes a suite of linguistic modules for the Spanish language based on a pipeline architecture, which includes modules for PoS tagging and Named Entity Recognition and Classification (NERC). We have applied run-time parallelization techniques in a Big Data environment to make the suite more efficient and scalable, and thereby to significantly reduce computation time. This allows us to address problems at Web scale. The linguistic modules have been developed using basic NLP techniques so that they can be easily integrated into distributed computing environments. The qualitative performance of the modules is close to the state of the art.
