Automatic Annotation of the Catalan Wikipedia: Exploring the Semantic Space via multiple NERC systems

Jordi Atserias, Judith Domingo, Carlos Rodriguez, Teresa Suñol


This paper presents WikiNer, a snapshot of the Catalan Wikipedia processed with different NLP tools (POS tagger, NERC, dependency parsers). The article focuses on the analysis of the NERC annotations produced by three taggers: JNET, YamCha and SST. Although Wikipedia text (especially in tables, lists and references) differs significantly in its distributional properties from the corpora used to train the taggers, we believe that the results of automatically annotating the semantic space of the Catalan Wikipedia point to the rapid availability of a resource containing a large volume of text annotated with a degree of reliability that is sufficient for some research tasks as well as for applications such as simple Q&A, ontology enrichment and semantic search.
