Wikipedia used as a semantic tagger: some preliminary results in Spanish
Abstract
This paper describes a method, based on data from Wikipedia, for the automatic semantic tagging of common and proper nouns in context. We first predict the semantic category of each Wikipedia entry using a rule-based method that detects definition patterns, and then generalize from those predictions using a statistical model that associates semantic categories with elements of the entry. An evaluation on proper and common nouns in Spanish yields an overall precision of 0.82 and a recall of 0.77. The method is conceptually simple and computationally efficient. The implementation is offered as open-source code, and the data used in the study are in the public domain.
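As a rough illustration of the two-step idea, the sketch below first assigns a category to an entry from a definition pattern in its opening sentence, and then lets the words of categorized entries vote on the category of a new context. It is only a minimal sketch under stated assumptions: the pattern, the small hypernym lexicon, the category labels, and all function names are hypothetical, and simple co-occurrence voting stands in for the statistical model, which the abstract does not specify.

```python
import re
from collections import Counter, defaultdict

# Hypothetical hypernym-to-category lexicon (illustrative, not from the paper).
HYPERNYM_TO_CATEGORY = {
    "ciudad": "place",
    "río": "place",
    "escritor": "person",
    "pintora": "person",
}

# Assumed Spanish definition pattern: "X es/fue un/una/el/la <hypernym> ..."
DEFINITION_PATTERN = re.compile(
    r"\b(?:es|fue)\s+(?:un|una|el|la)\s+(\w+)", re.IGNORECASE
)


def category_from_definition(first_sentence):
    """Step 1 (rule-based): map the hypernym in a definition to a category."""
    match = DEFINITION_PATTERN.search(first_sentence)
    if match is None:
        return None
    return HYPERNYM_TO_CATEGORY.get(match.group(1).lower())


def train_association(entries):
    """Step 2 (statistical stand-in): count word/category co-occurrences.

    `entries` is an iterable of (first_sentence, body_text) pairs.
    Entries whose definition is not recognized are skipped.
    """
    counts = defaultdict(Counter)
    for first_sentence, body in entries:
        category = category_from_definition(first_sentence)
        if category is None:
            continue
        for word in re.findall(r"\w+", body.lower()):
            counts[word][category] += 1
    return counts


def tag_by_context(context, counts):
    """Pick the category with the most co-occurrence votes, or None."""
    votes = Counter()
    for word in re.findall(r"\w+", context.lower()):
        votes.update(counts.get(word, {}))
    return votes.most_common(1)[0][0] if votes else None
```

For example, training on two toy entries such as `("Cervantes fue un escritor español.", "novela obra nació")` and `("Madrid es una ciudad de España.", "capital habitantes municipio")` lets `tag_by_context("la novela", counts)` fall back on words seen in entries already categorized by the rule-based step.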


