Extraction of Terms Semantically Related to Colponyms: Evaluation in a Small Specialized Corpus

Juan Rojas-Garcia


EcoLexicon is a terminological knowledge base on environmental science whose design permits the geographic contextualization of data. To contextualize named entities such as colponyms (i.e., named bays such as Pensacola Bay) in EcoLexicon, both count-based and prediction-based distributional semantic models (DSMs) were applied to a small, specialized English corpus in order to extract the terms related to each colponym mentioned in it, together with their semantic relations. Since the evaluation of DSMs on small, specialized corpora has received little attention, this study assessed parameter combinations in DSMs and five similarity/distance measures to identify those suitable for extracting terms related to colponyms through the semantic relations takes_place_in, located_at, and attribute_of. The models were evaluated against three gold standard datasets. The results showed that count-based models outperformed prediction-based ones; that the similarity/distance measures performed quite similarly, except for the Euclidean distance; and that the detection of a specific relation depended on the context window size.
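As a rough illustration of what a count-based DSM and a context window involve, the sketch below builds raw co-occurrence vectors from a token list and compares them with two measures. It is not the study's implementation: the toy sentence, the function names, and the token pensacola_bay are hypothetical, no weighting scheme is applied, and cosine similarity is assumed here as one common measure, since the abstract names only the Euclidean distance.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(tokens, window):
    """Count, for each token, the words found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                vectors[target][tokens[j]] += 1
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors (assumed measure)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def euclidean_distance(u, v):
    """Straight-line distance between two sparse count vectors."""
    keys = u.keys() | v.keys()
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in keys))


# Hypothetical toy corpus; a real run would use the tokenized specialized corpus.
tokens = (
    "flooding occurred in pensacola_bay after the hurricane "
    "while erosion affected pensacola_bay shoreline"
).split()

narrow = cooccurrence_vectors(tokens, window=2)
wide = cooccurrence_vectors(tokens, window=3)

# Terms sharing contexts with the colponym under each measure.
bay = narrow["pensacola_bay"]
ranked = sorted(
    (t for t in narrow if t != "pensacola_bay"),
    key=lambda t: cosine_similarity(bay, narrow[t]),
    reverse=True,
)
```

In this toy example, "hurricane" enters the context vector of pensacola_bay only when the window is widened from 2 to 3, which shows how the window size changes the terms a model can associate with a colponym.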
