Removing Noisy Mentions for Distant Supervision

Ander Intxaurrondo, Mihai Surdeanu, Oier Lopez de Lacalle, Eneko Agirre


Relation Extraction methods based on Distant Supervision rely on true tuples to retrieve noisy mentions, which are then used to train traditional supervised relation extraction methods. In this paper we analyze the sources of noise in the mentions, and explore simple methods to filter out noisy mentions. The results show that a combination of mention frequency cut-off, Pointwise Mutual Information and removal of mentions which are far from the feature centroids of relation labels is able to significantly improve the results of two relation extraction models.
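The three filters named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything here is an assumption made for illustration: a mention is modeled as a pair of a feature set and a relation label, the frequency cut-off is assumed to drop feature sets seen more than a maximum number of times, PMI is computed between a feature set and its label, and distance to the centroid is measured with cosine similarity. All function names and thresholds are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Assumed representation: a mention is (features, label), where features is a
# frozenset of feature strings and label is a relation label.

def frequency_cutoff(mentions, max_count):
    """Keep mentions whose feature set occurs at most max_count times.
    The direction of the cut-off is an assumption of this sketch."""
    counts = Counter(feats for feats, _ in mentions)
    return [(f, l) for f, l in mentions if counts[f] <= max_count]

def pmi_scores(mentions):
    """PMI(f, l) = log( P(f, l) / (P(f) * P(l)) ), estimated from counts."""
    n = len(mentions)
    joint = Counter(mentions)
    feat_counts = Counter(f for f, _ in mentions)
    label_counts = Counter(l for _, l in mentions)
    return {
        (f, l): math.log(c * n / (feat_counts[f] * label_counts[l]))
        for (f, l), c in joint.items()
    }

def pmi_filter(mentions, min_pmi):
    """Keep mentions whose (feature set, label) PMI is at least min_pmi."""
    scores = pmi_scores(mentions)
    return [m for m in mentions if scores[m] >= min_pmi]

def label_centroids(mentions):
    """Average the binary feature vectors of all mentions of each label."""
    sums = defaultdict(Counter)
    totals = Counter()
    for feats, label in mentions:
        sums[label].update(feats)  # adds 1 for each feature in the set
        totals[label] += 1
    return {
        label: {f: c / totals[label] for f, c in s.items()}
        for label, s in sums.items()
    }

def cosine(feats, centroid):
    """Cosine similarity between a binary feature set and a centroid."""
    dot = sum(centroid.get(f, 0.0) for f in feats)
    norm = math.sqrt(len(feats)) * math.sqrt(sum(v * v for v in centroid.values()))
    return dot / norm if norm else 0.0

def centroid_filter(mentions, min_similarity):
    """Keep mentions close enough to the centroid of their own label."""
    cents = label_centroids(mentions)
    return [(f, l) for f, l in mentions if cosine(f, cents[l]) >= min_similarity]
```

The filters share one input and output type, so they can be chained in any order, for example `centroid_filter(pmi_filter(frequency_cutoff(mentions, 100), 0.0), 0.5)`. The thresholds in that call are placeholders, not values taken from the paper.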
