An Unsupervised Algorithm for Person Name Disambiguation in the Web

Agustín D. Delgado, Raquel Martínez, Víctor Fresno, Soto Montalvo Herranz


In this paper we present an unsupervised approach for clustering the results of a search engine when the query is a person name shared by different individuals. We represent the web pages using n-grams, comparing different kinds of information and different n-gram lengths. Moreover, we propose a new clustering algorithm that calculates the number of clusters and groups the web pages according to the different individuals without needing any training data or predefined thresholds, which successful state-of-the-art systems do require. Our approach is evaluated on three gold standard collections compiled by different evaluation campaigns for the task of Web People Search. We obtain competitive results, comparable to those obtained by the best approaches that use annotated data.
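
As a rough illustration of the n-gram representation mentioned above, the following minimal sketch builds word n-gram sets for a few pages and measures their overlap. It is not the algorithm proposed in the paper: the function names `word_ngrams` and `jaccard`, the whitespace tokenization, the choice of Jaccard overlap, and the example page texts are all assumptions made for illustration only.

```python
from typing import Set, Tuple

NGram = Tuple[str, ...]


def word_ngrams(text: str, n: int) -> Set[NGram]:
    """Return the set of word n-grams in `text` (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    # A text with fewer than n tokens yields an empty set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set[NGram], b: Set[NGram]) -> float:
    """Jaccard overlap of two n-gram sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical page texts returned for the same person-name query.
page_a = "John Smith is a professor of physics"
page_b = "John Smith is a professor of chemistry"
page_c = "John Smith plays guitar in a band"

grams_a = word_ngrams(page_a, 3)
grams_b = word_ngrams(page_b, 3)
grams_c = word_ngrams(page_c, 3)

sim_ab = jaccard(grams_a, grams_b)  # pages sharing several trigrams
sim_ac = jaccard(grams_a, grams_c)  # pages sharing no trigram
```

Under these assumptions, pages that share longer word sequences score higher than pages that share only the queried name, which is the kind of signal a clustering step could then use to separate individuals.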
