An empirical analysis of data selection techniques in statistical machine translation.

Mara Chinea-Rios, Germán Sanchis-Trilles, Francisco Casacuberta


Domain adaptation has recently gained interest in statistical machine translation (SMT). One of the adaptation techniques is based on data selection. Data selection aims to select the best subset of bilingual sentences from an available pool of sentences, with which to train an SMT system. In this paper, we study how the bilingual corpora used by data selection methods affect translation quality.

