Sentence selection for improving the tuning process of a statistical machine translation system

Verónica López-Ludeña, Ruben San-Segundo, Juan M. Montero, Jaime Lorenzo


This paper describes a sentence selection strategy for tuning a Moses-based statistical machine translation system that translates Spanish into English. It proposes two techniques for selecting the sentences of the development corpus whose source side is most similar to the sentences to be translated (the source test sentences). Tuning on this selection yields better model weights, which are then used in the translation process and lead to better translation results. In particular, with the similarity selection method proposed in this paper, BLEU improves from 27.17% when tuning on the complete development set to 27.47% when tuning on the selected sentences. This result is very close to the 27.51% BLEU obtained in the ORACLE experiment, in which the system weights are tuned on the test set itself.
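
The selection idea can be illustrated with a minimal sketch. The abstract does not specify the two similarity techniques, so the bag-of-words cosine similarity below is an assumption chosen only for illustration, not the authors' method; the function names (`bag_of_words`, `cosine`, `select_dev_sentences`) and the `top_n` cutoff are likewise hypothetical.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Lowercased, whitespace-tokenized word counts (an assumed representation)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_dev_sentences(dev_source, test_source, top_n):
    """Return indices of the top_n development sentences closest to the test set.

    Each development sentence is scored by its highest similarity to any
    source test sentence (an assumed scoring rule). Indices are returned in
    corpus order so that the matching reference translations can be kept too.
    """
    test_vectors = [bag_of_words(s) for s in test_source]
    scored = []
    for index, sentence in enumerate(dev_source):
        vector = bag_of_words(sentence)
        score = max((cosine(vector, t) for t in test_vectors), default=0.0)
        scored.append((score, index))
    # Highest score first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return sorted(index for _, index in scored[:top_n])


dev = [
    "el parlamento aprueba la ley",
    "me gusta el fútbol",
    "la ley fue aprobada por el parlamento",
]
test = ["el parlamento vota la ley"]
selected = select_dev_sentences(dev, test, top_n=2)
```

In this toy example the two parliament-related development sentences are selected. The returned indices would then be used to extract both the source and reference sides of the development corpus before tuning the system weights.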
