Neural Networks Classifier for Data Selection in Statistical Machine Translation

Álvaro Peris; Mara Chinea-Ríos; Francisco Casacuberta

Journal Article (Open Access)

The Prague Bulletin of Mathematical Linguistics (2017) 108(1) 283-294

DOI: 10.1515/pralin-2017-0027

Abstract

Corpora are precious resources, as they allow for a proper estimation of statistical machine translation models. Data selection is a form of domain adaptation that aims to extract, from an out-of-domain corpus, the sentences that are most useful for translating a different target domain. We address the data selection problem in statistical machine translation as a classification task. We present a new method, based on neural networks, that can deal with both monolingual and bilingual corpora. Empirical results show that our data selection method provides slightly better translation quality than a state-of-the-art method (cross-entropy) while requiring substantially less data. Moreover, the results are consistent across different language pairs, demonstrating the robustness of our proposal.

Citation (APA)

Peris, Á., Chinea-Ríos, M., & Casacuberta, F. (2017). Neural Networks Classifier for Data Selection in Statistical Machine Translation. The Prague Bulletin of Mathematical Linguistics, 108(1), 283–294. https://doi.org/10.1515/pralin-2017-0027
