Neural Networks Classifier for Data Selection in Statistical Machine Translation

  • Peris Á
  • Chinea-Ríos M
  • Casacuberta F

Abstract

Corpora are precious resources, as they allow for a proper estimation of statistical machine translation models. Data selection is a variant of domain adaptation that aims to extract from an out-of-domain corpus those sentences that are most useful for translating a different target domain. We address the data selection problem in statistical machine translation as a classification task. We present a new method, based on neural networks, that can deal with monolingual and bilingual corpora. Empirical results show that our data selection method provides slightly better translation quality than a state-of-the-art method (cross-entropy) while requiring substantially less data. Moreover, the results are coherent across different language pairs, demonstrating the robustness of our proposal.
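
To make the classification framing concrete, the following is a minimal sketch and not the authors' implementation. It assumes a bag-of-words logistic regression (in effect a single-layer network) as a stand-in for the neural classifier, a handful of invented toy sentences, and a hypothetical threshold of 0.5; the abstract does not specify the architecture, how negative examples are drawn, or the selection criterion. The classifier is trained to separate in-domain sentences from other sentences, and every sentence in the out-of-domain pool that it scores as in-domain is selected.

```python
import math
from collections import Counter


def featurize(sentence):
    """Bag-of-words counts; a deliberately simple, assumed representation."""
    return Counter(sentence.lower().split())


def train_logistic(examples, epochs=50, lr=0.5):
    """Train a logistic regression with plain SGD.

    examples: list of (sentence, label), label 1 = in-domain, 0 = not.
    epochs and lr are illustrative values, not taken from the paper.
    """
    weights = Counter()
    bias = 0.0
    for _ in range(epochs):
        for sentence, label in examples:
            feats = featurize(sentence)
            z = bias + sum(weights[w] * c for w, c in feats.items())
            p = 1.0 / (1.0 + math.exp(-z))
            error = label - p
            bias += lr * error
            for w, c in feats.items():
                weights[w] += lr * error * c
    return weights, bias


def score(sentence, weights, bias):
    """Estimated probability that the sentence is in-domain."""
    feats = featurize(sentence)
    z = bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def select(pool, weights, bias, threshold=0.5):
    """Keep the pool sentences that the classifier scores as in-domain."""
    return [s for s in pool if score(s, weights, bias) >= threshold]


# Toy data, invented for illustration only.
in_domain = [
    "the patient received a dose of the drug",
    "the drug reduces blood pressure",
]
negatives = [
    "the parliament voted on the budget",
    "the committee approved the budget proposal",
]
pool = [
    "a higher dose of the drug was given",
    "the budget was rejected by the parliament",
]

training = [(s, 1) for s in in_domain] + [(s, 0) for s in negatives]
weights, bias = train_logistic(training)
selected = select(pool, weights, bias)
print(selected)
```

In this toy setting the selected sentences would then be added to the training data of the translation system. For bilingual corpora, one possible extension (again an assumption) is to featurize the source and target sides of each sentence pair together.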

Cite

Peris, Á., Chinea-Ríos, M., & Casacuberta, F. (2017). Neural Networks Classifier for Data Selection in Statistical Machine Translation. The Prague Bulletin of Mathematical Linguistics, 108(1), 283–294. https://doi.org/10.1515/pralin-2017-0027
