Abu-MaTran at WMT 2014 Translation Task: Two-Step Data Selection and RBMT-Style Synthetic Rules


Abstract

This paper presents the machine translation systems submitted by the Abu-MaTran project to the WMT 2014 translation task. The language pair concerned is English–French, with a focus on French as the target language. The French-to-English translation direction is also considered, based on the word alignment computed in the English-to-French direction. Large language and translation models are built using all the datasets provided by the shared task organisers, as well as the monolingual data from the LDC. To build the translation models, we apply a two-step data selection method based on bilingual cross-entropy difference and vocabulary saturation, considering each parallel corpus individually. Synthetic translation rules are extracted from the development sets and used to train another translation model. We then interpolate the translation models, minimising the perplexity on the development sets, to obtain our final SMT system. Our submission for the English-to-French translation task was ranked second amongst nine teams and a total of twenty submissions.
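The two data-selection criteria named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation and makes several assumptions: add-one smoothed unigram language models stand in for the n-gram models a real system would use; the out-of-domain models are trained on the candidate pool itself; pairs are ranked by the summed source- and target-side cross-entropy difference (lower means more in-domain); and a vocabulary saturation filter is then run over the source side in ranked order. The order of the two steps, the threshold, and every name (`UnigramLM`, `bilingual_ce_difference`, `vocabulary_saturation`, `two_step_select`) are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (English source, French target)


class UnigramLM:
    """Add-one smoothed unigram LM (hypothetical stand-in for a real n-gram LM)."""

    def __init__(self, sentences: Iterable[str]) -> None:
        self.counts = Counter(w for s in sentences for w in s.split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot shared by all unseen words

    def cross_entropy(self, sentence: str) -> float:
        """Per-word cross-entropy of `sentence` in bits."""
        words = sentence.split()
        if not words:
            return 0.0
        log_prob = sum(
            math.log2((self.counts[w] + 1) / (self.total + self.vocab)) for w in words
        )
        return -log_prob / len(words)


def bilingual_ce_difference(pair: Pair, in_src: UnigramLM, out_src: UnigramLM,
                            in_tgt: UnigramLM, out_tgt: UnigramLM) -> float:
    """Source-side plus target-side cross-entropy difference; lower = more in-domain."""
    src, tgt = pair
    return (in_src.cross_entropy(src) - out_src.cross_entropy(src)) + (
        in_tgt.cross_entropy(tgt) - out_tgt.cross_entropy(tgt)
    )


def vocabulary_saturation(pairs: Sequence[Pair], threshold: int = 1, n: int = 1) -> List[Pair]:
    """Keep a pair only if some source n-gram was kept fewer than `threshold` times so far."""
    seen: Counter = Counter()
    kept: List[Pair] = []
    for src, tgt in pairs:
        words = src.split()
        grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
        if any(seen[g] < threshold for g in grams):
            kept.append((src, tgt))
            seen.update(grams)
    return kept


def two_step_select(pool: Sequence[Pair], in_domain: Sequence[Pair],
                    threshold: int = 1, max_pairs: Optional[int] = None) -> List[Pair]:
    """Step 1: rank by bilingual CE difference. Step 2: saturation filter in ranked order."""
    in_src = UnigramLM(s for s, _ in in_domain)
    in_tgt = UnigramLM(t for _, t in in_domain)
    # Assumption: the candidate pool itself serves as the out-of-domain data.
    out_src = UnigramLM(s for s, _ in pool)
    out_tgt = UnigramLM(t for _, t in pool)
    ranked = sorted(
        pool, key=lambda p: bilingual_ce_difference(p, in_src, out_src, in_tgt, out_tgt)
    )
    selected = vocabulary_saturation(ranked, threshold=threshold)
    return selected if max_pairs is None else selected[:max_pairs]


if __name__ == "__main__":
    in_domain = [("the cat sat", "le chat assis"), ("the cat ate", "le chat mangea")]
    pool = [("stock prices fell", "les prix baissent"), ("the cat sat", "le chat assis"),
            ("the cat sat", "le chat assis"), ("the dog sat", "le chien assis")]
    print(two_step_select(pool, in_domain))
    # [('the cat sat', 'le chat assis'), ('the dog sat', 'le chien assis'),
    #  ('stock prices fell', 'les prix baissent')]
```

Running the saturation filter after ranking means that, among redundant pairs, the copy judged most in-domain is the one kept: with `threshold=1` the duplicate `("the cat sat", ...)` pair is dropped because none of its source words is new. Following the abstract's "considering each parallel corpus individually", one would call `two_step_select` once per corpus rather than on a merged pool.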

Citation (APA)

Rubino, R., Toral, A., Sánchez-Cartagena, V. M., Ferrández-Tordera, J., Ortiz-Rojas, S., Ramírez-Sánchez, G., … Way, A. (2014). Abu-MaTran at WMT 2014 translation task: Two-step data selection and RBMT-style synthetic rules. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 171–177). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-3319
