Abu-matran at WMT 2014 translation task: Two-step data selection and rbmt-style synthetic rules

Raphael Rubino; Antonio Toral; Victor M. Sánchez-Cartagena; Jorge Ferrández-Tordera; Sergio Ortiz-Rojas; Gema Ramírez-Sánchez; Felipe Sánchez-Martinez; Andy Way

Conference Proceedings

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2014) 171-177

DOI: 10.3115/v1/w14-3319

Abstract

This paper presents the machine translation systems submitted by the Abu-MaTran project to the WMT 2014 translation task. The language pair concerned is English-French, with a focus on French as the target language. The French-to-English translation direction is also considered, based on the word alignment computed in the other direction. Large language and translation models are built using all the datasets provided by the shared task organisers, as well as the monolingual data from LDC. To build the translation models, we apply a two-step data selection method based on bilingual cross-entropy difference and vocabulary saturation, considering each parallel corpus individually. Synthetic translation rules are extracted from the development sets and used to train another translation model. We then interpolate the translation models, minimising the perplexity on the development sets, to obtain our final SMT system. Our submission for the English-to-French translation task was ranked second amongst nine teams and a total of twenty submissions.
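The two-step data selection mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It makes the following assumptions:

- Add-one smoothed unigram language models stand in for the n-gram models a real system would use.
- Each sentence pair is scored by the bilingual cross-entropy difference: in-domain minus out-of-domain cross-entropy, summed over the source and target sides. Lower scores mean more in-domain.
- A hypothetical `keep_ratio` fraction of the ranked corpus is kept.
- A simplified vocabulary saturation filter then drops a pair once every source n-gram in it has been seen `threshold` times.

The order of the two steps and the parameters `keep_ratio`, `threshold` and `max_n` are assumptions. All function names are hypothetical. Following the abstract, the selection would be run on each parallel corpus separately.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


class UnigramLM:
    """Add-one smoothed unigram LM; a stand-in (assumption) for real n-gram LMs."""

    def __init__(self, sentences: Iterable[str]) -> None:
        self.counts = Counter(tok for s in sentences for tok in s.split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # +1 reserves mass for unseen tokens

    def cross_entropy(self, sentence: str) -> float:
        """Per-token cross-entropy of the sentence, in bits."""
        toks = sentence.split()
        if not toks:
            return 0.0
        denom = self.total + self.vocab
        logp = sum(math.log2((self.counts[t] + 1) / denom) for t in toks)
        return -logp / len(toks)


def ced_score(pair: Pair, in_src: UnigramLM, out_src: UnigramLM,
              in_tgt: UnigramLM, out_tgt: UnigramLM) -> float:
    """Bilingual cross-entropy difference; lower means more in-domain."""
    src, tgt = pair
    return ((in_src.cross_entropy(src) - out_src.cross_entropy(src))
            + (in_tgt.cross_entropy(tgt) - out_tgt.cross_entropy(tgt)))


def vocab_saturation(pairs: List[Pair], threshold: int = 1, max_n: int = 2) -> List[Pair]:
    """Keep a pair only if some source n-gram has been seen fewer than `threshold` times."""
    seen: Counter = Counter()
    kept: List[Pair] = []
    for src, tgt in pairs:
        toks = src.split()
        ngrams = [tuple(toks[i:i + n])
                  for n in range(1, max_n + 1)
                  for i in range(len(toks) - n + 1)]
        if any(seen[g] < threshold for g in ngrams):
            kept.append((src, tgt))
            seen.update(ngrams)
    return kept


def two_step_select(pairs: List[Pair], in_domain: List[Pair], out_domain: List[Pair],
                    keep_ratio: float = 0.5, threshold: int = 1) -> List[Pair]:
    """Step 1: rank by cross-entropy difference and keep the top fraction.
    Step 2: run the saturation filter over the ranked list (order is an assumption)."""
    in_src = UnigramLM(s for s, _ in in_domain)
    in_tgt = UnigramLM(t for _, t in in_domain)
    out_src = UnigramLM(s for s, _ in out_domain)
    out_tgt = UnigramLM(t for _, t in out_domain)
    ranked = sorted(pairs, key=lambda p: ced_score(p, in_src, out_src, in_tgt, out_tgt))
    top = ranked[: max(1, int(len(ranked) * keep_ratio))]
    return vocab_saturation(top, threshold=threshold)


if __name__ == "__main__":
    in_dom = [("the cat sat", "le chat assis")]
    out_dom = [("stock market falls", "la bourse chute")]
    corpus = [("the cat", "le chat"), ("market falls", "bourse chute"), ("the cat", "le chat")]
    # The duplicate "the cat" pair is removed by the saturation step.
    print(two_step_select(corpus, in_dom, out_dom, keep_ratio=1.0))
```

Running the saturation pass over the ranked list means that, when two pairs cover the same n-grams, the more in-domain pair is the one kept. The separate `keep_ratio` cut-off lets the cross-entropy step alone control how much of each corpus survives.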

Citation (APA)

Rubino, R., Toral, A., Sánchez-Cartagena, V. M., Ferrández-Tordera, J., Ortiz-Rojas, S., Ramírez-Sánchez, G., … Way, A. (2014). Abu-matran at WMT 2014 translation task: Two-step data selection and rbmt-style synthetic rules. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 171–177). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-3319