This paper presents the machine translation systems submitted by the Abu-MaTran project to the WMT 2014 translation task. The language pair is English-French, with a focus on French as the target language. The French-to-English direction is also considered, based on the word alignment computed in the other direction. Large language and translation models are built using all the datasets provided by the shared task organisers, as well as monolingual data from LDC. To build the translation models, we apply a two-step data selection method based on bilingual cross-entropy difference and vocabulary saturation, considering each parallel corpus individually. Synthetic translation rules are extracted from the development sets and used to train another translation model. We then interpolate the translation models, minimising perplexity on the development sets, to obtain our final SMT system. Our submission for the English-to-French translation task was ranked second amongst nine teams and a total of twenty submissions.
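The two-step selection mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the add-one smoothed unigram models stand in for real n-gram language models, the order of the two steps (rank by bilingual cross-entropy difference, then filter by vocabulary saturation), the source-side unigram counting and the `threshold` parameter are all assumptions, and every name (`UnigramLM`, `bilingual_ced`, `vocabulary_saturation`, `two_step_select`) is hypothetical.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one smoothed unigram model (a stand-in for a real n-gram LM)."""

    def __init__(self, sentences):
        self.counts = Counter(tok for s in sentences for tok in s.split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unknown tokens

    def cross_entropy(self, sentence):
        """Per-token cross-entropy of the sentence, in bits."""
        toks = sentence.split()
        if not toks:
            return 0.0
        logp = sum(
            math.log2((self.counts[t] + 1) / (self.total + self.vocab))
            for t in toks
        )
        return -logp / len(toks)


def bilingual_ced(src, tgt, in_src, out_src, in_tgt, out_tgt):
    """Bilingual cross-entropy difference: lower means more in-domain."""
    return (in_src.cross_entropy(src) - out_src.cross_entropy(src)) + (
        in_tgt.cross_entropy(tgt) - out_tgt.cross_entropy(tgt)
    )


def vocabulary_saturation(pairs, threshold=1):
    """Keep a pair only if it has a source token kept fewer than `threshold` times.

    Assumption: counts are taken over source-side unigrams of kept pairs only.
    """
    seen = Counter()
    kept = []
    for src, tgt in pairs:
        toks = src.split()
        if any(seen[t] < threshold for t in toks):
            kept.append((src, tgt))
            seen.update(toks)
    return kept


def two_step_select(pairs, in_src, out_src, in_tgt, out_tgt, threshold=1):
    """Step 1: rank by bilingual CED. Step 2: drop vocabulary-saturated pairs."""
    ranked = sorted(
        pairs,
        key=lambda p: bilingual_ced(p[0], p[1], in_src, out_src, in_tgt, out_tgt),
    )
    return vocabulary_saturation(ranked, threshold)
```

Ranking before filtering means that, when several sentence pairs cover the same vocabulary, the most in-domain one is the one that survives. In practice the ranked list would also be truncated at some score cut-off, which this sketch leaves out.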
CITATION STYLE
Rubino, R., Toral, A., Sánchez-Cartagena, V. M., Ferrández-Tordera, J., Ortiz-Rojas, S., Ramírez-Sánchez, G., … Way, A. (2014). Abu-MaTran at WMT 2014 translation task: Two-step data selection and RBMT-style synthetic rules. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 171–177). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-3319