Improved statistical machine translation for resource-poor languages using related resource-rich languages

Preslav Nakov; Hwee Tou Ng

Conference Proceedings

EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 (2009) 1358-1367

DOI: 10.3115/1699648.1699682

58 Citations

108 Readers

Abstract

We propose a novel language-independent approach for improving statistical machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X1-Y and a larger bi-text for X2-Y for some resource-rich language X2 that is closely related to X1. The evaluation for Indonesian→English (using Malay) and Spanish→English (using Portuguese and pretending Spanish is resource-poor) shows an absolute gain of up to 1.35 and 3.37 BLEU points, respectively, which is an improvement over the rivaling approaches, while using much less additional data. © 2009 ACL and AFNLP.

Cite

APA

Nakov, P., & Ng, H. T. (2009). Improved statistical machine translation for resource-poor languages using related resource-rich languages. In EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 (pp. 1358–1367). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1699648.1699682
