Abstract
We present a simple and effective method for extracting parallel sentences from comparable corpora. We employ a statistical machine translation (SMT) system built from small amounts of parallel text to translate the source side of the non-parallel corpus. The target-side texts are used, along with other corpora, in the language model of this SMT system. We then use information retrieval techniques and simple filters to create French/English parallel data from comparable news corpora. We evaluate the quality of the extracted data by showing that it significantly improves the performance of an SMT system. © 2009 Association for Computational Linguistics.
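The pipeline described above (translate, retrieve, filter) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the token-overlap retrieval, the word-error-rate filter, and the 0.3 threshold are all assumptions made for the example, since the abstract only mentions "information retrieval techniques and simple filters". The SMT translations are assumed to be precomputed.

```python
def word_error_rate(hyp, ref):
    """Word-level Levenshtein distance between hyp and ref, divided by len(ref)."""
    h, r = hyp.lower().split(), ref.lower().split()
    prev = list(range(len(r) + 1))  # distances from the empty hypothesis
    for i, hw in enumerate(h, 1):
        cur = [i]
        for j, rw in enumerate(r, 1):
            cur.append(min(prev[j] + 1,                # extra word in hyp
                           cur[j - 1] + 1,             # missing word in hyp
                           prev[j - 1] + (hw != rw)))  # match or substitution
        prev = cur
    return prev[-1] / max(len(r), 1)


def retrieve(query, candidates):
    """Hypothetical stand-in for an IR engine: best candidate by Jaccard overlap."""
    q = set(query.lower().split())

    def overlap(cand):
        t = set(cand.lower().split())
        return len(q & t) / max(len(q | t), 1)

    return max(candidates, key=overlap)  # assumes candidates is non-empty


def extract_pairs(source, translations, target, max_wer=0.3):
    """Pair each source sentence with its retrieved target sentence,
    keeping the pair only if the SMT output is close enough to it."""
    pairs = []
    for src, hyp in zip(source, translations):
        best = retrieve(hyp, target)
        if word_error_rate(hyp, best) <= max_wer:  # assumed filter and threshold
            pairs.append((src, best))
    return pairs


source = ["Le chat dort .", "Il pleut ."]
translations = ["the cat sleeps .", "it rains ."]  # assumed SMT output
target = ["the cat sleeps .", "stocks fell sharply today ."]
pairs = extract_pairs(source, translations, target)
```

In this toy run only the first sentence finds a close match on the target side, so the second is filtered out rather than paired with an unrelated sentence.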
Abdul-Rauf, S., & Schwenk, H. (2009). On the use of comparable corpora to improve SMT performance. In EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (pp. 16–23). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1609067.1609068