On the use of comparable corpora to improve SMT performance

Abstract

We present a simple and effective method for extracting parallel sentences from comparable corpora. We employ a statistical machine translation (SMT) system built from small amounts of parallel texts to translate the source side of the non-parallel corpus. The target-side texts are used, along with other corpora, in the language model of this SMT system. We then use information retrieval techniques and simple filters to create French/English parallel data from comparable news corpora. We evaluate the quality of the extracted data by showing that it significantly improves the performance of an SMT system. © 2009 Association for Computational Linguistics.
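
The abstract names the pipeline only at a high level: translate the source side, retrieve candidate target sentences, then filter. As a rough illustration, the following minimal Python sketch assumes the SMT translations are already available as strings, uses bag-of-words cosine similarity as a stand-in for the information retrieval step, and uses a word error rate threshold as a stand-in for the "simple filters". The function names, the similarity measure, and the `max_wer` value are all assumptions made for this example and are not taken from the abstract.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase whitespace tokenization (an illustrative simplification)."""
    return sentence.lower().split()


def cosine(bag_a, bag_b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * bag_b[token] for token, count in bag_a.items() if token in bag_b)
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def word_error_rate(hyp_tokens, ref_tokens):
    """Token-level Levenshtein distance divided by the reference length."""
    prev = list(range(len(ref_tokens) + 1))
    for i, hyp_tok in enumerate(hyp_tokens, 1):
        cur = [i]
        for j, ref_tok in enumerate(ref_tokens, 1):
            cur.append(min(prev[j] + 1,                          # deletion
                           cur[j - 1] + 1,                       # insertion
                           prev[j - 1] + (hyp_tok != ref_tok)))  # substitution
        prev = cur
    return prev[-1] / max(len(ref_tokens), 1)


def extract_pairs(source_sents, translations, target_sents, max_wer=0.5):
    """Pair each source sentence with its best-matching target sentence.

    translations[i] is assumed to be the SMT output for source_sents[i].
    max_wer is a hypothetical threshold chosen for this sketch.
    """
    if not target_sents:
        return []
    target_bags = [Counter(tokenize(t)) for t in target_sents]
    pairs = []
    for src, hyp in zip(source_sents, translations):
        hyp_tokens = tokenize(hyp)
        hyp_bag = Counter(hyp_tokens)
        # Retrieval step: pick the target sentence closest to the translation.
        best = max(range(len(target_sents)),
                   key=lambda k: cosine(hyp_bag, target_bags[k]))
        # Filter step: keep the pair only if the translation is close enough.
        if word_error_rate(hyp_tokens, tokenize(target_sents[best])) <= max_wer:
            pairs.append((src, target_sents[best]))
    return pairs
```

Calling `extract_pairs` with the French source sentences, their English SMT output, and the English side of the comparable corpus returns the (source, target) pairs that pass the filter. A lower `max_wer` trades recall for precision.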

Cite

Abdul-Rauf, S., & Schwenk, H. (2009). On the use of comparable corpora to improve SMT performance. In EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (pp. 16–23). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1609067.1609068
