A simple sentence-level extraction algorithm for comparable data

Abstract

The paper presents a novel sentence pair extraction algorithm for comparable data in which a large set of candidate sentence pairs is scored directly at the sentence level. The extraction relies on a very efficient implementation of a simple symmetric scoring function, for which a computation speed-up by a factor of 30 is reported. On Spanish-English data, the algorithm finds the highest-scoring sentence pairs among close to 1 trillion candidate pairs without search errors. Adding the extracted sentence pairs to the training data of a phrase-based statistical machine translation (SMT) system yields significant improvements in BLEU.
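
To make the idea of scoring candidate pairs at the sentence level concrete, here is a minimal sketch. The abstract does not define the scoring function, so everything below is an assumption made for illustration: the toy lexicons, the probability floor, and the names `directional_score`, `symmetric_score`, and `top_pairs` are hypothetical. The score sums a lexical-probability score in both translation directions, which is one plausible way to obtain a symmetric function. The exhaustive loop is not the efficient implementation the paper reports.

```python
import heapq
import math

# Hypothetical toy lexicons with made-up probabilities (not from the paper).
# LEX_S2T is keyed by (source_word, target_word); LEX_T2S by (target_word, source_word).
LEX_S2T = {("la", "the"): 0.7, ("el", "the"): 0.7, ("casa", "house"): 0.9,
           ("roja", "red"): 0.8, ("perro", "dog"): 0.9}
LEX_T2S = {(t, s): p for (s, t), p in LEX_S2T.items()}

FLOOR = 1e-4  # assumed probability for word pairs missing from the lexicon


def directional_score(given, generated, lex):
    """Average log of the best lexical probability for each generated word."""
    if not generated:
        return 0.0
    total = 0.0
    for g in generated:
        best = max((lex.get((w, g), FLOOR) for w in given), default=FLOOR)
        total += math.log(best)
    return total / len(generated)


def symmetric_score(src, tgt, lex_s2t=LEX_S2T, lex_t2s=LEX_T2S):
    """Sum of both directional scores, so neither direction is privileged."""
    return directional_score(src, tgt, lex_s2t) + directional_score(tgt, src, lex_t2s)


def top_pairs(src_sents, tgt_sents, k):
    """Score every candidate pair exhaustively and keep the k best."""
    scored = ((symmetric_score(s, t), i, j)
              for i, s in enumerate(src_sents)
              for j, t in enumerate(tgt_sents))
    return heapq.nlargest(k, scored)


src_sents = [["la", "casa", "roja"], ["el", "perro"]]
tgt_sents = [["the", "red", "house"], ["the", "dog"]]
best = top_pairs(src_sents, tgt_sents, k=2)
```

Each entry of `best` is a `(score, source_index, target_index)` tuple. Scoring all pairs is quadratic in the number of sentences, which is why an efficient implementation matters at the scale of close to 1 trillion candidates mentioned in the abstract.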

Cite

Tillmann, C., & Xu, J. M. (2009). A simple sentence-level extraction algorithm for comparable data. In NAACL-HLT 2009 - Human Language Technologies: 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Short Papers (pp. 93–96). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1620853.1620881
