A simple sentence-level extraction algorithm for comparable data

Christoph Tillmann; Jian Ming Xu

Conference Proceedings

NAACL-HLT 2009 - Human Language Technologies: 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Short Papers (2009) 93-96

DOI: 10.3115/1620853.1620881

Abstract

The paper presents a novel sentence pair extraction algorithm for comparable data, in which a large set of candidate sentence pairs is scored directly at the sentence level. The sentence-level extraction relies on a very efficient implementation of a simple symmetric scoring function, for which a computation speed-up by a factor of 30 is reported. On Spanish-English data, the extraction algorithm finds the highest-scoring sentence pairs among close to 1 trillion candidate pairs without search errors. Significant improvements in BLEU are reported when the extracted sentence pairs are added to the training data of a phrase-based SMT (Statistical Machine Translation) system.
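The abstract does not spell out the scoring function, so the following is only a minimal sketch of what sentence-level extraction with a symmetric score can look like. It assumes a symmetrized IBM Model-1 style score built from two word-translation lexicons (one per direction), then scores every candidate pair exhaustively and keeps the k best pairs above a threshold. The lexicon format, `FLOOR`, `threshold`, `k`, and all function names are hypothetical illustrations, not the authors' implementation, and the sketch does not reproduce the reported speed-up.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Hypothetical lexicon format: (generating_word, generated_word) -> probability,
# e.g. a Model-1 style p(target | source). One lexicon is assumed per direction.
Lexicon = Dict[Tuple[str, str], float]

FLOOR = 1e-7  # assumed floor so unseen word pairs do not produce log(0)


def directional_score(gen: List[str], out: List[str], lex: Lexicon) -> float:
    """Average log Model-1 style probability of each word in `out` given `gen`."""
    total = 0.0
    for w in out:
        # Uniform alignment: average the lexicon probability over all words in `gen`.
        p = sum(lex.get((g, w), 0.0) for g in gen) / len(gen)
        total += math.log(max(p, FLOOR))
    return total / len(out)


def symmetric_score(src: List[str], tgt: List[str],
                    lex_s2t: Lexicon, lex_t2s: Lexicon) -> float:
    """Sum of both translation directions (an assumed symmetrization)."""
    if not src or not tgt:
        return float("-inf")  # empty sentences can never be extracted
    return (directional_score(src, tgt, lex_s2t)
            + directional_score(tgt, src, lex_t2s))


def extract_pairs(src_sents: List[List[str]], tgt_sents: List[List[str]],
                  lex_s2t: Lexicon, lex_t2s: Lexicon,
                  threshold: float, k: int) -> List[Tuple[float, int, int]]:
    """Score every candidate pair exhaustively; keep the k best above threshold."""
    if k <= 0:
        return []
    heap: List[Tuple[float, int, int]] = []  # min-heap of (score, i, j)
    for i, s in enumerate(src_sents):
        for j, t in enumerate(tgt_sents):
            sc = symmetric_score(s, t, lex_s2t, lex_t2s)
            if sc < threshold:
                continue
            if len(heap) < k:
                heapq.heappush(heap, (sc, i, j))
            elif sc > heap[0][0]:
                heapq.heapreplace(heap, (sc, i, j))  # evict the current worst
    return sorted(heap, reverse=True)  # best-scoring pairs first


# Toy usage with invented data (not from the paper).
src_sents = [["la", "casa"], ["el", "perro"]]
tgt_sents = [["the", "house"], ["the", "dog"]]
lex_s2t = {("la", "the"): 0.9, ("casa", "house"): 0.9,
           ("el", "the"): 0.9, ("perro", "dog"): 0.9}
lex_t2s = {("the", "la"): 0.5, ("the", "el"): 0.5,
           ("house", "casa"): 0.9, ("dog", "perro"): 0.9}

for score, i, j in extract_pairs(src_sents, tgt_sents, lex_s2t, lex_t2s,
                                 threshold=-5.0, k=10):
    print(f"{score:.3f}  {' '.join(src_sents[i])} ||| {' '.join(tgt_sents[j])}")
```

Scoring every pair is what rules out search errors, but the cost grows with the product of the two collection sizes. At the scale of roughly a trillion candidate pairs, an efficient implementation of the scoring function, like the one the paper reports, becomes essential; this naive double loop is meant only to show the selection logic.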

Cite (APA style)

Tillmann, C., & Xu, J. M. (2009). A simple sentence-level extraction algorithm for comparable data. In NAACL-HLT 2009 - Human Language Technologies: 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Short Papers (pp. 93–96). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1620853.1620881