Cutting the long tail: Hybrid language models for translation style adaptation

Arianna Bisazza; Marcello Federico

Conference Proceedings

EACL 2012 - 13th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (2012) 439-448

Citations: 9

Readers: 92

Abstract

In this paper, we address statistical machine translation of public conference talks. Modeling the style of this genre is challenging because little in-domain training data is available. We investigate a hybrid language model (LM) in which infrequent words are mapped into classes. Hybrid LMs complement word-based LMs with statistics about the language style of the talks. We report extensive experiments comparing different settings of the hybrid LM on publicly available benchmarks based on TED talks, from Arabic to English and from English to French. As the target language modeling component of a phrase-based statistical machine translation system, the proposed models are shown to exploit in-domain data better than conventional word-based LMs.
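
The mapping of infrequent words into classes can be illustrated with a minimal sketch. It is not the authors' implementation: the frequency threshold `min_count`, the `word_classes` dictionary, the fallback label `<UNK_CLASS>`, and the toy corpus are all assumptions made for illustration, and the abstract does not say which kind of classes or which threshold the paper uses.

```python
from collections import Counter


def build_hybrid_mapper(corpus, word_classes, min_count=2, unknown_class="<UNK_CLASS>"):
    """Return a function that rewrites a sentence as a hybrid word/class sequence.

    corpus: list of tokenized sentences (lists of strings).
    word_classes: hypothetical mapping from a word to its class label.
    min_count: assumed threshold; words seen fewer times are replaced by a class.
    """
    counts = Counter(tok for sentence in corpus for tok in sentence)
    frequent = {word for word, count in counts.items() if count >= min_count}

    def to_hybrid(sentence):
        # Frequent words are kept as-is; infrequent or unseen words become class labels.
        return [
            tok if tok in frequent else word_classes.get(tok, unknown_class)
            for tok in sentence
        ]

    return to_hybrid


# Toy in-domain corpus and class mapping (both invented for this example).
corpus = [
    ["we", "can", "change", "the", "world"],
    ["we", "can", "rebuild", "the", "city"],
]
word_classes = {"change": "VERB", "rebuild": "VERB", "world": "NOUN", "city": "NOUN"}

to_hybrid = build_hybrid_mapper(corpus, word_classes, min_count=2)
hybrid_corpus = [to_hybrid(sentence) for sentence in corpus]
```

In this toy setting both sentences collapse to the same hybrid sequence, which is the intuition behind cutting the long tail: an n-gram LM estimated on the hybrid text sees the shared pattern more often than it would see either original word sequence.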

Citation (APA)

Bisazza, A., & Federico, M. (2012). Cutting the long tail: Hybrid language models for translation style adaptation. In EACL 2012 - 13th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (pp. 439–448). Association for Computational Linguistics (ACL).
