Abstract
In this paper, we address statistical machine translation of public conference talks. Modeling the style of this genre can be very challenging given the shortage of available in-domain training data. We investigate the use of a hybrid LM, where infrequent words are mapped into classes. Hybrid LMs are used to complement word-based LMs with statistics about the language style of the talks. Extensive experiments comparing different settings of the hybrid LM are reported on publicly available benchmarks based on TED talks, from Arabic to English and from English to French. The proposed models show to better exploit in-domain data than conventional word-based LMs for the target language modeling component of a phrase-based statistical machine translation system.
Cite
CITATION STYLE
Bisazza, A., & Federico, M. (2012). Cutting the long tail: Hybrid language models for translation style adaptation. In EACL 2012 - 13th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (pp. 439–448). Association for Computational Linguistics (ACL).
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.