Class-Based Language Modeling for Translating into Morphologically Rich Languages

Abstract

Class-based language modeling (LM) is a long-studied and effective approach to overcome data sparsity in the context of n-gram model training. In statistical machine translation (SMT), different forms of class-based LMs have been shown to improve baseline translation quality when used in combination with standard word-level LMs, but no published work has systematically compared different kinds of classes, model forms and LM combination methods in a unified SMT setting. This paper aims to fill these gaps by focusing on the challenging problem of translating into Russian, a language with rich inflectional morphology and complex agreement phenomena. We conduct our evaluation in a large-data scenario and report statistically significant BLEU improvements of up to 0.6 points when using a refined variant of the class-based model originally proposed by Brown et al. (1992).
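For readers unfamiliar with the model the abstract refers to, the following is a minimal sketch of the original class-based bigram factorization of Brown et al. (1992), in which p(w | w_prev) is approximated as p(c(w) | c(w_prev)) · p(w | c(w)). It is not the refined variant evaluated in the paper, which the abstract does not describe. The function names, the toy corpus, and the hand-assigned word classes are all hypothetical, and the estimates are plain unsmoothed relative frequencies.

```python
from collections import Counter

def train_class_bigram(sentences, word2class):
    """Collect the counts needed for a Brown-style class bigram model."""
    class_unigrams = Counter()  # count(c), used for p(w | c)
    class_history = Counter()   # count(c) as a bigram history
    class_bigrams = Counter()   # count(c_prev, c)
    word_counts = Counter()     # count(w)
    for sentence in sentences:
        classes = [word2class[w] for w in sentence]
        word_counts.update(sentence)
        class_unigrams.update(classes)
        class_history.update(classes[:-1])
        class_bigrams.update(zip(classes, classes[1:]))
    return class_unigrams, class_history, class_bigrams, word_counts

def class_bigram_prob(word, prev_word, model, word2class):
    """p(word | prev_word) ~ p(c | c_prev) * p(word | c), unsmoothed."""
    class_unigrams, class_history, class_bigrams, word_counts = model
    c, c_prev = word2class[word], word2class[prev_word]
    transition = class_bigrams[(c_prev, c)] / class_history[c_prev]
    emission = word_counts[word] / class_unigrams[c]
    return transition * emission

# Hypothetical toy data: the word bigram "blue bike" never occurs.
sentences = [["red", "car"], ["blue", "car"], ["red", "bike"]]
word2class = {"red": "ADJ", "blue": "ADJ", "car": "NOUN", "bike": "NOUN"}
model = train_class_bigram(sentences, word2class)

# A word-level bigram model would assign zero probability here;
# the class-based model gives about 0.33.
print(class_bigram_prob("bike", "blue", model, word2class))
```

The example illustrates the data-sparsity argument: because "blue" and "red" share a class, evidence from "red bike" transfers to the unseen "blue bike". In practice the classes would be induced automatically or derived from linguistic annotation, and the estimates would be smoothed.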

Cite

Bisazza, A., & Monz, C. (2014). Class-based language modeling for translating into morphologically rich languages. In COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014: Technical Papers (pp. 1918–1927). Association for Computational Linguistics, ACL Anthology.
