Abstract
Statistical models in machine translation exhibit spurious ambiguity. That is, the probability of an output string is split among many distinct derivations (e.g., trees or segmentations). In principle, the goodness of a string is measured by the total probability of its many derivations. However, finding the best string (e.g., during decoding) is then computationally intractable. Therefore, most systems use a simple Viterbi approximation that measures the goodness of a string using only its most probable derivation. Instead, we develop a variational approximation, which considers all the derivations but still allows tractable decoding. Our particular variational distributions are parameterized as n-gram models. We also analytically show that interpolating these n-gram models for different n is similar to minimum-risk decoding for BLEU (Tromble et al., 2008). Experiments show that our approach improves the state of the art.
Cite
CITATION STYLE
Li, Z., Eisner, J., & Khudanpur, S. (2009). Variational Decoding for Statistical Machine Translation. In ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf. (pp. 593–601). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1690219.1690229
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.