Lexical selection is crucial for statistical machine translation. Previous studies exploit sentence-level contexts and document-level topics for lexical selection separately, neglecting their correlations. In this paper, we propose a context-aware topic model for lexical selection, which not only models local contexts and global topics but also captures their correlations. The model uses target-side translations as hidden variables to connect document topics and source-side local contextual words. To learn the hidden variables and distributions from data, we introduce a Gibbs sampling algorithm for statistical estimation and inference. A new translation probability based on the distributions learned by the model is integrated into a translation system for lexical selection. Experimental results on NIST Chinese-English test sets demonstrate that (1) our model significantly outperforms previous lexical selection methods and (2) modeling correlations between local words and global topics can further improve translation quality.
Su, J., Xiong, D., Liu, Y., Han, X., Lin, H., Yao, J., & Zhang, M. (2015). A context-aware topic model for statistical machine translation. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 1, pp. 229–238). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p15-1023