Abstract
One of the major challenges for statistical machine translation (SMT) is to choose the appropriate translation rules based on the sentence context. This paper proposes a continuous space rule selection (CSRS) model for syntax-based SMT to perform this context-dependent rule selection. In contrast to existing maximum entropy based rule selection (MERS) models, which use discrete representations of words as features, the CSRS model is learned by a feed-forward neural network and uses real-valued vector representations of words, allowing for better generalization. In addition, we propose a method to train the rule selection models only on minimal rules, which are more frequent and have richer training data than non-minimal rules. We tested our model on different translation tasks, and the CSRS model outperformed a baseline without rule selection and the previous MERS model by up to 2.2 and 1.1 BLEU points, respectively.
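The core idea of scoring candidate rules from real-valued word vectors with a feed-forward network can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the class name `RuleSelector`, the context window size, the layer dimensions, the `tanh` hidden layer, the softmax output, and the `<unk>` fallback are all assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class RuleSelector:
    """Hypothetical feed-forward scorer: context words -> distribution over candidate rules."""

    def __init__(self, vocab, num_rules, embed_dim=4, hidden_dim=8, window=3, seed=0):
        rng = random.Random(seed)
        self.window = window
        # One real-valued vector per word, plus an assumed <unk> fallback.
        self.embeddings = {
            word: [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
            for word in list(vocab) + ["<unk>"]
        }
        self.hidden = make_matrix(hidden_dim, window * embed_dim, rng)
        self.output = make_matrix(num_rules, hidden_dim, rng)

    def probabilities(self, context_words):
        if len(context_words) != self.window:
            raise ValueError("expected exactly %d context words" % self.window)
        # Concatenate the embeddings of the context words into one feature vector.
        features = []
        for word in context_words:
            features.extend(self.embeddings.get(word, self.embeddings["<unk>"]))
        hidden = [math.tanh(h) for h in matvec(self.hidden, features)]
        return softmax(matvec(self.output, hidden))


# Usage: score two hypothetical candidate rules given a three-word source context.
selector = RuleSelector(vocab=["the", "bank", "river"], num_rules=2)
probs = selector.probabilities(["the", "river", "bank"])
```

Because words are represented as dense vectors rather than discrete indicator features, similar contexts can map to similar rule distributions, which is the generalization advantage the abstract attributes to the continuous space model.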
Cite
Zhang, J., Utiyama, M., Sumita, E., Neubig, G., & Nakamura, S. (2016). A continuous space rule selection model for syntax-based statistical machine translation. In 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers (Vol. 3, pp. 1372–1381). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p16-1130