Abstract
Current methods of using lexical features in machine translation have difficulty in scaling up to realistic MT tasks due to a prohibitively large number of parameters involved. In this paper, we propose methods of using new linguistic and contextual features that do not suffer from this problem and apply them in a state-of-the-art hierarchical MT system. The features used in this work are non-terminal labels, non-terminal length distribution, source string context and source dependency LM scores. The effectiveness of our techniques is demonstrated by significant improvements over a strong baseline. On Arabic-to-English translation, improvements in lower-cased BLEU are 2.0 on NIST MT06 and 1.7 on MT08 newswire data on decoding output. On Chinese-to-English translation, the improvements are 1.0 on MT06 and 0.8 on MT08 newswire data. © 2009 ACL and AFNLP.
Cite
Shen, L., Xu, J., Zhang, B., Matsoukas, S., & Weischedel, R. (2009). Effective use of linguistic and contextual information for statistical machine translation. In EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 (pp. 72–80). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1699510.1699520