Neural language models (NLMs) have been able to improve machine translation (MT) thanks to their ability to generalize well to long contexts. Despite recent successes of deep neural networks in speech and vision, the general practice in MT is to incorporate NLMs with only one or two hidden layers, and there have been no clear results on whether having more layers helps. In this paper, we demonstrate that deep NLMs with three or four layers outperform those with fewer layers in terms of both perplexity and translation quality. We combine various techniques to successfully train deep NLMs that jointly condition on both the source and target contexts. When reranking n-best lists of a strong web-forum baseline, our deep models yield an average boost of 0.5 TER / 0.5 BLEU points compared to using a shallow NLM. Additionally, we adapt our models to a new SMS-chat domain and obtain a similar gain of 1.0 TER / 0.5 BLEU points.
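To make the described setup concrete, below is a minimal sketch (not the authors' implementation) of a feed-forward joint NLM that conditions on a window of aligned source words plus the previous target words, with a configurable number of hidden layers; all hyperparameters, class names, and window sizes are illustrative assumptions.

```python
# Hypothetical sketch of a deep joint neural language model:
# predicts the next target word from a source window and a target history.
import torch
import torch.nn as nn


class DeepJointNLM(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=128, hidden_dim=512,
                 src_window=5, tgt_window=4, num_layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        in_dim = (src_window + tgt_window) * emb_dim
        layers = []
        for i in range(num_layers):  # stack 3-4 layers for the "deep" variants
            layers += [nn.Linear(in_dim if i == 0 else hidden_dim, hidden_dim),
                       nn.ReLU()]
        self.hidden = nn.Sequential(*layers)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ctx, tgt_ctx):
        # src_ctx: (batch, src_window) aligned source-side context words
        # tgt_ctx: (batch, tgt_window) previous target words
        x = torch.cat([self.src_emb(src_ctx).flatten(1),
                       self.tgt_emb(tgt_ctx).flatten(1)], dim=-1)
        return self.out(self.hidden(x))  # logits over the next target word


# Example usage: per-word log-probabilities from such a model could serve as
# an additional feature when reranking n-best lists from an MT baseline.
model = DeepJointNLM(src_vocab=10000, tgt_vocab=10000)
src = torch.randint(0, 10000, (2, 5))
tgt = torch.randint(0, 10000, (2, 4))
log_probs = torch.log_softmax(model(src, tgt), dim=-1)
```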
Luong, M. T., Kayser, M., & Manning, C. D. (2015). Deep neural language models for machine translation. In CoNLL 2015 - 19th Conference on Computational Natural Language Learning, Proceedings (pp. 305–309). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/k15-1031