Parameter Sharing Methods for Multilingual Self-Attentional Translation Models

69 citations · 147 Mendeley readers
Abstract

In multilingual neural machine translation, it has been shown that sharing a single translation model between multiple languages can achieve competitive performance, sometimes even leading to gains over bilingually trained models. However, these improvements are not uniform; multilingual parameter sharing often results in a decrease in accuracy because the translation model cannot accommodate different languages within its limited parameter space. In this work, we examine parameter sharing techniques that strike a balance between full sharing and individual training, focusing specifically on the self-attentional Transformer model. We find that full parameter sharing increases BLEU scores mainly when the target languages belong to a similar language family. However, even when the target languages come from different families, where full parameter sharing leads to a noticeable drop in BLEU, our proposed methods for partial parameter sharing can yield substantial improvements in translation accuracy.
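To make the idea of partial sharing concrete, the sketch below shows one possible way to share some Transformer parameters across target languages while keeping others language-specific: the encoder, embeddings, and decoder attention are shared, and each target language gets its own decoder feed-forward sublayers. This is only an illustrative assumption, not the authors' exact scheme; all module names, hyperparameters, and the choice of which components to share are hypothetical, and positional encodings and attention masks are omitted for brevity.

# Illustrative sketch of partial parameter sharing for one-to-many translation.
# NOT the paper's exact implementation; which components are shared is an assumption.
import torch
import torch.nn as nn


class PartiallySharedDecoderLayer(nn.Module):
    """Decoder layer whose attention blocks are shared across target languages
    but whose feed-forward block is selected per target language."""

    def __init__(self, d_model, nhead, dim_ff, target_langs):
        super().__init__()
        # Shared across every target language.
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        # Language-specific feed-forward sublayers.
        self.ffn = nn.ModuleDict({
            lang: nn.Sequential(
                nn.Linear(d_model, dim_ff), nn.ReLU(), nn.Linear(dim_ff, d_model)
            )
            for lang in target_langs
        })

    def forward(self, tgt, memory, lang):
        x = tgt
        x = self.norm1(x + self.self_attn(x, x, x, need_weights=False)[0])
        x = self.norm2(x + self.cross_attn(x, memory, memory, need_weights=False)[0])
        x = self.norm3(x + self.ffn[lang](x))
        return x


class PartiallySharedTransformer(nn.Module):
    """Shared encoder and embeddings; decoder layers mix shared and per-language parts."""

    def __init__(self, vocab_size, target_langs, d_model=512, nhead=8,
                 num_layers=6, dim_ff=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)                 # shared
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_ff,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)    # shared
        self.decoder_layers = nn.ModuleList(
            PartiallySharedDecoderLayer(d_model, nhead, dim_ff, target_langs)
            for _ in range(num_layers)
        )
        self.out_proj = nn.Linear(d_model, vocab_size)                 # shared

    def forward(self, src_ids, tgt_ids, lang):
        memory = self.encoder(self.embed(src_ids))
        x = self.embed(tgt_ids)
        for layer in self.decoder_layers:
            x = layer(x, memory, lang)
        return self.out_proj(x)


# Usage: one source language translated into two (assumed) target languages.
model = PartiallySharedTransformer(vocab_size=32000, target_langs=["de", "cs"])
logits = model(torch.randint(0, 32000, (2, 7)),
               torch.randint(0, 32000, (2, 5)), lang="de")
print(logits.shape)  # torch.Size([2, 5, 32000])

The design choice here follows the general trade-off the abstract describes: the more components placed in the shared set, the closer the model is to full sharing; moving components into the per-language ModuleDict moves it toward individually trained models.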

Cite

APA

Sachan, D. S., & Neubig, G. (2018). Parameter Sharing Methods for Multilingual Self-Attentional Translation Models. In WMT 2018 - 3rd Conference on Machine Translation, Proceedings of the Conference (Vol. 1, pp. 261–271). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-6327
