Parameter Sharing Methods for Multilingual Self-Attentional Translation Models

69 citations · 147 Mendeley readers
Abstract

In multilingual neural machine translation, it has been shown that sharing a single translation model between multiple languages can achieve competitive performance, sometimes even leading to gains over bilingually trained models. However, these improvements are not uniform; multilingual parameter sharing often results in a decrease in accuracy because the translation model cannot accommodate different languages within its limited parameter space. In this work, we examine parameter sharing techniques that strike a balance between full sharing and individual training, focusing specifically on the self-attentional Transformer model. We find that full parameter sharing increases BLEU scores mainly when the target languages belong to a similar language family. However, even when the target languages come from different families, where full parameter sharing leads to a noticeable drop in BLEU, our proposed methods for partial parameter sharing can yield substantial improvements in translation accuracy.
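To make the idea of partial sharing concrete, the sketch below shows one possible way to share some Transformer parameters across target languages while keeping others language-specific: the encoder, embeddings, and decoder attention are shared, and each target language gets its own decoder feed-forward sublayers. This is only an illustrative assumption, not the authors' exact scheme; all module names, hyperparameters, and the choice of which components to share are hypothetical, and positional encodings and attention masks are omitted for brevity.

# Illustrative sketch of partial parameter sharing for one-to-many translation.
# NOT the paper's exact implementation; which components are shared is an assumption.
import torch
import torch.nn as nn


class PartiallySharedDecoderLayer(nn.Module):
    """Decoder layer whose attention blocks are shared across target languages
    but whose feed-forward block is selected per target language."""

    def __init__(self, d_model, nhead, dim_ff, target_langs):
        super().__init__()
        # Shared across every target language.
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        # Language-specific feed-forward sublayers.
        self.ffn = nn.ModuleDict({
            lang: nn.Sequential(
                nn.Linear(d_model, dim_ff), nn.ReLU(), nn.Linear(dim_ff, d_model)
            )
            for lang in target_langs
        })

    def forward(self, tgt, memory, lang):
        x = tgt
        x = self.norm1(x + self.self_attn(x, x, x, need_weights=False)[0])
        x = self.norm2(x + self.cross_attn(x, memory, memory, need_weights=False)[0])
        x = self.norm3(x + self.ffn[lang](x))
        return x


class PartiallySharedTransformer(nn.Module):
    """Shared encoder and embeddings; decoder layers mix shared and per-language parts."""

    def __init__(self, vocab_size, target_langs, d_model=512, nhead=8,
                 num_layers=6, dim_ff=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)                 # shared
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_ff,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)    # shared
        self.decoder_layers = nn.ModuleList(
            PartiallySharedDecoderLayer(d_model, nhead, dim_ff, target_langs)
            for _ in range(num_layers)
        )
        self.out_proj = nn.Linear(d_model, vocab_size)                 # shared

    def forward(self, src_ids, tgt_ids, lang):
        memory = self.encoder(self.embed(src_ids))
        x = self.embed(tgt_ids)
        for layer in self.decoder_layers:
            x = layer(x, memory, lang)
        return self.out_proj(x)


# Usage: one source language translated into two (assumed) target languages.
model = PartiallySharedTransformer(vocab_size=32000, target_langs=["de", "cs"])
logits = model(torch.randint(0, 32000, (2, 7)),
               torch.randint(0, 32000, (2, 5)), lang="de")
print(logits.shape)  # torch.Size([2, 5, 32000])

The design choice here follows the general trade-off the abstract describes: the more components placed in the shared set, the closer the model is to full sharing; moving components into the per-language ModuleDict moves it toward individually trained models.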

Cite

APA

Sachan, D. S., & Neubig, G. (2018). Parameter Sharing Methods for Multilingual Self-Attentional Translation Models. In WMT 2018 - 3rd Conference on Machine Translation, Proceedings of the Conference (Vol. 1, pp. 261–271). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-6327
