Unifying the Convergences in Multilingual Neural Machine Translation

Abstract

Although all-in-one-model multilingual neural machine translation (MNMT) has achieved remarkable progress, the convergence inconsistency in joint training is often ignored: different language pairs reach convergence at different epochs. As a result, the trained MNMT model over-fits low-resource language pairs while under-fitting high-resource ones. In this paper, we propose a novel training strategy named LSSD (Language-Specific Self-Distillation), which alleviates this convergence inconsistency and helps MNMT models achieve their best performance on every language pair simultaneously. Specifically, LSSD selects the language-specific best checkpoint for each language pair to teach the current model on the fly. Furthermore, we systematically explore three sample-level strategies for transferring knowledge from these teachers. Experimental results on three datasets show that LSSD yields consistent improvements on all language pairs and achieves state-of-the-art performance.
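The abstract describes LSSD only at a high level. The sketch below illustrates the general idea in PyTorch: keep a frozen per-language-pair "best checkpoint" as a teacher and add a distillation term to each training step. This is not the paper's implementation; the function names (lssd_step, maybe_update_teacher), the assumption that the model maps (src, tgt) to token logits, the use of a dev score for checkpoint selection, and the alpha/temperature hyperparameters are all illustrative, and the paper's three sample-level knowledge-transfer strategies are not reproduced here.

```python
import copy
import torch
import torch.nn.functional as F

def lssd_step(model, best_checkpoints, batch, lang_pair,
              alpha=0.5, temperature=1.0):
    """One training step with Language-Specific Self-Distillation (sketch).

    best_checkpoints maps each language pair to a frozen copy of the model
    snapshotted at that pair's best validation score so far (its teacher).
    """
    src, tgt = batch
    logits = model(src, tgt)  # assumed to return per-token logits
    ce_loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)), tgt.view(-1)
    )

    teacher = best_checkpoints.get(lang_pair)
    if teacher is None:
        return ce_loss  # no teacher yet for this pair: plain MLE

    with torch.no_grad():
        teacher_logits = teacher(src, tgt)

    # Distill from the language-specific best checkpoint (KL to teacher).
    kd_loss = F.kl_div(
        F.log_softmax(logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1 - alpha) * ce_loss + alpha * kd_loss

def maybe_update_teacher(model, best_checkpoints, best_scores,
                         lang_pair, dev_score):
    """After validation: if this pair just reached a new best dev score,
    snapshot the current model as that pair's frozen teacher (on the fly)."""
    if dev_score > best_scores.get(lang_pair, float("-inf")):
        best_scores[lang_pair] = dev_score
        teacher = copy.deepcopy(model).eval()
        for p in teacher.parameters():
            p.requires_grad_(False)
        best_checkpoints[lang_pair] = teacher
```

In this reading, each language pair keeps its own teacher, so pairs that converged early distill from their peak checkpoint while pairs still improving continue training on the cross-entropy term alone.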

Citation (APA)

Huang, Y., Feng, X., Geng, X., & Qin, B. (2022). Unifying the Convergences in Multilingual Neural Machine Translation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 6822–6835). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.458
