Recent work demonstrates the potential of training a single model for multilingual machine translation. In parallel, denoising pretraining on unlabeled monolingual data, used as a starting point for finetuning bitext machine translation systems, has demonstrated strong performance gains. However, combining denoising pretraining with multilingual machine translation in a single model remains largely unexplored. In this work, we fill this gap by studying how multilingual translation models can be created through multilingual finetuning. Finetuning a multilingual model from a denoising pretrained model incorporates the benefits of large quantities of unlabeled monolingual data, which is particularly important for low-resource languages where bitext is rare. Further, we create the ML50 benchmark to facilitate reproducible research by standardizing training and evaluation data. On ML50, we show that multilingual finetuning significantly improves over both multilingual models trained from scratch and bilingual finetuning for translation into English. We also find that multilingual finetuning can significantly improve over multilingual models trained from scratch for zero-shot translation in non-English directions. Finally, we discuss why the pretraining-and-finetuning paradigm alone is not enough to address the challenges multilingual models face in to-Many directions.
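To make the multilingual finetuning recipe concrete, the sketch below shows how a denoising-pretrained multilingual checkpoint can be finetuned on bitext from several language directions at once. This is an illustration, not the paper's actual training setup (which used fairseq); it assumes the Hugging Face Transformers library, the facebook/mbart-large-50 checkpoint name on the Hugging Face Hub, and a toy in-memory bitext with made-up hyperparameters.

```python
# A minimal sketch, assuming Hugging Face Transformers and the
# facebook/mbart-large-50 denoising-pretrained checkpoint; hyperparameters
# and the tiny bitext below are illustrative, not the paper's settings.
import torch
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50")

# Hypothetical multilingual bitext: sentence pairs from several directions
# mixed together, each tagged with its source and target language codes.
bitext = [
    ("hi_IN", "en_XX", "यह एक उदाहरण वाक्य है।", "This is an example sentence."),
    ("de_DE", "en_XX", "Das ist ein Beispielsatz.", "This is an example sentence."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()

for src_lang, tgt_lang, src_text, tgt_text in bitext:
    # Language tags tell the single shared model which direction to translate,
    # so one set of parameters is finetuned on all directions jointly.
    tokenizer.src_lang = src_lang
    tokenizer.tgt_lang = tgt_lang
    batch = tokenizer(src_text, text_target=tgt_text, return_tensors="pt")
    loss = model(**batch).loss  # labels are shifted internally for the decoder
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In practice, the pairs would be sampled from all ML50-style training directions (with temperature-based sampling over languages) rather than iterated in a fixed order, but the core idea is the same: a single pretrained model, one loss, many directions.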
Tang, Y., Tran, C., Li, X., Chen, P. J., Goyal, N., Chaudhary, V., … Fan, A. (2021). Multilingual Translation from Denoising Pre-Training. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 3450–3466). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.304