Filtering Back-Translated Data in Unsupervised Neural Machine Translation

11 citations · 56 Mendeley readers

Abstract

Unsupervised neural machine translation (NMT) uses only monolingual data for training. The quality of back-translated data plays an important role in the performance of NMT systems, yet not all generated pseudo-parallel sentence pairs are of the same quality. Taking inspiration from domain adaptation, where in-domain sentences are given more weight during training, in this paper we propose an approach to filter back-translated data as part of the training process of unsupervised NMT. Our approach gives more weight to good pseudo-parallel sentence pairs in the back-translation phase. We calculate the weight of each pseudo-parallel sentence pair using its sentence-wise round-trip BLEU score, normalized batch-wise. We compare our approach with the current state-of-the-art approaches for unsupervised NMT.
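The weighting step the abstract describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: translate_s2t and translate_t2s are hypothetical stand-ins for the two translation directions of the unsupervised NMT model, and the batch-wise normalization shown (dividing each score by the batch sum) is one plausible reading of the abstract, which does not spell out the exact formula. Sentence-level BLEU is computed with the sacrebleu library.

```python
from typing import Callable, List
import sacrebleu

def round_trip_weights(
    batch: List[str],
    translate_s2t: Callable[[str], str],  # source -> target model (hypothetical)
    translate_t2s: Callable[[str], str],  # target -> source model (hypothetical)
) -> List[float]:
    """Weight each pseudo-parallel pair in a batch by its sentence-level
    round-trip BLEU, normalized over the batch (assumed: sum-to-one)."""
    scores = []
    for src in batch:
        pseudo_tgt = translate_s2t(src)            # back-translation step
        reconstruction = translate_t2s(pseudo_tgt)  # round trip back to source side
        # Sentence-level BLEU of the reconstruction against the original source.
        scores.append(sacrebleu.sentence_bleu(reconstruction, [src]).score)
    total = sum(scores) or 1.0                     # guard against an all-zero batch
    return [s / total for s in scores]
```

In training, such weights would multiply the per-pair loss terms of the back-translation phase, so that pairs whose round trip reconstructs the source well contribute more to the gradient than noisy ones.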

Citation (APA)

Khatri, J., & Bhattacharyya, P. (2020). Filtering Back-Translated Data in Unsupervised Neural Machine Translation. In COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (pp. 4334–4339). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.coling-main.383
