An Extensive Exploration of Back-Translation in 60 Languages

Abstract

Back-translation is a data augmentation technique that has been shown to improve model quality through the creation of synthetic training bitext. Early studies showed the promise of the technique, and follow-on studies have produced additional refinements. We have undertaken a broad investigation using back-translation to train models from 60 languages into English; the majority of these languages are considered moderate- or low-resource. We observed consistent gains, and compared to prior work we saw especially conspicuous gains in many of the lower-resourced languages. We analyzed differences in translations between baseline and back-translation models and observed many indications of improved translation quality. Translation of both rare and common terms is improved, and these improvements occur despite the less natural synthetic source-language text used in training.
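To make the technique concrete, here is a minimal sketch of back-translation as data augmentation. It is not the authors' pipeline: since the paper trains source-into-English models, the reverse (English-into-source) model translates monolingual English text into a source language, and each machine-generated source sentence is paired with its original human-authored English sentence to form synthetic bitext. The OPUS-MT checkpoint named below is a real model chosen purely for illustration; the paper does not specify this toolkit.

```python
# Sketch of back-translation data augmentation (illustrative, not the
# authors' exact system). Requires: pip install transformers sentencepiece
from transformers import MarianMTModel, MarianTokenizer

def back_translate(english_sentences, reverse_model_name="Helsinki-NLP/opus-mt-en-de"):
    """Return synthetic (source, English) pairs for training a source->English model.

    `reverse_model_name` is an English->source translation model; en->de is
    used here only as a stand-in for any of the 60 language directions.
    """
    tokenizer = MarianTokenizer.from_pretrained(reverse_model_name)
    model = MarianMTModel.from_pretrained(reverse_model_name)

    # Translate monolingual English text into the source language.
    batch = tokenizer(english_sentences, return_tensors="pt",
                      padding=True, truncation=True)
    generated = model.generate(**batch)  # beam search; sampling is another common choice
    synthetic_source = tokenizer.batch_decode(generated, skip_special_tokens=True)

    # Each pair: machine-generated source text, human-authored English target.
    return list(zip(synthetic_source, english_sentences))

if __name__ == "__main__":
    mono_en = ["The committee approved the budget.",
               "Rivers rise quickly after heavy rain."]
    for src, tgt in back_translate(mono_en):
        print(f"{src}\t{tgt}")
```

The synthetic pairs are then mixed with genuine parallel data to train the forward (source-into-English) model; the key property, noted in the abstract, is that the English target side is always natural text even though the source side is machine-generated.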

Citation

McNamee, P., & Duh, K. (2023). An Extensive Exploration of Back-Translation in 60 Languages. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 8166–8183). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.518
