Iterative domain-repaired back-translation

Abstract

In this paper, we focus on domain-specific translation with low resources, where in-domain parallel corpora are scarce or nonexistent. One common and effective strategy for this case is to exploit in-domain monolingual data with the back-translation method. However, the synthetic parallel data are very noisy because they are generated by imperfect out-of-domain systems, resulting in poor domain-adaptation performance. To address this issue, we propose a novel iterative domain-repaired back-translation framework, which introduces a Domain-Repair (DR) model to refine the translations in synthetic bilingual data. To this end, we construct training data for the DR model by round-trip translating the monolingual sentences, and then design a unified training framework to optimize the paired DR and NMT models jointly. Experiments on adapting NMT models between specific domains and from the general domain to specific domains demonstrate the effectiveness of our proposed approach, achieving 15.79 and 4.47 BLEU improvements on average over unadapted models and back-translation, respectively.
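To make the data-construction step concrete, below is a minimal Python sketch of how round-trip translation, as described in the abstract, could pair noisy machine output with clean in-domain text for DR training. The helper names (translate_t2s, translate_s2t, dr_repair) are hypothetical stand-ins for the out-of-domain NMT models and the trained DR model, not an API from the paper.

```python
# Sketch of DR training-data construction via round-trip translation,
# plus the repair step applied to synthetic bilingual data.
# All translation callables below are assumed/hypothetical wrappers
# around real NMT models; they are not defined in the paper.

from typing import Callable, List, Tuple


def build_dr_training_data(
    mono_target: List[str],
    translate_t2s: Callable[[List[str]], List[str]],  # target -> source model
    translate_s2t: Callable[[List[str]], List[str]],  # source -> target model
) -> List[Tuple[str, str]]:
    """Round-trip each clean in-domain target sentence through two
    out-of-domain NMT models. Pairing the noisy round-trip output with
    the clean original yields (noisy, clean) examples for the DR model."""
    synthetic_source = translate_t2s(mono_target)        # back-translate
    round_trip_target = translate_s2t(synthetic_source)  # translate back
    # The DR model learns to map noisy MT output -> clean in-domain text.
    return list(zip(round_trip_target, mono_target))


def repair_synthetic_corpus(
    synthetic_pairs: List[Tuple[str, str]],
    dr_repair: Callable[[List[str]], List[str]],  # trained DR model
) -> List[Tuple[str, str]]:
    """Apply the trained DR model to the target side of synthetic
    bilingual data before it is used to adapt the NMT model."""
    if not synthetic_pairs:
        return []
    sources, noisy_targets = zip(*synthetic_pairs)
    repaired_targets = dr_repair(list(noisy_targets))
    return list(zip(sources, repaired_targets))
```

In the paper's unified framework, the repaired synthetic pairs then feed back into NMT training, and the improved NMT and DR models regenerate better data on the next iteration.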

Citation (APA)

Wei, H. R., Zhang, Z., Chen, B., & Luo, W. (2020). Iterative domain-repaired back-translation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 5884–5893). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.474
