Domain Adaptation of Machine Translation with Crowdworkers

Makoto Morishita; Jun Suzuki; Masaaki Nagata

Conference Proceedings (Open Access)

EMNLP 2022 - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track (2022) 616-628

DOI: 10.18653/v1/2022.emnlp-industry.62

1 citation

16 readers

Abstract

Although a machine translation model trained with a large in-domain parallel corpus achieves remarkable results, it still works poorly when no in-domain data are available. This situation restricts the applicability of machine translation when the target domain’s data are limited, even though there is great demand for high-quality domain-specific machine translation models in many domains. We propose a framework that efficiently and effectively collects parallel sentences in a target domain from the web with the help of crowdworkers. With the collected parallel data, we can quickly adapt a machine translation model to the target domain. Our experiments show that the proposed method can collect target-domain parallel data over a few days at a reasonable cost. We tested it on five domains, and the domain-adapted model improved BLEU scores by up to +19.7 points, and by +7.8 points on average, compared to a general-purpose translation model.

Cite (APA)

Morishita, M., Suzuki, J., & Nagata, M. (2022). Domain Adaptation of Machine Translation with Crowdworkers. In EMNLP 2022 - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track (pp. 616–628). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-industry.62
