Domain Adaptation of Machine Translation with Crowdworkers

Abstract

Although a machine translation model trained on a large in-domain parallel corpus achieves remarkable results, it still performs poorly when no in-domain data are available. This restricts the applicability of machine translation when data in the target domain are limited. However, there is great demand for high-quality, domain-specific machine translation models in many domains. We propose a framework that efficiently and effectively collects parallel sentences in a target domain from the web with the help of crowdworkers. With the collected parallel data, we can quickly adapt a machine translation model to the target domain. Our experiments show that the proposed method can collect target-domain parallel data within a few days at a reasonable cost. We tested it on five domains, and compared to a general-purpose translation model, the domain-adapted model improved BLEU scores by up to +19.7 points, with an average improvement of +7.8 points.

Citation (APA)

Morishita, M., Suzuki, J., & Nagata, M. (2022). Domain Adaptation of Machine Translation with Crowdworkers. In EMNLP 2022 - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track (pp. 616–628). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-industry.62