Selecting syntactic, non-redundant segments in active learning for machine translation

Akiva Miura; Graham Neubig; Michael Paul; Satoshi Nakamura

Conference Proceedings (Open Access)

2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference (2016) 20-29

DOI: 10.18653/v1/n16-1003

Abstract

Active learning is a framework that makes it possible to efficiently train statistical models by selecting informative examples from a pool of unlabeled data. Previous work has found this framework effective for machine translation (MT), making it possible to train better translation models with less effort, particularly when annotators translate short phrases instead of full sentences. However, previous methods for phrase-based active learning in MT fail to consider whether the selected units are coherent and easy for human translators to translate, and they also tend to select redundant phrases with similar content. In this paper, we tackle these problems by proposing two new methods for selecting more syntactically coherent and less redundant segments in active learning for MT. Experiments using both simulation and extensive manual translation by professional translators find the proposed methods effective: they yield a greater gain in BLEU score for the same number of translated words and allow translators to be more confident in their translations.

Citation (APA)

Miura, A., Neubig, G., Paul, M., & Nakamura, S. (2016). Selecting syntactic, non-redundant segments in active learning for machine translation. In 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference (pp. 20–29). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/n16-1003
