Target Oriented Data Generation for Quality Estimation of Machine Translation

Huanqin Wu; Muyun Yang; Jiaqi Wang; Junguo Zhu; Tiejun Zhao

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019), Vol. 11838 LNAI, pp. 393–405

DOI: 10.1007/978-3-030-32233-5_31

Abstract

Quality estimation (QE) is a non-trivial problem in machine translation (MT), and the neural approach appears to be a promising solution. Annotating QE training corpora is costly but necessary for supervised QE systems. To provide informative, large-scale training data for MT quality estimation models, this paper proposes an approach for generating pseudo QE training data. By leveraging the labeled corpus provided for the task, the method generates pseudo training samples whose distribution of translation errors is intended to resemble that of the labeled corpus. The paper also describes a sentence-specific data expansion strategy that incrementally boosts model performance. Experiments on different open datasets and models confirm the effectiveness of the method and indicate that it can significantly improve QE performance.

Cite

Wu, H., Yang, M., Wang, J., Zhu, J., & Zhao, T. (2019). Target Oriented Data Generation for Quality Estimation of Machine Translation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11838 LNAI, pp. 393–405). Springer. https://doi.org/10.1007/978-3-030-32233-5_31
