Automatic threshold detection for data selection in machine translation

Mirela Stefania Duma; Wolfgang Menzel

Conference Proceedings, Open Access

WMT 2017 - 2nd Conference on Machine Translation, Proceedings (2017) 483-488

DOI: 10.18653/v1/w17-4754

Abstract

In this paper we present the University of Hamburg's submission to the Biomedical Translation Task of the Second Conference on Machine Translation (WMT 2017). Our contribution is a new approach to data selection for machine translation (MT) that combines Paragraph Vector with a feed-forward neural network classifier: continuous distributed vector representations of the sentences serve as features for a binary classifier. Most data selection approaches score and rank general-domain sentences by their similarity to the in-domain data, then try a range of thresholds, each selecting a percentage of the ranked sentences for training a separate MT system. The novelty of our method is an automatic threshold detection paradigm for data selection, which offers an efficient and simple way to select the general-domain sentences most similar to the in-domain data. The approach yields encouraging results for seven language pairs and four data sets.
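
The abstract does not spell out how the threshold is detected, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It contrasts the conventional scheme (score, rank, keep a hand-picked percentage) with one plausible reading of automatic threshold detection: train a binary in-domain vs. general-domain classifier and keep exactly the general-domain sentences it labels in-domain, so the decision boundary (here p > 0.5) replaces the percentage. Two stand-ins are assumptions: sparse word-count vectors replace Paragraph Vector embeddings (which a library such as gensim's Doc2Vec could supply), and logistic regression, the simplest feed-forward network (no hidden layer), replaces the paper's feed-forward classifier. The toy corpora and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpora; illustrative only, not the WMT 2017 data.
IN_DOMAIN = [
    "the patient received insulin therapy",
    "clinical trial of the new vaccine",
    "symptoms of chronic kidney disease",
]
GENERAL = [
    "the team won the football match",
    "stock markets fell sharply today",
    "tickets for the concert sold out",
]


def build_vocab(sentences):
    """Assign a feature index to every word seen in the training sentences."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def featurize(sentence, vocab):
    """ASSUMPTION: sparse word counts stand in for a Paragraph Vector embedding.
    Out-of-vocabulary words are ignored."""
    counts = Counter(w for w in sentence.split() if w in vocab)
    return {vocab[w]: c for w, c in counts.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(x, weights, bias):
    return bias + sum(weights[i] * v for i, v in x.items())


def train(pos, neg, vocab, epochs=200, lr=0.1):
    """ASSUMPTION: logistic regression (a feed-forward net with no hidden
    layer) trained by plain SGD stands in for the paper's classifier.
    Label 1 = in-domain, 0 = general domain."""
    weights, bias = [0.0] * len(vocab), 0.0
    data = [(s, 1) for s in pos] + [(s, 0) for s in neg]
    for _ in range(epochs):
        for sentence, label in data:
            x = featurize(sentence, vocab)
            # Negative gradient of the log-loss with respect to the logit.
            err = label - sigmoid(logit(x, weights, bias))
            bias += lr * err
            for i, v in x.items():
                weights[i] += lr * err * v
    return weights, bias


def in_domain_prob(sentence, weights, bias, vocab):
    return sigmoid(logit(featurize(sentence, vocab), weights, bias))


def select_top_percent(candidates, scores, percent):
    """Conventional scheme: rank by score and keep a fixed percentage."""
    ranked = sorted(zip(scores, candidates), reverse=True)
    k = math.ceil(len(candidates) * percent / 100)
    return [c for _, c in ranked[:k]]


def select_by_classifier(candidates, scores):
    """Classifier-based scheme: keep every sentence labelled in-domain."""
    return [c for c, s in zip(candidates, scores) if s > 0.5]


vocab = build_vocab(IN_DOMAIN + GENERAL)
weights, bias = train(IN_DOMAIN, GENERAL, vocab)

# Hypothetical general-domain pool to select from.
candidates = [
    "insulin therapy and kidney disease",
    "football tickets sold out",
    "a new vaccine trial",
    "markets fell today",
]
scores = [in_domain_prob(c, weights, bias, vocab) for c in candidates]
print(select_top_percent(candidates, scores, 50))  # percentage picked by hand
print(select_by_classifier(candidates, scores))    # size set by the classifier
```

With the percentage scheme, each candidate percentage yields a different training set and hence another MT system to build and compare; with the classifier scheme, the size of the selected set follows from the decision boundary, so a single selection suffices.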

Cite (APA)

Duma, M. S., & Menzel, W. (2017). Automatic threshold detection for data selection in machine translation. In WMT 2017 - 2nd Conference on Machine Translation, Proceedings (pp. 483–488). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4754