Automatic threshold detection for data selection in machine translation

Abstract

This paper describes the University of Hamburg's participation in the Biomedical Translation Task of the Second Conference on Machine Translation (WMT 2017). Our contribution is a new approach to data selection for Machine Translation that combines Paragraph Vector with a Feed Forward Neural Network classifier: continuous distributed vector representations of the sentences serve as features for a binary classifier. Most data selection approaches score and rank general-domain sentences by their similarity to the in-domain data, then try a range of thresholds, each selecting a percentage of the ranked sentences for training various MT systems. The novelty of our method is an automatic threshold detection paradigm that provides a simple and efficient way of selecting the general-domain sentences most similar to the in-domain data. The approach yields encouraging results for seven language pairs and four data sets.
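As a rough illustration of the selection idea in the abstract, the sketch below trains a binary classifier on sentence vectors and keeps the general-domain sentences it labels as in-domain. It is not the authors' implementation. It makes several assumptions: the hand-made 2-D vectors stand in for Paragraph Vector embeddings, a single-layer logistic classifier stands in for the feed-forward network, and the classifier's 0.5 decision boundary is taken as the "automatic" threshold. The abstract does not spell out any of these details, and all function names and example sentences are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(vectors, labels, epochs=500, lr=0.5):
    """Fit a logistic classifier with plain stochastic gradient descent.

    Assumption: this single-layer model is a stand-in for the
    feed-forward network mentioned in the abstract.
    """
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def select_in_domain_like(candidates, weights, bias, boundary=0.5):
    """Keep candidates the classifier assigns to the in-domain class.

    Assumption: the decision boundary plays the role of the threshold,
    so no percentage of a ranked list has to be chosen by hand.
    """
    selected = []
    for sentence, x in candidates:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        if p >= boundary:
            selected.append(sentence)
    return selected


# Toy vectors (hypothetical): label 1 = in-domain, label 0 = general domain.
in_domain = [[0.9, 0.8], [0.8, 0.9], [1.0, 0.7]]
general = [[-0.9, -0.8], [-0.7, -1.0], [-0.8, -0.6]]
weights, bias = train_logistic(in_domain + general, [1, 1, 1, 0, 0, 0])

# General-domain pool to filter; each entry is (sentence, sentence vector).
pool = [
    ("the patient received 5 mg daily", [0.85, 0.75]),
    ("the match ended in a draw", [-0.8, -0.9]),
    ("clinical trial results were positive", [0.7, 0.9]),
]
selected = select_in_domain_like(pool, weights, bias)
print(selected)
```

In a realistic setup the vectors would come from a trained sentence-embedding model and the pool would hold the general-domain side of a parallel corpus. The selected sentences would then be added to the in-domain training data.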

Cite

Duma, M. S., & Menzel, W. (2017). Automatic threshold detection for data selection in machine translation. In WMT 2017 - 2nd Conference on Machine Translation, Proceedings (pp. 483–488). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4754
