Topic model based adaptation data selection for domain-specific machine translation

Liang Yao; Mengyi Liu; Yu Hong; Hao Liu; Jianmin Yao

Conference Proceedings

Communications in Computer and Information Science (2016) 669 162-171

DOI: 10.1007/978-981-10-2993-6_14

Abstract

Current domain-specific machine translation (MT) suffers from a lack of high-quality bilingual corpora. Existing work in this field has shown the advantage of adaptation data selection (Ada-selection) for enriching such corpora. Encouraged by the empirical finding that topic distribution is conducive to characterizing a distinctive domain, we propose to use a topic model to improve Ada-selection. Based on a joint LDA approach, we incorporate topic distribution into measuring the relevance between the target domain and the candidate parallel sentence pairs. On this basis, we select the highly relevant candidates as high-quality domain-specific bilingual corpora. In practice, we apply our method to the acquisition of domain-specific corpora from the general domain. Experiments on an end-to-end domain-specific MT task show that our method outperforms the state of the art, yielding an improvement of at least 1.5 BLEU points at different scales of training data.
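
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the relevance measure or how the joint LDA model produces topic distributions, so everything here is an assumption made for illustration: topic distributions are taken as already inferred, cosine similarity stands in for the relevance measure, and the names `cosine` and `select_by_topic_relevance` are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def select_by_topic_relevance(domain_topics, candidates, top_k):
    """Rank candidate sentence pairs by topic similarity to the target domain.

    domain_topics: topic distribution of the target domain (assumed given).
    candidates: list of (sentence_pair, topic_distribution) tuples, where the
        distributions are assumed to come from a topic model such as LDA.
    Returns the top_k sentence pairs, most relevant first.
    """
    scored = [(cosine(domain_topics, topics), pair) for pair, topics in candidates]
    # Sort on the score only; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scored[:top_k]]


# Toy data: three general-domain sentence pairs with made-up topic distributions.
domain_topics = [0.7, 0.2, 0.1]
candidates = [
    (("src A", "tgt A"), [0.6, 0.3, 0.1]),
    (("src B", "tgt B"), [0.1, 0.1, 0.8]),
    (("src C", "tgt C"), [0.7, 0.2, 0.1]),
]
selected = select_by_topic_relevance(domain_topics, candidates, top_k=2)
print(selected)
```

In this toy run, pairs C and A are kept because their topic distributions lie closest to the target domain's, while B, dominated by a different topic, is dropped. The paper's own measure may differ from cosine similarity.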

Cite

Yao, L., Liu, M., Hong, Y., Liu, H., & Yao, J. (2016). Topic model based adaptation data selection for domain-specific machine translation. In Communications in Computer and Information Science (Vol. 669, pp. 162–171). Springer Verlag. https://doi.org/10.1007/978-981-10-2993-6_14
