Topic model based adaptation data selection for domain-specific machine translation


Abstract

Current domain-specific machine translation (MT) suffers from a lack of high-quality bilingual corpora. Existing work in this field has shown the advantage of adaptation data selection (Ada-selection) for enriching such corpora. Encouraged by the empirical finding that topic distribution is conducive to characterizing a distinctive domain, we propose using a topic model to improve Ada-selection. Based on a joint LDA approach, we incorporate topic distribution into measuring the relevance between the target domain and the candidate parallel sentence pairs. On this basis, we select the highly relevant candidates as high-quality domain-specific bilingual corpora. In practice, we apply our method to the acquisition of domain-specific corpora from the general domain. Experiments on an end-to-end domain-specific MT task show that our method outperforms the state of the art, yielding an improvement of at least 1.5 BLEU points at different scales of training data.
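
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the joint LDA model or the exact relevance measure, so the sketch assumes that topic distributions have already been inferred by some topic model and uses cosine similarity as a stand-in relevance score. The names `cosine`, `select_pairs`, `domain_topics`, and `candidates`, as well as the example sentences and numbers, are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def select_pairs(domain_topics, candidates, top_k):
    """Return the top_k sentence pairs most relevant to the target domain.

    candidates is a list of (source, target, topic_distribution) tuples.
    Relevance here is cosine similarity, an assumption made for illustration;
    the paper's own measure may differ.
    """
    scored = [
        (cosine(domain_topics, topics), src, tgt)
        for src, tgt, topics in candidates
    ]
    # Sort by score only, highest first; ties keep their original order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(src, tgt) for _, src, tgt in scored[:top_k]]


# Hypothetical topic distribution of the target domain (3 topics).
domain_topics = [0.7, 0.2, 0.1]

# Hypothetical general-domain sentence pairs with assumed topic distributions.
candidates = [
    ("src A", "tgt A", [0.6, 0.3, 0.1]),
    ("src B", "tgt B", [0.1, 0.2, 0.7]),
    ("src C", "tgt C", [0.3, 0.4, 0.3]),
]

selected = select_pairs(domain_topics, candidates, top_k=2)
print(selected)
```

In this sketch, the selected pairs would then be added to the domain-specific training data. A divergence-based score could be swapped in for `cosine` without changing the rest of the selection logic.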

Cite

Yao, L., Liu, M., Hong, Y., Liu, H., & Yao, J. (2016). Topic model based adaptation data selection for domain-specific machine translation. In Communications in Computer and Information Science (Vol. 669, pp. 162–171). Springer Verlag. https://doi.org/10.1007/978-981-10-2993-6_14
