Current domain-specific machine translation (MT) suffers from a lack of high-quality bilingual corpora. Existing work in this field has shown the advantage of adaptation data selection (Ada-selection) for enriching such corpora. Encouraged by the empirical finding that topic distribution is conducive to characterizing a distinctive domain, we propose to use a topic model to improve Ada-selection. Based on a joint LDA approach, we incorporate topic distribution into measuring the relevance between the target domain and the candidate parallel sentence pairs. On this basis, we select the highly relevant candidates as the high-quality domain-specific bilingual corpora. In practice, we apply our method to the acquisition of domain-specific corpora from the general domain. Experiments on an end-to-end domain-specific MT task show that our method outperforms the state of the art, yielding gains of at least 1.5 BLEU points at different scales of training data.
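The selection step described above can be illustrated with a minimal sketch. It assumes each candidate sentence pair and the target domain have already been mapped to a topic distribution by some topic model, and it scores relevance with cosine similarity. The abstract does not specify the relevance measure or how the joint LDA model is built, so the function names `cosine` and `select_pairs`, the similarity measure, and the toy topic vectors are all hypothetical choices made for illustration only.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic distributions (assumed relevance measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def select_pairs(domain_topics, candidates, top_k):
    """Return the top_k sentence pairs whose topic distribution is closest to the domain's.

    candidates is a list of (sentence_pair, topic_distribution) tuples.
    """
    scored = [(cosine(domain_topics, topics), pair) for pair, topics in candidates]
    # Sort by score only, highest relevance first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scored[:top_k]]


# Toy topic distributions over three topics (illustrative values, not from the paper).
domain_topics = [0.7, 0.2, 0.1]
candidates = [
    (("src A", "tgt A"), [0.6, 0.3, 0.1]),
    (("src B", "tgt B"), [0.1, 0.1, 0.8]),
    (("src C", "tgt C"), [0.2, 0.7, 0.1]),
]

selected = select_pairs(domain_topics, candidates, top_k=2)
print(selected)
```

In this toy setting, the pairs whose topic mix most resembles the target domain are kept, and the rest of the general-domain pool is discarded. A divergence-based measure could be substituted for cosine similarity without changing the overall structure.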
Yao, L., Liu, M., Hong, Y., Liu, H., & Yao, J. (2016). Topic model based adaptation data selection for domain-specific machine translation. In Communications in Computer and Information Science (Vol. 669, pp. 162–171). Springer Verlag. https://doi.org/10.1007/978-981-10-2993-6_14