Using comparable corpora to adapt MT models to new domains


Abstract

In previous work we showed that when using an SMT model trained on old-domain data to translate text in a new domain, most errors are due to unseen source words, unseen target translations, and inaccurate translation model scores (Irvine et al., 2013a). In this work, we target errors due to inaccurate translation model scores using new-domain comparable corpora, which we mine from Wikipedia. We assume that we have access to a large old-domain parallel training corpus but only enough new-domain parallel data to tune model parameters and do evaluation. We use the new-domain comparable corpora to estimate additional feature scores over the phrase pairs in our baseline models. Augmenting models with the new features improves the quality of machine translations in the medical and science domains by up to 1.3 BLEU points over very strong baselines trained on the 150 million word Canadian Hansard dataset.
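The core idea in the abstract, augmenting a baseline phrase table with feature scores estimated from new-domain comparable corpora, can be sketched as follows. This is a minimal illustration, not the paper's actual feature set: it adds one hypothetical feature, the smoothed log relative frequency of each target phrase in a toy new-domain corpus, alongside the existing old-domain scores.

```python
import math
from collections import Counter

# Toy new-domain target-language text standing in for the comparable
# corpora the paper mines from Wikipedia (illustrative only).
new_domain_target = (
    "the patient received a dose of the drug "
    "the dose was adjusted for the patient"
).split()

def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def relfreq_feature(phrase, tokens, alpha=0.1):
    """Smoothed log relative frequency of a target phrase in the
    new-domain corpus: one hypothetical 'additional feature score'."""
    words = tuple(phrase.split())
    counts = ngram_counts(tokens, len(words))
    total = sum(counts.values())
    # add-alpha smoothing so unseen phrases get a finite (low) score
    return math.log((counts[words] + alpha) / (total + alpha))

# Baseline phrase table: (source, target) -> old-domain feature scores
# (hypothetical entries and values).
phrase_table = {
    ("la dose", "the dose"): [-0.5, -1.2],
    ("la dose", "the amount"): [-0.4, -1.0],
}

# Augment every phrase pair with the new-domain feature; the tuned
# log-linear model can then weight it against the old-domain scores.
augmented = {
    pair: scores + [relfreq_feature(pair[1], new_domain_target)]
    for pair, scores in phrase_table.items()
}

for pair, scores in augmented.items():
    print(pair, [round(s, 2) for s in scores])
```

Because "the dose" occurs in the toy new-domain text while "the amount" does not, the added feature favors the in-domain translation, which is the intended effect of the augmentation.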

Cite


APA

Irvine, A., & Callison-Burch, C. (2014). Using comparable corpora to adapt MT models to new domains. In Proceedings of the Ninth Workshop on Statistical Machine Translation (pp. 437–444). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-3357
