Using comparable corpora to adapt MT models to new domains


Abstract

In previous work we showed that when using an SMT model trained on old-domain data to translate text in a new domain, most errors are due to unseen source words, unseen target translations, and inaccurate translation model scores (Irvine et al., 2013a). In this work, we target errors due to inaccurate translation model scores using new-domain comparable corpora, which we mine from Wikipedia. We assume that we have access to a large old-domain parallel training corpus but only enough new-domain parallel data to tune model parameters and do evaluation. We use the new-domain comparable corpora to estimate additional feature scores over the phrase pairs in our baseline models. Augmenting models with the new features improves the quality of machine translations in the medical and science domains by up to 1.3 BLEU points over very strong baselines trained on the 150 million word Canadian Hansard dataset.
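The core idea in the abstract, augmenting a baseline phrase table with feature scores estimated from new-domain comparable corpora, can be sketched as follows. This is a minimal illustration, not the paper's actual feature set: it adds one hypothetical feature, the smoothed log relative frequency of each target phrase in a toy new-domain corpus, alongside the existing old-domain scores.

```python
import math
from collections import Counter

# Toy new-domain target-language text standing in for the comparable
# corpora the paper mines from Wikipedia (illustrative only).
new_domain_target = (
    "the patient received a dose of the drug "
    "the dose was adjusted for the patient"
).split()

def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def relfreq_feature(phrase, tokens, alpha=0.1):
    """Smoothed log relative frequency of a target phrase in the
    new-domain corpus: one hypothetical 'additional feature score'."""
    words = tuple(phrase.split())
    counts = ngram_counts(tokens, len(words))
    total = sum(counts.values())
    # add-alpha smoothing so unseen phrases get a finite (low) score
    return math.log((counts[words] + alpha) / (total + alpha))

# Baseline phrase table: (source, target) -> old-domain feature scores
# (hypothetical entries and values).
phrase_table = {
    ("la dose", "the dose"): [-0.5, -1.2],
    ("la dose", "the amount"): [-0.4, -1.0],
}

# Augment every phrase pair with the new-domain feature; the tuned
# log-linear model can then weight it against the old-domain scores.
augmented = {
    pair: scores + [relfreq_feature(pair[1], new_domain_target)]
    for pair, scores in phrase_table.items()
}

for pair, scores in augmented.items():
    print(pair, [round(s, 2) for s in scores])
```

Because "the dose" occurs in the toy new-domain text while "the amount" does not, the added feature favors the in-domain translation, which is the intended effect of the augmentation.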

Cite


APA

Irvine, A., & Callison-Burch, C. (2014). Using comparable corpora to adapt MT models to new domains. In Proceedings of the Ninth Workshop on Statistical Machine Translation (pp. 437–444). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-3357
