Latent domain phrase-based models for adaptation


Abstract

Phrase-based models directly trained on mix-of-domain corpora can be sub-optimal. In this paper we equip phrase-based models with a latent domain variable and present a novel method for adapting them to an in-domain task represented by a seed corpus. We derive an EM algorithm which alternates between inducing domain-focused phrase pair estimates and weights for mixed-domain sentence pairs reflecting their relevance to the in-domain task. By embedding our latent domain phrase model in a sentence-level model and training the two in tandem, we are able to adapt all core translation components together: phrase, lexical, and reordering models. We present experiments on weighting sentence pairs for relevance as well as on adapting phrase-based models, and report significant performance improvements on both tasks.
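To make the EM alternation described in the abstract concrete, the following is a minimal, simplified sketch in Python. It is not the authors' implementation: all names (`em_domain_adaptation`, `phrase_prob`, the normalisation of relevance scores) are illustrative assumptions, and the average phrase probability stands in as a crude proxy for the latent-domain posterior used in the paper.

```python
from collections import defaultdict

def em_domain_adaptation(corpus, seed_corpus, n_iterations=10):
    """Hypothetical EM-style alternation between phrase estimates and sentence weights.

    corpus: mixed-domain sentence pairs, each given as a list of extracted
            (source_phrase, target_phrase) pairs.
    seed_corpus: in-domain seed sentence pairs in the same representation.
    """
    # Initialise relevance weights uniformly: every mixed-domain sentence
    # pair starts out equally likely to be in-domain.
    weights = [0.5] * len(corpus)

    for _ in range(n_iterations):
        # Step 1: domain-focused phrase pair estimates from relevance-weighted
        # counts over the mixed-domain corpus, plus counts from the seed corpus.
        counts = defaultdict(float)
        src_counts = defaultdict(float)
        for pairs in seed_corpus:
            for src, tgt in pairs:
                counts[(src, tgt)] += 1.0
                src_counts[src] += 1.0
        for w, pairs in zip(weights, corpus):
            for src, tgt in pairs:
                counts[(src, tgt)] += w
                src_counts[src] += w
        phrase_prob = {k: v / src_counts[k[0]] for k, v in counts.items()}

        # Step 2: re-weight each mixed-domain sentence pair by how well the
        # in-domain phrase model explains it (a crude stand-in for the
        # latent-domain posterior over sentence pairs).
        scores = []
        for pairs in corpus:
            if pairs:
                scores.append(
                    sum(phrase_prob.get(p, 1e-6) for p in pairs) / len(pairs)
                )
            else:
                scores.append(0.0)
        # Normalise scores to [0, 1] so they act as soft relevance weights.
        max_score = max(scores) or 1.0
        weights = [s / max_score for s in scores]

    return phrase_prob, weights
```

In the paper the same weighted sufficient statistics would also feed the lexical and reordering components, which is what the sentence-level embedding of the latent domain phrase model enables; the sketch above only shows the phrase-table side of that alternation.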

Citation (APA)

Cuong, H., & Sima’an, K. (2014). Latent domain phrase-based models for adaptation. In EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 566–576). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/d14-1062
