Latent domain phrase-based models for adaptation

Citations of this article: 10
Mendeley readers: 82

Abstract

Phrase-based models trained directly on mixed-domain corpora can be suboptimal. In this paper we equip phrase-based models with a latent domain variable and present a novel method for adapting them to an in-domain task represented by a seed corpus. We derive an EM algorithm that alternates between inducing domain-focused phrase-pair estimates and weights for mixed-domain sentence pairs that reflect their relevance to the in-domain task. By embedding our latent domain phrase model in a sentence-level model and training the two in tandem, we are able to adapt all core translation components together: phrase, lexical and reordering. We present experiments on weighting sentence pairs for relevance as well as on adapting phrase-based models, and show significant performance improvements in both tasks.
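
To make the EM alternation described in the abstract concrete, below is a minimal sketch of soft instance weighting with a latent domain variable. It is an illustration under simplifying assumptions, not the authors' implementation: it assumes phrase pairs have already been extracted per sentence, scores each sentence pair with a simple unigram-of-phrase-pairs likelihood, and uses a fixed domain prior; all names (estimate, em_adapt, prior_in, the toy data) are hypothetical, and the lexical and reordering components adapted in the paper are omitted.

import math
from collections import defaultdict


def estimate(corpus, weights):
    """Relative-frequency phrase-pair probabilities from (soft) weighted counts."""
    counts = defaultdict(float)
    for phrase_pairs, w in zip(corpus, weights):
        for pp in phrase_pairs:
            counts[pp] += w
    total = sum(counts.values()) or 1.0
    return {pp: c / total for pp, c in counts.items()}


def log_likelihood(phrase_pairs, model, floor=1e-9):
    """Score a sentence pair by its phrase pairs under a domain-focused model."""
    return sum(math.log(max(model.get(pp, 0.0), floor)) for pp in phrase_pairs)


def em_adapt(seed, mixed, iterations=10, prior_in=0.5):
    """seed: in-domain sentence pairs (lists of phrase pairs); mixed: sentence
    pairs of unknown (latent) domain. Returns the adapted in-domain phrase
    model and one relevance weight per mixed-domain sentence pair."""
    p_in = estimate(seed, [1.0] * len(seed))      # initial in-domain model
    p_out = estimate(mixed, [1.0] * len(mixed))   # initial out-domain model
    weights = [0.5] * len(mixed)
    for _ in range(iterations):
        # E-step: posterior P(domain = in | sentence pair) for each mixed pair.
        for i, phrase_pairs in enumerate(mixed):
            ll_in = log_likelihood(phrase_pairs, p_in) + math.log(prior_in)
            ll_out = log_likelihood(phrase_pairs, p_out) + math.log(1.0 - prior_in)
            m = max(ll_in, ll_out)
            weights[i] = math.exp(ll_in - m) / (math.exp(ll_in - m) + math.exp(ll_out - m))
        # M-step: re-estimate both domain-focused models from weighted counts;
        # the seed corpus stays pinned to the in-domain component.
        p_in = estimate(seed + mixed, [1.0] * len(seed) + weights)
        p_out = estimate(mixed, [1.0 - w for w in weights])
    return p_in, weights


if __name__ == "__main__":
    # Toy phrase pairs as (source phrase, target phrase) tuples.
    seed = [[("huis", "house"), ("klein", "small")]]
    mixed = [[("huis", "house"), ("klein", "small")],
             [("aandeel", "share"), ("markt", "market")]]
    model, relevance = em_adapt(seed, mixed)
    print("relevance weights:", [round(w, 3) for w in relevance])

In this toy run the mixed-domain sentence whose phrase pairs overlap with the seed should receive a relevance weight close to 1, while the unrelated financial-domain sentence stays near 0, so the adapted phrase model is dominated by in-domain counts.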

Cite (APA)

Cuong, H., & Sima’an, K. (2014). Latent domain phrase-based models for adaptation. In EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 566–576). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/d14-1062

Readers over time

[Chart: Mendeley reader counts per year, '14–'25; y-axis 0–20 readers.]

Readers' Seniority

PhD / Post grad / Masters / Doc: 24 (63%)
Researcher: 8 (21%)
Lecturer / Post doc: 4 (11%)
Professor / Associate Prof.: 2 (5%)

Readers' Discipline

Computer Science: 36 (80%)
Linguistics: 5 (11%)
Engineering: 2 (4%)
Agricultural and Biological Sciences: 2 (4%)
