Learning latent topic information for language model adaptation

0Citations
Citations of this article
2Readers
Mendeley users who have this article in their library.
Get full text

Abstract

This paper is concerned with data selection for adapting language model (LM) in statistical machine translation (SMT), and aims to find the LM training sentences that are topic similar to the translation task. Although the traditional methods have gained significant performance, they ignore the topic information and the distribution of words in calculating the sentence similarity. In this paper, the authors propose a topic model to discover the latent topics in the content of sentences, and combine the latent topic based similarity with TF-IDF into a unified framework for data selection. Furthermore, the authors combine a cross-lingual projecting method with the topic model, which makes the data selection depend on the source input directly. Large-scale experimental results demonstrate that the proposed approach significantly outperforms the traditional approaches on both LM perplexity and SMT performance. © 2012 Springer-Verlag.

Cite

CITATION STYLE

APA

Lu, S., Wei, W., Fu, X., Fan, L., & Xu, B. (2012). Learning latent topic information for language model adaptation. In Communications in Computer and Information Science (Vol. 333 CCIS, pp. 143–153). https://doi.org/10.1007/978-3-642-34456-5_14

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free