Learning latent topic information for language model adaptation

Shixiang Lu; Wei Wei; Xiaoyin Fu; Lichun Fan; Bo Xu

Conference Proceedings

Communications in Computer and Information Science (2012) 333 CCIS 143-153

DOI: 10.1007/978-3-642-34456-5_14

Abstract

This paper is concerned with data selection for adapting the language model (LM) in statistical machine translation (SMT). It aims to find LM training sentences that are topically similar to the translation task. Although traditional methods have achieved significant performance gains, they ignore topic information and the distribution of words when calculating sentence similarity. In this paper, the authors propose a topic model to discover the latent topics in the content of sentences, and combine the latent-topic-based similarity with TF-IDF in a unified framework for data selection. Furthermore, the authors combine a cross-lingual projection method with the topic model, so that data selection depends directly on the source input. Large-scale experimental results demonstrate that the proposed approach significantly outperforms the traditional approaches in both LM perplexity and SMT performance. © 2012 Springer-Verlag.
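The abstract describes scoring candidate LM training sentences by combining a TF-IDF similarity with a latent-topic similarity. The sketch below illustrates that general idea under stated assumptions and is not the authors' implementation. Each candidate's topic distribution is assumed to come from an already-trained topic model, which is not shown here. Both similarities are computed as cosine similarity. They are combined by linear interpolation with a hypothetical weight `lam`, because the abstract does not say how the two scores are combined. The function names are illustrative, and the cross-lingual projection step is not sketched.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed IDF, ln((1 + n) / (1 + df)) + 1, over tokenized candidate sentences."""
    n = len(corpus)
    df = Counter(w for sent in corpus for w in set(sent))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; words unseen in the candidate pool get weight 0."""
    return {w: c * idf.get(w, 0.0) for w, c in Counter(tokens).items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def select_sentences(task_tokens, task_topics, candidates, k, lam=0.5):
    """Return indices of the k candidates with the highest combined score.

    candidates: list of (tokens, topic_distribution) pairs; the topic
    distributions are assumed to come from an already-trained topic model.
    lam: hypothetical interpolation weight (1.0 = TF-IDF only, 0.0 = topics only).
    """
    idf = idf_table([tokens for tokens, _ in candidates])
    query_vec = tfidf(task_tokens, idf)
    query_topics = dict(enumerate(task_topics))
    scores = []
    for i, (tokens, theta) in enumerate(candidates):
        lexical = cosine(query_vec, tfidf(tokens, idf))
        topical = cosine(query_topics, dict(enumerate(theta)))
        scores.append((lam * lexical + (1 - lam) * topical, i))
    # Highest score first; ties broken by original candidate order.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:k]]


if __name__ == "__main__":
    task = ["stock", "market", "prices"]
    task_topics = [0.9, 0.1]  # toy two-topic distribution for the task text
    pool = [
        (["stock", "prices", "fell"], [0.8, 0.2]),
        (["football", "match", "ended"], [0.1, 0.9]),
        (["market", "share", "rose"], [0.7, 0.3]),
    ]
    print(select_sentences(task, task_topics, pool, k=2))  # [0, 2]
```

In this toy pool, the third sentence shares only one word with the task text, yet it still ranks above the football sentence. Its topic distribution is close to the task's, which is the kind of signal the abstract says pure lexical similarity misses.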

Citation (APA)

Lu, S., Wei, W., Fu, X., Fan, L., & Xu, B. (2012). Learning latent topic information for language model adaptation. In Communications in Computer and Information Science (Vol. 333 CCIS, pp. 143–153). https://doi.org/10.1007/978-3-642-34456-5_14
