Nonparametric symmetric correspondence topic models for multilingual text analysis

Abstract

Topic models aim to analyze collections of documents and have been widely used in machine learning and natural language processing. Recently, researchers have proposed topic models for multilingual parallel or comparable documents; the symmetric correspondence Latent Dirichlet Allocation (SymCorrLDA) is one such model. Despite its advantages over other existing multilingual topic models, SymCorrLDA is a classic Bayesian parametric model and therefore inherits the shortcomings of that family: for example, the number of topics must be specified in advance. Motivated by this limitation, we extend the model and propose a Bayesian nonparametric counterpart (NPSymCorrLDA). Experiments on Chinese-English datasets extracted from Wikipedia (https://zh.wikipedia.org/) show significant improvement over SymCorrLDA.
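
The central contrast in the abstract, that a parametric topic model needs the topic count fixed up front while a nonparametric one infers it from the data, can be made concrete with generic off-the-shelf models. The sketch below uses gensim's LdaModel and HdpModel; these are not the paper's SymCorrLDA/NPSymCorrLDA, and the toy documents are invented purely for illustration.

```python
# Minimal sketch of the parametric vs. nonparametric contrast described in the
# abstract, using generic gensim models (NOT the paper's SymCorrLDA/NPSymCorrLDA).
from gensim.corpora import Dictionary
from gensim.models import LdaModel, HdpModel

# Toy tokenized documents, invented for this example only.
docs = [
    ["topic", "model", "document", "analysis"],
    ["machine", "learning", "natural", "language", "processing"],
    ["multilingual", "parallel", "comparable", "documents", "topic"],
]

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# Parametric LDA: the number of topics must be chosen before training.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)

# Nonparametric HDP: the effective number of topics is inferred from the data,
# the same kind of flexibility NPSymCorrLDA aims to add over SymCorrLDA.
hdp = HdpModel(corpus=corpus, id2word=dictionary)

print(lda.print_topics())
print(hdp.print_topics(num_topics=3, num_words=5))
```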

Citation (APA)

Cai, R., Chen, M., & Wang, H. (2015). Nonparametric symmetric correspondence topic models for multilingual text analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9362, pp. 270–281). Springer Verlag. https://doi.org/10.1007/978-3-319-25207-0_23
