LDA treats a surface word as identical across all documents and measures its contribution to each topic. However, a surface word may behave differently in different contexts; that is, polysemous words are used with different senses depending on context. Intuitively, disambiguating word senses should enhance the discriminative capability of topic models. In this work, we propose a joint model that induces document topics and word senses simultaneously. Instead of relying on pre-defined word sense resources, we capture word sense information via a latent variable and induce the senses directly from the corpus in a fully unsupervised manner. Experimental results show that the proposed joint model significantly outperforms both classic LDA and a standalone sense-based LDA model on document clustering. © 2014 Springer-Verlag Berlin Heidelberg.
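One plausible reading of "capture the word sense information via a latent variable" is a generative process in which each token first draws a topic, then a latent sense given that topic, and finally the surface word given the sense. The sketch below illustrates that reading; the dimensions, hyperparameters, and the exact topic-sense-word factorization are assumptions for illustration, not the paper's precise model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): K topics, S senses, V surface words.
K, S, V = 3, 4, 10
alpha, beta, gamma = 0.1, 0.1, 0.1

# Per-topic distribution over latent senses, and per-sense distribution over words.
phi_topic_sense = rng.dirichlet(np.full(S, beta), size=K)   # shape (K, S)
phi_sense_word = rng.dirichlet(np.full(V, gamma), size=S)   # shape (S, V)

def generate_document(n_tokens):
    """Sample one document: each token draws a topic z from the document's
    topic mixture, then a latent sense s given z, then a surface word w
    given s. Returns the (z, s, w) triples; only w would be observed."""
    theta = rng.dirichlet(np.full(K, alpha))        # document-topic mixture
    tokens = []
    for _ in range(n_tokens):
        z = rng.choice(K, p=theta)                  # topic assignment
        s = rng.choice(S, p=phi_topic_sense[z])     # latent word sense
        w = rng.choice(V, p=phi_sense_word[s])      # observed surface word
        tokens.append((z, s, w))
    return tokens

doc = generate_document(20)
```

Inference would then marginalize or sample both latent variables (z and s) per token, e.g. with collapsed Gibbs sampling, so that sense induction and topic induction inform each other.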
CITATION STYLE
Tang, G., Xia, Y., Sun, J., Zhang, M., & Zheng, T. F. (2014). Topic models incorporating statistical word senses. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8403 LNCS, pp. 151–162). Springer Verlag. https://doi.org/10.1007/978-3-642-54906-9_13