LDA treats a surface word as identical across all documents and measures its contribution to each topic. However, a surface word may behave differently in different contexts; that is, polysemous words are used with different senses depending on context. Intuitively, disambiguating word senses should enhance the discriminative capability of topic models. In this work, we propose a joint model that induces document topics and word senses simultaneously. Instead of relying on pre-defined word sense resources, we capture word sense information via a latent variable and induce the senses directly from the corpus in a fully unsupervised manner. Experimental results show that the proposed joint model significantly outperforms both classic LDA and a standalone sense-based LDA model on document clustering. © 2014 Springer-Verlag Berlin Heidelberg.
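One plausible reading of "capture the word sense information via a latent variable" is a generative process in which each token first draws a topic, then a latent sense given that topic, and finally the surface word given the sense. The sketch below illustrates that reading; the dimensions, hyperparameters, and the exact topic-sense-word factorization are assumptions for illustration, not the paper's precise model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): K topics, S senses, V surface words.
K, S, V = 3, 4, 10
alpha, beta, gamma = 0.1, 0.1, 0.1

# Per-topic distribution over latent senses, and per-sense distribution over words.
phi_topic_sense = rng.dirichlet(np.full(S, beta), size=K)   # shape (K, S)
phi_sense_word = rng.dirichlet(np.full(V, gamma), size=S)   # shape (S, V)

def generate_document(n_tokens):
    """Sample one document: each token draws a topic z from the document's
    topic mixture, then a latent sense s given z, then a surface word w
    given s. Returns the (z, s, w) triples; only w would be observed."""
    theta = rng.dirichlet(np.full(K, alpha))        # document-topic mixture
    tokens = []
    for _ in range(n_tokens):
        z = rng.choice(K, p=theta)                  # topic assignment
        s = rng.choice(S, p=phi_topic_sense[z])     # latent word sense
        w = rng.choice(V, p=phi_sense_word[s])      # observed surface word
        tokens.append((z, s, w))
    return tokens

doc = generate_document(20)
```

Inference would then marginalize or sample both latent variables (z and s) per token, e.g. with collapsed Gibbs sampling, so that sense induction and topic induction inform each other.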
CITATION STYLE
Tang, G., Xia, Y., Sun, J., Zhang, M., & Zheng, T. F. (2014). Topic models incorporating statistical word senses. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8403 LNCS, pp. 151–162). Springer Verlag. https://doi.org/10.1007/978-3-642-54906-9_13