Leveraging Global and Local Topic Popularities for LDA-Based Document Clustering

23Citations
Citations of this article
27Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Document clustering is of high importance for many natural language technologies. A wide range of computational traditional topic models, such as LDA (Latent Dirichlet Allocation) and its variants, have made great progress. However, traditional LDA-based clustering algorithms might not give good results due to such probabilistic models require prior distributions which are always difficult to define. In this paper, we propose a probabilistic model named tpLDA, which incorporates different levels of topic popularity information to determine the prior LDA distribution, discover the latent topics and achieve better clustering. Specifically, global topic popularity is introduced to reduce the potential distraction in local cluster popularity and the local cluster popularity draws more attention on certain parts of the global topic popularity. The two popularities contribute complementary information and their integration can dynamically adjust statistical parameters of the model. Experimental evaluations on real data sets show that, compared with state-of-the-art approaches, our proposed framework dramatically improves the accuracy of documents clustering.

Cite

CITATION STYLE

APA

Yang, P., Yao, Y., & Zhou, H. (2020). Leveraging Global and Local Topic Popularities for LDA-Based Document Clustering. IEEE Access, 8, 24734–24745. https://doi.org/10.1109/ACCESS.2020.2969525

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free