Renormalization Approach to the Task of Determining the Number of Topics in Topic Modeling

Abstract

Topic modeling is a widely used approach for clustering text documents; however, it has parameters that the user must set, such as the number of topics. In this paper, we propose a novel approach for fast approximation of the optimal number of topics that corresponds well to human judgment. Our method combines renormalization theory with the Rényi entropy approach. Its main advantage is computational speed, which is crucial when dealing with big data. We apply our method to the Latent Dirichlet Allocation model with Gibbs sampling and test it on two datasets in different languages. Numerical results and a comparison of computational speed show a significant reduction in running time relative to standard grid search methods.
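
For orientation, the textbook Rényi entropy of order q for a discrete distribution p is S_q = ln(sum_i p_i^q) / (1 - q), which reduces to the Shannon entropy as q approaches 1. The sketch below computes only that standard quantity. It is not the paper's entropy formulation or its renormalization procedure, neither of which the abstract spells out, and the function name `renyi_entropy` is an illustrative assumption.

```python
import math


def renyi_entropy(probs, q):
    """Textbook Rényi entropy of order q (natural log) for a discrete distribution.

    Illustrative helper only; it does not reproduce the paper's method.
    """
    if q < 0:
        raise ValueError("order q must be non-negative")
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Zero-probability outcomes do not contribute to the entropy.
    support = [p for p in probs if p > 0]
    if math.isclose(q, 1.0):
        # Limit q -> 1 gives the Shannon entropy.
        return -sum(p * math.log(p) for p in support)
    return math.log(sum(p ** q for p in support)) / (1.0 - q)


if __name__ == "__main__":
    # A uniform distribution over 4 outcomes has entropy ln(4) for every order q.
    print(renyi_entropy([0.25, 0.25, 0.25, 0.25], 2.0))
```

A uniform distribution is a convenient sanity check because its Rényi entropy equals the logarithm of the number of outcomes regardless of q.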

Cite

Koltcov, S., & Ignatenko, V. (2020). Renormalization Approach to the Task of Determining the Number of Topics in Topic Modeling. In Advances in Intelligent Systems and Computing (Vol. 1228 AISC, pp. 234–247). Springer. https://doi.org/10.1007/978-3-030-52249-0_16
