Topic modeling is a widely used approach for clustering text documents. However, it has a set of parameters that the user must choose, for example, the number of topics. In this paper, we propose a novel approach for fast approximation of the optimal number of topics that corresponds well to human judgment. Our method combines renormalization theory and the Rényi entropy approach. The main advantage of this method is computational speed, which is crucial when dealing with big data. We apply our method to the Latent Dirichlet Allocation model with a Gibbs sampling procedure and test our approach on two datasets in different languages. Numerical results and a comparison of computational speed demonstrate a significant gain in time with respect to standard grid search methods.
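For readers unfamiliar with the quantity named above, the following is a minimal sketch of the textbook Rényi entropy of order q for a discrete probability distribution, H_q(p) = ln(sum_i p_i^q) / (1 - q). It is not the topic-model-specific formulation or the renormalization procedure used in the paper, neither of which is given in this abstract; the function name `renyi_entropy` and its arguments are illustrative assumptions.

```python
import math


def renyi_entropy(probs, q):
    """Textbook Rényi entropy of order q (natural log) of a discrete distribution.

    Illustrative only: this is the generic definition, not the paper's
    topic-model-specific formulation.
    """
    if q <= 0:
        raise ValueError("order q must be positive")
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")

    # Zero-probability outcomes contribute nothing to the sum.
    positive = [p for p in probs if p > 0]

    # At q = 1 the formula is undefined; its limit is the Shannon entropy.
    if math.isclose(q, 1.0):
        return -sum(p * math.log(p) for p in positive)

    return math.log(sum(p ** q for p in positive)) / (1.0 - q)


if __name__ == "__main__":
    word_probs = [0.7, 0.2, 0.1]  # hypothetical word distribution of one topic
    for order in (0.5, 1.0, 2.0):
        print(order, renyi_entropy(word_probs, order))
```

For a uniform distribution the value equals the logarithm of the number of outcomes for every order q, and for any fixed distribution it does not increase as q grows.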
Koltcov, S., & Ignatenko, V. (2020). Renormalization Approach to the Task of Determining the Number of Topics in Topic Modeling. In Advances in Intelligent Systems and Computing (Vol. 1228 AISC, pp. 234–247). Springer. https://doi.org/10.1007/978-3-030-52249-0_16