Sparse hybrid variational-Gibbs algorithm for latent Dirichlet allocation

Abstract

Topic modeling algorithms such as latent Dirichlet allocation (LDA) play an important role in machine learning research. Fitting LDA with Gibbs sampler-based algorithms involves a sampling process over K topics. The sparsity in LDA can be exploited to accelerate this expensive topic sampling process even for very large values of K. However, LDA gradually loses sparsity as the number of documents increases. Motivated by the goal of fast LDA inference with large numbers of both topics and documents, in this paper we propose the novel sparse hybrid variational-Gibbs (SHVG) algorithm. The SHVG algorithm divides the topic sampling probability into a sparse term that scales linearly with the number of per-document instantiated topics Kd, and a dense term for which the Alias method reduces the sampling cost to constant O(1) time, yielding a significant efficiency improvement. Using stochastic optimization techniques, we further develop an online version of SHVG for streaming documents. Experimental results on corpora with a wide range of sizes demonstrate the efficiency and effectiveness of the proposed SHVG algorithm.
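
To make the sparse-plus-dense decomposition concrete, below is a minimal Python sketch of alias-based topic sampling, assuming the standard collapsed-Gibbs conditional p(z = k) proportional to (n_dk + alpha) * (n_wk + beta) / (n_k + V*beta). The names AliasTable and sample_topic, the count arrays, and this particular split are illustrative assumptions; the abstract does not spell out the exact SHVG conditional, which interleaves variational and Gibbs updates.

```python
import random


class AliasTable:
    """Walker's alias method: O(K) construction, O(1) draws."""

    def __init__(self, weights):
        k = len(weights)
        total = sum(weights)
        self.prob = [w * k / total for w in weights]
        self.alias = list(range(k))
        small = [i for i, p in enumerate(self.prob) if p < 1.0]
        large = [i for i, p in enumerate(self.prob) if p >= 1.0]
        while small and large:
            s, l = small.pop(), large.pop()
            self.alias[s] = l                   # route s's leftover mass to topic l
            self.prob[l] -= 1.0 - self.prob[s]  # l donates mass to fill s's bucket
            (small if self.prob[l] < 1.0 else large).append(l)

    def sample(self):
        i = random.randrange(len(self.prob))
        return i if random.random() < self.prob[i] else self.alias[i]


def sample_topic(doc_counts, n_wk, n_k, word_alias, dense_mass, alpha, beta, V):
    """Hybrid draw of one topic assignment (hypothetical helper).

    doc_counts : dict {topic: n_dk}, the Kd instantiated topics of the document
    n_wk, n_k  : topic counts for the current word, and overall topic totals
    word_alias : (possibly stale) AliasTable over alpha*(n_wk+beta)/(n_k+V*beta)
    dense_mass : total mass of that dense term, cached alongside the table
    """
    topics, weights, sparse_mass = [], [], 0.0
    for k, n_dk in doc_counts.items():          # O(Kd), not O(K)
        w = n_dk * (n_wk[k] + beta) / (n_k[k] + V * beta)
        topics.append(k)
        weights.append(w)
        sparse_mass += w

    u = random.random() * (sparse_mass + dense_mass)
    if u < sparse_mass:                         # sparse bucket: linear walk over Kd topics
        for k, w in zip(topics, weights):
            u -= w
            if u <= 0.0:
                return k
        return topics[-1]                       # guard against float round-off
    return word_alias.sample()                  # dense bucket: O(1) alias draw
```

In practice the per-word alias tables are rebuilt only occasionally, so draws from the dense term are slightly stale; samplers in this family typically correct for that with a Metropolis-Hastings acceptance step rather than rebuilding the table at every token.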

Cite

APA

Li, X., Ouyang, J., & Zhou, X. (2016). Sparse hybrid variational-Gibbs algorithm for latent Dirichlet allocation. In 16th SIAM International Conference on Data Mining 2016, SDM 2016 (pp. 729–737). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974348.82
