Abstract
Topic modeling algorithms such as latent Dirichlet allocation (LDA) play an important role in machine learning research. Fitting LDA with Gibbs sampling-based algorithms involves, for each token, a sampling step over K topics. The sparsity in LDA can be exploited to accelerate this expensive topic sampling step even for very large K. However, LDA gradually loses sparsity as the number of documents grows. Motivated by the goal of fast LDA inference with large numbers of both topics and documents, in this paper we propose the novel sparse hybrid variational-Gibbs (SHVG) algorithm. SHVG decomposes the topic sampling probability into a sparse term that scales linearly with the number of per-document instantiated topics Kd, and a dense term whose sampling cost is reduced to constant O(1) time using the Alias method, yielding a significant gain in efficiency. Using stochastic optimization techniques, we further develop an online version of SHVG for streaming documents. Experimental results on corpora with a wide range of sizes demonstrate the efficiency and effectiveness of the proposed SHVG algorithm.
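The abstract describes the sparse/dense decomposition only at a high level; the paper gives the exact updates. As a rough illustration of the general technique, the following minimal Python sketch (not the authors' code) shows (i) Vose's alias method, which turns a K-outcome discrete distribution into an O(1) sampler after O(K) setup, and (ii) a toy token-level sampler that splits the unnormalized topic probability into a sparse bucket over the Kd topics instantiated in the document and a dense, document-independent bucket drawn via a per-word alias table. All names (phi_w, alias_tables, etc.) are illustrative assumptions; SHVG's actual variational-Gibbs updates and bucket definitions differ in detail.

import random

def build_alias_table(weights):
    """Vose's alias method: O(K) setup, then O(1) per draw."""
    k = len(weights)
    total = sum(weights)
    prob = [w * k / total for w in weights]   # scaled so the mean is 1.0
    alias = [0] * k
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                          # s's leftover mass comes from l
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:                   # numerical leftovers
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias):
    """One O(1) draw: uniform bucket, then a biased coin flip."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

def sample_topic(word, doc_topic_counts, phi_w, alias_tables):
    """Toy sparse/dense split: p(z=k) proportional to
    n_dk * phi_kw (sparse, Kd nonzeros) + alpha_k * phi_kw (dense).
    The dense part is document-independent, so its alias table can be
    built once per word and reused across many draws."""
    sparse = [(k, n * phi_w[k]) for k, n in doc_topic_counts.items()]
    s_mass = sum(m for _, m in sparse)
    prob, alias, d_mass = alias_tables[word]
    u = random.random() * (s_mass + d_mass)
    if u < s_mass:                            # walk the Kd sparse entries
        for k, m in sparse:
            u -= m
            if u <= 0.0:
                return k
        return sparse[-1][0]
    return alias_sample(prob, alias)          # O(1) dense draw

if __name__ == "__main__":
    alpha = 0.1
    phi_w = [0.05, 0.40, 0.30, 0.25]          # expected p(word | topic k), K = 4
    dense_w = [alpha * p for p in phi_w]      # document-independent weights
    prob, alias = build_alias_table(dense_w)
    tables = {"apple": (prob, alias, sum(dense_w))}
    counts = {1: 3, 2: 1}                     # topics instantiated in this doc
    print(sample_topic("apple", counts, phi_w, tables))

Note that only the Kd nonzero entries of the document-topic counts are ever touched in the sparse bucket, which is where the linear-in-Kd cost claimed in the abstract comes from.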
Li, X., Ouyang, J., & Zhou, X. (2016). Sparse hybrid variational-Gibbs algorithm for latent Dirichlet allocation. In 16th SIAM International Conference on Data Mining 2016, SDM 2016 (pp. 729–737). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974348.82