Abstract
Topic modeling algorithms such as latent Dirichlet allocation (LDA) play an important role in machine learning research. Fitting LDA with Gibbs sampling-based algorithms involves, for each token, a sampling step over K topics. The sparsity in LDA can be exploited to accelerate this expensive topic sampling step even for very large K. However, LDA gradually loses sparsity as the number of documents grows. Motivated by the goal of fast LDA inference with large numbers of both topics and documents, in this paper we propose the novel sparse hybrid variational-Gibbs (SHVG) algorithm. SHVG decomposes the topic sampling probability into a sparse term that scales linearly with the number of per-document instantiated topics Kd, and a dense term whose sampling cost is reduced to constant O(1) time using the Alias method, yielding a significant gain in efficiency. Using stochastic optimization techniques, we further develop an online version of SHVG for streaming documents. Experimental results on corpora with a wide range of sizes demonstrate the efficiency and effectiveness of the proposed SHVG algorithm.
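The abstract describes the sparse/dense decomposition only at a high level; the paper gives the exact updates. As a rough illustration of the general technique, the following minimal Python sketch (not the authors' code) shows (i) Vose's alias method, which turns a K-outcome discrete distribution into an O(1) sampler after O(K) setup, and (ii) a toy token-level sampler that splits the unnormalized topic probability into a sparse bucket over the Kd topics instantiated in the document and a dense, document-independent bucket drawn via a per-word alias table. All names (phi_w, alias_tables, etc.) are illustrative assumptions; SHVG's actual variational-Gibbs updates and bucket definitions differ in detail.

import random

def build_alias_table(weights):
    """Vose's alias method: O(K) setup, then O(1) per draw."""
    k = len(weights)
    total = sum(weights)
    prob = [w * k / total for w in weights]   # scaled so the mean is 1.0
    alias = [0] * k
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                          # s's leftover mass comes from l
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:                   # numerical leftovers
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias):
    """One O(1) draw: uniform bucket, then a biased coin flip."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]

def sample_topic(word, doc_topic_counts, phi_w, alias_tables):
    """Toy sparse/dense split: p(z=k) proportional to
    n_dk * phi_kw (sparse, Kd nonzeros) + alpha_k * phi_kw (dense).
    The dense part is document-independent, so its alias table can be
    built once per word and reused across many draws."""
    sparse = [(k, n * phi_w[k]) for k, n in doc_topic_counts.items()]
    s_mass = sum(m for _, m in sparse)
    prob, alias, d_mass = alias_tables[word]
    u = random.random() * (s_mass + d_mass)
    if u < s_mass:                            # walk the Kd sparse entries
        for k, m in sparse:
            u -= m
            if u <= 0.0:
                return k
        return sparse[-1][0]
    return alias_sample(prob, alias)          # O(1) dense draw

if __name__ == "__main__":
    alpha = 0.1
    phi_w = [0.05, 0.40, 0.30, 0.25]          # expected p(word | topic k), K = 4
    dense_w = [alpha * p for p in phi_w]      # document-independent weights
    prob, alias = build_alias_table(dense_w)
    tables = {"apple": (prob, alias, sum(dense_w))}
    counts = {1: 3, 2: 1}                     # topics instantiated in this doc
    print(sample_topic("apple", counts, phi_w, tables))

Note that only the Kd nonzero entries of the document-topic counts are ever touched in the sparse bucket, which is where the linear-in-Kd cost claimed in the abstract comes from.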
Li, X., Ouyang, J., & Zhou, X. (2016). Sparse hybrid variational-Gibbs algorithm for latent Dirichlet allocation. In 16th SIAM International Conference on Data Mining 2016, SDM 2016 (pp. 729–737). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974348.82