Accelerating topic model training on a single machine

Abstract

We present the design and implementation of GLDA, a library that utilizes the GPU (Graphics Processing Unit) to perform Gibbs sampling of Latent Dirichlet Allocation (LDA) on a single machine. LDA is an effective topic model used in many applications, e.g., classification, feature selection, and information retrieval. However, training an LDA model on large data sets takes hours, even days, due to the heavy computation and intensive memory access. Therefore, we explore the use of the GPU to accelerate LDA training on a single machine. Specifically, we propose three memory-efficient techniques to handle large data sets on the GPU: (1) generating document-topic counts as needed instead of storing all of them, (2) adopting a compact storage scheme for sparse matrices, and (3) partitioning word tokens. Through these techniques, LDA training that would originally require 10 GB of memory can be performed on a commodity GPU card with only 1 GB of GPU memory. Furthermore, our GLDA achieves a speedup of 15X over the original CPU-based LDA for large data sets. © 2013 Springer-Verlag.
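To make technique (1) concrete, below is a minimal CUDA sketch, not the authors' GLDA code, of regenerating document-topic counts on demand: each thread block rebuilds the counts for one document in shared memory from the current topic assignments just before that document's tokens would be sampled, so the full D x K document-topic matrix never has to reside in GPU memory. All names here (rebuild_doc_topic_counts, doc_start, topic_assign, K_MAX) and the toy data in main are illustrative assumptions, and the Gibbs update itself is omitted.

#include <cuda_runtime.h>
#include <cstdio>

#define K_MAX 256  // assumed upper bound on the number of topics that fits in shared memory

// Tokens of document d occupy topic_assign[doc_start[d] .. doc_start[d+1]).
// One thread block per document: rebuild that document's topic counts in
// shared memory instead of keeping a dense D x K matrix resident on the GPU.
__global__ void rebuild_doc_topic_counts(const int* doc_start,
                                         const int* topic_assign,
                                         int* doc_topic_out,  // D x K, for inspection only
                                         int K)
{
    __shared__ int n_dk[K_MAX];   // per-document topic counts
    int d = blockIdx.x;

    // Zero the shared counts cooperatively.
    for (int k = threadIdx.x; k < K; k += blockDim.x)
        n_dk[k] = 0;
    __syncthreads();

    // Scan this document's current topic assignments and accumulate counts.
    for (int i = doc_start[d] + threadIdx.x; i < doc_start[d + 1]; i += blockDim.x)
        atomicAdd(&n_dk[topic_assign[i]], 1);
    __syncthreads();

    // A real sampler would run the Gibbs updates for this document here,
    // reading and updating n_dk; we only copy the counts out to show the result.
    for (int k = threadIdx.x; k < K; k += blockDim.x)
        doc_topic_out[d * K + k] = n_dk[k];
}

int main()
{
    const int K = 4, D = 2;
    int h_doc_start[D + 1] = {0, 3, 5};        // doc 0 has 3 tokens, doc 1 has 2
    int h_topics[5]        = {0, 2, 2, 1, 3};  // current topic assignment per token
    int h_counts[D * K]    = {0};

    int *d_doc_start, *d_topics, *d_counts;
    cudaMalloc(&d_doc_start, sizeof(h_doc_start));
    cudaMalloc(&d_topics, sizeof(h_topics));
    cudaMalloc(&d_counts, sizeof(h_counts));
    cudaMemcpy(d_doc_start, h_doc_start, sizeof(h_doc_start), cudaMemcpyHostToDevice);
    cudaMemcpy(d_topics, h_topics, sizeof(h_topics), cudaMemcpyHostToDevice);

    rebuild_doc_topic_counts<<<D, 64>>>(d_doc_start, d_topics, d_counts, K);
    cudaMemcpy(h_counts, d_counts, sizeof(h_counts), cudaMemcpyDeviceToHost);

    for (int d = 0; d < D; ++d) {
        printf("doc %d counts:", d);
        for (int k = 0; k < K; ++k) printf(" %d", h_counts[d * K + k]);
        printf("\n");
    }
    cudaFree(d_doc_start); cudaFree(d_topics); cudaFree(d_counts);
    return 0;
}

In the paper's setting, techniques (2) and (3) would additionally compact the sparse count matrices and split the token stream into partitions that fit in GPU memory; the dense output array above exists only so the toy program can print the rebuilt counts.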

Citation (APA)

Lu, M., Bai, G., Luo, Q., Tang, J., & Zhao, J. (2013). Accelerating topic model training on a single machine. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7808 LNCS, pp. 184–195). https://doi.org/10.1007/978-3-642-37401-2_20
