A Distributed Topic Model for Large-Scale Streaming Text

Yicong Li; Dawei Feng; Menglong Lu; Dongsheng Li

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11776 LNAI 37-48

DOI: 10.1007/978-3-030-29563-9_4

Abstract

Learning topic information from large-scale unstructured text has attracted extensive attention from both academia and industry. Topic models, such as LDA and its variants, are a popular machine learning technique for discovering such latent structure. Among them, the online variational hierarchical Dirichlet process (onlineHDP) is a promising candidate for dynamically processing streaming text. Rather than being fixed in advance, the number of topics in onlineHDP is inferred from the corpus as training proceeds. However, when dealing with large-scale streaming data, it still suffers from limited model capacity. To this end, we propose a distributed version of the onlineHDP algorithm, named DistHDP. The training task is split into many sub-batch tasks that are distributed across multiple worker nodes, so that the whole training process is accelerated. Model convergence is guaranteed through a distributed variational inference algorithm. Extensive experiments conducted on several real-world datasets demonstrate the usability and scalability of the proposed algorithm.
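The sub-batch scheme described in the abstract can be illustrated with a minimal sketch. This is not the DistHDP algorithm itself: the abstract gives no implementation details, so every name below (`split_into_sub_batches`, `local_statistics`, `online_update`, `process_stream`) and the parameters `tau0` and `kappa` are hypothetical. Plain word counts stand in for the per-worker variational statistics, threads stand in for worker nodes, and the decaying step size `(tau0 + step) ** -kappa` is the form commonly used in online variational inference, assumed here rather than taken from the paper.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_sub_batches(batch, num_workers):
    """Split one mini-batch of documents into num_workers sub-batches."""
    return [batch[i::num_workers] for i in range(num_workers)]


def local_statistics(sub_batch):
    """Stand-in for a worker's local step (assumption): plain word counts.

    A real worker would compute variational sufficient statistics instead.
    """
    counts = Counter()
    for doc in sub_batch:
        counts.update(doc.split())
    return counts


def aggregate(partials):
    """Sum the statistics returned by all workers."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def online_update(global_stats, batch_stats, step, tau0=1.0, kappa=0.6):
    """Blend batch statistics into the global state with a decaying rate.

    rho = (tau0 + step) ** -kappa is an assumed schedule, not the paper's.
    """
    rho = (tau0 + step) ** (-kappa)
    keys = set(global_stats) | set(batch_stats)
    return {
        key: (1.0 - rho) * global_stats.get(key, 0.0) + rho * batch_stats.get(key, 0.0)
        for key in keys
    }


def process_stream(stream, num_workers=2):
    """Consume mini-batches one by one, fanning each out to the workers."""
    global_stats = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for step, batch in enumerate(stream, start=1):
            sub_batches = split_into_sub_batches(batch, num_workers)
            partials = list(pool.map(local_statistics, sub_batches))
            global_stats = online_update(global_stats, aggregate(partials), step)
    return global_stats


if __name__ == "__main__":
    stream = [["topic models scale", "streaming text"], ["topic models stream"]]
    print(process_stream(stream, num_workers=2))
```

Because the per-worker statistics are summed before the global update, splitting a mini-batch across workers leaves the aggregated result unchanged in this sketch; only the local computation is parallelized.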

Cite

Li, Y., Feng, D., Lu, M., & Li, D. (2019). A Distributed Topic Model for Large-Scale Streaming Text. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11776 LNAI, pp. 37–48). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-29563-9_4
