A Distributed Topic Model for Large-Scale Streaming Text

Abstract

Learning topic information from large-scale unstructured text has attracted extensive attention from both academia and industry. Topic models, such as LDA and its variants, are a popular machine learning technique for discovering such latent structure. Among them, the online variational hierarchical Dirichlet process (onlineHDP) is a promising candidate for dynamically processing streaming text: rather than being fixed in advance, the number of topics in onlineHDP is inferred from the corpus as training proceeds. However, when dealing with large-scale streaming data, onlineHDP still suffers from limited model capacity. To this end, this paper proposes a distributed version of the onlineHDP algorithm, named DistHDP: the training task is split into many sub-batch tasks and distributed across multiple worker nodes, which accelerates the whole training process. Model convergence is guaranteed through a distributed variational inference algorithm. Extensive experiments on several real-world datasets demonstrate the usability and scalability of the proposed algorithm.
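
The abstract only outlines the architecture, but the pattern it describes (a driver that farms sub-batches out to workers and folds their sufficient statistics back into one online variational update) can be sketched. The following is a minimal sketch under stated assumptions, not the authors' DistHDP implementation: it substitutes a simplified LDA-style stochastic variational update for the full HDP machinery, threads stand in for worker nodes, and every name (n_topics, vocab_size, local_stats, train_step, the step-size constants tau0 and kappa) is illustrative.

```python
# Illustrative sketch of the distributed sub-batch pattern described in
# the abstract; NOT the paper's DistHDP code. Simplified LDA-style
# stochastic variational update; all sizes and names are assumptions.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
n_topics, vocab_size = 10, 1000                      # assumed model sizes
lam = rng.gamma(1.0, 1.0, (n_topics, vocab_size))    # global variational params

def local_stats(sub_batch):
    """Worker-side E-step: compute responsibilities for each document
    in one sub-batch and return its sufficient statistics for lambda."""
    stats = np.zeros_like(lam)
    for doc in sub_batch:                            # doc: array of word ids
        phi = lam[:, doc]                            # unnormalized responsibilities
        phi = phi / phi.sum(axis=0, keepdims=True)   # normalize over topics
        np.add.at(stats, (slice(None), doc), phi)    # accumulate per-word counts
    return stats

def train_step(t, batches, corpus_size, tau0=1.0, kappa=0.7, eta=0.01):
    """Driver-side step: dispatch sub-batches to workers (threads here
    stand in for worker nodes), aggregate their statistics, and apply
    an online update with decaying step size rho_t = (tau0 + t)^-kappa."""
    global lam
    with ThreadPoolExecutor() as pool:
        stats = sum(pool.map(local_stats, batches))  # aggregate across workers
    n_docs = sum(len(b) for b in batches)
    rho = (tau0 + t) ** (-kappa)                     # SVI-style step-size schedule
    lam_hat = eta + (corpus_size / n_docs) * stats   # noisy estimate from this batch
    lam = (1 - rho) * lam + rho * lam_hat            # blend into global parameters

# Toy usage: stream 5 steps of 4 sub-batches of random documents.
for t in range(5):
    batches = [[rng.integers(0, vocab_size, 50) for _ in range(8)]
               for _ in range(4)]
    train_step(t, batches, corpus_size=10_000)
```

The corpus_size / n_docs scaling treats each step's aggregated statistics as an unbiased estimate over the whole stream, and the decaying step size rho is the standard device that lets such stochastic variational updates converge; DistHDP's distributed variational inference presumably plays an analogous role, though the paper's actual update rules should be taken from the text itself.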

Cite (APA)

Li, Y., Feng, D., Lu, M., & Li, D. (2019). A Distributed Topic Model for Large-Scale Streaming Text. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11776 LNAI, pp. 37–48). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-29563-9_4
