Hash-based stream LDA: Topic modeling in social streams

Anton Slutsky; Xiaohua Hu; Yuan An

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8443 LNAI(PART 1) 151-162

DOI: 10.1007/978-3-319-06608-0_13



Abstract

We study the problem of topic modeling in continuous social media streams and propose a new generative probabilistic model called Hash-Based Stream LDA (HS-LDA), a generalization of the popular Latent Dirichlet Allocation (LDA) approach. The model differs from LDA in that it can incorporate inter-document similarity into topic modeling. The corresponding inference algorithm outlined in the paper estimates document similarity efficiently with Locality Sensitive Hashing, which lets it retain knowledge of past social discourse in a scalable way. This historical knowledge of previous messages is used during inference to improve the quality of topic discovery. The new algorithm was evaluated against the classical LDA approach as well as the stream-oriented On-line LDA and SparseLDA, using data sets collected from the Twitter microblog system and an IRC chat community. Experimental results showed that, in terms of average perplexity, HS-LDA outperformed the other techniques by more than 12% on the Twitter data set and by 21% on the IRC data. © 2014 Springer International Publishing.
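The abstract does not say which Locality Sensitive Hashing family HS-LDA uses, so the following is only a minimal sketch of the general idea, assuming MinHash over token sets with banding. The names `StreamIndex`, `minhash_signature`, `NUM_HASHES`, and `BANDS` are hypothetical and are not taken from the paper. It shows how a stream of short messages could be indexed so that past messages similar to a new one are found without comparing against the whole history.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 32  # assumed signature length (illustrative value)
BANDS = 8        # assumed number of LSH bands, so 4 rows per band


def _hash(token: str, seed: int) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens):
    """MinHash signature: the minimum hash of the token set for each seed."""
    token_set = set(tokens)
    if not token_set:
        raise ValueError("cannot hash an empty message")
    return tuple(min(_hash(t, seed) for t in token_set)
                 for seed in range(NUM_HASHES))


def estimated_jaccard(sig_a, sig_b) -> float:
    """Fraction of matching signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


class StreamIndex:
    """Hypothetical banded LSH index over messages seen so far."""

    def __init__(self):
        self.buckets = defaultdict(set)
        self.rows = NUM_HASHES // BANDS

    def _band_keys(self, sig):
        # Each band of the signature becomes one bucket key.
        for b in range(BANDS):
            yield (b, sig[b * self.rows:(b + 1) * self.rows])

    def add(self, doc_id, tokens):
        """Index a message and return its signature."""
        sig = minhash_signature(tokens)
        for key in self._band_keys(sig):
            self.buckets[key].add(doc_id)
        return sig

    def candidates(self, tokens):
        """Ids of past messages sharing at least one band with this message."""
        sig = minhash_signature(tokens)
        found = set()
        for key in self._band_keys(sig):
            found |= self.buckets.get(key, set())
        return found
```

In a sketch like this, each lookup touches only a fixed number of buckets regardless of how many messages have been seen, which is one way the scalability property mentioned in the abstract could be obtained. How the retrieved similar messages then enter HS-LDA inference is described in the paper itself and is not reproduced here.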

APA

Slutsky, A., Hu, X., & An, Y. (2014). Hash-based stream LDA: Topic modeling in social streams. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8443 LNAI, pp. 151–162). Springer Verlag. https://doi.org/10.1007/978-3-319-06608-0_13
