Enriching text representation with frequent pattern mining for probabilistic topic modeling

Hyun Duk Kim; Dae Hoon Park; Yue Lu; Cheng Xiang Zhai

Journal Article (Open Access)

Proceedings of the ASIST Annual Meeting (2012) 49(1) 1-10

DOI: 10.1002/meet.14504901209

Abstract

Probabilistic topic models have proven very useful for many text mining tasks. Although many variants of topic models have been proposed, most existing work is based on the bag-of-words representation of text, in which word combination and order are generally ignored, resulting in an inaccurate semantic representation of text. In this paper, we propose a general way to go beyond the bag-of-words representation for topic modeling: we apply frequent pattern mining to discover frequent word patterns that capture semantic associations between words, and then use these patterns as supplementary semantic units to augment the conventional bag-of-words representation. By viewing a topic model as a generative model for such augmented text data, we can go beyond the bag-of-words assumption and potentially capture more semantic associations between words. Since efficient algorithms for mining frequent word patterns are available, this general strategy can be applied to any topic model without substantially increasing the computational complexity of the model. Experimental results show that this frequent pattern-based data enrichment approach can improve on two representative existing probabilistic topic models for the classification task. We also studied variations of frequent pattern usage in topic modeling and found that using compressed and closed patterns performs best.
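The enrichment idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes patterns are unordered word pairs, counts support as the number of documents containing both words, and uses brute-force pair counting in place of an efficient frequent pattern miner. The closed and compressed pattern variants mentioned in the abstract are not modeled. The function names `mine_frequent_pairs` and `enrich` and the `min_support` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_pairs(docs, min_support):
    """Return word pairs that co-occur in at least `min_support` documents.

    Assumption: brute-force counting of 2-word patterns, used here only
    as a stand-in for a real frequent pattern mining algorithm.
    """
    counts = Counter()
    for doc in docs:
        vocab = sorted(set(doc))  # each pair is counted once per document
        counts.update(combinations(vocab, 2))
    return {pair for pair, n in counts.items() if n >= min_support}


def enrich(docs, patterns):
    """Append one pseudo-word per matched pattern to each document."""
    enriched = []
    for doc in docs:
        words = set(doc)
        extra = ["_".join(p) for p in sorted(patterns) if words.issuperset(p)]
        enriched.append(list(doc) + extra)
    return enriched


# Toy corpus (illustrative data, not from the paper).
docs = [
    ["topic", "model", "text"],
    ["topic", "model", "mining"],
    ["pattern", "mining", "text"],
]
patterns = mine_frequent_pairs(docs, min_support=2)
enriched = enrich(docs, patterns)
```

The enriched token lists could then be passed to any bag-of-words topic model in place of the original documents, so that each matched pattern is treated as one more vocabulary item.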

Cite

Kim, H. D., Park, D. H., Lu, Y., & Zhai, C. X. (2012). Enriching text representation with frequent pattern mining for probabilistic topic modeling. Proceedings of the ASIST Annual Meeting, 49(1), 1–10. https://doi.org/10.1002/meet.14504901209
