Applying word co-occurrence graph in enhancing LDA model for topic discovering in large-scaled text corpus

Phu Pham; Phuc Do

Journal Article, Open Access

International Journal of Recent Technology and Engineering (2019) 8(2 Special Issue 8) 1366-1371

DOI: 10.35940/ijrte.B1068.0882S819

Abstract

Topic models such as LDA are considered useful tools for the statistical analysis of text document collections and other text-based data. Topic modeling has recently become an attractive research field because of its wide range of applications. However, traditional topic models such as LDA still have disadvantages, owing to the shortcomings of the bag-of-words (BOW) model and to low performance when handling large text corpora. In this paper, we therefore present a novel topic modeling approach, called LDA-GOW, which combines a word co-occurrence model, also called the graph-of-words (GOW) model, with the traditional LDA topic discovery model. The LDA-GOW topic model not only extracts more informative topics from text but also improves the topic discovery process on large-scale text corpora. To demonstrate the effectiveness of the proposed model, we compare it with the traditional LDA topic model on several standardized datasets: WebKB, Reuters-R8, and annotated scientific documents collected from the ACM Digital Library. Across all experiments, the proposed LDA-GOW model reaches approximately 70.86% accuracy.
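
As a rough illustration of the graph-of-words idea named in the abstract, the Python sketch below counts how often two distinct words appear within a small sliding window and stores each pair as a weighted, undirected edge. The function name `build_graph_of_words`, the window size of 3, and the sample sentence are assumptions made for illustration. The abstract does not specify how the authors build their graph or how they combine it with LDA, so this should not be read as the LDA-GOW method itself.

```python
from collections import Counter


def build_graph_of_words(tokens, window=3):
    """Return weighted undirected co-occurrence edges for one token list.

    Hypothetical helper: two distinct words are linked when they appear
    within `window` consecutive positions; the weight counts how often.
    """
    edges = Counter()
    for i, word in enumerate(tokens):
        # Pair the current word with the next (window - 1) tokens.
        for other in tokens[i + 1 : i + window]:
            if other != word:
                # Sort the pair so (a, b) and (b, a) map to the same edge.
                edges[tuple(sorted((word, other)))] += 1
    return edges


# Illustrative input only; not data from the paper.
tokens = "topic models discover topic structure in text".split()
graph = build_graph_of_words(tokens, window=3)

for (left, right), weight in sorted(graph.items()):
    print(f"{left} -- {right}: {weight}")
```

Unlike a bag-of-words count, the edge weights retain some information about which words occur near each other, which is the kind of signal a graph-of-words representation is meant to keep.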

Citation (APA)

Pham, P., & Do, P. (2019). Applying word co-occurrence graph in enhancing LDA model for topic discovering in large-scaled text corpus. International Journal of Recent Technology and Engineering, 8(2 Special Issue 8), 1366–1371. https://doi.org/10.35940/ijrte.B1068.0882S819
