Topic significance ranking of LDA generative models

Loulwah AlSumait; Daniel Barbará; James Gentle; Carlotta Domeniconi

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 5781 LNAI, Part 1, pp. 67-82 (2009)

DOI: 10.1007/978-3-642-04180-8_22

112 Citations

196 Readers

Abstract

Topic models, such as Latent Dirichlet Allocation (LDA), have recently been used to automatically generate the topics of text corpora and to subdivide the corpus words among those topics. However, not all of the estimated topics are of equal importance or correspond to genuine themes of the domain. Some topics may be collections of irrelevant words or may represent insignificant themes. Current approaches to topic modeling rely on manual examination to find meaningful topics. This paper presents the first automated unsupervised analysis of LDA models to distinguish junk topics from legitimate ones and to rank topics by significance. The distance between a topic distribution and three definitions of "junk distribution" is computed using a variety of measures, from which an expressive figure of topic significance is derived using a 4-phase Weighted Combination approach. Our experiments on synthetic and benchmark datasets show the effectiveness of the proposed approach in ranking topic significance. © 2009 Springer.
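
The distance-from-junk idea in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the three junk distributions, the distance measures, or the 4-phase Weighted Combination, so everything below is an assumption made for illustration: a single junk definition (a uniform distribution over the vocabulary), a single measure (KL divergence), and hypothetical helper names (`kl_divergence`, `distance_from_uniform`, `rank_topics`). It is not the authors' method.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support.

    Terms with p_i == 0 contribute nothing; q is assumed strictly positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distance_from_uniform(topic):
    """Distance of a topic's word distribution from a uniform 'junk' distribution.

    Assumption: uniform-over-vocabulary stands in for one junk definition.
    """
    uniform = [1.0 / len(topic)] * len(topic)
    return kl_divergence(topic, uniform)


def rank_topics(topics):
    """Return topic indices ordered from most to least distant from uniform."""
    scores = [distance_from_uniform(t) for t in topics]
    return sorted(range(len(topics)), key=lambda k: scores[k], reverse=True)


# Toy topic-word distributions over a four-word vocabulary (made-up values).
topics = [
    [0.25, 0.25, 0.25, 0.25],  # indistinguishable from uniform
    [0.70, 0.10, 0.10, 0.10],  # moderately peaked
    [0.97, 0.01, 0.01, 0.01],  # strongly peaked
]
ranking = rank_topics(topics)
```

In this toy setting a topic that spreads its mass evenly over all words scores zero and falls to the bottom of the ranking, while peaked topics score higher. A fuller treatment would combine several junk definitions and several measures, which this sketch does not attempt.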

Cite (APA)

AlSumait, L., Barbará, D., Gentle, J., & Domeniconi, C. (2009). Topic significance ranking of LDA generative models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5781 LNAI, pp. 67–82). https://doi.org/10.1007/978-3-642-04180-8_22