How many topics? Stability analysis for topic models

Derek Greene; Derek O'Callaghan; Pádraig Cunningham

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8724 LNAI(PART 1) 498-513

DOI: 10.1007/978-3-662-44848-9_32

135 Citations

191 Readers

Abstract

Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the "over-clustering" of a corpus into many small, highly similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can successfully guide the model selection process. © 2014 Springer-Verlag.
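To make the idea of term-centric stability concrete, the following is a minimal sketch of how agreement between topic models could be scored from their top-term lists alone. It is an illustration under stated assumptions, not the procedure from the paper: the abstract does not specify the similarity measure or the matching step, so the Jaccard similarity, the brute-force topic matching, the function names (jaccard, agreement, stability), and the example term lists are all hypothetical choices made here.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard similarity between two collections of terms (order ignored)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def agreement(topics_a, topics_b):
    """Mean Jaccard similarity under the best one-to-one matching of topics.

    Assumption: topics are matched by brute force over all permutations,
    which is only practical for a small number of topics k.
    """
    k = len(topics_a)
    if k != len(topics_b):
        raise ValueError("both models must have the same number of topics")
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(jaccard(topics_a[i], topics_b[j]) for i, j in enumerate(perm)) / k
        best = max(best, score)
    return best


def stability(reference, perturbed_runs):
    """Average agreement between a reference model and models fit to perturbed data."""
    return sum(agreement(reference, run) for run in perturbed_runs) / len(perturbed_runs)


# Hypothetical top-term lists for k = 2 topics (not taken from the paper).
reference = [["goal", "match", "team"], ["vote", "party", "election"]]
runs = [
    # Same topics, listed in a different order.
    [["vote", "election", "party"], ["team", "goal", "match"]],
    # One top term differs in each topic.
    [["goal", "match", "league"], ["vote", "party", "minister"]],
]

print(stability(reference, runs))
```

In a model selection setting, one would presumably repeat this for each candidate number of topics, fitting the perturbed models on resampled versions of the corpus, and prefer the values of k whose top terms change least across runs.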

Citation (APA)

Greene, D., O’Callaghan, D., & Cunningham, P. (2014). How many topics? Stability analysis for topic models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8724 LNAI, pp. 498–513). Springer Verlag. https://doi.org/10.1007/978-3-662-44848-9_32
