Comparing LDA with pLSI as a dimensionality reduction method in document clustering

Tomonari Masada; Senya Kiyasu; Sueharu Miyahara

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008), Vol. 4938 LNAI, pp. 13-26

DOI: 10.1007/978-3-540-78159-2_2

17 Citations

27 Readers

Abstract

In this paper, we compare latent Dirichlet allocation (LDA) with probabilistic latent semantic indexing (pLSI) as dimensionality reduction methods and investigate their effectiveness in document clustering on real-world document sets. To cluster documents, we use a method based on the multinomial mixture model, which is known as an efficient framework for text mining. Clustering results are evaluated by F-measure, i.e., the harmonic mean of precision and recall. We use Japanese and Korean Web articles for evaluation and regard the category assigned to each Web article as the ground truth. Our experiment shows that dimensionality reduction via LDA and pLSI yields document clusters of almost the same quality as those obtained from the original feature vectors. Therefore, we can reduce the vector dimension without degrading cluster quality. Further, both LDA and pLSI are more effective than random projection, the baseline method in our experiment. However, our experiment shows no meaningful difference between LDA and pLSI. This result suggests that LDA does not replace pLSI, at least for dimensionality reduction in document clustering. © 2008 Springer-Verlag Berlin Heidelberg.
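As an illustration of two terms used in the abstract, the sketch below computes an F-measure as the harmonic mean of precision and recall and applies a simple random projection. It is a minimal sketch under assumptions, not the authors' implementation: the abstract does not say how per-cluster scores are aggregated or how the random projection matrix is built, so scoring one cluster against one category and the Gaussian projection matrix are assumptions, and the function names `f_measure`, `cluster_f_measure`, and `random_projection` are hypothetical.

```python
import math
import random


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def cluster_f_measure(cluster_docs, category_docs):
    """F-measure of one cluster scored against one ground-truth category.

    Assumption: both arguments are sets of document ids. How scores are
    combined across clusters is not stated in the abstract.
    """
    if not cluster_docs or not category_docs:
        return 0.0
    overlap = len(cluster_docs & category_docs)
    precision = overlap / len(cluster_docs)
    recall = overlap / len(category_docs)
    return f_measure(precision, recall)


def random_projection(vectors, k, seed=0):
    """Project each vector onto k dimensions with a Gaussian random matrix.

    Assumption: entries are drawn from N(0, 1/k); the abstract does not
    specify the construction used for the baseline.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    scale = 1.0 / math.sqrt(k)
    # One row of k random weights per original dimension.
    matrix = [[rng.gauss(0.0, scale) for _ in range(k)] for _ in range(dim)]
    return [
        [sum(v[i] * matrix[i][j] for i in range(dim)) for j in range(k)]
        for v in vectors
    ]
```

For example, a cluster `{1, 2, 3, 4}` scored against a category `{3, 4, 5, 6}` shares two documents, so precision and recall are both 0.5 and the F-measure is 0.5.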

Cite

Masada, T., Kiyasu, S., & Miyahara, S. (2008). Comparing LDA with pLSI as a dimensionality reduction method in document clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4938 LNAI, pp. 13–26). https://doi.org/10.1007/978-3-540-78159-2_2