Spectral analysis of text collection for similarity-based clustering

Abstract

Clustering natural text collections is generally difficult because of their high dimensionality, heterogeneity, and large size. These characteristics compound the problem of determining an appropriate similarity space for clustering algorithms. In this paper, we propose using spectral analysis of the similarity space of a text collection to predict clustering behavior before actual clustering is performed. Spectral analysis is a technique that has been adopted across different domains to analyze the key encoding information of a system. Used for prediction, it helps to assess the quality of the similarity space first and to discover any problems the selected feature set may present. Our experiments show that such insights can be obtained by analyzing the spectrum of the similarity matrix of a text collection, and that spectrum analysis can be used to estimate the number of clusters in advance.
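
The abstract does not spell out how the spectrum is computed or read, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes cosine similarity between document term vectors, a symmetric degree normalization of the similarity matrix, and the common eigengap heuristic for choosing the number of clusters. The function names and the synthetic toy collection are hypothetical.

```python
import numpy as np


def make_toy_collection(n_groups=3, docs_per_group=5, terms_per_group=6, seed=0):
    """Hypothetical toy data: each group of documents uses its own block of terms."""
    rng = np.random.default_rng(seed)
    n_terms = n_groups * terms_per_group
    docs = []
    for g in range(n_groups):
        for _ in range(docs_per_group):
            # Small background weight on every term (shared "noise" vocabulary).
            doc = rng.uniform(0.0, 0.05, size=n_terms)
            # Strong weights on the terms that belong to this group's topic.
            start = g * terms_per_group
            doc[start:start + terms_per_group] += rng.uniform(0.5, 1.5, size=terms_per_group)
            docs.append(doc)
    return np.array(docs)


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X (documents x terms)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    unit = X / norms
    return unit @ unit.T


def similarity_spectrum(S):
    """Eigenvalues of D^{-1/2} S D^{-1/2}, sorted in descending order.

    Assumes S is symmetric with non-negative entries and positive row sums.
    """
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    M = S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(M)[::-1]


def estimate_num_clusters(eigenvalues):
    """Eigengap heuristic (an assumption here): k is where the largest drop occurs."""
    gaps = eigenvalues[:-1] - eigenvalues[1:]
    return int(np.argmax(gaps)) + 1


X = make_toy_collection()
spectrum = similarity_spectrum(cosine_similarity_matrix(X))
k = estimate_num_clusters(spectrum)
print("leading eigenvalues:", np.round(spectrum[:5], 3))
print("estimated number of clusters:", k)
```

On this synthetic collection the documents fall into three well-separated groups, so the largest gap in the spectrum is expected after the third eigenvalue. Real text collections are far noisier, and a flat spectrum with no clear gap can itself be a hint that the chosen feature set or similarity measure does not separate the documents well.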

Li, W., Ng, W. K., & Lim, E. P. (2004). Spectral analysis of text collection for similarity-based clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3056, pp. 389–393). Springer Verlag. https://doi.org/10.1007/978-3-540-24775-3_47
