Spectral analysis of text collection for similarity-based clustering

Wenyuan Li; Wee Keong Ng; Ee Peng Lim

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2004) 3056 389-393

DOI: 10.1007/978-3-540-24775-3_47

Abstract

Clustering natural text collections is generally difficult because of their high dimensionality, heterogeneity, and large size. These characteristics compound the problem of determining an appropriate similarity space for clustering algorithms. In this paper, we propose using spectral analysis of the similarity space of a text collection to predict clustering behavior before any actual clustering is performed. Spectral analysis has been adopted across many domains to analyze the key information encoded in a system. Used for prediction, it helps to first assess the quality of the similarity space and to uncover any problems the selected feature set may present. Our experiments show that such insights can be obtained by analyzing the spectrum of the similarity matrix of a text collection, and that spectral analysis can be used to estimate the number of clusters in advance.
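The abstract does not specify the similarity measure, matrix normalization, or decision rule used to read the number of clusters from the spectrum, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes raw term-frequency cosine similarity, the symmetric normalization D^{-1/2} S D^{-1/2} (D holding the row sums of S), and the common eigengap heuristic: take k as the position of the largest drop between consecutive eigenvalues. All function names and the toy corpus are hypothetical, and eigenvalues come from a small pure-Python Jacobi solver to keep the example dependency-free.

```python
import math
from collections import Counter


def cosine_similarity_matrix(docs):
    """Pairwise cosine similarity of raw term-frequency vectors (assumed weighting)."""
    vecs = [Counter(doc) for doc in docs]
    norms = [math.sqrt(sum(c * c for c in v.values())) for v in vecs]
    n = len(docs)
    return [
        [sum(c * vecs[j][t] for t, c in vecs[i].items()) / (norms[i] * norms[j])
         for j in range(n)]
        for i in range(n)
    ]


def normalized_similarity(sim):
    """Assumed symmetric normalization D^{-1/2} S D^{-1/2}, D = row sums of S."""
    deg = [sum(row) for row in sim]
    n = len(sim)
    return [[sim[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def estimate_num_clusters(eigenvalues, max_k=None):
    """Eigengap heuristic (assumed rule): k = index of the largest drop
    between consecutive eigenvalues sorted largest first."""
    if max_k is None:
        max_k = len(eigenvalues) - 1
    gaps = [eigenvalues[i] - eigenvalues[i + 1] for i in range(max_k)]
    return gaps.index(max(gaps)) + 1


# Hypothetical toy corpus: two topics with disjoint vocabularies.
docs = [
    ["spectral", "matrix", "eigenvalue"],
    ["spectral", "matrix", "eigenvalue", "matrix"],
    ["spectral", "eigenvalue", "matrix", "spectrum"],
    ["football", "goal", "match"],
    ["football", "match", "goal", "league"],
    ["football", "goal", "league"],
]
eigs = jacobi_eigenvalues(normalized_similarity(cosine_similarity_matrix(docs)))
k = estimate_num_clusters(eigs)
print(k)  # prints 2
```

Because the two topics share no terms, the similarity matrix is block-diagonal and the normalized matrix has eigenvalue 1 once per block, followed by a sharp drop. That drop is what the eigengap rule reads as the cluster count. On real collections with overlapping vocabularies the gap is usually far less clean, which is the kind of similarity-space problem the abstract suggests spectral analysis can reveal before clustering.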

Citation (APA)

Li, W., Ng, W. K., & Lim, E. P. (2004). Spectral analysis of text collection for similarity-based clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3056, pp. 389–393). Springer Verlag. https://doi.org/10.1007/978-3-540-24775-3_47