On the performance of feature weighting K-means for text subspace clustering

Abstract

Text clustering is an effective way not only of organizing textual information but also of discovering interesting patterns. Most existing methods, however, suffer from two main drawbacks: they cannot provide an understandable representation of text clusters, and they cannot scale to very large text collections. Highly scalable text clustering algorithms are becoming increasingly relevant. In this paper, we present a performance study of a new subspace clustering algorithm for large sparse text data. This algorithm automatically calculates the feature weights during the k-means clustering process. The feature weights are used to discover clusters in subspaces of the text vector space and to identify terms that represent the semantics of the clusters. A series of experiments has been conducted to test the performance of the algorithm, including resource consumption and clustering quality. The experimental results on real-world text data show that our algorithm converges quickly to a locally optimal solution and scales with the number of documents, the number of terms, and the number of clusters. © Springer-Verlag Berlin Heidelberg 2005.
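
To make the idea of computing feature weights inside the k-means loop more concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the update rule, so the per-cluster weight formula (weights inversely related to within-cluster dispersion), the parameters `beta` and `sigma`, and every function and variable name below are hypothetical. It also uses dense arrays for readability, which would not suit the large sparse text collections the paper targets.

```python
import numpy as np

def feature_weighting_kmeans(X, k, beta=2.0, sigma=1e-3, n_iter=20,
                             init_centers=None, seed=0):
    """Illustrative k-means with one weight per (cluster, feature) pair.

    Assumed update rule (not taken from the abstract): within a cluster,
    a feature with small dispersion receives a large weight.
    beta must be > 1; sigma keeps zero-dispersion features finite.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    if init_centers is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.array(init_centers, dtype=float)
    weights = np.full((k, m), 1.0 / m)   # start with uniform weights
    labels = np.zeros(n, dtype=int)

    for it in range(n_iter):
        # Assignment step: weighted squared distance to every center.
        diff2 = (X[:, None, :] - centers[None, :, :]) ** 2 + sigma  # (n, k, m)
        dist = np.einsum("nkm,km->nk", diff2, weights ** beta)
        new_labels = dist.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break                        # partition is stable: stop
        labels = new_labels

        # Update step: recompute centers, then per-cluster feature weights.
        for l in range(k):
            members = X[labels == l]
            if len(members) == 0:
                continue                 # leave an empty cluster unchanged
            centers[l] = members.mean(axis=0)
            disp = ((members - centers[l]) ** 2 + sigma).sum(axis=0)
            inv = disp ** (-1.0 / (beta - 1.0))
            weights[l] = inv / inv.sum() # each row sums to 1

    return labels, centers, weights
```

In this sketch, the features with the largest weights in a cluster's row of `weights` play the role of the terms that characterize that cluster, and the stopping test reflects convergence to a locally optimal partition rather than a global one.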

Jing, L., Ng, M. K., Xu, J., & Huang, J. Z. (2005). On the performance of feature weighting K-means for text subspace clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3739 LNCS, pp. 502–512). https://doi.org/10.1007/11563952_44