An optimized K-means algorithm of reducing cluster intra-dissimilarity for document clustering

2Citations
Citations of this article
10Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Due to the high-dimension and sparseness properties of documents, clustering the similar documents together is a tough task. The most popular document clustering method K-Means has the shortcoming of its cluster intra-dissimilarity, i.e. inclining to clustering unrelated documents together. One of the reasons is that all objects (documents) in a cluster produce the same influence to the mean of the cluster. SOM (Self Organizing Map) is a method to reduce the dimension of data and display the data in low dimension space, and it has been applied successfully to clustering of high-dimensional objects. The scalar factor is an important part of SOM. In this paper, an optimized K-Means algorithm is proposed. It introduces the scalar factor from SOM into means during K-Means assignment stage for controlling the influence to the means from new objects. Experiments show that the optimized K-Means algorithm has more F-Measure and less Entropy of clustering than standard K-Means algorithm, thereby reduces the intra-dissimilarity of clusters effectively. © Springer-Verlag Berlin Heidelberg 2005.

Cite

CITATION STYLE

APA

Wang, D., Yu, G., Bao, Y., & Zhang, M. (2005). An optimized K-means algorithm of reducing cluster intra-dissimilarity for document clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3739 LNCS, pp. 785–790). Springer Verlag. https://doi.org/10.1007/11563952_81

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free