An optimized K-means algorithm of reducing cluster intra-dissimilarity for document clustering

Daling Wang; Ge Yu; Yubin Bao; Meng Zhang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2005) 3739 LNCS 785-790

DOI: 10.1007/11563952_81

2 Citations

10 Readers

Abstract

Because documents are high-dimensional and sparse, clustering similar documents together is difficult. K-Means, the most popular document clustering method, suffers from intra-cluster dissimilarity: it tends to group unrelated documents together. One reason is that every object (document) in a cluster has the same influence on the cluster mean. SOM (Self-Organizing Map) is a method that reduces the dimension of data and displays it in a low-dimensional space, and it has been applied successfully to clustering high-dimensional objects. The scalar factor is an important part of SOM. This paper proposes an optimized K-Means algorithm that introduces the scalar factor from SOM into the means during the K-Means assignment stage, in order to control the influence of new objects on the means. Experiments show that the optimized K-Means algorithm achieves a higher F-Measure and lower Entropy than the standard K-Means algorithm, and thereby reduces the intra-dissimilarity of clusters effectively. © Springer-Verlag Berlin Heidelberg 2005.
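
The abstract does not give the update formula, so the following is only a minimal sketch of the general idea under stated assumptions: each time a document is assigned to its nearest mean, that mean is moved toward the document by a SOM-style scalar factor `alpha` that shrinks over the passes, so later assignments influence the mean less. The function name `scaled_kmeans`, the linear decay schedule, the parameters `alpha0` and `init`, and the use of squared Euclidean distance are all illustrative assumptions, not the authors' published method.

```python
import numpy as np


def scaled_kmeans(X, k, n_iter=10, alpha0=0.5, init=None, seed=0):
    """K-means-like clustering with a decaying SOM-style scalar factor.

    Hypothetical sketch: the schedule and distance are assumptions.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    if init is None:
        # Start from k distinct documents chosen at random.
        means = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        means = np.array(init, dtype=float)

    for t in range(n_iter):
        # Assumed linear decay of the scalar factor; it stays above zero.
        alpha = alpha0 * (1.0 - t / n_iter)
        for i in rng.permutation(len(X)):
            # Assignment: nearest mean by squared Euclidean distance.
            j = int(np.argmin(((means - X[i]) ** 2).sum(axis=1)))
            # Scaled update: move the winning mean a fraction alpha
            # of the way toward the newly assigned document.
            means[j] += alpha * (X[i] - means[j])

    # Final assignment of every document against the settled means.
    dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    labels = np.argmin(dists, axis=1)
    return labels, means
```

For example, calling `scaled_kmeans(X, 2, init=X[[0, 3]])` on a small matrix of document vectors `X` returns one cluster label per row together with the final means. In a real document-clustering setting the rows would typically be TF-IDF vectors and cosine similarity might replace the Euclidean distance used here.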

Cite

Wang, D., Yu, G., Bao, Y., & Zhang, M. (2005). An optimized K-means algorithm of reducing cluster intra-dissimilarity for document clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3739 LNCS, pp. 785–790). Springer Verlag. https://doi.org/10.1007/11563952_81
