Hierarchical method for determining the number of clusters


Abstract

A fundamental and difficult problem in cluster analysis is determining the 'true' number of clusters in a dataset. The common trial-and-error method generally depends on a particular clustering algorithm and is inefficient on large datasets. In this paper, a hierarchical method is proposed that avoids repeatedly clustering large datasets. The method first obtains the CF (clustering feature) by scanning the dataset and agglomeratively generates hierarchical partitions of the dataset; a curve of clustering quality with respect to the varying partitions is then constructed incrementally. Finally, the partition corresponding to the extremum of the curve is used to estimate the number of clusters. A new validity index is also presented to quantify clustering quality. The index is independent of the clustering algorithm, emphasizes the geometric features of clusters, and efficiently handles noisy data and arbitrarily shaped clusters. Experimental results on both real-world and synthetic datasets demonstrate that the new method outperforms recently published approaches while significantly improving efficiency.
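
The abstract outlines the pipeline but does not define the authors' new validity index, so the sketch below only illustrates the general idea under stated assumptions. Each point is turned into its own CF (the paper instead builds CFs by scanning the dataset, which is where its efficiency on large data comes from). Clusters are merged greedily by Ward's criterion, and the tracked curve is the increase in within-cluster sum of squared errors (SSE) caused by each merge. The number of clusters is read off at the curve's maximum. The names `CF`, `merge_cost`, and `estimate_k` are hypothetical, and the merge-cost curve is a swapped-in stand-in for the paper's index, not the authors' method.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class CF:
    """Clustering feature: point count N, linear sum LS, and sum of squared norms SS."""
    n: int
    ls: Tuple[float, ...]
    ss: float

    @staticmethod
    def from_point(p: Sequence[float]) -> "CF":
        # Assumption for illustration: one CF per raw point (no scanning phase).
        return CF(1, tuple(float(x) for x in p), sum(float(x) * float(x) for x in p))

    def merge(self, other: "CF") -> "CF":
        # CF vectors are additive, so merging never needs the raw points.
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def sse(self) -> float:
        # Within-cluster sum of squared distances to the centroid: SS - ||LS||^2 / N.
        return self.ss - sum(x * x for x in self.ls) / self.n


def merge_cost(a: CF, b: CF) -> float:
    """Increase in total SSE caused by merging a and b (Ward's criterion)."""
    return a.merge(b).sse() - a.sse() - b.sse()


def estimate_k(cfs: List[CF]) -> Tuple[int, List[Tuple[int, float]]]:
    """Agglomerate CFs down to one cluster and return (k, curve).

    Each curve entry is (k, cost of the merge that reduced k clusters to k - 1).
    Stand-in quality curve (not the paper's index): the estimate is the k
    just before the most expensive merge, i.e. the curve's maximum.
    """
    if len(cfs) < 2:
        raise ValueError("need at least two CFs")
    clusters = list(cfs)
    curve: List[Tuple[int, float]] = []
    while len(clusters) > 1:
        # Pick the pair whose merge increases the SSE the least.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        curve.append((len(clusters), merge_cost(clusters[i], clusters[j])))
        merged = clusters[i].merge(clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    best_k, _ = max(curve, key=lambda kc: kc[1])
    return best_k, curve


# Usage: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
k, curve = estimate_k([CF.from_point(p) for p in points])
print(k)  # → 2
```

Because CF vectors are additive, every merge cost above is computed from the CF entries alone. This is what lets a method of this kind build the whole quality curve in a single agglomerative pass rather than rerunning a clustering algorithm for each candidate number of clusters.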

Citation (APA)

Chen, L. F., Jiang, Q. S., & Wang, S. R. (2008). Hierarchical method for determining the number of clusters. Ruan Jian Xue Bao/Journal of Software, 19(1), 62–72. https://doi.org/10.3724/SP.J.1001.2008.00062
