Efficient hierarchical clustering algorithms using partially overlapping partitions

Abstract

Clustering is an important data exploration task. A prominent clustering algorithm is agglomerative hierarchical clustering: roughly, in each iteration it merges the closest pair of clusters. It was first proposed as far back as 1951, and since then there have been numerous modifications. Its strengths include a natural, simple, and non-parametric grouping of similar objects that can find clusters of different shapes, both spherical and arbitrary. However, large CPU time and high memory requirements limit its use on large data sets. In this paper we show that geometric metric (centroid, median, and minimum variance) algorithms obey a 90-10 relationship, where roughly the first 90% of iterations are spent merging clusters whose distance is less than 10% of the maximum merging distance. This characteristic is exploited by partially overlapping partitioning. Experiments and analyses show that different types of existing algorithms benefit substantially, with drastic reductions in CPU time and memory. Other contributions of this paper include a comparative study of multi-dimensional versus single-dimensional partitioning, and analytical and experimental discussions on the setting of parameters such as the number of partitions and the dimensions used for partitioning.
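
The abstract describes the basic merge step of agglomerative hierarchical clustering and the 90-10 observation for geometric (centroid-type) metrics. The sketch below is not the paper's partially overlapping partitioning method; it is a minimal, naive centroid-linkage clustering (names such as `agglomerative_centroid` are illustrative, and the data is synthetic) that records merge distances so the 90-10 relationship can be checked.

```python
import numpy as np


def agglomerative_centroid(points):
    """Naive agglomerative hierarchical clustering with the centroid metric.

    Repeatedly merges the pair of clusters whose centroids are closest and
    records each merge distance, so the 90-10 behaviour described in the
    abstract can be inspected afterwards. O(n^3) time -- illustration only.
    """
    # Each cluster is a (centroid, size) pair; start from singletons.
    clusters = [(np.asarray(p, dtype=float), 1) for p in points]
    merge_distances = []

    while len(clusters) > 1:
        # Scan all pairs for the smallest centroid distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = float(np.linalg.norm(clusters[i][0] - clusters[j][0]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merge_distances.append(d)

        # Merge: the new centroid is the size-weighted mean of the pair.
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        merged = ((ni * ci + nj * cj) / (ni + nj), ni + nj)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return merge_distances


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(200, 2))  # small synthetic data set
    dists = agglomerative_centroid(pts)

    # How many of the first 90% of merges fall below 10% of the
    # maximum merge distance? (The 90-10 observation from the abstract.)
    cutoff = int(0.9 * len(dists))
    threshold = 0.1 * max(dists)
    early = sum(d < threshold for d in dists[:cutoff])
    print(f"{early} of the first {cutoff} merges are below "
          f"10% of the maximum merge distance")
```

On data with clustered structure, most early merges occur at small distances; this is the property the abstract says partially overlapping partitioning exploits, since merges at small distances can be confined within partitions rather than computed over the whole data set.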

CITATION STYLE

APA

Dash, M., & Liu, H. (2001). Efficient hierarchical clustering algorithms using partially overlapping partitions. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2035, pp. 495–506). Springer Verlag. https://doi.org/10.1007/3-540-45357-1_52
