A hierarchical algorithm for extreme clustering

Abstract

Many modern clustering methods scale well to a large number of data points, N, but not to a large number of clusters, K. This paper introduces PERCH, a new non-greedy, incremental algorithm for hierarchical clustering that scales to both massive N and K, a problem setting we term extreme clustering. Our algorithm efficiently routes new data points to the leaves of an incrementally-built tree. Motivated by the desire for both accuracy and speed, our approach performs tree rotations to enhance subtree purity and encourage balancedness. We prove that, under a natural separability assumption, our non-greedy algorithm will produce trees with perfect dendrogram purity regardless of data arrival order. Our experiments demonstrate that PERCH constructs more accurate trees than other tree-building clustering algorithms and scales well with both N and K, achieving a higher quality clustering than the strongest flat clustering competitor in nearly half the time.
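
To make the dendrogram purity measure mentioned in the abstract concrete, here is a minimal sketch. It assumes the commonly used definition: over all pairs of points that share a ground-truth class, take the average fraction of leaves under their lowest common ancestor that belong to that class. The nested-tuple tree encoding, the `labels` mapping, and the function name are illustrative assumptions, not taken from the paper. This brute-force version is meant for small trees only and is not the PERCH algorithm itself.

```python
from itertools import combinations


def dendrogram_purity(tree, labels):
    """Average purity of the lowest-common-ancestor subtree over same-class leaf pairs.

    tree   : an int leaf id, or a tuple of subtrees (hypothetical encoding)
    labels : mapping from leaf id to ground-truth class
    """
    total, pairs = 0.0, 0

    def visit(node):
        nonlocal total, pairs
        if isinstance(node, int):
            return [node]
        child_leaves = [visit(child) for child in node]
        under = [leaf for group in child_leaves for leaf in group]
        # A pair has this node as its lowest common ancestor exactly when
        # its two leaves sit under different children of the node.
        for a_group, b_group in combinations(child_leaves, 2):
            for a in a_group:
                for b in b_group:
                    if labels[a] == labels[b]:
                        same = sum(1 for leaf in under if labels[leaf] == labels[a])
                        total += same / len(under)
                        pairs += 1
        return under

    visit(tree)
    # With no same-class pairs there is nothing to violate; treat as pure.
    return total / pairs if pairs else 1.0


labels = {0: "a", 1: "a", 2: "b", 3: "b"}
pure_tree = ((0, 1), (2, 3))   # each class forms its own subtree
mixed_tree = ((0, 2), (1, 3))  # classes are split across subtrees
print(dendrogram_purity(pure_tree, labels))
print(dendrogram_purity(mixed_tree, labels))
```

In the first tree every same-class pair meets at a subtree that contains only that class, so the score is perfect. In the second, each pair meets only at the root, where half of the leaves share its class.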

Cite

Kobren, A., Monath, N., Krishnamurthy, A., & McCallum, A. (2017). A hierarchical algorithm for extreme clustering. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Vol. Part F129685, pp. 255–264). Association for Computing Machinery. https://doi.org/10.1145/3097983.3098079
