Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms

Erich Schubert; Peter J. Rousseeuw

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019), Vol. 11807 LNCS, pp. 171-187

DOI: 10.1007/978-3-030-32047-8_16

Abstract

Clustering non-Euclidean data is difficult, and one of the most used algorithms besides hierarchical clustering is Partitioning Around Medoids (PAM), also simply referred to as k-medoids. In Euclidean geometry the mean, as used in k-means, is a good estimator for the cluster center, but the mean does not exist for arbitrary dissimilarities. PAM uses the medoid instead: the object with the smallest dissimilarity to all others in the cluster. This notion of centrality can be used with any (dis-)similarity, and thus is of high relevance to many domains and applications. A key issue with PAM is its high run time cost. We propose modifications to the PAM algorithm that achieve an O(k)-fold speedup in the second (“SWAP”) phase of the algorithm, but will still find the same results as the original PAM algorithm. If we slightly relax the choice of swaps performed (while retaining comparable quality), we can further accelerate the algorithm by performing up to k swaps in each iteration. With the substantially faster SWAP, we can now explore faster initialization strategies. We also show how the CLARA and CLARANS algorithms benefit from the proposed modifications.
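To make the medoid and the SWAP phase concrete, here is a minimal Python sketch. It is an illustration only: it shows the naive baseline SWAP that recomputes the full cost for every candidate swap, not the accelerated variant the abstract proposes. The function names (`medoid`, `total_deviation`, `naive_pam_swap`) and the toy one-dimensional data are assumptions introduced for this example.

```python
def medoid(dist, members):
    """Return the member with the smallest summed dissimilarity to all members."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


def total_deviation(dist, medoids):
    """Sum of each object's dissimilarity to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def naive_pam_swap(dist, medoids):
    """Naive SWAP: apply the best improving (medoid, non-medoid) swap
    until no swap lowers the total deviation. Illustrative baseline only."""
    medoids = list(medoids)
    best_cost = total_deviation(dist, medoids)
    while True:
        best_trial = None
        for pos in range(len(medoids)):
            for cand in range(len(dist)):
                if cand in medoids:
                    continue
                # Replace the medoid at `pos` with the candidate object.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_deviation(dist, trial)
                if cost < best_cost:
                    best_cost, best_trial = cost, trial
        if best_trial is None:
            return sorted(medoids), best_cost
        medoids = best_trial


# Hypothetical toy data: six points on a line, absolute difference as dissimilarity.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]

print(medoid(dist, [0, 1, 2]))       # index of the most central of the first three points
print(naive_pam_swap(dist, [0, 1]))  # medoid indices and total deviation after SWAP
```

Only the dissimilarity matrix `dist` is used, never coordinates or means, which is why the approach works with arbitrary dissimilarities. Each pass of this naive loop evaluates every (medoid, non-medoid) pair and recomputes the full cost each time, which illustrates the run time problem the abstract describes.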

Cite

Schubert, E., & Rousseeuw, P. J. (2019). Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11807 LNCS, pp. 171–187). Springer. https://doi.org/10.1007/978-3-030-32047-8_16
