Accelerating Lloyd's algorithm for k–means clustering

Abstract

The k–means clustering algorithm, a staple of data mining and unsupervised learning, is popular because it is simple to implement, fast, easily parallelized, and offers intuitive results. Lloyd's algorithm is the standard batch, hill–climbing approach for minimizing the k–means optimization criterion. It spends the vast majority of its time computing distances between each of the k cluster centers and the n data points. Much of this work turns out to be unnecessary, because points usually stay in the same clusters after the first few iterations. In the last decade researchers have developed a number of optimizations to speed up Lloyd's algorithm for both low– and high–dimensional data. In this chapter we survey some of these optimizations and present new ones. In particular we focus on those which avoid distance calculations using the triangle inequality. By caching known distances and updating them efficiently with the triangle inequality, these algorithms can provably avoid many unnecessary distance calculations. All the optimizations examined produce the same results as Lloyd's algorithm given the same input and initialization, so they are suitable as drop–in replacements. These new algorithms can run many times faster and compute far fewer distances than the standard unoptimized implementation. In our experiments, it is common to see speedups of over 30–50x compared to Lloyd's algorithm. We examine the trade–offs of using these methods with respect to the number of examples n, dimensions d, clusters k, and the structure of the data.
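To make the idea of caching distance bounds concrete, here is a minimal sketch in Python with NumPy. It is not the chapter's implementation: it assumes one particular scheme (a single upper bound and a single lower bound per point, in the style of Hamerly's algorithm), and the names `lloyd`, `bounded_kmeans`, `update_centers`, and `all_distances` are hypothetical, chosen only for illustration. After each center update the bounds are shifted by how far the centers moved, which keeps them valid by the triangle inequality; a point is re-examined only when its bounds can no longer rule out a change of cluster.

```python
import numpy as np

def update_centers(X, labels, centers):
    """Mean of each cluster; an empty cluster keeps its old center."""
    new = centers.copy()
    for j in range(len(centers)):
        members = X[labels == j]
        if len(members):
            new[j] = members.mean(axis=0)
    return new

def all_distances(X, centers):
    """Matrix of Euclidean distances from every row of X to every center."""
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

def lloyd(X, centers, max_iter=1000):
    """Plain Lloyd's algorithm; also counts point-to-center distances."""
    centers = centers.copy()
    count = 0
    for _ in range(max_iter):
        labels = all_distances(X, centers).argmin(axis=1)
        count += X.shape[0] * len(centers)
        new_centers = update_centers(X, labels, centers)
        if np.array_equal(new_centers, centers):
            break
        centers = new_centers
    return labels, centers, count

def bounded_kmeans(X, centers, max_iter=1000):
    """Lloyd's algorithm with one upper and one lower bound per point."""
    centers = centers.copy()
    n, k = X.shape[0], len(centers)
    D = all_distances(X, centers)
    count = n * k
    labels = D.argmin(axis=1)
    two_smallest = np.partition(D, 1, axis=1)
    upper = two_smallest[:, 0].copy()  # distance to the assigned center
    lower = two_smallest[:, 1].copy()  # distance to the second-closest center
    for _ in range(max_iter):
        new_centers = update_centers(X, labels, centers)
        moved = np.linalg.norm(new_centers - centers, axis=1)
        if not moved.any():
            break
        centers = new_centers
        # Triangle inequality: the assigned center moved by moved[a], so the
        # distance to it grew by at most that much; any other center came
        # closer by at most moved.max() (a simple, slightly loose choice).
        upper += moved[labels]
        lower -= moved.max()
        # Half the distance from each center to its nearest other center.
        cc = all_distances(centers, centers)
        count += k * k
        np.fill_diagonal(cc, np.inf)
        half_gap = cc.min(axis=1) / 2
        bound = np.maximum(half_gap[labels], lower)
        # Points with upper <= bound cannot change cluster; skip them.
        idx = np.flatnonzero(upper > bound)
        # Tighten the upper bound with one exact distance, then test again.
        upper[idx] = np.linalg.norm(X[idx] - centers[labels[idx]], axis=1)
        count += len(idx)
        idx = idx[upper[idx] > bound[idx]]
        if len(idx):
            # Only these points need a full pass over all k centers.
            D = all_distances(X[idx], centers)
            count += len(idx) * k
            labels[idx] = D.argmin(axis=1)
            two_smallest = np.partition(D, 1, axis=1)
            upper[idx] = two_smallest[:, 0]
            lower[idx] = two_smallest[:, 1]
    return labels, centers, count

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((2000, 2))                       # synthetic data
    init = X[rng.choice(len(X), 8, replace=False)]  # shared initialization
    labels_a, _, count_a = lloyd(X, init)
    labels_b, _, count_b = bounded_kmeans(X, init)
    print("same assignment:", np.array_equal(labels_a, labels_b))
    print("distances computed:", count_a, "vs", count_b)
```

Both functions start from the same centers and stop when no center moves, so the sketch can be used to check the drop–in property on small inputs. The distance counts it reports depend on the data and on this particular bounding scheme, and say nothing about the speedups measured in the chapter.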

Cite

Hamerly, G., & Drake, J. (2015). Accelerating Lloyd's algorithm for k–means clustering. In Partitional Clustering Algorithms (pp. 41–78). Springer International Publishing. https://doi.org/10.1007/978-3-319-09259-1_2
