How much can k-means be improved by using better initialization and repeats?

Pasi Fränti; Sami Sieranoja

Journal Article, Open Access

Pattern Recognition (2019), vol. 93, pp. 95-112

DOI: 10.1016/j.patcog.2019.04.014

290 Citations

329 Readers

Abstract

In this paper, we study which factors most deteriorate the performance of the k-means algorithm, and how much of this deterioration can be overcome either by using a better initialization technique or by repeating (restarting) the algorithm. Our main finding is that when the clusters overlap, k-means can be significantly improved using these two tricks. On our clustering benchmark, the simple furthest point heuristic (Maxmin) reduces the number of erroneous clusters from 15% to 6% on average, and repeating the algorithm 100 times reduces it further to 1%. This accuracy is more than enough for most pattern recognition applications. However, when the data has well-separated clusters, the performance of k-means depends completely on the quality of the initialization. Therefore, if high clustering accuracy is needed, a better algorithm should be used instead.
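The two tricks named in the abstract, furthest point (Maxmin) initialization and repeated restarts, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names (`maxmin_init`, `kmeans`, `repeated_kmeans`), the random choice of the first centroid, the use of the sum of squared errors (SSE) to pick the best run, and the handling of empty clusters are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def maxmin_init(points, k, rng):
    """Furthest point heuristic (assumed variant): pick the first centroid at
    random, then repeatedly add the point that is furthest from its nearest
    already-chosen centroid."""
    centroids = [rng.choice(points)]
    nearest = [sq_dist(p, centroids[0]) for p in points]
    while len(centroids) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centroids.append(points[idx])
        # Update each point's distance to its nearest chosen centroid.
        nearest = [min(d, sq_dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, centroids, max_iter=100):
    """Standard k-means iterations from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # assumption: an empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


def repeated_kmeans(points, k, repeats=100, seed=0):
    """Restart k-means `repeats` times and keep the run with the lowest SSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(repeats):
        result = kmeans(points, maxmin_init(points, k, rng))
        if best is None or result[2] < best[2]:
            best = result
    return best


# Toy data: three small, well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
centroids, labels, sse = repeated_kmeans(points, k=3, repeats=10)
```

On this toy data every restart should place one initial centroid in each group, so the repeats add little. Per the abstract, restarts are expected to help more when clusters overlap.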

Cite

Fränti, P., & Sieranoja, S. (2019). How much can k-means be improved by using better initialization and repeats? Pattern Recognition, 93, 95–112. https://doi.org/10.1016/j.patcog.2019.04.014
