An Extensive Empirical Comparison of k-means Initialization Algorithms

This article is free to access.

Abstract

The k-means clustering algorithm, whilst widely popular, is not without its drawbacks. In this paper, we focus on the sensitivity of k-means to its initial set of centroids. Since the cluster recovery performance of k-means can be improved by better initialisation, numerous algorithms have been proposed aiming at producing good initial centroids. However, it is still unclear which algorithm should be used in any particular clustering scenario. With this in mind, we compare 17 such algorithms on 6,000 synthetic and 28 real-world data sets. The synthetic data sets were produced under different configurations, allowing us to show which algorithm excels in each scenario. Hence, the results of our experiments can be particularly useful for those considering k-means for a non-trivial clustering scenario.
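
To make the sensitivity to initial centroids concrete, here is a minimal sketch in plain Python. It is not the authors' code or experimental protocol. It runs the standard Lloyd iteration from two well-known seeding strategies, uniform random sampling and k-means++ (D²-weighted sampling), and reports the spread of the final within-cluster sum of squares over repeated runs. The abstract does not list the 17 algorithms compared, so the choice of these two is an assumption made purely for illustration, as are the helper names (`make_blobs`, `random_init`, `kmeanspp_init`, `kmeans`) and the toy data.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(points, k, rng):
    """Pick k distinct data points uniformly at random as initial centroids."""
    return rng.sample(points, k)


def kmeanspp_init(points, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest centroid so far.
    Assumes the data contain at least k distinct points."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm from the given initial centroids.
    Returns the final centroids and the within-cluster sum of squares."""
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia


def make_blobs(rng, centers, n_per_cluster=50, spread=0.5):
    """Hypothetical toy data: 2-D Gaussian blobs around the given centers."""
    return [
        (rng.gauss(cx, spread), rng.gauss(cy, spread))
        for cx, cy in centers
        for _ in range(n_per_cluster)
    ]


rng = random.Random(0)
points = make_blobs(rng, [(0, 0), (5, 5), (0, 5), (5, 0)])
for name, init in [("random", random_init), ("k-means++", kmeanspp_init)]:
    inertias = [kmeans(points, init(points, 4, rng))[1] for _ in range(20)]
    print(f"{name}: best={min(inertias):.2f} worst={max(inertias):.2f}")
```

A wide gap between the best and worst values for one strategy indicates that the final clustering depends heavily on where the centroids started, which is the effect the paper examines at much larger scale.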

Cite

Harris, S., & De Amorim, R. C. (2022). An Extensive Empirical Comparison of k-means Initialization Algorithms. IEEE Access, 10, 58752–58768. https://doi.org/10.1109/ACCESS.2022.3179803
