An Aposteriorical Clusterability Criterion for k-Means++ and Simplicity of Clustering

Abstract

In this paper, the notion of a well-clusterable data set is defined by combining the objective of the k-means clustering algorithm (minimizing the centric spread of data elements) with common sense (clusters should be separated by gaps). Conditions are identified under which the optimum of the k-means objective coincides with a clustering in which the data is separated by predefined gaps. Two cases are investigated: one in which whole clusters are separated by some gap, and one in which only the cores of the clusters meet some separation condition. A major obstacle to using known clusterability criteria is their reference to the optimal clustering, which is NP-hard to identify. This paper overcomes that obstacle. Compared to other approaches to clusterability, the novelty lies in the possibility of checking a posteriori (after running k-means) whether or not the data set is well-clusterable. Because the k-means algorithm applied for this purpose has polynomial complexity, so does the check. Additionally, if k-means++ fails to identify a clustering that meets the clusterability criteria, then with high probability the data is not well-clusterable.
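The a posteriori idea can be illustrated with a minimal sketch: run k-means++ first, then test the resulting clustering against a gap condition. The abstract does not state the paper's actual criteria, so the condition used here is a hypothetical stand-in (the gap between the enclosing balls of any two clusters must be at least `gap_factor` times the largest cluster radius). The function names `kmeans_pp` and `gap_separated` and the parameter `gap_factor` are assumptions made for illustration only.

```python
import math
import random


def kmeans_pp(points, k, seed=0, iters=100):
    """Lloyd's k-means with k-means++ seeding; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # k-means++ seeding: sample the next center with probability
        # proportional to the squared distance to the nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    labels = []
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels


def gap_separated(points, centers, labels, gap_factor=2.0):
    """Hypothetical a posteriori check (NOT the paper's criterion).

    Returns True if, for every pair of clusters, the gap between their
    enclosing balls is at least gap_factor times the largest cluster radius.
    """
    k = len(centers)
    radius = [0.0] * k
    for p, lab in zip(points, labels):
        radius[lab] = max(radius[lab], math.dist(p, centers[lab]))
    r_max = max(radius)
    for i in range(k):
        for j in range(i + 1, k):
            gap = math.dist(centers[i], centers[j]) - radius[i] - radius[j]
            if gap < gap_factor * r_max:
                return False
    return True
```

Both steps run in polynomial time for a fixed iteration cap, which mirrors the abstract's point that the check costs no more than the clustering itself. Since a single k-means++ run can end in a poor local optimum, a caller might repeat the run with several seeds and accept the data as gap-separated if any run passes the check.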

Cite

Kłopotek, M. A. (2020). An Aposteriorical Clusterability Criterion for k-Means++ and Simplicity of Clustering. SN Computer Science, 1(2). https://doi.org/10.1007/s42979-020-0079-8
