A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering

38Citations
Citations of this article
40Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

We consider a framework of sample-based clustering. In this setting, the input to a clustering algorithm is a sample generated i.i.d by some unknown arbitrary distribution. Based on such a sample, the algorithm has to output a clustering of the full domain set, that is evaluated with respect to the underlying distribution. We provide general conditions on clustering problems that imply the existence of sampling based clustering algorithms that approximate the optimal clustering. We show that the K-median clustering, as well as K-means and the Vector Quantization problems, satisfy these conditions. Our results apply to the combinatorial optimization setting where, assuming that sampling uniformly over an input set can be done in constant time, we get a sampling-based algorithm for the K-median and K-means clustering problems that finds an almost optimal set of centers in time depending only on the confidence and accuracy parameters of the approximation, but independent of the input size. Furthermore, in the Euclidean input case, the dependence of the running time of our algorithm on the Euclidean dimension is only linear. Our main technical tool is a uniform convergence result for center based clustering that can be viewed as showing that the effective VC-dimension of k-center clustering equals k. © Springer Science + Business Media, LLC 2007.

Cite

CITATION STYLE

APA

Ben-David, S. (2007). A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering. Machine Learning, 66(2–3), 243–257. https://doi.org/10.1007/s10994-006-0587-3

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free