A framework for statistical clustering with a constant time approximation algorithms for K-median clustering


Abstract

We consider a framework in which the clustering algorithm receives as input a sample generated i.i.d. by some unknown arbitrary distribution, and must output a clustering of the full domain set, which is evaluated with respect to the underlying distribution. We provide general conditions on clustering problems that imply the existence of sampling-based clusterings approximating the optimal clustering. We show that K-median clustering, as well as the Vector Quantization problem, satisfy these conditions. In particular, our results apply to the sampling-based approximate clustering scenario. As a corollary, we obtain a sampling-based algorithm for the K-median clustering problem that finds an almost optimal set of centers in time depending only on the confidence and accuracy parameters of the approximation, but independent of the input size. Furthermore, in the Euclidean input case, the running time of our algorithm is independent of the Euclidean dimension.
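The core idea, that a small i.i.d. sample suffices to find centers that are near-optimal for the whole distribution, can be illustrated with a minimal sketch. The helper names (`kmedian_cost`, `kmedian_on_sample`) and the exhaustive search over sample points are illustrative assumptions, not the paper's actual algorithm, which achieves its guarantees with a more refined analysis:

```python
import math
import random
from itertools import combinations

def kmedian_cost(points, centers):
    """K-median objective: average distance from each point
    to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers)
               for p in points) / len(points)

def kmedian_on_sample(points, k, sample_size, seed=0):
    """Sampling-based sketch: draw an i.i.d. sample and pick the k
    sample points that minimize the k-median cost *on the sample*.
    Exhaustive search over candidate centers is feasible only for tiny
    samples; it stands in for the paper's approximation algorithm.
    The returned centers are then evaluated on the full point set,
    playing the role of the underlying distribution."""
    rng = random.Random(seed)
    sample = [rng.choice(points) for _ in range(sample_size)]
    best, best_cost = None, float("inf")
    for cand in combinations(sample, k):
        c = kmedian_cost(sample, cand)
        if c < best_cost:
            best, best_cost = list(cand), c
    return best
```

Note that the running time of the search depends only on `sample_size` and `k`, not on `len(points)`, mirroring the input-size independence claimed in the abstract (the cost evaluation on the full set is for verification only).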

Citation (APA)

Ben-David, S. (2004). A framework for statistical clustering with a constant time approximation algorithms for K-median clustering. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3120, pp. 415–426). Springer Verlag. https://doi.org/10.1007/978-3-540-27819-1_29
