A framework for statistical clustering with a constant time approximation algorithms for K-median clustering


Abstract

We consider a framework in which the clustering algorithm receives as input a sample generated i.i.d. by some unknown arbitrary distribution, and must output a clustering of the full domain set, which is evaluated with respect to the underlying distribution. We provide general conditions on clustering problems that imply the existence of sampling-based clusterings approximating the optimal clustering. We show that K-median clustering, as well as the Vector Quantization problem, satisfy these conditions. In particular, our results apply to the sampling-based approximate clustering scenario. As a corollary, we obtain a sampling-based algorithm for the K-median clustering problem that finds an almost optimal set of centers in time depending only on the confidence and accuracy parameters of the approximation, but independent of the input size. Furthermore, in the Euclidean input case, the running time of our algorithm is independent of the Euclidean dimension.
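The core idea, that a small i.i.d. sample suffices to find centers that are near-optimal for the whole distribution, can be illustrated with a minimal sketch. The helper names (`kmedian_cost`, `kmedian_on_sample`) and the exhaustive search over sample points are illustrative assumptions, not the paper's actual algorithm, which achieves its guarantees with a more refined analysis:

```python
import math
import random
from itertools import combinations

def kmedian_cost(points, centers):
    """K-median objective: average distance from each point
    to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers)
               for p in points) / len(points)

def kmedian_on_sample(points, k, sample_size, seed=0):
    """Sampling-based sketch: draw an i.i.d. sample and pick the k
    sample points that minimize the k-median cost *on the sample*.
    Exhaustive search over candidate centers is feasible only for tiny
    samples; it stands in for the paper's approximation algorithm.
    The returned centers are then evaluated on the full point set,
    playing the role of the underlying distribution."""
    rng = random.Random(seed)
    sample = [rng.choice(points) for _ in range(sample_size)]
    best, best_cost = None, float("inf")
    for cand in combinations(sample, k):
        c = kmedian_cost(sample, cand)
        if c < best_cost:
            best, best_cost = list(cand), c
    return best
```

Note that the running time of the search depends only on `sample_size` and `k`, not on `len(points)`, mirroring the input-size independence claimed in the abstract (the cost evaluation on the full set is for verification only).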

Citation (APA)

Ben-David, S. (2004). A framework for statistical clustering with a constant time approximation algorithms for K-median clustering. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3120, pp. 415–426). Springer Verlag. https://doi.org/10.1007/978-3-540-27819-1_29
