A k-median algorithm with running time independent of data size

Adam Meyerson; Liadan O'Callaghan; Serge Plotkin

Journal Article, Open Access

Machine Learning (2004) 56(1-3) 61-87

DOI: 10.1023/B:MACH.0000033115.78247.f0

45 Citations

27 Readers

Abstract

We give a sampling-based algorithm for the k-Median problem, with running time O(k ((k^2/ε) log k)^2 log((k/ε) log k)), where k is the desired number of clusters and ε is a confidence parameter. This is the first k-Median algorithm with fully polynomial running time that is independent of n, the size of the data set. It gives a solution that is, with high probability, an O(1)-approximation, if each cluster in some optimal solution has Ω(nε/k) points. We also give weakly-polynomial-time algorithms for this problem and a relaxed version of k-Median in which a small fraction of outliers can be excluded. We give near-matching lower bounds showing that this assumption about cluster size is necessary. We also present a related algorithm for finding a clustering that excludes a small number of outliers.
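The general idea of sampling-based clustering can be illustrated with a short Python sketch. It is not the authors' algorithm: the function names, the 2-D Euclidean setting, and the single-swap local search used on the sample are all assumptions made for illustration, and the sample size is left to the caller instead of being derived from the paper's analysis. The first function simply evaluates the running-time expression from the abstract with constant factors dropped, to show that n does not appear in it.

```python
import math
import random


def running_time_bound(k, eps):
    """Evaluate k * ((k^2/eps) * log k)^2 * log((k/eps) * log k).

    Constant factors hidden by the O() are dropped; assumes k >= 2.
    Note that n, the data set size, does not appear.
    """
    inner = (k * k / eps) * math.log(k)
    return k * inner ** 2 * math.log((k / eps) * math.log(k))


def kmedian_cost(points, centers):
    """Sum over points of the distance to the nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def sample_then_cluster(points, k, sample_size, rng):
    """Pick k centers by looking only at a uniform random sample.

    Hypothetical helper: the single-swap local search is a stand-in,
    not the procedure analysed in the paper.
    """
    # Draw the sample with replacement; the full data set is never scanned.
    sample = [rng.choice(points) for _ in range(sample_size)]
    centers = rng.sample(sample, k)
    best = kmedian_cost(sample, centers)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in sample:
                trial = centers[:i] + [candidate] + centers[i + 1:]
                cost = kmedian_cost(sample, trial)
                if cost < best:  # strict improvement guarantees termination
                    centers, best, improved = trial, cost, True
    return centers
```

In this sketch the work done depends only on k and sample_size, never on the number of input points. The abstract's condition that every optimal cluster holds Ω(nε/k) points is what makes it plausible for a small sample to contain points from each cluster.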

Citation (APA)

Meyerson, A., O’Callaghan, L., & Plotkin, S. (2004). A k-median algorithm with running time independent of data size. Machine Learning, 56(1–3), 61–87. https://doi.org/10.1023/B:MACH.0000033115.78247.f0
