A k-median algorithm with running time independent of data size

Abstract

We give a sampling-based algorithm for the k-Median problem, with running time $O\left(k \left(\frac{k^2}{\varepsilon} \log k\right)^2 \log\left(\frac{k}{\varepsilon} \log k\right)\right)$, where $k$ is the desired number of clusters and $\varepsilon$ is a confidence parameter. This is the first k-Median algorithm with fully polynomial running time that is independent of $n$, the size of the data set. It gives a solution that is, with high probability, an $O(1)$-approximation, if each cluster in some optimal solution has $\Omega(n\varepsilon/k)$ points. We also give weakly-polynomial-time algorithms for this problem and a relaxed version of k-Median in which a small fraction of outliers can be excluded. We give near-matching lower bounds showing that this assumption about cluster size is necessary. We also present a related algorithm for finding a clustering that excludes a small number of outliers.
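The sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes one-dimensional points with absolute distance, a hypothetical constant `c` in the sample size $c \cdot (k^2/\varepsilon) \log k$, and a single-swap local search standing in for whatever approximation routine is run on the sample. All function names are illustrative. The point it shows is that the clustering step sees only the sample, so its cost depends on $k$ and $\varepsilon$ rather than on $n$.

```python
import math
import random


def sample_size(k, eps, c=1.0):
    """Hypothetical sample size c * (k^2 / eps) * log k; the constant c is an assumption."""
    return max(k, math.ceil(c * (k * k / eps) * math.log(max(k, 2))))


def kmedian_cost(points, centers):
    """k-median objective: sum of distances from each point to its nearest center."""
    return sum(min(abs(p - c) for c in centers) for p in points)


def local_search_kmedian(sample, k, rng):
    """Single-swap local search on the sample (a stand-in, not the paper's subroutine)."""
    centers = rng.sample(sample, k)
    best = kmedian_cost(sample, centers)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in sample:
                if cand in centers:
                    continue
                # Try replacing center i with the candidate point.
                trial = centers[:i] + [cand] + centers[i + 1:]
                cost = kmedian_cost(sample, trial)
                if cost < best:
                    centers, best, improved = trial, cost, True
    return centers


def sampled_kmedian(data, k, eps, seed=0, c=1.0):
    """Draw a uniform sample whose size ignores len(data), then cluster only the sample."""
    rng = random.Random(seed)
    s = min(len(data), sample_size(k, eps, c))
    sample = rng.sample(data, s)
    return local_search_kmedian(sample, k, rng)
```

For example, `sampled_kmedian(data, k=3, eps=0.1)` clusters 99 sampled points whether `data` holds three thousand values or three million. The cluster-size condition in the abstract is what makes this plausible: a cluster with far fewer than $n\varepsilon/k$ points may not appear in the sample at all.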

Cite

Meyerson, A., O’Callaghan, L., & Plotkin, S. (2004). A k-median algorithm with running time independent of data size. Machine Learning, 56(1–3), 61–87. https://doi.org/10.1023/B:MACH.0000033115.78247.f0