Analysis of k-means++ for separable data

Abstract

The k-means++ [5] seeding procedure is a simple sampling-based algorithm used to quickly find k centers, which may then be used to initialize Lloyd's method. There has been some recent progress on understanding this sampling algorithm. Ostrovsky et al. [10] showed that if the data satisfies the separation condition $\Delta_{k-1}(P)/\Delta_k(P) \geq c$ (where $\Delta_i(P)$ is the optimal cost with respect to $i$ centers, $c > 1$ is a constant, and $P$ is the point set), then the sampling algorithm gives an O(1)-approximation for the k-means problem with a probability that is exponentially small in k. Here, the distance measure is the squared Euclidean distance. Ackermann and Blömer [2] showed the same result when the distance measure is any μ-similar Bregman divergence. Arthur and Vassilvitskii [5] showed that k-means++ seeding gives an O(log k) approximation in expectation for the k-means problem. They also gave an instance on which k-means++ seeding gives an Ω(log k) approximation in expectation. However, it remained unresolved whether the seeding procedure gives an O(1) approximation with probability Ω(1/poly(k)), even when the data satisfies the above separation condition. Brunsch and Röglin [8] addressed this question and gave instances on which k-means++ achieves an approximation ratio of $(2/3 - \varepsilon)\cdot\log k$ only with exponentially small probability. However, their instances satisfy $\Delta_{k-1}(P)/\Delta_k(P) = 1 + o(1)$. In this work, we show that the sampling algorithm gives an O(1) approximation with probability Ω(1/k) for any k-means instance whose point set satisfies the separation condition $\Delta_{k-1}(P)/\Delta_k(P) \geq 1 + \gamma$, for some fixed constant γ. Our results hold for any distance measure that is a metric in an approximate sense. For point sets that do not satisfy this separation condition, we show an O(1) approximation with probability $\Omega(2^{-2k})$. © 2012 Springer-Verlag.
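For readers unfamiliar with the seeding step being analyzed, the following is a minimal sketch of k-means++ (D²) sampling under the squared Euclidean distance, assuming points are given as plain tuples of floats. It is an illustration only, not the paper's analysis or code: the function names `sq_dist`, `kmeans_pp_seed` and `kmeans_cost` are hypothetical, and Lloyd's iterations are not included. The first center is drawn uniformly at random; each subsequent center is drawn with probability proportional to a point's squared distance to its nearest already-chosen center.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance (the k-means distance measure)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seed(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Hypothetical sketch of D^2 seeding: returns k centers drawn from points."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample an index with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        c = points[idx]
        centers.append(c)
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, sq_dist(p, c)) for old, p in zip(d2, points)]
    return centers


def kmeans_cost(points: List[Point], centers: List[Point]) -> float:
    """k-means cost: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

A typical use is `centers = kmeans_pp_seed(pts, k, random.Random(0))`, after which the centers would be passed to Lloyd's method. Because an already-chosen point has weight zero, distinct input points are never selected twice while any positive weight remains.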

Citation (APA)

Jaiswal, R., & Garg, N. (2012). Analysis of k-means++ for separable data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7408 LNCS, pp. 591–602). https://doi.org/10.1007/978-3-642-32512-0_50