Analysis of k-means++ for separable data

Ragesh Jaiswal; Nitin Garg

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012) 7408 LNCS 591-602

DOI: 10.1007/978-3-642-32512-0_50

Abstract

The k-means++ seeding procedure [5] is a simple sampling-based algorithm for quickly finding k centers, which may then be used to initialize Lloyd's method. There has been some recent progress in understanding this sampling algorithm. Ostrovsky et al. [10] showed that if the data satisfies the separation condition Δ_{k-1}(P)/Δ_k(P) ≥ c (where Δ_i(P) is the optimal cost with respect to i centers, c > 1 is a constant, and P is the point set), then the sampling algorithm gives an O(1)-approximation for the k-means problem with probability that is exponentially small in k. Here, the distance measure is the squared Euclidean distance. Ackermann and Blömer [2] showed the same result when the distance measure is any μ-similar Bregman divergence. Arthur and Vassilvitskii [5] showed that k-means++ seeding gives an O(log k)-approximation in expectation for the k-means problem. They also gave an instance on which k-means++ seeding gives an Ω(log k)-approximation in expectation. However, it remained unresolved whether the seeding procedure gives an O(1)-approximation with probability Ω(1/poly(k)), even when the data satisfies the above-mentioned separation condition. Brunsch and Röglin [8] addressed this question and gave instances on which k-means++ achieves an approximation ratio of (2/3 - ε)·log k only with exponentially small probability. However, the instances that they give satisfy Δ_{k-1}(P)/Δ_k(P) = 1 + o(1). In this work, we show that the sampling algorithm gives an O(1)-approximation with probability Ω(1/k) for any k-means problem instance in which the point set satisfies the separation condition Δ_{k-1}(P)/Δ_k(P) ≥ 1 + γ, for some fixed constant γ. Our results hold for any distance measure that is a metric in an approximate sense. For point sets that do not satisfy the above separation condition, we show an O(1)-approximation with probability Ω(2^{-2k}). © 2012 Springer-Verlag.
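For readers unfamiliar with the seeding procedure the abstract refers to, the following is a minimal sketch of it as commonly described: the first center is drawn uniformly at random, and each later center is drawn with probability proportional to its squared Euclidean distance from the nearest center chosen so far. The function names (`sq_dist`, `kmeanspp_seed`, `cost`) and the uniform fallback for degenerate inputs are illustrative assumptions, not taken from the paper.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers by squared-distance sampling (illustrative sketch)."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # First center: chosen uniformly at random from the point set.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Assumption: if every point coincides with a center, fall back
            # to uniform sampling so the loop still terminates.
            nxt = rng.choice(points)
        else:
            # Sample the next center with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update nearest-center distances with the newly added center.
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centers


def cost(points, centers):
    """k-means cost: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

The centers returned by `kmeanspp_seed` would typically be handed to Lloyd's method as its starting configuration; `cost` evaluates the objective that the approximation guarantees in the abstract are stated against.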

Cite

Jaiswal, R., & Garg, N. (2012). Analysis of k-means++ for separable data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7408 LNCS, pp. 591–602). https://doi.org/10.1007/978-3-642-32512-0_50
