K-means+++: Outliers-resistant clustering

Abstract

The k-means problem is to compute a set of k centers (points) that minimizes the sum of squared distances to a given set of n points in a metric space. Arguably, the most common algorithm for solving it is k-means++, which is easy to implement and provides a provably small approximation error in time that is linear in n. We generalize k-means++ to support outliers in two senses (simultaneously): (i) nonmetric spaces, e.g., M-estimators, where the distance dist(p, x) between a point p and a center x is replaced by min {dist(p, x), c} for an appropriate constant c that may depend on the scale of the input; and (ii) k-means clustering with m ≥ 1 outliers, i.e., where the m farthest points from any given k centers are excluded from the total sum of distances. This is done using a simple reduction to (k + m)-means clustering (with no outliers).
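
The two notions of outlier resistance in the abstract can be illustrated with a short sketch. This is not the authors' algorithm, only a minimal standard-library Python illustration of the ingredients the abstract names: a distance capped at c, k-means++-style sampling proportional to the current cost, and a cost that ignores the m farthest points. The function names (truncated_sq_dist, nearest_cost, seed_centers, cost_with_outliers) are hypothetical, and squaring the capped distance is an assumption made here for concreteness. Seeding with k + m centers below only gestures at the reduction; how the paper turns a (k + m)-means solution into k centers with m outliers is not described in the abstract and is not attempted here.

```python
import math
import random


def truncated_sq_dist(p, x, c=math.inf):
    """Squared distance with the distance capped at c (assumed form of the loss)."""
    return min(math.dist(p, x), c) ** 2


def nearest_cost(p, centers, c=math.inf):
    """Cost of point p with respect to its nearest center."""
    return min(truncated_sq_dist(p, x, c) for x in centers)


def seed_centers(points, k, c=math.inf, rng=None):
    """k-means++-style seeding: each new center is sampled with probability
    proportional to its current (possibly truncated) cost."""
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [nearest_cost(p, centers, c) for p in points]
        if sum(weights) == 0:
            # Every point already coincides with a center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def cost_with_outliers(points, centers, m, c=math.inf):
    """Sum of point costs after discarding the m points farthest from the centers."""
    costs = sorted(nearest_cost(p, centers, c) for p in points)
    return sum(costs[: max(0, len(costs) - m)])


if __name__ == "__main__":
    # Two tight groups plus one far-away point (illustrative data only).
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (100.0, 100.0)]
    k, m = 2, 1
    centers = seed_centers(data, k + m)  # seed with k + m centers
    print(centers)
    print(cost_with_outliers(data, centers, m))
```

Capping the distance at c bounds how much any single point can contribute to the sampling weights, so a single far-away point cannot dominate the choice of the next center. Discarding the m largest costs plays the same role at evaluation time.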

Cite

Statman, A., Rozenberg, L., & Feldman, D. (2020). K-means+++: Outliers-resistant clustering. Algorithms, 13(12). https://doi.org/10.3390/a13120311
