We consider efficient clustering algorithms under data clusterability assumptions with added noise. In contrast with most of the literature on this topic, which considers either an adversarial noise setting or some generative noise model, we examine a realistically motivated setting in which the only restriction on the noisy part of the data is that it does not create significantly large “clusters”. Another aspect in which our model deviates from common approaches is that we stipulate the goal of clustering as discovering meaningful cluster structure in the data, rather than optimizing some objective (clustering cost). We introduce efficient algorithms that discover and cluster every subset of the data that has meaningful structure and whose complement lacks structure (under some formal definition of such “structure”). Notably, the success of our algorithms does not depend on any upper bound on the fraction of noisy data. We complement our results by showing that when either the notion of structure or the noise requirements are relaxed, no such results are possible.
Kushagra, S., Samadi, S., & Ben-David, S. (2016). Finding meaningful cluster structure amidst background noise. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9925 LNAI, pp. 339–354). Springer Verlag. https://doi.org/10.1007/978-3-319-46379-7_23