Robust and sparse k-means clustering for high-dimensional data

Šárka Brodinová; Peter Filzmoser; Thomas Ortner; Christian Breiteneder; Maia Rohm

Journal Article, Open Access

Advances in Data Analysis and Classification (2019) 13(4) 905-932

DOI: 10.1007/s11634-019-00356-9

40 Citations

78 Readers

Abstract

In real-world applications, identifying groups is challenging because the data may contain both outliers and noise variables. There is therefore a need for a clustering method that can reveal the group structure in such data without any prior knowledge. In this paper, we propose a k-means-based algorithm that incorporates a weighting function, which automatically assigns a weight to each observation. To cope with noise variables, a lasso-type penalty is applied in an objective function adjusted by the observation weights. Finally, we introduce a framework for selecting both the number of clusters and variables, based on a modified gap statistic. Experiments on simulated and real-world data demonstrate that the method can identify groups, outliers, and informative variables simultaneously.
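To make the idea of a lasso-type penalty combined with observation weights more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the well-known sparse k-means formulation, in which variable weights w maximise the weighted sum of per-variable between-cluster sums of squares subject to ||w||_2 <= 1, ||w||_1 <= s, and w >= 0, and it assumes that observation weights simply enter the cluster and overall means. The function names (`weighted_bcss`, `variable_weights`), the bound `s`, and the toy data are all hypothetical and chosen for illustration only.

```python
import numpy as np


def weighted_bcss(X, labels, obs_w):
    """Per-variable between-cluster sum of squares using observation weights."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    obs_w = np.asarray(obs_w, dtype=float)
    total_mean = np.average(X, axis=0, weights=obs_w)
    bcss = np.zeros(X.shape[1])
    for k in np.unique(labels):
        mask = labels == k
        wk = obs_w[mask]
        if wk.sum() == 0:
            continue  # a cluster made up only of zero-weight points contributes nothing
        center = np.average(X[mask], axis=0, weights=wk)
        bcss += wk.sum() * (center - total_mean) ** 2
    return bcss


def variable_weights(bcss, s, n_iter=60):
    """Soft-threshold bcss so that ||w||_2 = 1 and ||w||_1 <= s (assumes s >= 1)."""
    def normalise(delta):
        w = np.maximum(bcss - delta, 0.0)  # lasso-type soft thresholding
        norm = np.linalg.norm(w)
        return w / norm if norm > 0 else w

    w = normalise(0.0)
    if w.sum() <= s:
        return w  # the L1 bound is already satisfied without thresholding
    lo, hi = 0.0, float(bcss.max())
    for _ in range(n_iter):  # bisection on the threshold delta
        mid = 0.5 * (lo + hi)
        if normalise(mid).sum() > s:
            lo = mid
        else:
            hi = mid
    return normalise(hi)


# Toy data (assumed): two groups that differ only in the first two of ten variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[100:, :2] += 5.0
labels = np.repeat([0, 1], 100)

# One gross outlier, given observation weight zero so it cannot distort the means.
X[0] = 50.0
obs_w = np.ones(200)
obs_w[0] = 0.0

bcss = weighted_bcss(X, labels, obs_w)
w = variable_weights(bcss, s=1.5)
```

In this sketch the two informative variables receive almost all of the weight, while the noise variables receive weights close to zero. A full procedure would alternate this step with updating the cluster assignments and the observation weights, and would choose `s` and the number of clusters with a criterion such as the modified gap statistic mentioned in the abstract.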

Citation (APA)

Brodinová, Š., Filzmoser, P., Ortner, T., Breiteneder, C., & Rohm, M. (2019). Robust and sparse k-means clustering for high-dimensional data. Advances in Data Analysis and Classification, 13(4), 905–932. https://doi.org/10.1007/s11634-019-00356-9
