Variable Selection and Outlier Detection for Automated K-means Clustering

Sung-Soo Kim

Journal Article, Open Access

Communications for Statistical Applications and Methods (2015) 22(1) 55-67

DOI: 10.5351/csam.2015.22.1.055

Abstract

An important problem in cluster analysis is selecting the variables that define cluster structure while eliminating the noisy variables that mask it; outlier detection is likewise a fundamental task in cluster analysis. Here we provide an automated K-means clustering process combined with variable selection and outlier identification. The automated K-means clustering procedure consists of three processes: (i) automatically calculating the number of clusters and the initial cluster centers whenever a new variable is added, (ii) identifying outliers for each cluster depending on the variables used, and (iii) selecting the variables that define cluster structure in a forward manner. To select variables, we applied the VS-KM (variable-selection heuristic for K-means clustering) procedure (Brusco and Cradit, 2001). To identify outliers, we used a hybrid approach combining a clustering-based approach and a distance-based approach. Simulation results indicate that the proposed automated K-means clustering procedure is effective at selecting variables and identifying outliers. The implemented R program can be obtained at http://www.knou.ac.kr/∼sskim/SVOKmeans.r.
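
As a rough illustration of how a clustering-based step and a distance-based step can be combined to flag outliers, the following is a minimal Python sketch. It is not the author's R program and does not reproduce the procedure described in the abstract: the number of clusters and the initial centers are supplied by hand rather than calculated automatically, no variable selection (VS-KM) is performed, and the cutoff rule (mean distance plus z standard deviations within each cluster) is an assumption chosen for simplicity. The names kmeans, flag_outliers, and z are hypothetical.

```python
import math
import statistics


def kmeans(points, centers, iters=50):
    """Plain Lloyd's algorithm; `centers` holds the initial centers as tuples."""
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(statistics.fmean(col) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def flag_outliers(points, labels, centers, z=2.0):
    """Flag points far from their own cluster center.

    Assumed rule (not from the paper): within each cluster, a point is an
    outlier if its distance to the center exceeds mean + z * sd of the
    distances in that cluster.
    """
    flags = [False] * len(points)
    for j, center in enumerate(centers):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if len(idx) < 3:
            continue  # too few points to estimate a spread
        dists = [math.dist(points[i], center) for i in idx]
        cutoff = statistics.fmean(dists) + z * statistics.pstdev(dists)
        for i, d in zip(idx, dists):
            flags[i] = d > cutoff
    return flags


# Toy data: two tight groups plus one stray point near the first group.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
    (4.0, 4.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5),
]
labels, centers = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
flags = flag_outliers(points, labels, centers, z=1.5)
outliers = [p for p, f in zip(points, flags) if f]
print(outliers)  # [(4.0, 4.0)]
```

With only six points in a cluster, a single point's z-score cannot be much larger than 2, which is why the toy call uses z=1.5. A flagged point still pulls its cluster's center and spread toward itself, so a fuller treatment would re-cluster after removing flagged points.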

Citation (APA)

Kim, S.-S. (2015). Variable Selection and Outlier Detection for Automated K-means Clustering. Communications for Statistical Applications and Methods, 22(1), 55–67. https://doi.org/10.5351/csam.2015.22.1.055
