Variable Selection and Outlier Detection for Automated K-means Clustering

  • Kim S

Abstract

An important problem in cluster analysis is selecting the variables that define cluster structure while eliminating the noisy variables that mask it; outlier detection is likewise a fundamental task in cluster analysis. Here we provide an automated K-means clustering process combined with variable selection and outlier identification. The automated K-means clustering procedure consists of three processes: (i) automatically calculating the number of clusters and the initial cluster centers whenever a new variable is added, (ii) identifying outliers for each cluster depending on the variables used, and (iii) selecting the variables that define cluster structure in a forward manner. To select variables, we applied the VS-KM (variable-selection heuristic for K-means clustering) procedure (Brusco and Cradit, 2001). To identify outliers, we used a hybrid approach combining a clustering-based approach and a distance-based approach. Simulation results indicate that the proposed automated K-means clustering procedure is effective in selecting variables and identifying outliers. The implemented R program can be obtained at http://www.knou.ac.kr/∼sskim/SVOKmeans.r.
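The three processes listed in the abstract can be illustrated with a minimal Python sketch. It is not the author's R program and it does not reproduce VS-KM. Every function name here (kmeans, explained_ratio, forward_select, flag_outliers) is hypothetical, and several simplifications are assumptions: the number of clusters k is fixed by the caller rather than estimated, the selection criterion is the share of variance explained by the partition (a stand-in for the VS-KM criterion), and the outlier rule flags whole clusters that are very small (clustering-based) as well as points far from their own center by a median/MAD cutoff (distance-based).

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    # First center: the point farthest from the overall mean.
    centers = [X[np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1))]]
    # Each further center: the point farthest from all centers chosen so far.
    while len(centers) < k:
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute means; an empty cluster keeps its previous center.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def explained_ratio(X, labels, centers):
    """1 - (within-cluster SS / total SS); higher means tighter clusters."""
    wss = np.sum((X - centers[labels]) ** 2)
    tss = np.sum((X - X.mean(axis=0)) ** 2)
    return 1.0 - wss / tss

def forward_select(X, k, tol=0.1):
    """Add variables one at a time; stop when the score drops by more than tol.

    The criterion and the tolerance are assumptions, not the VS-KM rule.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each variable
    selected, score = [], -np.inf
    remaining = list(range(Z.shape[1]))
    while remaining:
        trials = {}
        for j in remaining:
            sub = Z[:, selected + [j]]
            trials[j] = explained_ratio(sub, *kmeans(sub, k))
        best = max(trials, key=trials.get)
        if trials[best] < score - tol:
            break
        selected.append(best)
        remaining.remove(best)
        score = trials[best]
    return selected

def flag_outliers(X, labels, centers, c=5.0, min_size=3):
    """Hybrid rule: tiny clusters are flagged whole, others by distance."""
    dist = np.linalg.norm(X - centers[labels], axis=1)
    flags = np.zeros(len(X), dtype=bool)
    for j in range(len(centers)):
        members = labels == j
        if members.sum() < min_size:
            flags[members] = True  # clustering-based: cluster too small
            continue
        med = np.median(dist[members])
        mad = np.median(np.abs(dist[members] - med))
        flags[members] = dist[members] > med + c * mad  # distance-based
    return flags
```

A typical call sequence under these assumptions would be selected = forward_select(X, k), then labels, centers = kmeans(X[:, selected], k), then flag_outliers(X[:, selected], labels, centers). The cutoff multiplier c and the minimum cluster size are illustrative defaults and would need tuning on real data.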


Kim, S.-S. (2015). Variable Selection and Outlier Detection for Automated K-means Clustering. Communications for Statistical Applications and Methods, 22(1), 55–67. https://doi.org/10.5351/csam.2015.22.1.055
