Effect of data standardization on the result of k-means clustering

Abstract

When clustering is applied to multivariate data that include some large-scale variables, the results depend on those variables more than the user intends. In such cases, the data should be standardized to control this dependency. For high-dimensional data, Doherty et al. (Appl Soft Comput 7:203-210, 2007) argued numerically that standardization by variable range leads to almost the same results regardless of the norm used, whereas Aggarwal et al. (Lect Notes Comput Sci 1973:420-434, 2001) showed theoretically that a fraction norm reduces the effect of the curse of high dimensionality on k-means results more than the Euclidean norm does. However, neither study properly considered the effects of standardization or the factors that influence the results. In this paper, we verify the effects of six data standardization methods with various norms and examine the factors that affect clustering results for high-dimensional data. We show that data standardization with the fraction norm reduces the effect of the curse of high dimensionality and gives more effective results than either data standardization with the Euclidean norm or the fraction norm without data standardization. © 2012 Springer-Verlag Berlin Heidelberg.
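
To make the two ingredients of the abstract concrete, here is a minimal Python sketch of standardization by variable range combined with a Minkowski-type dissimilarity, where p = 2 gives the Euclidean norm and 0 < p < 1 gives a fraction norm. It is an illustration under stated assumptions, not the authors' implementation: the function names (range_standardize, minkowski, nearest) and the toy data are hypothetical, and only one of the six standardization methods mentioned above is shown. Note that for p < 1 the expression is not a true norm, because the triangle inequality no longer holds.

```python
import numpy as np

def range_standardize(X):
    """Scale each column to [0, 1] by its range (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns: avoid division by zero
    return (X - lo) / span

def minkowski(a, b, p=2.0):
    """Minkowski-type dissimilarity: p = 2 is Euclidean, 0 < p < 1 is 'fraction'."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))

def nearest(X, i, p=2.0):
    """Index of the row closest to row i under the chosen p."""
    d = [np.inf if j == i else minkowski(X[i], X[j], p) for j in range(len(X))]
    return int(np.argmin(d))

# Made-up toy data: column 0 is small-scale, column 1 is large-scale.
X = np.array([[0.0, 100.0],
              [0.1, 200.0],
              [1.0, 120.0],
              [0.9, 300.0]])
Z = range_standardize(X)

print("raw, Euclidean:          nearest to row 0 is row", nearest(X, 0, p=2.0))
print("standardized, Euclidean: nearest to row 0 is row", nearest(Z, 0, p=2.0))
print("standardized, p = 0.5:   nearest to row 0 is row", nearest(Z, 0, p=0.5))
```

On the raw data, the large-scale column decides which row counts as nearest; after range standardization both columns contribute on the same scale. In a k-means setting, the same dissimilarity would replace the Euclidean distance in the assignment step, which is an assumption about usage rather than something the abstract spells out.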

Tanioka, K., & Yadohisa, H. (2012). Effect of data standardization on the result of k-means clustering. In Studies in Classification, Data Analysis, and Knowledge Organization (pp. 59–67). Kluwer Academic Publishers. https://doi.org/10.1007/978-3-642-24466-7_7
