Analysis and Application of Normalization Methods with Supervised Feature Weighting to Improve K-means Accuracy

Abstract

Normalization methods are widely employed for transforming the variables or features of a given dataset. In this paper, three classical feature normalization methods, Standardization (St), Min-Max (MM) and Median Absolute Deviation (MAD), are studied in different synthetic datasets from the UCI repository. An exhaustive analysis of the transformed features' ranges and their influence on the Euclidean distance is performed, concluding that knowledge about the group structure captured by each feature is needed to select the best normalization method for a given dataset. In order to effectively capture the features' importance and adjust their contribution, this paper proposes a two-stage methodology for normalization and supervised feature weighting based on the Pearson correlation coefficient and on a Random Forest Feature Importance estimation method. Simulations on five different datasets reveal that the proposed two-stage methodology matches or exceeds, in terms of accuracy, the K-means performance obtained when only normalization is applied.
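The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the textbook definitions of the three normalizations, and it assumes that the Pearson-based weight of a feature is the absolute correlation between that feature and the class label, rescaled so the weights sum to one. The function names and the toy data are hypothetical. The Random Forest variant is omitted because it would need an external library.

```python
import math
import statistics

def standardize(col):
    # Standardization (St): zero mean, unit (population) standard deviation.
    mu = statistics.fmean(col)
    sd = statistics.pstdev(col)
    return [(x - mu) / sd for x in col]

def min_max(col):
    # Min-Max (MM): rescale to the interval [0, 1].
    lo, hi = min(col), max(col)
    return [(x - lo) / (hi - lo) for x in col]

def mad_scale(col):
    # Median Absolute Deviation (MAD): center on the median, divide by the MAD.
    med = statistics.median(col)
    mad = statistics.median([abs(x - med) for x in col])
    return [(x - med) / mad for x in col]

def pearson(xs, ys):
    # Pearson correlation coefficient between two equally long sequences.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def pearson_weights(columns, labels):
    # Assumed weighting rule: |r| per feature, rescaled to sum to 1.
    raw = [abs(pearson(col, labels)) for col in columns]
    total = sum(raw)
    return [r / total for r in raw]

def normalize_and_weight(columns, labels, normalizer):
    # Stage 1: normalize each feature. Stage 2: scale it by its weight.
    normed = [normalizer(col) for col in columns]
    weights = pearson_weights(normed, labels)
    weighted = [[w * x for x in col] for w, col in zip(weights, normed)]
    return weighted, weights

# Hypothetical toy data: feature 0 separates the two classes,
# feature 1 has a wide range but carries almost no class information.
columns = [
    [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],
    [10.0, 90.0, 50.0, 20.0, 80.0, 55.0],
]
labels = [0, 0, 0, 1, 1, 1]

for normalizer in (standardize, min_max, mad_scale):
    weighted, weights = normalize_and_weight(columns, labels, normalizer)
    print(normalizer.__name__, [round(w, 3) for w in weights])
```

The weighted columns would then be passed to K-means in place of the merely normalized ones. Because the weights shrink the uninformative feature, it contributes less to the Euclidean distance that K-means relies on.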

Cite

Niño-Adan, I., Landa-Torres, I., Portillo, E., & Manjarres, D. (2020). Analysis and Application of Normalization Methods with Supervised Feature Weighting to Improve K-means Accuracy. In Advances in Intelligent Systems and Computing (Vol. 950, pp. 14–24). Springer Verlag. https://doi.org/10.1007/978-3-030-20055-8_2
