SMOTE-Cov: A new oversampling method based on the covariance matrix

Ireimis Leguen-deVarona; Julio Madera; Yoan Martínez-López; José Carlos Hernández-Nieto

Conference Proceedings

EAI/Springer Innovations in Communication and Computing (2020) 207-215

DOI: 10.1007/978-3-030-48149-0_15

3 Citations

3 Readers

Abstract

Many machine learning tasks involve learning from imbalanced datasets, which leads to misclassification of the minority class. One of the state-of-the-art approaches to “solve” this problem at the data level is the Synthetic Minority Oversampling Technique (SMOTE), which uses k-nearest neighbors (KNN) to select and generate new instances. However, SMOTE-style approaches do not take the dependency relationships among attributes into account. This chapter presents SMOTE-Cov, a modified SMOTE that uses the covariance matrix instead of KNN to balance datasets with continuous attributes and a binary class. We implemented two variants: SMOTE-CovI, which generates new values within the interval of each attribute, and SMOTE-CovO, which allows some values to fall outside the attribute intervals. SMOTE-Cov was validated in an experimental study using C4.5 as the classifier. The results show that our approach performs similarly to the state-of-the-art approaches: after applying the Friedman and Holm statistical tests, we found no statistically significant differences.
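
The abstract does not say how the covariance matrix is used to generate instances, so the following is only an illustrative sketch under a stated assumption: synthetic minority rows are drawn from a multivariate normal distribution fitted to the minority class (its mean vector and covariance matrix). This is not the authors' algorithm. The function name `oversample_with_covariance` and the `keep_in_range` flag are hypothetical. Clipping to the observed per-attribute range loosely mirrors the SMOTE-CovI idea, and skipping the clip loosely mirrors SMOTE-CovO.

```python
import numpy as np


def oversample_with_covariance(X_min, n_new, keep_in_range=True, seed=0):
    """Hypothetical sketch, NOT the published SMOTE-Cov procedure.

    Assumption: new minority rows are sampled from a Gaussian whose mean and
    covariance are estimated from the existing minority rows, so that
    dependencies between attributes are carried into the synthetic data.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)

    mean = X_min.mean(axis=0)
    # rowvar=False: each column is an attribute, each row an instance.
    cov = np.atleast_2d(np.cov(X_min, rowvar=False))

    synthetic = rng.multivariate_normal(mean, cov, size=n_new)

    if keep_in_range:
        # CovI-like behaviour (assumed): keep every value inside the
        # observed [min, max] interval of its attribute.
        synthetic = np.clip(synthetic, X_min.min(axis=0), X_min.max(axis=0))
    # Otherwise (CovO-like, assumed): values may fall outside the interval.
    return synthetic


# Usage on made-up data: 20 minority rows, 100 majority rows, 2 attributes.
demo_rng = np.random.default_rng(42)
X_minority = demo_rng.normal(loc=[0.0, 5.0], scale=[1.0, 2.0], size=(20, 2))
n_majority = 100

# Generate enough synthetic rows to match the majority class size.
X_new = oversample_with_covariance(X_minority, n_majority - len(X_minority))
X_balanced_minority = np.vstack([X_minority, X_new])
```

Unlike KNN-based interpolation, which creates each point on a line segment between two neighbours, sampling from the fitted covariance uses the joint spread of all attributes at once. Whether that matches the paper's generation step would need to be checked against the full text.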

Cite

Leguen-deVarona, I., Madera, J., Martínez-López, Y., & Hernández-Nieto, J. C. (2020). Smote-cov: A new oversampling method based on the covariance matrix. In EAI/Springer Innovations in Communication and Computing (pp. 207–215). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-48149-0_15
