Over-sampling imbalanced datasets using the covariance matrix

Ireimis Leguen-de Varona; Julio Madera; Yoan Martínez-López; José Carlos Hernández-Nieto

Journal Article, Open Access

EAI Endorsed Transactions on Energy Web (2020) 7(27)

DOI: 10.4108/eai.13-7-2018.163982

Abstract

INTRODUCTION: Many machine learning tasks involve learning from imbalanced datasets, which leads to misclassification of the minority class. One of the state-of-the-art approaches to "solve" this problem at the data level is the Synthetic Minority Over-sampling Technique (SMOTE), which uses the K-Nearest Neighbors (KNN) algorithm to select and generate new instances. OBJECTIVES: This paper presents SMOTE-Cov, a modified SMOTE that uses the covariance matrix instead of KNN to balance datasets with continuous attributes and a binary class. METHODS: We implemented two variants: SMOTE-CovI, which generates new values within the interval of each attribute, and SMOTE-CovO, which allows some values to fall outside the interval of the attributes. RESULTS: The results show that our approach performs similarly to state-of-the-art approaches. CONCLUSION: This paper proposes a new algorithm that generates synthetic instances of the minority class using the covariance matrix.
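
The abstract does not spell out how the covariance matrix is used, so the following is only an illustrative sketch and not the authors' implementation. It assumes that synthetic minority instances are drawn from a multivariate normal distribution fitted to the minority class (its mean vector and covariance matrix), and that the within-interval behaviour described for SMOTE-CovI can be approximated by clipping each attribute to its observed range. The function name `smote_cov_sketch` and the parameters `n_new`, `clip_to_range`, and `seed` are hypothetical.

```python
import numpy as np

def smote_cov_sketch(X_min, n_new, clip_to_range=True, seed=0):
    """Illustrative covariance-based over-sampling (assumed, not the paper's code).

    X_min         : array of shape (n_samples, n_features), minority-class rows
    n_new         : number of synthetic rows to generate
    clip_to_range : True  -> keep values inside each attribute's observed
                             interval (in the spirit of SMOTE-CovI)
                    False -> allow values outside it (in the spirit of SMOTE-CovO)
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)

    # Mean vector and covariance matrix of the minority class
    # (rowvar=False because each column is an attribute).
    mean = X_min.mean(axis=0)
    cov = np.atleast_2d(np.cov(X_min, rowvar=False))

    # Assumption: sample new instances from N(mean, cov).
    synthetic = rng.multivariate_normal(mean, cov, size=n_new)

    if clip_to_range:
        # Assumption: enforce the attribute intervals by clipping.
        synthetic = np.clip(synthetic, X_min.min(axis=0), X_min.max(axis=0))
    return synthetic


if __name__ == "__main__":
    # Toy minority class with 30 instances and 3 continuous attributes.
    X_minority = np.random.default_rng(1).normal(size=(30, 3))
    inside = smote_cov_sketch(X_minority, n_new=50, clip_to_range=True)
    outside = smote_cov_sketch(X_minority, n_new=50, clip_to_range=False)
    print(inside.shape, outside.shape)
```

Unlike KNN-based SMOTE, which interpolates between a minority instance and one of its neighbours, a sketch of this kind captures the dependencies between attributes through the covariance matrix as a whole. Clipping is only one way to keep values within the attribute intervals; the paper may use a different mechanism.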

Cite

Leguen-de Varona, I., Madera, J., Martínez-López, Y., & Hernández-Nieto, J. C. (2020). Over-sampling imbalanced datasets using the covariance matrix. EAI Endorsed Transactions on Energy Web, 7(27). https://doi.org/10.4108/eai.13-7-2018.163982
