A Highly Adaptive Oversampling Approach to Address the Issue of Data Imbalance

Szilvia Szeghalmy; Attila Fazekas

Journal Article, Open Access

Computers (2022) 11(5)

DOI: 10.3390/computers11050073

Abstract

Data imbalance is a serious problem in machine learning that can be alleviated at the data level by balancing the class distribution with sampling. In the last decade, several sampling methods have been published to address the shortcomings of the initial ones, such as noise sensitivity and incorrect neighbor selection. Our review of the literature made it clear that these algorithms perform differently on different data sets. In this paper, we present a new oversampler that builds on the key steps and sampling strategies identified by analyzing dozens of existing methods, and that can be fitted to a given data set through an optimization process. Experiments on a number of data sets show that the proposed method had a similar or better effect on the performance of SVM, DTree, kNN, and MLP classifiers than other well-known samplers from the literature. Statistical tests confirmed these results.
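The abstract does not spell out the steps of the proposed oversampler, so the following is only a minimal sketch of what "balancing the class distribution with sampling" means at the data level. It shows classic SMOTE-style interpolation, one of the early oversamplers whose neighbor-selection behavior later methods try to improve; it is not the authors' method. The function names `smote_like_oversample` and `balance` are hypothetical, and the sketch assumes binary labels, numeric features, and Euclidean distance.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    For each synthetic sample: pick a random minority point, pick one of its
    k nearest minority neighbors, and place a new point at a random position
    on the line segment between the two.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances among minority samples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :k]
    # Random base points and, for each, one of its k nearest neighbors.
    base = rng.integers(0, n, size=n_new)
    nb = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def balance(X, y, minority_label, k=5, rng=None):
    """Oversample the minority class until both classes (binary case) are equal."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    n_new = int(np.sum(y != minority_label)) - len(X_min)
    if n_new <= 0:
        return X, y
    X_new = smote_like_oversample(X_min, n_new, k=k, rng=rng)
    y_new = np.full(n_new, minority_label, dtype=y.dtype)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Usage: 6 majority points vs. 2 minority points -> 4 synthetic minority points.
X = np.array([[0, 0], [1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [10, 10], [10, 11]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
X_bal, y_bal = balance(X, y, minority_label=1, k=1, rng=0)
```

Here the new points all lie on the segment between (10, 10) and (10, 11). Plain interpolation like this is exactly what makes the early samplers sensitive to noise: a mislabeled minority point becomes a base or neighbor and spreads synthetic samples into majority regions, which is the kind of shortcoming the later methods mentioned in the abstract address.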

Citation (APA):

Szeghalmy, S., & Fazekas, A. (2022). A Highly Adaptive Oversampling Approach to Address the Issue of Data Imbalance. Computers, 11(5). https://doi.org/10.3390/computers11050073