A Highly Adaptive Oversampling Approach to Address the Issue of Data Imbalance

4Citations
Citations of this article
15Readers
Mendeley users who have this article in their library.

Abstract

Data imbalance is a serious problem in machine learning that can be alleviated at the data level by balancing the class distribution with sampling. In the last decade, several sampling methods have been published to address the shortcomings of the initial ones, such as noise sensitivity and incorrect neighbor selection. Based on the review of the literature, it has become clear to us that the algorithms achieve varying performance on different data sets. In this paper, we present a new oversampler that has been developed based on the key steps and sampling strategies identified by analyzing dozens of existing methods and that can be fitted to various data sets through an optimization process. Experiments were performed on a number of data sets, which show that the proposed method had a similar or better effect on the performance of SVM, DTree, kNN and MLP classifiers compared with other well-known samplers found in the literature. The results were also confirmed by statistical tests.

Cite

CITATION STYLE

APA

Szeghalmy, S., & Fazekas, A. (2022). A Highly Adaptive Oversampling Approach to Address the Issue of Data Imbalance. Computers, 11(5). https://doi.org/10.3390/computers11050073

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free