Abstract
Data imbalance is a serious problem in machine learning that can be alleviated at the data level by balancing the class distribution through sampling. Over the last decade, several sampling methods have been published to address the shortcomings of the earliest ones, such as noise sensitivity and incorrect neighbor selection. Our review of the literature shows that the performance of these algorithms varies from data set to data set. In this paper, we present a new oversampler built from the key steps and sampling strategies identified by analyzing dozens of existing methods; it can be fitted to various data sets through an optimization process. Experiments on a number of data sets show that the proposed method has a similar or better effect on the performance of SVM, DTree, kNN, and MLP classifiers than other well-known samplers from the literature. Statistical tests confirm the results.
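To make the idea of balancing a class distribution by oversampling concrete, the following is a minimal sketch of a generic SMOTE-style interpolation step for a two-class problem. It is not the oversampler proposed in the paper, whose steps and optimization process are not described in this abstract. The function name `oversample_minority` and its parameters `k` and `seed` are hypothetical and chosen only for illustration.

```python
import numpy as np


def oversample_minority(X, y, minority_label, k=5, seed=0):
    """Illustrative SMOTE-style sketch (assumes a binary problem).

    New minority samples are drawn on the line segment between a minority
    sample and one of its k nearest minority neighbors until both classes
    have the same number of samples.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_needed = int((y != minority_label).sum() - len(X_min))
    if n_needed <= 0 or len(X_min) < 2:
        return X, y  # already balanced, or too few samples to interpolate

    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances among minority samples only.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbors for each new point.
    base = rng.integers(0, len(X_min), size=n_needed)
    chosen = neighbors[base, rng.integers(0, k, size=n_needed)]
    gap = rng.random((n_needed, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[chosen] - X_min[base])

    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_needed, minority_label, dtype=y.dtype)])
    return X_out, y_out
```

Because this sketch interpolates between any minority sample and its nearest minority neighbors without checking the surrounding majority samples, it is exposed to the noise sensitivity and neighbor selection issues that the abstract attributes to the initial sampling methods.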
Cite
Szeghalmy, S., & Fazekas, A. (2022). A Highly Adaptive Oversampling Approach to Address the Issue of Data Imbalance. Computers, 11(5). https://doi.org/10.3390/computers11050073