Avoiding Overfitting dan Overlapping in Handling Class Imbalanced Using Hybrid Approach with Smoothed Bootstrap Resampling and Feature Selection


Abstract

Datasets are often imbalanced: one class (the majority) contains far more instances than the others (the minority). As a result, a classifier can fail to recognize the minority class even when its overall accuracy is high. Handling class imbalance requires attention to both diversity and classifier performance, and the Hybrid Approach, which combines a sampling method with classifier ensembles, has produced satisfactory results. The Hybrid Approach generally relies on oversampling, which is prone to overfitting, a condition in which accuracy is high on the training data but differs on the testing data. In this study, Smoothed Bootstrap Resampling is therefore used as the oversampling method in the Hybrid Approach to prevent overfitting. Class imbalance is not the only cause of declining classifier performance, however; class overlap must also be considered. Overlap can be addressed with Feature Selection, which reduces the overlap degree. This research combines Feature Selection with Hybrid Approach Redefinition, which modifies the use of Smoothed Bootstrap Resampling, to handle class imbalance in medical datasets. In the proposed method, preprocessing is carried out using Smoothed Bootstrap Resampling and Feature Selection, with Feature Assessment by Sliding Thresholds (FAST) as the Feature Selection method, while processing is done using Random Under Sampling and SMOTE. Overlap is measured with the Augmented R-Value, and classifier performance with the Balanced Error Rate, Precision, Recall, and F-Value. The Balanced Error Rate expresses the combined error of the majority and minority classes in a 10-fold validation test, which allows each subset to serve as training data. The results show that the proposed method performs better than the comparison method.
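
The abstract names its performance measures without giving formulas. The following is a minimal sketch, not the authors' implementation. It assumes the commonly used definition of the Balanced Error Rate as the mean of the per-class error rates, treats the minority class as the positive class, and takes the F-Value to be the harmonic mean of Precision and Recall. The function name `classification_metrics` and the example counts are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute BER, precision, recall and F-value from binary confusion counts.

    Assumption: the minority class is the positive class, and BER is the
    average of the error rates on the minority and majority classes.
    """
    def safe_div(num, den):
        # Return 0.0 when a class or prediction group is empty.
        return num / den if den else 0.0

    recall = safe_div(tp, tp + fn)          # share of minority instances found
    precision = safe_div(tp, tp + fp)       # share of minority predictions that are correct
    minority_error = safe_div(fn, tp + fn)  # error rate on the minority class
    majority_error = safe_div(fp, fp + tn)  # error rate on the majority class
    ber = 0.5 * (minority_error + majority_error)
    f_value = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)

    return {
        "ber": ber,
        "precision": precision,
        "recall": recall,
        "f_value": f_value,
        "accuracy": accuracy,
    }


# Hypothetical counts: 10 minority and 90 majority test instances.
metrics = classification_metrics(tp=8, fp=10, tn=80, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is 0.88 while the precision is below 0.5, which illustrates why the abstract reports class-aware measures alongside accuracy.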

Cite

Hartono, & Ongko, E. (2022). Avoiding Overfitting dan Overlapping in Handling Class Imbalanced Using Hybrid Approach with Smoothed Bootstrap Resampling and Feature Selection. International Journal on Informatics Visualization, 6(2), 343–348. https://doi.org/10.30630/joiv.6.2.985
