The Effect of Class Imbalance Handling on Datasets Toward Classification Algorithm Performance

Cherfly Kaope; Yoga Pristyanto

Journal Article (Open Access)

MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer (2023) 22(2) 227-238

DOI: 10.30812/matrik.v22i2.2515

Abstract

Class imbalance is a condition in which the minority class contains fewer samples than the majority class. In an imbalanced dataset, minority-class samples tend to be misclassified, which degrades classification performance. Approaches to handling class imbalance include the data-level approach, the algorithm-level approach, and cost-sensitive learning. At the data level, one common method is resampling. In this study, the ADASYN, SMOTE, and SMOTE-ENN sampling methods were used to handle class imbalance, each combined with the AdaBoost, K-Nearest Neighbor, and Random Forest classification algorithms. The purpose of this study was to determine how handling class imbalance in a dataset affects classification performance. The models were tested on five datasets and evaluated using accuracy, precision, true positive rate, true negative rate, and g-mean score. Based on these results, the combination of ADASYN and Random Forest outperformed the other model schemes, scoring 5% to 10% better than the other models.
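The abstract names the resampling methods and evaluation criteria without giving their definitions. As an illustration only (this is not the authors' code, and the abstract does not say which implementation the study used), the Python sketch below shows plain SMOTE interpolation, x_new = x + λ(x_nn − x) with λ drawn uniformly from [0, 1) and x_nn one of the k nearest minority-class neighbors of x, and computes the listed criteria, taking g-mean as the usual √(TPR × TNR). ADASYN (which assigns more synthetic samples to minority points surrounded by majority neighbors) and SMOTE-ENN (SMOTE followed by Edited Nearest Neighbours cleaning) are not implemented here. The function names `smote` and `imbalance_metrics` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE sketch: create n_new synthetic points, each interpolated
    between a random minority sample and one of its k nearest minority
    neighbors. Requires at least two minority samples."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbors of x among the other minority samples only
        neighbors = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: euclidean(x, p),
        )[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # lambda, uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, TPR (recall), TNR (specificity) and g-mean,
    treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "tpr": tpr,
        "tnr": tnr,
        "g_mean": math.sqrt(tpr * tnr),
    }


if __name__ == "__main__":
    # Toy minority class in 2-D; synthetic points lie on segments between them.
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    print(smote(minority, n_new=3, k=2))

    # 95 majority (0) vs 5 minority (1) labels, and a classifier that
    # always predicts the majority class.
    y_true = [0] * 95 + [1] * 5
    y_pred = [0] * 100
    print(imbalance_metrics(y_true, y_pred))
```

The always-majority predictor shows why accuracy alone is misleading on imbalanced data: it reaches 0.95 accuracy while its TPR and g-mean are both 0.0, which is why criteria such as TPR, TNR, and g-mean are reported alongside accuracy.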

Citation (APA)

Kaope, C., & Pristyanto, Y. (2023). The Effect of Class Imbalance Handling on Datasets Toward Classification Algorithm Performance. MATRIK : Jurnal Manajemen, Teknik Informatika Dan Rekayasa Komputer, 22(2), 227–238. https://doi.org/10.30812/matrik.v22i2.2515