Clustering Based Undersampling for Handling Class Imbalance in C4.5 Classification Algorithm

Wahyu Nugraha; Muhammad Sony Maulana; Agung Sasongko

Conference Proceedings, Open Access

Journal of Physics: Conference Series (2020) 1641(1)

DOI: 10.1088/1742-6596/1641/1/012014

15 Citations

29 Readers

Abstract

Building an effective machine learning model is difficult when the class distribution in the training data set is imbalanced. Class imbalance is common in real-world classification, where one class has very few instances (the minority class) while the other classes have many (the majority class). A model built without accounting for class imbalance is flooded by majority class instances and tends to ignore minority class predictions. Random undersampling and oversampling techniques have been widely used in various studies to overcome class imbalance. This study uses an undersampling strategy based on clustering, with C4.5 as the classification model. Clustering is used to group the data, and undersampling is performed on each data group, so that useful samples are not eliminated. Statistical tests on experiments using 10 imbalanced datasets of various sample sizes from the KEEL repository and Kaggle indicate that clustering-based undersampling produces satisfactory performance. The improvement is visible in the sensitivity and AUC values, which increased significantly.
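The idea of undersampling within clusters can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the number of clusters, or the per-cluster sampling rule, so everything below is an assumption made for illustration: a plain k-means over the majority class only, followed by a random draw from each cluster in proportion to its size. The function names `kmeans` and `cluster_undersample`, the parameter `n_keep`, and the toy data are hypothetical and do not come from the paper.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns a cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; leave it alone if the cluster is empty.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Keep roughly n_keep majority rows, drawing from every cluster in proportion to its size."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, l in zip(majority, labels) if l == c]
        if not members:
            continue
        # Assumed rule: proportional quota, with at least one row kept per non-empty cluster.
        quota = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept

# Toy data (hypothetical): 90 majority points in three blobs, 10 minority points.
rng = random.Random(42)
majority = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(30)
]
minority = [(rng.gauss(2.5, 0.5), rng.gauss(2.5, 0.5)) for _ in range(10)]

reduced = cluster_undersample(majority, n_keep=len(minority), k=3)
print(len(majority), "majority rows reduced to", len(reduced))
```

Because quotas are rounded per cluster, the reduced set can differ from `n_keep` by a few rows. In a full pipeline the reduced majority rows would be combined with the minority rows and passed to a C4.5 decision tree, which is not shown here.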

Cite (APA)

Nugraha, W., Maulana, M. S., & Sasongko, A. (2020). Clustering Based Undersampling for Handling Class Imbalance in C4.5 Classification Algorithm. In Journal of Physics: Conference Series (Vol. 1641). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/1641/1/012014
