The impact of data re-sampling on learning performance of class imbalanced bankruptcy prediction models

Dilip Singh Sisodia; Upasana Verma

Journal article (open access)

International Journal on Electrical Engineering and Informatics (2018) 10(3) 433-446

DOI: 10.15676/ijeei.2018.10.3.2

Abstract

The aim of this paper is to evaluate the effect of data sampling techniques on the performance of learners using a real, highly imbalanced Spanish bankruptcy dataset. The class imbalance problem refers to a highly uneven distribution of class instances, in which one class contains far more instances than the others. When the data distribution is highly skewed, classical learners are heavily biased toward recognizing the majority class, which degrades the performance of quantitative classification and prediction models. In this paper, six sampling methods, namely the synthetic minority oversampling technique (SMOTE), Borderline-SMOTE, Safe-Level-SMOTE, random undersampling, random oversampling, and condensed nearest neighbor, are used with different individual learners (SVM, C4.5, and logistic regression (LR)) and ensemble learners (AdaBoostM1, DTBagging, and Random Forests (RF)). The quantitative prediction models are built by combining the data sampling techniques with these learners. Their performance is evaluated on the real, highly imbalanced dataset using the G-Mean and area under the curve (AUC) measures. The results suggest that, on this dataset, oversampling methods (with LR and DTBagging) and undersampling methods (with C4.5 and RF) outperform the other combinations.
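
The abstract names the resampling strategies only at a high level. As a rough illustration (not the authors' code and not tied to any particular library), the sketch below shows random oversampling, random undersampling, and basic SMOTE interpolation in plain Python. The function names, the binary-label setup, the balancing target (equal class sizes), and the assumption of at least two minority rows are all illustrative choices. Borderline-SMOTE, Safe-Level-SMOTE, and condensed nearest neighbor refine which points are sampled or kept and are not shown here.

```python
import math
import random

def random_oversample(X, y, minority, seed=0):
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_majority = sum(1 for label in y if label != minority)
    extra = [rng.choice(minority_rows) for _ in range(n_majority - len(minority_rows))]
    return X + extra, y + [minority] * len(extra)

def random_undersample(X, y, minority, seed=0):
    """Keep every minority row plus an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    keep = minority_idx + rng.sample(majority_idx, len(minority_idx))
    return [X[i] for i in keep], [y[i] for i in keep]

def smote(X, y, minority, k=5, seed=0):
    """Basic SMOTE: create synthetic minority points on the segment between a
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = sum(1 for label in y if label != minority) - len(minority_rows)
    synthetic = []
    for _ in range(n_needed):
        i = rng.randrange(len(minority_rows))
        base = minority_rows[i]
        # k nearest minority neighbours of `base`, excluding base itself (Euclidean distance)
        others = [row for j, row in enumerate(minority_rows) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position in [0, 1) along the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return X + synthetic, y + [minority] * len(synthetic)
```

Because SMOTE interpolates between existing minority points, every synthetic row lies inside the region spanned by the minority class, whereas random oversampling only repeats existing rows. For real experiments, the imbalanced-learn package provides tested implementations such as `SMOTE`, `BorderlineSMOTE`, `RandomOverSampler`, `RandomUnderSampler`, and `CondensedNearestNeighbour`. Resampling is normally applied to the training data only, so that evaluation still reflects the original imbalanced distribution.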
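
The two evaluation measures can be made concrete with a small sketch. G-Mean is the geometric mean of sensitivity, TP / (TP + FN), the recall on the minority class (typically the bankrupt firms), and specificity, TN / (TN + FP), the recall on the majority class; a model that ignores the minority class therefore scores 0 even when its accuracy is high. AUC is computed below through its rank interpretation: the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The function names and the `positive=1` label convention are assumptions for illustration, and both classes are assumed to be present.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority recall) and specificity (majority recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

def auc(y_true, scores, positive=1):
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # 5 minority and 95 majority cases; a "classifier" that always predicts the majority class
    y_true = [1] * 5 + [0] * 95
    y_pred = [0] * 100
    print(sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true))  # accuracy → 0.95
    print(g_mean(y_true, y_pred))                                     # → 0.0
```

The example shows why accuracy alone is misleading on data like this: the trivial majority-class predictor reaches 95% accuracy but a G-Mean of zero.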

Citation (APA)

Sisodia, D. S., & Verma, U. (2018). The impact of data re-sampling on learning performance of class imbalanced bankruptcy prediction models. International Journal on Electrical Engineering and Informatics, 10(3), 433–446. https://doi.org/10.15676/ijeei.2018.10.3.2