Kmeans-SMOTE Integration for Handling Imbalance Data in Classifying Financial Distress Companies using SVM and Naïve Bayes

Didit Johar Maulana; Siti Saadah; Prasti Eko Yunanto

Journal Article, Open Access

Jurnal RESTI (2024) 8(1) 54-61

DOI: 10.29207/resti.v8i1.5140

Abstract

Imbalanced data presents a significant challenge in machine learning because it biases classification outcomes toward the majority class. The issue is especially pronounced in financial distress classification, where distressed companies are scarce in real-world datasets. This study mitigates data imbalance in financial distress data using Kmeans-SMOTE, an approach that combines K-means clustering with the Synthetic Minority Oversampling Technique (SMOTE). Naïve Bayes and Support Vector Machine (SVM) classifiers are applied to a financial distress dataset from Kaggle to evaluate the effectiveness of Kmeans-SMOTE. Experimental results show that SVM outperforms Naïve Bayes, achieving an accuracy of 99.1%, an F1-score of 99.1%, an area under the precision-recall curve (AUPRC) of 99.1%, and a geometric mean (G-mean) of 98.1%. These results indicate that Kmeans-SMOTE balances the data effectively and yields a substantial improvement in classification performance.
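The abstract describes Kmeans-SMOTE only at a high level. The sketch below is a rough illustration of the usual cluster, filter, and oversample steps, written in plain Python with only the standard library. It is not the authors' code. Several details are assumptions: the farthest-first seeding, the defaults `k_clusters=4`, `k_neighbors=3` and `balance_threshold=0.5`, the sparsity weighting with an exponent equal to the number of features, and the per-sample random draw of a cluster. The SVM and Naïve Bayes classifiers are not shown. The `g_mean` helper computes the G-mean metric reported above as the square root of sensitivity times specificity.

```python
import math
import random


def kmeans(points, k, iters=20, rng=None):
    """Lloyd's k-means with farthest-first seeding; returns one cluster label per point."""
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Next seed: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members; an empty cluster keeps its center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def kmeans_smote(X, y, minority=1, k_clusters=4, k_neighbors=3,
                 balance_threshold=0.5, seed=0):
    """Oversample `minority` until both classes have equal counts (binary labels assumed)."""
    rng = random.Random(seed)
    labels = kmeans(X, k_clusters, rng=rng)
    n_new = sum(t != minority for t in y) - sum(t == minority for t in y)

    # Filter: keep clusters whose minority share reaches balance_threshold.
    chosen = {}
    for c in range(k_clusters):
        members = [t for t, lab in zip(y, labels) if lab == c]
        mino = [x for x, t, lab in zip(X, y, labels) if lab == c and t == minority]
        if len(mino) >= 2 and len(mino) / len(members) >= balance_threshold:
            chosen[c] = mino
    if n_new <= 0 or not chosen:
        return list(X), list(y)

    # Weight: sparser clusters (mean minority distance ** exponent / count) get more samples.
    # Computed in log space to avoid overflow or underflow with many features.
    exponent = len(X[0])
    log_sparsity = {}
    for c, mino in chosen.items():
        dists = [math.dist(a, b) for i, a in enumerate(mino) for b in mino[i + 1:]]
        mean_d = max(sum(dists) / len(dists), 1e-12)
        log_sparsity[c] = exponent * math.log(mean_d) - math.log(len(mino))
    top = max(log_sparsity.values())
    clusters = list(chosen)
    weights = [math.exp(log_sparsity[c] - top) for c in clusters]

    # Oversample: SMOTE interpolation restricted to the minority points of one cluster.
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        mino = chosen[rng.choices(clusters, weights=weights)[0]]
        i = rng.randrange(len(mino))
        others = sorted((j for j in range(len(mino)) if j != i),
                        key=lambda j: math.dist(mino[i], mino[j]))
        neighbour = mino[rng.choice(others[:k_neighbors])]
        gap = rng.random()
        X_out.append(tuple(a + gap * (b - a) for a, b in zip(mino[i], neighbour)))
        y_out.append(minority)
    return X_out, y_out


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    neg = [p for t, p in zip(y_true, y_pred) if t != positive]
    sensitivity = sum(p == positive for p in pos) / len(pos)
    specificity = sum(p != positive for p in neg) / len(neg)
    return math.sqrt(sensitivity * specificity)
```

Call `kmeans_smote(X, y)` with `X` as a list of numeric tuples and `y` as binary labels. It returns the original rows followed by synthetic minority rows, and both classes end up with equal counts whenever at least one cluster passes the filter. Because interpolation stays inside a single minority-dominated cluster, synthetic points are less likely to land in majority regions than with plain SMOTE. For real experiments, the imbalanced-learn package provides `imblearn.over_sampling.KMeansSMOTE`. The abstract does not state which implementation the authors used.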

Citation (APA)

Maulana, D. J., Saadah, S., & Yunanto, P. E. (2024). Kmeans-SMOTE Integration for Handling Imbalance Data in Classifying Financial Distress Companies using SVM and Naïve Bayes. Jurnal RESTI, 8(1), 54–61. https://doi.org/10.29207/resti.v8i1.5140