Kmeans-SMOTE Integration for Handling Imbalance Data in Classifying Financial Distress Companies using SVM and Naïve Bayes

Abstract

Imbalanced data poses a significant challenge in machine learning because it biases classifiers toward the majority class. The problem is especially pronounced in financial distress classification, where distressed companies are scarce in real-world datasets. This study aims to mitigate class imbalance in financial distress data using Kmeans-SMOTE, which combines K-means clustering with the Synthetic Minority Oversampling Technique (SMOTE). To evaluate the effectiveness of Kmeans-SMOTE, classification approaches including Naïve Bayes and Support Vector Machine (SVM) are applied to a financial distress dataset from Kaggle. Experimental results show that SVM outperforms Naïve Bayes, achieving an accuracy of 99.1%, an F1-score of 99.1%, an Area Under the Precision-Recall Curve (AUPRC) of 99.1%, and a geometric mean (G-mean) of 98.1%. Based on these results, Kmeans-SMOTE balances the data effectively and leads to a notable improvement in classification performance.
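
The idea of combining K-means clustering with SMOTE can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the function names (`dist2`, `kmeans`, `kmeans_smote`), the parameters, and the cluster filter are assumptions made for illustration, and the sparsity-based weighting and nearest-neighbour selection of the full Kmeans-SMOTE method are omitted. The sketch clusters all samples, keeps clusters in which the minority class is not outnumbered, and interpolates between pairs of minority points inside those clusters until the classes are balanced.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centre.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Move every centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def kmeans_smote(X, y, minority=1, k=2, seed=0):
    """Simplified Kmeans-SMOTE: oversample the minority class to a 50/50 split.

    Assumes len(X) >= k and at least two minority samples (hypothetical
    constraints of this sketch, not of the published method).
    """
    rng = random.Random(seed)
    n_min = sum(1 for t in y if t == minority)
    n_needed = (len(y) - n_min) - n_min
    if n_needed <= 0:
        return list(X), list(y)

    labels = kmeans(X, k, seed=seed)

    # Keep clusters where minority points are at least as numerous as the rest.
    eligible = []
    for c in range(k):
        mins = [x for x, t, lab in zip(X, y, labels) if lab == c and t == minority]
        n_maj = sum(1 for t, lab in zip(y, labels) if lab == c and t != minority)
        if len(mins) >= 2 and len(mins) >= n_maj:
            eligible.append(mins)
    if not eligible:
        # Fallback (assumption): treat all minority points as one cluster.
        eligible = [[x for x, t in zip(X, y) if t == minority]]

    synthetic = []
    for _ in range(n_needed):
        # SMOTE step: a random point on the segment between two minority points.
        a, b = rng.sample(rng.choice(eligible), 2)
        u = rng.random()
        synthetic.append([ai + u * (bi - ai) for ai, bi in zip(a, b)])

    return list(X) + synthetic, list(y) + [minority] * n_needed
```

Calling `kmeans_smote(X, y)` on a list of feature vectors `X` and labels `y` returns the original samples followed by the synthetic minority samples. Because each synthetic point lies between two existing minority points of the same cluster, oversampling stays inside regions already occupied by the minority class, which is the motivation for clustering before applying SMOTE.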

Cite

Maulana, D. J., Saadah, S., & Yunanto, P. E. (2024). Kmeans-SMOTE Integration for Handling Imbalance Data in Classifying Financial Distress Companies using SVM and Naïve Bayes. Jurnal RESTI, 8(1), 54–61. https://doi.org/10.29207/resti.v8i1.5140
