Abstract
Imbalanced data presents significant challenges in machine learning because it biases classification outcomes towards the majority class. The issue is especially pronounced in financial distress classification, where distressed companies are scarce in real-world datasets. This study aims to mitigate data imbalance in the classification of financially distressed companies using Kmeans-SMOTE, a method that combines K-means clustering with the Synthetic Minority Oversampling Technique (SMOTE). To evaluate the effectiveness of Kmeans-SMOTE, Naïve Bayes and Support Vector Machine (SVM) classifiers are applied to a financial distress dataset from Kaggle. Experimental results show that SVM outperforms Naïve Bayes, achieving an accuracy of 99.1%, an F1-score of 99.1%, an Area Under the Precision-Recall Curve (AUPRC) of 99.1%, and a Geometric mean (Gmean) of 98.1%. These results indicate that Kmeans-SMOTE balances the data effectively and leads to a substantial improvement in classification performance.
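As an illustration of the idea behind combining K-means with SMOTE, the following is a minimal sketch in plain Python and is not the authors' implementation. It makes several simplifying assumptions: the function names (`simple_kmeans`, `kmeans_smote_sketch`) are hypothetical, cluster centres are initialised by a deterministic farthest-point rule, a cluster is treated as eligible when minority samples make up at least half of it, and synthetic points are interpolated between random pairs of minority samples in the same cluster rather than between nearest neighbours.

```python
import math
import random


def simple_kmeans(points, k, iters=20):
    """Tiny Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each centre to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def kmeans_smote_sketch(X, y, minority=1, k=2, seed=0):
    """Oversample the minority class inside minority-dominated clusters."""
    rng = random.Random(seed)
    labels = simple_kmeans(X, k)
    n_minority = sum(1 for t in y if t == minority)
    n_needed = (len(y) - n_minority) - n_minority  # samples needed to balance

    # Assumption: a cluster is eligible if it holds at least two minority
    # samples and minority samples are at least half of the cluster.
    eligible = []
    for c in range(k):
        mins = [x for x, t, lab in zip(X, y, labels) if lab == c and t == minority]
        size = sum(1 for lab in labels if lab == c)
        if len(mins) >= 2 and 2 * len(mins) >= size:
            eligible.append(mins)
    if not eligible or n_needed <= 0:
        return list(X), list(y)

    synthetic = []
    for i in range(n_needed):
        mins = eligible[i % len(eligible)]  # spread new samples across clusters
        a, b = rng.sample(mins, 2)
        gap = rng.random()
        # SMOTE-style interpolation on the segment between a and b.
        synthetic.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b)))
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)


# Toy data: six majority points near the origin, three minority points far away.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
     (10, 10), (10, 11), (11, 10)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_res, y_res = kmeans_smote_sketch(X, y, minority=1, k=2)
```

Clustering first restricts interpolation to regions where the minority class actually lives, so synthetic samples are less likely to fall inside majority territory than with SMOTE applied to the whole dataset at once.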
Cite
Maulana, D. J., Saadah, S., & Yunanto, P. E. (2024). Kmeans-SMOTE Integration for Handling Imbalance Data in Classifying Financial Distress Companies using SVM and Naïve Bayes. Jurnal RESTI, 8(1), 54–61. https://doi.org/10.29207/resti.v8i1.5140