A Comparative Analysis of Smote and CSSF Techniques for Diabetes Classification Using Imbalanced Data

Abstract

Diabetes, a prevalent chronic metabolic disorder, places a significant burden on healthcare systems worldwide. Accurate and timely diagnosis is crucial for effective management and for preventing complications. Machine learning is a promising aid to diagnosis, but it is often hampered by class imbalance in the datasets, particularly the underrepresentation of diabetic cases. To address this issue, we introduce Cluster-based Synthetic Sample Filtering (CSSF), a method that improves the quality of synthetic samples through clustering and filtering. Building on the Synthetic Minority Over-sampling Technique (SMOTE), CSSF generates synthetic samples within clusters and eliminates noisy instances, thereby improving classification accuracy and reliability. A comparative analysis demonstrates CSSF’s effectiveness in mitigating class imbalance. Initial models achieved 67% accuracy, which improved to 82% after SMOTE preprocessing and to 90% after CSSF preprocessing. Notably, Support Vector Machines (SVM), neural networks (deep learning) and random forest each achieved 92% accuracy after CSSF preprocessing. Decision tree and K-Nearest Neighbors (KNN) classifiers also performed well after CSSF preprocessing. Crucially, CSSF consistently outperformed SMOTE in precision, recall and F1-score. Recognizing the importance of ethical AI practices, this study also addresses ethical considerations and potential biases in machine learning for healthcare data analysis, promoting fairness, transparency and responsible use of AI. This research underscores the need for ethical and effective approaches to class imbalance in diabetes classification.
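
The abstract describes CSSF only at a high level: synthetic minority samples are generated within clusters, and noisy ones are filtered out. The following is a minimal sketch of that general idea, not the authors' implementation. The clustering method (a tiny k-means), the pairing rule (a random partner from the same cluster instead of SMOTE's k-nearest neighbour) and the noise filter (drop a synthetic point that lies closer to a majority sample than to any minority sample) are all assumptions made for illustration, as are the function names and the toy data.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, rng, iters=20):
    """Tiny k-means (assumed clustering step); returns non-empty clusters."""
    centroids = rng.sample(points, min(k, len(points)))
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        clusters = [c for c in clusters if c]  # drop empty clusters
        centroids = [mean(c) for c in clusters]
    return clusters


def interpolate(a, b, rng):
    """SMOTE-style synthetic point on the segment between a and b."""
    t = rng.random()
    return [x + t * (y - x) for x, y in zip(a, b)]


def oversample_with_filter(minority, majority, n_new, k_clusters=2, seed=0):
    """Generate up to n_new synthetic minority points inside clusters and
    keep only those at least as close to a minority point as to a majority one."""
    rng = random.Random(seed)
    clusters = [c for c in kmeans(minority, k_clusters, rng) if len(c) >= 2]
    if not clusters:
        return []
    kept = []
    for _ in range(n_new):
        a, b = rng.sample(rng.choice(clusters), 2)  # pair within one cluster
        s = interpolate(a, b, rng)
        d_min = min(dist(s, p) for p in minority)
        d_maj = min(dist(s, p) for p in majority)
        if d_min <= d_maj:  # hypothetical noise filter, not from the paper
            kept.append(s)
    return kept


# Toy two-feature data (hypothetical): two minority groups, a few majority points.
minority = [[0, 0], [0, 1], [1, 0], [1, 1],
            [10, 10], [10, 11], [11, 10], [11, 11]]
majority = [[5, 5], [5, 6], [6, 5], [20, 20]]
synthetic = oversample_with_filter(minority, majority, n_new=20)
print(len(synthetic), "synthetic samples kept")
```

Interpolating only between points of the same cluster keeps synthetic samples inside regions the minority class already occupies, and the filter removes candidates that drift toward the majority class. The paper should be consulted for the actual clustering and filtering criteria.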

Cite

Aubaidan, B. H., Kadir, R. A., & Ijab, M. T. (2024). A Comparative Analysis of Smote and CSSF Techniques for Diabetes Classification Using Imbalanced Data. Journal of Computer Science, 20(9), 1146–1165. https://doi.org/10.3844/JCSSP.2024.1146.1165