Optimized stacking ensemble for early-stage diabetes mellitus prediction

Abstract

This paper presents an optimized stacking-based hybrid machine learning approach for predicting early-stage diabetes mellitus (DM) using the PIMA Indian diabetes (PID) dataset and the early-stage diabetes risk prediction (ESDRP) dataset. The methodology handles missing values through mean imputation, balances the classes with the synthetic minority over-sampling technique (SMOTE), normalizes the features, and uses a stratified train-test split. Logistic regression (LR), naïve Bayes (NB), AdaBoost with support vector machines (AdaBoost+SVM), an artificial neural network (ANN), and k-nearest neighbors (k-NN) serve as base learners (level 0), and a random forest (RF) meta-classifier serves as the level 1 model that combines their predictions. The proposed model achieves accuracies of 99.7222% on the ESDRP dataset and 94.2085% on the PID dataset, exceeding results reported in the existing literature by 10.2085 to 16.7222 percentage points. The stacking-based hybrid model benefits early-stage DM prediction by combining multiple base learners through a meta-classifier. SMOTE addresses class imbalance, while feature normalization puts the features on a comparable scale during training. The findings suggest that the proposed approach holds promise for early-stage DM prediction, enabling timely interventions and preventive measures.
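
The preprocessing steps named in the abstract (mean imputation, SMOTE, feature normalization, and a stratified train-test split) can be illustrated with a minimal standard-library sketch. This is not the authors' code. The function names, the choice of min-max scaling, the neighbor count `k`, the order in which the steps are applied, and the toy data are all assumptions made for illustration; the abstract does not specify them.

```python
import random
from collections import defaultdict

def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def min_max_scale(rows):
    """Rescale each column to [0, 1] (assumed normalization); constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(r, lows, spans)]
            for r in rows]

def stratified_split(rows, labels, test_fraction, rng):
    """Split class by class so both parts keep roughly the original class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(train_idx), pick(test_idx)

def smote(rows, labels, minority, k, rng):
    """SMOTE-style oversampling: add points interpolated between a minority sample
    and one of its k nearest minority neighbours until the minority class matches
    the largest class. Needs at least two minority samples."""
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    target = max(labels.count(y) for y in set(labels))
    new_rows, new_labels = list(rows), list(labels)
    while new_labels.count(minority) < target:
        base = rng.choice(minority_rows)
        # Nearest minority neighbours of `base` by squared Euclidean distance.
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(r, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        new_rows.append([a + gap * (b - a) for a, b in zip(base, other)])
        new_labels.append(minority)
    return new_rows, new_labels

# Hypothetical toy data (not taken from PID or ESDRP); None marks a missing value.
rng = random.Random(0)
X = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0],
     [6.0, 60.0], [7.0, 70.0], [8.0, 80.0], [9.0, 85.0], [10.0, 100.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

X_clean = mean_impute(X)
X_scaled = min_max_scale(X_clean)
(train_X, train_y), (test_X, test_y) = stratified_split(X_scaled, y, 0.3, rng)
bal_X, bal_y = smote(train_X, train_y, minority=1, k=2, rng=rng)
```

After these calls, `bal_X` and `bal_y` hold a class-balanced training set and `test_X`, `test_y` hold the held-out portion. In practice a library implementation of each step would normally be preferred over hand-written versions like these.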

Cite

Aman, & Chhillar, R. S. (2023). Optimized stacking ensemble for early-stage diabetes mellitus prediction. International Journal of Electrical and Computer Engineering, 13(6), 7048–7055. https://doi.org/10.11591/ijece.v13i6.pp7048-7055
