PE_DIM: An Efficient Probabilistic Ensemble Classification Algorithm for Diabetes Handling Class Imbalance Missing Values

Abstract

Diabetes has become one of the seven major diseases leading to human death, so predicting it early, in order to prevent it, is critical. However, many existing studies predict diabetes with little consideration of missing values and class imbalance in the data. To overcome these problems, this paper proposes an efficient Probabilistic Ensemble classification algorithm for Diabetes handling class Imbalance Missing values (PE_DIM), which effectively handles both missing values and class imbalance and improves classification accuracy. First, a novel method based on Local Median-based Gaussian Naive Bayes (LMeGNB) is proposed to impute the missing values. It is combined with the K-means SMOTE method to rebalance the positive and negative diabetes samples, yielding normalized, balanced data. Then, a probability-based multi-stage ensemble is used to build ensemble models from different types of machine learning algorithms. Integrating extreme gradient boosting, random forests, and weighted k-nearest neighbors yields the highest classification accuracy, 94.53%, on the Pima Indian diabetes dataset. Finally, to evaluate the PE_DIM model, the experiments also consider two further diabetes datasets, RSMH and Tabriz, to demonstrate the generality of the method for diabetes prediction. Additionally, several statistical tests on the area under the receiver operating characteristic curve (AUC) are used to compare the performance of the different classification methods. The final results show that, under 5-fold cross-validation, the proposed method achieves the best (first) average rank and differs significantly from the base classifiers. Promisingly, the proposed method effectively addresses missing values and class imbalance in diabetes data and can play a significant role in intelligent medical treatment and diabetes research.
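The abstract names two preprocessing steps: imputing missing values and then oversampling the minority (positive) class. The sketch below is only a simplified illustration under stated assumptions, not the authors' method. The hypothetical `impute_class_median` fills each gap with the median of that feature within the same class, a plain stand-in for LMeGNB (which the abstract does not specify in detail). The hypothetical `smote` implements plain SMOTE interpolation, x + λ(nn − x); K-means SMOTE additionally clusters the data first and oversamples inside suitable clusters, a stage omitted here. The toy data are made up.

```python
import random
from statistics import median


def impute_class_median(rows, labels):
    """Fill each None with the median of that feature among rows of the same class.

    Illustrative stand-in only; this is NOT the paper's LMeGNB method.
    Assumes every class has at least one observed value for every feature.
    """
    n_features = len(rows[0])
    medians = {}
    for cls in set(labels):
        for j in range(n_features):
            observed = [r[j] for r, y in zip(rows, labels) if y == cls and r[j] is not None]
            medians[(cls, j)] = median(observed)
    return [[medians[(y, j)] if v is None else v for j, v in enumerate(r)]
            for r, y in zip(rows, labels)]


def smote(minority, n_new, k=2, seed=0):
    """Plain SMOTE: create n_new synthetic minority samples by interpolation.

    Each new point is x + lam * (nn - x), where x is a random minority sample,
    nn is one of its k nearest minority neighbours and lam is drawn from [0, 1).
    The clustering stage of K-means SMOTE is intentionally omitted.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Sort the other minority samples by squared Euclidean distance to x.
        others = [m for idx, m in enumerate(minority) if idx != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        nn = rng.choice(others[:k])
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Toy data (made up): None marks a missing value; label 1 is the minority class.
rows = [[1.0, None], [3.0, 4.0], [5.0, 6.0], [None, 8.0], [2.0, 2.0]]
labels = [0, 0, 0, 1, 1]
filled = impute_class_median(rows, labels)
# filled == [[1.0, 5.0], [3.0, 4.0], [5.0, 6.0], [2.0, 8.0], [2.0, 2.0]]
minority = [r for r, y in zip(filled, labels) if y == 1]
new_points = smote(minority, n_new=3)
```

For real experiments, the imbalanced-learn package provides a `KMeansSMOTE` oversampler that includes the clustering step.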
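The "probability-based" ensemble combines the class probabilities produced by its base learners (extreme gradient boosting, random forests, and weighted k-nearest neighbors). The abstract does not describe the multi-stage scheme, so the sketch below shows only one common single-stage fusion, weighted averaging of class probabilities (soft voting), as an assumption. `soft_vote` is a hypothetical helper, and the three probability tables are invented placeholders standing in for the outputs of the three models.

```python
def soft_vote(prob_lists, weights=None):
    """Fuse class-probability outputs of several classifiers by weighted averaging.

    prob_lists[m][i][c] is model m's probability that sample i belongs to class c.
    Returns the fused probabilities and the argmax class for each sample.
    """
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0] * n_models  # equal weights, i.e. a plain mean
    total = sum(weights)
    n_samples = len(prob_lists[0])
    n_classes = len(prob_lists[0][0])
    fused = [
        [sum(w * probs[i][c] for w, probs in zip(weights, prob_lists)) / total
         for c in range(n_classes)]
        for i in range(n_samples)
    ]
    predictions = [max(range(n_classes), key=row.__getitem__) for row in fused]
    return fused, predictions


# Hypothetical probability outputs (not from the paper) for two samples, classes [0, 1].
xgb_probs = [[0.2, 0.8], [0.6, 0.4]]
rf_probs = [[0.4, 0.6], [0.7, 0.3]]
wknn_probs = [[0.6, 0.4], [0.3, 0.7]]  # e.g. a distance-weighted k-NN

fused, predictions = soft_vote([xgb_probs, rf_probs, wknn_probs])
print(predictions)  # [1, 0]
```

In scikit-learn, `VotingClassifier(voting="soft")` performs this kind of averaging over fitted estimators, and `KNeighborsClassifier(weights="distance")` gives a distance-weighted k-NN.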
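The abstract reports that the method has the best average rank on AUC after 5-fold cross-validation and that statistical tests separate it from the base classifiers, but it does not name the tests. As an assumption, the sketch below uses the Friedman test, a common choice for comparing several classifiers over folds or datasets: each fold ranks the methods by AUC (rank 1 = best, ties share the mean rank), and the average ranks R_j over N blocks and k methods give χ²_F = 12N / (k(k+1)) · (Σ R_j² − k(k+1)²/4), here without tie correction. The helper names and AUC values are hypothetical.

```python
def average_ranks(scores):
    """scores[d][j] is the AUC of method j on fold d (higher is better).

    Returns each method's rank averaged over folds; tied methods share the mean rank.
    """
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda j: -row[j])
        ranks = [0.0] * n_methods
        i = 0
        while i < n_methods:
            j = i
            # Extend the group while the next method has the same score.
            while j + 1 < n_methods and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
            for p in range(i, j + 1):
                ranks[order[p]] = shared
            i = j + 1
        totals = [t + r for t, r in zip(totals, ranks)]
    return [t / len(scores) for t in totals]


def friedman_chi2(avg_ranks, n_blocks):
    """Friedman statistic from average ranks (no tie correction)."""
    k = len(avg_ranks)
    return 12 * n_blocks / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)


# Hypothetical AUCs for 3 methods over 5 folds (not the paper's results).
scores = [
    [0.90, 0.85, 0.80],
    [0.92, 0.88, 0.81],
    [0.89, 0.90, 0.79],
    [0.91, 0.86, 0.86],
    [0.93, 0.87, 0.82],
]
avg = average_ranks(scores)           # about [1.2, 1.9, 2.9]
chi2 = friedman_chi2(avg, len(scores))  # about 7.3
```

With k − 1 = 2 degrees of freedom, the 5% critical value of χ² is about 5.991, so these made-up numbers would reject the hypothesis of equal performance; a post-hoc test would then compare individual pairs. SciPy's `scipy.stats.friedmanchisquare` computes the statistic directly from raw scores.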

Citation (APA style)

Jia, L., Wang, Z., Lv, S., & Xu, Z. (2022). PE_DIM: An Efficient Probabilistic Ensemble Classification Algorithm for Diabetes Handling Class Imbalance Missing Values. IEEE Access, 10, 107459–107476. https://doi.org/10.1109/ACCESS.2022.3212067
