Improving the Diabetes Diagnosis Prediction Rate Using Data Preprocessing, Data Augmentation and Recursive Feature Elimination Method

E. Sabitha; M. Durgadevi

Journal article (open access)

International Journal of Advanced Computer Science and Applications (2022) 13(9) 921-930

DOI: 10.14569/IJACSA.2022.01309107

Abstract

Hyperglycemia is a symptom of diabetes mellitus, a metabolic condition caused by the body's inability to produce enough insulin or to respond to it. If diabetes is not detected in time or adequately managed, it can damage the body's organs. Many years of research into diabetes diagnosis have led to suitable methods for diabetes prediction; however, there is still scope for improving their precision. The paper's primary objective is to emphasize the value of data preprocessing, feature selection, and data augmentation in disease prediction: these techniques can help classification algorithms perform more effectively in the diagnosis and prediction of diabetes. The proposed method is applied to diabetes diagnosis and prediction on the PIMA Indian Diabetes dataset. This study provides a systematic framework for a comparative analysis of classification performance organized into three categories. The first category compares the model's performance with and without data preprocessing. The second category compares the performance of five alternative algorithms using the Recursive Feature Elimination (RFE) feature selection method. The third category is data augmentation, performed with SMOTE oversampling; results are compared with and without SMOTE oversampling. Six classifiers were evaluated: logistic regression (LR), random forest (RF), decision tree (DT), support vector classifier (SVC), Gaussian naive Bayes (GNB), and k-nearest neighbors (KNN). On the PIMA Indian Diabetes dataset, combining data preprocessing, RFE feature selection with Random Forest Regression, and SMOTE oversampling produced accuracy scores of 81.25% with RF, 81.16% with DT, and 82.5% with SVC; RF, DT, and SVC achieved the best accuracy of the six classifiers. The comparative study shows the value of data preprocessing, feature selection, and data augmentation in the disease prediction process and how each affects performance.
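The abstract does not say which preprocessing steps were applied. A common choice for the PIMA Indian Diabetes data, assumed here, is to treat physiologically impossible zeros in columns such as Glucose, BloodPressure, SkinThickness, Insulin and BMI as missing values, impute them with the column median, and then rescale the features. The stdlib-only Python sketch below illustrates that idea; the function names, the column list and the made-up rows are illustrative assumptions, not the authors' code.

```python
from statistics import median

# Assumption: in the PIMA data a 0 in these columns means "not measured".
ZERO_AS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def impute_zeros_with_median(rows, columns=ZERO_AS_MISSING):
    """Return copies of `rows` (a list of dicts) with zeros in `columns`
    replaced by the median of that column's non-zero values."""
    medians = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] != 0]
        medians[col] = median(observed) if observed else 0
    cleaned = []
    for r in rows:
        new = dict(r)  # copy so the input rows are left unchanged
        for col in columns:
            if new[col] == 0:
                new[col] = medians[col]
        cleaned.append(new)
    return cleaned


def min_max_scale(rows, columns):
    """Return copies of `rows` with each column in `columns` rescaled to [0, 1].
    A constant column is mapped to 0.0."""
    lo = {c: min(r[c] for r in rows) for c in columns}
    hi = {c: max(r[c] for r in rows) for c in columns}
    scaled = []
    for r in rows:
        new = dict(r)
        for c in columns:
            span = hi[c] - lo[c]
            new[c] = (r[c] - lo[c]) / span if span else 0.0
        scaled.append(new)
    return scaled


# Made-up rows for illustration (not taken from the paper).
rows = [
    {"Glucose": 148, "BMI": 33.6, "Outcome": 1},
    {"Glucose": 0, "BMI": 26.6, "Outcome": 0},
    {"Glucose": 183, "BMI": 0, "Outcome": 1},
]
cleaned = impute_zeros_with_median(rows, columns=("Glucose", "BMI"))
scaled = min_max_scale(cleaned, columns=("Glucose", "BMI"))
print(cleaned[1]["Glucose"])  # 165.5, the median of 148 and 183
```

In a real evaluation the medians and min/max bounds would be computed on the training split only and then reused on the test split, so that no information from the test data leaks into preprocessing.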
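The abstract pairs Recursive Feature Elimination with a Random Forest regressor as the ranking estimator. RFE itself is a simple loop: fit the estimator on the remaining features, drop the feature with the lowest importance, and repeat until the desired number is left. To keep the sketch runnable with the Python standard library alone, it swaps in a ridge-regularized linear regression on standardized features as the estimator (an assumption, not the paper's choice). `rfe`, `ridge_importances` and the `importance_fn` hook are hypothetical names; a Random Forest's importances could be passed through that hook instead. The abstract does not state how many features were kept.

```python
def _solve(a, b):
    """Solve the linear system a @ w = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_importances(X, y, lam=1e-3):
    """Stand-in estimator (assumption): |coefficients| of a ridge regression
    fitted on standardized features and a centered target."""
    n, d = len(X), len(X[0])
    z = []
    for col in zip(*X):
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5
        z.append([(v - mu) / sd if sd else 0.0 for v in col])
    y_mu = sum(y) / n
    yc = [v - y_mu for v in y]
    # Normal equations: (Z^T Z + lam * I) w = Z^T y
    a = [[sum(zi * zj for zi, zj in zip(z[i], z[j])) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(zi * yi for zi, yi in zip(z[i], yc)) for i in range(d)]
    return [abs(w) for w in _solve(a, b)]


def rfe(X, y, n_features_to_select, importance_fn=ridge_importances):
    """Recursive feature elimination: refit on the remaining features,
    drop the least important one, and repeat until the target count is left.
    Returns (kept feature indices, eliminated indices in removal order)."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_features_to_select:
        subset = [[row[j] for j in remaining] for row in X]
        scores = importance_fn(subset, y)
        worst = min(range(len(remaining)), key=scores.__getitem__)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


# Synthetic demo data (hypothetical): the target depends only on features 0 and 2.
X_demo = [[i % 7, (i * 3) % 5, (i * 5) % 11, (i * 2) % 3] for i in range(200)]
y_demo = [3 * r[0] - 2 * r[2] for r in X_demo]
kept, dropped = rfe(X_demo, y_demo, n_features_to_select=2)
```

For real use, scikit-learn's `sklearn.feature_selection.RFE` with a `RandomForestRegressor` as its `estimator` runs the same elimination loop, ranking features by the forest's impurity-based `feature_importances_`.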
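SMOTE (Synthetic Minority Over-sampling Technique) balances classes by creating synthetic minority samples on the line segment between a minority sample and one of its k nearest minority neighbors. The following is a minimal stdlib-only sketch of that basic interpolation step, assuming a binary label and numeric feature lists; `smote` and `balance_with_smote` are hypothetical helpers, and the abstract does not report the paper's SMOTE parameters (such as k or the target class ratio).

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Basic SMOTE: create n_synthetic points, each interpolated between a
    randomly chosen minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each sample's k nearest neighbors (Euclidean), excluding itself.
    neighbors = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, other), j)
                       for j, other in enumerate(minority) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nb = minority[rng.choice(neighbors[i])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def balance_with_smote(X, y, minority_label=1, k=5, seed=None):
    """Assumes binary labels: add synthetic minority rows until both classes
    have the same count. Returns new lists; the inputs are not modified."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = sum(1 for label in y if label != minority_label) - len(minority)
    new = smote(minority, max(n_needed, 0), k=k, seed=seed)
    return X + new, y + [minority_label] * len(new)


# Toy data (made up): six majority rows, two minority rows.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.2], [0.1, 0.1],
     [1.0, 1.0], [2.0, 3.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance_with_smote(X, y, minority_label=1, k=5, seed=0)
print(y_bal.count(0), y_bal.count(1))  # 6 6
```

Oversampling should be applied to the training data only; synthetic points derived from test samples would inflate the reported accuracy. The `imbalanced-learn` package provides a maintained `SMOTE` class for real use.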

Citation (APA)

Sabitha, E., & Durgadevi, M. (2022). Improving the Diabetes Diagnosis Prediction Rate Using Data Preprocessing, Data Augmentation and Recursive Feature Elimination Method. International Journal of Advanced Computer Science and Applications, 13(9), 921–930. https://doi.org/10.14569/IJACSA.2022.01309107
