PRINCIPAL COMPONENT ANALYSIS IMPLEMENTATION ON MACHINE LEARNING IN DIABETES CLASSIFICATION

Abstract

Diabetes mellitus, a global health burden linked to increased cancer risk, can be identified through variables such as BMI, age, blood sugar, and HbA1c. This study explores several machine learning techniques for diabetes prediction, emphasizing the role of dimensionality reduction and feature selection in improving model accuracy. Our aim is to compare the performance measures of multiple machine learning algorithms on the original data and on the same data after a sampling method or principal component analysis (PCA) has been applied. The study uses Kaggle's "Diabetes Prediction Dataset", which has 100,000 entries with eight features and one target variable related to diabetes. In the experiment, the data were organized into three datasets: 1) the whole dataset, 2) a dataset containing males only, and 3) a dataset containing females only. Each dataset was used to train multiple machine learning models: K-Nearest Neighbor (KNN), Decision Tree (DT), Support Vector Machines (SVM), XGBoost (XGB), and Random Forest (RF). The findings show that XGB outperformed the other models on the imbalanced dataset, with an F1-score of 80.87. In the gender-based classification, Random Forest performed best for males, with an F1-score of 80.34, while XGB performed best for females, with an F1-score of 81.9.
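The abstract reports F1-scores rather than accuracy for an imbalanced dataset. The sketch below illustrates, under stated assumptions, why that choice matters: on skewed labels a classifier that never predicts the positive class can still reach high accuracy while its F1-score is zero. The labels and predictions are made-up toy values, not the study's data, and the helper names `f1_score` and `accuracy` are defined here for illustration only.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and/or recall is zero or undefined,
        # so the score is reported as 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy labels: 10 positive cases among 100 (imbalanced).
y_true = [1] * 10 + [0] * 90

# Baseline that always predicts the negative class.
always_negative = [0] * 100

# Made-up classifier: finds 8 of the 10 positives and raises 2 false alarms.
imperfect = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 88

for name, y_pred in [("always negative", always_negative), ("imperfect", imperfect)]:
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f}, "
          f"F1={f1_score(y_true, y_pred):.2f}")
```

In this toy setting the always-negative baseline scores 0.90 accuracy but 0.00 F1, while the imperfect classifier scores 0.96 accuracy and 0.80 F1, so F1 separates the two far more clearly than accuracy does.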

Cite

Tantowen, M., Putra, K., Isnan, M., & Pardamean, B. (2024). PRINCIPAL COMPONENT ANALYSIS IMPLEMENTATION ON MACHINE LEARNING IN DIABETES CLASSIFICATION. Communications in Mathematical Biology and Neuroscience, 2024. https://doi.org/10.28919/cmbn/8492
