A Diabetes Prediction System Based on Incomplete Fused Data Sources


Abstract

In recent years, the diabetes population has grown younger, which makes timely and effective prediction of diabetes a key problem, especially when only a single data source is available. Meanwhile, many data sources on diabetes patients have been collected around the world, and integrating these heterogeneous sources is extremely important for accurate prediction. The predictors can differ across the data sources used to predict diabetes: some features exist only in certain sources, so fusing the sources produces missing values. To account for the uncertainty of the missing values in the fused dataset, multiple imputation and a method based on graph representation are used to impute them. A logistic regression model and a stacking strategy are then applied for diabetes training and prediction on the fused dataset. It is shown that combining heterogeneous datasets and imputing the missing values produced in the fusion process can effectively improve the performance of diabetes prediction. In addition, the proposed method can be extended to any scenario in which heterogeneous datasets share the same label types but have different feature attributes.
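
To make the fusion step concrete, the following is a minimal sketch of how stacking two sources with different feature sets produces missing values, and of how several completed copies of the fused data can be generated. It is not the authors' implementation: the records, the feature names (`glucose`, `bmi`, `insulin`), and the helper functions `fuse` and `multiply_impute` are hypothetical, and the random draw from observed values is a deliberately simplified stand-in for the multiple imputation and graph-based methods the abstract mentions.

```python
import random

# Hypothetical records: both sources share "glucose" and "label",
# but "bmi" exists only in source A and "insulin" only in source B.
source_a = [
    {"glucose": 148.0, "bmi": 33.6, "label": 1},
    {"glucose": 85.0, "bmi": 26.6, "label": 0},
]
source_b = [
    {"glucose": 183.0, "insulin": 94.0, "label": 1},
    {"glucose": 89.0, "insulin": 168.0, "label": 0},
]


def fuse(*sources):
    """Stack all records over the union of feature names.

    A feature that a source never recorded becomes None in the fused rows.
    """
    columns = sorted({key for src in sources for row in src for key in row})
    fused = [{col: row.get(col) for col in columns}
             for src in sources for row in src]
    return columns, fused


def multiply_impute(rows, columns, m=3, seed=0):
    """Return m completed copies of the fused rows.

    Each missing entry is filled with a random draw from the observed values
    of the same column (a simplified stand-in for a real imputation model).
    """
    rng = random.Random(seed)
    observed = {c: [r[c] for r in rows if r[c] is not None] for c in columns}
    return [
        [{c: (r[c] if r[c] is not None else rng.choice(observed[c]))
          for c in columns}
         for r in rows]
        for _ in range(m)
    ]


columns, fused = fuse(source_a, source_b)
completed = multiply_impute(fused, columns, m=3)
```

Each of the `m` completed copies could then be passed to a classifier and the predictions combined, which is the role the abstract assigns to logistic regression and stacking.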

Cite

Yuan, Z., Ding, H., Chao, G., Song, M., Wang, L., Ding, W., & Chu, D. (2023). A Diabetes Prediction System Based on Incomplete Fused Data Sources. Machine Learning and Knowledge Extraction, 5(2), 384–399. https://doi.org/10.3390/make5020023
