A Diabetes Prediction System Based on Incomplete Fused Data Sources

Zhaoyi Yuan; Hao Ding; Guoqing Chao; Mingqiang Song; Lei Wang; Weiping Ding; Dianhui Chu

Journal Article, Open Access

Machine Learning and Knowledge Extraction (2023) 5(2) 384-399

DOI: 10.3390/make5020023

Abstract

In recent years, the population of people with diabetes has grown younger, so timely and effective prediction of diabetes has become a key problem, especially when only a single data source is available. Meanwhile, many data sources on diabetes patients have been collected around the world, and integrating these heterogeneous sources is important for predicting diabetes accurately. Different data sources may use different predictors: some features exist only in certain sources, so fusing the sources produces missing values. To account for the uncertainty of these missing values, multiple imputation and a method based on graph representation are used to impute them in the fused dataset. A logistic regression model and a stacking strategy are then applied for training and prediction on the fused dataset. The results show that combining heterogeneous datasets and imputing the missing values produced by the fusion process can effectively improve diabetes prediction performance. In addition, the proposed method can be extended to any scenario in which heterogeneous datasets share the same label types but have different feature attributes.
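The fusion-then-imputation idea in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the toy records, the feature names (`glucose`, `bmi`, `hba1c`), and the helper functions are hypothetical, and a simple random hot-deck draw stands in for the paper's multiple-imputation and graph-based imputers. The stacking step is omitted; predictions from a small logistic regression are instead averaged over several imputed copies of the fused data.

```python
import math
import random

# Hypothetical toy records: both sources share the label, but each has a
# feature the other lacks (values are illustrative, not from the paper).
source_a = [
    {"glucose": 148.0, "bmi": 33.6, "label": 1},
    {"glucose": 85.0, "bmi": 26.6, "label": 0},
    {"glucose": 183.0, "bmi": 23.3, "label": 1},
    {"glucose": 89.0, "bmi": 28.1, "label": 0},
]
source_b = [
    {"glucose": 160.0, "hba1c": 8.1, "label": 1},
    {"glucose": 92.0, "hba1c": 5.2, "label": 0},
    {"glucose": 171.0, "hba1c": 7.4, "label": 1},
    {"glucose": 99.0, "hba1c": 5.6, "label": 0},
]


def fuse(*sources):
    """Stack all records over the union of features; absent features become None."""
    features = sorted({k for src in sources for row in src for k in row if k != "label"})
    rows = [[row.get(f) for f in features] for src in sources for row in src]
    labels = [row["label"] for src in sources for row in src]
    return features, rows, labels


def impute_once(rows, rng):
    """One stochastic imputation: fill each gap with a random observed value
    from the same column (hot-deck draw, a stand-in for the paper's imputers)."""
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(len(rows[0]))]
    return [[rng.choice(observed[j]) if v is None else v for j, v in enumerate(r)]
            for r in rows]


def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression trained with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


features, rows, labels = fuse(source_a, source_b)

# Multiple imputation: build m completed datasets, fit one model on each,
# and pool the results by averaging the predicted probabilities.
rng = random.Random(0)
m = 5
pooled = [0.0] * len(rows)
for _ in range(m):
    X = standardize(impute_once(rows, rng))
    w, b = fit_logistic(X, labels)
    for i, xi in enumerate(X):
        pooled[i] += sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) / m

print(features)
print([round(p, 2) for p in pooled])
```

Repeating the imputation several times, rather than filling each gap once, is what lets the pooled prediction reflect the uncertainty about the missing values; a single deterministic fill would hide it.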

Citation (APA)

Yuan, Z., Ding, H., Chao, G., Song, M., Wang, L., Ding, W., & Chu, D. (2023). A Diabetes Prediction System Based on Incomplete Fused Data Sources. Machine Learning and Knowledge Extraction, 5(2), 384–399. https://doi.org/10.3390/make5020023
