The objective of this study is to derive a simple yet effective type 2 diabetes risk score tool for the Indian population using semantic discretization and machine learning techniques. The dataset used for training and validation is taken from the Annual Health Survey, which contains health-related information on over 1.65 million people from 284 districts of India. This is the first study of its kind that truly represents the Indian population. A combination of feature selection techniques is used to find the minimal subset of attributes that contribute optimally to determining the class attribute. Continuous independent variables (various diabetes risk factors) are discretized using a semantic discretization technique. The discretized dataset is then used to derive a Weighted Diabetes Risk Score for each risk factor. An optimal cutoff value for the Total Weighted Diabetes Risk Score (TWDRS) is determined based on evaluation parameters such as sensitivity, specificity, prediction accuracy, and the proportion of the population classified as high risk. The dataset used for this study contains 1,638,923 records, of which the 742,605 records that meet our selection criteria are used. Experimental results show that, at the optimal cut point of TWDRS ≥ 19, sensitivity is 72.55%, specificity is 61.99%, and the proportion of the population at high risk is 39.29%.
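The scoring and cutoff-selection steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the risk factors (age, BMI, family history), the bin boundaries, the per-bin weights, and the toy records are all hypothetical, since the abstract does not report them. Continuous values are mapped to named bins, the weight of each factor's bin is summed into a total score, and each candidate cutoff is evaluated by sensitivity, specificity, accuracy, and the proportion of the population flagged as high risk. The abstract does not state exactly how these parameters are traded off, so the helper `best_by_youden` swaps in Youden's J (sensitivity + specificity - 1) as one common single-number criterion.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence

# HYPOTHETICAL semantic bins: cut points chosen to carry clinical meaning.
# The paper's actual risk factors and boundaries are not given in the abstract.
AGE_EDGES, AGE_BINS = [35, 50], ["<35", "35-49", ">=50"]
BMI_EDGES, BMI_BINS = [25.0, 30.0], ["<25", "25-29.9", ">=30"]

# HYPOTHETICAL integer weight for each bin of each risk factor.
WEIGHTS: Dict[str, Dict[str, int]] = {
    "age": {"<35": 0, "35-49": 5, ">=50": 9},
    "bmi": {"<25": 0, "25-29.9": 4, ">=30": 8},
    "family_history": {"no": 0, "yes": 6},
}


def discretize(value: float, edges: List[float], names: List[str]) -> str:
    """Return the named bin for value; edges are the lower bounds of bins 2..k."""
    return names[bisect_right(edges, value)]


def total_weighted_score(raw: Dict[str, Any]) -> int:
    """Discretize continuous factors, then sum the weight of each factor's bin."""
    bins = {
        "age": discretize(float(raw["age"]), AGE_EDGES, AGE_BINS),
        "bmi": discretize(float(raw["bmi"]), BMI_EDGES, BMI_BINS),
        "family_history": str(raw["family_history"]),
    }
    return sum(WEIGHTS[factor][bins[factor]] for factor in WEIGHTS)


@dataclass
class CutoffMetrics:
    cutoff: int
    sensitivity: float
    specificity: float
    accuracy: float
    high_risk_proportion: float


def evaluate_cutoff(scores: Sequence[int], labels: Sequence[bool], cutoff: int) -> CutoffMetrics:
    """Flag score >= cutoff as high risk and compare with true diabetes status."""
    tp = fp = tn = fn = 0
    for score, diabetic in zip(scores, labels):
        high_risk = score >= cutoff
        if high_risk and diabetic:
            tp += 1
        elif high_risk:
            fp += 1
        elif diabetic:
            fn += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    return CutoffMetrics(
        cutoff=cutoff,
        sensitivity=tp / (tp + fn) if tp + fn else 0.0,
        specificity=tn / (tn + fp) if tn + fp else 0.0,
        accuracy=(tp + tn) / n if n else 0.0,
        high_risk_proportion=(tp + fp) / n if n else 0.0,
    )


def scan_cutoffs(scores: Sequence[int], labels: Sequence[bool]) -> List[CutoffMetrics]:
    """Evaluate every distinct score value as a candidate cutoff."""
    return [evaluate_cutoff(scores, labels, c) for c in sorted(set(scores))]


def best_by_youden(metrics: List[CutoffMetrics]) -> CutoffMetrics:
    """Swapped-in criterion: maximize sensitivity + specificity - 1."""
    return max(metrics, key=lambda m: m.sensitivity + m.specificity - 1)


# Toy data (invented for illustration): raw records and true diabetes status.
records = [
    {"age": 55, "bmi": 31.2, "family_history": "yes"},
    {"age": 40, "bmi": 27.0, "family_history": "no"},
    {"age": 28, "bmi": 22.0, "family_history": "no"},
    {"age": 61, "bmi": 26.5, "family_history": "yes"},
    {"age": 30, "bmi": 33.0, "family_history": "yes"},
]
labels = [True, False, False, True, True]
scores = [total_weighted_score(r) for r in records]  # [23, 9, 0, 19, 14]
at_19 = evaluate_cutoff(scores, labels, 19)  # sensitivity 2/3, specificity 1.0
best = best_by_youden(scan_cutoffs(scores, labels))  # cutoff 14 on this toy data
```

The rule `score >= 19` here only mirrors the form of the abstract's TWDRS ≥ 19; the toy numbers are not the paper's results. In practice one would inspect the full table returned by `scan_cutoffs` and weigh the proportion flagged as high risk alongside sensitivity and specificity, rather than relying on a single index.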
Chandrakar, O., & Saini, J. R. (2018). Derivation of a novel diabetes risk score using semantic discretization for Indian population. In Advances in Intelligent Systems and Computing (Vol. 696, pp. 331–340). Springer Verlag. https://doi.org/10.1007/978-981-10-7386-1_29