Diabetes is the root cause of various chronic diseases. An intelligent diabetes prediction model can help manage the disease efficiently. Disease data is large because it is generated from patient details together with their diagnostic reports. A disease dataset contains the values of the disease features, or attributes (symptoms), for each patient object. To design an efficient prediction model that can handle big data, feature selection is necessary. Classification results sometimes vary across algorithms for the same dataset; in that case, an ensemble classification approach is a solution. Thus, two modules, feature selection and classification, are important for designing an efficient prediction model. The paper proposes a diabetes prediction model that handles big data by using a genetic algorithm and machine learning techniques implemented in the MapReduce framework. In the first phase, a genetic algorithm is used to select an optimized feature subset. In the second phase, an ensemble classification system is built on this reduced feature set by combining the Naïve Bayes, random forest, and KNN classification algorithms with majority voting. The proposed prediction model can identify the labels of the test patient objects correctly. A diabetes dataset from the UCI repository is used to test the model.
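The two phases described above can be illustrated with a minimal sketch. The abstract does not give the fitness function, the genetic operators, or any parameter values, so everything below is an assumption made for illustration: the names `select_features` and `majority_vote`, tournament selection, one-point crossover, bit-flip mutation, elitism, and the toy fitness are all hypothetical, and the MapReduce distribution is omitted. In practice the fitness of a feature mask would likely be something like the validation accuracy of a classifier trained on the selected columns.

```python
import random
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers for one patient."""
    return Counter(predictions).most_common(1)[0][0]


def select_features(n_features, fitness, pop_size=20, generations=30,
                    mutation_rate=0.05, seed=0):
    """Search for a binary feature mask with a simple genetic algorithm.

    Assumed (not from the paper): tournament selection, one-point
    crossover, bit-flip mutation, and elitism. Requires n_features >= 2.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def pick():
        # Tournament of size two: keep the fitter of two random masks.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask forward
        while len(next_pop) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=fitness)
    return best


# Toy fitness (hypothetical): reward features 0, 2, 5; penalize the rest.
USEFUL = {0, 2, 5}


def toy_fitness(mask):
    return sum(bit if i in USEFUL else -0.5 * bit
               for i, bit in enumerate(mask))


mask = select_features(8, toy_fitness)

# Phase two: each of three classifiers votes on one test patient.
label = majority_vote(["positive", "negative", "positive"])
```

With three classifiers and two class labels a tie cannot occur, which is one reason an odd number of voters is convenient for binary prediction.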
CITATION STYLE
Sengupta, S., & Ranjan Pal, K. (2022). Design of an Intelligent Diabetes Prediction Model in Big Data Environment. In Lecture Notes in Networks and Systems (Vol. 376, pp. 151–163). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-16-8826-3_14