Healthcare organizations aim to derive valuable insights by applying data mining and soft computing techniques to the vast data stores accumulated over the years. This data, however, may contain missing, incorrect and, most often, incomplete instances that can have a detrimental effect on predictive analytics of healthcare data. Preprocessing of this data, specifically the imputation of missing values, poses a challenge for reliable modeling. This work presents a novel preprocessing phase with missing value imputation for both numerical and categorical data. A hybrid combination of Classification and Regression Trees (CART) and Genetic Algorithms is used to impute missing continuous values, and Self-Organizing Feature Maps (SOFM) are used to impute categorical values. Further, an Artificial Neural Network (ANN) is used to validate the improved prediction accuracy after imputation. To evaluate this model, we use the PIMA Indians Diabetes Data set (PIDD) and the Mammographic Mass Data (MMD). The accuracy of the proposed model, which emphasizes the preprocessing phase, is shown to be superior to that of existing techniques. This approach is simple, easy to implement and practically reliable. © 2011 Springer-Verlag.
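As a rough illustration of tree-based imputation of a continuous attribute, the sketch below fits a single-split regression tree (a CART-style stump) on the complete rows and uses it to fill the missing values. It is not the authors' implementation: the Genetic Algorithm component, the SOFM step and the ANN validation are omitted, the function names (`fit_stump`, `impute`) are hypothetical, and the `glucose`/`insulin` values are made-up toy numbers rather than PIDD records.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one predictor.

    Tries every midpoint between consecutive distinct x values and keeps the
    threshold that minimizes the summed squared error of y in the two leaves.
    Returns (threshold, left_leaf_mean, right_leaf_mean).
    """
    best = None
    candidates = sorted(set(xs))
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, mean(left), mean(right))
    if best is None:
        # Only one distinct predictor value: fall back to the overall mean.
        m = mean(ys)
        return (float("inf"), m, m)
    _, t, left_mean, right_mean = best
    return (t, left_mean, right_mean)


def impute(rows, target, predictor):
    """Return copies of rows with None in `target` replaced by the stump's prediction."""
    known = [r for r in rows if r[target] is not None]
    t, left_mean, right_mean = fit_stump(
        [r[predictor] for r in known], [r[target] for r in known]
    )
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left unchanged
        if r[target] is None:
            r[target] = left_mean if r[predictor] <= t else right_mean
        filled.append(r)
    return filled


# Toy data (made up for illustration); None marks a missing value.
rows = [
    {"glucose": 85, "insulin": 40.0},
    {"glucose": 90, "insulin": 50.0},
    {"glucose": 150, "insulin": 200.0},
    {"glucose": 160, "insulin": 220.0},
    {"glucose": 155, "insulin": None},
    {"glucose": 88, "insulin": None},
]
filled = impute(rows, target="insulin", predictor="glucose")
```

A full regression tree would split recursively on several attributes. The abstract gives no detail on how the Genetic Algorithm is combined with CART, so that part is left out rather than guessed.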
CITATION
Bhat, V. H., Rao, P. G., Krishna, S., Shenoy, P. D., Venugopal, K. R., & Patnaik, L. M. (2011). An efficient framework for prediction in healthcare data using soft computing techniques. In Communications in Computer and Information Science (Vol. 192 CCIS, pp. 522–532). https://doi.org/10.1007/978-3-642-22720-2_55