Missing data is a fundamental challenge in data analysis and classification. Classifiers trained on incomplete data can perform differently, and report different accuracy, than classifiers trained on complete data. In this work we compare six scalable imputation methods applied to a Heart Failure dataset. The comparison uses the performance metrics of three classification methods: J48, REPTree, and Random Forest. The aim is to find the classifier that achieves the best performance after the missing data have been imputed with each method. The results show that, in general, Random Forest achieves the best results compared with the decision trees J48 and REPTree. Furthermore, classification performance improved when the missing values were imputed by concept most common (CMC) and support vector machine (SVM) imputation.
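As an illustration of the concept most common (CMC) idea named above, the following is a minimal sketch, not the authors' implementation. It assumes categorical attributes only and fills each missing entry with the most frequent value of that attribute among records sharing the same class label. The function name `cmc_impute`, the toy records, and the class labels are hypothetical and are not taken from the Heart Failure dataset.

```python
from collections import Counter, defaultdict


def cmc_impute(rows, labels, missing=None):
    """Fill missing categorical entries with the most common value of the
    same attribute among rows that share the same class label."""
    # counts[label][column] is a Counter of the observed (non-missing) values.
    counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for col, value in enumerate(row):
            if value is not missing:
                counts[label][col][value] += 1

    imputed = []
    for row, label in zip(rows, labels):
        new_row = []
        for col, value in enumerate(row):
            # Leave the entry missing if the class has no observed value
            # for this attribute.
            if value is missing and counts[label][col]:
                value = counts[label][col].most_common(1)[0][0]
            new_row.append(value)
        imputed.append(new_row)
    return imputed


# Hypothetical toy records: two categorical attributes per record.
rows = [
    ["yes", "high"],
    ["yes", None],
    ["no", "low"],
    [None, "low"],
    ["yes", "high"],
]
labels = ["event", "event", "no_event", "no_event", "event"]

filled = cmc_impute(rows, labels)
```

The imputed records in `filled` could then be passed to any classifier; the choice of classifier is independent of this step.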
CITATION STYLE
Al Khaldy, M., & Kambhampati, C. (2018). Performance Analysis of Various Missing Value Imputation Methods on Heart Failure Dataset. In Lecture Notes in Networks and Systems (Vol. 16, pp. 415–425). Springer. https://doi.org/10.1007/978-3-319-56991-8_31