Real-world data often contain missing values, which complicates analysis. Although various algorithms have emerged to handle this issue, they are troublesome to implement from an industry perspective. The ‘R’ software, with its various packages for missing data handling, can be a fruitful solution, yet this is hardly reported in any previous study. The availability of such packages therefore calls for an analysis that compares their performance and checks their suitability for a given dataset. A comparative study is performed using three ‘R’-packages, namely missForest, Multivariate Imputation by Chained Equations (MICE), and AMELIA-II. Two classifiers, support vector machine (SVM) and logistic regression (LR), are used for prediction. The packages are compared with regard to imputation time, effects on variance, and efficiency. The experimental results reveal that performance depends on the dataset size and the percentage of missing values in the data.
CITATION STYLE
Sarkar, S., Pramanik, A., Khatedi, N., & Maiti, J. (2020). An Investigation of the Effects of Missing Data Handling Using ‘R’-Packages. In Advances in Intelligent Systems and Computing (Vol. 1079, pp. 275–284). Springer. https://doi.org/10.1007/978-981-15-1097-7_24