An Investigation of the Effects of Missing Data Handling Using ‘R’-Packages

Abstract

Real-world data often contain missing values, which makes analysis difficult. Although various algorithms have emerged to handle the issue, they are troublesome to implement from an industry perspective. The ‘R’ software, with its various available packages for missing data handling, can be a fruitful solution, yet it is hardly reported in previous studies. The availability of such packages therefore calls for an analysis that compares their performance and checks their suitability for a given dataset. A comparative study is performed using the ‘R’ packages missForest, Multivariate Imputation by Chained Equations (MICE), and AMELIA-II. Two classifiers, support vector machine (SVM) and logistic regression (LR), are used for prediction. The packages are compared with regard to imputation time, effect on variance, and efficiency. The experimental results reveal that performance depends on the dataset size and the percentage of missing values in the data.
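
The comparison criteria named in the abstract (imputation time and effect on variance at different missing-value percentages) can be illustrated with a minimal sketch. It is not the authors' experiment and does not call missForest, MICE, or AMELIA-II: simple mean imputation stands in for the imputer, the data are synthetic Gaussian draws, and the helper names `inject_missing`, `mean_impute`, and `evaluate` are hypothetical. It uses only the Python standard library.

```python
import random
import statistics
import time


def inject_missing(values, rate, rng):
    """Return a copy of values with roughly `rate` of the entries set to None."""
    return [None if rng.random() < rate else v for v in values]


def mean_impute(values):
    """Stand-in imputer (an assumption): fill each None with the observed mean."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def evaluate(values, rate, seed=0):
    """Time one imputation run and compare variance before and after."""
    rng = random.Random(seed)
    holed = inject_missing(values, rate, rng)
    start = time.perf_counter()
    imputed = mean_impute(holed)
    elapsed = time.perf_counter() - start
    return {
        "missing_rate": rate,
        "seconds": elapsed,
        # A ratio below 1 means the imputed column is less spread out
        # than the complete original column.
        "variance_ratio": statistics.pvariance(imputed) / statistics.pvariance(values),
    }


# Synthetic complete column; the size and distribution are arbitrary choices.
data_rng = random.Random(42)
data = [data_rng.gauss(50.0, 10.0) for _ in range(2000)]

results = [evaluate(data, rate) for rate in (0.1, 0.3, 0.5)]
for row in results:
    print(row)
```

Mean imputation is used here only because it is easy to reason about: every filled cell sits at the observed mean, so the variance ratio shrinks as the missing-value percentage grows. Swapping in a different imputer would only require replacing `mean_impute`.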

Cite

Sarkar, S., Pramanik, A., Khatedi, N., & Maiti, J. (2020). An Investigation of the Effects of Missing Data Handling Using ‘R’-Packages. In Advances in Intelligent Systems and Computing (Vol. 1079, pp. 275–284). Springer. https://doi.org/10.1007/978-981-15-1097-7_24
