An Investigation of the Effects of Missing Data Handling Using ‘R’-Packages

Sobhan Sarkar; Anima Pramanik; Nikhil Khatedi; J. Maiti

Conference Proceedings


Advances in Intelligent Systems and Computing (2020), Vol. 1079, pp. 275–284

DOI: 10.1007/978-981-15-1097-7_24

Abstract

Real-world data often contain missing values, which make analysis difficult. Although various algorithms have emerged to handle this issue, they are often troublesome to implement from an industry perspective. The ‘R’ software, with its many available packages for missing data handling, can offer a practical solution, yet this has rarely been reported in previous studies. The availability of such packages therefore calls for an analysis that compares their performance and checks their suitability for a given dataset. A comparative study is performed using three ‘R’ packages: missForest, Multivariate Imputation by Chained Equations (MICE), and AMELIA-II. Two classifiers, support vector machine (SVM) and logistic regression (LR), are used for prediction. The packages are compared with regard to imputation time, effect on variance, and efficiency. The experimental results reveal that performance depends on the dataset size and on the percentage of missing values in the data.
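The abstract gives no code, so the following is only a minimal sketch of the evaluation idea it describes: inject missing values at a chosen percentage, impute them, and record the imputation time and the effect on variance. It is written in Python using simple column-mean imputation as an illustrative stand-in baseline; it is not missForest, MICE, or AMELIA-II, it omits the SVM/LR prediction step, and the function names (`inject_mcar`, `mean_impute`, `evaluate`) are hypothetical. Missingness is assumed to be missing completely at random (MCAR).

```python
import random
import statistics
import time


def inject_mcar(values, rate, rng):
    """Hypothetical helper: set each entry to None independently with probability `rate` (MCAR)."""
    return [None if rng.random() < rate else v for v in values]


def mean_impute(values):
    """Illustrative baseline (not one of the studied R packages): fill None with the observed mean."""
    observed = [v for v in values if v is not None]
    mu = statistics.fmean(observed)
    return [mu if v is None else v for v in values]


def evaluate(values, rate, seed=0):
    """Inject missingness, impute, and report time and variance effect for one missing rate."""
    rng = random.Random(seed)
    with_missing = inject_mcar(values, rate, rng)

    start = time.perf_counter()
    imputed = mean_impute(with_missing)
    elapsed = time.perf_counter() - start

    return {
        # Fraction of entries actually removed (close to `rate` for large samples).
        "missing_fraction": sum(v is None for v in with_missing) / len(values),
        # A ratio below 1 means imputation shrank the spread of the variable.
        "variance_ratio": statistics.variance(imputed) / statistics.variance(values),
        "imputation_seconds": elapsed,
    }


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    for rate in (0.1, 0.3, 0.5):
        print(rate, evaluate(data, rate))
```

Mean imputation shrinks the variance roughly in proportion to the missing rate, which illustrates why the effect on variance is worth comparing across imputation methods as the percentage of missing values grows.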

Citation (APA)

Sarkar, S., Pramanik, A., Khatedi, N., & Maiti, J. (2020). An Investigation of the Effects of Missing Data Handling Using ‘R’-Packages. In Advances in Intelligent Systems and Computing (Vol. 1079, pp. 275–284). Springer. https://doi.org/10.1007/978-981-15-1097-7_24