Parallel Fractional Hot-Deck Imputation and Variance Estimation for Big Incomplete Data Curing

Abstract

Fractional hot-deck imputation (FHDI) is a general-purpose, assumption-free imputation method for handling multivariate missing data: each missing item is filled with multiple observed values, without resorting to artificially created values. The corresponding R package FHDI [J. Im, I. Cho, and J. K. Kim, "An R package for fractional hot deck imputation," R Journal, vol. 10, no. 1, pp. 140–154, 2018] offers generality and efficiency, but it is not adequate for tackling big incomplete data because of its excessive memory requirements and long running time. As a first step toward curing big incomplete data with the FHDI, we developed a parallel fractional hot-deck imputation program (named P-FHDI) suitable for large incomplete datasets. Results show a favorable speedup when P-FHDI is applied to big datasets with up to millions of instances or 10,000 variables. This paper explains the detailed parallel algorithms of P-FHDI for datasets with many instances (big-n) or high dimensionality (big-p) and confirms their favorable scalability. The proposed program inherits all the advantages of the serial FHDI and enables parallel variance estimation, which will benefit a broad audience in science and engineering.
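
To make the core idea concrete, the following is a minimal conceptual sketch of fractional hot-deck imputation for a single imputation cell, written in Python. It is not the authors' P-FHDI implementation nor the FHDI R package API; the function name fractional_hot_deck, the single-cell setup, and the donor count M are illustrative assumptions only. It shows the defining feature of the method: each missing item receives several observed donor values, each carrying a fractional weight, so no artificial values are created.

    # Conceptual sketch only (assumed names and setup), not the authors' P-FHDI code.
    import numpy as np

    def fractional_hot_deck(y, M=5, rng=None):
        """Impute missing entries of a 1-D array within one imputation cell.

        Each missing item is assigned up to M observed donor values, each with
        fractional weight 1/(number of donors). Returns a dict mapping each
        missing index to (donor_values, fractional_weights).
        """
        rng = np.random.default_rng(rng)
        y = np.asarray(y, dtype=float)
        observed = y[~np.isnan(y)]
        if observed.size == 0:
            raise ValueError("no donors available in this cell")
        imputations = {}
        for i in np.flatnonzero(np.isnan(y)):
            donors = rng.choice(observed, size=min(M, observed.size), replace=False)
            weights = np.full(donors.size, 1.0 / donors.size)
            imputations[i] = (donors, weights)
        return imputations

    # Usage: point estimate of the mean after fractional imputation.
    y = np.array([2.1, np.nan, 3.4, 2.8, np.nan, 3.0])
    imp = fractional_hot_deck(y, M=3, rng=0)
    total = np.nansum(y) + sum(np.dot(d, w) for d, w in imp.values())
    print(total / y.size)

In the actual method, donors are drawn within cells formed from the observed joint patterns of the variables, and the fractional weights also feed the (parallel) variance estimation described in the paper; this sketch omits both refinements.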

Cite (APA)

Yang, Y., Kim, J. K., & Cho, I. H. (2022). Parallel Fractional Hot-Deck Imputation and Variance Estimation for Big Incomplete Data Curing. IEEE Transactions on Knowledge and Data Engineering, 34(8), 3912–3926. https://doi.org/10.1109/TKDE.2020.3029146
