Abstract
Fractional hot-deck imputation (FHDI) is a general-purpose, assumption-free imputation method for handling multivariate missing data: each missing item is filled with multiple observed values rather than artificially created ones. The corresponding R package FHDI (J. Im, I. Cho, and J. K. Kim, 'An R package for fractional hot deck imputation,' R J., vol. 10, no. 1, pp. 140-154, 2018) is general and efficient, but it is not adequate for big incomplete data because of its excessive memory requirements and long running time. As a first step toward curing big incomplete data with FHDI, we developed a parallel fractional hot-deck imputation program (named P-FHDI) suitable for large incomplete datasets. Results show a favorable speedup when P-FHDI is applied to big datasets with up to millions of instances or 10,000 variables. This paper explains the detailed parallel algorithms of P-FHDI for datasets with many instances (big-n) or high dimensionality (big-p) and confirms its favorable scalability. The proposed program inherits all the advantages of the serial FHDI and enables parallel variance estimation, which will benefit a broad audience in science and engineering.
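To make the idea of filling each missing item with multiple observed values concrete, the following is a minimal Python sketch of a generic fractional hot-deck scheme with equal fractional weights. It is only an illustration of the general concept, not the FHDI or P-FHDI algorithm of the cited package (which builds imputation cells and donor weights jointly across variables); the helper name fractional_hot_deck and the parameter n_donors are hypothetical.

import numpy as np

def fractional_hot_deck(data, n_donors=5, rng=None):
    # Illustrative, hypothetical helper: each missing cell is filled with
    # n_donors observed values drawn from the same column, and each donated
    # value carries an equal fractional weight of 1/n_donors.
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    imputations = []  # records of (row, column, donor_value, fractional_weight)
    for j in range(p):
        observed = data[~np.isnan(data[:, j]), j]       # observed donors in column j
        for i in np.where(np.isnan(data[:, j]))[0]:     # rows missing column j
            donors = rng.choice(observed, size=n_donors, replace=True)
            for v in donors:
                imputations.append((i, j, float(v), 1.0 / n_donors))
    return imputations

# Toy usage: a 4x2 dataset with two missing entries.
toy = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan], [5.0, 6.0]])
for rec in fractional_hot_deck(toy, n_donors=3, rng=0):
    print(rec)

Because every missing item receives several observed donor values with attached fractional weights, downstream estimators can be computed as weighted sums over the completed records, which is what makes a replication-based variance estimate (as in the serial FHDI and the parallel version described here) possible.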
Yang, Y., Kim, J. K., & Cho, I. H. (2022). Parallel Fractional Hot-Deck Imputation and Variance Estimation for Big Incomplete Data Curing. IEEE Transactions on Knowledge and Data Engineering, 34(8), 3912–3926. https://doi.org/10.1109/TKDE.2020.3029146