Ensemble Learning for Heterogeneous Missing Data Imputation

Abstract

Missing values can significantly affect the results of analyses and decision making in any field. Two major approaches address this issue: statistical methods and model-based methods. The former can introduce bias into the analyses, while the latter are usually designed for limited, specific use cases. To overcome the limitations of both approaches, we present a stacked ensemble framework that integrates the adaptive random forest algorithm, the Jaccard index, and Bayesian probability. Because heterogeneous, distributed data from multiple sources poses a particular challenge, we build a model for our use case that supports different data types: continuous, discrete, categorical, and binary. The proposed model tackles missing data in the broad context of massive data sources and diverse data formats. We evaluated the framework extensively on five datasets containing labelled and unlabelled data. The experiments showed that it produces encouraging results that are competitive with statistical and model-based methods. Since the framework works across various datasets, it overcomes the limitations of model-based methods identified in the literature review.
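The abstract names the building blocks (adaptive random forest, Jaccard index, Bayesian probability) but does not say how they are stacked. As an illustration only, the Python sketch below shows one way two of them could be combined to impute a missing categorical value. It ranks complete records by Jaccard similarity over the target record's observed attribute=value pairs, then picks the value with the highest smoothed posterior probability among the k most similar records, assuming a symmetric Dirichlet prior. This is not the authors' framework. The function names (`jaccard`, `impute_categorical`), the parameters `k` and `alpha`, and the toy records are hypothetical, and the adaptive random forest component is not shown.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|; defined as 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def impute_categorical(
    target: Dict[str, Optional[str]],
    rows: Sequence[Dict[str, str]],
    column: str,
    k: int = 3,
    alpha: float = 1.0,
) -> Tuple[str, Dict[str, float]]:
    """Hypothetical helper, not the paper's method.

    Ranks complete rows by Jaccard similarity to `target`, keeps the top k,
    and returns the value of `column` with the highest posterior mean under a
    symmetric Dirichlet(alpha) prior, together with the full posterior.
    """
    if not rows:
        raise ValueError("need at least one complete row to impute from")

    # Attributes observed in the target, excluding the one being imputed.
    keys = {c for c, v in target.items() if v is not None and c != column}
    observed = {f"{c}={target[c]}" for c in keys}

    def tokens(row: Dict[str, str]) -> set:
        # Encode a row as "attribute=value" tokens over the same attributes.
        return {f"{c}={row[c]}" for c in keys if c in row}

    # sorted() is stable, so ties keep their original order.
    ranked = sorted(rows, key=lambda r: jaccard(observed, tokens(r)), reverse=True)
    neighbours = ranked[:k]

    # Candidate values: every value seen for this column in the complete rows.
    categories = sorted({r[column] for r in rows})
    counts = Counter(r[column] for r in neighbours)

    # Posterior predictive: P(v) = (n_v + alpha) / (N + alpha * |categories|).
    total = len(neighbours) + alpha * len(categories)
    posterior = {v: (counts[v] + alpha) / total for v in categories}
    best = max(categories, key=lambda v: posterior[v])
    return best, posterior


# Usage with toy records (hypothetical data).
rows = [
    {"smoker": "yes", "sex": "M", "diabetic": "yes"},
    {"smoker": "yes", "sex": "M", "diabetic": "yes"},
    {"smoker": "no", "sex": "F", "diabetic": "no"},
    {"smoker": "no", "sex": "M", "diabetic": "no"},
    {"smoker": "yes", "sex": "F", "diabetic": "no"},
]
target = {"smoker": "yes", "sex": "M", "diabetic": None}
value, posterior = impute_categorical(target, rows, "diabetic", k=3)
print(value)  # → yes
```

Raising `alpha` pulls the posterior toward a uniform distribution over the observed categories, which helps when only a few similar records are available. Continuous and discrete attributes would need a different similarity measure or a regression model, which this sketch does not cover.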

Citation

Carvalho, A. L. C., Ameyed, D., & Cheriet, M. (2020). Ensemble Learning for Heterogeneous Missing Data Imputation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12402 LNCS, pp. 127–143). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-59612-5_10