An effective imputation scheme for handling missing values in the heterogeneous dataset

Abstract

Data quality has always been a concern for machine learning applications, including clinical decision support systems, weather forecasting, and traffic prediction. Comparatively little work has been devoted to exploiting missing values for effective imputation and better prediction. This paper introduces an approach to predicting and imputing missing fields in multivariate datasets that contain numerical, categorical, and unstructured data. The proposed imputation method is a multi-model scheme that combines natural language processing (NLP) encoders, machine learning-driven feature extractors, and a sequential regression imputation technique to predict missing values. The proposed system is robust and scalable and does not require extensive engineering. The model is validated on the benchmark clinical heart disease dataset obtained from UCI. The results show that the proposed method achieves better imputation accuracy and requires significantly less time than other missing data imputation methods.
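To make the sequential regression step named in the abstract concrete, the following is a minimal sketch for numeric columns only. It is not the authors' implementation: the function names (`sequential_regression_impute`, `fit_least_squares`, `solve_linear_system`), the mean initialisation, the tiny ridge term, and the fixed number of rounds are all assumptions made for illustration. Categorical and unstructured fields, which the paper handles through encoders and feature extractors, are assumed to have been converted to numbers beforehand. Each column is assumed to have at least one observed value.

```python
from statistics import mean


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(features, targets, ridge=1e-8):
    """Fit targets ~ intercept + features via the normal equations.

    The small ridge term is an illustrative safeguard against
    collinear columns, not something taken from the paper.
    """
    design = [[1.0] + list(f) for f in features]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    for i in range(k):
        xtx[i][i] += ridge
    xty = [sum(row[i] * t for row, t in zip(design, targets))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


def other_columns(row, j):
    """Return every value in the row except column j."""
    return [value for c, value in enumerate(row) if c != j]


def sequential_regression_impute(rows, n_rounds=10):
    """Impute None entries column by column with linear regressions."""
    n_cols = len(rows[0])
    missing = [[v is None for v in row] for row in rows]
    # Step 1: start from column means so every regression has full inputs.
    col_means = [mean(row[j] for row in rows if row[j] is not None)
                 for j in range(n_cols)]
    filled = [[col_means[j] if v is None else float(v)
               for j, v in enumerate(row)] for row in rows]
    # Step 2: repeatedly re-predict each incomplete column from the others.
    for _ in range(n_rounds):
        for j in range(n_cols):
            if not any(flags[j] for flags in missing):
                continue
            train = [r for r, flags in zip(filled, missing) if not flags[j]]
            coef = fit_least_squares([other_columns(r, j) for r in train],
                                     [r[j] for r in train])
            for r, flags in zip(filled, missing):
                if flags[j]:
                    inputs = other_columns(r, j)
                    r[j] = coef[0] + sum(w * x for w, x in zip(coef[1:], inputs))
    return filled


if __name__ == "__main__":
    # Toy data: the second column follows 2 * x + 1, with one value missing.
    data = [[1, 3], [2, 5], [3, 7], [4, None], [5, 11]]
    for row in sequential_regression_impute(data):
        print(row)
```

Observed values are never overwritten; only entries that were originally `None` are updated in each round. A production system would more likely rely on an established library for the regression step and would add a convergence check instead of a fixed round count.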

Cite

Venkatesh, S., Kumar, M. V. V., & Virupakshappa, A. D. (2023). An effective imputation scheme for handling missing values in the heterogeneous dataset. Indonesian Journal of Electrical Engineering and Computer Science, 32(1), 423–431. https://doi.org/10.11591/ijeecs.v32.i1.pp423-431
