An effective imputation scheme for handling missing values in the heterogeneous dataset

Sowmya Venkatesh; Maragal Venkatamuni Vijaya Kumar; Ashoka Davanageri Virupakshappa

Journal Article, Open Access

Indonesian Journal of Electrical Engineering and Computer Science (2023) 32(1) 423-431

DOI: 10.11591/ijeecs.v32.i1.pp423-431

Abstract

High data quality is a persistent concern for machine learning applications such as clinical decision support systems, weather forecasting, and traffic prediction. Very little existing work is devoted to exploiting missing values for effective imputation and better prediction. This paper introduces a novel approach to predicting and imputing missing fields in heterogeneous multivariate datasets that contain numerical, categorical, and unstructured data. The proposed imputation method is a multi-model scheme that combines natural language processing (NLP) encoders, machine learning-driven feature extractors, and a sequential regression imputation technique to predict missing values. The proposed system is robust and scalable without requiring extensive engineering. The model is validated on the benchmark clinical heart disease dataset obtained from UCI. The results show that the proposed method achieves better imputation accuracy and requires significantly less time than other missing data imputation methods.
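
To make the sequential regression step named in the abstract more concrete, the following is a minimal sketch of that general idea for numerical columns only. It is not the authors' implementation: the function names (`solve_linear`, `fit_linear`, `sequential_regression_impute`), the mean initialisation, the small ridge term, and the number of rounds are all assumptions chosen for illustration. The NLP encoders and feature extractors mentioned in the abstract are not modelled here.

```python
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear(xs: List[List[float]], ys: List[float], ridge: float = 1e-8) -> List[float]:
    """Least-squares fit with intercept; the tiny ridge term (an assumption)
    keeps the normal equations solvable when columns are collinear."""
    rows = [[1.0] + x for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear(xtx, xty)


def sequential_regression_impute(data: List[Row], n_rounds: int = 10) -> List[List[float]]:
    """Impute missing numeric cells by regressing each incomplete column on
    all other columns, cycling over the columns for n_rounds.
    Assumes every column has at least one observed value."""
    n_cols = len(data[0])
    missing = [[v is None for v in row] for row in data]
    filled = [row[:] for row in data]  # work on a copy, leave input untouched

    # Step 1: start from column means so every regression has complete inputs.
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        mean = sum(observed) / len(observed)
        for i, row in enumerate(filled):
            if missing[i][j]:
                row[j] = mean

    def others(row: List[float], j: int) -> List[float]:
        return [row[c] for c in range(n_cols) if c != j]

    # Step 2: repeatedly refit each incomplete column on the current fill-ins.
    for _ in range(n_rounds):
        for j in range(n_cols):
            if not any(flags[j] for flags in missing):
                continue
            train_x = [others(filled[i], j) for i in range(len(filled)) if not missing[i][j]]
            train_y = [filled[i][j] for i in range(len(filled)) if not missing[i][j]]
            coef = fit_linear(train_x, train_y)
            for i in range(len(filled)):
                if missing[i][j]:
                    feats = [1.0] + others(filled[i], j)
                    filled[i][j] = sum(c * f for c, f in zip(coef, feats))
    return filled
```

As a usage example, calling `sequential_regression_impute([[1.0, 3.0], [2.0, 5.0], [3.0, None], [4.0, 9.0], [5.0, 11.0]])` fills the missing cell with a value close to 7.0, because the observed rows follow y = 2x + 1. Categorical or unstructured fields would first need to be encoded as numbers, which is the role the abstract assigns to the encoders and feature extractors.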

Cite

Venkatesh, S., Kumar, M. V. V., & Virupakshappa, A. D. (2023). An effective imputation scheme for handling missing values in the heterogeneous dataset. Indonesian Journal of Electrical Engineering and Computer Science, 32(1), 423–431. https://doi.org/10.11591/ijeecs.v32.i1.pp423-431
