Data quality has long been a concern for applications based on machine learning, including clinical decision support systems, weather forecasting, and traffic prediction. Very little work has been devoted to exploiting missing values for effective imputation and better prediction. This paper introduces an approach to predicting and imputing missing fields in heterogeneous multivariate datasets that contain numerical, categorical, and unstructured data. The proposed imputation method is a multi-model scheme that combines natural language processing (NLP) encoders, machine learning-driven feature extractors, and a sequential regression imputation technique to predict missing values. The proposed system is robust and scalable without requiring extensive engineering. The model is validated on the benchmark clinical heart disease dataset obtained from UCI. The results show that the proposed method achieves better imputation accuracy and requires significantly less time than other missing data imputation methods.
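To make the sequential regression step concrete, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes purely numerical columns, uses ordinary least squares as the per-column regressor, and marks missing entries with `None`. The NLP encoders and feature extractors mentioned in the abstract are not reproduced here, and all function names (`solve`, `fit_ols`, `sequential_impute`) are hypothetical.

```python
from statistics import mean


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_ols(X, y):
    """Fit least squares with an intercept via the normal equations; returns [b0, b1, ...]."""
    Xb = [[1.0] + row for row in X]
    p = len(Xb[0])
    A = [[sum(r[i] * r[j] for r in Xb) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(p)]
    return solve(A, b)


def sequential_impute(rows, n_iter=10):
    """Impute None entries column by column, regressing each column on all the others."""
    n_cols = len(rows[0])
    missing = [[v is None for v in row] for row in rows]
    # Step 1: start from column means so every predictor has a value.
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    data = [[col_means[j] if v is None else float(v) for j, v in enumerate(row)]
            for row in rows]
    # Step 2: repeatedly refit each incomplete column on the current fill-ins.
    for _ in range(n_iter):
        for j in range(n_cols):
            if not any(m[j] for m in missing):
                continue
            others = [k for k in range(n_cols) if k != j]
            train = [i for i in range(len(rows)) if not missing[i][j]]
            w = fit_ols([[data[i][k] for k in others] for i in train],
                        [data[i][j] for i in train])
            for i in range(len(rows)):
                if missing[i][j]:
                    data[i][j] = w[0] + sum(wk * data[i][k]
                                            for wk, k in zip(w[1:], others))
    return data


# Toy data (made up for illustration): third column was generated as 1 + 2*a + 3*b,
# so the hidden value in the fifth row was 1 + 2*5 + 3*4 = 23.
toy = [[1, 2, 9], [2, 1, 8], [3, 5, 22], [4, 3, 18], [5, 4, None], [0, 1, 4]]
filled = sequential_impute(toy)
print(filled[4][2])
```

Observed values are never overwritten; only the entries that were `None` are updated on each pass. Categorical and unstructured fields, which the paper also covers, would first need to be encoded numerically before a scheme like this could be applied.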
Venkatesh, S., Kumar, M. V. V., & Virupakshappa, A. D. (2023). An effective imputation scheme for handling missing values in the heterogeneous dataset. Indonesian Journal of Electrical Engineering and Computer Science, 32(1), 423–431. https://doi.org/10.11591/ijeecs.v32.i1.pp423-431