Comparative Study of Missing Value Imputation Techniques on E-Commerce Product Ratings

Dimple Chehal; Parul Gupta; Payal Gulati; Tanisha Gupta

Journal Article (Open Access)

Informatica (Slovenia) (2023) 47(3) 373-382

DOI: 10.31449/inf.v47i3.4156

Abstract

Missing data is a pervasive problem: it adds ambiguity to data interpretation, and missing values in a dataset represent a loss of vital information. It is one of the most common data quality concerns, and missing values are typically expressed as NaNs, blanks, or other placeholders. Missing values create imbalanced observations and biased estimates, and sometimes lead to misleading results. An efficient and valid analysis therefore requires handling them appropriately. By filling in the missing values, a complete dataset can be created and the challenge of dealing with complex patterns of missingness can be avoided. In the present study, eight imputation methods have been compared: SimpleImputer, K-Nearest Neighbours (KNN) imputation, Hot Deck, Linear Regression, MissForest, Random Forest Regression, DataWig, and Multivariate Imputation by Chained Equations (MICE). The comparison was performed on an Amazon cell phone dataset using three metrics: the coefficient of determination (R-squared, R2), Mean Squared Error (MSE), and Mean Absolute Error (MAE). KNN achieved the best R2 and DataWig the worst. In terms of MSE and MAE, the Hot Deck imputation approach performed best, whereas MissForest performed worst on MAE. The Hot Deck approach therefore merits further investigation in practice.
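The abstract does not spell out how the metrics are computed on imputed data. A common protocol is to hide ratings that are actually known, impute them, and score only the hidden positions. The sketch below illustrates that protocol for three of the eight methods (mean imputation in the spirit of scikit-learn's SimpleImputer, KNN imputation, and a random hot deck). It is not the authors' code: the toy data, the masked indices, and the helper names (`mean_impute`, `knn_impute`, `hot_deck_impute`, `evaluate`) are hypothetical, and the paper's actual split, missingness rate, and hot-deck variant may differ. It uses plain Python rather than scikit-learn so that it runs without dependencies.

```python
import math
import random

# Hypothetical toy data (NOT from the paper): two numeric product features
# per row, plus a rating that is partially masked before imputation.
FEATURES = [(1.0, 1.0), (1.1, 0.9), (5.0, 5.0), (5.1, 4.9), (1.05, 1.0), (5.05, 5.0)]
RATINGS = [4.0, 4.2, 2.0, 2.2, 4.1, 2.1]
MISSING = {4, 5}  # hypothetical row indices whose rating is hidden


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot (undefined if SS_tot == 0).
    mean_t = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")


def mean_impute(features, ratings):
    # Analogous to SimpleImputer(strategy="mean"): fill gaps with the observed mean.
    observed = [r for r in ratings if r is not None]
    fill = sum(observed) / len(observed)
    return [fill if r is None else r for r in ratings]


def knn_impute(features, ratings, k=2):
    # Average the ratings of the k observed rows closest in feature space (Euclidean).
    donors = [(f, r) for f, r in zip(features, ratings) if r is not None]
    out = []
    for f, r in zip(features, ratings):
        if r is not None:
            out.append(r)
            continue
        nearest = sorted(donors, key=lambda d: math.dist(f, d[0]))[:k]
        out.append(sum(d[1] for d in nearest) / len(nearest))
    return out


def hot_deck_impute(features, ratings, seed=0):
    # Random hot deck: copy the value of a randomly chosen observed donor.
    # `features` is unused; it is kept only for a uniform signature.
    rng = random.Random(seed)
    observed = [r for r in ratings if r is not None]
    return [rng.choice(observed) if r is None else r for r in ratings]


def evaluate(impute, features, true_ratings, missing_idx):
    # Mask known values, impute them, and score only the masked positions.
    masked = [None if i in missing_idx else r for i, r in enumerate(true_ratings)]
    filled = impute(features, masked)
    idx = sorted(missing_idx)
    y_true = [true_ratings[i] for i in idx]
    y_pred = [filled[i] for i in idx]
    return {"R2": r2(y_true, y_pred), "MSE": mse(y_true, y_pred), "MAE": mae(y_true, y_pred)}


if __name__ == "__main__":
    for name, fn in [("mean", mean_impute), ("knn", knn_impute), ("hot deck", hot_deck_impute)]:
        print(name, evaluate(fn, FEATURES, RATINGS, MISSING))
```

On this toy data each masked row has two close neighbours with similar ratings, so KNN recovers the hidden values almost exactly (MSE near 0, R2 near 1), while mean imputation gives MSE and MAE of about 1.0 and R2 of about 0. Note the opposite directions of the metrics: R2 is better when higher, whereas MSE and MAE are better when lower, which is why the abstract reports best and worst methods separately for each metric.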

Citation (APA)

Chehal, D., Gupta, P., Gulati, P., & Gupta, T. (2023). Comparative Study of Missing Value Imputation Techniques on E-Commerce Product Ratings. Informatica (Slovenia), 47(3), 373–382. https://doi.org/10.31449/inf.v47i3.4156
