Missing Data Handling by Mean Imputation Method and Statistical Analysis of Classification Algorithm

K. Maheswari; P. Packia Amutha Priya; S. Ramkumar; M. Arun

Book Chapter

Springer Science and Business Media Deutschland GmbH, (2020), 137-149

DOI: 10.1007/978-3-030-19562-5_14

7 Citations

9 Readers

Abstract

The goal of data mining is to extract meaningful information from large databases. Human error, high dimensionality, noisy data, and missing values can degrade the performance of any process run over a dataset, so handling such data properly is important for improving performance. Many methods for handling missing data are available, and mean imputation is one of them. It is a preprocessing operation performed before any machine learning algorithm is applied. After mean imputation is applied to a dataset, a decision is made as to whether the imputed mean values are good or bad. The rpart decision tree algorithm is applied to a retailer dataset to handle a larger number of classes. The experimental results show no significant difference among the variables. The results of various GLM models were compared and analyzed to provide better performance.
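As a rough illustration of the mean imputation step the abstract describes, the sketch below replaces each missing numeric value with the mean of the observed values in the same column. It is not taken from the chapter: the function name `impute_mean`, the use of `None` to mark a missing value, and the small `sales` records are all assumptions made for the example.

```python
from statistics import mean


def impute_mean(rows, columns):
    """Return a copy of `rows` in which None values in `columns`
    are replaced by the mean of the observed values in that column."""
    means = {}
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        if not observed:
            # A column with no observed values has no mean to impute.
            raise ValueError(f"column {col!r} has no observed values")
        means[col] = mean(observed)

    # Build new dicts so the input rows are left unchanged.
    return [
        {
            key: (means[key] if key in means and value is None else value)
            for key, value in row.items()
        }
        for row in rows
    ]


# Hypothetical retailer records; None marks a missing value.
sales = [
    {"store": "A", "units": 10.0, "price": 2.5},
    {"store": "B", "units": None, "price": 3.5},
    {"store": "C", "units": 20.0, "price": None},
]

completed = impute_mean(sales, ["units", "price"])
```

Because every gap in a column receives the same value, this approach keeps the column mean unchanged but shrinks its variance, which is one reason to check afterwards whether the imputed values are acceptable.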

Cite

Maheswari, K., Packia Amutha Priya, P., Ramkumar, S., & Arun, M. (2020). Missing Data Handling by Mean Imputation Method and Statistical Analysis of Classification Algorithm. In EAI/Springer Innovations in Communication and Computing (pp. 137–149). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-19562-5_14
