Missing value imputation based on k-mean clustering with weighted distance

Abstract

It is common to encounter databases in which up to half of the entries are missing; this is especially true of medical databases. Most statistical and data mining techniques require complete datasets and do not give accurate results when values are missing. Several methods have been proposed to deal with missing data. A commonly used method is to delete instances that have a missing attribute value. This approach is suitable when there are few missing values, but when many values are missing, deleting these instances loses the bulk of the information. Another way to cope with the problem is imputation (filling in the missing attribute values). We propose an efficient missing value imputation method based on clustering with weighted distance. We divide the data set into clusters based on a user-specified value K, then find the complete-valued neighbor nearest to the instance with the missing value. We then compute the missing value by taking the average of the centroid value and the centroidal distance of the neighbor, and use this as the imputed value. Our approach uses the K-means technique with weighted distance, and we show that it results in better performance. © 2010 Springer-Verlag Berlin Heidelberg.
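
The steps outlined in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the weighted distance or say exactly how the centroid and the neighbor are combined, so the sketch assumes a per-attribute weighted Euclidean distance over the observed attributes and imputes each missing attribute as the average of the cluster centroid's value and the nearest complete neighbor's value. The function names (`weighted_distance`, `k_means`, `impute_missing`), the `weights` parameter, and the use of `None` to mark a missing value are all hypothetical.

```python
import math
import random


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance over attributes present in both rows.

    Assumption: attributes that are None in either row are skipped.
    """
    total = 0.0
    for x, y, w in zip(a, b, weights):
        if x is not None and y is not None:
            total += w * (x - y) ** 2
    return math.sqrt(total)


def assign(rows, centroids, weights):
    """Group rows by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for row in rows:
        nearest = min(range(len(centroids)),
                      key=lambda c: weighted_distance(row, centroids[c], weights))
        clusters[nearest].append(row)
    return clusters


def k_means(rows, k, weights, iterations=20, seed=0):
    """Plain Lloyd-style k-means on complete rows, using the weighted distance."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iterations):
        clusters = assign(rows, centroids, weights)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(rows, centroids, weights)


def impute_missing(data, k, weights=None, seed=0):
    """Fill None entries using the cluster centroid and the nearest complete row."""
    weights = weights or [1.0] * len(data[0])
    complete = [r for r in data if None not in r]
    centroids, clusters = k_means(complete, k, weights, seed=seed)
    result = []
    for row in data:
        if None not in row:
            result.append(list(row))
            continue
        # Pick the cluster whose centroid is closest on the observed attributes.
        c = min(range(k), key=lambda i: weighted_distance(row, centroids[i], weights))
        # Nearest complete neighbor in that cluster (fall back to all complete rows).
        candidates = clusters[c] or complete
        neighbor = min(candidates, key=lambda r: weighted_distance(row, r, weights))
        # Assumed rule: average the centroid value and the neighbor value.
        result.append([(centroids[c][j] + neighbor[j]) / 2 if v is None else v
                       for j, v in enumerate(row)])
    return result
```

For example, calling `impute_missing` with `k=2` on the rows `[1.0, 1.0]`, `[2.0, 3.0]`, `[10.0, 10.0]`, `[11.0, 12.0]`, and `[1.0, None]` places the incomplete row in the cluster of the first two rows and fills the gap from that cluster's centroid and the neighbor `[1.0, 1.0]`. Clustering only the complete rows is a simplifying choice made here; the abstract does not say whether incomplete instances take part in the clustering step.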

Cite

Patil, B. M., Joshi, R. C., & Toshniwal, D. (2010). Missing value imputation based on k-mean clustering with weighted distance. In Communications in Computer and Information Science (Vol. 94 CCIS, pp. 600–609). https://doi.org/10.1007/978-3-642-14834-7_56
