Missing value imputation based on k-mean clustering with weighted distance

Bankat M. Patil; Ramesh C. Joshi; Durga Toshniwal

Conference Proceedings


Communications in Computer and Information Science (2010) 94 CCIS(PART 1) 600-609

DOI: 10.1007/978-3-642-14834-7_56

40 Citations

36 Readers


Abstract

It is common to encounter databases in which up to half of the entries are missing, particularly medical databases. Most statistical and data mining techniques require complete datasets and do not provide accurate results when values are missing. Several methods have been proposed to deal with missing data. A commonly used method is to delete instances that have a missing attribute value. This approach is suitable when there are few missing values. When there are many missing values, deleting these instances loses a large amount of information. Another way to cope with the problem is imputation (filling in the missing attribute values). We propose an efficient missing value imputation method based on clustering with weighted distance. We divide the data set into clusters based on a user-specified value K. We then find the complete-valued neighbor that is nearest to the instance with the missing value. Next, we compute the missing value by taking the average of the centroid value and the centroidal distance of the neighbor. This value is used as the imputed value. Our approach uses the K-means technique with weighted distance, and we show that it results in better performance. © 2010 Springer-Verlag Berlin Heidelberg.
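The steps named in the abstract (cluster, find the nearest complete-valued neighbor, average with the centroid) can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the abstract does not give the weights, the initialization, or the exact averaging formula. The sketch assumes a weighted Euclidean distance with uniform weights by default, clusters only the complete rows, uses a deterministic farthest-point initialization, and interprets the imputed value as the mean of the cluster centroid's value and the nearest complete neighbor's value for the missing attribute. All function and parameter names (`impute_kmeans_weighted`, `weighted_kmeans`, `weights`) are hypothetical.

```python
import numpy as np


def assign(X, centroids, weights):
    """Label each row of X with its nearest centroid (weighted Euclidean)."""
    diff = X[:, None, :] - centroids[None, :, :]
    return np.sqrt((weights * diff ** 2).sum(axis=2)).argmin(axis=1)


def weighted_kmeans(X, k, weights, n_iter=100):
    """Plain K-means on complete rows using a weighted distance.

    Assumption: farthest-point initialization, chosen only so that the
    sketch is deterministic; the abstract does not specify one.
    """
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min(
            [np.sqrt((weights * (X - c) ** 2).sum(axis=1)) for c in centroids],
            axis=0,
        )
        centroids.append(X[int(np.argmax(d))])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        labels = assign(X, centroids, weights)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, assign(X, centroids, weights)


def masked_distance(a, b, weights, mask):
    """Weighted distance restricted to the attributes selected by mask."""
    diff = (a - b)[mask]
    return float(np.sqrt(np.sum(weights[mask] * diff ** 2)))


def impute_kmeans_weighted(X, k, weights=None):
    """Fill NaN entries of X; returns a new array."""
    X = np.asarray(X, dtype=float)
    weights = (np.ones(X.shape[1]) if weights is None
               else np.asarray(weights, dtype=float))
    missing = np.isnan(X)
    complete_rows = ~missing.any(axis=1)
    complete = X[complete_rows]

    centroids, labels = weighted_kmeans(complete, k, weights)
    out = X.copy()
    for i in np.where(~complete_rows)[0]:
        observed = ~missing[i]
        # Pick the cluster whose centroid is closest on the observed attributes.
        c = min(range(k),
                key=lambda j: masked_distance(X[i], centroids[j], weights, observed))
        members = complete[labels == c]
        if len(members) == 0:
            members = complete
        # Nearest complete-valued neighbor within that cluster.
        neighbor = min(members,
                       key=lambda m: masked_distance(X[i], m, weights, observed))
        # Assumed formula: mean of centroid value and neighbor value.
        out[i, missing[i]] = 0.5 * (centroids[c][missing[i]] + neighbor[missing[i]])
    return out
```

Calling `impute_kmeans_weighted(data, k=2)` on an array that marks missing entries with `np.nan` returns a copy with those entries filled; passing a non-uniform `weights` vector changes which attributes dominate both the clustering and the neighbor search.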


Citation (APA)

Patil, B. M., Joshi, R. C., & Toshniwal, D. (2010). Missing value imputation based on k-mean clustering with weighted distance. In Communications in Computer and Information Science (Vol. 94 CCIS, pp. 600–609). https://doi.org/10.1007/978-3-642-14834-7_56
