Imputation for categorical attributes with probabilistic reasoning

Abstract

Since incompleteness limits how data can be used, missing values in a database should be estimated to make data mining and analysis more accurate. Besides ignoring missing values or setting them to defaults, many imputation methods have been proposed, but each has its limitations. This paper proposes a probabilistic method for estimating missing values. We construct a Bayesian network in a novel way to identify the dependencies in a dataset, then use Bayesian reasoning to find the most probable substitute for each missing value. The benefits of this method are that (1) irrelevant attributes can be ignored during estimation; (2) the network is built with no target attribute, so all attributes are handled in one model; and (3) probability information is available to measure the accuracy of the imputation. Experimental results show that our construction algorithm is effective and that the filled values are of higher quality than those produced by mode imputation and the kNN method. We also verify experimentally the effectiveness of the probabilities given by our method. © 2013 Springer-Verlag Berlin Heidelberg.
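The core idea of filling a missing categorical value with its most probable substitute, and reporting the associated probability, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not describe how the network is constructed, so the sketch assumes the relevant parent attributes are already known and estimates a single conditional distribution by counting. The function name `impute_most_probable`, the `parents` argument, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def impute_most_probable(rows, target, parents):
    """Fill missing (None) values of `target` with the most probable value
    given the row's values for `parents`.

    Assumption: `parents` is supplied by the caller. Learning the dependency
    structure (the Bayesian network) is outside the scope of this sketch.
    Returns a list of (row, probability) pairs; probability is None for rows
    that needed no imputation or could not be filled.
    """
    # Estimate P(target | parents) by counting over rows where all are observed.
    conditional = defaultdict(Counter)
    for row in rows:
        if row[target] is not None and all(row[p] is not None for p in parents):
            key = tuple(row[p] for p in parents)
            conditional[key][row[target]] += 1

    # Marginal distribution of the target, used when the parent configuration
    # was never observed (or a parent value is itself missing).
    marginal = Counter(r[target] for r in rows if r[target] is not None)

    result = []
    for row in rows:
        if row[target] is not None:
            result.append((dict(row), None))
            continue
        key = tuple(row[p] for p in parents)
        dist = conditional.get(key) or marginal
        if not dist:
            # No observed target values at all: leave the value missing.
            result.append((dict(row), None))
            continue
        value, count = dist.most_common(1)[0]
        filled = dict(row)
        filled[target] = value
        result.append((filled, count / sum(dist.values())))
    return result


# Hypothetical toy data: "umbrella" is missing in the last row.
rows = [
    {"weather": "rain", "umbrella": "yes"},
    {"weather": "rain", "umbrella": "yes"},
    {"weather": "rain", "umbrella": "no"},
    {"weather": "sun", "umbrella": "no"},
    {"weather": "rain", "umbrella": None},
]
result = impute_most_probable(rows, target="umbrella", parents=["weather"])
```

In this toy case the last row is filled from the three complete "rain" rows, and the returned probability plays the role of the confidence measure mentioned in benefit (3). Attributes not listed in `parents` are simply ignored, which loosely mirrors benefit (1).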

Cite

Jin, L., Wang, H., & Gao, H. (2013). Imputation for categorical attributes with probabilistic reasoning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7923 LNCS, pp. 87–98). Springer Verlag. https://doi.org/10.1007/978-3-642-38562-9_9
