Data in probabilistic databases may not be absolutely correct, and worse, may be erroneous. Many existing data cleaning methods can be used to detect errors in traditional databases, but they fall short of guiding us to find errors in probabilistic databases, especially for databases with complex correlations among data. In this paper, we propose a method for tracing errors in probabilistic databases by adopting Bayesian network (BN) as the framework of representing the correlations among data. We first develop the techniques to construct an augmented Bayesian network (ABN) for an anomalous query to represent correlations among input data, intermediate data and output data in the query execution. Inspired by the notion of blame in causal models, we then define a notion of blame for ranking candidate errors. Next, we provide an efficient method for computing the degree of blame for each candidate error based on the probabilistic inference upon the ABN. Experimental results show the effectiveness and efficiency of our method.
CITATION STYLE
Duan, L., Yue, K., Jin, C., Xu, W., & Liu, W. (2015). Tracing errors in probabilistic databases based on the Bayesian network. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9050, pp. 104–119). Springer Verlag. https://doi.org/10.1007/978-3-319-18123-3_7
Mendeley helps you to discover research relevant for your work.