Source selection for inconsistency detection

Lingli Li; Xu Feng; Hongyu Shao; Jinbao Li

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 10828 LNCS 370-385

DOI: 10.1007/978-3-319-91458-9_22

5 Citations

4 Readers

Abstract

Inconsistencies in a database can be detected based on violations of integrity constraints, such as functional dependencies (FDs). In the big data era, the availability of many related data sources gives us the chance to detect inconsistencies more extensively. That is, even when no violations exist within a single data set D, we can leverage other data sources to discover potential violations. A significant challenge for violation detection across data sources is that accessing too many sources incurs a huge cost, while involving too few may miss serious violations. Motivated by this, we investigate how to select a proper subset of sources for inconsistency detection. To address this problem, we formulate a gain model for sources and introduce the optimization problem of source selection, called SSID, in which the gain is maximized while the cost stays under a threshold. We show that the SSID problem is NP-hard and propose a greedy approximation approach for it. To avoid accessing data sources, we also present a randomized technique for gain estimation with theoretical guarantees. Experimental results on both real and synthetic data show that our algorithm is both effective and efficient.
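To make the setting concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes that the gain of a set of sources is the number of tuples in D found to violate a single FD when compared against those sources, and it uses a generic cost-benefit greedy heuristic under a budget. The function names, the gain definition, and the toy data are all hypothetical, and no approximation guarantee is claimed for this sketch. Unlike the randomized gain estimation described in the abstract, it reads every source in full to compute gains.

```python
from collections import defaultdict


def fd_violations(base, source, lhs, rhs):
    """Indices of rows in `base` that conflict with `source` on the FD lhs -> rhs."""
    # Collect, for each LHS value combination in the source, the RHS values seen.
    seen = defaultdict(set)
    for row in source:
        seen[tuple(row[a] for a in lhs)].add(tuple(row[a] for a in rhs))
    violated = set()
    for i, row in enumerate(base):
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Same LHS but a different RHS somewhere in the source: a violation.
        if any(other != val for other in seen.get(key, ())):
            violated.add(i)
    return violated


def greedy_select(base, sources, costs, lhs, rhs, budget):
    """Repeatedly pick the affordable source with the best marginal gain per cost.

    Assumes every cost is strictly positive (hypothetical cost model).
    """
    found = {name: fd_violations(base, rows, lhs, rhs)
             for name, rows in sources.items()}
    covered, chosen, spent = set(), [], 0.0
    remaining = set(sources)
    while remaining:
        best, best_ratio = None, 0.0
        for name in sorted(remaining):  # sorted for deterministic tie-breaking
            if spent + costs[name] > budget:
                continue
            ratio = len(found[name] - covered) / costs[name]
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:  # nothing affordable adds new violations
            break
        chosen.append(best)
        covered |= found[best]
        spent += costs[best]
        remaining.remove(best)
    return chosen, covered


# Toy data (hypothetical): D is consistent on its own for zip -> city.
base = [
    {"zip": "10001", "city": "New York"},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "94105", "city": "San Francisco"},
]
sources = {
    "s1": [{"zip": "10001", "city": "Newark"}],
    "s2": [{"zip": "10001", "city": "NYC"}, {"zip": "60601", "city": "Evanston"}],
    "s3": [{"zip": "94105", "city": "San Francisco"}],
}
costs = {"s1": 1.0, "s2": 2.0, "s3": 1.0}

chosen, covered = greedy_select(base, sources, costs, ["zip"], ["city"], budget=3.0)
```

In this toy run, the third source agrees with D and contributes nothing, so the budget is spent on the two sources that expose conflicts. A gain model that weights violations differently, or that handles several FDs at once, would only change how `found` is computed.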

Cite

Citation style: APA

Li, L., Feng, X., Shao, H., & Li, J. (2018). Source selection for inconsistency detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10828 LNCS, pp. 370–385). Springer Verlag. https://doi.org/10.1007/978-3-319-91458-9_22
