Quality assessment of linked datasets using probabilistic approximation

Jeremy Debattista; Santiago Londoño; Christoph Lange; Sören Auer

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015), vol. 9088, pp. 221–236

DOI: 10.1007/978-3-319-18818-8_14

12 Citations

27 Readers

Abstract

With the increasing application of Linked Open Data, assessing the quality of datasets by computing quality metrics becomes an issue of crucial importance. For large and evolving datasets, an exact, deterministic computation of the quality metrics is too time-consuming or expensive. We employ probabilistic techniques such as Reservoir Sampling, Bloom Filters and Clustering Coefficient estimation to implement a broad set of data quality metrics in an approximate but sufficiently accurate way. Our implementation is integrated into the comprehensive data quality assessment framework Luzzu. We evaluated its performance and accuracy on Linked Open Datasets of broad relevance.
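
The abstract names Reservoir Sampling as one of the probabilistic techniques used. As a rough illustration only, and not the Luzzu implementation, the following minimal sketch shows the classic single-pass reservoir sampling scheme (often called Algorithm R), which keeps a uniform random sample of fixed size from a stream whose length is not known in advance. The function name `reservoir_sample` and the example triples are hypothetical and do not come from the paper.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items drawn uniformly at random from a stream.

    Hypothetical helper for illustration; the stream is read once and
    only k items are held in memory at any time.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item replaces an old one
            # only if the slot falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Assumed example: a stream of (subject, predicate, object) triples.
triples = ((f"s{n}", "p", f"o{n}") for n in range(10_000))
sample = reservoir_sample(triples, k=100, rng=random.Random(42))
print(len(sample))
```

A quality metric could then be estimated on `sample` instead of the full dataset, trading some accuracy for a bounded memory footprint and a single pass over the data.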

Citation (APA)

Debattista, J., Londoño, S., Lange, C., & Auer, S. (2015). Quality assessment of linked datasets using probabilistic approximation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9088, pp. 221–236). Springer Verlag. https://doi.org/10.1007/978-3-319-18818-8_14
