Automatic discovery of high-level provenance using semantic similarity

Tom De Nies; Sam Coppens; Davy Van Deursen; Erik Mannens; Rik Van De Walle

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012) 7525 LNCS 97-110

DOI: 10.1007/978-3-642-34222-6_8

18 Citations

26 Readers

Abstract

As interest in provenance grows within the Semantic Web community, provenance is increasingly recognized as a useful tool across many domains. However, existing automatic provenance collection techniques are not universally applicable. Most existing methods either rely on (low-level) observed provenance or require the user to disclose formal workflows. In this paper, we propose a new approach for automatic discovery of provenance at multiple levels of granularity. To accomplish this, we detect entity derivations, relying on clustering algorithms, linked data and semantic similarity. The resulting derivations are structured in compliance with the Provenance Data Model (PROV-DM). While the proposed approach is purposely kept general, allowing adaptation to many use cases, we provide an implementation for one of these use cases, namely discovering the sources of news articles. With this implementation, we were able to detect 73% of the original sources of 410 news stories, at 68% precision. Lastly, we discuss possible improvements and future work. © 2012 Springer-Verlag.
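
As a rough illustration of the idea in the abstract (detecting derivations between entities through similarity and expressing them in PROV terms), a minimal sketch might look like the following. It is not the authors' implementation: plain bag-of-words cosine similarity stands in for the semantic similarity, linked data and clustering steps, and the document identifiers, timestamps, the 0.5 threshold, and the function names are all hypothetical. The output uses the PROV-N form `wasDerivedFrom(derived, source)`.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words counts (assumed stand-in for semantic annotation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def discover_derivations(docs, threshold=0.5):
    """Return PROV-N style derivation statements.

    docs: list of (identifier, timestamp, text) tuples.
    A later document is assumed to derive from an earlier one when their
    similarity reaches the (hypothetical) threshold.
    """
    ordered = sorted(docs, key=lambda d: d[1])
    vectors = {doc_id: term_vector(text) for doc_id, _, text in ordered}
    statements = []
    for i, (later_id, _, _) in enumerate(ordered):
        for earlier_id, _, _ in ordered[:i]:
            if cosine(vectors[later_id], vectors[earlier_id]) >= threshold:
                statements.append(f"wasDerivedFrom({later_id}, {earlier_id})")
    return statements


# Hypothetical toy corpus: two related news items and one unrelated item.
docs = [
    ("ex:wire", 1, "Council approves new bridge funding after long debate"),
    ("ex:blog", 2, "New bridge funding approved by council after long debate"),
    ("ex:sports", 3, "Local team wins the championship final"),
]

for statement in discover_derivations(docs):
    print(statement)
```

On this toy corpus the later, near-duplicate item is linked to the earlier one and the unrelated item yields no derivation. A realistic system would need a far richer similarity measure and a tuned threshold to approach the precision and recall figures reported above.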

Citation (APA)

De Nies, T., Coppens, S., Van Deursen, D., Mannens, E., & Van De Walle, R. (2012). Automatic discovery of high-level provenance using semantic similarity. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7525 LNCS, pp. 97–110). https://doi.org/10.1007/978-3-642-34222-6_8
