Bridging workflow and data provenance using strong links


Abstract

As scientists continue to migrate their work to computational methods, it is important to track not only the steps involved in a computation but also the data it consumes and produces. Existing approaches can capture this provenance information, but they often record only weak references between data and provenance. When data files or provenance are moved or modified, it can be difficult to find the data associated with the provenance or the provenance associated with the data. We propose a persistent storage mechanism that manages input, intermediate, and output data files, strengthening the links between provenance and data. This mechanism provides better support for reproducibility because it ensures that the data referenced in provenance information can be readily located. Another important benefit of such management is that it allows caching of intermediate data, which can then be shared with other users. We present an implemented infrastructure for managing data in a provenance-aware manner and demonstrate its application in scientific projects. © 2010 Springer-Verlag Berlin Heidelberg.
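
To make the idea of a strong link concrete, the following is a minimal sketch, not the infrastructure described in the paper. It assumes that files are copied into a managed directory and identified by a SHA-256 content digest, so that a provenance record refers to data by digest rather than by a path that can move. The names `ContentStore` and `run_step`, the record layout, and the in-memory cache are all hypothetical and chosen only for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


class ContentStore:
    """Hypothetical managed store: each file is kept under its SHA-256 digest."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        # Assumed cache layout: (step name, input digests) -> output digest.
        self.cache = {}

    def put(self, path):
        """Copy a file into the store and return its content digest."""
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        target = self.root / digest
        if not target.exists():
            shutil.copyfile(path, target)
        return digest

    def get(self, digest):
        """Return the managed path for a digest found in a provenance record."""
        return self.root / digest


def run_step(store, name, func, input_digests, workdir):
    """Run one workflow step and return a provenance record keyed by digests.

    If the same step was already run on the same inputs, the stored
    intermediate result is reused instead of recomputed.
    """
    key = (name, tuple(input_digests))
    if key in store.cache:
        return {"step": name, "inputs": list(input_digests),
                "output": store.cache[key], "cached": True}
    out_path = Path(workdir) / f"{name}.out"
    func([store.get(d) for d in input_digests], out_path)
    out_digest = store.put(out_path)
    store.cache[key] = out_digest
    return {"step": name, "inputs": list(input_digests),
            "output": out_digest, "cached": False}
```

In this sketch, deleting or moving the original input file does not break the provenance record, because the record points at the managed copy through its digest. Rerunning a step on the same inputs returns the cached intermediate result, which is the kind of reuse the abstract describes.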


Koop, D., Santos, E., Bauer, B., Troyer, M., Freire, J., & Silva, C. T. (2010). Bridging workflow and data provenance using strong links. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6187 LNCS, pp. 397–415). https://doi.org/10.1007/978-3-642-13818-8_28
