Project histories: Managing data provenance across collection-oriented scientific workflow runs

Shawn Bowers; Timothy McPhillips; Martin Wu; Bertram Ludäscher

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007), Vol. 4544 LNBI, pp. 122-138

DOI: 10.1007/978-3-540-73255-6_12

Abstract

While a number of scientific workflow systems support data provenance, they primarily focus on collecting and querying provenance for single workflow runs. Scientific research projects, however, typically involve (1) many interrelated workflows (where data from one or more workflow runs are selected and used as input to subsequent runs) and (2) tasks between workflow runs that cannot be fully automated. This paper addresses the need for recording data dependencies across multiple workflow runs and accommodating data management activities performed between runs. We define a new conceptual model for representing project-level provenance based on the notion of project histories and folders, and describe mechanisms to support this model in the collection-oriented modeling and design framework of KEPLER. Our approach allows users to conveniently organize their projects and data using the familiar folder-hierarchy metaphor, while at the same time integrating this information with detailed provenance of data products generated via automated scientific workflows. © Springer-Verlag Berlin Heidelberg 2007.
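The idea of recording data dependencies across several workflow runs, including manual steps between runs, can be illustrated with a minimal sketch. This is not the paper's model or the KEPLER implementation: the `Event` and `ProjectHistory` classes, the `"run"`/`"manual"` labels, and the folder-style item paths are all hypothetical names chosen for illustration, assuming each data item is produced by at most one recorded event.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One step in a project history (hypothetical structure)."""
    name: str
    kind: str        # "run" for a workflow run, "manual" for a user action
    inputs: tuple    # folder-style paths of the data items consumed
    outputs: tuple   # folder-style paths of the data items produced


@dataclass
class ProjectHistory:
    """Ordered log of workflow runs and the manual steps between them."""
    events: list = field(default_factory=list)

    def record(self, name, kind, inputs, outputs):
        self.events.append(Event(name, kind, tuple(inputs), tuple(outputs)))

    def producer(self, item):
        """Return the event that produced `item`, or None for raw inputs."""
        for event in self.events:
            if item in event.outputs:
                return event
        return None

    def lineage(self, item):
        """Return all upstream items `item` depends on, across every event."""
        seen = set()
        stack = [item]
        while stack:
            current = stack.pop()
            event = self.producer(current)
            if event is None:
                continue  # raw input: nothing upstream was recorded
            for parent in event.inputs:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative project: two runs linked by a manual curation step.
history = ProjectHistory()
history.record("run1", "run", ["raw/seqs.fasta"], ["run1/alignment"])
history.record("curate", "manual", ["run1/alignment"], ["curated/alignment"])
history.record("run2", "run", ["curated/alignment"], ["run2/tree"])

upstream = history.lineage("run2/tree")
```

Here `upstream` traces the final output back through the manual curation step to the raw input of the first run, which is the kind of cross-run dependency that single-run provenance would miss.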

Citation (APA)

Bowers, S., McPhillips, T., Wu, M., & Ludäscher, B. (2007). Project histories: Managing data provenance across collection-oriented scientific workflow runs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4544 LNBI, pp. 122–138). Springer Verlag. https://doi.org/10.1007/978-3-540-73255-6_12
