Differencing provenance in scientific workflows

Abstract

Scientific workflow management systems are increasingly providing the ability to manage and query the provenance of data products. However, the problem of differencing the provenance of two data products produced by executions of the same specification has not been adequately addressed. Although this problem is NP-hard for general workflow specifications, an analysis of real scientific (and business) workflows shows that their specifications can be captured as series-parallel graphs overlaid with well-nested forking and looping. For this natural restriction, we present efficient, polynomial-time algorithms for differencing executions of the same specification and thereby understanding the difference in the provenance of their data products. We then describe a prototype called PDiffView built around our differencing algorithm. Experimental results demonstrate the scalability of our approach using collected, real workflows and increasingly complex runs. © 2009 IEEE.
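
To make the restricted setting concrete, here is a minimal Python sketch. It is not the authors' algorithm or PDiffView's code, and every name in it (`Node`, `atom`, `series`, `parallel`, `loop`, `size`, `diff`) is hypothetical. It assumes a run is represented as a tree of series, parallel, and loop nodes over atomic module executions, which mirrors the series-parallel-plus-looping shape described above. Forking is left out. Because both runs come from the same specification, series and parallel children line up by position, so only loop iteration counts can differ. The sketch assumes that inserting or deleting an iteration costs the number of module executions it contains. It aligns iterations with a standard weighted edit-distance dynamic program, so the cost is computed in time polynomial in the sizes of the two runs.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical run node: 'atom' (one module execution), 'series',
    'parallel', or 'loop' (children are the loop's iterations, in order)."""
    kind: str
    label: str = ""
    children: Tuple["Node", ...] = ()


def atom(label: str) -> Node:
    return Node("atom", label)


def series(*children: Node) -> Node:
    return Node("series", children=children)


def parallel(*children: Node) -> Node:
    return Node("parallel", children=children)


def loop(*iterations: Node) -> Node:
    return Node("loop", children=iterations)


@lru_cache(maxsize=None)
def size(n: Node) -> int:
    """Module executions in a subtree (assumed insert/delete cost)."""
    if n.kind == "atom":
        return 1
    return sum(size(c) for c in n.children)


@lru_cache(maxsize=None)
def diff(a: Node, b: Node) -> int:
    """Toy difference cost between two runs of the same specification."""
    if a.kind != b.kind or a.label != b.label:
        raise ValueError("not runs of the same specification")
    if a.kind == "atom":
        return 0  # same module at the same specification position
    if a.kind in ("series", "parallel"):
        # Assumption: branches are stored in specification order, so
        # children correspond one-to-one by position.
        if len(a.children) != len(b.children):
            raise ValueError("not runs of the same specification")
        return sum(diff(x, y) for x, y in zip(a.children, b.children))
    if a.kind != "loop":
        raise ValueError(f"unknown node kind {a.kind!r}")
    # Loop: align ordered iterations with a weighted edit-distance DP.
    xs, ys = a.children, b.children
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])  # delete an iteration
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])  # insert an iteration
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),
                dp[i][j - 1] + size(ys[j - 1]),
                dp[i - 1][j - 1] + diff(xs[i - 1], ys[j - 1]),
            )
    return dp[len(xs)][len(ys)]


if __name__ == "__main__":
    body = series(atom("B"), atom("C"))
    run1 = series(atom("A"), loop(body, body), atom("D"))
    run2 = series(atom("A"), loop(body, body, body), atom("D"))
    print(diff(run1, run2))  # 2: one extra iteration of B and C
```

Supporting well-nested forking would additionally require pairing the unordered copies of a forked subworkflow, for example with a minimum-cost bipartite matching. That step is omitted from this sketch.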

Citation (APA)

Bao, Z., Cohen-Boulakia, S., Davidson, S. B., Eyal, A., & Khanna, S. (2009). Differencing provenance in scientific workflows. In Proceedings - International Conference on Data Engineering (pp. 808–819). https://doi.org/10.1109/ICDE.2009.103