Differencing provenance in scientific workflows

Zhuowei Bao; Sarah Cohen-Boulakia; Susan B. Davidson; Anat Eyal; Sanjeev Khanna

Conference Proceedings

Proceedings - International Conference on Data Engineering (2009) 808-819

DOI: 10.1109/ICDE.2009.103

Abstract

Scientific workflow management systems are increasingly providing the ability to manage and query the provenance of data products. However, the problem of differencing the provenance of two data products produced by executions of the same specification has not been adequately addressed. Although this problem is NP-hard for general workflow specifications, an analysis of real scientific (and business) workflows shows that their specifications can be captured as series-parallel graphs overlaid with well-nested forking and looping. For this natural restriction, we present efficient, polynomial-time algorithms for differencing executions of the same specification and thereby understanding the difference in the provenance of their data products. We then describe a prototype called PDiffView built around our differencing algorithm. Experimental results demonstrate the scalability of our approach using collected, real workflows and increasingly complex runs. © 2009 IEEE.

Citation (APA)

Bao, Z., Cohen-Boulakia, S., Davidson, S. B., Eyal, A., & Khanna, S. (2009). Differencing provenance in scientific workflows. In Proceedings - International Conference on Data Engineering (pp. 808–819). https://doi.org/10.1109/ICDE.2009.103
