
Seamless Provenance Representation and Use in Collaborative Science Scenarios (Abstract)

by Paolo Missier, Bertram Ludaescher, Shawn Bowers, Manish Kumar Anand, Ilkay Altintas, Saumen Dey, Anandarup Sarkar, Biva Shrestha, Carole Goble
AGU Fall Meeting (2010)

Abstract

Scientific collaboration increasingly involves data sharing between separate groups. We consider a scenario where data products of scientific workflows are published and then used by other researchers as inputs to their own workflows. For proper interpretation, shared data must be complemented by descriptive metadata. We focus on provenance traces, a prime example of such metadata, which describe the genesis and processing history of data products in terms of computational workflow steps. Through the reuse of published data, virtual, implicitly collaborative experiments emerge, making it desirable to compose the independently generated traces into global ones that describe the combined executions as single, seamless experiments. We present a model for provenance sharing that realizes this holistic view by overcoming the interoperability problems that arise from the heterogeneity of workflow systems, data formats, and provenance models. At its heart lie (i) an abstract workflow and provenance model, in which (ii) data sharing itself becomes part of the combined workflow. We then describe an implementation of our model, developed in the context of the Data Observation Network for Earth (DataONE) project, that can stitch together traces from different Kepler and Taverna workflow runs. It provides a prototypical framework for seamless cross-system, collaborative provenance management and can be easily extended to include other systems. Our approach also opens the door to new forms of workflow interoperability, not only through often elusive workflow standards but also through shared provenance information from public repositories.


Available from Paolo Missier's profile on Mendeley.
Submitted on September 01, 11:11 AM for agu-fm10

Proof
CONTROL ID: 959926
TITLE: Seamless Provenance Representation and Use in Collaborative Science Scenarios
PRESENTATION TYPE: Assigned by Committee (Oral or Poster)
CURRENT SECTION/FOCUS GROUP: Earth and Space Science Informatics (IN)
CURRENT SESSION: IN02. Enabling and Encouraging Transparency in Science Data
AUTHORS (FIRST NAME, LAST NAME): Paolo Missier (1), Bertram Ludaescher (2), Shawn Bowers (3), Ilkay Altintas (4), Manish Kumar Anand (4), Saumen Dey (2), Anandarup Sarkar (2), Biva Shrestha (5), Carole Goble (1)
INSTITUTIONS (ALL): 1. School of Computer Science, University of Manchester, Manchester, United Kingdom.
2. Dept. of Computer Science, University of California, Davis, CA, United States.
3. Dept. of Computer Science, Gonzaga University, Spokane, WA, United States.
4. San Diego Supercomputer Center, University of California, San Diego, CA, United States.
5. Dept. of Computer Science, Appalachian State University, Boone, NC, United States.
Title of Team:
ABSTRACT BODY: The notion of sharing scientific data has only recently begun to gain ground in science,
where data is still considered a private asset. There is growing evidence, however, that the benefits of
scientific collaboration through early data sharing during the course of a science project may outweigh the risk
of losing exclusive ownership of the data. As exemplar success stories are making the headlines[1], principles
of effective information sharing have become the subject of e-science research. In particular, any piece of
published data should be self-describing, to the extent necessary for consumers to determine its suitability for
reuse in their own projects. This is accomplished by associating a body of formally specified and
machine-processable metadata with the data.
When data is produced and reused by independent groups, however, metadata interoperability issues
emerge. This is the case for provenance, a form of metadata that describes the history of a data product, Y.
Provenance is typically expressed as a graph-structured set of dependencies that account for the sequence of
computational or interactive steps that led to Y, often starting from some primary, observational data.
Traversing dependency graphs is one of the mechanisms used to answer questions on data reliability.
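As a minimal sketch of such a traversal (not the authors' implementation), the Python below stores a provenance trace as a mapping from each data item to the items it was directly derived from, then walks it backwards from Y. The item names and the helpers `lineage` and `primary_inputs` are hypothetical, chosen only for illustration.

```python
from collections import deque

# Hypothetical trace: each data item maps to the items it directly depends on.
trace = {
    "Y": ["cleaned", "params"],
    "cleaned": ["raw_obs"],
    "params": [],
    "raw_obs": [],
}

def lineage(graph, item):
    """Return every item that `item` transitively depends on (breadth-first)."""
    seen = set()
    queue = deque(graph.get(item, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(graph.get(current, []))
    return seen

def primary_inputs(graph, item):
    """Items in the lineage that have no recorded dependencies of their own."""
    return {dep for dep in lineage(graph, item) if not graph.get(dep)}
```

Calling `primary_inputs(trace, "Y")` returns the items at which the recorded history of Y starts, which is the kind of query a consumer might run when judging whether Y rests on trustworthy observational data.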
In the context of the NSF DataONE project[2], we have been studying issues of provenance interoperability in
scientific collaboration scenarios. Consider a first scientist, Alice, who publishes a data product X along with
its provenance, and a second scientist who transforms X further into a new product Y, likewise published with its
provenance. A third scientist, who is interested in Y, expects to be able to trace Y's history back to the inputs
used by Alice. This is only possible, however, if provenance accumulates into a single, uniform graph that can
be seamlessly traversed. This becomes problematic when provenance is captured using different tools and
computational models (i.e. workflow systems), as well as when data is published and reused using
mechanisms that are not provenance-aware.
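The stitching problem can be sketched as follows, under stated assumptions: each trace is a list of (derived, source) dependency pairs, each system uses its own identifiers, and the publish-and-reuse step is recorded as an explicit mapping from the consumer's local identifier to the published one. All names here (`alice_trace`, `second_trace`, `shared_as`, `stitch`, `ancestors`) are hypothetical and do not reflect the DataONE prototype.

```python
def stitch(upstream, downstream, shared_as):
    """Combine two traces into one edge list.

    upstream, downstream: lists of (derived, source) pairs.
    shared_as: maps an identifier used in `downstream` to the identifier of
    the same published item in `upstream`. Each mapping becomes an explicit
    edge, so the act of sharing is itself part of the combined trace.
    """
    bridge = [(local_id, published_id) for local_id, published_id in shared_as.items()]
    return upstream + bridge + downstream

def ancestors(edges, item):
    """All items reachable from `item` by following (derived, source) edges."""
    found, stack = set(), [item]
    while stack:
        node = stack.pop()
        for derived, source in edges:
            if derived == node and source not in found:
                found.add(source)
                stack.append(source)
    return found

# Alice's run (assumed to be in one workflow system) derives X from her inputs.
alice_trace = [("alice:X", "alice:input1"), ("alice:X", "alice:input2")]
# The second scientist's run (assumed to be in another system) derives Y
# from a local copy of X.
second_trace = [("second:Y", "second:X_copy")]
# Without this record the two traces share no identifier and cannot be joined.
shared_as = {"second:X_copy": "alice:X"}

combined = stitch(alice_trace, second_trace, shared_as)
```

With the bridge edge in place, `ancestors(combined, "second:Y")` reaches Alice's inputs; run against `second_trace` alone, the same query stops at the local copy of X, which illustrates what is lost when publishing is not provenance-aware.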
In this presentation we discuss requirements for ensuring provenance-aware data publishing and reuse, and
describe the design and implementation of a prototype toolkit that involves two specific, widely used
workflow systems, Kepler [3] and Taverna [4]. The implementation is expected to be adopted as part of
DataONE's investigators' toolkit, in support of its mission of large-scale data preservation.
Refs.
[1] G. Kolata. Sharing of Data Leads to Progress on Alzheimer’s. NYT, 8/12/2010.
[2] http://www.dataone.org
[3] B. Ludaescher, I. Altintas, et al. Scientific Workflow Management and the Kepler System. Special Issue: Workflow in Grid Systems. Concurrency and Computation: Practice & Experience 18(10): 1039-1065, 2006.
[4] D. Hull, K. Wolstencroft, R. Stevens, C. Goble, M. R. Pocock, P. Li, T. Oinn. Taverna: a tool for building and running workflows of services. Nucl. Acids Res. 34: W729-W732, 2006.
INDEX TERMS: [1948] INFORMATICS / Metadata: Provenance, [1998] INFORMATICS / Workflow.
Sponsor
SPONSOR NAME: Bruce Wilson
SPONSOR EMAIL ADDRESS: wilsonbe@ornl.gov
SPONSOR MEMBER ID: 10549509


Readership Statistics

3 Readers on Mendeley
by Academic Status: 67% Researcher (at an Academic Institution), 33% Professor
by Country: 67% United States, 33% Germany