Janus: from Workflows to Semantic Provenance and Linked Open Data

by Paolo Missier, Satya S Sahoo, Jun Zhao, Carole Goble, Amit Sheth
Life Sciences (2010)

Abstract

Data provenance graphs are a form of metadata that can be used to establish a variety of properties of data products that undergo sequences of transformations, typically specified as workflows. Their usefulness for answering user provenance queries is limited, however, unless the graphs are enhanced with domain-specific annotations. In this paper we propose a model and architecture for semantic, domain-aware provenance, and demonstrate its usefulness in answering typical user queries. Furthermore, we discuss the additional benefits and the technical implications of publishing provenance graphs as a form of Linked Data. A prototype implementation of the model is available for data produced by the Taverna workflow system.

Paolo Missier¹, Satya S. Sahoo³, Jun Zhao², Carole Goble¹, and Amit Sheth³
¹ School of Computer Science, University of Manchester, UK
{pmissier,caroleg}@cs.man.ac.uk
² Department of Zoology, University of Oxford, UK
jun.zhao@zoo.ox.ac.uk
³ The Kno.e.sis Center, Wright State University, Dayton, OH, USA
{sahoo.2,amit.sheth}@wright.edu

1 Introduction
Experimental science increasingly relies upon computational techniques and
large-scale data management to achieve its goals. As with any experimental method,
either manual or automated, an important step of the scientific process is the
validation of its results. In the case of automated, high-throughput data
generation and transformation pipelines, implemented for example as workflows,
the complexity of the processes and the volumes of data call for validation
procedures to be automated, too. One of the prominent approaches involves the
analysis of detailed traces of the data transformations that are recorded during
the execution of the data pipeline. These traces are a form of metadata, relative
to the data involved in the process, known as data provenance. The growing
realisation of the importance of this type of metadata for experimental science
has in recent years spurred a wealth of research in provenance acquisition and
analysis [17, 1, 5, 7].
Provenance metadata is structured as a causal graph amongst data elements
as they undergo several transformations through some composition of processes.
The two main strains of research in this area concentrate on (i) provenance
modelling, with the goal of supporting the users' data validation tasks; and (ii)
data architectures for provenance management. The work presented in this paper
falls in the former of these two categories. Most of the provenance models
proposed so far, including those just cited, have focused on describing
the causal relationships amongst data products, without specific concern for the
semantic characterisation of those products. We refer to these graphs as
domain-agnostic, as they do not include any reference to domain-specific terms. In
contrast, we propose a new semantic model of provenance, embodied by
domain-aware graphs, designed to support data derivation questions that are formulated
by user-scientists using domain-specific terminology. Fig. 1 clarifies the
distinction between the two types of graphs⁴. The main differences between Fig. 1(a)
and Fig. 1(b) are the additional semantic annotations shown in the latter. In
this limited example, these are of the form V instance-of C or V has-source
C′, where V is a value, and C, C′ are terms in some domain vocabulary, for
biology concepts and biological database resources, respectively. We expect that,
regardless of the specific formalism chosen to specify these annotations,
domain-aware graphs will be useful to answer a broader class of user questions than their
domain-agnostic counterparts (namely those questions that rely upon domain
terms).
Fig. 1. Adding simple annotations to provenance graphs: (a) domain-agnostic
provenance graph; (b) domain-aware provenance graph.
Taking this idea further, we also note that grounding a provenance model in
the Semantic Web framework presents additional opportunities for supporting
an even broader class of user questions. In particular, we explore the idea
of making semantic provenance graphs a part of the broad Web of Data, an
increasingly rich source of interconnected data that is uniformly represented
according to the principles and conventions of Linked Open Data (LOD) [4]. In
practice, we show how mapping data elements in the graph to equivalent data
that is published elsewhere in the Web of Data makes it possible for queries to
seamlessly include conditions on properties of the data that are not explicitly
represented in the graph or its annotations, but are instead associated with their
equivalent external representations.
1.1 Paper scope and contributions
The idea of semantic provenance was first proposed in [16], but few concrete
examples exist to date of its realisation beyond, for example, [6]. In this paper
we take a concrete step towards the implementation of a semantic provenance
model, code-named Janus, cast specifically in the context of provenance for data
processed by Life Sciences workflows. We describe a practical implementation
of Janus, which is grounded in the Taverna workflow model [10] and
(domain-agnostic) provenance model [12], and demonstrate its technical feasibility as
well as its benefits to users in terms of enhanced query answering capabilities.
The paper offers the following specific contributions. Firstly, we set Janus in
the Semantic Web framework, where we define its model as an extension of the
Provenir upper ontology for workflow-based data provenance [15]. In this setting,
Janus consists of a domain-agnostic part, which models essentially the same
entities as the existing Taverna provenance model, and a domain-aware part,
obtained by extending the ontology to include properties and classes like those
shown earlier in Fig. 1(b). Secondly, we describe the prototype implementation
of an extension to the current Taverna provenance architecture, which produces
semantic, RDF-based provenance graphs for workflow runs, that conform to the
Janus ontology.
Thirdly, we show how the RDF provenance graph can be domain-enhanced
by associating semantic types from a variety of public ontologies to some of its
elements. We also discuss how existing semantic annotations on the workflow
and its composing services, when available, can be automatically propagated to
the provenance graph. We then show how, in this setting, we can answer a class
of user queries that predicate on the domain annotations. Finally, we show on a
practical example how the provenance graph can "blend in" as part of the Web
of Data, and exemplify our approach by mapping data identifiers in the graph
to those in the Bio2RDF project [2], resulting in extended semantic provenance
queries.
⁴ We use an abstract notation that is close to the one adopted in the Open Provenance
Model (http://www.openprovenance.org), where data values (the circles) are either
produced or consumed by processes (the squares).
1.2 Related work
While provenance data modelling is a well-studied topic [17], the challenge of
associating domain semantics to it has received relatively little attention. The Open
Provenance Model (OPM) [13] provides an annotation framework to support the
need for adding extra information to provenance entities. However, this
framework is not defined in the current OPM OWL ontology⁵. Previous work by Cao
et al. [6] and Zhao et al. [18] experimented with providing semantic annotations
to provenance logs by post-processing, but without a clear data model for
accommodating domain semantics. Such a data model is essential for building a
domain-aware provenance collection architecture that could scale beyond case
studies. In this paper, we extend the Provenir ontology to create the
domain-aware Janus provenance model to address the challenge.
Query frameworks and user-facing visualizations to support a user-oriented
view of provenance can be found in the work by Biton et al. [3] and Howe et
al. [9]. Provenance queries that present information in a more meaningful way to
the domain scientists have been implemented by Cao et al. [6] and McGuinness et
al. [11]. This work takes it further by connecting domain-enhanced provenance
graphs created locally with the global Web of Data in order to expand the
possible semantic provenance queries.
⁵ http://openprovenance.org/model/opm.owl
2 A concrete example
Our running example consists of a bioinformatics workflow designed to find all
known relationships between a specific region in the mouse genome, known as
a QTL (Quantitative Trait Locus), and the metabolic pathways involving genes
that are present in that region. A schematic representation of the workflow is
given in Fig. 2(a).⁶ The workflow starts by retrieving all the genes known to the
Ensembl public database for a given input region, using the Biomart service.
It then retrieves all metabolic pathways from the KEGG pathways database
in which at least one of those genes is involved.⁷ Note that the schematic
representation does not include the many adapter scripts that are required in
reality to accomplish this composite task.
Fig. 2. (a) Schematic representation of a Taverna workflow; (b) schematic
representation of a provenance graph for a workflow run.
A scientist may want to ask a number of high-level questions regarding the
relationship between the outputs and some of the inputs of a workflow execution
("run"). Amongst these, we are going to consider the following two, which can
be expressed in terms of queries on a provenance graph:
1. for each KEGG pathway observed in the workflow output (or for a specific
one), find all genes that are within the input QTL and are involved in that
pathway;
2. amongst all genes that are known to perform a certain biological function,
list those that are involved in a certain pathway.
⁶ The actual workflow, too large to be reproduced here, can be found on the
myExperiment Web site: http://www.myexperiment.org/workflows/931.
⁷ Ensembl: www.ensembl.org, Biomart: www.biomart.org/, KEGG:
www.genome.jp/kegg/pathway.html
The terms in italics refer to concepts in the bioinformatics domain, similar to
those in Fig. 1(b). Intuitively, one can answer (1) for a particular run by
traversing a domain-aware provenance graph for that run, like the one sketched in
Fig. 2(b). An output value o for the workflow depends on some input or
intermediate value i if and only if there is a path from i to o in the graph. Thus,
(1) can be reduced to a query that finds all pairs (i, o) such that o is of type
pathway, i is of type gene, and there is a path from i to o. In Sec. 3.3 we show
how our proposed semantic provenance framework supports this query.
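The reduction of question (1) to a typed path query can be illustrated with a minimal Python sketch. It is not the Janus implementation (which uses RDF and SPARQL, see Sec. 3.3); the graph, the node names, and the type labels below are hypothetical and chosen only to make the reachability condition concrete.

```python
from collections import deque

# Hypothetical toy provenance graph: an edge a -> b means "b was derived from a".
EDGES = {
    "qtl_region": ["gene_A", "gene_B"],
    "gene_A": ["pathway_1"],
    "gene_B": ["pathway_1", "pathway_2"],
    "gene_C": [],
}

# Hypothetical domain annotations (value -> semantic type).
TYPES = {
    "qtl_region": "qtl",
    "gene_A": "gene",
    "gene_B": "gene",
    "gene_C": "gene",
    "pathway_1": "pathway",
    "pathway_2": "pathway",
}


def reachable(graph, start):
    """Return every node reachable from start by following one or more edges."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, []))
    return seen


def gene_pathway_pairs(graph, types):
    """Find all (i, o) with i of type gene, o of type pathway, and a path i -> o."""
    pairs = []
    for i, type_i in types.items():
        if type_i != "gene":
            continue
        for o in reachable(graph, i):
            if types.get(o) == "pathway":
                pairs.append((i, o))
    return sorted(pairs)


PAIRS = gene_pathway_pairs(EDGES, TYPES)
```

Without the type annotations, the same traversal could only report that some value depends on some other value; the domain terms are what turn it into an answer about genes and pathways.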
The graph, however, is not sufficient to answer question (2), which refers to
the biological function of a gene, a concept that is not included in the semantic
annotations. Our approach in this case is based upon the idea that the genes
that appear in the graph may also be published elsewhere in the broad Web of
Data, where the missing annotations can potentially be found. When this is the
case, one can formulate a hybrid query that (i) retrieves the biological functions
of all the genes that appear in the graph, using a Linked Data query, and (ii)
for those genes that satisfy the condition, finds all paths to the corresponding
pathways in the graph⁸. We elaborate on this strategy and on its limitations in
Sec. 4, showing in particular how the gene IDs in the graph can be mapped
to Bio2RDF genes.
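The two-step hybrid strategy can be sketched in Python as follows. This is only an illustration under stated assumptions: the external lookup is replaced by a hypothetical in-memory dictionary, and all gene and pathway identifiers are invented for the example.

```python
# Hypothetical local provenance graph: gene value -> pathways derived from it.
DERIVED_PATHWAYS = {
    "gene:101": ["path:mmu00010"],
    "gene:102": ["path:mmu00020"],
    "gene:103": ["path:mmu00010", "path:mmu00030"],
}

# Hypothetical external annotations, standing in for a Linked Data lookup.
# gene:103 is deliberately absent: the Web of Data may not know every gene.
EXTERNAL_FUNCTIONS = {
    "gene:101": {"ATP binding"},
    "gene:102": {"DNA repair"},
}


def hybrid_query(derived, external, wanted_function):
    """Step (i): keep genes whose external annotations include wanted_function.
    Step (ii): for those genes, collect the pathways found in the local graph."""
    result = {}
    for gene, pathways in derived.items():
        functions = external.get(gene)  # None when the gene is not published externally
        if functions is not None and wanted_function in functions:
            result[gene] = sorted(pathways)
    return result


ANSWER = hybrid_query(DERIVED_PATHWAYS, EXTERNAL_FUNCTIONS, "ATP binding")
```

The gene missing from the external source is silently excluded, which mirrors the caveat that the condition cannot always be evaluated on the Web of Data.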
3 The Janus Semantic Provenance Infrastructure
The examples from the previous section highlight the need for incorporating
domain semantics as part of the provenance model, to bridge the gap between
the domain-agnostic provenance produced during workflow execution, and the
users' domain-oriented view of provenance. An expressive provenance model with
well-defined formal semantics not only enables complex domain-specific
information to be modeled, but also facilitates provenance interoperability and supports
reasoning over large sets of provenance information. As mentioned in the
introduction, formally Janus is an extension of Provenir, an upper-level reference
OWL DL ontology for provenance modeling designed to be extended to
represent provenance in multiple domains. In turn, Provenir extends concepts from the
well-known Basic Formal Ontology (BFO)⁹ to define a set of provenance terms,
including the three fundamental concepts of data, process, and agent. Provenir
also defines a set of 11 named relationships amongst classes, including partonomy
relations, temporal information, precedence, and causal relationships, providing
a foundation for the semantic modelling of provenance. As an upper-level
reference model for provenance, Provenir ensures a common modeling approach,
conceptual clarity of provenance terms, and use of design patterns for consistent
provenance modeling.
3.1 Modeling Domain-agnostic Provenance in Janus
The Taverna provenance model defined in [12] includes both a static and a
dynamic portion. The static portion describes the graph structure of a workflow
specification (processors, processor ports, and data dependencies as links
between ports), such as the one in our running example of Fig. 2(a). The
dynamic portion accounts for multiple invocations of a processor that occur during
workflow execution, as well as for the binding of actual values to the processors'
ports. In the first step of the design, we model the existing Taverna provenance
model as an OWL ontology. As illustrated in Fig. 3, the classes in the static
portion (janus:workflow_spec, janus:processor_spec, and janus:port)
extend corresponding Provenir classes and are associated through appropriate
properties, for example janus:processor_spec provenir:has_parameter
janus:port. Note that data links in the workflow are modelled using the links_from
property from port onto itself. Individuals in these classes include the
workflow itself (gene_pathway_workflow), its processors (e.g. genes_in_qtl), and the
processors' ports (e.g. qtl_end_position, chromosome_name). In turn, these
individuals may be related to one or more run-time counterparts in the dynamic
portion of the ontology through the object property has_execution:
    dom(has_execution) = workflow_spec or processor_spec
    range(has_execution) = workflow_exec or processor_exec
and has_value_binding:
    dom(has_value_binding) = port; range(has_value_binding) = port_value
Fig. 3. Domain-aware Janus as an extension of Provenir
⁸ Note, however, that there is no guarantee that the gene will be found in the Web of
Data, or that the condition on its external annotations can be evaluated there.
⁹ http://ontology.buffalo.edu/bfo/
3.2 Modeling Semantic Provenance in Janus
We now describe the Janus extension to include domain-specific terms. A variety
of scientific communities are creating ontologies to model domain knowledge;
for example, the National Center for Biomedical Ontology (NCBO)¹⁰ currently
lists 166 publicly available ontologies in the Life Sciences domain. To model
semantic provenance in Janus, we re-use the classes defined in four public
ontologies listed at NCBO, namely the BioPAX, National Cancer Institute
Thesaurus, Foundational Model of Anatomy (FMA), and Sequence ontologies,
while a fifth ontology, OWL Time¹¹, is available from the W3C. This reuse
strategy facilitates the interoperability of Janus-conformant provenance graphs
with large public datasets. For example, these graphs can be easily linked to the
KEGG, Reactome, and BioCyc databases, which currently make their biological
pathway datasets available as BioPAX-conformant RDF datasets.
These extensions are used to annotate both workflow processors and their
ports. For example, the three input ports for our example workflow, chromosome_name,
start_position and end_position, are annotated with concepts so:chromosome
and so:base_pair, respectively, where the so prefix denotes the NCBO-listed
Sequence Ontology. Similarly, ports that denote proteins and pathways are
annotated using terms fma:protein and biopax:pathway, from the FMA and the
BioPAX ontology, respectively. In general, semantic types are associated to ports
in an extensible way through the generic has_value_type property, according to
the following pattern:
    dom(has_value_type) = port; range(has_value_type) = domain_entity
    biopax:pathway rdfs:subClassOf domain_entity
    fma:protein rdfs:subClassOf domain_entity
For each workflow run, the Taverna provenance component produces
domain-agnostic provenance in the form of an RDF graph that conforms to the
Janus ontology just described, i.e., it contains RDF statements of the form
N rdf:type C, where N is a node in the provenance graph and C is some
Janus concept. The semantic annotation of these graphs assumes that the
workflow specification is itself semantically annotated, and it involves automatically
propagating those annotations, first to the static portion of the provenance
graph, and then to the dynamic portion. Statically annotating the workflows
prior to their execution is a realistic proposition. While this may involve a
manual curation process, typical workflows never include more than a handful of
services, and furthermore, in the long run one can assume that these
annotations will be available through a registry that describes the services that compose
the workflow (Taverna workflows essentially specify Web service compositions).
The BioCatalogue registry for Life Science services¹², for example, sets out to
provide semantic annotations for hundreds of services, and these annotations
carry over to the workflows where the services are invoked.
¹⁰ http://www.bioontology.org/
¹¹ http://www.w3.org/TR/owl-time
¹² http://www.biocatalogue.org
The propagation of workflow annotations to the provenance graph is fairly
straightforward. Firstly, consider a static workflow element, say port X =
chromosome_name, annotated with concept C = so:chromosome in the workflow
(using any available formalism). In the provenance graph this is expressed
using the pattern:
    X rdf:type Port; C = {c}; X has_value_type c
for example:
    chromosome_name rdf:type Port; so:chromosome = {singleton_chromosome}
    chromosome_name has_value_type singleton_chromosome
Secondly, the annotations on a port carry over to each of the values that are
bound to that port¹³, using a collection of inference rules like the following:
    X rdf:type Port    C = {c}    X has_value_type c
    X has_value v    v rdf:type PortValue
    --------------------------------------------------
    v rdf:type C
This rule asserts the value v as an individual of the Janus class C. The set of
rules accounts for various annotations, for example the following rule:
    X rdf:type Port    X has_source S    X has_value v    v rdf:type PortValue
    ---------------------------------------------------------------------------
    v has_source S
annotates v with the data source of the port (for instance, the KEGG database).
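As an illustration of how the two propagation rules operate, here is a minimal Python sketch over a set of plain (subject, predicate, object) tuples. It is not the Janus rule engine. The encoding of "C = {c}" as a triple "c rdf:type C", the value name value_17, and the Ensembl source annotation are assumptions made only for this example.

```python
def propagate_annotations(triples):
    """Apply the two propagation rules once and return only the new triples."""
    triples = set(triples)
    ports = {s for s, p, o in triples if (p, o) == ("rdf:type", "Port")}
    port_values = {s for s, p, o in triples if (p, o) == ("rdf:type", "PortValue")}

    # Classes of each individual; "C = {c}" is encoded here as "c rdf:type C".
    classes_of = {}
    for s, p, o in triples:
        if p == "rdf:type":
            classes_of.setdefault(s, set()).add(o)

    inferred = set()
    for port, pred, value in triples:
        if pred != "has_value" or port not in ports or value not in port_values:
            continue
        for s, p, o in triples:
            if s != port:
                continue
            if p == "has_value_type":
                # Rule 1: v rdf:type C, for the class C of the individual c.
                for cls in classes_of.get(o, ()):
                    inferred.add((value, "rdf:type", cls))
            elif p == "has_source":
                # Rule 2: v has_source S.
                inferred.add((value, "has_source", o))
    return inferred - triples


TRIPLES = {
    ("chromosome_name", "rdf:type", "Port"),
    ("chromosome_name", "has_value_type", "singleton_chromosome"),
    ("singleton_chromosome", "rdf:type", "so:chromosome"),
    ("chromosome_name", "has_source", "Ensembl"),      # hypothetical annotation
    ("chromosome_name", "has_value", "value_17"),      # hypothetical run-time value
    ("value_17", "rdf:type", "PortValue"),
}
INFERRED = propagate_annotations(TRIPLES)
```

Both inferred statements attach to the run-time value, so the annotation effort stays on the static workflow description.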
As a proof of concept, the Janus ontology currently models the semantic
provenance terms that are adequate for representing the domain semantics of
our example workflow, using fewer than 30 classes and properties with a DL
expressivity of ALCH(D). Many of the classes, for example those that model collection data
structures, have not been described as they are less relevant to our discussion in
this paper. In the future, we plan to extend Janus with the domain terms used
to annotate the default set of services in the Taverna release version.
3.3 Provenance Query Infrastructure for Janus
We now describe the Janus query infrastructure that has been implemented to
support the example provenance queries discussed in Sec. 2. The query
infrastructure is implemented using the open source Jena ARQ tool¹⁴, and supports
provenance queries expressed in the SPARQL query language [14]. We
composed the SPARQL query pattern corresponding to the example query (1) from
Sec. 2: "Find all the QTL genes that are involved in KEGG pathways". The
SPARQL query pattern first identifies port values that are individuals of class
biopax:pathway and are linked to values "KEGG", which are themselves
individuals of class NCI:Data_Sources, through property has_source. In the next
step, the query pattern traverses the property has_value_binding between a
port and a port value, followed by traversal of the property links_from
between individuals of class port, until it reaches individuals of class so:base_pair
that represent the result QTL genes (the second provenance query proposed in
Sec. 2: "Find pathways that contain genes with specific functions," is discussed
in the next section).
¹³ This assumes that the ports are strongly typed, i.e., that all values bound to the
port have the same type as the port.
¹⁴ http://jena.sourceforge.net/ARQ/
Provenance queries typically involve a recursive traversal of the graph to
compute a transitive closure, namely over the links_from property. We had
two options for implementing the transitive closure function, namely a
function that is tightly coupled to the RDF data store implementation, or a generic
module that can be used with any RDF data store. We chose a generic
implementation using the SPARQL ASK query form, which allows the provenance query
infrastructure to be used over multiple RDF stores. The SPARQL ASK form
allows an "application to test whether or not a query pattern has a solution" [14],
without returning a result set or graph. The transitive closure function starts
with the port instance linked to the input value and then recursively expands
the SPARQL query expression using ASK until a false value is
returned. ASK, in contrast to the SELECT and
CONSTRUCT forms, does not bind the results of the query to variables in the
query pattern, and is therefore a low-overhead way of computing transitive
closures.
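The store-independent idea, computing a closure from boolean probes only, can be sketched in Python as below. This is a simplified stand-in, not the Janus code: the function ask emulates a single-pattern ASK over an in-memory triple set instead of issuing SPARQL to Jena ARQ, and the port names are hypothetical.

```python
def ask(triples, subject, predicate, obj):
    """Emulate an ASK over one triple pattern; None acts as a variable."""
    return any(
        (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
        for s, p, o in triples
    )


def links_from_closure(triples, start, ports):
    """Collect every port reachable from start over one or more links_from hops,
    using only boolean ASK probes; expansion stops once no probe returns True."""
    closure = set()
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for candidate in ports:
            if candidate not in closure and ask(triples, current, "links_from", candidate):
                closure.add(candidate)
                frontier.append(candidate)
    return closure


# Hypothetical ports and data links.
TRIPLES = {
    ("pathway_out", "links_from", "gene_ids"),
    ("gene_ids", "links_from", "qtl_region"),
    ("unrelated", "links_from", "other"),
}
PORTS = {"pathway_out", "gene_ids", "qtl_region", "unrelated", "other"}
CLOSURE = links_from_closure(TRIPLES, "pathway_out", PORTS)
```

Because the loop depends only on a yes/no answer per probe, the same driver could in principle sit on top of any store that answers ASK queries.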
4 Taverna provenance and Linked Data
So far we have shown how the domain-aware extensions to Janus enable
answering domain-specific semantic provenance queries. In this section we describe how
we can, in addition, use these semantic annotations to link Janus-compliant
provenance graphs to the open Web of Data in order to expand the range of
supported domain provenance queries.
4.1 Publishing Taverna Provenance as Linked Data
Because Janus provenance is already available as RDF graphs, we only need to
make these graphs Linked Data-compliant and accessible on the Web. This means
that 1) each Janus entity URI should be dereferenceable, and 2) wherever possible,
the data URIs under the Janus namespace should be mapped to other linked
data URIs on the Web. We use an existing Linked Data publication tool, namely
Pubby¹⁵, to implement the first step. In order to connect Janus graphs with
LOD we create rdfs:seeAlso links between Janus data URIs and Bio2RDF [2]
data URIs. We use Bio2RDF data URIs because Bio2RDF is one of the earliest
linked datasets and it is regarded as a nucleus of the Life Science datasets. Using
the semantic annotations associated with Janus provenance, we define a set of
rules for the identity mapping. Given a Janus data item d_i with value value(d_i),
its corresponding Bio2RDF URI, URI(d_i), is determined by the type of d_i and the
data source where d_i comes from, according to the following rules:
- IF isType(d_i) == Gene AND isSource(d_i) == Entrez THEN
  URI(d_i) = http://bio2rdf.org/geneid: + value(d_i)
- IF isType(d_i) == Gene AND isSource(d_i) == UniProt THEN
  URI(d_i) = http://bio2rdf.org/uniprot: + value(d_i)
- IF isType(d_i) == Gene AND isSource(d_i) == KEGG THEN
  URI(d_i) = http://bio2rdf.org/kegg: + value(d_i)
- IF isType(d_i) == Pathway AND isSource(d_i) == KEGG THEN
  URI(d_i) = http://bio2rdf.org/path: + value(d_i)
¹⁵ http://www4.wiwiss.fu-berlin.de/pubby/
Fig. 4. Semantic provenance for Taverna in the Linked Data context
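The four identity-mapping rules of this section amount to a lookup keyed on (type, source). A minimal Python sketch follows; the function names, the Janus URI, and the sample gene identifier are hypothetical, while the URI prefixes are the ones given in the rules.

```python
# URI prefixes taken from the four mapping rules, keyed on (type, source).
BIO2RDF_PREFIXES = {
    ("Gene", "Entrez"): "http://bio2rdf.org/geneid:",
    ("Gene", "UniProt"): "http://bio2rdf.org/uniprot:",
    ("Gene", "KEGG"): "http://bio2rdf.org/kegg:",
    ("Pathway", "KEGG"): "http://bio2rdf.org/path:",
}


def bio2rdf_uri(item_type, source, value):
    """Return the Bio2RDF URI for a data item, or None when no rule applies."""
    prefix = BIO2RDF_PREFIXES.get((item_type, source))
    return None if prefix is None else prefix + value


def see_also_triple(janus_uri, item_type, source, value):
    """Build the rdfs:seeAlso link for a Janus data URI, or None if unmapped."""
    target = bio2rdf_uri(item_type, source, value)
    if target is None:
        return None
    return (janus_uri, "rdfs:seeAlso", target)


# Hypothetical Janus data URI and gene identifier.
LINK = see_also_triple("http://example.org/janus/data/42", "Gene", "Entrez", "12345")
```

Returning None for unmatched items keeps the mapping conservative: no link is asserted when the type or source annotation is missing.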
4.2 Consuming Taverna Provenance as Linked Data
As mentioned, creating Janus Linked Data provenance that is connected to
Bio2RDF makes the provenance graphs an integral part of the Web of Life
Science data (see Fig. 4). This opens the provenance graph to queries that run on
the Web of Data. Furthermore, provenance graphs that are created during
different workflow runs are now indirectly and automatically connected through their
common external data URIs, thus supporting queries that span multiple
runs.
To demonstrate this, we show how we can support a semantic provenance
query that requires access to both Janus and the various Bio2RDF repositories,
by executing a single SPARQL query against SQUIN [8], a Linked Data query
engine. Instead of having to write separate SPARQL queries against each
individual data source, SQUIN allows us to treat the whole Web of Data as one single
data space. It is a query engine that applies the "follow your nose" principle of
Linked Data: it traverses the Web of Data to retrieve all relevant data
sources for a query by taking the URIs in the query and those in the
intermediate results, following links of these URIs to other data sources, and applying
the querying graph pattern to the intermediate result space in order to obtain
relevant results.
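The "follow your nose" traversal can be illustrated with a small Python sketch. It does not reproduce SQUIN, which interleaves traversal with query pattern matching; it only shows the URI-following step, over a hypothetical in-memory dictionary that stands in for dereferencing URIs on the Web. All URIs below are shortened, invented placeholders.

```python
# Hypothetical in-memory "Web of Data": dereferencing a URI yields its triples.
WEB = {
    "janus:gene1": [("janus:gene1", "rdfs:seeAlso", "bio2rdf:geneid:1")],
    "bio2rdf:geneid:1": [("bio2rdf:geneid:1", "xPath", "uniprot:P1")],
    "uniprot:P1": [("uniprot:P1", "classifiedWith", "go:0005524")],
    "janus:gene2": [("janus:gene2", "rdfs:seeAlso", "bio2rdf:geneid:2")],
}


def follow_your_nose(web, seed_uris):
    """Dereference the seed URIs, then every URI seen in the retrieved triples,
    until no new URI can be dereferenced. Returns all triples gathered."""
    gathered = set()
    visited = set()
    pending = list(seed_uris)
    while pending:
        uri = pending.pop()
        if uri in visited:
            continue
        visited.add(uri)
        for triple in web.get(uri, []):  # unknown URIs simply yield nothing
            gathered.add(triple)
            subject, _, obj = triple
            pending.extend([subject, obj])
    return gathered


DATA = follow_your_nose(WEB, ["janus:gene1"])
```

Starting from a single Janus URI, the traversal reaches the external annotation three hops away, while data that is not linked from the seed is never fetched.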
Our example query below searches for workflow data products that are
Entrez genes encoding proteins involved in ATP binding (go:0005524). The domain
knowledge about the genes is drawn from two Bio2RDF data repositories, and
the knowledge about which Taverna data products are Entrez genes comes from the
domain-enhanced Janus provenance. This simple SPARQL query needs access to
at least three linked datasets. The SQUIN query engine allows us to write one single
query against these multiple data sources. The result will return data products
from any workflow runs that are Entrez genes related to this biological process.
We can then use semantic provenance queries similar to the one presented in
Sec. 3.3 to search for KEGG pathways that contain these genes.
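PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX : <http://www.taverna.org.uk/janus#>
SELECT DISTINCT ?entrezgene
WHERE {
  # Proteins classified with the GO term for ATP binding
  ?protein uniprot:classifiedWith <http://bio2rdf.org/go:0005524> .
  # Entrez genes linked to those proteins in Bio2RDF
  ?entrezgene <http://bio2rdf.org/bio2rdf_resource:xPath> ?protein .
  # Janus data products mapped to those Entrez genes
  ?gene rdfs:seeAlso ?entrezgene .
  ?gene a :port_gene .
  ?gene :has_source :entrez_gene .
}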
This example shows that drawing on domain knowledge from the Linked
Data cloud enables us to extend the range of domain-level provenance queries
that we can implement, making them more meaningful to the scientists. Finding
specific KEGG pathways that are related to genes with functions of interest will help
scientists quickly identify potential pathways from hundreds of experiment
results. The above example query could enable scientists to quickly identify the
presence of pathways that consistently appear in different experiments,
including those that were conducted by the scientists themselves.
5 Conclusions and further work
We have presented a semantic provenance model for workflow data, called Janus,
and a prototype implementation for the Taverna workflow system and
provenance model. The implementation demonstrates the benefits of collecting
semantic provenance, by showing exemplar semantic provenance queries that can
now be answered by the system.
The main objection that is normally raised in connection with semantic
annotations is the annotation cost. We have noted, in Sec. 3, that the annotation
effort is actually limited to the workflow specification, and indeed, possibly just
to the services used in the workflow, when those are annotated once and for all
as part of the service registry curation process. In turn, this observation provides
additional motivation for the development of registries like BioCatalogue.
Our investigation into the idea of publishing provenance graphs as Linked
Data is still preliminary and requires additional insight. For instance, the simple
rules used to link Janus provenance with the Web of Data do not consider the
possibility that the workflow and Bio2RDF refer to different copies of the same
database. Also, some of the mapped Bio2RDF URIs might not exist at all, or might
be linked to mismatching data entities, so the precision of the mapping
between Janus and Bio2RDF data URIs needs to be evaluated. Finally, we plan
to conduct a user assessment as a way to establish the perceived value of semantic
provenance from the users' perspective.
References
1. R. S. Barga and L. A. Digiampietri. Automatic capture and efficient storage of
e-Science experiment provenance. Concurrency and Computation: Practice and
Experience, 20:419-429, 2008.
2. F. Belleau, M. A. Nolin, N. Tourigny, P. Rigault, and J. Morissette. Bio2RDF:
Towards a Mashup to Build Bioinformatics Knowledge Systems. Journal of
Biomedical Informatics, 41:706-716, 2008.
3. O. Biton, S. Cohen-Boulakia, and S. B. Davidson. Zoom*UserViews: Querying
Relevant Provenance in Workflow Systems. In VLDB, pages 1366-1369, 2007.
4. Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked Data - The Story So
Far. Int. Journal on Semantic Web and Information Systems, Special Issue on
Linked Data, 2009. In press.
5. S. Bowers, T. M. McPhillips, and B. Ludäscher. Provenance in collection-oriented
scientific workflows. Concurrency and Computation: Practice and Experience,
20:519-529, 2008.
6. B. Cao, B. Plale, G. Subramanian, P. Missier, C. Goble, and Y. Simmhan. Semantically
Annotated Provenance in the Life Science Grid. In Juliana Freire, Paolo Missier,
and Satya S. Sahoo, editors, 1st International Workshop on the Role of Semantic
Web in Provenance Management. CEUR Proceedings, 2009.
7. S. Davidson and J. Freire. Provenance and scientific workflows: challenges and
opportunities. In Procs. SIGMOD Conference, Tutorial, pages 1345-1350, 2008.
8. O. Hartig, C. Bizer, and J. C. Freytag. Executing SPARQL queries over the web
of linked data. In Procs. ISWC, pages 293-309, Washington D.C., USA, 2009.
9. B. Howe, P. Lawson, R. Bellinger, E. Anderson, E. Santos, J. Freire, C. Scheidegger,
A. Baptista, and C. Silva. End-to-end eScience: Integrating workflow, query,
visualization, and provenance at an ocean observatory. In Procs. Fourth IEEE
International Conference on eScience, pages 127-134, 2008.
10. D. Hull, K. Wolstencroft, R. Stevens, C. A. Goble, M. R. Pocock, P. Li, and T. Oinn.
Taverna: a tool for building and running workflows of services. Nucleic Acids
Research, 34:729-732, 2006.
11. D. L. McGuinness, P. Fox, P. Pinheiro da Silva, S. Zednik, N. Del Rio, Li Ding,
P. West, and C. Chang. Annotating and embedding provenance in science data
repositories to enable next generation science applications. In American
Geophysical Union, Fall Meeting (AGU2008), Eos Trans. AGU, 89(53), Fall Meet. Suppl.,
Abstract IN11C-1052, 2008.
12. P. Missier, N. Paton, and K. Belhajjame. Fine-grained and efficient lineage
querying of collection-based workflow provenance. In Procs. EDBT, Lausanne,
Switzerland, 2010.
13. Luc Moreau, editor. The Open Provenance Model v.1.1. 2009.
14. E. Prud'hommeaux and A. Seaborne. SPARQL Query Language for RDF. W3C
Recommendation, 2008.
15. S. Sahoo and A. Sheth. Provenir ontology: Towards a Framework for eScience
Provenance Management, 2009.
16. S. S. Sahoo, A. Sheth, and C. Henson. Semantic provenance for eScience: Managing
the deluge of scientific data. IEEE Internet Computing, 12:46-54, 2008.
17. Y. Simmhan, B. Plale, and D. Gannon. A survey of data provenance in e-science.
SIGMOD Record, 34:31-36, 2005.
18. J. Zhao, C. Wroe, C. Goble, R. Stevens, D. Quan, and M. Greenwood. Using Semantic
Web Technologies for Representing e-Science Provenance. In Procs. ISWC, LNCS,
pages 92-106, Hiroshima, Japan, November 2004. Springer-Verlag.
