Deciding semantic matching of stateless services
Computer (2006)
- ISBN: 9781577352815
Available from www.aaai.org
Deciding Semantic Matching of Stateless Services∗
Duncan Hull†, Evgeny Zolin†, Andrey Bovykin‡, Ian Horrocks†, Ulrike Sattler†, and Robert Stevens†
† School of Computer Science, University of Manchester, UK
‡ Department of Computer Science, University of Liverpool, UK
Abstract
We present a novel approach to describe and reason
about stateless information processing services. It can
be seen as an extension of standard descriptions which
makes explicit the relationship between inputs and out-
puts and takes into account OWL ontologies to fix the
meaning of the terms used in a service description. This
allows us to define a notion of matching between ser-
vices which yields high precision and recall for service
location. We explain why matching is decidable, and
provide biomedical example services to illustrate the
utility of our approach.
Introduction
Understanding the data generated from genome sequenc-
ing projects like the Human Genome Project is recognised
as a “grand challenge” both for Computer Science and
Biomedicine (Sleep 2004; Collins et al. 2003). Many of
the tools and databases for analysing this data are avail-
able via Web Service interfaces, thereby allowing biomed-
ical scientists to use the Web as a platform to perform so-
called in silico experiments. Large numbers of these exper-
iments are carried out by choosing some of these Web Ser-
vices, composing them into a workflow, and running them
(Stevens et al. 2004; Hull et al. 2006)—an approach which
shows considerable promise for molecular biology (Stein
2002) whilst challenging current Web Service approaches.
In contrast to other application areas, the structure of these
workflows is mostly determined by the biologist design-
ing the experiment, who also has a clear picture in mind
of the kind of services he or she wants to use in each ex-
perimental step. Moreover, a variety of domain ontologies
exist which capture the knowledge of biologists, see, e.g.,
http://obo.sourceforge.net/. Finally, the majority
of these services are stateless, i.e., they provide informa-
tion, but do not change the state of the world—apart from
the knowledge of the biologist running the service, which
we will ignore here. We restrict our attention to these kinds
of services since they are quite common yet easier to represent,
and since it turned out that defining a semantics or
specifying automated reasoning algorithms for stateful
service descriptions is basically impossible in the presence
of any expressive ontology (Baader et al. 2005). Statelessness
implies that we do not need to formulate pre- and postconditions
since our services do not change the world.
∗ This work is supported by EPSRC grants GR/R67743/01 and GR/S63168/01.
Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
The question we are interested in here is how to help the
biologist to find a service he or she is looking for, i.e., a
service that works with inputs and outputs the biologist can
provide/accept, and that provides the required functionality.
The growing number of publicly available biomedical web
services, 3 000 as of February 2006, requires better matching
techniques to locate services. Thus, we are concerned
with the question of how to describe a service request Q
and service advertisements Si such that the notion of a service
S matching the request Q can be defined in a “useful”
way. By useful, we mean the following: (1, precision) only
those services that indeed provide the requested functionality
should match the request; (2, recall) all services providing
the requested functionality should match the request; (3) service
advertisements and requests should be formulated using
terms from existing (OWL) ontologies; and (4) it should be
possible to decide the matching problem automatically.
Let us illustrate the first three points using a simple ex-
ample from (Martin et al. 2004); realistic examples from
molecular biology are discussed later. Consider a service
provider who advertises a service S1 with an input of type
GeoRegion, an output of type Wine, and which returns a list
of wines produced in the region with which it was called.
Moreover, consider a service S2 which has the same input
and output, but S2 returns a list of wines that are sold in
the region with which it was called. Now, if the types of
a service’s input and output are the only information avail-
able to match a request to a service, then no matching algo-
rithm can distinguish between S1 and S2: they have iden-
tical types, and thus matching cannot be precise—see (1)
above. That is, S1 will be matched to a request whenever
S2 is, regardless of whether the request Q is for a service
that returns wines produced or sold in this region, i.e., re-
gardless of the required functionality of the service. Next,
assume that a user requests a service that takes, as input,
a FrenchGeoRegion and returns a list of FrenchWines that
are produced in this region. Even though our service S1
returns, in general, wines that may not be FrenchWines, it
returns only FrenchWines when called with a FrenchGeoRe-
gion, and thus should be matched to this request—a match-
ing algorithm that does this can clearly be viewed as having
high recall, see (2) above. To determine this match, however,
we need to take into account some background knowledge,
namely that only FrenchWines are produced in a FrenchGe-
oRegion. To see this, consider the case where the request
is for a service that takes a FrenchGeoRegion and returns a
list of FrenchWines that are sold in this region: in this case,
S2 should not be matched since some shops in this region
might sell wines that are not French.¹ In general, if a service
description uses terms whose meaning is defined in an ontol-
ogy, this not only enables the reuse of these definitions and
thus makes service descriptions more succinct, but it also al-
lows their semantics to be taken into account for matching.
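The precision problem described above can be made concrete with a small sketch. It assumes a hypothetical matcher, `type_only_match`, that sees nothing but the declared input and output classes; the names `Signature` and `type_only_match` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Type-only view of a service: one input class and one output class."""
    input_type: str
    output_type: str


# Both advertisements carry the same signature; only their behaviour differs.
S1 = Signature("GeoRegion", "Wine")  # returns wines produced in the region
S2 = Signature("GeoRegion", "Wine")  # returns wines sold in the region


def type_only_match(advert: Signature, request: Signature) -> bool:
    """Hypothetical matcher that compares nothing but the declared types."""
    return advert == request


# A request for "wines produced in a region" has the same signature again,
# so the matcher cannot tell the two advertisements apart.
request_for_produced_wines = Signature("GeoRegion", "Wine")
matches = [type_only_match(s, request_for_produced_wines) for s in (S1, S2)]
```

Under this assumption both services are returned for the request, which is exactly the loss of precision that motivates describing the relationship between inputs and outputs.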
Another way of describing a service’s functionality
would be by using terms from a fixed vocabulary of
functionalities, such as “sold-wines” and “grown-wines”.
This does not integrate well with a background ontology: for
example, it would not allow for the above mentioned match-
ing of S1 to the request for a service that takes a FrenchGe-
oRegion and returns a list of FrenchWines that are produced
in this region.
In this paper, we propose a framework to describe
stateless services that takes into account background ontologies
and in which matching of services is defined such that
(a) it yields matchings with high precision and recall and (b)
the matching problem is decidable. Moreover, our framework
allows us to compute automatically, from the descriptions
of atomic services, a description of their composition.
Services as queries
In this section, we present our framework for describing and
matching stateless web services. From a syntactic point of
view, it can be viewed as an extension of the way services
are described in the OWL-S Service Profile (namely, of its
part concerning description of inputs and outputs). From
a semantic viewpoint, the definition of “service matching”
introduced below will allow for service matching with high
precision and recall.
As mentioned in the introduction, in addition to the types
of inputs and outputs, our service descriptions explicate the
relationship between inputs and outputs. Analysing numerous
examples of services, including the bioinformatics services
discussed later in this paper, we observed that the notion of
conjunctive query can be adopted for this purpose. Before
defining this “services as queries” approach, we illustrate
it using our wine service examples. Intuitively, when run
with a GeoRegion g, the produced-wine service S1 returns
all those wines w for which there exists some winegrower f
who produces w and who is located in g. In our framework,
this service can thus be described as follows:
INPUT: g GeoRegion
OUTPUT: w Wine
THERE IS SOME f [WineGrower(f),
LocatedIn(f,g), Produces(f,w)],
where the terms Wine, LocatedIn, etc., are defined in
some ontology. In contrast, the sold-wine service S2 returns
all those wines w for which there exists some shop s who
sells w and who is located in g, and can thus be described as
follows:
INPUT: g GeoRegion
OUTPUT: w Wine
THERE IS SOME s
[Shop(s), LocatedIn(s,g), Sells(s,w)]
¹ Obviously, S1 should not be matched since it returns the wrong wines.
Given service descriptions of this kind, matching of
services can be reduced to query containment w.r.t. an
ontology, a task whose decidability and complexity are
relatively well understood; see, e.g., (Calvanese et al. 2005;
Calvanese, De Giacomo, & Lenzerini 1998; Horrocks et al.
2000).
Describing services
We assume the reader to be familiar with OWL-DL and
its semantics (Horrocks, Patel-Schneider, & van Harmelen
2003). Throughout this paper, we borrow the term TBox for
a class-level ontology (i.e., a finite set of OWL-DL axioms)
and ABox for a factual ontology (i.e., a finite set of OWL-DL
facts). The union of a TBox $\mathcal{T}$ and an ABox $\mathcal{A}$ is called
a knowledge base and denoted by $\mathit{KB} = \langle \mathcal{T}, \mathcal{A} \rangle$. We use
$\mathit{KB} \models \Psi$ to denote the fact that (the description logic or
first-order translation of) $\mathit{KB}$ implies $\Psi$, i.e., $\Psi$ holds in every
interpretation that satisfies $\mathit{KB}$.
Definition 1 (Service syntax). Let $\mathcal{X}$ be a set of variables.
A service description $\langle \vec{x}:\vec{X};\ \vec{y}:\vec{Y};\ \Phi(\vec{x},\vec{y},\vec{z}) \rangle$ consists of
• a list $\vec{x}:\vec{X} = \langle x_1{:}X_1, \ldots, x_m{:}X_m \rangle$ of pairs of variables
$x_i$ from $\mathcal{X}$ and classes $X_i$; this list enumerates input
variables and their “types”;
• a list $\vec{y}:\vec{Y} = \langle y_1{:}Y_1, \ldots, y_n{:}Y_n \rangle$ of pairs of variables
$y_j$ from $\mathcal{X}$ and classes $Y_j$; this list enumerates output
variables and their “types”;
• a relationship specification $\Phi(\vec{x},\vec{y},\vec{z})$ of the form
\[ \mathit{term}_1(\vec{x},\vec{y},\vec{z}) \wedge \ldots \wedge \mathit{term}_k(\vec{x},\vec{y},\vec{z}), \]
where each $\mathit{term}_i(\vec{x},\vec{y},\vec{z})$ is either an expression of the
form $C(w)$ with $C$ a class or $R(w_1, w_2)$ with $R$ a property,
and $w$, $w_1$, and $w_2$ are either variables from $\mathcal{X}$ that occur in
$\vec{x}$, $\vec{y}$, or $\vec{z}$, or individual names.
In OWL-DL terms, $\Phi(\vec{x},\vec{y},\vec{z})$ is a set of
facts of the form Individual(w type(C)) and
Individual(w value(R w1)) over variables, some
of which may occur in the input or in the output. Using
the syntax from Definition 1, our wine services can be
formalised as follows:
\[ S_1 = \langle g{:}\,\mathsf{GeoRegion};\ w{:}\,\mathsf{Wine};\ \mathsf{WineGrower}(f) \wedge \mathsf{LocatedIn}(f, g) \wedge \mathsf{Produces}(f, w) \rangle \]
\[ S_2 = \langle g{:}\,\mathsf{GeoRegion};\ w{:}\,\mathsf{Wine};\ \mathsf{Shop}(s) \wedge \mathsf{LocatedIn}(s, g) \wedge \mathsf{Sells}(s, w) \rangle \]
We prefer the syntax given in Definition 1 to the one used
informally above since it is shorter; it is clear, however, how
to translate between these two representations, and we will
not go into any further discussion of syntax here. To enhance
readability, for a variable vector $\vec{z} = z_1, \ldots, z_\ell$, we use $\exists\vec{z}$
as an abbreviation for $\exists z_1, \ldots, z_\ell$. Next, we define what it
means for a service to implement a service description.
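As an illustration of Definition 1, a service description can be held in a small data structure. The sketch below is one possible encoding, not the authors' implementation: the class name `ServiceDescription`, the tuple encoding of atoms, and the helper `existential_variables` are all assumptions made for this example. It also assumes that every argument of an atom is a variable; individual names are not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple

# An atom is a predicate name followed by its arguments,
# e.g. ("WineGrower", "f") or ("LocatedIn", "f", "g").
Atom = Tuple[str, ...]


@dataclass(frozen=True)
class ServiceDescription:
    inputs: Tuple[Tuple[str, str], ...]   # (variable, class) pairs
    outputs: Tuple[Tuple[str, str], ...]  # (variable, class) pairs
    relationship: Tuple[Atom, ...]        # read as a conjunction of atoms

    def existential_variables(self) -> List[str]:
        """Variables used in the relationship that are neither inputs nor outputs."""
        declared = {var for var, _ in self.inputs + self.outputs}
        used = {arg for atom in self.relationship for arg in atom[1:]}
        return sorted(used - declared)


S1 = ServiceDescription(
    inputs=(("g", "GeoRegion"),),
    outputs=(("w", "Wine"),),
    relationship=(("WineGrower", "f"), ("LocatedIn", "f", "g"), ("Produces", "f", "w")),
)
S2 = ServiceDescription(
    inputs=(("g", "GeoRegion"),),
    outputs=(("w", "Wine"),),
    relationship=(("Shop", "s"), ("LocatedIn", "s", "g"), ("Sells", "s", "w")),
)
```

In this encoding the two wine services share their inputs and outputs and differ only in the relationship, whose existentially quantified variables are `f` and `s` respectively.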
Definition 2 (Service semantics). Let $\mathcal{T}$ be a TBox and $S$
a service description as in Definition 1. A service $s$ implements
a service description $S$ over $\mathcal{T}$ if, for any ABox $\mathcal{A}$ and
any individuals $a_1, \ldots, a_m$ in $\mathcal{T}$ and $\mathcal{A}$, if $\mathcal{T}, \mathcal{A} \models X_i(a_i)$
for each $1 \le i \le m$, then
1. $s$ accepts $\vec{a} = \langle a_1, \ldots, a_m \rangle$ as input and
2. when run with $\vec{a}$ as input, it returns the set of all those
tuples of individuals $\vec{b} = b_1, \ldots, b_n$ from $\mathcal{A}$ such that
\[ \mathcal{T}, \mathcal{A} \models \vec{Y}(\vec{b}) \wedge \exists\vec{z} : \Phi(\vec{a}, \vec{b}, \vec{z}). \]
Intuitively, an input $\vec{a}$ must be an instance of $\vec{X}$ w.r.t. the
background ontology, and the service returns as its output
the set of all tuples of objects $\vec{b}$ that are instances of $\vec{Y}$ w.r.t.
the ontology and for which we can find some $\vec{z}$ such that $\vec{a}$,
$\vec{b}$, and $\vec{z}$ satisfy the condition $\Phi$.
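The semantics of Definition 2 can be illustrated with a deliberately simplified evaluator. The sketch below makes a strong assumption that the paper does not: it ignores the TBox entirely and replaces entailment by a lookup in an explicitly listed set of facts, so it only approximates the definition when the ABox already contains every relevant consequence. The function `run_service`, the fact encoding, and the sample individuals (`ChateauX`, `CaveY`, and so on) are invented for this example, and all atom arguments are assumed to be variables.

```python
from itertools import product


def run_service(inputs, outputs, relationship, facts, input_values):
    """Return all output tuples for the given input values.

    inputs, outputs: lists of (variable, class) pairs.
    relationship:    list of atoms such as ("LocatedIn", "f", "g").
    facts:           set of ground atoms standing in for the ABox.
    Entailment is approximated by membership in `facts` (no TBox reasoning).
    """
    individuals = sorted({arg for fact in facts for arg in fact[1:]})

    # Step 1: the inputs must be instances of their declared classes.
    binding = {}
    for (var, cls), value in zip(inputs, input_values):
        if (cls, value) not in facts:
            raise ValueError(f"{value} is not known to be a {cls}")
        binding[var] = value

    # Step 2: outputs must satisfy their types and the relationship.
    atoms = [(cls, var) for var, cls in outputs] + list(relationship)
    free = sorted({arg for atom in atoms for arg in atom[1:]} - set(binding))
    answers = set()
    for values in product(individuals, repeat=len(free)):
        env = {**binding, **dict(zip(free, values))}
        if all((atom[0],) + tuple(env[a] for a in atom[1:]) in facts for atom in atoms):
            answers.add(tuple(env[var] for var, _ in outputs))
    return answers


facts = {
    ("GeoRegion", "Bordeaux"), ("Wine", "Margaux"), ("Wine", "Rioja"),
    ("WineGrower", "ChateauX"), ("LocatedIn", "ChateauX", "Bordeaux"),
    ("Produces", "ChateauX", "Margaux"),
    ("Shop", "CaveY"), ("LocatedIn", "CaveY", "Bordeaux"),
    ("Sells", "CaveY", "Margaux"), ("Sells", "CaveY", "Rioja"),
}
io = ([("g", "GeoRegion")], [("w", "Wine")])
produced = run_service(*io, [("WineGrower", "f"), ("LocatedIn", "f", "g"),
                             ("Produces", "f", "w")], facts, ["Bordeaux"])
sold = run_service(*io, [("Shop", "s"), ("LocatedIn", "s", "g"),
                         ("Sells", "s", "w")], facts, ["Bordeaux"])
```

On this toy data the produced-wine description yields only `Margaux`, while the sold-wine description yields both `Margaux` and `Rioja`, even though the two services have identical input and output types.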
Next, we develop the means of comparing service de-
scriptions, i.e., a notion of one service matching another.
Matching services
Matching is the problem of determining whether a given ser-
vice description S conforms to another service description
Q. Matching algorithms can be used for the location of ser-
vices, and we can think of S as being a service advertise-
ment and of Q as being a service requested by a user. As a
consequence, each definition of matching of services should
express some reasonable conditions for a service S to be
considered as an “appropriate” candidate to be returned by a
search engine to a user who specified a request Q.
As mentioned above, we assume that services are de-
scribed w.r.t. a terminological ontology (TBox) T . We will
first give a formal definition, and then provide explanations.
We use $|\vec{x}|$ to denote the length of a vector $\vec{x}$.
Definition 3. Given two service descriptions
\[ S = \langle \vec{x}:\vec{X};\ \vec{y}:\vec{Y};\ \Phi(\vec{x},\vec{y},\vec{u}) \rangle, \qquad
Q = \langle \vec{z}:\vec{Z};\ \vec{w}:\vec{W};\ \Psi(\vec{z},\vec{w},\vec{v}) \rangle, \tag{1} \]
with $|\vec{x}| = m = |\vec{z}|$ and $|\vec{y}| = n = |\vec{w}|$, we say that the
service $S$ matches the request $Q$ w.r.t. the TBox $\mathcal{T}$ if there
exist two permutations
\[ \pi : \{1, \ldots, m\} \to \{1, \ldots, m\}, \qquad \rho : \{1, \ldots, n\} \to \{1, \ldots, n\} \]
such that the following two conditions hold:
(i) $\mathcal{T} \models Z_{\pi(i)} \sqsubseteq X_i$ for all $1 \le i \le m$, i.e., for each input
$x_i$ in the advertised service $S$, we can find a matching
input $z_{\pi(i)}$ in the requested service $Q$ such that $Z_{\pi(i)}$ is a
(possibly implicit) sub-class of $X_i$ w.r.t. the ontology $\mathcal{T}$.
Intuitively, this means that we can map the inputs from $S$
to the inputs from $Q$ such that all input data that the user
intends to provide will be accepted by $S$.
(ii) for any ABox $\mathcal{A}$ and any individuals $\vec{a} = \langle a_1, \ldots, a_m \rangle$,
$\vec{b} = \langle b_1, \ldots, b_n \rangle$ in the knowledge base $\mathit{KB} = \langle \mathcal{T}, \mathcal{A} \rangle$, if
$\mathit{KB} \models \vec{Z}(\vec{a})$, then the following equivalence holds:
\[ \mathit{KB} \models \vec{Y}(\rho(\vec{b})) \wedge \exists\vec{u}\, \Phi(\pi(\vec{a}), \rho(\vec{b}), \vec{u}) \quad\text{iff}\quad
\mathit{KB} \models \vec{W}(\vec{b}) \wedge \exists\vec{v}\, \Psi(\vec{a}, \vec{b}, \vec{v}), \]
where $\pi(\vec{a})$ and $\rho(\vec{b})$ are the permutations of $\vec{a}$ and $\vec{b}$
according to $\pi$ and $\rho$.
Intuitively, this means that, modulo some re-arrangement
of the input and output vectors, the services $S$ and $Q$ return
the same answers on any input that conforms to the
request $Q$.
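Condition (i) of Definition 3 can be sketched as a search over input permutations. The code below is an illustration under stated assumptions: the dictionary `SUBCLASS_OF` is a hypothetical set of told subclass axioms, and `is_subclass` computes only their reflexive-transitive closure, which is a stand-in for the entailment check against the TBox that a description logic reasoner would perform. Indices are 0-based, unlike in the definition.

```python
from itertools import permutations

# Hypothetical told subclass axioms standing in for the TBox.
SUBCLASS_OF = {
    "FrenchGeoRegion": {"GeoRegion"},
    "FrenchWine": {"Wine"},
}


def is_subclass(sub: str, sup: str) -> bool:
    """Reflexive-transitive closure of the told axioms (not full reasoning)."""
    seen, frontier = set(), [sub]
    while frontier:
        cls = frontier.pop()
        if cls == sup:
            return True
        if cls not in seen:
            seen.add(cls)
            frontier.extend(SUBCLASS_OF.get(cls, ()))
    return False


def input_permutations(advert_inputs, request_inputs):
    """Yield every permutation pi with request_inputs[pi[i]] below advert_inputs[i]."""
    if len(advert_inputs) != len(request_inputs):
        return
    for pi in permutations(range(len(request_inputs))):
        if all(is_subclass(request_inputs[pi[i]], advert_inputs[i])
               for i in range(len(advert_inputs))):
            yield pi


# The advertised service expects (GeoRegion, Wine); the request offers the
# more specific classes in the opposite order.
found = list(input_permutations(["GeoRegion", "Wine"],
                                ["FrenchWine", "FrenchGeoRegion"]))
too_general = list(input_permutations(["FrenchGeoRegion"], ["GeoRegion"]))
```

Each permutation found this way would still have to pass condition (ii); exhaustive enumeration is the simplest reading of exploring all possible assignments and grows factorially with the number of inputs.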
Some remarks are in order here. The need to permute
inputs and outputs of Q to “fit” the ones of S is by no
means new—it is present in any reasonable definition of ser-
vice matching. Thus, in order to check whether S matches
Q, a reasoning system must “guess” two appropriate per-
mutations or exhaustively explore all possible assignments.
Condition (i) is quite standard; for example, it can be
found in definitions for matching of OWL-S services (Payne,
Paolucci, & Sycara 2001). In contrast, condition (ii) is, to
the best of our knowledge, new, and it is not expressible in
terms of OWL-S service profiles. As we will see later,
this condition is in fact reducible to checking containment
between two conjunctive queries w.r.t. a TBox, which is a
standard reasoning task.
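To give a feel for the containment check, the following sketch implements the classical test for conjunctive queries without any TBox: a query q1 is contained in q2 if q2 can be mapped homomorphically into the "frozen" body of q1 while sending head variables to head variables. This is a simplification and not the procedure used in the paper, which works relative to a TBox and under the input type assumptions of condition (ii). The function names and the query encoding are assumptions of this example, and all terms are assumed to be variables.

```python
def contained_in(q1, q2):
    """True if every answer to q1 is an answer to q2 on every database.

    A query is a pair (head, body): head is a tuple of variables,
    body a list of atoms such as ("LocatedIn", "f", "g"). No TBox is used.
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    frozen = set(body1)  # canonical database: q1's variables act as constants
    mapping = {}
    for var2, var1 in zip(head2, head1):
        if mapping.setdefault(var2, var1) != var1:
            return False
    return _extend(list(body2), mapping, frozen)


def _extend(atoms, mapping, frozen):
    """Backtracking search for a homomorphism that covers all atoms."""
    if not atoms:
        return True
    (pred, *args), rest = atoms[0], atoms[1:]
    for fact in frozen:
        if fact[0] != pred or len(fact) != len(args) + 1:
            continue
        candidate = dict(mapping)
        if all(candidate.setdefault(a, t) == t for a, t in zip(args, fact[1:])):
            if _extend(rest, candidate, frozen):
                return True
    return False


head = ("g", "w")  # inputs followed by outputs
produced = (head, [("Wine", "w"), ("WineGrower", "f"),
                   ("LocatedIn", "f", "g"), ("Produces", "f", "w")])
sold = (head, [("Wine", "w"), ("Shop", "s"),
               ("LocatedIn", "s", "g"), ("Sells", "s", "w")])
located = (head, [("Wine", "w"), ("LocatedIn", "f", "g")])
```

Because condition (ii) is an equivalence, a matcher built along these lines would need containment in both directions. Here `produced` is contained in the weaker query `located` but not the other way round, and `produced` and `sold` are incomparable.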
The above definition covers only the case when $|\vec{x}| = |\vec{z}|$
and $|\vec{y}| = |\vec{w}|$. We can easily extend this approach to the case
where Q contains “redundant” input variables and S “redundant”
output variables. In contrast, if S contains more
input variables than Q, then one possible solution is to try to
“instantiate” some of the inputs of S with individual names
in order to decrease the number of inputs of S. Another op-
tion is to merge some inputs of S, provided that their types
are compatible. Similarly, if Q has more output variables
than S, it might be reasonable that a search engine still tries
to match S and Q by trying to check whether S produces
at least part of the requested outputs. All these cases are
discussed in more detail and illustrated by examples in the
accompanying technical report (Bovykin and Zolin 2005).
Service composition
In this section, we will show that, from the service descriptions
$S_1, \ldots, S_n$, we can automatically construct a description
of the sequence of services $S_1 \circ \ldots \circ S_n$. This is another
important advantage of our approach since it decreases the
annotation workload for the web service provider. To
begin with, suppose that we have two service descriptions in
a repository, both having only one input and one output:
\[ S = \langle x{:}X;\ y{:}Y;\ \Phi(x, y, \vec{u}) \rangle, \qquad
S' = \langle x'{:}X';\ y'{:}Y';\ \Phi'(x', y', \vec{u}') \rangle. \]
Our task is to formulate reasonable conditions under which the
composition of services $S \circ S'$ (to be read as “first $S$ runs,
then $S'$ runs on the output produced by $S$”) is meaningful
(i.e., when these services are compatible) and under which it
matches a user’s request
\[ Q = \langle z{:}Z;\ w{:}W;\ \Psi(z, w, \vec{v}) \rangle. \]
Definition 4. A composition of services $S \circ S'$ matches a
request $Q$ w.r.t. a TBox $\mathcal{T}$ if the following conditions hold:
(a) $\mathcal{T} \models Z \sqsubseteq X$. As usual, this ensures that $S$ accepts all
inputs described in the request $Q$.
1321
a service description as in Definition 1. A service s imple-
ments a service description S over T if, for any ABoxA and
any individuals a1, . . . , am in T and A, if T ,A |= Xi(ai)
for each 1 ≤ i ≤ m, then
1. s accepts ~a = 〈a1, . . . , am〉 as input and
2. when run with ~a as input, it returns the set of all those
tuples of individuals~b = b1, . . . , bn fromA such that
T ,A |= ~Y (~b ) ∧ ∃~z : Φ(~a ,~b , ~z ).
Intuitively, an input ~a must be an instance of ~X w.r.t. the
background ontology, and the service returns as its output
the set of all tuples of objects~b that are instances of ~Y w.r.t.
the ontology and for which we can find some ~z such that ~a ,
~b , and ~z satisfy the condition Φ().
Next, we develop the means of comparing service de-
scriptions, i.e., a notion of one service matching another.
Matching services
Matching is the problem of determining whether a given ser-
vice description S conforms to another service description
Q. Matching algorithms can be used for the location of ser-
vices, and we can think of S as being a service advertise-
ment and of Q as being a service requested by a user. As a
consequence, each definition of matching of services should
express some reasonable conditions for a service S to be
considered as an “appropriate” candidate to be returned by a
search engine to a user who specified a request Q.
As mentioned above, we assume that services are de-
scribed w.r.t. a terminological ontology (TBox) T . We will
first give a formal definition, and then provide explanations.
We use |~x | to denote the length of a vector ~x .
Definition 3. Given two service descriptions:
S = 〈 ~x : ~X; ~y : ~Y ; Φ(~x , ~y , ~u ) 〉,
Q = 〈~z : ~Z; ~w : ~W ; Ψ(~z , ~w ,~v ) 〉,
(1)
with |~x | = m = |~z | and |~y | = n = |~w |, we say that the
service S matches the request Q w.r.t. the TBox T if there
exist two permutations
pi : {1, . . . ,m} → {1, . . . ,m}
ρ : {1, . . . , n} → {1, . . . , n}
such that the following two conditions hold:
(i) T |= Zpi(i) v Xi, for all 1 6 i 6m, i.e., for each input
xi in the advertised service S, we can find a matching
input zpi(i) in the requested service Q such that Zpi(i) is a
(possibly implicit) sub-class of Xi w.r.t. the ontology T .
Intuitively, this means that we can map the inputs from S
to the inputs from Q such that all input data that the user
intends to provide will be accepted by S.
(ii) for any ABox A and any individuals ~a = 〈a1, . . . , am〉,
~b = 〈b1, . . . , bn〉 in the knowledge base KB = 〈T ,A〉, if
KB |= ~Z(~a ), then the equivalence holds:
KB |= ~Y (ρ(~b )) ∧ ∃~u Φ(pi(~a ), ρ(~b ) ~u ) iff
KB |= ~W (~b ) ∧ ∃~v Ψ(~a ,~b , ~v ),
where pi(~a ) and ρ(~b ) are the permutations of ~a and ~b
according to pi and ρ.
Intuitively, this means that, modulo some re-arrangement
of the input and output vectors, the services S and Q re-
turn the same answers on any input that conforms to the
request Q.
Some remarks are in order here. The need to permute
inputs and outputs of Q to “fit” the ones of S is by no
means new—it is present in any reasonable definition of ser-
vice matching. Thus, in order to check whether S matches
Q, a reasoning system must “guess” two appropriate per-
mutations or exhaustively explore all possible assignments.
Condition (i) is quite standard; for example, it can be
found in definitions formatching of OWL-S services (Payne,
Paolucci, & Sycara 2001). In contrast, condition (ii) is—to
the best of our knowledge—new, and it is not expressible in
terms of OWL-S service profiles. As we will see in Section ,
this condition is in fact reducible to checking containment
between two conjunctive queries w.r.t. a TBox, which is a
standard reasoning task.
The above definition covers only the case when |~x | = |~z |
and |~y | = |~w |. We can easily extend this approach to as-
sume that Q contains “redundant” input variables and S “re-
dundant” output variables. In contrast, if S contains more
input variables than Q, then one possible solution is to try to
“instantiate” some of the inputs of S with individual names
in order to decrease the number of inputs of S. Another op-
tion is to merge some inputs of S, provided that their types
are compatible. Similarly, if Q has more output variables
than S, it might be reasonable that a search engine still tries
to match S and Q by trying to check whether S produces
at least part of the requested outputs. All these cases are
discussed in more detail and illustrated by examples in the
accompanying technical report (Bovykin and Zolin 2005).
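One possible reading of the first extension (Q has “redundant” inputs) is to search for injective assignments instead of permutations, so that every input of S receives a distinct input of Q and the remaining inputs of Q stay unused. The sketch below illustrates this reading only; the function name and the `is_subclass` callback are hypothetical, and the technical report may treat this case differently.

```python
from itertools import permutations


def injective_input_maps(advertised, requested, is_subclass):
    """Return all injective maps pi from inputs of S into inputs of Q
    (as 0-based index tuples) with requested[pi[i]] a sub-class of
    advertised[i]. `is_subclass(sub, sup)` is a caller-supplied test,
    e.g. a call to a DL reasoner."""
    m, k = len(advertised), len(requested)
    # permutations(range(k), m) enumerates ordered selections of m
    # distinct indices; it is empty when m > k.
    return [
        pi for pi in permutations(range(k), m)
        if all(is_subclass(requested[pi[i]], advertised[i]) for i in range(m))
    ]
```

When S has more inputs than Q the result is empty, which corresponds to the cases where inputs of S would have to be instantiated or merged.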
Service composition
In this section, we will show that, from the service
descriptions S1, . . . , Sn, we can automatically construct a
description of the sequence of services S1 ◦ . . . ◦ Sn. This is
another important advantage of our approach since it decreases
the annotation workload for the web service provider. To
begin, suppose that we have two service descriptions in
a repository, both having only one input and one output:
$S = \langle x{:}X;\ y{:}Y;\ \Phi(x, y, \vec{u}) \rangle$,
$S' = \langle x'{:}X';\ y'{:}Y';\ \Phi'(x', y', \vec{u}') \rangle$.
Our task is to formulate reasonable conditions under which the
composition of services S ◦ S′ (to be read as “first S runs,
then S′ runs on the output produced by S”) is meaningful
(i.e., when these services are compatible) and under which it
matches a user’s request
$Q = \langle z{:}Z;\ w{:}W;\ \Psi(z, w, \vec{v}) \rangle$.
Definition 4. A composition of services S ◦ S′ matches a
request Q w.r.t. a TBox $\mathcal{T}$ if the following conditions hold:
(a) $\mathcal{T} \models Z \sqsubseteq X$. As usual, this ensures that S accepts
all inputs described in the request Q.
(b) for any ABox $\mathcal{A}$ and any individuals a, b in the
knowledge base $KB = \langle \mathcal{T}, \mathcal{A} \rangle$, if
$KB \models Z(a) \wedge Y(b) \wedge \exists\vec{u}\, \Phi(a, b, \vec{u})$, then $KB \models X'(b)$.
This ensures that, if S runs on inputs provided by the user,
then its outputs are accepted by S′.
(c) for any ABox $\mathcal{A}$ and any individuals a, c in the knowledge
base $KB = \langle \mathcal{T}, \mathcal{A} \rangle$, if $KB \models Z(a)$, then
$KB \models W(c) \wedge \exists\vec{v}\, \Psi(a, c, \vec{v})$ holds iff there exists an
individual b in KB such that
$KB \models \exists\vec{u}\, \Phi(a, b, \vec{u}) \wedge Y(b) \wedge \exists\vec{u}'\, \Phi'(b, c, \vec{u}') \wedge Y'(c)$.
This means that, on the user’s inputs, the application of S
and then S′ yields the same answers as Q.
Observe that the notion of service composition is request
dependent in the sense that the composition of services
S ◦ S′ is built for a particular request Q. This is so because,
in general, the services S and S′ may not be compatible (S
may return outputs that S′ cannot accept), yet on inputs
that the user intends to provide, S returns only outputs that
are accepted by S′.
More generally, if we are given several services
S1, . . . , Sr, all having only one input and one output, then
we can easily modify Definition 4 accordingly. Condition (b)
will say that the outputs of S1 (on the user’s inputs)
are accepted by S2, that the outputs of S2 (on inputs coming
from S1) are accepted by S3, etc. Condition (c) is easier to
modify: its first line is unchanged, whereas in its last line we
state that KB entails the assertion that c has the type of the
output of the last service Sr and the conjunction of all formulas
Φi with identification of their intermediate variables.
Finally, this notion of composition of services can be fur-
ther generalised to the case of several services having mul-
tiple inputs and outputs. No fundamental difficulties arise
here, but the notation becomes cumbersome due to permu-
tations of variables.
Example Queries and Matches
Before describing how to decide matching between services,
we discuss some examples from the biomedical domain to
illustrate the usefulness of our approach. In general, we
observe that most of these services take (one or more) strings
as input and return (one or more) strings as output. Hence,
any attempt to match services based on the types of input
and output is doomed to be of rather low precision.
Moreover, these services provide access to tools, databases, and
algorithms developed in the public domain, and a large part
of these services are related to the analysis of the biological
molecule DeoxyriboNucleic Acid (DNA). There are at least
20 different syntaxes (see
http://emboss.sf.net/docs/themes/SequenceFormats.html) for
representing just four bases (A, C, T, G) of the DNA code:
GenBank Format and EMBL Format are two commonly used
formats for representing DNA sequences. These DNA sequences
and associated metadata are stored in a large number of public
repositories, 858 databases (Galperin 2006) at the last count.
Consider the following advertisements for two “shim”
services (Hull et al. 2004) which extract the DNA sequence
from a GenBankRecord.
S1: INPUT: x GenBankRecord
OUTPUT: y DNASeqRepresentation
[ hasPart(x,y) ]
S2: INPUT: x GenBankRecord
OUTPUT: y DNASeqRepresentation
THERE IS SOME d,e [DNASequence(d),
EMBLRecord(e), about(x,d), about(e,d),
hasPart(e,y)]
They coincide on their inputs and outputs, yet they will
behave in slightly different ways. The first service simply
extracts the DNASequence from the input, whereas the second
one first extracts the DNA sequence and then translates
the syntax from GenBankForm to EMBLForm. Since there are at
least 20 different formats for representing DNA sequences,
we have to distinguish between a DNA sequence and its
representation in one of these formats.
Now consider the following request, which describes ser-
vices taking a GenBankRecord and returning the corre-
sponding DNA sequence in EMBL format:
Q: INPUT: x GenBankRecord
OUTPUT: y DNASeqRepresentation
THERE IS SOME d,e
[ DNASequence(d), Record(e), about(x,d),
about(e,d), hasPart(e,y), EMBLform(y)]
First, note that, according to our definition of matching,
the service S1 does not match our request Q since it cannot
guarantee that the output is in EMBL format. In contrast, if
our TBox contains
SubClassOf(EMBLRecord restriction(hasPart
allValuesFrom EMBLform))
which ensures that all entries in an EMBLRecord are in
EMBLform, then service S2 matches our request, which is
indeed useful. Similarly, in the presence of the above OWL
axiom, S2 even matches the following request, despite the
fact that the output of the request is declared to be more
specific than that provided by the second service.
Q1: INPUT: x GenBankRecord
OUTPUT: y IntersectionOf(
DNASeqRepresentation EMBLform)
THERE IS SOME d,e
[ DNASequence(d), Record(e), about(x,d),
about(e,d), hasPart(e,y)]
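The effect of the OWL axiom on the examples above can be illustrated with a small “freeze and saturate” sketch: the atoms of an advertisement are treated as facts, the axiom is applied as a rule, and we then look for a mapping of the request into the resulting facts. This is an illustration under strong assumptions, not the decision procedure of the paper: the rule EMBLRecord ⊑ Record is assumed here and is not stated in the text, the transitivity of hasPart is ignored, and only one direction of the required equivalence is checked (that answers of the service satisfy the request). All function names are hypothetical.

```python
from itertools import product

# Atoms are tuples: ("Class", term) or ("property", term1, term2).
# Input and output types are added to each body as unary atoms.
S1 = [("GenBankRecord", "x"), ("DNASeqRepresentation", "y"),
      ("hasPart", "x", "y")]
S2 = [("GenBankRecord", "x"), ("DNASeqRepresentation", "y"),
      ("DNASequence", "d"), ("EMBLRecord", "e"), ("about", "x", "d"),
      ("about", "e", "d"), ("hasPart", "e", "y")]
Q = [("GenBankRecord", "x"), ("DNASeqRepresentation", "y"),
     ("DNASequence", "d"), ("Record", "e"), ("about", "x", "d"),
     ("about", "e", "d"), ("hasPart", "e", "y"), ("EMBLform", "y")]


def saturate(facts):
    """Apply two rules until nothing new is derived:
    EMBLRecord(e), hasPart(e, y) => EMBLform(y)  (axiom from the text)
    EMBLRecord(e) => Record(e)                   (assumed for this sketch)"""
    facts = set(facts)
    while True:
        derived = set()
        for f in facts:
            if f[0] == "EMBLRecord":
                derived.add(("Record", f[1]))
                derived.update(("EMBLform", g[2]) for g in facts
                               if g[0] == "hasPart" and g[1] == f[1])
        if derived <= facts:
            return facts
        facts |= derived


def maps_into(query, facts, fixed=("x", "y")):
    """True if the query variables can be mapped to terms of `facts`,
    keeping the input/output variables in `fixed` unchanged, so that
    every query atom becomes a fact."""
    constants = sorted({t for f in facts for t in f[1:]})
    free = sorted({t for atom in query for t in atom[1:]} - set(fixed))
    for values in product(constants, repeat=len(free)):
        h = {v: v for v in fixed}
        h.update(zip(free, values))
        if all((a[0],) + tuple(h[t] for t in a[1:]) in facts for a in query):
            return True
    return False


def answers_satisfy(service, request):
    """One direction only: every answer of `service` satisfies `request`."""
    return maps_into(request, saturate(service))
```

Under these assumptions, `answers_satisfy(S2, Q)` holds while `answers_satisfy(S1, Q)` does not, and without saturation S2 does not map into Q either, which mirrors the role the axiom plays in the discussion above.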
Let us point out again that our definition of matching
yields both a higher precision and a higher recall than any
comparison of inputs and outputs could possibly yield: it
matches services whose inputs or outputs do not match in
an obvious way (such as Q1 and S2 above), and it does not
match services despite their inputs and outputs matching (such
as S1 and Q). The latter point is especially important for
biomedical Web Services since many take strings as inputs and
outputs, and thus all services would match on the grounds
of inputs and outputs.
Deciding service matching
In this section, we show that the problem of deciding
whether a service S matches a service Q is reducible to a
standard reasoning problem, namely to containment of con-
junctive queries w.r.t. a TBox.
Definition 5. A conjunctive query is an expression of the
form q(~x ) ← term1(~x , ~y ) ∧ . . . ∧ termk(~x , ~y ), where
~x and ~y are lists of variables and each termi is of the form
C(w) or R(w, z), where C is a class, R a property, and w, z
are either variables from the lists ~x , ~y or individual names.
We call ~x distinguished and ~y non-distinguished variables.
As in our service descriptions, the existential quantifica-
tion of non-distinguished variables (∃~y ) is only implicit in a
query.
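Given a knowledge base KB, the answer to a query
$q(x_1, \ldots, x_m)$ over KB is defined as follows:
$q(KB) = \{ \langle a_1, \ldots, a_m \rangle \mid \text{all } a_i \text{ are individuals in } KB \text{ and } KB \models \exists\vec{y}\, ( term_1(\vec{a}, \vec{y}) \wedge \ldots \wedge term_k(\vec{a}, \vec{y}) ) \}$.
A query $q_1(\vec{x})$ subsumes a query $q_2(\vec{x})$ w.r.t. a TBox $\mathcal{T}$ if,
for each ABox $\mathcal{A}$, $q_1(\mathcal{T}, \mathcal{A}) \supseteq q_2(\mathcal{T}, \mathcal{A})$. Two queries
$q_1(\vec{x})$ and $q_2(\vec{x})$ are equivalent w.r.t. $\mathcal{T}$ if they subsume each
other.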
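To make the definition of query answers concrete, here is a minimal sketch that evaluates a conjunctive query over a set of ABox assertions. It assumes an empty TBox, so that entailment reduces to finding a match in the asserted facts; with a non-empty TBox a reasoner would be needed. The `?` prefix for variables and the function name `answers` are conventions chosen for this example only, and every distinguished variable is assumed to occur in the body.

```python
from itertools import product


def answers(head_vars, body, abox):
    """Answers of the query head_vars <- body over `abox`.
    Atoms and facts are tuples ("Class", t) or ("property", t1, t2);
    terms starting with "?" are variables, all others are individuals."""
    individuals = sorted({t for fact in abox for t in fact[1:]})
    variables = sorted({t for atom in body for t in atom[1:]
                        if t.startswith("?")})
    result = set()
    for values in product(individuals, repeat=len(variables)):
        h = dict(zip(variables, values))
        # Individuals in the query are left unchanged by h.get(t, t).
        if all((a[0],) + tuple(h.get(t, t) for t in a[1:]) in abox
               for a in body):
            result.add(tuple(h[v] for v in head_vars))
    return result
```

For instance, over the assertions GenBankRecord(r1), hasPart(r1, s1) and hasPart(r2, s2), the query q(?x) ← GenBankRecord(?x) ∧ hasPart(?x, ?y) has the single answer 〈r1〉. Comparing the answer sets of two queries on one ABox can refute subsumption but cannot establish it, since subsumption quantifies over all ABoxes.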
Theorem 1 (Reduction). Service matching w.r.t. a TBox is
reducible to conjunctive query equivalence w.r.t. a TBox.
The proof can be found in (Bovykin and Zolin 2005).
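One way to see why condition (ii) of the matching definition becomes a query equivalence, for fixed permutations π and ρ, is sketched below. The query names q_S and q_Q are labels introduced here for illustration, and the construction used in the technical report may differ in detail.

```latex
\begin{align*}
q_S(\vec{x}, \vec{y}) &\leftarrow \vec{Z}(\vec{x}) \wedge \vec{Y}(\rho(\vec{y}))
    \wedge \Phi(\pi(\vec{x}), \rho(\vec{y}), \vec{u}) \\
q_Q(\vec{x}, \vec{y}) &\leftarrow \vec{Z}(\vec{x}) \wedge \vec{W}(\vec{y})
    \wedge \Psi(\vec{x}, \vec{y}, \vec{v})
\end{align*}
```

Since a knowledge base entails a conjunction of closed formulas exactly when it entails each conjunct, a tuple that does not satisfy the input types Z belongs to neither answer set, while for a tuple that does, membership in the two answer sets is exactly the equivalence stated in condition (ii). Hence, for the given π and ρ, condition (ii) holds iff q_S and q_Q are equivalent w.r.t. the TBox.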
Now consider Definition 4 of service composition S ◦ S′
matching a service request Q. Again, Condition (a) is just a
concept subsumption; Condition (b) holds iff the query
$q(x, y) \leftarrow Z(x) \wedge Y(y) \wedge \Phi(x, y, \vec{u})$
is subsumed by the query $q'(x, y) \leftarrow X'(y)$. That
Condition (c) is reducible to query subsumption is not
straightforward, because of the quantification over individuals in
a knowledge base, not over arbitrary elements of models
of KB. However, it can be shown that this quantification can
be “internalized” and hence Condition (c) is equivalent
to the following one:
(c’) for any ABox $\mathcal{A}$ and any individuals a, c in the knowledge
base $KB = \langle \mathcal{T}, \mathcal{A} \rangle$, if $KB \models Z(a)$, then the condition
$KB \models W(c) \wedge \exists\vec{v}\, \Psi(a, c, \vec{v})$
is equivalent to the condition
$KB \models \exists t, \vec{u}, \vec{u}'\, ( \Phi(a, t, \vec{u}) \wedge Y(t) \wedge \Phi'(t, c, \vec{u}') \wedge Y'(c) )$.
For a detailed proof of the equivalence of (c) and (c’),
see (Bovykin and Zolin 2005). Finally, it remains to notice
that Condition (c’) states the equivalence of two conjunctive
queries, so we are done.
Consequently, Definition 4 turns out to be modular in
the sense that a service description for a composite service
can be constructed automatically from the description of its
components (provided that we have checked their compati-
bility), and then matching is defined as usual. To be more
precise, from the above it follows that, given two services S
and S′ as in Definition 4 that are compatible on inputs of Q,
the description for the composition S ◦ S′ is the following:
S ◦ S′ = 〈x:X ; y′:Y ′; Φ(x, t, ~u ) ∧ Y (t) ∧ Φ′(t, y′, ~u ′)〉,
provided that the lists ~u and ~u ′ are disjoint—and we can
always rename these variables.
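The construction of the composite description can be sketched as follows. The `Service` class and the renaming scheme (suffixes `_1` and `_2` for the existential variables, `t` for the intermediate variable) are assumptions made for this illustration; the sketch also treats every term as a variable, so individual names in the bodies are not supported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    inp: tuple   # (variable, class), e.g. ("x", "X")
    out: tuple   # (variable, class), e.g. ("y", "Y")
    body: tuple  # atoms such as ("hasPart", "x", "y")


def compose(first, second):
    """Description of `first` followed by `second`: the body of `first`
    with its output renamed to t, the type assertion Y(t), and the body
    of `second` with its input renamed to t."""
    (x1, _), (y1, y1_type) = first.inp, first.out
    (x2, _), (y2, _) = second.inp, second.out

    def renamer(io_map, suffix):
        # Input/output variables get fixed names; all other variables
        # get a suffix so that the two bodies share no existentials.
        return lambda term: io_map.get(term, term + suffix)

    r1 = renamer({x1: "x", y1: "t"}, "_1")
    r2 = renamer({x2: "t", y2: "y"}, "_2")
    body = tuple((a[0],) + tuple(r1(t) for t in a[1:]) for a in first.body)
    body += ((y1_type, "t"),)
    body += tuple((a[0],) + tuple(r2(t) for t in a[1:]) for a in second.body)
    return Service(("x", first.inp[1]), ("y", second.out[1]), body)
```

The sketch only builds the description; whether the two services are compatible on the inputs of a given request (conditions (a) and (b) of Definition 4) has to be checked separately.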
Next, we will briefly discuss results on the decidability
and complexity of query containment w.r.t. ontologies. In
general, query containment is at least as hard as concept
subsumption or satisfiability. Hence, the lower bound for its
complexity immediately follows from the lower complexity
bound of a Description Logic itself.
As for upper bounds, there are several results in this
direction. In (Calvanese, De Giacomo, & Lenzerini 1998),
the query containment problem for $\mathcal{DLR}_{reg}$ is shown to be
exponential in the size of the TBox and double exponential in
the size of the queries. This is a logic with n-ary relations and
boolean operations on them, and regular expressions for binary
relations. In (Horrocks et al. 2000), the query containment
problem for the logic DLR was reduced to checking ABox
satisfiability for the same logic, which, in turn, was reduced
to knowledge base satisfiability for the Description Logic
SHIQ, which is decided by numerous DL reasoners. In
both settings, however, only so-called simple properties are
allowed in query terms. Hence these results do not completely
suit our setting: most of our example services involve a
transitive and thus non-simple property hasPart. In (Ortiz
de la Fuente et al. 2005), the query containment for the logic
SHIQ is shown to be 3coNExpTime, provided that the KB
has no transitive roles. Currently, the complexity is being
investigated and practical algorithms are being devised for query
containment over the logic SHOIQ, the DL underlying OWL-DL.
Summing up, service matching w.r.t. OWL ontologies is
known to be decidable, and decision procedures for this
problem are available. Yet, to the best of our knowledge,
neither tight complexity bounds nor an implementation are
currently available.
Related work
WSDL descriptions The Web Services Description Language
(WSDL) 1.1 is used to describe around 80% of the
3 000 services currently available in bioinformatics. In
bioinformatics, “WSDL in the wild” is typically
under-descriptive. For example, take xembl
(http://www.ebi.ac.uk/xembl/XEMBL.wsdl), which takes an
input string, performs an operation getNucSeq on that input
string, and returns a string. To the WSDL-literate domain
expert, it might be clear that this service returns an
XML-formatted representation of a DNA sequence identifier it
received as an input, yet this knowledge has to be deduced
solely from the terms used in the description. As mentioned
before, having inputs and outputs of type string is typical of
the biomedical domain, although these strings hide complex
flat-file and legacy formats. Directly annotating the WSDL
file itself as proposed in WSDL-S
(http://www.w3.org/Submission/WSDL-S/) is usually not possible
because the WSDL is provided as-is by a third party.
Consequently, any richer semantic descriptions are best stored
independently of WSDL.
RDF annotations of WSDL In order to enable semantic
discovery of services, two closely related projects, BioMOBY
and myGrid Feta (Lord et al. 2004; 2005), have taken existing
WSDL descriptions and annotated them using RDF. These
projects have created a centralised registry of many services
annotated with RDF. Because of RDF’s restricted expressivity,
neither of these approaches, however, would allow the
line of reasoning we used to determine that the produced-wine
service matches the query for French wines: the
annotation of tasks, functions, etc., does not take into account
relationships between inputs or outputs.
OWL-S For the problem of service matching and location,
an OWL-S description includes a service profile. In a
profile, we can specify, among other things, inputs, outputs, and
their types, as well as pre- and postconditions. Now OWL-S is
written in OWL, and thus the syntactic restrictions of OWL
imply that the relationships between inputs and outputs
cannot be described. A service profile includes pre- and
postconditions whose semantics remains unclear since OWL is
stateless, i.e., OWL does not provide mechanisms to
distinguish different states such as “before” the execution of a
service or “after” the execution of a service.
WSMO WSMO (Roman et al. 2005) has a class capability
to describe a service’s functionality, and allows the relation
of inputs to outputs using shared variables in a similar way
as the approach described here. To the best of our knowl-
edge, however, no decidability results are known for match-
ing services described in WSMO.
Conclusions
We have discussed various ways of describing stateless
services which are quite common, e.g., in the bioinformatics
domain. We propose a framework to describe the
functionality of and reason about such services. Like all other web
service description formalisms, we represent information about
the inputs and outputs of a service. To describe a service’s
functionality, we allow description of how inputs and
outputs are related, and we use an ontology to fix the meaning
of terms used in service descriptions. Most importantly, we
provide a definition of a service matching a request which (i)
takes all this information into account and (ii) is decidable.
From a logical perspective, it is clear that our framework
can be easily combined with OWL-S and WSMO, thereby
allowing for an automated reasoning approach for matching
services with high precision and recall.
References
Bovykin, A., and Zolin, E. 2005. A formal framework for
describing information providing web services. Technical
report, University of Manchester. Available at
http://dynamo.man.ac.uk/publ/aaai06tr.pdf.
Baader, F.; Lutz, C.; Miličić, M.; Sattler, U.; and Wolter, F.
2005. Integrating description logics and action formalisms:
First results. In Press, A. P. M., ed., Proc. of AAAI-05.
Calvanese, D.; De Giacomo, G.; Lembo, D.; Lenzerini, M.;
and Rosati, R. 2005. Data complexity of query answering
in description logics. In Proc. of DL’05, volume 147.
Calvanese, D.; De Giacomo, G.; and Lenzerini, M. 1998.
On the decidability of query containment under constraints.
In Proc. of PODS’98, 149–158.
Collins, F. S.; Green, E. D.; Guttmacher, A. E.; and Guyer,
M. S. 2003. A vision for the future of genomics research:
A blueprint for the genomic era. Nature 422(6934):835–
847. US National Human Genome Research Institute.
Galperin, M. 2006. The molecular biology database col-
lection: 2006 update. Nucleic Acids Research 34(Database
issue):3–5.
Horrocks, I.; Sattler, U.; Tessaris, S.; and Tobies, S. 2000.
How to decide query containment under constraints using
a description logic. In Proc. of LPAR’00, LNAI. Springer-
Verlag.
Horrocks, I.; Patel-Schneider, P. F.; and van Harmelen, F.
2003. From SHIQ and RDF to OWL: The Making of a
Web Ontology Language. Journal of Web Semantics 1(1).
Hull, D.; Stevens, R.; Lord, P.; Wroe, C.; and Goble, C.
2004. Treating shimantic web syndrome with ontologies.
In Proc. of AKT-SWS04.
Hull, D.; Wolstencroft, K.; Stevens, R.; Goble, C.; Pocock,
M. R.; Li, P.; and Oinn, T. 2006. Taverna: A tool for
building and running workflows of services. Nucleic Acids
Research 34(Web Server issue).
Lord, P.; Bechhofer, S.; Wilkinson, M. D.; Schiltz, G.;
Gessler, D.; Hull, D.; Goble, C.; and Stein, L. 2004.
Applying Semantic Web Services to bioinformatics:
Experiences gained, lessons learnt. In Proc. of ISWC’04.
Springer-Verlag.
Lord, P.; Alper, P.; Wroe, C.; and Goble, C. 2005. Feta: A
light-weight architecture for user oriented semantic service
discovery. In Proc. of ESWC’05. Springer-Verlag.
Martin, D.; Paolucci, M.; McIlraith, S.; Burstein, M.;
McDermott, D.; McGuinness, D.; Parsia, B.; Payne, T. R.;
Sabou, M.; Solanki, M.; Srinivasan, N.; and Sycara, K.
2004. Bringing Semantics to Web Services: the OWL-S
approach. In SWSWPC 2004. Springer-Verlag.
Ortiz de la Fuente, M. M.; Calvanese, D.; Eiter, T.; and
Franconi, E. 2005. Data Complexity of Answering Con-
junctive Queries over SHIQ Knowledge Bases. Technical
report, Faculty of Computer Science, Free University of
Bozen-Bolzano.
Payne, T. R.; Paolucci, M.; and Sycara, K. 2001. Advertis-
ing and Matching DAML-S Service Descriptions. Seman-
tic Web Working Symposium (SWWS).
Roman, D.; Keller, U.; Lausen, H.; de Bruijn, J.; Lara,
R.; Stollberg, M.; Polleres, A.; Feier, C.; Bussler, C.; and
Fensel, D. 2005. Web Service Modeling Ontology. Applied
Ontology 1(1):77–106.
Sleep, R. 2004. Grand Challenges in Computing Research.
British Computer Society. chapter GC1: In Vivo-In Silico
(iVis): The virtual worm, weed, bug. isbn:190250562X.
Stein, L. 2002. Creating a bioinformatics nation. Nature
417:119–120.
Stevens, R. D.; Tipney, H. J.; Wroe, C.; Oinn, T.; Senger,
M.; Lord, P. W.; Goble, C. A.; Brass, A.; and Tassabehji,
M. 2004. Exploring Williams-Beuren Syndrome Using
myGrid. Bioinformatics 20.
Tessaris, S. 2001. Questions and answers: reasoning and
querying in Description Logic. Ph.D. Dissertation, Univer-
sity of Manchester.
1324
wine service matches the query for french wines: the an-
notation of tasks, functions, etc., does not take into account
relationships between in- or outputs.
OWL-S For the problem of service matching and location,
an OWL-S description includes a service profile. In a pro-
file, we can specify, besides others, inputs, outputs, and their
types, as well as pre- and postconditions. Now OWL-S is
written in OWL, and thus the syntactic restrictions of OWL
imply that the relationships between inputs and outputs can-
not be described. A service profile includes pre- and post-
conditions whose semantics remains unclear since OWL is
stateless, i.e., OWL does not provide mechanisms to distin-
guish different states such as “before” the execution of aser-
vice or “after” the execution of a service.
WSMO WSMO (Roman et al. 2005) has a class capability
to describe a service’s functionality, and allows the relation
of inputs to outputs using shared variables in a similar way
as the approach described here. To the best of our knowl-
edge, however, no decidability results are known for match-
ing services described in WSMO.
Conclusions
We have discussed various ways of describing stateless ser-
vices which are quite common, e.g., in the bioinformatics
domain. We propose a framework to describe the function-
ality of and reason about such services. As all other web ser-
vice description formalisms, we represent information about
the inputs and outputs of a service. To describe a service’s
functionality, we allow description of how inputs and out-
puts are related, and we use an ontology to fix the meaning
of terms used in service descriptions. Most importantly, we
provide a definition of a service matching a request which (i)
takes all this information into account and (ii) is decidable.
From a logical perspective, it is clear that our framework
can be easily combined with OWL-S and WSMO, thereby
allowing for an automated reasoning approach to matching
services with high precision and recall.
References
Bovykin, A., and Zolin, E. 2005. A formal framework for
describing information providing web services. Technical
report, University of Manchester. Available at
http://dynamo.man.ac.uk/publ/aaai06tr.pdf.
Baader, F.; Lutz, C.; Miličić, M.; Sattler, U.; and Wolter, F.
2005. Integrating description logics and action formalisms:
First results. In Proc. of AAAI-05.
Calvanese, D.; De Giacomo, G.; Lembo, D.; Lenzerini, M.;
and Rosati, R. 2005. Data complexity of query answering
in description logics. In Proc. of DL’05, volume 147.
Calvanese, D.; De Giacomo, G.; and Lenzerini, M. 1998.
On the decidability of query containment under constraints.
In Proc. of PODS’98, 149–158.
Collins, F. S.; Green, E. D.; Guttmacher, A. E.; and Guyer,
M. S. 2003. A vision for the future of genomics research:
A blueprint for the genomic era. Nature 422(6934):835–
847. US National Human Genome Research Institute.
Galperin, M. 2006. The molecular biology database col-
lection: 2006 update. Nucleic Acids Research 34(Database
issue):3–5.
Horrocks, I.; Sattler, U.; Tessaris, S.; and Tobies, S. 2000.
How to decide query containment under constraints using
a description logic. In Proc. of LPAR’00, LNAI. Springer-
Verlag.
Horrocks, I.; Patel-Schneider, P. F.; and van Harmelen, F.
2003. From SHIQ and RDF to OWL: The Making of a
Web Ontology Language. Journal of Web Semantics 1(1).
Hull, D.; Stevens, R.; Lord, P.; Wroe, C.; and Goble, C.
2004. Treating shimantic web syndrome with ontologies.
In Proc. of AKT-SWS04.
Hull, D.; Wolstencroft, K.; Stevens, R.; Goble, C.; Pocock,
M. R.; Li, P.; and Oinn, T. 2006. Taverna: A tool for building
and running workflows of services. Nucleic Acids Research
34(Web Server issue).
Lord, P.; Bechhofer, S.; Wilkinson, M. D.; Schiltz, G.;
Gessler, D.; Hull, D.; Goble, C.; and Stein, L. 2004.
Applying Semantic Web Services to bioinformatics: Ex-
periences gained, lessons learnt. In Proc. of ISWC’04.
Springer-Verlag.
Lord, P.; Alper, P.; Wroe, C.; and Goble, C. 2005. Feta: A
light-weight architecture for user oriented semantic service
discovery. In Proc. of ESWC’05. Springer-Verlag.
Martin, D.; Paolucci, M.; McIlraith, S.; Burstein, M.; Mc-
Dermott, D.; McGuinness, D.; Parsia, B.; Payne, T. R.;
Sabou, M.; Solanki, M.; Srinivasan, N.; and Sycara, K.
2004. Bringing Semantics to Web Services: the OWL-S
approach. In SWSWPC 2004. Springer-Verlag.
Ortiz de la Fuente, M. M.; Calvanese, D.; Eiter, T.; and
Franconi, E. 2005. Data Complexity of Answering Con-
junctive Queries over SHIQ Knowledge Bases. Technical
report, Faculty of Computer Science, Free University of
Bozen-Bolzano.
Payne, T. R.; Paolucci, M.; and Sycara, K. 2001. Advertis-
ing and Matching DAML-S Service Descriptions. Seman-
tic Web Working Symposium (SWWS).
Roman, D.; Keller, U.; Lausen, H.; de Bruijn, J.; Lara,
R.; Stollberg, M.; Polleres, A.; Feier, C.; Bussler, C.; and
Fensel, D. 2005. Web Service Modeling Ontology. Applied
Ontology 1(1):77–106.
Sleep, R. 2004. Grand Challenges in Computing Research.
British Computer Society. chapter GC1: In Vivo-In Silico
(iVis): The virtual worm, weed, bug. isbn:190250562X.
Stein, L. 2002. Creating a bioinformatics nation. Nature
417:119–120.
Stevens, R. D.; Tipney, H. J.; Wroe, C.; Oinn, T.; Senger,
M.; Lord, P. W.; Goble, C. A.; Brass, A.; and Tassabehji,
M. 2004. Exploring Williams-Beuren Syndrome Using
myGrid. Bioinformatics 20.
Tessaris, S. 2001. Questions and answers: reasoning and
querying in Description Logic. Ph.D. Dissertation, Univer-
sity of Manchester.


