Sign up & Download
Sign in

Sharing Choreographies in OpenKnowledge: A Novel Approach to Interoperability

by Paolo Besana, Vivek Patkar, Adam Barker, David Robertson, David Glasspool
Journal of Software (2009)

Abstract

As computer systems grow in size and complexity, their integration, while a necessity, becomes more difficult. Service oriented architectures and middleware systems in general deal with the issue by decoupling the components. However, these architectures are still designed and enacted from a centralised perspective: a single process invokes remote services, unaware of being part of a larger, more complex workflow. We claim that the orchestration-based approach does not scale well with increasing complexity and heterogeneity of the components, such as those required in the enactment of medical guidelines and workflows. Medical guidelines encode different aspects of a medical procedures, and they specify different level of abstraction in the procedure. They have to be adapted to the realities of different clinics and hospitals, to advances in knowledge, and to the changing of available resources within an institution. We address the problem of representing and enacting medical guidelines using a fully distributed approach. The framework provided by OpenKnowledge, based on sharing choreographies among actors, allows the representation and enactment of the coordination aspect of guidelines and the discovery of the medical knowledge provided by the distributed actors.

Cite this document (BETA)

Available from www.academypublisher.com
Page 1
hidden

Sharing Choreographies in OpenKnowledge: A Novel Approach to Interoperability

Sharing Choreographies in OpenKnowledge:
ANovelApproachtoInteroperability
Paolo Besana
1
,VivekPatkar
2
,AdamBarker
3
,DavidRobertson
1
and David Glasspool
1
1
University of Edinburgh,
2
University College London,
3
University of Oxford
Abstract As computer systems grow in size and complexity,
their integration, while a necessity, becomes more dif cult.
Service oriented architectures and middleware systems in general
deal with the issue by decoupling the components. However, these
architectures are still designed and enacted from a centralised
perspective: a single process invokes remote services, unaware of
being part of a larger, more complex work ow. We claim that the
orchestration-based approach does not scale well with increasing
complexity and heterogeneity of the components, such as those
required in the enactment of medical guidelines and work ows.
Medical guidelines encode different aspects of a medical
procedures, and they specify different level of abstraction in
the procedure. They have to be adapted to the realities of
different clinics and hospitals, to advances in knowledge, and
to the changing of available resources within an institution.
We address the problem of representing and enacting medical
guidelines using a fully distributed approach. The framework
provided by OpenKnowledge, based on sharing choreographies
among actors, allows the representation and enactment of the
coordination aspect of guidelines and the discovery of the medical
knowledge provided by the distributed actors.
Index terms service oriented architecture, semantic service
composition, service choreography, health informatics
I. INTRODUCTION
Software systems are getting more and more complex. An
important source of the complexity is the need to integrate
different, heterogeneous subsystems. Middleware provides the
facilities for connecting software components, reducing their
interdependencies and simplifying their reusability in different
systems.
Service oriented architectures are part of a wide family
of middleware systems: every component exposes services
accessible through the network. Complex systems can be
pulled together, invoking services belonging to different and
possibly external systems using work ow languages [4] like
the Business Process Execution Language (BPEL) [2] or Yet
Another Work ow Language (YAWL) [33]. These frameworks
are based on a centralised, imperative paradigm: a central
process controls everything, the other services are passive,
unaware of being part of a larger, more complex work ow.
In this approach, called orchestration,servicesareusually
wired together into an application at design time, leaving little
margin if, for example, one of the services is unavailable at
run-time.
We c la im tha t th i s approach does no t sca le we l l wi th the
growing complexity of systems. This is the case, as we will see
in the next sections, for medical procedures, where complex
work ows, developed by committees, involve the interaction
of many different actors such as desktop applications, web ser-
vices, databases, diagnostic devices and monitoring tools and
have to be adapted to the contingent and varying realities of
hospitals and clinics. We advocateadifferentparadigm,based
on sharing choreographies among actors. We believe that both
design and execution of complex systems can bene t from this
approach. At design-time, the choreography paradigm forces
the developers to think in terms of the actors, their roles and
their interactions, making them explicit and abstracting away
from the details of their speci c activities. While the general
approach is to share description of services, and pull them to-
gether at design-time into a speci c application, our approach
takes the opposite direction and shares the choreographies. At
execution-time, shared choreographies are the contracts that
actors can search, verify and agree to follow: the binding with
actors takes place at run-time, not at design-time.
The OpenKnowledge
1
project has allowed us to develop a
fully distributed, open, peer-to-peer framework, focussed on
shared interaction models that are executed by peers. In this
paper we focus on the application of this framework to the
coordination of medical guidelines, using as a case study the
assessment procedure (called triple assessment) followed by a
patient suspected of breast cancer.
In Section II we introduce the problem requirements of
formalising and enacting medical guidelines, exempli ed by
the breast cancer assessment case study in Section III. Then
in Section IV we discuss the advantages of a choreography-
based approach to the problem, against an orchestration-based
one. In Section V-B we cover OpenKnowledge, as a exibile,
choreography-based framework that can address some of the
requirements of medical guidelines. The evaluation of the
framework is presented in Subsection V-C. Finally other
approaches to choreography are brie y reviewed in Section
VI.
II. MEDICAL GUIDELINES
Gaps between medical theory and clinical practice are
consistently found in health service research. Care procedures
can differ signi cantly between different health centres, with
varying outcomes for patients, and medical errors could be
avoided if standard procedures were followed consistently.
One of the causes of discrepancies in care is the dif culty
in distributing and sharing ef ciently the large volume of
1
http://www.openk.org
JOURNAL OF SOFTWARE, VOL. 4, NO. 8, OCTOBER 2009 833
© 2009 ACADEMY PUBLISHER
doi:10.4304/jsw.4.8.833-842
Page 2
hidden
information continuously produced by medical research. These
issues have pushed the development of clinical practice guide-
lines: several studies [19] have shown that published guidelines
can improve the quality of care.
Guidelines are usually de ned by a committee of experts
and are provided as booklets, often hundreds of pages long,
covering relatively narrow elds or speci c pathologies. Hun-
dreds of guidelines are available, and generalist doctors are
expected to be aware of, and follow, the guidelines relevant to
each patient. The result is that guidelines are rarely followed,
and inconsistencies in medical treatments are not reduced as
much as hoped.
Information technology can improve the situation. Many
clinical guidelines provide informal descriptions of work-
ows and rules that can be translated into formal, machine-
executable, representations. Research has suggested that com-
puterised clinical supports can improve practitioner compli-
ance with guidelines [17].
Guidelines encode at least two separate types of knowledge:
they specify the overall coordination between actors, both
clinical and non-clinical (for example, the results of all the
exams are sent to a multidisciplinary team for discussion and
to the Electronic Health Record of the patient) and speci c
medical knowledge required in taking decisions (for example,
dosing drugs or classifying a melanoma). Guidelines are
developed for general adoption: the coordination speci cations
need to be adapted to the contingent realities of different
institutions and clinics, and the medical knowledge need to
be kept updated.
Finally, different guidelines specify medical procedures at
different level of abstraction, from the general framework
de ning the milestones of screening, intervention and follow-
up within the national health system, to the detail of the
dosing of a speci c drug. Depending on the condition and
on the abstraction level, there can be varying degrees of
freedom in the selection of the guideline, or in the selection
of participating clinicians. For example, a woman can choose
among several paths of care for her pregnancy, while the (more
critical) procedure for a heart attack is more stringent. In order
to enact the full pathway a patient has to go through the various
speci cations that should be integrated, selecting at each step
the best one, and adapting them to the contingencies.
In the work described in this paper we use OpenKnowledge,
atechnologydevelopedfordistributedpeer-to-peersystems,
to represent, integrate and enact the coordination aspect of
guidelines offering a level of exibility that improves porta-
bility between different institutions.
Before the details of OpenKnowledge, we rst introduce out
case study and discuss the differences between the centralised
and the distributed approaches.
III. CASE STUDY: ASSESSMENT FOR BREAST CANCER
We presen t as our case s tudy the gu ide l ine fo r a ssess ing
the presence of breast cancer, because, as we will describe in
more detail in the next section, a computer-based, centralised
work ow has already been developed, making the comparison
easier. This guideline is part of a wider work ow, that includes
periodical screening, surgical intervention and follow-up.
Breast cancer is the most commonly diagnosed cancer in
women, accounting for about 30% of all such cancers. One in
nine women will develop breast cancer at some point in their
lives. In the UK, women with symptoms that raise suspicion
of breast cancer are referred by their GP to designated breast
clinics. To increase the accuracy of diagnosis, a combination
of clinical examination, imaging and biopsy - known together
as triple assessment -isrecommended.
The rst element of triple assessment consists of gathering
the patient details and clinical examinations, and it is com-
pleted by a breast surgeon. If the clinical examination reveals
an abnormality then the patient is referred to a radiologist for
imaging (either ultrasound, mammography or both). If either
the examination or imaging ndings warrant it, then a biopsy is
performed, either by a radiologist or a surgeon, and the tissue
is sent to a pathologist for examination. The collective results
from all three tests in uence the nalmanagementofthepa-
tient. Depending on the resources available, different imaging
and pathology laboratories might be selected every time the
guideline is executed, possibly from different institutions.
Asmallnumberof worriedwell patientsmaynotqualify
for either imaging or biopsy and could be discharged straight
away. As the entire clinical process is distributed among
three different disciplines and involves a number of different
clinicians, a very close co-ordination and good communication
between those involved is essential for the smooth running of
the clinic.
IV. CENTRALISED AND DISTRIBUTED MODELS
The triple assessment model presented in [24] is designed
according to a centralised principle, and the abstract work ow
is shown in Figure 1. The centralised model has been im-
plemented
2
in PROforma [29], a process modelling language
developed within Cancer Research UK. Together with the
possibility of representing plans, sub-plans and actions, its
strength lays in its decision support system, based on argu-
mentation.
However, in PROforma it is not possible to explicitly
represent different roles in a guideline: roles are enforced
by requiring different permissions to access the activities.
Activities are queued in the todo list of a user, who nds
it after log in. When the user nishes the activity, it is marked
as completed and the work ow proceeds. While PROforma
provides a strong support for representing and reasoning about
medical knowledge, it is currently inadequate for representing
and executing coordinated activities between distributed ac-
tors.
The distributed nature of the procedure cannot be recon-
structed from its representation as a centralised work ow.
Moreover, medical knowledge speci c to different participants,
that is, the arguments and rules necessary for decisions in
various phases, is centralised in a single procedure, mak-
ing it impossible to reuse the same knowledge in different
guidelines. If the model is implemented in another work ow
language such as BPEL, the knowledge could be split amongst
agroupofservices,possiblyeachbasedonPROforma,but
2
Ademoisavailableat:http://tinyurl.com/tripleassessment
834 JOURNAL OF SOFTWARE, VOL. 4, NO. 8, OCTOBER 2009
© 2009 ACADEMY PUBLISHER
Page 3
hidden
Figure 1. Centralised representation of the triple assessment.
Figure 2. Activity diagram for the triple assessment
Figure 3. Fragment of the sequence diagram of a run of the triple assessment
JOURNAL OF SOFTWARE, VOL. 4, NO. 8, OCTOBER 2009 835
© 2009 ACADEMY PUBLISHER
Page 4
hidden
the process itself would still be represented from a single
perspective.
Amorerealisticrepresentationofthe owoftheprocedure
is provided by the UML Activity diagram of Figure 2, in
which the participants and their interactions are made explicit.
There are ve main actors: the breast surgery service (BSS), in
charge of the rst three activities in the work ow in Figure 1,
the breast imaging service (BIS), responsible for the imaging
decision and for the two alternative possible examinations,
the breast pathology service (BPS), in charge of the biopsy,
the multi-disciplinary team (MDT), responsible for the nal
decision together with the surgery service, and the patient.
The aim of this work is to allow the distribution of
knowledge and information to the actors in charge, the reuse
of knowledge in different guidelines, and the possibility of
easily adapting guidelines to the realities of actual institutions
and hospitals. We obtain this by separating the two different
aspects of guidelines into two abstraction layers. The higher
level consists in the choreography that speci es the expected
behaviour interface of the participants. The choreography is
shared, and the participants agree to follow it. The lower
level consists in the speci c knowledge and services, local
to participants, that are mapped to the choreography. The
choreography provides a speci c context and the semantics for
the interaction, giving the boundaries of the task of integrating
the different participants.
V. T HE OPENKNOWLEDGE APPROACH
A. Sharing choreographies
Achoreography-basedarchitecturecanhelpdevelopersto
think about distributed applications from a different perspec-
tive, one which scales better with an increasing number of
interacting actors. A distributed application becomes a set
of interactions between actors that assume different roles,
their internal behaviour is kept separated from their external
behaviour.
In most distributed approaches, the work ow engineer
selects the participants: the list of participants is a basic
construct both in choreography languages such as WS-CDL
[21], BPEL4Chor [11] and in orchestration languages such as
BPEL [2]. Some work ow architectures, such as Taverna [20],
let the developer specify a list of alternative services to call if
the rst fails, but the binding is still at design time. Service
descriptions are stored in shared repositories and the designer
looks up the speci c service for a task.
This approach makes portability more dif cult. As we have
seen, guidelines are usually written by a committee of experts
with a view to being generally applicable: when the plan has to
be implemented by adopting institutions, it may require heavy
customisation to adapt it to their requirements (for example,
the list of services to invoke is likely to be different, and
therefore part of the speci cation has to be changed).
In our approach, the choreographies are published in a
shared repository, similarly to the publication of a guideline
by a committee. Participants search the appropriate choreogra-
phies when they need to perform an activity that requires the
coordinated activity of other actors. The choreographies are
matched against the participants capabilities: the participants
select the choreography that best t them and advertise their
intention to perform a role in them. If all the roles are lled, the
interaction can start. In our triple assessment example, it means
that roles such as the breast imaging service or the breast
pathology service are bound at run time, when the procedure
needs to be performed, based on their availability.
We be l ieve tha t th i s a l lows increased reuse bo th o f chore -
ographies and of participants components, and allows the
deployment of distributed applications with different degrees
of freedom, maintaining the same architecture. Without alter-
ing the architecture presented in this paper, it is possible to
implement a closed system where for a task there is only one
possible choreography and a xed set of other participants that
are bound to it, and an open system where for a task a par-
ticipant has the freedom of choosing different choreographies,
and has a choice between different participants who are free
to accept.
In the choreography model, issues of heterogeneity and
brokering need to be addressed. Participants are likely to be
different and they need to understand one another. The same
services may be available from many peers, and the search
and discovery process can be complex, especially if it needs
to be performed at real time. The OpenKnowledge framework
deals with these issues.
B. The OpenKnowledge Framework
The OpenKnowledge kernel [28] provides the middleware
that assorted services can use to interact using a choreography-
based architecture able to deal both with the semantic hetero-
geneity of the actors and with their discovery. It has been
designed with the goals of lightweightness and compactness,
allowing a short development cycle for distributed applica-
tions.
The framework allows a direct translation of a choreogra-
phy oriented design into an executable application. The core
concept is the shared interaction models (IM), performed by
different applications and service providers. These actors are
the participants, called peers,oftheinteractions,andtheyplay
roles in them. In an interaction all the roles have equal weight;
the behaviour of all the peers and in particular their exchange
of messages are speci ed.
IMs are written in the Lightweight Coordination Calculus
(LCC) [26], [27], a compact, executable choreography lan-
guage based on process calculus. Its syntax is shown in Figure
4. The IMs are published by the authors on the distributed
discovery service (DDS) with a keyword-based description
[22].
An IM in LCC is a contract that de nes the expected,
externally observable, behavioural interfaces of the roles that
participants can take in the interaction: an IM is a set of
role clauses. Participants in an interaction take their entry-
role and follow the unfolding of the clause speci ed using
combinations of the sequence operator ( then ) or choice
operator ( or ) to connect messages and changes of role.
Messages are either outgoing to ( ⇒ ) or incoming from ( ⇐ )
another participant in a given role. A participant can take,
836 JOURNAL OF SOFTWARE, VOL. 4, NO. 8, OCTOBER 2009
© 2009 ACADEMY PUBLISHER
Page 5
hidden
Model := {Clause,...}
Clause := Role :: Def
Role := a (Type,Id)
Def := Role | Message | Def thenDef |
Def or Def
Message := M ⇒ Role | M ⇒ Role ← C |
M ⇐ Role | C ← M ⇐ Role
C := Constant | P (Term,...) |¬C | C ∧ C |
C ∨ C
Type := Term
Id := Constant | Variable
M := Term
Term := Constant | Variable| P (Term,...)
Constant := lower case character sequence or number
Variable := upper case character sequence or number
Figure 4. LCC syntax
a(bss, BSS)::
registration(P)⇐ a(patient, P) then
null← triage(TR, P) and examine(TR, ER, P) then


null← isImagingDecision(ER) then
doImaging(P, TI)⇒ a(bis, BIS)
← typeOfImaging(ER, TI)


or






















null← isFreeHandBiopsy(ER)
then
null← doFreeHandBiopsy(P,FHBR)


or
null← true






then


null← isBiopsy(ER) then
doSampleAnalysis(P)⇒ a(bps, BPS) then
doMDM(P, TR, ER, FHBR)⇒ a(mdt, MDT)
















then
a(bss_wait_resports([], Reports), BSS)
then
null← mngmt_decision(P, TR, ER, FHBR, Reports, Dec)
then
advise(Dec)⇒ a(patient, P)
a(bss_wait_reports(RPT, NewRPT), BSS)::
null← all_arrived(RPT) and NewRPT= RPT
or
(
biopsy_report(P, BR)⇐ a(bps, BPS) then
a(bss_wait_reports([BR|RPT], NewRPT)
)
or
(
mdm_report(P, MR)⇐ a(mdt, MDT) then
a(bss_wait_reports([MR|RPT], NewRPT)
)
or
(
image_report(P, IR)⇐ a(bis, BIS) then
a(bss_wait_reports([IR|RPT], NewRPT)
)
Figure 5. LCC clauses for the breast surgery service(BSS) role
a(bis, BIS)::
doImaging(P, TI)⇐ a(bss, BSS) then
null← acquire_image(P, TI, IMR) then
image_report(P, IMR)⇒ a(bss_wait_reports, BSS)
then




null← isBiopsyDecision(IMR) then
null← doBiopsy(P, IMR, S) then
doSampleAnalysis(P)⇒ a(bps, BPS) then
doMDM(P, TR, ER, FHBR)⇒ a(mdt, MDT)




or
null← true
Figure 6. LCC clause for the breast imaging service (BIS) role
during an interaction, more roles and can recursively take
the same role (for example when processing a list). Message
input/output or change of role is controlled by constraints. In
its de nition, LCC makes no commitment to the method used
to solve constraints - so different participants might operate
different constraint solvers.
Figures 5 and 6 show the LCC clause for the breast surgery
service (BSS)andthebreastimagingservice(BIS) roles in the
triple assessment procedure described in the activity diagram
of Figure 2. In the surgery service clause, the participant who
takes the bss role starts waiting for the registration message
from a patient. When the request message arrives, it executes
the triage and the examination on the patient. Based on the
result of the examination, the surgery service either asks the
imaging service to acquire an image, or performs a free hand
biopsy, asks the pathologist to analyse the sample, and sends
arequestforamulti-disciplinarymeeting.Thentheprocess
changes role, and waits for the reports from the contacted
specialists. When all the reports have arrived, the process
returns to the main role and makes the decision about the
patient.
In the imaging service clause, the actor taking the role
starts by waiting for a request doImaging(P,TI) from the
surgery service. Then it acquires and analyses an image using
the method speci ed in TI (either an ultrasound scan, or a
mammography), and sends the report to the surgery service.
If required it performs a biopsy, asks a pathologist to analyse
the sample and sends a report to the multi-disciplinary team
for discussion.
Most of the constraints, such as triage(P,TR) or
doFreeHandBiopsy(P,FHBR),correspondtoexternal
activities, that a doctor or a radiologist perform. They may be
completed by lling in computerised forms or by receiving
data from an external device such as an ultrasound scan
machine. Some constraints, such as acquire(P,TI,IMR)
may launch further interactions (in this case, a different one
depending on the method required, and involving a radiolo-
gist).
LCC prescribes only the ordering of the exchanged mes-
sages and their pre and post-conditions in the form of con-
straints peers need to solve. However, in OpenKnowledge
it is possible to annotate every element in the IM in order
to enrich the description of the interaction. Annotations are
mainly used to de ne the ontological type of the variables
JOURNAL OF SOFTWARE, VOL. 4, NO. 8, OCTOBER 2009 837
© 2009 ACADEMY PUBLISHER
Page 6
hidden
in messages and constraints, but time-out for messages and
constraints could be expressed using the same mechanism.
Figure 7 describes the syntax used in annotations. An example
of annotation for the variables in role bss is shown in
Figure 8. This speci c example shows how variables can be
structured terms: the variable P is, in the choreography, the
structure patient(name,surname,date_of_birth,
address(street,post_code))).Thestructureisa
short-hand for an XML-like schema, where the functions and
the parameters are nested tags in a tree. Annotations are always
relative to a speci c role-clause in the interaction: the scope
of a variable is always limited to a clause.
Annotations are conceptually separated from the IM itself.
It is possible to attach different annotations to the same
IM to adapt it to different requirements and contexts: for
example, the same abstract IM could be used in a different
community, that speci es patients by their insurance numbers,
and therefore annotates variable P differently.
Apeerthatwantstoperformsometask,suchasproviding
an imaging service for breast cancer screening, searches for
published IMs for the task by sending a keyword-based query
to the DDS. The DDS collects the published IMs matching
the description (the keywords are extended adding synonyms
to improve recall) and sends the list back to the peer, that
needs to choose the one to subscribe.
In open systems, IMs and peers may be designed by
different entities, and therefore the constraints and the peers
knowledge bases are unlikely to correspond perfectly. The
heterogeneity problem is dealt with in three phases. We have
already seen the rst phase: the DDS matches the interaction
descriptions using a simple query expansion mechanism. Then
the peers compare the constraints in the received IMs with
their own capabilities [18], and nally the peers need to map
the terms appearing in constraints and introduced by other
peers [9]. The scope of the matching problem is limited to the
speci c IM in the second phase, and to the speci c interaction
run in the third phase.
The peer capabilities are provided by plug-in components,
called OKCs (OpenKnowledge Components) that can be pub-
lished by the developers on the DDS and downloaded by
other peers. An OKC exposes a set of Java methods that are
compared to the constraints in the IMs. The arguments of
methods can be annotated similarly to variables in the IM.
Arguments, like variables in constraints, can be structured
terms. Figure 8 shows an annotated method with structured
arguments. As annotations are short-hands for XML-like trees,
values in an argument are accessed by their path, similarly
to a very simpli ed XPath: for example, to obtain the street
of a patient, the method doTriage will call the method
getValue(“/street”) of the argument P.
The peer matches the annotated signatures of the constraints
and of the methods, transforming them into trees and verifying
their distance [15], [18]. The comparison process creates
adaptors,thatbridgetheconstraintstothemethods,asshown
in Figure 9. An adaptor has a con dence level, re ecting the
distance between the constraints and the matching method,
that gives a measure of how well the peer can execute an
interaction, and it used to select the most tting IM. Once
annotation := @annotation(about, innerAnnot)
about := @role(Role)|@message(M)|
@constraint(Term)|
@variable(Variable)
innerAnnot := annotation|tree
tree := Constant|tree|Constant, tree
Figure 7. Annotations syntax
Constraint annotations
@annotation(@role(bss),
@annotation(@variable(TR),
risk_level)
)
@annotation(@role(bss),
@annotation(@variable(P),
patient(name,surname,date_of_birth,
address(street,post_code)) )
)
Java method annotation
@MethodSemantic(language=”tag”,
params={
“patient(family_name,birthday,street,post_code)”,
“risk(assessed_level,confidence)”
})
public boolean doTriage(Argument P,
Argument TR)
{...}
Figure 8. Annotations for the constraint triage(P,TR) and for a corresponding
method
the peer has selected an IM, it subscribes to its role in the
discovery service. Figure 10 shows a snapshop of network
status when roles in an interaction are subscribed by at least
one peer.
When all the roles are lled, the discovery service chooses
randomly a peer in the network as coordinator for the interac-
tion, and hands over the IM together with the list of involved
peers in order to execute it.
The coordinator rst asks each peer to select the peers
they want to interact with, forming a mutually compatible
group of peers out of the replies. The selection process is
subjective to the peers. All the participants receive the list
of peers subscribed to all the roles, and they can check
the subscriptions, selecting the preferred ones. A peer can
also select none of the participants, excluding itself from a
particular run of the interaction, for instance due to overload.
The framework provides only an interface for the selection
Figure 9. Adaptor for constraint deliver(M,P)
838 JOURNAL OF SOFTWARE, VOL. 4, NO. 8, OCTOBER 2009
© 2009 ACADEMY PUBLISHER
Page 7
hidden
Figure 10. OpenKnowledge architecture
method, as its implementation is delegated to the application
developer. However, different strategies have been tested, as
we will see in the evaluation section.
While different implementations are possible, in the current
version of the OpenKnowledge kernel the coordinator executes
the interaction, instantiatingalocalproxyforeachpeer.The
remote peers are contacted only to solve constraints in the role
they have subscribed.
C. Framework evaluation
The preceding sections have demonstrated how clinical
guidelines (typically designed to operate in centralised sys-
tems) can be reconstructed in a de-centralised style. Im-
portantly, we have done this without requiring a special-
purpose representation language or enactment system - we
have used the generic OpenKnowledge kernel system. This
means that the mechanisms developed and evaluated for the
generic kernel apply, as do the evaluations already performed
on those mechanisms. In this section we summarise what we
know to date about the scalability and performance of these
mechanisms.
Beyond the basic mechanisms of enactment, the key ingre-
dients necessary to build large scale systems of peer to peer
choreography using the approach described above are:
• Discovery: It has to be possible to discover interaction
models ef ciently in the peer network.
• Ontology matching: It has to be possible to adapt to dif-
ferences in naming the objects described in interactions.
• Reputation: Peers, which may not have interacted with
others previously, must have ways of judging with which
of the many other peers it might be useful to interact.
We summar i se cur ren t p rogress in eva lua t ion on each of
these three dimensions. Further details of the evaluative ex-
periments, plus the OpenKnowledge kernel system, can be
obtained at www.openk.org.
Discovery:OpenKnowledgereliesonkeywordmatching
(via distributed hash tables) for the discovery of interaction
models so we were concerned to evaluate whether this sort
of mechanism is likely to be effective when interactions
are choreographing large populations of Web services. Large
collections of interaction models do not yet exist (since this
is new technology) but Web service technology is suf ciently
mature that there do exist large repositories of Web service
interface speci cations (harvested from the Web). Since a
choreographed collection of services is analogous, itself, to
aservicethentheserepositoriesgivetheclosestapproxima-
tion currently available to a naturally occurring population.
Researchers in Amsterdam performed a large scale evaluation
of the OpenKnowledge discovery service operating real-time
on a large number of WSDL les [3]. The WSDL corpus
(believed to be the largest such corpus in existence at the
time) contains 54,245 unique WSDL documents that were
obtained during a focused crawl performed in March 2008. 400
instances of the system were run simultaneously, each pub-
lishing numerous descriptors so as to give a combined system
publishing throughput of one description per millisecond. To
bring an element of peer unavailability into the experiments,
aproportionofrandomlychosenpeersweremadetofail,
without notifying other peers. Each live node made a series
of queries and collected the results, so as to give a combined
system query throughput of one query per millisecond. Our
approach showed signi cant performance gain, compared to
two reference approaches: it either maintained very high recall
with four times less messages or used the same low number
of messages for a 29% gain in recall.
Ontology matching:Althoughmanychoreographiesmay
require no ontology matching (because we want to use in-
teraction models analogously to traditional work ow lan-
guages) some interactions can be made more accessible to
abroaderrangeofusersbyalowingontologymatching.
The OpenKnowledge kernel therefore allows extensions to
perform ontology mapping, allowing different mapping algo-
rithms to be accessed via the kernel as either OpenKnowledge
components (if the mapping implementations are available as
software components that can be shared) or as services (if
the mapping implementations are hosted remotely). We then
exploit OpenKnowledge s use of interaction models to allow
different forms of dynamic ontology matching at runtime.
These different forms are are: semantic matching of structures
within an interaction model; prediction of word use based
on statistical analysis of interactions; and ontology mapping
via alignment between interaction models. To evaluate these,
the OpenKnowledge matching component was applied to two
selected test sets: real-world ontologies (we used different ver-
sions of the Standard Upper Merged Ontology, SUMO, and the
Advanced Knowledge Technologies, AKT, ontologies); and a
large-scale synthesized set obtained from real world GIS web
services [32]. The results con rm the robustness of the OK
matching component (in terms of standard relevance metrics:
precision, recall and f-measure) against variation of internal
parameters (i.e. thresholds) and probability of alterations. The
OpenKnowledge matching plug-in (integrated in the Open-
Knowledge kernel) is capable of reproducing results similar to
state-of-the-art baseline matchers for syntactic variation while
outperforming it when semantic variations are applied
Reputation:Thesimplestmeasureofreputationinservice
JOURNAL OF SOFTWARE, VOL. 4, NO. 8, OCTOBER 2009 839
© 2009 ACADEMY PUBLISHER
Page 8
hidden
choreography is (arguably) the level of successes or failures
of peers when involved in interacting with others. This is used
by the OpenKnowldge kernel to give basic statistical measures
of the reputation of a peer (with respect to its performance
in earlier interactions), provided at no cost to the users of
the system and giving a baseline mechanism for choosing
one peer over another (analogous to the use of page ranks
in choosing one page over another in the Web). Other levels
of sophistication in trust and reputation can be built on top
of this, allowing for diversity in approaches. Similarly to
the ontology mapping approach (with which this interacts)
speci c algorithms for assessment of good enough answers
and trust assessment can then be plugged in to extend the
basic mechanism. We have evaluated this mechanism in two
contrasting domains. In bioinformatics we applied it to match-
ing protein spectra returned from three of the most common
proteomics data services. Our reputation mechanism allowed
us to rank automatically these services appropriately in terms
of the quality of data provided [25]. In emergency response,
researchers at Trento applied the mechanism to the trust over
peers in the evacuation phase of the simulated emergency,
using GIS data from the Trentino region of Italy. Simulations
using a centralised plan for response were compared to the
same system in which decision making was decentralised
(hence more robust to failure of individual components) [31].
It was possible, using our peer ranking system to detect
failing peers, to maintain levels of response comparable to
acentralisedsysteminthepresenceofsigni cantlevelsof
disruption in the peer network (simulated by randomly failing
aproportionofpeers).
The evaluations summarised above do not, of course, predict
how effective our system will be in some speci c, new
application. Evaluation of that requires speci c testing with the
demands of that application in mind. It does, however, show
that the central elements of this style of open choreography
have desirable scaling properties that skilled engineers might
harness to adapt the general framework to speci c applications
-forexamplethehealthcaredomainofthispaper.
VI. RELATED WORK
The majority of work ow research has focused on designing
languages for and implementing service orchestrations, from
the view of a single participant. Examples can be seen in the
Business Process Modelling community through BPEL [30]
and life sciences community through Taverna [23]. However,
there are relatively few languages targeted speci cally at
service choreography, the most widely known are:
• WS-CDL: The Web Services Choreography Description
Language (WS-CDL) is the proposed standard for service
choreography, currently at the W3C Candidate Recommen-
dation stage. However, WS-CDL has met criticism [7], [13]
through the Web services community. It is not within the scope
of this paper to provide a detailed analysis of the constructs
of WS-CDL, this research has already been presented [16].
However, it is useful to point out the key criticisms with the
language: WS-CDL choreographies are tightly bound to spe-
ci c WSDL interfaces, WS-CDL has no multi-party support,
no agreed formal foundation, no explicit graphical support and
few or incomplete implementations.
• Let s Dance [34]: is a language that supports service
interaction modelling both from a global and local viewpoint.
In a global (or choreography) model, interactions are described
from the viewpoint of an ideal observer who oversees all
interactions between a set of services. Local models, on the
other hand focus on the perspective of a particular service,
capturing only those interactions that directly involve it. Using
Let s Dance, a choreography consists of a set of interrelated
service interactions which correspond to message exchanges.
Communication is performed by an actor playing a role. Inter-
action is speci ed using one of three Let s Dance constructs:
precedes —thesourceinteractioncanonlyoccurafterthetarget
interaction has occurred; inhibits —denotesthatafterthesource
interaction has occurred, the target interaction can no longer
occur, and nally weak precedes —denotesthatthetarget
interaction can only occur after the source interaction has
reached a nal status, e.g. completed or skipped. A complete
overview of the Let s Dance language is presented in [34],
including solutions to the Service Interaction Patterns [8].
• BPEL4Chor [12]: is a proposal for adding an additional
layer to BPEL to shift its emphasis from an orchestration
language to a complete choreography language. BPEL4Chor is
asimple,collectionofthreeartifacttypes:participant behavior
descriptions de ne the control ow dependencies between
activities, in particular between communication activities, at
agivenparticipant.Aparticipant topology describes the
structural aspects of a choreography by specifying participant
types, participant references and message links; this serves as
the glue between the participant behavior descriptions. Finally
participant groundings de ne the technical con guration de-
tails, the choreography becomes Web service speci c, concrete
links to WSDL de nitions and XSD types are established.
BPEL4Chor is an effective proposal and importantly conforms
to standards [4] by enhancing the industrially supported BPEL
speci cation. BPEL4Chor encourages reuse by only providing
aspeci cWebservicemappingintheparticipantgrounding.
Furthermore, unknown numbers of participants can be mod-
elled, not possible with WS-CDL.
There are several proposals for extending the Business
Process Modelling Notation [1]; the de-facto standard for
business process modelling. Although the BPMN allows an
engineer to de ne choreographies through a swimlane concept
and a distinction between control ow and message ow, it
only provides direct support for a limited set of the Service
Interaction Patterns and not some of the more advanced
choreography scenarios. [10] introduces a set of extensions
for BPMN which facilitate an interaction modelling approach
as opposed to modelling interconnected interface behaviour
models. Authors claim that choreography designers can un-
derstand models more effectively, introduce less errors and
build models more ef ciently. Evaluation concludes that the
majority of the Service Interaction Patterns can be expressed
with the additional extensions. [14] discusses the de ciencies
of the BPMN for choreography modelling and proposes a
number of direct extensions for the BPMN which overcome
these limitations.
840 JOURNAL OF SOFTWARE, VOL. 4, NO. 8, OCTOBER 2009
© 2009 ACADEMY PUBLISHER
Page 9
hidden
Hybrid orchestration/choreography approaches exist, in par-
ticular the Circulate architecture [6] maintains the robust-
ness and simplicity of centralised orchestration, but facilitates
choreography by allowing services to exchange data directly
with one another. Performance analysis [5] concludes that a
substantial reduction in communication overhead results in a
2—4 fold performance bene t across all work ow patterns.
VII. CONCLUSION AND FUTURE WORK
In this paper we have argued that a distributed,
choreography-based paradigm has a number of practical ben-
e ts when applied to design and implementation of complex
systems, such as distributed clinical work ows. The chore-
ography paradigm provides a clean approach to issues of
portability of work ow speci cation (allowing a high-level
guideline to be implemented at different sites which are likely
to have different local procedures for carrying out tasks within
the guideline), re-use of process speci cations (allowing site-
speci c implementation detail tobere-usedwithindifferent
high-level process speci cations), and abstraction away from
the speci c entities providing services, which might change
both between sites and between runs of the process.
OpenKnowledge provides an operational framework for
quickly setting up such systems. In OpenKnowledge systems
are composed around interaction models that coordinate the
peers behaviours by specifying the roles they can take, the
exchange of messages between the roles and the constraints
of the messages. Peers participateintheinteractiontakingone
(or more) roles: in order to participate they need to compare
the constraint in the roles with their available services and
subscribe to the interaction on a distributed discovery service,
that initiates interactions when their roles are lled.
In the current version of OpenKnowledge, constraints in the
choreography are semantically matched with the capabilities
of the peers. However, while feasible using the annotations,
there is still no support for specifying requirements on how
the constraints should be solved, or on what requirements
the participants should have (for example, the peer taking the
doctor role should be certi ed by the competent institution).
Improving the speci cations can help the peers both in se-
lecting the proper interactions for their goals and in matching
their capabilities with those required by the choreography.
VIII. ACKNOWLEDGEMENTS
This research was partially funded by the European Com-
mission within the 6th Framework project OpenKnowledge
(number 027253), by Cancer Research UK and by EPSRC
grant EP/F057326/1 (the Safe and Sound
3
project).
REFERENCES
[1] Business Process Modeling Notation (BPMN) Speci cation.
Technical report, Object Management Group (OMG),
2006, http://www.omg.org/spec/BPMN/1.1/PDF
[16/12/2008].
[2] Web services business process execution language version 2.0,
April 2007.
3
http://www.clinicalfutures.org.uk/
[3] G. Anadiotis, S. Kotoulas, H. Lausen, and R. Siebes. Massively
scalable web service discovery. In Proceedings of the 23rd
Intl. Conference on Advanced Information Networking and
Applications (AINA 09),2009.
[4] A. Barker and J. van Hemert. Scienti c Work ow: A Survey
and Research Directions. In R. Wyrzykowski and et al., editors,
Seventh International Conference on Parallel Processing and
Applied Mathematics, Revised Selected Papers,volume4967
of LNCS,pages746—753.Springer,2008.
[5] A. Barker, J. B. Weissman, and J. van Hemert. Eliminating the
Middle Man: Peer-to-Peer Data ow. In HPDC 08: Proceedings
of the 17th International Symposium on High Performance
Distributed Computing,pages55—64.ACM,June2008.
[6] A. Barker, J. B. Weissman, and J. van Hemert. Orchestrating
Data-Centric Work ows. In The 8th IEEE International Sym-
posium on Cluster Computing and the Grid (CCGrid),pages
210—217. IEEE Computer Society, May 2008.
[7] A. Barros, M. Dumas, and P. Oaks. A Critical Overview of the
Web Services Choreography Descript ion Language (WS-CDL).
BPTrends Newsletter 3, March 2005.
[8] A. Barros, M. Dumas, and A. ter Hofstede. Service Interaction
Patterns. In Proceedings of the 3rd International Conference on
Business Process Management,pages302—318.Springer,2005.
[9] P. Besana and D. Robertson. How service choreography
statistics reduce the ontology mapping problem. In ISWC2007,
2007.
[10] G. Decker and A. Barros. Interaction Modeling Using BPMN.
In Proceedings of the 1st International Workshop on Collabo-
rative Business Processes (CBP),volume4928ofLNCS,pages
206—217. Springer, 2007.
[11] G. Decker, O. Kopp, F. Leymann, and M. Weske. BPEL4Chor:
extending BPEL for modeling choreographies. In I. C. Society,
editor, Proceedings of the IEEE 2007 International Conference
on Web Services (ICWS 2007), Salt Lake City, Utah, USA, July
2007,pages296—303,SaltLakeCity,Juli2007.IEEEComputer
Society.
[12] G. Decker, O. Kopp, F. Leymann, and M. Weske. BPEL4Chor:
Extending BPEL for Modeling Choreographies. In Proceedings
of the IEEE 2007 International Conference on Web Services
(ICWS 2007),pages296—303,2007.
[13] G. Decker, H. Overdick, and J. M. Zaha. On the Suitability
of WS-CDL for Choreography Modeling. In Proceedings
of Methoden, Konzepte und Technologien f r die Entwicklung
von dienstebasierten Informationssystemen (EMISA), LNI P-95,
Hamburg, Germany,pages21—34,October2006.
[14] G. Decker and F. Puhlmann. Extending BPMN for Mod-
eling Complex Choreographies. In Proceedings of the 15th
International Conference on Cooperative Information Systems
(CoopIS),volume4803ofLNCS,pages24—40.Springer,2007.
[15] G. F., Y. M., and M. F. Structure preserving semantic matching.
In Proceedings of the ISWC+ASWC International workshop on
Ontology Matching (OM),Busan(KR),2007.
[16] L. Fredlund. Implementing WS-CDL. In Proceedings of the
second Spanish workshop on Web Technologies (JSWEB 2006),
Universidade de Santiago de Compostela, November 2006.
[17] A. Garg, N. Adhikari, H. McDonald, P. Rosas-Arellano, P. De-
vereaux, J. Beyene, J. Sam, and R. Haynes. Effects of
computerised clinical decision support systems on practictioner
performance and patient outcome: a systematic review. JAMA,
293:1223—1238, 2005.
[18] F. Giunchiglia, F. McNeill, M. Yatskevich, J. Pane, P. Besana,
and P. Shvaiko. Approximate structure preserving semantic
matching. In On the Move to Meaningful Internet Systems:
OTM 2008,pages1217—1234,2008.
[19] J. Grimshaw and I. Russel. Effect of clinical guidelines on
medical practice: A systematic review of rigorous evaluations.
Lancet,342:1317—1322,1993.
[20] D. Hull, K. Wolstencroft, R. Stevens, C. Goble, M. R. Pocock,
P. L i , and T. O inn . Tave rna : a t oo l f o r bu i l d i ng and runn i ng
JOURNAL OF SOFTWARE, VOL. 4, NO. 8, OCTOBER 2009 841
© 2009 ACADEMY PUBLISHER
Page 10
hidden
work ows of services. Nucleic Acids Res,34(WebServer
issue):W729—W732, Jul 2006.
[21] N. Kavantzas, D. Burdett, G. Ritzinge, T. Fletcher, Y. Lafon,
and C. Barreto. Web services choreography description lan-
guage version 1.0. http://www.w3.org/TR/2005/CR-ws-cdl-10-
20051109/, November 2005.
[22] S. Kotoulas and R. Siebes. Deliverable 2.2: Adaptive routing in
structured peer-to-peer overlays. Technical report, OpenKnowl-
edge.
[23] T. Oinn, M. Addis, J. Ferris, D. Marvin, M. Senger, M. Green-
wood, T. Carver, K. Glover, M. R. Pocock, A. Wipat, and
P. L i . Tave rna : a t oo l f o r t he compos i t i on and enac tmen t o f
bioinformatics work ows. Bioinformatics,20:3045—3054,2004.
[24] V. Patkar, C. Hurt, R. Steele, A. Purushotham, M. Williams,
R. Thomson, and J. Fox. Evidence-based guidelines and
decision support services: a discussion and evaluation in triple
assessment of suspected breast cancer. British Journal of
Cancer,95:1490—1496,2006.
[25] A. Perreau de Pinninck, C. Sierra, C. Walton, D. De la Cruz,
D. Robertson, D. Gerloff, E. Jaen, Q. Li, J. Sharman, J. Abian,
M. Schorlemmer, P. Besana, S. Leung, and X. Quan. D6.4:
Summative report on bioinformatics case studies. Technical
report, University of Edinburgh, 2008.
[26] D. Robertson. Multi-agent coordination as distributed logic pro-
gramming. In International Conference on Logic Programming,
Sant-Malo, France, 2004.
[27] D. Robertson, C. Walton, A. Barker, P. Besana, Y. Chen-
Burger, F. Hassan, D. Lambert, G. Li, J. McGinnis, N. Osman,
A. Bundy, F. McNeill, F. van Harmelen, C. Sierra, and G. F.
Models of interaction as a grounding for peer to peer knowledge
sharing. Advances in Web Semantics I: Ontologies, Web Services
and Applied Semantic Web,4891/2009:81—129,2009.
[28] R. Siebes, D. Dupplaw, S. Kotoulas, A. P. de Pinninck, F. van
Harmelen, and D. Robertson. The openknowledge system:
an interaction-centered approach to knowledge sharing. In
Proceedings of the 15th Intl. Conference on Cooperative In-
formation Systems (CoopIS),2007.
[29] R. Sutton and J. Fox. The syntax and semantics of the proforma
guideline modeling language. Journal of the American Medical
Informatics Association,10(5):433—443,Sep/Oct2003.
[30] The OASIS Committee. Web Services Business Process Exe-
cution Language (WS-BPEL) Version 2.0, 2007.
[31] G. Trecarichi, V. Rizzi, L. Vaccari, M. Marchese, and P. Besana.
Openknowledge at work: exploring centralized and decentral-
ized information gathering in emergency contexts. In ISCRAM
2009,2009.
[32] L. Vaccari, P. Shvaiko, and M. Marchese. An emergent seman-
tics approach to semantic integration of geo-services and geo-
metadata in spatial datainfrastructures. International Journal
of Spatial Data Infrastructured Research,4:24—51,2009.
[33] W. var der Aalst, L. Aldred, M. Dumas, and ter Hofst-
ede A.H.M. Design and implementation of the yawl system. In
Proceedings of the 16th International conference on Advanced
Information Systems Engineering (CAISE 2004),2004.
[34] J. M. Zaha, A. Barros, M. Dumas, and A. ter Hofstede.
Let s Dance: A Language for Service Behavior Modelling. In
R. Meersman and T. Z, editors, OTM Conferences (1),volume
4275 of LNCS,pages145—162.SpringerVerlag,2006.
Paolo Besana is a post doctoral researcher at the University
of Edinburgh. He is currently working on Safe and Sound
project funded by EPSRC and in collaboration with Cancer
Research UK, aiming at creating a distributed approach to
medical decision support systems. Previously he worked in
the EU-funded OpenKnowledge project, whose target was
the dynamic integration of the components and services in
open, peer-to-peer systems. He holds a PhD in Informatics
from the University of Edinburgh and a degree equivalent
to an BSc+MSc in Telecommunication Engineering from the
Politecnico of Milano. Before the PhD, he worked as software
engineer for 5 years.
Vivek Patkar is a clinical research fellow at department
of academic oncology at Royal Free Hospital. His research
interests include evidence-basedmedicaldecisionmakingand
clinical decision support systems. He received his master s
degree in surgery from Bombay University.
Adam Barker is a postdoctoral Research Assistant in the
Department of Engineering Science, University of Oxford.
Prior to joining Oxford, he was employed as at the National
e-Science Centre (NeSC), in Edinburgh. Adam holds a PhD
in Informatics from the University of Edinburgh, MSc in
Distributed Systems and BSc in Software Engineering from
Newcastle University. Complementing his academic experi-
ence, Adam has completed internships at Hewlett-Packard and
BAe Systems. For further information and list of publications
please refer to http://www.adambarker.org
David Glasspool is a senior research fellow in the School
of Informatics at the University of Edinburgh. His research
interests include executive control, routine behaviour, and
planning in the face of uncertainty, in both computational
systems and human cognitive psychology. He has applied this
theoretical work in the development of clinical computing sys-
tems. He received his PhD in computational modelling from
the psychology department of University College London.
David Robertson is the Director of the Centre for In-
telligent Systems and their Applications, part of the School
of Informatics at the University of Edinburgh. His current
research is on formal methods for coordination and knowl-
edge sharing in distributed, open systems - the long term
goal being to develop theories, languages and tools that
out-perform conventional software engineering approaches in
these arenas. He was coordinator of the OpenKnowledge
project (www.openk.org) and was a principal investigator on
the Advanced Knowledge Technologies research consortium
(www.aktors.org), which are major EU and UK projects in
this area. His earlier work was primarily on program synthesis
and on the high level speci cation of programs, where he built
some of the earliest systems for automating the construction
of large programs from domain-speci c requirements. He has
contributed to the methodology of the eld by developing the
use of "lightweight" formal methods - traditional formal meth-
ods made much simpler to use in an engineering context by
tailoring them to a speci c type of task. As an undergraduate
he trained as a biologist and continues to prefer biology-related
applications of his research, although methods from his group
have been applied to other areas such as astronomy, simulation
of consumer behaviour and emergency response.
842 JOURNAL OF SOFTWARE, VOL. 4, NO. 8, OCTOBER 2009
© 2009 ACADEMY PUBLISHER

Sign up today - FREE

Mendeley saves you time finding and organizing research. Learn more

  • All your research in one place
  • Add and import papers easily
  • Access it anywhere, anytime

Start using Mendeley in seconds!

Already have an account? Sign in

Readership Statistics

7 Readers on Mendeley
by Discipline
 
 
by Academic Status
 
29% Student (Master)
 
14% Post Doc
 
14% Ph.D. Student
by Country
 
43% United Kingdom
 
29% Brazil
 
14% Australia

Groups

Informatics