
An ActOn-based semantic information service for EGEE

by W Xing, O Corcho, C Goble, M D Dikaiakos
2007 8th IEEE/ACM International Conference on Grid Computing (2007)

Abstract

We describe an information service that aggregates metadata available in hundreds of information sources of the EGEE Grid infrastructure. It uses an ontology-based information integration architecture (ActOn), which is suitable for the highly dynamic distributed information sources available in Grid systems, where information changes frequently and where the information of distributed sources has to be aggregated in order to solve complex queries. These two challenges are addressed by a metadata cache that works with an update-on-demand policy and by an information source selection module that selects the most suitable source at a given point in time, respectively. We have evaluated the quality of this information service, and compared it with other similar services from the EGEE production testbed, with promising results.


Wei Xing #1, Oscar Corcho #2, Carole Goble #3, Marios D. Dikaiakos ∗4
# School of Computer Science, University of Manchester
Oxford Road, Manchester, UK
1 wxing@cs.man.ac.uk
2 ocorcho@cs.man.ac.uk
3 carole.goble@cs.man.ac.uk
∗ Department of Computer Science
University of Cyprus, CYPRUS
4 mdd@cs.ucy.ac.cy
8th Grid Computing Conference. 1-4244-1560-8/07/$25.00 © 2007 IEEE
I. INTRODUCTION AND MOTIVATION
EGEE [1] provides a production-quality grid infrastructure spanning more than 30 countries with over 160 sites to a myriad of applications from various scientific domains, including Earth Sciences, High Energy Physics, Bioinformatics and Astrophysics. In such a large-scale Grid system, there are thousands of heterogeneous, loosely coupled resources, services, and applications, which are geographically distributed over a wide area. The current EGEE production testbed includes over 200 sites, 35,000 CPUs, 13 Petabytes of storage space in hundreds of storage elements, and an average of 40,000 concurrent jobs per day on behalf of 100 Virtual Organisations (VOs).
Having information about those heterogeneous entities is critical for the EGEE gLite middleware [2]. This information is used for tasks such as resource discovery, workflow orchestration, meta-scheduling, and security. Such information is normally aggregated and provided by information services, which can be defined as “databases of attribute metadata about resources” [3]¹. Examples of information services are BDII [4] and MDS [5], focused on hardware and software resources; and RGMA [6], focused on jobs, services and running environments.
¹ In the rest of the paper, we will use the terms information and metadata interchangeably.
The main limitations of existing information services are that they do not provide enough information about large-scale distributed systems like EGEE, since they only focus on a few specific aspects of such systems, and that they do not always provide accurate information about the actual status of the Grid resources that they refer to.
To overcome these two limitations, we propose the creation of an information service that aggregates information from different information services in the EGEE production testbed, using an ontology-based information integration architecture [7]. The aggregation of distributed information poses the following challenges, due to the dynamic and heterogeneous nature of Grids:
• Metadata of a Grid entity consists of multiple attributes, whose values can normally be obtained from heterogeneous and geographically-distributed information sources. In a large-scale Grid system, several information sources can provide the same piece of information about a resource, and it may be difficult to identify and locate the most suitable (and available) information source for a specific information need.
• Metadata about most Grid entities may be updated frequently, so as to reflect the current status (capability and availability) of the services and resources that it refers to. This makes it hard to create and maintain up-to-date metadata about all the resources available in a Grid. For instance, the usage level of a CPU, storage space, and network connection may change every few minutes.
• Different information sources or services may provide overlapping views of the Grid state, in different schemas and formats, and with different characteristics of their information provenance (update frequency, quality-related).
These challenges are addressed in our ActOn-based information service. ActOn (Active Ontology) [8] is an ontology-based information integration approach that can be used to generate and maintain up-to-date metadata for a dynamic, large-scale distributed system.
First, ActOn uses ontologies to describe the domain for which information will be aggregated. This provides an expressive model to describe that information, which can be exploited with query languages and used for validation purposes
(e.g., to detect inconsistencies in the aggregated information)
and for deriving new information. It also provides an extensi-
ble data model where changes in the descriptions of resources
and services, or in the information sources (update frequency,
information quality, etc.) are automatically reflected in the
behaviour of the system.
Second, ActOn incorporates two modules that are not
commonly found in other ontology-based information inte-
gration architectures: a cache, which provides fast access to
information that has been already integrated and materialised
and which is still valid, and an information source selector,
which is used during the generation of the execution plan for
retrieving information from the information sources and allows
the system to adapt to changing conditions of the infrastructure
and to add new information services easily.
The remainder of this paper is organised as follows. Section 2 introduces related work, namely existing Grid information services. Section 3 presents the architecture of ActOn, focusing on its different knowledge and software components and on the main interactions between them, and describes how each of them is instantiated for the implementation of our EGEE Grid information service. Section 4 gives the results of our evaluation of the information quality of our approach, and compares them with two other EGEE Grid information services. Finally, Section 5 provides conclusions, and describes open issues and our planned future work.
II. RELATED WORK: GRID INFORMATION SERVICES
Currently, there are several well-known and widely-used Grid information services: Monitoring and Discovery System (MDS), Berkeley DB Information Index (BDII), and RGMA [5], [4], [6]. These services are deployed in most Grid systems, such as European Data Grid, Crossgrid, NASA Grid, and Open Science Grid [9], [10], [1], [11], [12], and are widely used by Grid middleware and applications running on them.
MDS [5] is the information service component of the Globus platform. In MDS2.x, information about Grid resources is extracted by “information providers”, which are software programs that collect and organise information from individual Grid entities, either by executing local operations or by contacting third-party information sources (e.g., the Network Weather Service, SNMP, etc.). Extracted information is organised according to the LDAP (Lightweight Directory Access Protocol) data model in LDIF format and uploaded into LDAP-based servers of the Grid Resource Information Service (GRIS). GRIS servers can register themselves in the Grid Index Information Services (GIIS) in order to aggregate directories, using a soft-state registration protocol called Grid Registration Protocol (GRRP). One of the disadvantages of MDS is that it is based on the LDAP data model, which is too rigid to be adapted to, or to represent, the heterogeneous information in Grids. It also lacks the ability to support complex queries.
BDII [4] is an improvement of MDS [5], designed to improve its query performance. It uses the MDS information model and access API and caches information with the Berkeley DB. An update process is used to populate LDAP-based servers. It consists in obtaining LDIF, either by doing an ldapsearch on LDAP URLs or by running a local script that generates LDIF. Then the LDIF is inserted into the LDAP database. BDII has the same problems as MDS for information expression and query.
RGMA [6] is a framework that combines monitoring and information services based on a relational model, which is implemented with XML. It implements the Grid Monitoring Architecture (GMA) proposed by the Open Grid Forum. GMA models the information infrastructure of the Grid using three core types of components: (i) producers, which provide information; (ii) consumers, which request information; and (iii) a single registry, which mediates the communication between producers and consumers. RGMA implements two additional properties over GMA. First, consumers and producers handle the registry in a transparent way; thus, anyone using RGMA to supply or receive information does not need to know about the registry. Second, all the information appears as one large relational database and can be queried as such (although, in the current implementation, the database is centralised). RGMA can be accessed using the RGMA API. The main drawback of RGMA is that it cannot easily manage the dynamic information about time-sensitive Grid resources, due to its architecture, which comprises a central registry and distributed servlet-based information producers.
III. ACTIVE ONTOLOGY (ACTON) AND THE EGEE INFORMATION SERVICE
ActOn (Active Ontology) [8] is an ontology-based information integration approach that can be used to generate and maintain up-to-date metadata for a dynamic, large-scale distributed system. In this section we will describe the main characteristics of this approach and its architecture, and will use as a running example the details of the EGEE information service that we have built with this approach.
The development of ActOn was based on a list of requirements that are based on the actual information integration needs that were identified in dynamic, distributed systems like the EGEE Grid, Crossgrid, and Unicore [1], [10], [13].
• We need to deal with frequent changes of parts of the metadata, caused by the dynamic features of the entities of a large-scale distributed system.
• We need to have an efficient and economic way to avoid a continuous metadata update process, which is expensive for a large-scale distributed system.
• We need to be able to select the most suitable information source from a set of geographically-distributed and heterogeneous ones, which provide overlapping pieces of information, in different formats, and which can be available or unavailable at a given point in time.
• We need to create/update the metadata that captures only those aspects that we are interested in.
Although these requirements arise in the context of developing an aggregated information service for the EGEE Grid infrastructure, similar requirements can also be found in other application domains (e.g., the stock market, currency exchange, etc.). Therefore, ActOn provides a generic solution that can be easily adapted to different application domains.
Fig. 1. Overview of the Active Ontology architecture [8]. Components shown: Metadata Scheduler, Information Source Selector, Mediator, Wrapper, Metadata Cache, Domain Ontology, InfoSource Ontology, Events, and the distributed information sources (BDII, RGMA, DGAS).
ActOn is comprised of a set of knowledge components,
which represent knowledge from the application domain and
from the information sources; and software components, such
as a metadata scheduler (MSch), an information source se-
lector (ISS), a metadata cache (MC), and a set of informa-
tion wrappers. Figure 1 shows how these components are
interrelated and how they are related to the corresponding
information sources where data is taken from.
The EGEE information service that has been developed
using ActOn uses Globus Toolkit 4 (GT4) [14] and the S-
OGSA Semantic Binding Service [15]. The latter is used
to bind semantic metadata with the ontologies it refers to
and with the resources that the metadata describes, so that
metadata can be managed as a resource, with its own lifetime,
authorisation policies, etc. All the source code of ActOn and
of the information service that we have described is available
under an Open Source license at the OntoGrid CVS [16].
A. ActOn Knowledge Components
The knowledge components used in ActOn include a (set of) domain ontology(ies) and an ontology of the information sources. Domain ontologies describe the metadata information model in the form of domain concepts and properties for which instances will be generated, and restrictions about them. In our service these are resources, components, services, and applications of the EGEE Grid. The Information Source Ontology provides information about the characteristics of information sources, which are used for the information source selection process. In our service they describe information services deployed in EGEE. The two ontologies are related by means of mappings that specify which domain concepts and which of their properties can be generated by which information sources, as we will explain later.
1) Grid Domain Ontologies: These domain ontologies define the global information model used to represent metadata, hence they are completely application dependent. ActOn does not put any constraint on the language to be used to implement these ontologies, although in our current implementation we assume that ontologies are described either in RDF Schema [17] or OWL [18].
We have created OWL ontologies that describe Grid entities, resources, capabilities and the relationships among them. These ontologies are based on the one described in [19] and extend the Grid ontology described in [20], which include descriptions about virtual organisations, users, applications, middleware services, computing and storage resources, networks, and usage policies. Besides the core Grid ontology, we have different ontologies for each specific Grid infrastructure. For example, the EGEE Grid Ontology describes the EGEE infrastructure and its entities, including concepts like Computing Element, Storage Element, User Interface, Worker Node, Resource Broker, Logging and Bookkeeping Service, and Site.
2) Information Source Ontology: This ontology assists in locating suitable information sources for a specific information need. It describes the features of the information sources to be used by the system and is divided into a domain-independent part, with five classes and forty properties, and a domain-specific part that contains descriptions of the types of information sources that can be used in an application, as well as specific instances of those classes.
The most important class in the domain-independent part of the ontology is InformationSource, which is described with four properties:
(i) accessAPI: it defines the information model and the information access methods to be used. For instance, the information model of BDII is LDAP, and its accessAPI can be “ldapsearch” in C and “JNDI” in Java;
(ii) accessPoint: it defines the server and port names to be used to obtain the information from. For instance, the CERN BDII server can be described as “ldap://prod-bdii.cern.ch:2170”;
(iii) belongToMiddleware: it specifies the middleware infrastructure (e.g., EGEE) where the information service is available, since depending on the middleware type and release being used the information access methods will be different;
(iv) withSchema: it indicates the kind of information that an information source provides. For instance, the EGEE BDII servers use the Glue Schema.
The domain-dependent part for our service contains descriptions of the following four main EGEE information providers: BDII (with the class BDIIIP being used to represent distributed BDII servers), RGMA, GridICE, and Unix-scripts. All of them are subclasses of the class InformationSource. Besides, we have defined 36 instances of BDIIIP, 10 instances of RGMA, 5 of GridICE, and 10 of Unix-script.
Fig. 2. Graphical overview of the association between domain and information source ontologies. Elements shown: Domain Ontology (EGEE) with egee:CE, egee:Service, egee:hasName, egee:runningService and egee:freeCPUs; ActOn Information Service Ontology with acton:HouseKeeping, acton:hasMapping, acton:propertyName, acton:lifeTime, acton:timeStamp and acton:generatedBy; Information Source Ontology with info:InformationSource, info:accessPoint and info:withSchema.
An example of the information contained in one of the
BDIIIP instances is:
* server name: ldap://prod-bdii.cern.ch
* server port: 2170
* access API: BDIIRet.class
* information schema: glueschema
* grid middleware: gLite middleware
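As a minimal sketch, the BDIIIP instance listed above can be written as a plain record that carries the four InformationSource properties. The Python class, its field names and the `host_and_port` helper are illustrative assumptions that mirror the ontology property names; they are not taken from the ActOn source code.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class InformationSource:
    """Hypothetical record mirroring the four InformationSource properties."""
    access_api: str            # accessAPI, e.g. the class used to query the source
    access_point: str          # accessPoint, server URL including the port
    belong_to_middleware: str  # belongToMiddleware
    with_schema: str           # withSchema

    def host_and_port(self):
        # Split the access point into the server name and port listed above.
        parts = urlsplit(self.access_point)
        return parts.hostname, parts.port


# Values copied from the example instance; the server name and port are
# combined into one access point, as in the accessPoint property.
cern_bdii = InformationSource(
    access_api="BDIIRet.class",
    access_point="ldap://prod-bdii.cern.ch:2170",
    belong_to_middleware="gLite middleware",
    with_schema="glueschema",
)
```

In the service itself these values live as ontology instances rather than program objects; the record form only makes the shape of one instance explicit.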
As shown in Figure 2, the association between domain and information source ontologies is expressed by means of housekeeping mappings. Each domain ontology class or property is connected to the HouseKeeping class with the property hasMapping². The property generatedBy represents the means to be used to extract information from the source and transform it into the domain ontology(ies) components. This is expressed with the class InformationSource. Each of the mappings specifies, as well, the timestamp and lifetime of the information retrieved from the information sources. This information is used by the Metadata Scheduler to control the Metadata Cache, as explained later.
² When using OWL to implement the ontologies, we use an OWL annotation property so as not to interfere with the domain and information source knowledge representation.
B. ActOn Software Components
We will now describe the software components that comprise the ActOn architecture, as shown in Figure 1.
1) Metadata Scheduler (MSch): It is designed to apply an update-on-demand policy to cache metadata. That is, the cached metadata is not updated until it is found to be stale when being queried, so as to avoid unnecessary updates. We adopt event-driven mechanisms to cope with that policy. We have defined three types of events that can trigger the update process, though we have only implemented the first one in our service. They are:
(i) Application-specific events. They are application-based lifetime control events. The MSch can force an update process based on specific application requirements. For instance, an external application may require a specific piece of metadata to be updated at a given point in time.
(ii) Query events. They are raised when metadata is being queried. As we will show below, if the metadata being queried is available in the metadata cache and valid, the information sources are not contacted. If not, then we contact them to get fresh metadata³.
(iii) System-related events. They can cause changes of the Grid entities that the metadata refers to. A typical example is a job-finished event, which can cause the change of the value of the runningJob property of an instance of the class JobQueue.
³ In the case that the latency is bigger than the update time of the information source, this will still provide out-of-date metadata, but in the rest of cases data will always be up-to-date.
The MSch acts upon receiving events. When the metadata scheduler receives a query event that involves retrieving metadata that has never been retrieved before or that is not valid since its expiry time has passed, or when it receives any of the other types of events, the metadata scheduler follows three steps: 1) it contacts the Information Source Selector to select the most suitable information source from which to obtain the metadata; 2) it retrieves the metadata from the selected sources, using the corresponding wrappers; and 3) it updates the metadata cache, assigns a time-stamp to the retrieved information and sends back the results to the requester.
An example can illustrate a typical MSch workflow. When a query event is triggered that requests metadata for the Computing Element ce101.cern.ch, the MSch will first check the time-stamp of its associated metadata, which is stored by the Metadata Cache, and compare it with its lifetime. If it is valid, then it will just give back the results. If it is out of date, then it will invoke the Information Source Selector service to select a suitable information source (i.e., one EGEE region or site BDII server) for updating the Computing Element metadata. After getting the information about a suitable information source (for example, lxb2086.cern.ch or prod-bdii.cern.ch), it invokes the corresponding Information Wrapper service to fetch the information with an ldapsearch query, and then invokes the Metadata Cache to update (refresh) the metadata by modifying the values and time-stamp of the relevant properties. At the same time the new metadata is sent back to the metadata requestor.
Our approach has clear advantages over others that update metadata on a regular time-scale basis, such as Globus MDS and gLite BDII. These systems keep updating all their metadata every 6-8 minutes. This approach is too expensive and imprecise, particularly in large-scale distributed systems. On the one hand, there are many useless updates: a lot of updated metadata is most likely not being used (queried) in hours although it is updated every few minutes. On the other hand, some of the metadata may not be accurate in the case that the values of the metadata change more frequently than the regular update time. In fact, some of the dynamic metadata of BDII, such as freeCPU number, runningJobs or networking bandwidth, is usually incorrect as it is never updated on time.
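The update-on-demand policy for query events can be sketched as follows. This is a simplified illustration under stated assumptions, not the ActOn implementation: `MetadataScheduler`, `CachedValue`, `on_query`, and the `select_source` and `fetch` callables (standing in for the Information Source Selector and an Information Wrapper) are hypothetical names, the cache is an in-memory dictionary, and a single lifetime is applied to every entry.

```python
import time
from dataclasses import dataclass


@dataclass
class CachedValue:
    value: object
    timestamp: float  # when the value was retrieved (seconds)
    lifetime: float   # how long the value stays valid (seconds)


class MetadataScheduler:
    """Hypothetical update-on-demand scheduler over an in-memory cache."""

    def __init__(self, select_source, fetch, lifetime, clock=time.time):
        self.select_source = select_source  # stands in for the ISS
        self.fetch = fetch                  # stands in for a wrapper call
        self.lifetime = lifetime
        self.clock = clock
        self.cache = {}                     # stands in for the Metadata Cache

    def on_query(self, key):
        now = self.clock()
        entry = self.cache.get(key)
        # Valid cached metadata is returned without contacting any source.
        if entry is not None and now - entry.timestamp < entry.lifetime:
            return entry.value
        # Otherwise: 1) select a source, 2) retrieve, 3) refresh the cache.
        source = self.select_source(key)
        value = self.fetch(source, key)
        self.cache[key] = CachedValue(value, now, self.lifetime)
        return value


# Usage with a fake clock and a fake wrapper that counts its invocations.
calls = []
now = [0.0]

def fake_fetch(source, key):
    calls.append((source, key))
    return {"freeCPUs": 10 * len(calls)}

msch = MetadataScheduler(
    select_source=lambda key: "prod-bdii.cern.ch",
    fetch=fake_fetch,
    lifetime=60.0,
    clock=lambda: now[0],
)
first = msch.on_query("ce101.cern.ch")   # never retrieved: source contacted
second = msch.on_query("ce101.cern.ch")  # still valid: served from the cache
now[0] = 120.0
third = msch.on_query("ce101.cern.ch")   # lifetime exceeded: refreshed
```

Three queries trigger only two retrievals here, whereas a periodic policy would refresh the entry on every cycle whether or not anyone asked for it.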
2) Information Source Selector (ISS): The Information
Source Selector (ISS) is used to find the most suitable in-
formation source from the set of available sources, which are
described as instances of the Information Source Ontology.
Information sources can be any system (database, file, service,
etc.) that contains relevant information. In Grid systems there
are many redundant and geographically-distributed informa-
tion sources available. For example, over 20 region BDII
servers can be used to fetch information about the EGEE
Computing Elements.
The selection is based on a set of retrieval conditions, in-
cluding the actual information needed (specified as a SPARQL
query), and other aspects like the geographical proximity of
the source. For example, in our prototype we have defined
the class ComputingElement that represents EGEE com-
puting elements. This class has a property freeCPU that is
generatedBy the information source BDII.
Since in our ontology we have defined over 30 BDII servers
(as instances of the class BDIIIP), the ISS service sends a
query to select the most suitable one for fetching the needed
value. The query is done in SPARQL, and retrieves those
instances of BDIIIP that belongToMiddleware EGEE
Grid, whose schema is GlueSchema and whose version
is 3.0. Also the middleware is gLite, and the release version
3.1.5. Below is a SPARQL query for a BDIIIP instance in our
implementation:
PREFIX onG: <http://www.cs.man.ac.uk/img/ontogrid/>
SELECT ?BDIIIP
FROM <EGEEGridInfo.v0.3.owl>
WHERE { ?x onG:runningService ?BDIIIP .
        OPTIONAL { ?x onG:belongTo "EGEE" .
                   ?y onG:installedOn "gLite" .
                   ?z onG:withSchema "GlueSchema" . } }
The selected BDIIIP instances are ranked according to
their geographical proximity, quality of the service, and the
capabilities of the BDII server machine.
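The selection and ranking step can be sketched in a few lines. This is an illustration only: the `Source` record, its `distance_km` and `quality` fields, the example.org host names and the ranking order (proximity first, then quality) are assumptions made for the example, and the real service evaluates the conditions with a SPARQL query over the Information Source Ontology rather than in program code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    access_point: str
    middleware: str
    schema: str
    available: bool
    distance_km: float  # assumed proxy for geographical proximity
    quality: float      # assumed service-quality score, higher is better


def select_sources(sources, middleware, schema):
    """Keep sources matching the retrieval conditions, best candidate first."""
    matching = [
        s for s in sources
        if s.available and s.middleware == middleware and s.schema == schema
    ]
    # Rank by proximity first, then by quality (descending).
    return sorted(matching, key=lambda s: (s.distance_km, -s.quality))


# Hypothetical candidate sources.
candidates = [
    Source("ldap://bdii-a.example.org:2170", "gLite", "GlueSchema", True, 900.0, 0.9),
    Source("ldap://bdii-b.example.org:2170", "gLite", "GlueSchema", True, 50.0, 0.7),
    Source("ldap://bdii-c.example.org:2170", "gLite", "GlueSchema", False, 10.0, 0.9),
    Source("ldap://mds-d.example.org:2135", "Globus", "MDS", True, 5.0, 0.8),
]
ranked = select_sources(candidates, middleware="gLite", schema="GlueSchema")
```

The unavailable server and the one with a different schema are filtered out before ranking, so the nearest matching BDII server comes first.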
3) Information Wrappers: After an information source is selected, the Metadata Scheduler contacts the corresponding Information Wrapper in order to retrieve the relevant up-to-date information. Normally there is an Information Wrapper per type of information source accessed (that is, one for MDS, another one for BDII, etc.). We have developed four kinds of wrappers: the BDII server wrapper, the RGMA server wrapper, the GridICE wrapper, and the Unix-script wrapper.
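The one-wrapper-per-source-type arrangement might be sketched as below for the BDII case, where the source returns LDIF text. The `parse_ldif` helper, the `BDIIWrapper` class, the `WRAPPERS` registry, the output field names and the sample entry are all hypothetical; the parser is deliberately simplified and ignores LDIF line folding and base64-encoded values.

```python
def parse_ldif(text):
    """Parse simple 'attribute: value' LDIF text into a list of entries."""
    entries, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:                 # a blank line ends the current entry
            if current:
                entries.append(current)
                current = {}
            continue
        if line.startswith("#"):     # skip LDIF comment lines
            continue
        name, _, value = line.partition(":")
        current.setdefault(name.strip(), []).append(value.strip())
    if current:
        entries.append(current)
    return entries


class BDIIWrapper:
    """Hypothetical wrapper: turns LDIF text into domain-level records."""

    def to_records(self, ldif_text):
        return [
            {"ceUniqueID": e["GlueCEUniqueID"][0],
             "freeCPUs": int(e["GlueCEStateFreeCPUs"][0])}
            for e in parse_ldif(ldif_text)
            if "GlueCEUniqueID" in e and "GlueCEStateFreeCPUs" in e
        ]


# One wrapper per type of information source (only BDII is sketched here).
WRAPPERS = {"BDII": BDIIWrapper()}

sample = """\
dn: GlueCEUniqueID=ce.example.org:2119/jobmanager-pbs-biomed,mds-vo-name=local,o=grid
GlueCEUniqueID: ce.example.org:2119/jobmanager-pbs-biomed
GlueCEStateFreeCPUs: 12
"""
records = WRAPPERS["BDII"].to_records(sample)
```

Keeping the source-specific parsing inside the wrapper means the scheduler only ever sees domain-level records, whichever source was selected.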
The wrappers are used to fetch information from different information sources. First, the Information Wrapper gets information from the information source ontology about the data model of the specific source to be accessed, and about its access API and access point. Then it fetches the information from its source. For instance, a BDIIIP information source can be queried using an LDAP query based on the information from a BDII individual, such as “ldapsearch -x -H ldap://prod-bdii.cern.ch:2170 -b mds-vo-name=CERN-PROD,o=grid”. Once the query is answered, the results are transformed into instances of the concept ComputingElement of the domain ontology.
ActOn does not impose any specific technology for generating Information Wrappers. They can be generated in an ad-hoc manner, by hard-coding the access to the information source and the transformation into the application domain ontology. They can also be generated with generic wrapper-generation languages and technologies, such as WSL [21], D2R [22], R2O [23], etc.
4) Metadata Cache (MC): The Metadata Cache (MC) stores and manages the metadata obtained from the information sources, together with its timestamp and lifetime information, so that it can check whether such property values are still valid or not (e.g., lifetime control) when it receives a query event that involves them.
The metadata cache uses the domain ontologies as its information model. For instance, in our service the MC caches information about Computing Elements (CE), Storage Elements (SE), Virtual Organisations (VO), etc. As commented above, the MC uses the S-OGSA semantic binding service implementation in order to store the values together with their timestamp and lifetime, using the mappings shown in Figure 2.
IV. INFORMATION QUALITY EVALUATION AND COMPARISON
In our evaluation we want to know whether the results provided by our service conform to the expectations of the users, and how it compares with the other available services. We are interested in knowing whether all information services obtain the same results when answering the same query, given the same conditions in the EGEE production testbed. We also want to check how many of those answers are correct and how many of the existing answers are actually retrieved. To check this, we have selected two metrics, commonly used in information retrieval: precision and recall. Below we provide their definitions and the formulae used to calculate them:
Precision: The proportion of relevant information retrieved, out of all the information retrieved.
$$\text{Precision} = \frac{(\text{relevant information}) \cap (\text{retrieved information})}{\text{retrieved information}} \qquad (1)$$
Recall: The proportion of relevant information that is retrieved, out of all the relevant information available.
$$\text{Recall} = \frac{(\text{relevant information}) \cap (\text{retrieved information})}{\text{relevant information}} \qquad (2)$$
A. Experiment setup and design
We have designed a set of experiments for measuring the information quality criteria selected. Measurements are taken on a real Grid testbed, the EGEE production testbed, which, at the time of the experiments, had gLite 3.0.1 installed as its middleware. The user interfaces used to access the EGEE Grid are the UI machines at the University of Manchester⁴, United Kingdom, and at the Institute of Physics of Belgrade⁵, Serbia. To carry out the experiments and record their results, we have developed a set of Java-based client software and Unix shell scripts, available at the IST OntoGrid project CVS [16].
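The precision and recall measures defined in formulae (1) and (2) can be computed over sets of result identifiers, as in the sketch below. The function names, the example.org host names and the convention of returning 0.0 for an empty denominator are assumptions made for illustration; the sizes of the sets are used where the formulae name the sets themselves.

```python
def precision(relevant, retrieved):
    """Share of the retrieved items that are relevant, as in formula (1)."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0  # assumption: precision is 0 when nothing is retrieved
    return len(set(relevant) & retrieved) / len(retrieved)


def recall(relevant, retrieved):
    """Share of the relevant items that were retrieved, as in formula (2)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # assumption: recall is 0 when nothing is relevant
    return len(relevant & set(retrieved)) / len(relevant)


# Hypothetical Computing Element identifiers.
relevant = {"ce01.example.org", "ce02.example.org",
            "ce03.example.org", "ce04.example.org"}
retrieved = {"ce01.example.org", "ce02.example.org", "ce09.example.org"}

p = precision(relevant, retrieved)  # 2 of the 3 retrieved items are relevant
r = recall(relevant, retrieved)     # 2 of the 4 relevant items are retrieved
```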
The key aspects upon which we compare different in-
formation services are: i) the information model that each
information service adopts; and ii) the expressiveness of its
query language. In order to evaluate these two features, we
have proposed six representative queries that cover a wide
range of Grid systems, including Grid hardware resources,
software resources, middleware environment, services, appli-
cations, etc., and show increasing complexity. These queries
can be normally issued by middleware systems like schedulers,
resource brokers or by more complex applications:
• Query 1: Find all the Computing Elements (CEs) that
support the BIOMED Virtual Organisation (VO).
• Query 2: Find all the CEs that support the BIOMED VO
and have more than 100 CPUs available.
• Query 3: Find all the CEs that support the MPI running
environment.
• Query 4: Find all the CEs that support the BIOMED VO,
have more than 100 CPUs available, and support the MPI
running environment.
• Query 5: Find all the CEs where GATE (Geant4 Appli-
cation for Tomographic Emission) can be run.
• Query 6: Find all the CEs that support the BIOMED VO,
have more than 100 CPUs available, and where GATE
can be run.
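The increasing complexity of these queries comes from combining simple conditions: Queries 2, 4 and 6 are conjunctions of the conditions used in the simpler ones. The sketch below expresses that structure over in-memory records. The `ComputingElement` record, its fields, the helper names and the sample data are hypothetical, and the evaluated services answer such queries through LDAP filters, SQL or SPARQL rather than in this way.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    ce_id: str
    vos: set = field(default_factory=set)           # supported VOs
    free_cpus: int = 0                              # CPUs available
    runtime_envs: set = field(default_factory=set)  # e.g. {"MPI"}


def supports_vo(vo):
    return lambda ce: vo in ce.vos


def min_cpus(n):
    # "more than n CPUs available"
    return lambda ce: ce.free_cpus > n


def has_env(env):
    return lambda ce: env in ce.runtime_envs


def run_query(ces, *conditions):
    """Return the IDs of the CEs that satisfy every condition."""
    return [ce.ce_id for ce in ces if all(c(ce) for c in conditions)]


# Hypothetical Computing Elements.
ces = [
    ComputingElement("ce-a.example.org", {"biomed"}, 250, {"MPI"}),
    ComputingElement("ce-b.example.org", {"biomed"}, 40, {"MPI"}),
    ComputingElement("ce-c.example.org", {"atlas"}, 500, set()),
]
query1 = run_query(ces, supports_vo("biomed"))
query4 = run_query(ces, supports_vo("biomed"), min_cpus(100), has_env("MPI"))
```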
Query 1 (Find all the CEs that support the BIOMED VO), by information service:
BDII (LDAP search):
ldapsearch -x -H ldap://lcg-bdii.cern.ch:2170 -b mds-vo-name=local,o=grid '(&(objectClass=GlueVOView)(GlueVOViewLocalID=biomed))' GlueCEAccessControlBaseRule
RGMA (SQL query):
SELECT GlueCEVOViewUniqueID, Value FROM GlueCEVOViewAccessControlBaseRule WHERE Value='VO:biomed'
ActOn-based (SPARQL query):
PREFIX egeeOnto: <http://www.cs.man.ac.uk/img/ontogrid#>
SELECT ?ceid ?ceID ?VO
WHERE { ?ceid egeeOnto:CEUniqueID ?ceID . ?ceid egeeOnto:hasVO ?VO .
        OPTIONAL { ?ceid egeeOnto:VO ?ceID . FILTER ( ?vo = "biomed" ) } }
Fig. 3. An example of Query 1 in BDII, RGMA, and ActOn
Each of these six queries has been translated into the query
languages of the three information services. Figure 3 shows
an example for Query 1. We use different clients to execute
these queries and extract the results obtained (e.g., ldapsearch
for BDII, the gLite RGMA client tools for RGMA and a
Java-based ActOn client for the ActOn-based information service).
Not only queries are different, but also query results are
obtained in different manners, due to the differences in the
information models of each service. The result of a BDII
query is a set of LDAP entries, of an RGMA query a
set of table rows, and of an ActOn-based query a set of
RDF triples. Figure 4 shows three different ways to show
the same Grid resource in the three services evaluated (i.e.,
ce02.tier2.hep.manchester.ac.uk, an EGEE Computing Element).
Even if they have different syntax and size, in our
experiment we count them as one piece of information each.
That is, we use each “Grid resource” obtained from a query
as the basic unit for counting information, which will be used
to calculate precision and recall, as described in Section IV-B.
4 ui.tier2.hep.manchester.ac.uk
5 ce.phy.bg.ac.yu
Query results of BDII:
# biomed, ce02.tier2.hep.manchester.ac.uk:2119/jobmanager-lcgpbs-biomed, UKI-NORTHGRID-MAN-HEP, local, grid
dn: GlueVOViewLocalID=biomed,GlueCEUniqueID=ce02.tier2.hep.manchester.ac.uk:2119/jobmanager-lcgpbs-biomed,mds-vo-name=UKI-NORTHGRID-MAN-HEP,mds-vo-name=local,o=grid
GlueCEAccessControlBaseRule: VO:biomed
Query results of RGMA:
+----------------------------------------------------------------------+-----------+
| GlueCEVOViewUniqueID                                                 | Value     |
+----------------------------------------------------------------------+-----------+
| ce02.tier2.hep.manchester.ac.uk:2119/jobmanager-lcgpbs-biomed/biomed | VO:biomed |
Query results of ActOn:
| ceid                                         | ceID                              | VO       |
| <http://img.cs.man.ac.uk/ontogrid1234423456> | "ce02.tier2.hep.manchester.ac.uk" | "biomed" |
Fig. 4. Results of BDII, RGMA, and ActOn for the same Grid resource:
Computing Element at University of Manchester (ce02.manchester.ac.uk)
B. Experimental Results Measurement and Analysis
The experiment consists in examining the information
retrieved for each of the six queries aforementioned, so as to
estimate their corresponding precision and recall measures.
Precision is easy to determine, since it can be computed
manually by looking at the results obtained from each query.
In all cases, we assume binary relevancy of information, that
is, each piece of information retrieved is either relevant or
irrelevant for the issued query.
Recall is more difficult to determine, due to the fact that
the amount of information available in the EGEE production
testbed changes frequently in these systems and there is no
way to get accurate information about the actual state of the
Grid resources that are available without using the information
services that we are evaluating. To get a good approximation,
we execute each query 100 times, with a 4-minute interval
between executions, monitoring the testbed during 400 minutes.
Then we use the highest value obtained from these 100
executions as the total number of relevant information to be
used to calculate recall.
Tables I, II and III provide the precision and recall
measurements obtained after the execution of the experiments
described above for the three information services selected:
BDII, RGMA and the ActOn-based information service. The
values provided in the tables show the average of executing
the queries 100 times.
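The measurement procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the tooling used in the experiments: the function names and the sample counts are hypothetical, and it assumes that each run reports how many items were retrieved and how many of those were judged relevant. Following the text, the total number of relevant items is approximated by the highest relevant count observed over all runs.

```python
from typing import List, Tuple


def precision(retrieved: int, relevant_retrieved: int) -> float:
    """Fraction of retrieved items that are relevant (binary relevance)."""
    return relevant_retrieved / retrieved if retrieved else 0.0


def recall(relevant_retrieved: int, total_relevant: int) -> float:
    """Fraction of all relevant items that were actually retrieved."""
    return relevant_retrieved / total_relevant if total_relevant else 0.0


def evaluate(runs: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Average precision and recall over repeated executions of one query.

    Each run is a pair (retrieved, relevant_retrieved). The ground truth
    is unknown, so the highest relevant count over all runs is used as the
    approximation of the total number of relevant items.
    """
    total_relevant = max(rel for _, rel in runs)
    avg_p = sum(precision(ret, rel) for ret, rel in runs) / len(runs)
    avg_r = sum(recall(rel, total_relevant) for _, rel in runs) / len(runs)
    return avg_p, avg_r


# Hypothetical counts for three executions of the same query.
runs = [(150, 150), (152, 152), (148, 148)]
avg_precision, avg_recall = evaluate(runs)
print(round(avg_precision, 3), round(avg_recall, 3))
```

As a sanity check under the same reading, `recall(14999, 15200)` rounds to 0.987, which agrees with the Q1 row of Table I when every retrieved item is relevant.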
TABLE I
BDII RECALL & PRECISION MEASUREMENT (100 TIMES)
Query  Retrieved Info.  Relevant Info.  Precision  Recall
Q1     14999            15200           1          0.987
Q2     242517           19708           0.082      0.918
Q3     7174             7300            1          0.983
Q4     485034           4600            0.010      0.990
Q5     -                -               -          -
Q6     -                -               -          -
TABLE II
RGMA RECALL & PRECISION MEASUREMENT (100 TIMES)
Query  Retrieved Info.  Relevant Info.  Precision  Recall
Q1     3417             15200           1          0.225
Q2     6321             6321            1          1
Q3     6568             7300            1          0.900
Q4     11245            4914            0.437      0.563
Q5     -                -               -          -
Q6     -                -               -          -
TABLE III
ACTON RECALL & PRECISION MEASUREMENT (100 TIMES)
Query  Retrieved Info.  Relevant Info.  Precision  Recall
Q1     15200            15200           1          1
Q2     34100            34100           1          1
Q3     6568             7300            1          0.900
Q4     6568             7300            1          0.900
Q5     24               24              1          0.900
Q6     6                6               1          1
As a general comment about these results, we can highlight
the fact that BDII shows in general poor results with respect to
recall and precision, while ActOn and RGMA present better
results. This is mainly related to the repository that BDII uses
(LDAP), which is too lightweight and hence provides weak
information processing and query capabilities, whereas RGMA
is based on relational databases and ActOn on RDF,
both of which have better query capabilities.
We now analyse in more detail some of the system
behaviours over specific queries, and derive further conclusions
from these values:
BDII has weak query capabilities. Table I shows that BDII
has extremely bad precision results for queries 2 and 4, while
the results for queries 1 and 3 are excellent. This is related to
its weak query ability, as aforementioned. LDAP-based queries
are string-based, and hence they cannot be used to support
queries over numerical values, such as “greater than or lower
than”. If we want to improve this precision value, we need to
fetch all the information about CE CPUs as a string value first
(as we have done to get these results), and then post-process
(filter) those results on the client side. RGMA and the ActOn-
based information services do not have that problem, since
their query abilities are better.
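The client-side post-processing mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions, not the script used in the experiments: the entries are hypothetical dictionaries standing in for parsed LDAP results, the attribute name `GlueCEInfoTotalCPUs` is assumed for illustration, and the helper `filter_by_min_cpus` is not part of any existing client.

```python
from typing import Dict, Iterable, List


def filter_by_min_cpus(
    entries: Iterable[Dict[str, str]], attribute: str, threshold: int
) -> List[Dict[str, str]]:
    """Keep entries whose string-valued attribute is a number above threshold."""
    selected = []
    for entry in entries:
        raw = entry.get(attribute)
        try:
            value = int(raw)  # attribute values arrive as strings
        except (TypeError, ValueError):
            continue  # missing or non-numeric value: cannot be compared
        if value > threshold:
            selected.append(entry)
    return selected


# Hypothetical entries, as they might look after parsing an LDAP result.
entries = [
    {"GlueCEUniqueID": "ce-a.example.org", "GlueCEInfoTotalCPUs": "250"},
    {"GlueCEUniqueID": "ce-b.example.org", "GlueCEInfoTotalCPUs": "64"},
    {"GlueCEUniqueID": "ce-c.example.org", "GlueCEInfoTotalCPUs": "n/a"},
]
big = filter_by_min_cpus(entries, "GlueCEInfoTotalCPUs", 100)
print([e["GlueCEUniqueID"] for e in big])
```

The explicit conversion to `int` matters: compared as strings, "64" sorts after "100", so a purely string-based comparison would wrongly keep the second entry.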
RGMA is not able to relate information available in
different tables. Table II shows that RGMA has bad precision
results in query 4. RGMA contains information to solve this
query, but the information comes from two different tables
(GlueCE and GlueSubClusterSoftwareRunTimeEnvironment),
and the query language used by RGMA does not allow
making a join of both tables. Hence the situation is similar
to the previous case: this problem can be solved on the client
side by post-processing the results that have been obtained
from each separate query.
RGMA is very sensitive to the registering and availability
of information providers at a given point in time.
Table II shows that RGMA has bad recall results in query 1.
This is because the amount of Computing Element producers
that is available during the experiment is not always stable,
due to the fact that either producers were not registered in the
RGMA registry at that specific moment, or that the producers
were not configured correctly or available at that point in time.
BDII and the ActOn-based information service are more robust
to this, due to the fact that they store information locally and
do not depend on their information providers at the time of
querying.
Some complex queries cannot be answered by one type
of information service in isolation. Tables I and II show
that BDII and RGMA can only answer the first four queries.
They cannot answer queries 5 and 6 because their information
providers cannot provide enough information and should be
combined. This shows that the ability of BDII and RGMA
to share their data resources is weak. On the other hand,
the ActOn-based information service has the ability to adopt
existing information sources as its information providers,
and aggregate information from these information sources to
answer such complex queries.
V. CONCLUSIONS AND FUTURE WORK
In this paper we have presented an information service
for EGEE that is based on an ontology-based information
integration approach, Active Ontology (ActOn). This approach
overcomes some of the limitations of current similar
approaches when dealing with highly dynamic, distributed and
redundant information sources in the cases where information
quality, availability and robustness, as well as response time,
are important non-functional requirements.
We adopt a data warehouse approach to information
integration, where we materialise relevant information from different
information sources and assign it a lifetime based on the
update frequency of the information sources where it is taken
from. The materialised information acts as a metadata cache
that is updated only when an information request is sent to
the system and the materialised information has expired.
Besides, information sources are selected at run-time from a
large set of sources that provide redundant information, based
on criteria such as their information coverage, availability,
geographical proximity, etc.
The results of the experiments executed to analyse the
quality of metadata and the response time of our system are
promising, suggesting that it can increase the metadata quality
and robustness of currently-deployed information systems, and
decrease the cost of system resources.
In summary, our main contribution over the state of the
art in Grid information systems is that we have proposed
a Grid information service that performs an ontology-based
integration of information from existing services. This allows
execution plans to be created automatically for retrieving
information from sources that overlap in the information that
they publish and have different provenance constraints, and
allows a cache of relevant information to be maintained for as
long as it is valid given its lifetime constraints.
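The update-on-demand policy of the metadata cache can be illustrated with a short Python sketch. It is not the ActOn implementation: the class `MetadataCache`, the `fetch` callback and the sample data are hypothetical names introduced here, and the lifetime is simply passed in by the caller rather than derived from the update frequency of a source. An entry is refreshed only when it is requested after its lifetime has expired.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class MetadataCache:
    """Cache whose entries are refreshed only on a request after expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (cached value, expiry time in clock units)
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}

    def get(self, key: Hashable, fetch: Callable[[], Any], lifetime_s: float) -> Any:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now < cached[1]:
            return cached[0]  # still valid: no source is contacted
        value = fetch()  # expired or missing: query the source now
        self._entries[key] = (value, now + lifetime_s)
        return value


# Usage with a fake clock so that the behaviour is deterministic.
calls = []


def fetch_free_cpus() -> Dict[str, int]:
    calls.append(1)  # count how often the source is contacted
    return {"ce-a.example.org": 120}


fake_now = [0.0]
cache = MetadataCache(clock=lambda: fake_now[0])
cache.get("free_cpus", fetch_free_cpus, lifetime_s=240)  # miss: fetch
fake_now[0] = 100.0
cache.get("free_cpus", fetch_free_cpus, lifetime_s=240)  # hit: cached
fake_now[0] = 300.0
cache.get("free_cpus", fetch_free_cpus, lifetime_s=240)  # expired: fetch
print(len(calls))
```

Nothing is refreshed in the background in this sketch, so a source that nobody asks about is never contacted, which is the point of updating only on demand.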
As for the integration of features from other systems, we
plan to work on the integration and extension of (semi-
)automatic wrapper generation systems like D2R and R2O
(currently these systems are only available to access databases,
but we plan to extend them for accessing information services
such as those present in Grid systems), and on the integration
of query reformulation and planning techniques, such as those
of Theseus [24], with the metadata cache approach that we
have proposed.
We also plan to take full advantage of following an
ontology-based approach for information integration, allowing
us to perform tasks that cannot be done easily with the services
currently available, such as detecting inconsistencies in the
metadata that is available or deriving new information. For
example, a common problem with current information services
is their level of trustworthiness. There are many cases where a
computing element specifies that it gives support to MPI but
does not comply with the requirements for running an MPI
job, which are that it must be a CE server, must have an
sshd service running on it, must have mpirun
and libmpi.so in its file system, and must have at least
two worker nodes. Similarly, we could derive that a computing
element gives support to MPI if the previous conditions apply,
since this is a necessary and sufficient condition.
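The MPI example above can be written down as a small rule check. The following Python sketch is only an illustration of the idea, not an ontology-based reasoner: the `ComputingElement` record and its field names are hypothetical, and the rule is encoded directly from the four conditions listed in the text.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    name: str
    is_ce_server: bool
    sshd_running: bool
    has_mpirun: bool
    has_libmpi: bool
    worker_nodes: int
    advertises_mpi: bool  # what the CE itself publishes


def satisfies_mpi_requirements(ce: ComputingElement) -> bool:
    """Conditions listed in the text for being able to run an MPI job."""
    return (
        ce.is_ce_server
        and ce.sshd_running
        and ce.has_mpirun
        and ce.has_libmpi
        and ce.worker_nodes >= 2
    )


def is_inconsistent(ce: ComputingElement) -> bool:
    """The CE claims MPI support but does not meet the requirements."""
    return ce.advertises_mpi and not satisfies_mpi_requirements(ce)


def derived_mpi_support(ce: ComputingElement) -> bool:
    """MPI support can be derived even if the CE does not advertise it."""
    return satisfies_mpi_requirements(ce) and not ce.advertises_mpi


claims_only = ComputingElement("ce-a.example.org", True, True, True, True, 1, True)
silent = ComputingElement("ce-b.example.org", True, True, True, True, 4, False)
print(is_inconsistent(claims_only), derived_mpi_support(silent))
```

In this sketch the first element is flagged because it has a single worker node, while the second one qualifies for MPI although it never announced it.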
ACKNOWLEDGEMENTS
This work is supported by the EU FP6 OntoGrid project
(STREP 511513), by the Marie Curie fellowship RSSGRID
(FP6-2002-Mobility-5-006668), and by the EU FP6 CoreGrid
Network of Excellence (FP6-004265). We also thank Pinar
Alper (IMG group), Antun Balaz and Laurence Field (EGEE),
and Georges Da Costa and Anastasios Gounaris (CoreGrid
WP2), for their helpful comments.
REFERENCES
[1] “Enabling Grids for E-sciencE (EGEE),” http://public.eu-egee.org/.
[2] “EGEE gLite,” http://glite.web.cern.ch/glite.
[3] I. Foster, H. Kishimoto, A. Savva, D. Berry, A. Grimshaw, B. Horn,
F. Maciel, F. Siebenlist, R. Subramaniam, J. Treadwell, and J. V. Reich,
The Open Grid Services Architecture, Version 1.5, gfd-i.080 ed., GGF,
July 2006, http://forge.gridforum.org/projects/ogsa-wg.
[4] “Berkeley Database Information Index
(BDII),” http://lfield.home.cern.ch/lfield/cgi-
bin/wiki.cgi?area=bdiipage=documentation.
[5] K. Czajkowski, S. Fitzgerald, I. Foster, and C. Kesselman, “Grid infor-
mation services for distributed resource sharing,” in Proceedings of the
Tenth IEEE International Symposium on High-Performance Distributed
Computing (HPDC-10). IEEE Press, August 2001.
[6] “EDG RGMA,” www.marianne.in2p3.fr/datagrid/documentation/rgma-guide.pdf.
[7] H. Wache, T. Vögele, U. Visser, H. Stuckenschmidt, G. Schuster,
H. Neumann, and S. Hübner, “Ontology-based integration of information
- a survey of existing approaches,” in IJCAI–01 Workshop: Ontologies
and Information Sharing, H. Stuckenschmidt, Ed., 2001, pp. 108–117.
[8] W. Xing, O. Corcho, C. Goble, and M. Dikaiakos, “Active Ontology:
An Information Integration Approach for Highly Dynamic Information
Sources,” in Europe Semantic Web Conference 2007 (ESWC-2007),
Innsbruck, Austria, June 2007, Poster.
[9] “European DataGrid,” http://eu-datagrid.web.cern.ch/eu-datagrid/.
[10] J. Marco et al., “First Prototype of the Crossgrid Testbed,” in
Proceedings of First European AcrossGrids Conference (AXGrids 2003),
LNCS 2970. Santiago de Compostela, Spain: Springer-Verlag, 2003,
pp. 67–77.
[11] “Globus Toolkit,” http://www.globus.org/toolkit/.
[12] “The Spallation Neutron Source (SNS) project,” http://www.sns.gov/.
[13] P. Wieder and D. Mallmann, “UniGrids - Uniform Interface to Grid
Services,” in 7th HLRS Metacomputing and Grid Workshop, Stuttgart,
Germany, April 2004.
[14] “Globus toolkit,” http://www.globus.org.
[15] Ó. Corcho, P. Alper, I. Kotsiopoulos, P. Missier, S. Bechhofer, and
C. A. Goble, “An Overview of S-OGSA: A Reference Semantic Grid
Architecture,” Journal of Web Semantics, vol. 4, no. 2, pp. 102–115,
2006.
[16] “OntoGrid CVS,” http://www.ontogrid.net/ontogrid/downloads.jsp.
[17] D. Brickley and R. V. Guha (editors), “RDF Vocabulary Description Language
1.0: RDF Schema,” February 2004, http://www.w3.org/TR/rdf-schema/.
[18] P. Patel-Schneider, P. Hayes, and I. Horrocks, OWL Web Ontology
Language Semantics and Abstract Syntax, World Wide Web Consortium,
February 2004.
[19] W. Xing, M. D. Dikaiakos, and R. Sakellariou, “A Core Grid Ontology
for the Semantic Grid,” in Proceedings of the 6th IEEE International
Symposium on Cluster Computing and the Grid (CCGrid 2006).
Singapore: IEEE Computer Society, May 2006, pp. 178–184.
[20] M. Parkin, S. van den Burghe, O. Corcho, D. Snelling, and J. Brooke,
“The Knowledge of the Grid: A Grid Ontology,” in Proceedings of the
6th Cracow Grid Workshop, Cracow, Poland, October 2006.
[21] H. Garcia-Molina, Y. Papakonstantinou, A. Rajaraman, D. Quass, Y. Sagiv,
J. Ullman, V. Vassalos, and J. Widom, “The TSIMMIS Approach
to Mediation: Data Models and Languages,” Intelligent Information
Systems, vol. 8, no. 2, pp. 117–132, 1997.
[22] C. Bizer, “D2R MAP: A DB to RDF Mapping Language,” in 12th
International World Wide Web Conference, Budapest, May 2003.
[23] J. Barrasa, O. Corcho, and A. Gomez-Perez, “R2O, an Extensible
and Semantically based Database-to-Ontology Mapping Language,”
in Proceedings of the 2nd Workshop on Semantic Web and
Databases (SWDB2004), Toronto, Canada, 2004.
[24] G. Barish, D. DiPasquo, C. A. Knoblock, and S. Minton, “Dataflow
plan execution for software agents,” in Proceedings of the Fourth
International Conference on Autonomous Agents, C. Sierra, M. Gini,
and J. S. Rosenschein, Eds. Barcelona, Spain: ACM Press, 2000, pp.
138–139.
