The semantic connectivity map: an adapting self-organising knowledge discovery method in data bases. Experience in gastro-oesophageal reflux disease.
International journal of data mining and bioinformatics (2008)
- PubMed: 19216342
Available from www.ncbi.nlm.nih.gov
Abstract
We describe a new mapping method that traces connectivity among variables using an artificial adaptive system, the Auto Contractive Map (AutoCM), which defines the strength of the association of each variable with all the others in a dataset. After the training phase, the weight matrix of the AutoCM represents the map of the main connections between the variables. The example of a gastro-oesophageal reflux disease database illustrates how this new approach can help to redesign the overall structure of the factors involved in describing complex and specific diseases.
De Connectie
Issue 4, volume 4, June 2010
The emergence of self-organising algorithms for the Semantic Web
Christophe Guéret, Computational Intelligence group and Knowledge Representation group, Artificial Intelligence section
Vrije Universiteit Amsterdam, (cgueret@few.vu.nl)
The Web as we know it is undergoing a profound change. In addition to the linked documents of the World Wide Web, linked structured data has started to pop up in several places. This data provides semantically rich information about people, places, movies and every other concept one can think of. This Web of Data (WoD) is growing at an amazing rate, making it no longer feasible to deal with it in a centralised way. Besides, this Web, made of billions of facts created by many different people, is messy and contradictory. Self-organising algorithms provide the adaptiveness, robustness and scalability that will be required to reason with this ever-growing amount of dynamic Semantic Web data.
The creation of a Web of Data
Nowadays, discussions about the future of the Internet and its main application, the Web, usually revolve around the terms “Web 3.0”, “Semantic Web” and “Internet of Things”.
The “Internet of Things” holds the promise of a future where every device, from clothes to cell phones to domestic appliances like fridges, will be connected and able to exchange information with other devices. As a result, there will be a service providing a uniform information layer to the devices making use of it [10]. This new service-oriented Web has been dubbed “Web 3.0”, following the interaction-focussed Web 2.0, which itself succeeded the client/server content model of Web 1.0. With wikis (e.g. Wikipedia) and content portals (e.g. YouTube) as flagship technologies, Web 2.0 allows anyone to easily become a publisher on the Web.
For this Web 3.0 to work, the devices need to understand what they are talking about. So far, the Web is suitable for consumption by humans, but not by machines. The Semantic Web research effort focusses on the definition of standards and tools enabling the creation of Web content suitable for both humans and machines [1]. This research led to the creation of standards to represent facts about things (RDF) and relations amongst these things (RDFS, OWL), and of protocols to interact with the created data (SPARQL).
These new standards, in combination with other well-established standards such as HTTP, have been applied to publish and interlink information that used to be buried in web pages and closed knowledge systems. For instance, the fact that Amsterdam is located in the Netherlands is information that one can easily read from the Wikipedia page of Amsterdam, but that remains hidden to the eyes of a machine. The DBPedia project extracts such information and publishes it as an RDF triple <Amsterdam, isLocatedIn, Netherlands>, that is, a combination of a subject (Amsterdam), a predicate (isLocatedIn) and an object (Netherlands) expressed using RDF. A predicate can itself be the subject of other triples, and thus it can also be described. As a result, RDF makes it possible to construct a network of facts that embeds both the relations between things and the meaning of these relations, constituting a “Web of Data” (WoD). For example, if the description of isMarriedTo states that it relates persons, the RDF triple <Christophe, isMarriedTo, Jennifer> implicitly means that <Jennifer, isA, Person>.
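To make the triple model concrete, here is a minimal Python sketch. It is not DBPedia's data or API: triples are plain tuples in a set, and the `match` helper and the `label` predicate are hypothetical names introduced only for illustration.

```python
# A tiny in-memory "Web of Data": each fact is a (subject, predicate, object) tuple.
# All identifiers below are illustrative; they are not real DBPedia identifiers.
triples = {
    ("Amsterdam", "isLocatedIn", "Netherlands"),
    ("Christophe", "isMarriedTo", "Jennifer"),
    # A predicate can itself be the subject of a triple, which describes it.
    ("isLocatedIn", "label", "is located in"),
}


def match(pattern, facts):
    """Return the facts matching a (s, p, o) pattern; None acts as a wildcard."""
    return sorted(
        fact for fact in facts
        if all(want is None or want == got for want, got in zip(pattern, fact))
    )


# Everything stated with Amsterdam as the subject.
about_amsterdam = match(("Amsterdam", None, None), triples)
# The description attached to the predicate itself.
about_predicate = match(("isLocatedIn", None, None), triples)
print(about_amsterdam)
print(about_predicate)
```

Because subjects, predicates and objects share one namespace of names, the same lookup works for things and for the relations between them, which is what lets the network carry its own meaning.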
The WoD is growing rapidly and is expected to reach a trillion triples in 2010 [9], with contributions from crowdsourcing¹ projects like DBPedia, as well as UK governmental data [6] and, recently, Facebook [4].
The DBPedia project is a community effort to extract structured information from Wikipedia and to make this information available on the Web. It allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data. The originators hope this will make it easier for the amazing amount of information in Wikipedia to be used in new and interesting ways.
The UK government has also opened up its data. It provides government statistics about transportation, environment and crime on its website data.gov.uk, much like CBS does in the Netherlands (though CBS doesn’t provide its statistics as linked data). Users can run queries to find information, and submit ideas and applications that combine the data.
More examples of online resources can be found in figure 1 and table 1. It is impossible to get a complete picture of the Web of Data, but links to a significant part of it can be seen on the Linked Open Data project website. The site indexes many online data sources and generated the overview picture below.
Application perspectives
Thanks to the addition of semantics to the Web, it is now possible for computers to understand what they display rather than just showing it. One area of application of structured data is the enhancement of search results. For example, Google adds structured data fetched from a LinkedIn profile to its search results, indicating basic information such as the number of connections the person has.
Because facts are linked and the meaning of these links can be easily retrieved, the WoD also gives the opportunity to combine data in new ways. Mashup applications combine two or more datasets to reveal underlying information that is otherwise invisible. Many mashup applications are shown on the UK data website [6], such as survey information mapped geographically, or an iPhone app that lists more than 000 prospective parliamentary candidates for the 2010 general elections, along with their contact details.
Recently, Facebook added the possibility to embed a “Like” button on any web page [4]. By pressing it, a user indicates that they like the page, and that information is displayed in their Facebook profile. Without semantics, the story would stop here, as Facebook scripts would have no way to know what the page is about. Semantic data incorporated into the code of the web page (using RDF) expresses in a machine-readable way what is displayed on the page. For instance, the code in listing 1 says that the page is about a movie entitled “The Rock” and which image to use as a thumbnail:
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
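As a rough illustration of how a script could read this metadata, the sketch below parses the three tags of listing 1 with Python's standard `html.parser` module. It is not Facebook's implementation; the `OpenGraphParser` class and `read_open_graph` function are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# The three tags from listing 1, written with straight quotes.
PAGE = """
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
"""


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta> tags into a dictionary."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        if name.startswith("og:"):
            self.properties[name] = attributes.get("content")


def read_open_graph(html):
    """Return a {property: content} mapping for the og:* tags in the page."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


metadata = read_open_graph(PAGE)
print(metadata["og:title"], "is a", metadata["og:type"])
```

Each extracted pair can be read as a triple whose subject is the page itself, for example <page, og:type, "movie">.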
From a structural point of view, the WoD does not differ much from the WWW; it’s a decentralised network of connected concepts (rather than documents) free to be enriched by anyone. But, unlike the WWW, the real potential of the WoD can only be reached when facts are linked and combined to produce knowledge. Paradoxically, the main approaches currently used to deal with this decentralised network are centralised: all the data is gathered in one place before being used. However, it is widely recognised that new adaptive approaches towards robust and scalable reasoning are required to exploit the full value of ever-growing amounts of dynamic Semantic Web data [5].
Advantages of self-organising algorithms
We think that algorithms dealing with the WoD will have to exhibit features such as self-organisation, simplicity and interactivity. Evolutionary Computing and Collective Intelligence are two promising research areas for the design of these algorithms. They provide particular advantages:
• Learning and adaptation: the performance of the algorithms improves during their execution. As good results are found, the algorithm learns why these results are good and how to find similar ones. This learning also allows the algorithm to adapt to changing conditions.
• Design simplicity: for instance, an evolutionary algorithm is based on a simplified ‘survival of the fittest’: in an iterative process, solutions are guessed, verified, and deleted if they are not good enough. The expected result - an optimal solution to the problem -
Figure 1: Part of the linked open data cloud, containing an estimated 14 billion interlinked facts [8]. Nodes are data sources; connections are triples expressed across different data sources.
Table 1: Online data resources, forming nodes in the Linked Open
Data cloud [8]
1 When a company has a problem, it can outsource it to the crowd, hence the name ‘crowdsourcing’. By holding a competition, the company motivates users to create their own solutions, which benefit the company. Contributions can also be voluntary, as with Wikipedia, lazyweb or the ESP game.
Kathrin Dentler, Vrije Universiteit te Amsterdam
k.dentler@few.vu.nl
The Semantic Web is a web of linked (open) data. There is a lot of data we all use every day, but it is controlled by applications that keep it to themselves. Examples are social networking sites such as Facebook, MySpace or LinkedIn. In an ideal world, you should be able to connect to your friends regardless of the kind of application they are using. Considering this, the Semantic Web is about two things. It is about common formats for the integration and combination of data drawn from diverse sources. It is also about a language for recording how the data relates to real-world objects¹.
RDF
The Semantic Web is based on resources described in the open data model RDF (Resource Description Framework). RDF is intended for situations in which information needs to be processed by applications, rather than being only displayed to people. It provides a common framework for expressing this information so that it can be exchanged between applications without loss of meaning.
RDF is based on the idea of identifying things using web identifiers (URIs), and describing resources in terms of properties and property values. This enables RDF to represent simple statements about resources as a graph of nodes and arcs representing the resources and their properties and values². RDF statements are triples in the form of a subject, a predicate and an object, with the subject and object being nodes in the graph and the predicate being an edge directed from the subject to the object, expressing a relation between them.
Let us consider an example: “Anakin Skywalker is Luke Skywalker’s father.” This statement can be represented as an RDF triple with Anakin Skywalker as the subject, ex:fatherOf as the predicate and Luke Skywalker as the object (Figure 1).
RDFS
A vocabulary about family relations can be formalized in RDFS (RDF Schema). The basic concepts of RDFS are classes and properties as well as hierarchies among them. For properties, domain and range restrictions can be specified. For example, one can express in RDFS that property ex:fatherOf has class foaf:Person as rdfs:domain and rdfs:range. This means that both subject and object of the statement are of rdf:type foaf:Person. It can also be expressed that foaf:Person is an rdfs:subClassOf foaf:Agent, so that every foaf:Person is also a foaf:Agent. FOAF (Friend of a Friend) is an RDF schema that is frequently used to describe persons.
Semantic Web Reasoning
Semantic Web reasoning is the application of the formal semantics of RDFS, OWL or general rules to infer logical consequences, making implicit information explicit. An inference is the result of such a process.
Regarding the Star Wars example, knowing that Anakin Skywalker is ex:fatherOf Luke Skywalker, and considering the information in our RDFS vocabulary, possible inferences are that both Luke and Anakin Skywalker are of rdf:type foaf:Person and thus foaf:Agent. This is new information that has not been explicitly expressed before. A similar but more detailed example is given below.
RDFS Entailment
The formal semantics of RDFS enables the automation of reasoning through the application of the RDFS entailment rules. All rules have the form: “add a triple to a graph when the graph contains triples that conform to a given pattern”. Table 1 shows some example RDFS entailment rules. The second column shows the conditions for a rule to be applied and the third column the corresponding triples that are to be added to the graph.
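The “add a triple when a pattern is present” idea can be sketched in a few lines of Python. The sketch below applies three standard RDFS rules (domain, range and subclass) to the Star Wars example until nothing new can be added. Whether these are exactly the rules listed in Table 1 is an assumption, and `ex:Anakin` and `ex:Luke` are hypothetical identifiers chosen for this illustration.

```python
# Vocabulary and fact from the Star Wars example.
# ex:Anakin and ex:Luke are hypothetical identifiers used only in this sketch.
graph = {
    ("ex:fatherOf", "rdfs:domain", "foaf:Person"),
    ("ex:fatherOf", "rdfs:range", "foaf:Person"),
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
    ("ex:Anakin", "ex:fatherOf", "ex:Luke"),
}


def entail_once(facts):
    """Apply the domain, range and subclass rules once; return only new triples."""
    derived = set()
    for (subj, pred, obj) in facts:
        if pred == "rdfs:domain":
            # (p domain c) and (s p o)  =>  (s rdf:type c)
            derived |= {(s, "rdf:type", obj) for (s, p, _) in facts if p == subj}
        elif pred == "rdfs:range":
            # (p range c) and (s p o)  =>  (o rdf:type c)
            derived |= {(o, "rdf:type", obj) for (_, p, o) in facts if p == subj}
        elif pred == "rdfs:subClassOf":
            # (c1 subClassOf c2) and (x rdf:type c1)  =>  (x rdf:type c2)
            derived |= {(x, "rdf:type", obj) for (x, p, c) in facts
                        if p == "rdf:type" and c == subj}
    return derived - facts


def closure(facts):
    """Repeat rule application until no rule adds a new triple."""
    facts = set(facts)
    while True:
        new = entail_once(facts)
        if not new:
            return facts
        facts |= new


inferred = closure(graph) - graph
for triple in sorted(inferred):
    print(triple)
```

The loop needs two passes here: the first types Anakin and Luke as foaf:Person, and the second uses those new triples to type them as foaf:Agent.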
Semantic Web Reasoning by Swarm Intelligence
Modern Semantic Web reasoning systems are confronted with the task of processing growing amounts of distributed, dynamic resources. My thesis presents a novel, nature-inspired way of tackling this challenge by exploiting the advantages of Swarm Intelligence. The central idea of the approach is that self-organising swarms of autonomous, light-weight entities traverse RDF graphs by following paths, aiming to instantiate pattern-based inference rules.
Figure 1. Anakin Skywalker is father of Luke Skywalker
1. http://www.w3.org/2001/sw/
2. http://www.w3.org/TR/rdf-primer/
3. http://www.w3.org/TR/rdf-mt/
Table 1: Some RDFS entailment rules; all triples are in the form: subject (s) predicate (p) object (o). x is a variable and c refers to a class.
comes as an emergent consequence of the basic mechanism. This simple bottom-up approach differs from the complex top-down approaches commonly used to deal with hard problems.
• Interactivity: these techniques are used in a continuously running, typically iterative, process. This implies that, at any time, the best result found so far can be returned as an answer to the posed problem. Besides, these algorithms can be made interactive and incorporate a user into the loop.
• Scalability, robustness and parallelisation: all three advantages result from using a population of co-evolving elements. Because each member of the population is independent, the algorithms are easy to parallelise. Also, the bad performance of some members is compensated for by the global efficiency of the entire population, making the population robust against individual failures.
• Self-organisation [13]: this bottom-up principle, in which a set of reasoning elements establishes the best organisation on its own by means of local interactions, brings flexibility to the system using it. Single elements are better able to cope with changes than global structures. The following example shows how these principles are applied.
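Before turning to that example, the guess, verify and delete loop behind the “design simplicity” and “interactivity” points can be sketched in Python. This is a toy under stated assumptions, not one of the algorithms from the cited work: the objective (counting 1-bits), the population size and the mutation scheme are all arbitrary choices made for illustration.

```python
import random


def fitness(candidate):
    """Toy objective (an assumption for this sketch): count the 1-bits."""
    return sum(candidate)


def evolve(length=20, population_size=12, generations=40, seed=0):
    """Yield the best candidate found so far after every generation."""
    rng = random.Random(seed)
    # Guess: start from random candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Verify: rank candidates by fitness and delete the weaker half.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        # Guess again: each survivor produces one mutated copy.
        children = []
        for parent in survivors:
            child = parent[:]
            position = rng.randrange(length)
            child[position] = 1 - child[position]
            children.append(child)
        population = survivors + children
        best = max(population + [best], key=fitness)
        # Anytime behaviour: a usable answer is available at every step.
        yield best


history = [fitness(best) for best in evolve()]
print(history[0], "->", history[-1])
```

Because `evolve` is a generator, a caller can stop consuming it at any moment and keep the best result so far, which is the anytime property described under “Interactivity”.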
Self-organised vocabulary mappings
The predicates used to create triples are defined in specific vocabularies² that are meant to be shared across data sources. Even though some ontologies like FOAF and Dublin Core are already very well established, every WoD contributor is free to create their own vocabulary, and this is what Facebook did with its Open Graph schema. The problem is that querying the WoD implies knowing which vocabularies are used by the triples. If someone wants to get a list of movies, the query will look like listing 2: get the title (?title) of something (?s) that is a movie. So, what happens if a data source uses rdf:type instead of og:type? Nothing is returned, despite the fact that rdf:type is the equivalent of og:type in the RDF vocabulary. The RDF and Facebook vocabularies are not semantically interoperable.
SELECT ?title WHERE {
  ?s og:title ?title .
  ?s og:type "movie" .
}
In order to alleviate this problem, mappings need to be established across the different vocabularies. A centralised approach would consist of establishing mappings from one central metavocabulary to all the vocabularies, or the inverse. In contrast, a decentralised approach would focus on establishing pairwise mappings directly among the vocabularies. This can be done by self-organising algorithms, thereby leading to the emergence of semantic interoperability [2].
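The effect of a single pairwise mapping on the movie query can be illustrated with a short Python sketch. It only shows what a mapping buys once it exists; how such mappings emerge through self-organisation is not modelled. The data, the `MAPPINGS` table and both helper functions are hypothetical.

```python
# Hypothetical data: one source uses Facebook's og: terms, another uses rdf:type.
triples = {
    ("page1", "og:title", "The Rock"),
    ("page1", "og:type", "movie"),
    ("page2", "og:title", "Alien"),
    ("page2", "rdf:type", "movie"),
}

# A pairwise mapping between two vocabularies, stated in both directions.
# Treating og:type and rdf:type as equivalent follows the text above.
MAPPINGS = {"og:type": {"rdf:type"}, "rdf:type": {"og:type"}}


def subjects_with(predicate, value, facts, mappings=None):
    """Subjects having (predicate, value), optionally via mapped predicates."""
    accepted = {predicate} | (mappings or {}).get(predicate, set())
    return {s for (s, p, o) in facts if p in accepted and o == value}


def movie_titles(facts, mappings=None):
    """The movie query in Python: titles of every subject typed as a movie."""
    movies = subjects_with("og:type", "movie", facts, mappings)
    return sorted(o for (s, p, o) in facts if p == "og:title" and s in movies)


without_mapping = movie_titles(triples)
with_mapping = movie_titles(triples, MAPPINGS)
print(without_mapping)
print(with_mapping)
```

Without the mapping, the page described with rdf:type is silently missed; with it, the same query returns both titles.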
Ongoing research
The application of evolutionary computing and collective intelligence to the design of algorithms for the WoD is a recent field of study that has so far received a limited but increasing amount of attention. Apart from the illustrated problem of vocabulary alignment, these techniques have also been applied to solving queries over RDF data [11], distributing triples among a set of triple stores [7] and deriving implicit triples from existing facts [3]. This emerging research field was the main topic of the SOKS symposium³ that was organised at the VU on May 28/29. It will also be the main topic of the 1st International Workshop of Self-Organization and Approximation Techniques for the Web of Data (SOAT2010), to be held in September 2010 as part of the 3rd Future Internet Symposium (FIS2010). If, after reading this article, you are interested in the topic, please feel free to join us for SOAT2010!
References
[1] T. Berners-Lee, J. Hendler, and O. Lassila. The semantic web. Scientific American, May 2001.
[2] Philippe Cudré-Mauroux. Emergent Semantics. EPFL & CRC Press, 2008.
[3] Kathrin Dentler, Christophe Guéret, and Stefan Schlobach. Semantic Web Reasoning by Swarm Intelligence. In Poster and Demonstration Session of the 8th International Semantic Web Conference (ISWC2009), 2009.
[4] Facebook. The open graph protocol. http://opengraphprotocol.org.
[5] Dieter Fensel and Frank van Harmelen. Unifying reasoning and search to web scale. IEEE Internet Computing, 11(2):9695, 2007.
[6] United Kingdom Government. Unlocking innovation. http://data.gov.uk.
[7] Daniel Graff. Implementation and evaluation of a Swarmlinda system. Technical Report B-08-06, STI Berlin, June 2008.
[8] Tom Heath. Linked data - connect distributed data across the web, July 2009. http://linkeddata.org.
[9] Tom Ilube. Introduction to the semantic web at Davos world economic forum, January 2009. http://www.youtube.com/watch?v=k_zoEeWOBuo.
[10] STI International. The future internet: Service web 3.0, October 2009. http://www.youtube.com/watch?v=off08As3siM.
[11] Eyal Oren, Christophe Guéret, and Stefan Schlobach. Anytime Query Answering in RDF through Evolutionary Algorithms. In International Semantic Web Conference (ISWC), volume 5318, pages 98-113. Springer Berlin Heidelberg, 2008.
[12] Manu Sporny. Intro to the semantic web, December 2007. http://www.youtube.com/watch?v=OGg8A2zfWKg.
[13] Tom De Wolf and Tom Holvoet. Emergence versus self-organisation: Different concepts but promising when combined. In Engineering Self-Organising Systems, pages 1-15, 2004.
2 The words “Ontology”, “Schema” and “Thesaurus” are used to denote different notions of vocabularies.
3. http://www.few.vu.nl/soks/symposium
De Connectie
nummer 4, jaargang 4, Juni 2010Kathrin Dentler, Vrije Universiteit te Amsterdam
k.dentler@few.vu.nl
The Semantic Web is a web of linked (open) data. There is lots
of data we all use every day,
but it is controlled by appli-
cations that keep it to them-
selves. Examples are social
networking sites such as Face-
book, MySpace or LinkedIn.
In an ideal world, you should
be able to connect to your
friends regardless of the kind of application they are
using. Considering this, the Semantic Web is about two
things. It is about common formats for the integration
and combination of data drawn from diverse sources. It
is also about language for recording how the data relates
to real world objects1.
RDF
The Semantic Web is based on resources described in the open data
model RDF (Resource Description Framework). RDF is intended
for situations in which information needs to be processed by appli-
cations, rather than being only displayed to people. It provides a
common framework for expressing this information so that it can
be exchanged between applications without loss of meaning.
RDF is based on the idea of identifying things using web
identifiers (URIs), and describing resources in terms of properties
and property values. This enables RDF to represent simple state-
ments about resources as a graph of nodes and arcs representing
the resources and their properties and values2. RDF statements are
triples in the form of a subject, a predicate and an object, with
the subject and object being nodes in the graph and the predicate
being an edge directed from the subject to the object, expressing a
relation between them.
Let us consider an example: “Anakin Skywalker is Luke Skywalk-
er’s father.” This statement can be represented as the following RDF
triple:
RDFS
A vocabulary about family relations can
be formalized in RDFS (RDF Schema).
The basic concepts of RDFS are class-
es and properties as well as hierarchies
among them. For properties, domain
and range restrictions can be specified.
For example, one can express in RDFS
that property ex:fatherOf has class foaf:
Person as rdfs:domain and rdfs:range. This
means that both subject and object of
the statement are of rdf:type foaf:Person. It can also be expressed
that every foaf:Person is an rdfs:subClassOf foaf:Agent. FOAF (friend
of a friend) is an RDF schema that is frequently used to describe
persons.
Semantic Web Reasoning
Semantic Web reasoning is the application of the formal seman-
tics of RDFS, OWL or general rules to infer logical consequences,
making implicit information explicit. An inference is the result of
such a process.
Regarding the Star Wars example, knowing that Anakin
Skywalker is ex:fatherOf Luke Skywalker, and considering the
information in our RDFS vocabulary, possible inferences are that
both Luke and Anakin Skywalker are of rdf:type foaf:Person and
thus foaf:Agent. This is new information that has not been explic-
itly expressed before. A similar but more detailed example is given
below.
RDFS Entailment
The formal semantics of RDFS enables the automation of reason-
ing by the application of the RDFS entailment rules. All rules
have the form: “add a triple to a graph when the graph contains
triples that conform to a given pattern”. Table 1 shows exemplary
RDFS entailment rules. The second column shows the conditions
for a rule to be applied and the third column the corresponding
triples that are to be added to the graph.
Semantic Web Reasoning by Swarm Intelligence
Modern Semantic Web reasoning systems
are confronted with the task to process
growing amounts of distributed, dynamic
resources. My thesis presents a novel, na-
ture-inspired way of tackling this challen-
ge by exploiting the advantages of Swarm
Intelligence. The central idea of the ap-
proach is that self-organising swarms of
autonomous, light-weight entities traverse
RDF graphs by following paths, aiming to
instantiate pattern-based inference rules.
Figure 1. Anakin Skywalker is father of Luke Skywalker
1. http://www.w3.org/2001/sw/
2. http://www.w3.org/TR/rdf-primer/
. http://www.w3.org/TR/rdf-mt/
Table 1: Some RDFS entailment rules; all triples are in the form: sub-
ject (s) predicate (p) object (o). x is a variable and c refers to a class.
comes as an emerging consequence of the basic mechanism. This
simple bottom-up approach differs from the complex top-down
approaches commonly used to deal with hard problems.
• Interactivity: these techniques are used in a continuously run-
ning, typically iterative, process. This implies that, at any time, the
best result found so far can be returned as an answer to the posed
problem. Besides, these algorithms can be made interactive and
incorporate a user into the loop.
• Scalability, robustness and parallelisation: all three of these
advantages result from using a population of co-evolving elements.
Because each member of the population is independent,
the algorithms are easy to parallelise. Also, the bad performance of
some members will be compensated by the global efficiency of
the entire population, making the population robust against individual
failures.
• Self-organisation[13]: this bottom-up principle, in which a set of
reasoning elements establish the best organisation on their own by
means of local interactions, brings flexibility to the systems using
it. Single elements are better able to cope with changes than
global structures. The following example shows how these principles
are applied.
Self-organised vocabulary mappings
The predicates used to create triples are defined in specific
vocabularies (see footnote 2) that are meant to be shared across data sources.
Even though some ontologies such as FOAF and Dublin
Core are already very well established, every WoD contributor is
free to create their own vocabulary, and this is what Facebook did
with its Open Graph schema. The problem is that querying the
WoD implies knowing which vocabularies are used by the triples.
If someone wants to get a list of movies, the query will look like
listing 2: get the title (?title) of something (?s) that is a movie.
So, what happens if a data source uses rdf:type instead of og:type? Nothing is
returned, despite the fact that rdf:type is the equivalent of og:type
in the RDF vocabulary. The RDF and Facebook vocabularies are not
semantically interoperable.
?s og:title ?title
?s og:type “movie”
In order to alleviate this problem, mappings need to be established
across the different vocabularies. A centralised approach
would consist of establishing mappings to all the vocabularies
from one central metavocabulary, or the inverse. Conversely,
a decentralised approach would focus on establishing pairwise
mappings directly among the vocabularies. This can be done by
self-organising algorithms, thereby leading to the emergence of
semantic interoperability [2].
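To make the interoperability problem concrete, the following minimal Python sketch evaluates the two patterns of listing 2 against a tiny in-memory set of triples. The data, the movie_titles helper and the MAPPINGS table are hypothetical illustrations and not part of any real triple store. The single pairwise mapping simply encodes the remark above that rdf:type plays the role of og:type.

```python
# Hypothetical data: one source types its movie with og:type, the other with rdf:type.
TRIPLES = {
    ("ex:m1", "og:title", "Star Wars"),
    ("ex:m1", "og:type", "movie"),
    ("ex:m2", "og:title", "Blade Runner"),
    ("ex:m2", "rdf:type", "movie"),
}

# Assumed pairwise mapping between predicates of two vocabularies.
MAPPINGS = {"og:type": {"rdf:type"}}


def movie_titles(triples, mappings=None):
    """Answer listing 2: titles of every ?s that is typed as "movie"."""
    type_predicates = {"og:type"}
    if mappings:
        # Also accept predicates that are mapped to og:type.
        type_predicates |= mappings.get("og:type", set())
    movies = {s for (s, p, o) in triples if p in type_predicates and o == "movie"}
    return sorted(o for (s, p, o) in triples if p == "og:title" and s in movies)


print(movie_titles(TRIPLES))            # without mappings
print(movie_titles(TRIPLES, MAPPINGS))  # with the pairwise mapping
```

Without the mapping, only the resource typed with og:type is returned. With it, the resource typed with rdf:type is found as well.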
Further research going on
The application of evolutionary computing and collective intelligence
to design algorithms for the WoD is a recent field of study
that has received a limited but increasing amount of attention so far.
Apart from the illustrated problem of vocabulary alignment, these
techniques have also been applied to solving queries over RDF
data[11], distributing triples among a set of triple stores[7] and
deriving implicit triples from existing facts[3]. This emerging research
field was the main topic of the SOKS symposium (see footnote 3) that was
organised at the VU on May 28/29. It will also be the main topic
of the 1st International Workshop of Self-Organization and Approximation
Techniques for the Web of Data (SOAT2010) to be
held in September 2010, as part of the 3rd Future Internet Symposium
(FIS2010). If after reading this article you are interested in
the topic, please feel free to join us for SOAT2010!
References
[1] T. Berners-Lee, J. Hendler, and O. Lassila. The semantic web. Sci-
entific American, May 2001.
[2] Philippe Cudré-Mauroux. Emergent Semantics. EPFL & CRC
Press, 2008.
[3] Kathrin Dentler, Christophe Guéret, and Stefan Schlobach. Se-
mantic Web Reasoning by Swarm Intelligence. In Poster and Demon-
stration Session of the 8th International Semantic Web Conference
(ISWC2009), 2009.
[4] Facebook. The open graph protocol. http://opengraphprotocol.org.
[5] Dieter Fensel and Frank van Harmelen. Unifying reasoning and
search to web scale. IEEE Internet Computing, 11(2):9695, 2007.
[6] United Kingdom Government. Unlocking innovation. http://data.gov.uk.
[7] Daniel Graff. Implementation and evaluation of a Swarmlinda system.
Technical Report Report B-08-06, STI Berlin, June 2008.
[8] Tom Heath. Linked data - connect distributed data across the web,
July 2009. http://linkeddata.org.
[9] Tom Ilube. Introduction to the semantic web at Davos world economic
forum, January 2009. http://www.youtube.com/watch?v=k_zoEeWOBuo.
[10] STI International. The future internet: Service web 3.0, October
2009. http://www.youtube.com/watch?v=off08As3siM.
[11] Eyal Oren, Christophe Guéret, and Stefan Schlobach. Anytime
Query Answering in RDF through Evolutionary Algorithms. In Inter-
national Semantic Web Conference (ISWC), volume 5318, pages 98-
113. Springer Berlin Heidelberg, 2008.
[12] Manu Sporny. Intro to the semantic web, December 2007.
http://www.youtube.com/watch?v=OGg8A2zfWKg.
[13] Tom De Wolf and Tom Holvoet. Emergence versus self-organisa-
tion: Different concepts but promising when combined. In Engineering
Self-Organising Systems, pages 1-15, 2004.
2 The words “Ontology”, “Schema” and “Thesaurus” are used to denote different notions of vocabularies.
3. http://www.few.vu.nl/soks/symposium
Let us look at rule rdfs3 in detail. In the Star Wars example, if
we find the two statements ex:fatherOf rdfs:range foaf:Person and
dbpedia:Anakin_Skywalker ex:fatherOf dbpedia:Luke_Skywalker,
we can add the following triple (i.e. inference) to the graph:
dbpedia:Luke_Skywalker rdf:type foaf:Person.
To calculate the RDFS closure over an RDF graph (i.e. all
triples that can be inferred from the RDFS semantics by rea-
soning), the set of entailment rules has to be applied repeatedly
to the triples in the graph.
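The repeated application of rules until nothing new can be derived can be sketched as a fixpoint loop. The following Python fragment is a minimal illustration that covers only the range rule (rdfs3) and the subclass rule (rdfs9), not the full RDFS rule set. The function name rdfs_closure and the representation of triples as tuples are assumptions made for the example, and the schema triple foaf:Person rdfs:subClassOf foaf:Agent is assumed from the Star Wars example.

```python
def rdfs_closure(triples):
    """Apply rdfs3 (range) and rdfs9 (subclass) until no new triple appears."""
    graph = set(triples)
    while True:
        new = set()
        for (a, rel, c) in graph:
            if rel == "rdfs:range":
                # rdfs3: a rdfs:range c, s a o  =>  o rdf:type c
                new |= {(o, "rdf:type", c) for (s, q, o) in graph if q == a}
            elif rel == "rdfs:subClassOf":
                # rdfs9: a rdfs:subClassOf c, x rdf:type a  =>  x rdf:type c
                new |= {(x, "rdf:type", c)
                        for (x, q, k) in graph if q == "rdf:type" and k == a}
        if new <= graph:      # fixpoint reached: nothing new was inferred
            return graph
        graph |= new


GRAPH = {
    ("ex:fatherOf", "rdfs:range", "foaf:Person"),
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
    ("dbpedia:Anakin_Skywalker", "ex:fatherOf", "dbpedia:Luke_Skywalker"),
}
closure = rdfs_closure(GRAPH)
```

Note that the foaf:Agent typing can only be derived in a second pass, after the range rule has produced the foaf:Person typing. This is why a single application of the rules is not sufficient.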
Swarm Intelligence
Inspired by the collective behaviour of flocks of birds, schools
of fish or social insects such as ants or bees, Swarm Intelligence
is a research field within Collective Intelligence that investigates
self-optimising, complex and highly structured systems.[1] Members
of a swarm co-operate to perform tasks that go beyond
the capabilities of single individuals, who function by applying
basic stimulus-response decision rules. Emergent properties
of such systems often cannot be explained by their components
alone. Swarms are characterised by a number of properties,
most importantly:
• lack of central control.
• locality: swarm-members communicate only locally,
either with their direct neighbours or via the environment;
their decisions are based on local information only.
• simplicity: individuals are relatively small in size and
typically simple in behaviour; their decisions, actions and
interactions with each other and with the environment
usually require only a few computations.
In addition, swarms may be enormous in terms of the
number of their members. These properties result in advantageous
characteristics of swarms, such as adaptiveness, flexibility,
robustness, scalability, decentralisation, parallelism and
intelligent system behaviour. These qualities are also desirable
in distributed multi-agent systems, making swarms an attractive
model.
The swarms of my thesis are inspired by Swarm Intel-
ligence as observable in ant colonies. Ants are social insects that
live in colonies of sizes between a few individuals and several
millions. They have colonised all continents except Antarctica
and contribute around 15 to 20% to the total terrestrial animal
biomass.[2] Ant colonies appear as super-organisms because
co-operating individuals with tiny and short-lived minds oper-
ate as a unified entity.[3] Large colonies are highly efficient due
to the self-organisation and functional specialisation of their
members, which leads to a division of labour within the colony.
Various functions may be carried out by individuals of differ-
ent morphology (e.g. minor, median and major workers, sol-
diers, drones and queens) and different ages. Specialisation has
evolved over time as a means of diversifying the community in
order to adapt to the environment.[4]
A characteristic property of ant colonies is their ability to
find shortest paths by indirect communication based on chemi-
cal pheromones that they drop in their environment, e.g. to mark
the shortest route whilst returning from a food-location. They
employ different types of pheromones, which they perceive with
their antennae. Attractive pheromones lead other individuals to
points of interest, alarm pheromones are employed to call for
help, and repulsive propaganda pheromones confuse enemies.
Some pheromones evaporate more slowly (e.g. best route) than others
(e.g. alarm). Ants have a special colony scent. Pheromones that
have been deposited on the ground act as a shared extended
memory and enable the spontaneous, indirect and asynchro-
nous communication of co-operating insects. This mechanism
is known as stigmergy [5], a form of self-organisation which
allows for efficient collaboration between extremely simple
agents.
Reasoning by Swarm Intelligence
It is widely recognised that new adaptive approaches towards
robust, scalable and distributed reasoning are required to
exploit the full value of ever-growing amounts of dynamic
Semantic Web data. Adaptiveness, robustness and scalability
are among the main properties of swarms, which is why
combining reasoning with Swarm Intelligence is
a promising way to obtain good reasoning performance
by simple means.
The idea is simple: an RDF graph is seen as a network,
in which each subject and object is a node and each property
an edge. A path is composed of several nodes that are con-
nected by properties (edges). Each of the beasts (members of
the swarm) represents an active reasoning rule (as shown in
Table 1) which is partially instantiated. Swarms of indepen-
dent light-weight beasts now travel from RDF node to RDF
node and from location to location, checking whether they can
derive new information according to the information that they
find on their way. Members of the swarms communicate purely
locally and indirectly. Just like ants do on the soil, beasts drop
pheromones on each visited triple.
Beasts traverse RDF graphs autonomously. When a beast
reaches a node, it has to choose its next node. This decision
can be taken based on pheromones that have been dropped
by previous beasts, or based on the elements of the triples. A
beast prefers triples that match the pattern of its rule. If such
a matching triple is found, the rule will be fired and the new
inferred triple added locally to the graph. Otherwise, an ongo-
ing path is chosen based on pheromones, preferring paths with
little or no pheromones.
Reasoning Model
The RDFS reasoning task is decomposed by distributing com-
plementary entailment rules on the members of the swarm, so
that each individual is only responsible for the application of
one rule. Therefore, one type of beast per RDF(S) entailment
rule is introduced.
Beasts are automatically instantiated by considering the
schema information in the graph. When a concrete schema tri-
ple of a certain pattern is found, a reasoning beast is generated.
Take for example the rule rdfs3 for range restrictions: whenever
a triple p rdfs:range x is encountered in the schema, a beast
of type rb3 is created with memory {p,x}. This new beast will
walk the graph, searching for the property p. Whenever property
p is found, its rule can be applied. Table 2 lists some RDFS
entailment rules, with the patterns that are to be recognised
in column 2, and the generated beasts with their memory in
column 3.
To continue with our previous example: if we find a triple
ex:fatherOf rdfs:range foaf:Person in the schema, a range-beast
rb3 with memory {ex:fatherOf, foaf:Person} is created.
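The instantiation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Beast class, the BEAST_TYPES table and the function instantiate_beasts are hypothetical names, and only the two beast types named in the text (rb3 for rdfs:range, rb9 for rdfs:subClassOf) are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beast:
    kind: str      # "rb3" (range-beast) or "rb9" (subclass-beast)
    memory: tuple  # the two terms taken from the schema triple


# Schema predicate -> beast type (assumed table, restricted to two rules).
BEAST_TYPES = {"rdfs:range": "rb3", "rdfs:subClassOf": "rb9"}


def instantiate_beasts(triples):
    """Create one beast for every schema triple whose predicate is known."""
    return [Beast(BEAST_TYPES[p], (s, o))
            for (s, p, o) in triples if p in BEAST_TYPES]


beasts = instantiate_beasts([
    ("ex:fatherOf", "rdfs:range", "foaf:Person"),
    ("dbpedia:Anakin_Skywalker", "ex:fatherOf", "dbpedia:Luke_Skywalker"),
])
```

Only the schema triple produces a beast; the instance triple about Anakin and Luke is ignored at this stage.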
Table 3 shows two beasts needed for RDFS reasoning with
their pattern-based inference rules. Reasoning beast rb3 applies
the semantics of rdfs:range, while rb9 generates the inferences
of rdfs:subClassOf (the beasts have been instantiated as shown
in Table 2). They are referred to as range-beast and
subclass-beast.
A beast rb3 is defined as a function rb3 with memory
{p,x}. Let us assume that, while traversing a graph G, the instantiated
range-beast arrives at node o from a node s via an edge
e, i.e. via the triple (s, e, o). If e = p (the edge is equal to the property in its
memory), it adds the triple (o, rdf:type, x) to G. It then moves on to a
node n via an edge f, where (o, f, n) is an element of G.
In terms of our example: every time the range-beast finds a
triple containing the property in its memory (in this case:
ex:fatherOf), it adds a new triple to the graph, which states that
the object of the found statement is of the rdf:type of the class
in its memory. So if it finds the triple dbpedia:Anakin_Skywalker
ex:fatherOf dbpedia:Luke_Skywalker, it will add a new inference
dbpedia:Luke_Skywalker rdf:type foaf:Person to the graph.
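A single step of the range-beast, as defined above, might look as follows in Python. The function name range_beast_step and the set-of-tuples graph representation are assumptions for illustration. How the beast picks one of the returned options is left out here.

```python
def range_beast_step(graph, memory, arrived_via):
    """One step of rb3: fire the rule if the edge just walked equals p.

    graph       -- set of (subject, predicate, object) triples, updated in place
    memory      -- the pair (p, x) taken from the triple p rdfs:range x
    arrived_via -- the triple (s, e, o) the beast has just walked
    """
    p, x = memory
    s, e, o = arrived_via
    if e == p:
        graph.add((o, "rdf:type", x))
    # Candidate next moves: all triples (o, f, n) that leave the current node o.
    return [t for t in graph if t[0] == o]


graph = {("dbpedia:Anakin_Skywalker", "ex:fatherOf", "dbpedia:Luke_Skywalker")}
options = range_beast_step(
    graph,
    ("ex:fatherOf", "foaf:Person"),
    ("dbpedia:Anakin_Skywalker", "ex:fatherOf", "dbpedia:Luke_Skywalker"),
)
```

After this step the graph contains the inferred typing of Luke Skywalker, and the new rdf:type edge is itself one of the options for the next move.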
Another Example
To illustrate the idea, let us consider two simple RDF graphs
about the publications cg:ISWC08 (by Christophe Guéret, at the
International Semantic Web Conference 2008) and fvh:SWP
(Semantic Web Primer, by Frank van Harmelen). Both authors are
members of the VU; the graphs are maintained separately by the
respective authors and linked to the public ontologies pub and
people, which describe publications and people.
Table 3: Inference patterns of reasoning beasts.
Table 2: Instantiation of reasoning beasts (rb stands for reasoning beast)
These two graphs denote two publications, cg:ISWC08 and fvh:SWP,
by different authors. The graphs are physically distributed
over the network and can be reasoned and queried over directly.
Their information is extended with schema information:
Given the standard RDF(S) semantics, one can derive that
cg:ISWC08 is a publication, and that the authors are instances
of class people:Person and thus people:Agent. Fig. 2 shows the
RDF graph for the first publication, cg:ISWC08. Grey lines
denote implicit links derived by reasoning.
For each of the three schema triples of the previous example,
a beast is created. For the triple people:Person rdfs:subClassOf
people:Agent, a beast rb9 is created, which is instantiated with
memory people:Person and people:Agent. The subclass-beast
for the triple pub:InProceedings rdfs:subClassOf pub:Publication is
generated accordingly. For the range-triple pub:author rdfs:range
people:Person, a beast rb3 is created with memory pub:author
and people:Person. In this example, only one beast per instantiated
type is created; in practice there are more. The beasts are
randomly distributed over the graph, say rb3 to node fvh:SWP,
and similarly for the other two beasts. Beast rb3 now has two
options to walk. Moving to “SW Primer” will get it to a dead
end, which means it has to walk back via cg:ISWC08 towards,
e.g., person:Oren. At node person:Oren, the walked path is
cg:ISWC08 pub:author person:Oren, which means that rb3’s memory
condition matches this pattern, and it will add the triple
person:Oren rdf:type people:Person to the graph. When, after
walking other parts of the graph, the subclass-beast rb9 chooses
to follow the rdf:type link from person:Oren to people:Person, it
finds its memory condition matched and adds
person:Oren rdf:type people:Agent to the graph.
This example highlights the obvious challenges faced
by the approach, most importantly that unnecessary paths need
to be investigated and that the order in which beasts visit a node
is important (rb3 had to be at person:Oren first, before rb9 could find
anything).
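The order dependence can be reproduced with a small Python sketch. It is a deliberate simplification: instead of walking the graph, each call models a beast that has visited every triple once. The function names fire_range and fire_subclass are hypothetical.

```python
def fire_range(graph, p, cls):
    """rb3 with memory {p, cls}: add (o, rdf:type, cls) for every (s, p, o)."""
    new = {(o, "rdf:type", cls) for (s, q, o) in graph if q == p} - graph
    graph |= new
    return new


def fire_subclass(graph, sub, sup):
    """rb9 with memory {sub, sup}: add (x, rdf:type, sup) for every (x, rdf:type, sub)."""
    new = {(x, "rdf:type", sup)
           for (x, q, c) in graph if q == "rdf:type" and c == sub} - graph
    graph |= new
    return new


graph = {("cg:ISWC08", "pub:author", "person:Oren")}

# rb9 arrives first: there is no rdf:type triple yet, so nothing is inferred.
too_early = fire_subclass(graph, "people:Person", "people:Agent")
# rb3 then types person:Oren as people:Person ...
by_range = fire_range(graph, "pub:author", "people:Person")
# ... and only now can rb9 derive the people:Agent typing.
by_subclass = fire_subclass(graph, "people:Person", "people:Agent")
```

The first visit of the subclass-beast is wasted, which is why beasts have to revisit parts of the graph repeatedly.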
Reasoning on distributed data
The proposed paradigm envisages the Semantic Web as a
connected collection of networks of data, which is constantly
updated by beasts. In this set-up, only the active reasoning rules
are moving in the network and not the data, minimising net-
work traffic as schema-data is typically far less numerous than
instance-data. Given some added transition capability between
graph-boundaries, the method converges towards closure (all
possible inferences are inferred). As recurrently revisiting beasts
can more easily deal with added (and even deleted) information
than index-based approaches, it can be claimed that swarm-
based reasoning is more adaptive and robust than other Seman-
tic Web reasoning approaches.
Figure 2: An RDF graph
An ideal scenario for Semantic Web reasoning would be
a large-scale distributed set-up, in which interested participants
provide their Semantic Web data in a beast-friendly way on
their own machines. Beast-friendly means that the servers
(referred to as dataproviders) are enabled to host beasts,
recommend ongoing locations, store pheromones and inferred
triples, and respond to queries for ongoing options. This supports
a distributed publication model in which the granularity
of peer data is shifted from triples to entire datasets and
each peer maintains control of its own data, as proposed in [6].
Data-owners could for example configure their dataproviders
to only answer queries or add inferences that originate from
trusted beasts. In order to enable dataproviders to recommend
suitable destinations to beasts that intend to migrate, a routing
mechanism based on Bloom filters[7] as partial routing tables
can be utilised.
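How a Bloom filter could serve as a partial routing table is sketched below in Python. This is an illustrative assumption, not the routing mechanism of the thesis: the BloomFilter class, the recommend function, the provider names and the choice of indexing predicates are all hypothetical. A Bloom filter never misses an item that was added, but it may report false positives, so a recommendation is only a hint.

```python
import hashlib


class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, size=256, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = 0  # bit array stored in a single integer

    def _positions(self, item):
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def recommend(routing_table, term):
    """Names of dataproviders whose filter may contain the given term."""
    return [name for name, bloom in routing_table.items() if term in bloom]


# Hypothetical routing table: each dataprovider summarises the predicates it holds.
table = {"provider-A": BloomFilter(), "provider-B": BloomFilter()}
table["provider-A"].add("pub:author")
table["provider-B"].add("ex:fatherOf")

destinations = recommend(table, "pub:author")
```

A range-beast with pub:author in its memory would be pointed at least to provider-A. Because of possible false positives it may occasionally be sent to a provider that holds no matching triples.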
Self-Organisation of Swarms by Pheromones
To avoid redundant graph traversal, beasts choose ongoing
paths based on pheromones. The environment provides beasts
with information about who has visited their current location
before. The swarm can use this information to act as a whole, to
avoid loops and to spread out evenly, which is useful because in
the majority of cases, the swarm gains nothing if members of the
same rule-instantiation traverse the same path more than once.
Therefore, paths with few pheromones are preferred.
When a beast reaches a node it has to choose a next ongo-
ing edge. It parses its options, and while no application of its
inference-pattern is found (which would cause it to choose the
corresponding path and fire its rule to add a new triple to the
graph), it applies a pheromone-based heuristic. It categorises all
possible paths into sets of triples: (1) the ones which are promis-
ing because no individual of its rule-instantiation has walked
them before, and (2) the ones that already have been visited.
If unwalked paths are available, one of them is chosen at ran-
dom. Otherwise the ongoing path is chosen probabilistically,
preferring less visited triples. Only pheromones of beasts with
the same rule-instantiation are considered. Further possibilities,
such as considering pheromones of other beasts, are subject to
future research.
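The pheromone-based heuristic can be sketched as follows in Python, for the case in which no triple matches the beast's inference pattern. The function name choose_next and the dictionary of visit counts are assumptions for illustration. The text only states that less visited triples are preferred, so the inverse weighting used here is one possible choice and not necessarily the one used in the thesis.

```python
import random


def choose_next(options, pheromones, rng=random):
    """Pick the next triple to walk.

    options    -- list of candidate triples leaving the current node
    pheromones -- dict: triple -> number of visits by this rule-instantiation
    rng        -- source of randomness (the random module or a random.Random)
    """
    # (1) Promising paths: never walked by a beast of this rule-instantiation.
    unwalked = [t for t in options if pheromones.get(t, 0) == 0]
    if unwalked:
        return rng.choice(unwalked)
    # (2) All paths visited: choose probabilistically, favouring fewer visits.
    weights = [1.0 / pheromones[t] for t in options]
    return rng.choices(options, weights=weights, k=1)[0]


a = ("cg:ISWC08", "pub:title", "SW Primer")
b = ("cg:ISWC08", "pub:author", "person:Oren")
first = choose_next([a, b], {a: 2})  # b has never been walked
later = choose_next([a, b], {a: 2, b: 5}, random.Random(0))
```

In the first call the unwalked triple is always chosen. In the second call both triples can be chosen, with the less visited one being more likely.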
Implementation and Experiments
I implemented the proposed reasoning method based on the
AgentScape platform. Each beast is implemented as an agent
that is able to migrate, and each distributed graph is managed by
a non-migrating agent called a dataprovider, which is linked to
other dataproviders. Beasts communicate with the local dataprovider
by exchanging messages.
Based on this implementation, the feasibility and major
characteristics of the approach are evaluated on the basis of
simulation experiments in which beasts calculate the deductive
closure of RDF graphs with respect to RDFS semantics. These
experiments have two goals: to provide a proof of concept, and to obtain a
better understanding of the intrinsic potential and challenges
of the new method. To prove the concept, I ran an RDF(S) reasoning
system based on fully decentralised agents to calculate
the semantic closure of a number of distributed RDF(S) data-
sets. Regarding the challenges of the new method, the results
basically confirm that tuning a system based on computational
intelligence is a highly complex problem. However, the experi-
ments gave crucial insights into how to proceed in future work:
most importantly on how to improve attract/repulse methods
for guiding swarms to interesting locations within the graph.
Conclusion
The main contribution of my thesis is to introduce a new
swarm-based paradigm for distributed reasoning as graph tra-
versal on the Semantic Web. The major feature of this idea is
that many light-weight autonomous beasts collectively traverse
RDF graphs and match patterns with the triples they are cur-
rently visiting, with the goal to add new triples to the graph.
Robustness is achieved by redundancy, and scalability by dis-
tributing the data towards several locations, or better stated by
leaving the datasets at their natural locations with their own-
ers, and by distributing the reasoning task towards a number of
beasts which communicate purely locally and indirectly. Anoth-
er advantage of this approach is its intrinsic adaptiveness due to
its capability to deal with dynamic data on distributed sources.
Such a reasoning procedure seems ideal for the Semantic Web.
A second contribution of my thesis consists of various ideas to extend
the framework. For further information, visit
http://beast-reasoning.net.
References:
[1] Blum, C. and Merkle, D., Swarm Intelligence: Introduction and
Applications, 2008
[2] Schultz, T.R., In search of ant ancestors, 2000
[3] Hölldobler, B. and Wilson, E. O., The Superorganism: The
Beauty, Elegance, and Strangeness of Insect Societies, 2008
[4] Seligmann, H., Resource partition history and evolutionary spe-
cialization of subunits in complex systems, 1999
[5] Dorigo, M. and Bonabeau, E. and Theraulaz, G. and others, Ant
algorithms and stigmergy, 2000
[6] Anadiotis, G. and Kotoulas, S. and Siebes, R., An Architecture
for Peer-to-peer Reasoning, 2007
[7] Bloom, B.H, Space/Time Trade-offs in Hash Coding with Allow-
able Errors, 1970
These two graphs denote two publications cg:ISWC08 and fvh:
SWP by different authors. The graphs are physically distributed
over the network and can be reasoned and queried over directly.
Their information is extended with schema information:
Given the standard RDF(S) semantics, one can derive that
cg:ISWC08 is a publication, and that the authors are instances
of class people:Person and thus of people:Agent. Fig. 2 shows the
RDF graph for the first publication, cg:ISWC08. Grey lines
denote implicit links derived by reasoning.
For the three schema triples of the previous example,
beasts are created. For the triple people:Person rdfs:subClassOf
people:Agent, a beast rb9 is created, which is instantiated with
the memory people:Person and people:Agent. The subclass-beast
for the triple pub:InProceedings rdfs:subClassOf pub:Publication is
generated accordingly. For the range-triple pub:author rdfs:range
people:Person, a beast rb is created with the memory pub:author
and people:Person. In this example, only one beast per instantiated
type is created; in practice there are more. The beasts are
randomly distributed over the graph, say rb to node fvh:SWP,
and similarly for the other two beasts. Beast rb now has two
options to walk. Moving to “SW Primer” leads it to a dead
end, which means it has to walk back via cg:ISWC08 towards,
e.g., person:Oren. At node person:Oren, the walked path is
cg:ISWC08 pub:author person:Oren, which means that rb’s memory
condition matches this pattern, and it adds the triple
person:Oren rdf:type people:Person to the graph. When, after
walking other parts of the graph, the subclass-beast rb9 chooses
to follow the rdf:type link from person:Oren to people:Person, it
finds its memory condition matched and adds person:Oren rdf:type
people:Agent to the graph.
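The two firing steps of this example can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
thesis implementation: the class names RangeBeast and SubclassBeast and
the method fire are hypothetical, triples are plain tuples of strings,
and the walk itself is left out so that only the pattern matching
remains.

```python
from dataclasses import dataclass

RDF_TYPE = "rdf:type"


@dataclass(frozen=True)
class RangeBeast:
    """Hypothetical beast whose memory holds `prop rdfs:range cls`."""
    prop: str
    cls: str

    def fire(self, walked):
        # `walked` is the (subject, predicate, object) triple just traversed.
        s, p, o = walked
        if p == self.prop:
            return (o, RDF_TYPE, self.cls)
        return None


@dataclass(frozen=True)
class SubclassBeast:
    """Hypothetical beast whose memory holds `sub rdfs:subClassOf sup`."""
    sub: str
    sup: str

    def fire(self, walked):
        s, p, o = walked
        if p == RDF_TYPE and o == self.sub:
            return (s, RDF_TYPE, self.sup)
        return None


graph = {("cg:ISWC08", "pub:author", "person:Oren")}
rb = RangeBeast("pub:author", "people:Person")
rb9 = SubclassBeast("people:Person", "people:Agent")

# Before rb has fired, rb9 finds nothing to match anywhere in the graph.
early = [rb9.fire(t) for t in graph]

# rb walks cg:ISWC08 pub:author person:Oren and adds a type triple.
new = rb.fire(("cg:ISWC08", "pub:author", "person:Oren"))
graph.add(new)

# Only now can rb9 match the rdf:type link and add the second triple.
derived = rb9.fire(new)
graph.add(derived)
```

The sketch makes the ordering dependency visible: rb9 returns nothing
until rb has added its triple.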
This example highlights the obvious challenges faced
by the approach, most importantly that unnecessary paths need
to be investigated and that the order of visiting beasts is impor-
tant (rb had to be at person:Oren first, before rb9 could find
anything).
Reasoning on distributed data
The proposed paradigm envisages the Semantic Web as a
connected collection of networks of data, which is constantly
updated by beasts. In this set-up, only the active reasoning rules
are moving in the network and not the data, minimising net-
work traffic as schema-data is typically far less numerous than
instance-data. Given some added transition capability between
graph-boundaries, the method converges towards closure (all
possible inferences are inferred). As recurrently revisiting beasts
can more easily deal with added (and even deleted) information
than index-based approaches, it can be claimed that swarm-
based reasoning is more adaptive and robust than other Seman-
tic Web reasoning approaches.
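To make the notion of closure concrete, the following sketch computes
it centrally as a fixed point, for the two rule types used in the
example only (range and subclass). It is deliberately not the
swarm-based method; it merely shows the set of triples that the beasts
would eventually have to reach. The function name closure is
hypothetical, and the instance triple cg:ISWC08 rdf:type
pub:InProceedings is an assumption made for illustration.

```python
def closure(triples):
    """Apply the range and subclass rules until no new triple appears."""
    closed = set(triples)
    while True:
        ranges = {(s, o) for s, p, o in closed if p == "rdfs:range"}
        subclasses = {(s, o) for s, p, o in closed if p == "rdfs:subClassOf"}
        new = set()
        for s, p, o in closed:
            # Range rule: the object of `prop` is an instance of `cls`.
            for prop, cls in ranges:
                if p == prop:
                    new.add((o, "rdf:type", cls))
            # Subclass rule: an instance of `sub` is an instance of `sup`.
            if p == "rdf:type":
                for sub, sup in subclasses:
                    if o == sub:
                        new.add((s, "rdf:type", sup))
        if new <= closed:
            return closed
        closed |= new


data = {
    ("pub:InProceedings", "rdfs:subClassOf", "pub:Publication"),
    ("people:Person", "rdfs:subClassOf", "people:Agent"),
    ("pub:author", "rdfs:range", "people:Person"),
    ("cg:ISWC08", "rdf:type", "pub:InProceedings"),  # assumed instance triple
    ("cg:ISWC08", "pub:author", "person:Oren"),
}
inferred = closure(data) - data
```

Here inferred holds the three implicit triples of the example. A swarm
has converged on this graph once all of them have been added.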
An ideal scenario for Semantic Web reasoning would be
a large-scale distributed set-up, in which interested participants
provide their Semantic Web data in a beast-friendly way on
their own machines. Beast-friendly means that the servers
(referred to as dataproviders) are enabled to host beasts,
recommend ongoing locations, store pheromones and inferred
triples, and respond to queries for ongoing options. This
supports a distributed publication model in which the granularity
of peer data is shifted from triples to entire datasets and
each peer maintains control of its own data, as proposed in [6].
Data-owners could for example configure their dataproviders
to only answer queries or add inferences that originate from
trusted beasts. In order to enable dataproviders to recommend
suitable destinations to beasts that intend to migrate, a routing
mechanism based on Bloom filters [7] as partial routing tables
can be utilised.
Figure 2: An RDF graph
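How a Bloom filter could serve as a partial routing table can be
sketched as follows. This is an assumption-laden illustration rather
than the routing mechanism of the thesis: the filter size, the number
of hash functions, the use of SHA-256, the provider names and the
helper candidate_destinations are all hypothetical. Each dataprovider
is summarised by a filter over the terms it hosts, and a beast is sent
only to providers whose filter may contain the term in its memory.

```python
import hashlib


class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive k bit positions by salting the item with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def candidate_destinations(routing_table, term):
    """Return the providers whose filter may contain `term`."""
    return [name for name, bf in routing_table.items() if term in bf]


# Hypothetical routing table with one filter per remote dataprovider.
table = {"provider-A": BloomFilter(), "provider-B": BloomFilter(),
         "provider-C": BloomFilter()}
table["provider-A"].add("pub:author")
table["provider-B"].add("people:Person")

targets = candidate_destinations(table, "pub:author")
```

A false positive costs a beast one wasted migration, whereas a provider
that really hosts the term is never missed, which is the trade-off that
makes the structure attractive for routing.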
Self-Organisation of Swarms by Pheromones
To avoid redundant graph traversal, beasts choose ongoing
paths based on pheromones. The environment provides beasts
with information about who has visited their current location
before. This can be utilised by the swarm to act as a whole, to
avoid loops and to spread out evenly, which is useful because in
the majority of cases, the swarm has no gain if members of the
same rule-instantiation traverse the same path more than once.
Therefore, paths with little pheromone are preferred.
When a beast reaches a node it has to choose a next ongo-
ing edge. It parses its options, and while no application of its
inference-pattern is found (which would cause it to choose the
corresponding path and fire its rule to add a new triple to the
graph), it applies a pheromone-based heuristic. It categorises all
possible paths into two sets of triples: (1) the ones which are
promising because no individual of its rule-instantiation has walked
them before, and (2) the ones that have already been visited.
If unwalked paths are available, one of them is chosen at ran-
dom. Otherwise the ongoing path is chosen probabilistically,
preferring less visited triples. Only pheromones of beasts with
the same rule-instantiation are considered. Further possibilities,
such as considering pheromones of other beasts, are subject to
future research.
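The heuristic described above can be sketched as a single selection
function. The text only states that the choice is probabilistic and
prefers less visited triples; weighting each edge by the inverse of its
pheromone count is an assumption made here, as are the names
choose_edge and pheromone. The preceding step, in which a beast fires
its rule when its pattern matches, is omitted, and the list of edges is
assumed to be non-empty.

```python
import random


def choose_edge(edges, pheromone, rng=random):
    """Pick the next edge for a beast.

    edges     -- outgoing triples at the current node
    pheromone -- visit counts left by beasts of the same rule-instantiation
    """
    # (1) Prefer edges that no beast of this rule-instantiation has walked.
    unwalked = [e for e in edges if pheromone.get(e, 0) == 0]
    if unwalked:
        return rng.choice(unwalked)
    # (2) Otherwise choose probabilistically, favouring less visited edges.
    #     Inverse-count weighting is an assumed choice, not from the source.
    weights = [1.0 / pheromone[e] for e in edges]
    return rng.choices(edges, weights=weights, k=1)[0]


edges = [
    ("cg:ISWC08", "pub:author", "person:Oren"),
    ("cg:ISWC08", "pub:title", "SW Primer"),  # hypothetical property name
]
pheromone = {edges[1]: 4}  # the second edge was walked four times already

step = choose_edge(edges, pheromone, random.Random(0))
pheromone[step] = pheromone.get(step, 0) + 1  # deposit pheromone on the way
```

Because the first edge is still unwalked it is chosen deterministically;
the random choice only comes into play once every edge carries
pheromone.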
Implementation and Experiments
I implemented the proposed reasoning method on top of the
AgentScape platform. Each beast is implemented as an agent
that is able to migrate, and each distributed graph is managed by
a non-migrating agent called a dataprovider, which is linked to
other dataproviders. Beasts communicate with the local
dataprovider by exchanging messages.
Based on this implementation, the feasibility and major
characteristics of the approach are evaluated in
simulation experiments in which beasts calculate the deductive
closure of RDF graphs with respect to RDFS semantics. These
experiments have two goals: to provide a proof of concept, and
to obtain a better understanding of the intrinsic potential and
challenges of the new method. To prove the concept, I ran an
RDF(S) reasoning system based on fully decentralised agents to
calculate the semantic closure of a number of distributed RDF(S)
datasets. Regarding the challenges of the new method, the results
confirm that tuning a system based on computational
intelligence is a highly complex problem. However, the experiments
gave crucial insights into how to proceed in future work,
most importantly on how to improve attract/repulse methods
for guiding swarms to interesting locations within the graph.
Conclusion
The main contribution of my thesis is to introduce a new
swarm-based paradigm for distributed reasoning as graph
traversal on the Semantic Web. The major feature of this idea is
that many light-weight autonomous beasts collectively traverse
RDF graphs and match patterns with the triples they are
currently visiting, with the goal of adding new triples to the graph.
Robustness is achieved by redundancy, and scalability by
distributing the data across several locations, or, better stated, by
leaving the datasets at their natural locations with their
owners, and by distributing the reasoning task among a number of
beasts which communicate purely locally and indirectly.
Another advantage of this approach is its intrinsic adaptiveness, due to
its capability to deal with dynamic data on distributed sources.
Such a reasoning procedure seems ideal for the Semantic Web.
A second contribution of my thesis is a set of ideas for extending
the framework. For further information, visit
http://beast-reasoning.net.
References:
[1] Blum, C. and Merkle, D., Swarm Intelligence: Introduction and
Applications, 2008
[2] Schultz, T. R., In search of ant ancestors, 2000
[3] Hölldobler, B. and Wilson, E. O., The Superorganism: The
Beauty, Elegance, and Strangeness of Insect Societies, 2008
[4] Seligmann, H., Resource partition history and evolutionary
specialization of subunits in complex systems, 1999
[5] Dorigo, M. and Bonabeau, E. and Theraulaz, G. and others, Ant
algorithms and stigmergy, 2000
[6] Anadiotis, G. and Kotoulas, S. and Siebes, R., An Architecture
for Peer-to-peer Reasoning, 2007
[7] Bloom, B. H., Space/Time Trade-offs in Hash Coding with
Allowable Errors, 1970