
Modeling concept evolution: a historical perspective

by Flavio Rizzolo, Yannis Velegrakis, John Mylopoulos, Siarhei Bykau
ER (2009)

Abstract

The world is changing, and so must the data that describes its history. Not surprisingly, considerable research effort has been spent in Databases along this direction, covering topics such as temporal models and schema evolution. A topic that has not received much attention, however, is that of concept evolution. For example, Germany (instance-level concept) has evolved several times in the last century as it went through different governance structures, then split into two national entities that eventually joined again. Likewise, a caterpillar is transformed into a butterfly, while a mother becomes two (maternally-related) entities. As well, the concept of Whale (a class-level concept) changed over the past two centuries thanks to scientific discoveries that led to a better understanding of what the concept entails. In this work, we present a formal framework for modeling, querying and managing such evolution. In particular, we describe how to model the evolution of a concept, and how this modeling can be used to answer historical queries of the form How has concept X evolved over period Y. Our proposal extends an RDF-like model with temporal features and evolution operators. Then we provide a query language that exploits these extensions and supports historical queries.


Available from www.springerlink.com
University of Trento, Trento, 38100, Italy
{flavio, velgias, jm, bykau}@disi.unitn.eu
1 Introduction
Conceptual modelling languages – including the ER Model, UML class diagrams and
Description Logics – are all founded on a notion of “entity” that represents a “thing” in
the application domain. Although the state of an entity can change over its lifetime, enti-
ties themselves are atomic and immutable. Unfortunately, this feature prevents existing
modeling languages from capturing phenomena that involve the evolution of an entity
into something else, such as a caterpillar becoming a butterfly, or Germany splitting off
into two Germanies right after WWII. In these cases, there is general agreement that an
entity evolves into one or more different entities. Moreover, there is a strong relationship
between the two, which is not captured by merely deleting one entity and then creating
another. Evolution applies not only at the instance level, but also at a class-level. For
example, the concept of Whale evolved over the past two centuries in the sense that
whales were considered some sort of fish, but are now recognized as mammals. We are
interested in modeling this kind of evolution relationship, both at the instance- and at
the class-level.
In Databases, a considerable amount of research effort has been spent on the development
of models, techniques and tools for modeling and managing data changes. These
range from data manipulation languages, and maintenance of views under changes [1],
to schema evolution [2] and mapping adaptation [3]. To cope with the history of data
changes, temporal models have been proposed for the relational [4] and ER [5] models,
for semi-structured data [6], XML [7] and for RDF [8]. Almost in its entirety, existing
work on data changes is based on a data-oriented point of view. It aims at recording
and managing changes that take place in the values of the data. What has been
completely overlooked are other types of changes, such as an entity evolving/mutating
into another, or an entity “splitting off” into several others.
In this work, we propose a framework for modeling the evolution of an entity into
one or more different entities, as well as the inter-dependencies among the pre- and post-
evolution entities. The framework allows posing new kinds of queries that previously
could not have been expressed. For instance, we aim at supporting queries of the form:
How has an entity evolved over time? From what other entities has it evolved and into
what others has it resulted? What other entities have affected its evolution over time?
What concepts are indirectly related to it and how? These kinds of queries are of major
importance for many interesting areas:
Historical Studies. Modern historians are interested in studying the history of human
achievements, events and important persons. In addition, they want to understand how
systems, tools, concepts, and techniques have evolved throughout history. For them
it is not enough to query a data source for a specific moment in history. They need to ask
questions on how concepts and the relationships that exist between them have changed
over time. Historians may be interested in the evolution of countries like Germany,
with respect to territory, political division, etc. Alternatively, they may want to study
the evolution of scientific topics, e.g., how the concept of biotechnology has evolved
from its beginnings as an agricultural technology to the current notion that is coupled
to genetics and molecular biology.
Entity Management. Web application and integration systems are progressively mov-
ing from tuple and value-based towards entity-based solutions, i.e., systems in which the
basic data unit is an entity, independently of its logical modeling [9]. Furthermore, web
integration systems, in order to achieve interoperability, may need to provide unique
identification for the entities in the data they exchange [10]. Unfortunately, entities do
not remain static over time. They evolve, merge, split, get created and disappear. Knowing
the history of each entity, i.e., how it has been formed and from what, paves the
way for successful entity management solutions and effective information exchange.
Life Sciences. One of the fields of Biology is the study of the evolution of species since
life started on earth. To better understand the secrets of nature, it is important to model
how the different species have evolved, from what, if and how they have disappeared,
and when.
In this work we present a framework for modeling the evolution of concepts over
time and the evolving relationships among them. In particular, our contributions are the
following: (i) we consider a conceptual model enhanced with the concept of a lifetime
of a class/individual and even of a relationship, as a first-class citizen; (ii) we further
extend the temporal model [8] with consistency conditions and additional constructs
to model merges, splits, and other forms of evolution among concepts and individuals;
(iii) we introduce a query language that allows the answering of queries regarding the
lifetime of concepts as well as the way they have evolved over time, along with other
associated (via evolution) concepts; and finally, (iv) we present a case study in which
we have applied our framework and we describe our findings.
Fig. 1. The concepts of Germany, in a temporal model (a) and in our evolutionary model (b)
2 A Motivating Example
Consider the knowledge base of a historian that records information about countries and
their political governance. A fraction of that information modeling a part of the history
of Germany is illustrated in Figure 1. In it, the status of Germany at different times
in history has been modeled through different individuals or through different
instantiations. In particular, from 1871 until 1945, Germany was first an empire
and later a republic. This change is modeled by the multiple instantiation of Germany
to Empire and Republic respectively. Shortly after the end of World War II, Germany
was split into four zones (see footnote 1) that in 1949 formed East and West Germany. These two parts
lasted until 1990, when they were merged to form the republic of Germany as we know
it today, which is modeled through the individual Reunified Germany.
To model the validity of each state of Germany at the different periods, a temporal
model similar to temporal RDF [8] can be used. The model associates to each concept or
individual a specific time frame. The time frames assigned to the individuals that model
Germany are illustrated in Figure 1 through the intervals next to each individual. The
same applies to every property, instance and subclass relationship. Note, for example,
how the instantiation relationship of Germany to Empire has a temporal interval from
1871 to 1918, while the one to Republic has a temporal interval from 1918 to 1945.
It is important for such a knowledge base to contain no inconsistencies, i.e., situations
like having an instantiation relationship with a temporal interval bigger than the interval
of the class it instantiates. Although temporal RDF lacks this kind of axiomatization, a
mechanism for consistency checking needs to be in place for the accurate representation
of concepts and individuals in history. Our solution provides an axiomatization of the
temporal RDF that guarantees the consistency of the historical information recorded in
a knowledge base.
1 The four zones are not illustrated in Figure 1.
Consider now the case of a historian that is interested in studying the history of Ger-
many. Using traditional query mechanisms, the historian will only be able to retrieve
the individual Germany. Using keyword based techniques, she may be able to also re-
trieve the remaining individuals modeling Germany at different times in history, but this
under the assumption that each such individual contains the keyword “Germany”. Ap-
plying terminology evolution [11], it is possible to infer that the four terms for Germany,
i.e., Germany, East Germany, West Germany, and Reunified Germany refer to related
concepts regardless of the keywords that appear in the terms. Yet in neither case will the
historian be able to reconstruct how the constituents of Germany have changed
over time. She will not be able to find that East and West Germany were made by
splitting the pre-war Germany and its parts, nor that East and West Germany were
the same two that merged to form the Reunified Germany.
We propose the use of explicit constructs to allow the modeling of the sequential
conceptual evolution in knowledge bases. A small illustration of such constructs (split,
merge, etc.) can be found in Figure 1(b).
Assuming such constructs are in place, let the historian pose the question whether
Berlin is part of the Reunified Germany. Since Berlin is modeled to be part of Germany
and not of Reunified Germany, the historian will get a negative answer. Since the Re-
unified Germany was formed (through some intermediate steps) from the initial pre-war
Germany, one would expect the answer to the query of whether Berlin is part of Reuni-
fied Germany to be positive. Furthermore, a typical historian query is the one asking
for the list of all the leaders of a country in history. Gerhard Schroeder was a leader
of the West Germany and of the Reunified Germany, but not of the pre-War Germany.
However, the historian should be able to express queries that can collect the leaders of
Germany from all its phases through time.
3 Temporal Knowledge Bases
We consider an RDF-like data model. The model is expressive enough to represent ER
models and the majority of ontologies and schemas that are met in practice [12]. It
does not include certain OWL Lite features such as sameAs or equivalentClass, since
these features have been considered to be out of the main scope of this work and their
omission does not restrict the functionality of the model.
We assume the existence of an infinite set of resources $\mathcal{U}$, each with a unique
resource identifier (URI), and a set of literals $\mathcal{L}$. A property is a relationship between
two resources. Properties are considered resources.
We consider the existence of the special properties: rdfs:type, rdfs:domain, rdfs:range,
rdfs:subClassOf and rdfs:subPropertyOf, which we denote for simplicity as type, dom,
rng, subc, and subp, respectively. The set $\mathcal{U}$ contains three special resources: rdfs:Property,
rdfs:Class and rdf:Thing, which we denote for simplicity as Prop, Class and Thing,
respectively. The semantics of these resources as well as the semantics of the special
properties are those defined in RDFS [13].
Resources are described by a set of triples that form a knowledge base.
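Definition 1. A knowledge base $\Sigma$ is a tuple $\langle U, L, T \rangle$, where $U \subseteq \mathcal{U}$, $L \subseteq \mathcal{L}$,
$T \subseteq U \times U \times \{U \cup L\}$, and $U$ contains the resources rdfs:Property, rdfs:Class, and rdf:Thing. The
set of classes of the knowledge base $\Sigma$ is the set $C = \{x \mid \exists \langle x, \text{type}, \text{rdfs:Class} \rangle \in T\}$.
Similarly, the set of properties is the set $P = \{x \mid \exists \langle x, \text{type}, \text{rdfs:Property} \rangle \in T\}$. The
set $P$ must contain the RDFS properties type, dom, rng, subc, and subp. A resource $i$
is said to be an instance of a class $c \in C$ (or of type $c$) if $\exists \langle i, \text{type}, c \rangle \in T$. The set of
instances is the set $I = \{i \mid \exists \langle i, \text{type}, y \rangle \in T\}$.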
A knowledge base can be equivalently represented as a hypergraph, which we call
an RDF graph. Thus, in the rest of the paper, we use the terms knowledge base and RDF
graph interchangeably.
Definition 2. An RDF graph of a knowledge base $\Sigma$ is a hypergraph in which nodes
represent resources and literals and the edges represent triples.
Example 1. Figure 1(a) is an illustration of an RDF graph. The nodes Berlin and Germany
represent resources. The edge labeled part-of between them represents the triple
$\langle \text{Berlin}, \text{part-of}, \text{Germany} \rangle$. The label of the edge, i.e., part-of, represents a property.
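As a minimal sketch of Definition 1, the sets of classes, properties and instances can be computed directly from the triple set. The Python encoding below (plain tuples for triples, the helper names classes, properties and instances, and the tiny example data) is an illustrative assumption and not part of the model itself.

```python
# Hypothetical encoding of Definition 1: a knowledge base as a set of
# (subject, property, object) tuples. All names here are illustrative.
TYPE, CLASS, PROP = "type", "rdfs:Class", "rdfs:Property"

def classes(triples):
    """C = {x | <x, type, rdfs:Class> in T}"""
    return {s for (s, p, o) in triples if p == TYPE and o == CLASS}

def properties(triples):
    """P = {x | <x, type, rdfs:Property> in T}"""
    return {s for (s, p, o) in triples if p == TYPE and o == PROP}

def instances(triples):
    """I = {i | <i, type, y> in T for some y}"""
    return {s for (s, p, o) in triples if p == TYPE}

kb = {
    ("Empire", TYPE, CLASS),
    ("part-of", TYPE, PROP),
    ("Germany", TYPE, "Empire"),
    ("Berlin", "part-of", "Germany"),
}
```

Note that, read literally, the set of instances also contains classes and properties, since they too appear as subjects of type triples.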
To support the temporal dimension in our model, we adopt the approach of Tem-
poral RDF [8] which extends RDF by associating to each triple a time frame. Unfor-
tunately, this extension is not enough for our goals. We need to add time semantics
not only to relationships between resources (what the triples represent), but also to
resources themselves by providing temporally varying classes and individuals. This addition
and the consistency conditions we introduce below are our temporal extensions
to the temporal RDF data model.
We consider time as a discrete, totally ordered domain $\mathbf{T}$ in which we define different
granularities. Following [14], a granularity is a mapping from integers to granules (i.e.,
subsets of the time domain $\mathbf{T}$) such that contiguous integers are mapped to non-empty
granules and granules within one granularity are totally ordered and do not overlap.
Days and months are examples of two different granularities, in which each granule is
a specific day in the former and a month in the latter. Granularities define a lattice in
which granules in some granularities can be aggregated into larger granules in coarser
granularities. For instance, months are a coarser granularity than days because every
granule in the former (a month) is composed of an integer number of granules in the
latter (days). In contrast, months are not coarser (nor finer) than weeks because not
every month is composed of an integer number of weeks (and obviously no week is
composed of an integer number of months).
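The coarser-than relation between granularities can be illustrated with a small sketch. It assumes, purely for illustration, that time points are calendar days within a fixed window (the year 2009) and that a granularity is a list of disjoint sets of days; the helper names granules and is_coarser are hypothetical.

```python
from datetime import date, timedelta

def granules(key, first, last):
    """Partition the days first..last into granules; key(day) identifies the granule."""
    groups = {}
    day = first
    while day <= last:
        groups.setdefault(key(day), set()).add(day)
        day += timedelta(days=1)
    return list(groups.values())

def is_coarser(coarse, fine):
    """True if every coarse granule is exactly a union of whole fine granules."""
    for big in coarse:
        covered = set()
        for small in fine:
            if small <= big:          # small lies entirely inside big
                covered |= small
        if covered != big:
            return False
    return True

# Assumed window; weeks that cross its boundary are truncated to it.
first, last = date(2009, 1, 1), date(2009, 12, 31)
days = granules(lambda d: d, first, last)
weeks = granules(lambda d: tuple(d.isocalendar())[:2], first, last)  # ISO (year, week)
months = granules(lambda d: (d.year, d.month), first, last)
```

With these definitions, is_coarser(months, days) holds, while neither is_coarser(months, weeks) nor is_coarser(weeks, months) does, which matches the month/week discussion above.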
Even though we model time as a point-based temporal domain, we use intervals
as abbreviations of sets of instants whenever possible. An ordered pair $[a, b]$ of time
points, with $a$, $b$ granules in a granularity, and $a \le b$, denotes the closed interval from
$a$ to $b$. As in most temporal models, the current time point will be represented with the
distinguished word Now. We will use the symbol $\mathcal{T}$ to represent the infinite set of all
the possible temporal intervals over the temporal domain $\mathbf{T}$, and the expressions $i.start$
and $i.end$ to refer to the starting and ending time points of an interval $i$. Given two
intervals $i_1$ and $i_2$, we will denote by $i_1 \sqsubseteq i_2$ the containment relationship between the
intervals in which $i_2.start \le i_1.start$ and $i_1.end \le i_2.end$.
Definition 3. A temporal knowledge base $\Sigma_T$ is a tuple $\langle U, L, T, \tau \rangle$, where $\langle U, L, T \rangle$
is a knowledge base and $\tau$ is a function that maps every resource $r \in U$ to a temporal
interval in $\mathcal{T}$. The temporal interval is also referred to as the lifespan of the resource.
The expressions $r.start$ and $r.end$ denote the start and end points of the interval,
respectively. The temporal graph of $\Sigma_T$ is the RDF graph of $\langle U, L, T \rangle$ enhanced with
the temporal intervals on the edges and nodes.
For a temporal knowledge base to be semantically meaningful, the lifespans of the
resources need to satisfy certain conditions. For instance, it is not logical to have an
individual with a lifespan that does not contain any common time points with the lifespan
of the class it belongs to. Temporal RDF does not provide such a mechanism; thus, we
introduce the notion of a consistent temporal knowledge base.
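Definition 4. A consistent temporal knowledge base is a temporal knowledge base
$\Sigma_T = \langle U, L, T, \tau \rangle$ that satisfies the following conditions:
1. $\forall r \in L \cup \{\text{Prop}, \text{Class}, \text{Thing}, \text{type}, \text{dom}, \text{rng}, \text{subc}, \text{subp}\}$: $\tau(r) = [0, \text{Now}]$;
2. $\forall \langle d, p, r \rangle \in T$: $\tau(\langle d, p, r \rangle) \sqsubseteq \tau(d)$ and $\tau(\langle d, p, r \rangle) \sqsubseteq \tau(r)$;
3. $\forall \langle d, p, r \rangle \in T$ with $p \in \{\text{type}, \text{subc}, \text{subp}\}$: $\tau(d) \sqsubseteq \tau(r)$.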
Intuitively, literals and the special resources and properties defined in RDFS need to
be valid during the entire lifespan of the temporal knowledge base, which is [0, Now]
(Condition 1). In addition, the lifespan of a triple needs to be within the lifespan of the
resources that the triple associates (Condition 2). Finally, the lifespan of a resource has
to be within the lifespan of the class the resource instantiates, and any class or property
needs to be within the lifespan of its superclasses or superproperties (Condition 3).
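The three conditions of Definition 4 can be checked mechanically. The following is a minimal sketch under stated assumptions: lifespans are (start, end) pairs, Now is approximated by infinity, the lifespan function is a dictionary keyed by both resources and triples, and the lifespan given to the class Empire is hypothetical. The function names within and violations are illustrative.

```python
NOW = float("inf")   # assumed stand-in for the distinguished time point Now
SPECIAL = {"Prop", "Class", "Thing", "type", "dom", "rng", "subc", "subp"}

def within(inner, outer):
    """Interval containment: outer.start <= inner.start and inner.end <= outer.end."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def violations(tau, triples, literals=()):
    """Return (condition number, offending item) pairs for Definition 4.
    tau maps resources and triples to their lifespans."""
    found = []
    for r in SPECIAL | set(literals):
        if r in tau and tau[r] != (0, NOW):            # Condition 1
            found.append((1, r))
    for t in triples:
        d, p, r = t
        if not (within(tau[t], tau[d]) and within(tau[t], tau[r])):
            found.append((2, t))                       # Condition 2
        if p in {"type", "subc", "subp"} and not within(tau[d], tau[r]):
            found.append((3, t))                       # Condition 3
    return found

t = ("Germany", "type", "Empire")
# Germany and the triple use intervals from the running example;
# the lifespan of the class Empire is an assumption.
tau = {"Germany": (1871, 1945), "Empire": (0, NOW), t: (1871, 1918)}
```

Calling violations(tau, [t]) reports nothing, whereas stretching the triple's interval beyond Germany's lifespan breaks Condition 2, and shrinking the lifespan of Empire below that of Germany breaks Condition 3.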
4 Modeling evolution
Apart from the temporal dimension that was previously described, two new dimensions
need to be taken into consideration to successfully model evolution: the mereological
and the causal.
Mereology [15] is a sub-discipline in philosophy that deals with the ontological
investigation of the part-whole relationships. It is used in our model to capture the
parthood relationship between concepts in a way that is carried forward as concepts
evolve. Such a relationship is modeled through the introduction of the special property
part-of, which is reflexive, antisymmetric and transitive. A property part-of is defined
from a resource x to a resource y if the concept modeled by resource x is part of the
concept modeled by resource y. Apart from this special semantics, part-of behaves
as any other property in a temporal knowledge base. For readability and presentation
reasons, we may use the notation $x \xrightarrow{\text{part-of}} y$ to represent the existence of a triple
$\langle x, \text{part-of}, y \rangle$ in the set $T$ of a temporal knowledge base $\Sigma_T$.
To capture the causal relationships, i.e., the interdependency between two resources
whose lifespans do not overlap, we additionally introduce the notion of becomes,
which is an antisymmetric and transitive relation. For similar reasons as before, we
may use the notation $x \xrightarrow{\text{becomes}} y$ to represent the fact that $(x, y) \in \text{becomes}$. Intuitively,
$x \xrightarrow{\text{becomes}} y$ means that the concept modeled by resource $y$ originates from the concept
modeled by resource $x$. In addition to not overlapping, we require that $\tau(x).end < \tau(y).start$.
Fig. 2. Liaison examples
To effectively model evolution, we introduce the notion of a liaison. A liaison be-
tween two concepts is another concept that keeps the former two linked together in
time by means of part-of and becomes. A liaison is part of at least one of the con-
cepts it relates and has some causal relationship to a part of the other. This situation is
graphically depicted in Figure 2 (a). The boxes A and B represent the two main con-
cepts whereas the x and y represent two of their respective parts. Figure 2 (b) illustrates
the special case in which x and y are actually the same concept. Figure 2 (c) (respec-
tively, (d)) shows the special case in which y (respectively, x) is exactly the whole of B
(respectively, A) rather than a part of it.
Definition 5 (Liaison). Let $A$, $B$ be two concepts of a temporal knowledge base with
$\tau(A).start < \tau(B).start$, and $x$, $y$ concepts for which $x \xrightarrow{\text{part-of}} A$ and $y \xrightarrow{\text{part-of}} B$. A concept
$x$ (or $y$) is said to be a liaison between $A$ and $B$ if either $x \xrightarrow{\text{becomes}} y$, or $x \xrightarrow{\text{part-of}} B$,
or $x \xrightarrow{\text{becomes}} B$, or $A \xrightarrow{\text{becomes}} y$.
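The four cases of Definition 5 can be sketched as a simple predicate. The encoding is an assumption made for illustration: part-of and becomes are given as sets of pairs, start maps each concept to the start of its lifespan, and the generic labels A, B, x, y follow Figure 2. The function name is_liaison_pair is hypothetical.

```python
def is_liaison_pair(x, y, A, B, part_of, becomes, start):
    """True if x (part of A) and y (part of B) satisfy one of the four
    cases of Definition 5. part_of and becomes are sets of pairs."""
    def part(u, v):
        # part-of is reflexive, so a concept counts as a part of itself
        return u == v or (u, v) in part_of

    if not (start[A] < start[B] and part(x, A) and part(y, B)):
        return False
    return ((x, y) in becomes      # x becomes y
            or part(x, B)          # x is also a part of B
            or (x, B) in becomes   # x becomes the whole of B
            or (A, y) in becomes)  # the whole of A becomes y

start = {"A": 1, "B": 2}
part_of = {("x", "A"), ("y", "B")}
```

For example, with becomes = {("x", "y")} the pair (x, y) links A and B as in Figure 2 (a); with an empty becomes relation it does not.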
To model the different kinds of evolution events that may exist, we introduce four
evolution terms: join, split, merge, and detach.
[join] The join term, denoted as $\text{join}(c_1, \ldots, c_n, c, t)$, models the fact that when the new
concept $c$ is born at time $t$, every part of it comes from a part (or the whole) of some
concept in $\{c_1, \ldots, c_n\}$. In particular:
– $\tau(c).start = t$
– $\forall x$ s.t. $x \xrightarrow{\text{part-of}} c$: $\exists c_i$ s.t. $x$ is a liaison between $c_i$ and $c$, with $1 \le i \le n$
[split] The split term, denoted as $\text{split}(c, c_1, \ldots, c_n, t)$, is used to model the fact that
when the lifespan of a concept $c$ ends at a time $t$, every part of $c$ becomes either one of
the new independent concepts $c_1, \ldots, c_n$, or one of their parts. In particular:
– $\tau(c).end = t$
– $\forall x$ s.t. $x \xrightarrow{\text{part-of}} c$: $\exists c_i$ s.t. $x$ is a liaison between $c$ and $c_i$, with $1 \le i \le n$
[merge] The merge term, denoted as $\text{merge}(c, c', t)$, is used to model the fact that
the lifespan of a concept $c$ ends at a time $t$ and at least a part of it becomes part of an
existing concept $c'$. In particular:
– $\tau(c).end = t$
– $\exists x$ s.t. $x \xrightarrow{\text{part-of}} c$: $x$ is a liaison between $c$ and $c'$
[detach] The detach term, denoted as $\text{detach}(c, c', t)$, is used to model the fact that
the new concept $c'$ is formed at a time $t$ with at least one part from $c$. In particular:
– $\tau(c').start = t$
– $\exists x$ s.t. $x \xrightarrow{\text{part-of}} c$: $x$ is a liaison between $c$ and $c'$
We record the becomes relation and the evolution terms in the temporal knowledge
base as evolution triples $\langle c, \text{term}, c' \rangle$, where term is one of the special evolution
properties becomes, join, split, merge, and detach. Evolution properties are meta-
temporal, i.e., they describe how the temporal model changes, and thus their triples do
not need to satisfy the consistency conditions in Definition 4. A temporal knowledge
base with a set of evolution properties and triples defines an evolution base.
Definition 6. An evolution base $\Sigma_{ET}$ is a tuple $\langle U, L, T, E, \tau \rangle$, where $\langle U, L, T, \tau \rangle$ is
a temporal knowledge base, $U$ contains a set of evolution properties, and $E$ is a set
of evolution triples. The evolution graph of $\Sigma_{ET}$ is the temporal graph of $\langle U, L, T, \tau \rangle$
enhanced with edges representing the evolution triples.
The time in which the evolution event took place does not need to be recorded ex-
plicitly in the triple since it can be retrieved from the lifespan of the involved concepts.
For instance, detach(Kingdom of the Netherlands, Belgium, 1831) is modeled as the
triple $\langle \text{Kingdom of the Netherlands}, \text{detach}, \text{Belgium} \rangle$ with $\tau(\text{Belgium}).start = 1831$.
For recording evolution terms that involve more than two concepts, e.g., the join,
multiple triples are needed. We assume that the terms are indexed by their time; thus, the
set of (independent) triples that belong to the same term can be easily detected, since
they will all share the same start or end time in the lifespan of the respective concept.
For instance, split(Germany, East Germany, West Germany, 1949) is represented in
our model through the triples $\langle \text{Germany}, \text{split}, \text{East Germany} \rangle$ and $\langle \text{Germany}, \text{split},
\text{West Germany} \rangle$ with $\tau(\text{East Germany}).start = \tau(\text{West Germany}).start = 1949$.
Note that the evolution terms may entail facts that are not explicitly represented in
the knowledge base. For instance, the split of Germany into West and East implies the
fact that Berlin, which is explicitly defined as part of Germany, becomes part of either
East or West. This kind of reasoning is beyond the scope of the current work.
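To illustrate how a multi-concept term can be recovered from independent evolution triples, the sketch below regroups split triples by their source concept and by the shared start time of the resulting concepts. The tuple encoding, the lifespan dictionary and the function name split_terms are assumptions made for this example only.

```python
from collections import defaultdict

def split_terms(evolution_triples, lifespan):
    """Regroup <c, split, c_i> triples into split(c, c_1..c_n, t) terms,
    using the shared start time of the resulting concepts as the index."""
    groups = defaultdict(set)
    for (c, term, c_new) in evolution_triples:
        if term == "split":
            groups[(c, lifespan[c_new][0])].add(c_new)
    return {(c, tuple(sorted(parts)), t) for (c, t), parts in groups.items()}

E = {
    ("Germany", "split", "East Germany"),
    ("Germany", "split", "West Germany"),
}
# (start, end) lifespans of the concepts produced by the split
lifespan = {"East Germany": (1949, 1990), "West Germany": (1949, 1990)}
```

Here split_terms(E, lifespan) yields the single term (Germany, (East Germany, West Germany), 1949), without the time having been stored in the triples themselves.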
5 Query Language
We define a navigational query language to traverse temporal and evolution edges in
an evolution graph. This language is analogous to nSPARQL [16], a language that
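extends SPARQL with navigational capabilities based on nested regular expressions.
nSPARQL uses four different axes, namely self, next, edge, and node, for navigation
on an RDF graph and node label testing. We have extended the nested regular
expression constructs of nSPARQL with temporal semantics and a set of five evolution
axes, namely join, split, merge, detach, and becomes, that extend the traversing
capabilities of nSPARQL to the evolution edges. The language is defined according to
the following grammar:
$exp := axis \mid \text{t-axis}::a \mid \text{t-axis}::[exp] \mid exp[I] \mid exp/exp \mid exp|exp \mid exp^*$
where $a$ is a node in the graph, $I$ is a time interval, and axis can be either forward,
backward, e-edge, e-node, a t-axis or an e-axis, with $\text{t-axis} \in \{\text{self}, \text{next},
\text{edge}, \text{node}\}$ and $\text{e-axis} \in \{\text{join}, \text{split}, \text{merge}, \text{detach}, \text{becomes}\}$.
\[
\begin{array}{rcl}
\mathcal{E}[\![\text{self}]\!] & := & \{\langle x, x, \tau(x)\rangle \mid x \in U \cup L\} \\
\mathcal{E}[\![\text{self}::r]\!] & := & \{\langle r, r, \tau(r)\rangle\} \\
\mathcal{E}[\![\text{next}]\!] & := & \{\langle x, y, \tau(t)\rangle \mid t = \langle x, z, y\rangle \in T\} \\
\mathcal{E}[\![\text{next}::r]\!] & := & \{\langle x, y, \tau(t)\rangle \mid t = \langle x, r, y\rangle \in T\} \\
\mathcal{E}[\![\text{edge}]\!] & := & \{\langle x, y, \tau(t)\rangle \mid t = \langle x, y, z\rangle \in T\} \\
\mathcal{E}[\![\text{edge}::r]\!] & := & \{\langle x, y, \tau(t)\rangle \mid t = \langle x, y, r\rangle \in T\} \\
\mathcal{E}[\![\text{node}]\!] & := & \{\langle x, y, \tau(t)\rangle \mid t = \langle z, x, y\rangle \in T\} \\
\mathcal{E}[\![\text{node}::r]\!] & := & \{\langle x, y, \tau(t)\rangle \mid t = \langle r, x, y\rangle \in T\} \\
\mathcal{E}[\![\text{e-edge}]\!] & := & \{\langle x, \text{e-axis}, [0, \text{Now}]\rangle \mid t = \langle x, \text{e-axis}, z\rangle \in E\} \\
\mathcal{E}[\![\text{e-node}]\!] & := & \{\langle \text{e-axis}, y, \tau(t)\rangle \mid t = \langle z, \text{e-axis}, y\rangle \in E\} \\
\mathcal{E}[\![\text{e-axis}]\!] & := & \{\langle x, y, [0, \text{Now}]\rangle \mid \exists\, t = \langle x, \text{e-axis}, y\rangle \in E\} \\
\mathcal{E}[\![\text{forward}]\!] & := & \bigcup_{\text{e-axis}} \mathcal{E}[\![\text{e-axis}]\!] \\
\mathcal{E}[\![\text{backward}]\!] & := & \bigcup_{\text{e-axis}} \mathcal{E}[\![\text{e-axis}^{-1}]\!] \\
\mathcal{E}[\![\text{self}::[exp]]\!] & := & \{\langle x, x, \tau(x) \cap I\rangle \mid x \in U \cup L,\ \exists \langle x, z, I\rangle \in \mathcal{P}[\![exp]\!],\ \tau(x) \cap I \neq \emptyset\} \\
\mathcal{E}[\![\text{next}::[exp]]\!] & := & \{\langle x, y, \tau(t) \cap I\rangle \mid t = \langle x, z, y\rangle \in T,\ \exists \langle z, w, I\rangle \in \mathcal{P}[\![exp]\!],\ \tau(t) \cap I \neq \emptyset\} \\
\mathcal{E}[\![\text{edge}::[exp]]\!] & := & \{\langle x, y, \tau(t) \cap I\rangle \mid t = \langle x, y, z\rangle \in T,\ \exists \langle z, w, I\rangle \in \mathcal{P}[\![exp]\!],\ \tau(t) \cap I \neq \emptyset\} \\
\mathcal{E}[\![\text{node}::[exp]]\!] & := & \{\langle x, y, \tau(t) \cap I\rangle \mid t = \langle z, x, y\rangle \in T,\ \exists \langle z, w, I\rangle \in \mathcal{P}[\![exp]\!],\ \tau(t) \cap I \neq \emptyset\} \\
\mathcal{E}[\![axis^{-1}]\!] & := & \{\langle x, y, \tau(t)\rangle \mid \langle y, x, \tau(t)\rangle \in \mathcal{E}[\![axis]\!]\} \\
\mathcal{E}[\![\text{t-axis}^{-1}::r]\!] & := & \{\langle x, y, \tau(t)\rangle \mid \langle y, x, \tau(t)\rangle \in \mathcal{E}[\![\text{t-axis}::r]\!]\} \\
\mathcal{E}[\![\text{t-axis}^{-1}::[exp]]\!] & := & \{\langle x, y, \tau(t)\rangle \mid \langle y, x, \tau(t)\rangle \in \mathcal{E}[\![\text{t-axis}::[exp]]\!]\} \\
\mathcal{E}[\![exp[I]]\!] & := & \{\langle x, y, I \cap I'\rangle \mid \langle x, y, I'\rangle \in \mathcal{E}[\![exp]\!] \text{ and } I \cap I' \neq \emptyset\} \\
\mathcal{E}[\![exp/\text{e-exp}]\!] & := & \{\langle x, y, I_2\rangle \mid \exists \langle x, z, I_1\rangle \in \mathcal{E}[\![exp]\!],\ \exists \langle z, y, I_2\rangle \in \mathcal{E}[\![\text{e-exp}]\!]\} \\
\mathcal{E}[\![exp/\text{t-exp}]\!] & := & \{\langle x, y, I_1 \cap I_2\rangle \mid \exists \langle x, z, I_1\rangle \in \mathcal{E}[\![exp]\!],\ \exists \langle z, y, I_2\rangle \in \mathcal{E}[\![\text{t-exp}]\!] \text{ and } I_1 \cap I_2 \neq \emptyset\} \\
\mathcal{E}[\![exp_1 | exp_2]\!] & := & \mathcal{E}[\![exp_1]\!] \cup \mathcal{E}[\![exp_2]\!] \\
\mathcal{E}[\![exp^*]\!] & := & \mathcal{E}[\![\text{self}]\!] \cup \mathcal{E}[\![exp]\!] \cup \mathcal{E}[\![exp/exp]\!] \cup \mathcal{E}[\![exp/exp/exp]\!] \cup \ldots \\
\mathcal{P}[\![\text{e-exp}]\!] & := & \mathcal{E}[\![\text{e-exp}]\!] \\
\mathcal{P}[\![\text{t-exp}]\!] & := & \mathcal{E}[\![\text{t-exp}]\!] \\
\mathcal{P}[\![\text{t-exp}/exp]\!] & := & \{\langle x, y, I_1 \cap I_2\rangle \mid \exists \langle x, z, I_1\rangle \in \mathcal{E}[\![\text{t-exp}]\!],\ \exists \langle z, y, I_2\rangle \in \mathcal{E}[\![exp]\!] \text{ and } I_1 \cap I_2 \neq \emptyset\} \\
\mathcal{P}[\![\text{e-exp}/exp]\!] & := & \{\langle x, y, I_1\rangle \mid \exists \langle x, z, I_1\rangle \in \mathcal{E}[\![\text{e-exp}]\!],\ \exists \langle z, y, I_2\rangle \in \mathcal{E}[\![exp]\!]\} \\
\mathcal{P}[\![exp_1 | exp_2]\!] & := & \mathcal{E}[\![exp_1 | exp_2]\!] \\
\mathcal{P}[\![exp^*]\!] & := & \mathcal{E}[\![exp^*]\!]
\end{array}
\]
$\text{t-exp} \in \{\text{t-axis},\ \text{t-axis}::r,\ \text{t-axis}::[exp],\ \text{t-axis}[I]\}$ and
$\text{e-exp} \in \{\text{e-axis},\ \text{e-axis}::[exp],\ \text{e-axis}[I],\ \text{forward},\ \text{backward}\}$
Fig. 3. Formal semantics of nested evolution expressions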
The evaluation of an evolution expression exp is given by the semantic function $\mathcal{E}$
defined in Figure 3. $\mathcal{E}[\![exp]\!]$ returns a set of tuples of the form $\langle x, y, I \rangle$ such that there
is a path from $x$ to $y$ satisfying exp during interval $I$. For instance, in the evolution
base of Figure 1, $\mathcal{E}[\![\text{self}::\text{Germany}/\text{next}::\text{head}/\text{next}::\text{type}]\!]$ returns the tuple
$\langle \text{Germany}, \text{Chancellor}, [1988, 2005] \rangle$. It is also possible to navigate an edge from a node
using the edge axis and to have a nested expression [exp] that functions as a predicate
that the preceding expression must satisfy. For example,
$\mathcal{E}[\![\text{self}[\text{next}::\text{head}/\text{self}::\text{Gerhard Schröder}]]\!]$ returns $\langle \text{Reunified Germany}, \text{Reunified Germany}, [1990, 2005] \rangle$ and
$\langle \text{West Germany}, \text{West Germany}, [1988, 1990] \rangle$.
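A few of the rules in Figure 3 can be sketched in code to show how intervals are intersected along a path. This is not an implementation of the full language: it covers only self::r, next::r, temporal composition and a self axis with a nested predicate, and the data encoding, function names and the lifespan assigned to Gerhard Schröder are assumptions made for illustration.

```python
NOW = float("inf")   # assumed stand-in for Now

def meet(i, j):
    """Intersection of two closed intervals, or None if they are disjoint."""
    lo, hi = max(i[0], j[0]), min(i[1], j[1])
    return (lo, hi) if lo <= hi else None

def self_r(r, tau):
    """self::r  ->  {<r, r, tau(r)>}"""
    return {(r, r, tau[r])}

def next_r(r, triples):
    """next::r: follow edges labeled r; triples maps (s, p, o) to its lifespan."""
    return {(s, o, span) for (s, p, o), span in triples.items() if p == r}

def compose(left, right):
    """exp/t-exp: join on the middle node and intersect the two intervals."""
    out = set()
    for (x, z, i1) in left:
        for (z2, y, i2) in right:
            both = meet(i1, i2) if z == z2 else None
            if both is not None:
                out.add((x, y, both))
    return out

def self_where(pred, tau):
    """self with a nested predicate: keep nodes where the predicate has a match."""
    out = set()
    for (x, _, i) in pred:
        both = meet(tau[x], i)
        if both is not None:
            out.add((x, x, both))
    return out

tau = {"West Germany": (1949, 1990), "Reunified Germany": (1990, NOW),
       "Gerhard Schröder": (1944, NOW)}          # last lifespan is assumed
triples = {("West Germany", "head", "Gerhard Schröder"): (1988, 1990),
           ("Reunified Germany", "head", "Gerhard Schröder"): (1990, 2005)}

pred = compose(next_r("head", triples), self_r("Gerhard Schröder", tau))
result = self_where(pred, tau)
```

Under these assumed lifespans, result contains the two tuples given in the nested-predicate example above.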
In order to support evolution expressions, we need to extend nSPARQL triple patterns
with temporal and evolution semantics. In particular, we redefine the evaluation
of an nSPARQL triple pattern $(?X, exp, ?Y)$ to be the set of triples $\langle x, y, I \rangle$ that result
from the evaluation of the evolution expression exp, with the variables $X$ and $Y$ bound
to $x$ and $y$, respectively. Formally:
$\mathcal{E}[\![(?X, exp, ?Y)]\!] := \{(\mu(?X), \mu(?Y)) \mid \mu(?X) = x \text{ and } \mu(?Y) = y \text{ and } \langle x, y, I \rangle \in \mathcal{E}[\![exp]\!]\}$
Our language includes all nSPARQL operators such as AND, OPT, UNION and FIL-
TER with the same semantics as in nSPARQL. For instance:
$\mathcal{E}[\![(P_1 \text{ AND } P_2)]\!] := \mathcal{E}[\![(P_1)]\!] \bowtie \mathcal{E}[\![(P_2)]\!]$
where $P_1$ and $P_2$ are triple patterns and $\bowtie$ is the join on the variables $P_1$ and $P_2$ have
in common. A complete list of all the nSPARQL operators and their semantics can be
found in [16].
6 Application scenarios
Consider an evolution base that models how countries have changed over time in terms
of territory, political division, type of government, etc. Classes are represented by ovals
and instances by boxes. A small fragment of that evolution base is illustrated as a graph
in Figure 4.
Germany is a concept that has changed several times throughout history. The country
was unified as a nation-state in 1871 and the concept of Germany first appears in our
historical knowledge base as Germany at instant 1871. After WWII, Germany was divided
into four military zones (not shown in the figure) that were merged into West and
East Germany in 1949. This is represented with two split edges from the concept of
Germany to the concepts of West Germany and East Germany. The country was finally
reunified in 1990, which is represented by the coalescence of the West Germany and
East Germany concepts into Reunified Germany via two merge edges. These merge and
split constructs are also defined in terms of the parts of the concepts they relate. For
instance, a part-of property indicates that Berlin was part of Germany during [1871, 1945].
Since that concept of Germany existed until 1945 whereas Berlin exists until today, the
part-of relation is carried forward by the semantics of split and merge into the concept
of Reunified Germany. Consider now a historian who is interested in finding answers to
a number of evolution-related queries.
Fig. 4. The evolution of the concepts of Germany and France (full black lines represent governedBy properties)
[Example Query 1]: How has the notion of Germany changed over the last two cen-
turies in terms of its constituents, government, etc.? The query can be expressed in our
extended query language as follows:
Select ?Y, ?Z, ?W
(?X, self::Reunified Germany/backward[1800, 2000]/, ?Y) AND
(?Y, edge, ?Z) AND (?Z, edge, ?W)
The query first binds ?X to Reunified Germany and then follows all possible evo-
lution axes backwards in the period [1800; 2000]. All concepts bound to ?Y are in an
evolution path to Reunified Germany, namely Germany, West Germany, and East Ger-
many. Note that, since the semantics of an  expression includes self (see Figure 3),
then Reunified Germany will also bind ?Y . The second triple returns in ?Z the name of
the properties of which ?Y is the subject, and finally the last triple returns in ?W the
objects of those properties. By selecting ?Y; ?Z; ?W in head of the query, we get all
evolutions of Germany together with their properties.
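The backward traversal performed by the first triple of this query can be sketched in Python. This is only an illustration under stated assumptions, not the evaluation procedure of the paper: the edge list is hand-built from the textual description of Figure 4, attaching a single year to each evolution edge is an assumed encoding, and the helper name backward_star is hypothetical.

```python
from collections import deque

# Evolution edges (source, kind, target, year), hand-built from the text's
# description of Figure 4. The one-year-per-edge encoding is an assumption.
EVOLUTION = [
    ("Germany", "split", "West Germany", 1949),
    ("Germany", "split", "East Germany", 1949),
    ("West Germany", "merge", "Reunified Germany", 1990),
    ("East Germany", "merge", "Reunified Germany", 1990),
]


def backward_star(start, lo, hi):
    """Concepts reachable from `start` by zero or more backward evolution
    steps whose year lies in [lo, hi]; zero steps yields `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for src, _kind, dst, year in EVOLUTION:
            if dst == node and lo <= year <= hi and src not in seen:
                seen.add(src)
                queue.append(src)
    return seen


# Candidate bindings for ?Y in Example Query 1.
print(sorted(backward_star("Reunified Germany", 1800, 2000)))
# ['East Germany', 'Germany', 'Reunified Germany', 'West Germany']
```

Because the closure is reflexive, the starting concept is always among the results, which mirrors the remark that Reunified Germany itself is bound to ?Y. The remaining two triples of the query would then amount to a plain property lookup for each returned concept.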
Fig. 5. The evolution of the concept of Biotechnology
[Example Query 2]: Who was the head of the German government before and after the
unification of 1990? The query can be expressed as follows:
Select ?Y
(?X, self::Reunified Germany/join⁻¹[1990]/next::head[1990], ?Y) AND
(?Z, self::Reunified Germany/next::head[1990], ?Y)
The first triple finds all the heads of government of the predecessors of Reunified
Germany before the unification by following join⁻¹[1990] and then following
next::head[1990]. The second triple finds the heads of government of Reunified Germany
itself. Finally, the join on ?Y binds the variable only to those heads that appear in
both triples, hence returning those who held the position both before and after the
unification.
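The join on ?Y amounts to a set intersection, as the following Python sketch suggests. It is a simplified illustration, not the query engine of the paper: the dictionary JOINED_FROM (an inverse index over join edges), the HEAD table, and the function names are hypothetical, and the office holders listed are historical facts supplied for this sketch rather than values read from Figure 4.

```python
# Inverse join edges: (successor, year of the join) -> predecessor concepts.
# The encoding is an assumption of this sketch.
JOINED_FROM = {
    ("Reunified Germany", 1990): ["West Germany", "East Germany"],
}

# head property with a validity interval [start, end] in years.
# Illustrative values; they are not taken from the paper's figure.
HEAD = [
    ("West Germany", "Helmut Kohl", 1982, 1990),
    ("East Germany", "Lothar de Maizière", 1990, 1990),
    ("Reunified Germany", "Helmut Kohl", 1990, 1998),
]


def heads(concept, year):
    """People related to `concept` by the head property at `year`."""
    return {
        person
        for subject, person, start, end in HEAD
        if subject == concept and start <= year <= end
    }


def heads_before_and_after(concept, year):
    """Intersect the heads of the joined predecessors with the heads of the
    resulting concept, mimicking the join on ?Y."""
    before = set()
    for predecessor in JOINED_FROM.get((concept, year), []):
        before |= heads(predecessor, year)
    after = heads(concept, year)
    return before & after


print(heads_before_and_after("Reunified Germany", 1990))
# {'Helmut Kohl'}
```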
Consider now the evolution of the concept of biotechnology from a historical point
of view. According to historians, biotechnology got its current meaning (related to
molecular biology) only after the 70s. Before that, the term biotechnology was used
in areas as diverse as agriculture, microbiology, and enzyme-based fermentation. Even
though the term “biotechnology” was coined in 1919 by Karl Ereky, a Hungarian engineer,
the earliest mentions of biotechnology in the news and specialized media refer to a
set of ancient techniques like selective breeding, fermentation and hybridization. Even
though from the 70s the dominant meaning of biotechnology has been closely related
to genetics, it is possible to find a large body of news and other media articles from the
60s to the 80s that use the term biotechnology to refer to an environmentally friendly
technological orientation entirely unrelated to genetics but closely related to bioprocess
engineering. Not only did the use of the term change from the 60s to the 90s, but the
two different meanings also coexisted in the media for almost two decades.
Figure 5 illustrates the evolution of the notion of biotechnology since the 40s. The
notions of Selective breeding, Fermentation and Hybridization existed from an
indeterminate time until now and in the 40s joined the new topic of Conventional Biotech,
which groups ancient techniques like the ones mentioned above. Over the next decades,
Conventional Biotech started to include more modern therapies and products such as
Cell Therapy, Penicillin and Cortisone. At some point in the 70s, the notions of Cell
Therapy and Bioprocess Engineering matured and detached from Conventional Biotech,
becoming independent concepts. The three concepts coexisted in time during part of
the 70s and the 80s. During the 80s, the notion of Conventional Biotech stopped being
used and all its related concepts became independent topics. In parallel to this, the
new topic of Biotech started to take shape. We could see Biotech as an evolution of
the former Conventional Biotech but using Genetic Engineering instead of conventional
techniques. Concepts and terms related to the Biotech and Genetic Engineering topics
are modeled with a part-of property.
[Example Query 3]: Is the academic discipline of biotechnology a wholly new technology
branch or has it derived from the combination of other disciplines? Which ones
and how? The query requires following evolution paths and returning the traversed triples
in addition to the nodes in order to answer the question of “how”. The query is expressed
in our language as follows:
Select ?Y, ?Z, ?W
(?X, self::Biotechnology/backward*, ?Y) AND
(?Y, e-edge/self, ?Z) AND (?Z, e-node, ?W)
The first triple binds ?Y to every node reachable from Biotechnology following
evolution edges backwards. Then, for each of those nodes, including Biotechnology,
the second triple gets all the evolution axes of which the bindings of ?Y are subjects,
whereas the third triple gets the objects of the evolution axes. This query returns
(Biotech, evolve⁻¹, Conventional Biotech), (Conventional Biotech, join⁻¹, Hybridization),
(Conventional Biotech, join⁻¹, Fermentation), and (Conventional Biotech,
join⁻¹, Selective Breeding).
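Returning the traversed edges along with the nodes can be sketched in Python as follows. This is a minimal illustration rather than the paper's evaluation algorithm: the edge list covers only the part of Figure 5 named in the text, the node name Biotech follows the result triples listed above, the function name backward_triples is hypothetical, and inverse edges are written with a "^-1" suffix as a plain-text convention of this sketch.

```python
# Forward evolution edges (source, kind, target), taken from the textual
# description of Figure 5; only the edges relevant to the query are listed.
EVOLUTION = [
    ("Selective Breeding", "join", "Conventional Biotech"),
    ("Fermentation", "join", "Conventional Biotech"),
    ("Hybridization", "join", "Conventional Biotech"),
    ("Conventional Biotech", "evolve", "Biotech"),
]


def backward_triples(start):
    """Walk evolution edges backwards from `start` and collect every traversed
    edge as a (node, inverse edge label, predecessor) triple."""
    triples = []
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for src, kind, dst in EVOLUTION:
            if dst == node:
                triples.append((node, kind + "^-1", src))
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
    return triples


for triple in backward_triples("Biotech"):
    print(triple)
```

Each node is expanded only once, so every backward edge is reported exactly once even if several paths lead to the same predecessor.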
[Example Query 4]: Which scientific and engineering concepts and disciplines are
related to the emergence of cell cloning? We interpret “related” in our model as being
immediate predecessors/successors and “siblings” in the evolution process. That is,
from a concept we find its immediate predecessors by following all evolution edges
backwards one step. From the results we follow all evolution edges forward one step and
we get the original concept and some of its “siblings”. We do the same in the opposite
direction, following first evolution edges forward one step and then backwards. Based
on this notion, we can express the query as follows:
Select ?Y, ?Z, ?W
(?X, self::Cell Cloning, ?Y) AND (?Y, backward | backward/forward, ?Z)
AND (?Y, forward | forward/backward, ?W)
The first triple will just bind Cell Cloning to ?Y. The second triple follows the
detach edge back to Cloning, and then the detach edge forward to Molecular Cloning.
The third triple starts again from Cell Cloning and follows the join edge forward to
Bioprocess Engineering and then the detach edge backwards to Conventional Biotech.
All these concepts will be returned by the query.
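The one-step neighbourhood used in this query can be sketched in Python. The sketch is illustrative only: the edge list contains just the edges mentioned in the explanation above, and the function names forward, backward and related are hypothetical helpers, not operators of the paper's language.

```python
# Evolution edges (source, kind, target) mentioned in the explanation of
# Example Query 4; the rest of Figure 5 is omitted.
EVOLUTION = [
    ("Cloning", "detach", "Cell Cloning"),
    ("Cloning", "detach", "Molecular Cloning"),
    ("Cell Cloning", "join", "Bioprocess Engineering"),
    ("Conventional Biotech", "detach", "Bioprocess Engineering"),
]


def forward(nodes):
    """Concepts reached by one forward evolution step from any of `nodes`."""
    return {dst for src, _kind, dst in EVOLUTION if src in nodes}


def backward(nodes):
    """Concepts reached by one backward evolution step from any of `nodes`."""
    return {src for src, _kind, dst in EVOLUTION if dst in nodes}


def related(concept):
    """Bindings for ?Z and ?W: predecessors with their other successors, and
    successors with their other predecessors."""
    y = {concept}
    z = backward(y) | forward(backward(y))  # backward | backward/forward
    w = forward(y) | backward(forward(y))   # forward | forward/backward
    return z, w


z, w = related("Cell Cloning")
print(sorted(z))  # ['Cell Cloning', 'Cloning', 'Molecular Cloning']
print(sorted(w))  # ['Bioprocess Engineering', 'Cell Cloning', 'Conventional Biotech']
```

As in the text, stepping back and then forward (or forward and then back) also returns the original concept alongside its siblings.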
7 Related Work
Managing time in databases: Temporal data management [17] has been extensively
studied in the relational paradigm [4]. (A unified formal framework for studying
temporal databases in general and temporal query languages in particular can be found
in [18].) For semi-structured data, one of the first models for managing historical
information as an extension of the Object Exchange Model (OEM) was introduced in [6].
Versioning schemes for XML were first proposed in [19] and [20], the latter
specifically for web data warehouses. These approaches store only the information of the
entire document at some point in time and then use edit scripts and change logs to
reconstruct versions of the entire document. This may lead to large overheads when
processing queries that span multiple versions. In contrast, [21] and [7] maintain a
single temporal document from which versions of any document fragment (even single
elements) can be extracted directly when needed. τXQuery, a temporal extension to
XQuery, was introduced in [22]. In this proposal, temporal queries are translated into
XQuery and evaluated by a standard XQuery engine. A survey on temporal extensions
to the Entity-Relationship (ER) model is presented in [5].
Change management in ontologies: There is a fundamental distinction between an
actual update and a revision in knowledge bases [23]. An update brings the knowledge
base up to date when the world it models has changed. In contrast, a revision
incorporates new knowledge from a world that has not changed. Our evolution framework
models updates since it describes how real-world concepts have changed over time.
The survey in [24] provides a thorough classification of the types of changes that occur
in ontologies. However, their definition of evolution is completely different from ours:
they view evolution as a special case of versioning. Moreover, there is no entry in their
taxonomy that corresponds to the kind of concept evolution we developed in this work.
An approach to model revision in RDF ontologies has been presented in [25]. The first
temporal extension to RDF was presented in [8]. Their model is an RDF graph with
temporal intervals associated only to triples. Orthogonal to the concept evolution
problem is that of terminology evolution, which studies how the terms describing the same
notion in the domain of discourse have changed over time [11]. Similar to versioning
in databases, ontology versioning studies the problem of maintaining changes in
ontologies by creating and managing different variants of them [26]. The main limitation
of such approaches is the same as in databases: large overheads when processing queries
that span multiple versions. Closer to our work is the proposal in [27] for modeling changes
in geographical information systems (GIS). They use the notion of a change bridge
to model how the area of geographical entities (countries, provinces, etc.) evolves. A
change bridge is associated with a change point and indicates what concepts become
obsolete, what new concepts are created, and how the new concepts overlap with older
ones. Since they focus on the GIS domain, they are not able to model causality and
different types of evolution involving abstract concepts beyond geographical entities.
8 Conclusion
In this work we studied the novel problem of concept evolution, i.e., how the semantics
of an entity changes over time. In contrast to temporal models and schema evolution,
concept evolution deals with mereological and causal relationships between concepts.
Recording concept evolution also allows users to pose queries on the history of a
concept. We presented a framework for modeling evolution as an extension of temporal
RDF with mereology and causal properties expressed with a set of evolution terms.
Furthermore, we presented an extension of nSPARQL that allows navigation over the
history of the concepts. Finally, we applied our framework in two real-world scenarios,
the history of Germany and the evolution of biotechnology, and we showed how queries
of interest can be answered using our proposed language.
References
1. Blakeley, J., Larson, P.A., Tompa, F.W.: Efficiently Updating Materialized Views. In:
SIGMOD. (1986) 61–71
2. Lerner, B.S.: A Model for Compound Type Changes Encountered in Schema Evolution.
TODS 25(1) (March 2000) 83–127
3. Velegrakis, Y., Miller, R.J., Popa, L.: Preserving mapping consistency under schema changes.
VLDB J. 13(3) (2004) 274–293
4. Soo, M.D.: Bibliography on Temporal Databases. SIGMOD Record 20(1) (1991) 14–23
5. Gregersen, H., Jensen, C.S.: Temporal entity-relationship models - a survey. IEEE Trans.
Knowl. Data Eng. 11(3) (1999) 464–497
6. Chawathe, S., Abiteboul, S., Widom, J.: Managing historical semistructured data. In: Theory
and Practice of Object Systems, Vol 5(3). (1999) 143–162
7. Rizzolo, F., Vaisman, A.A.: Temporal XML: modeling, indexing, and query processing.
VLDB Journal 17(5) (2008) 1179–1212
8. Gutiérrez, C., Hurtado, C.A., Vaisman, A.A.: Temporal RDF. In: ESWC. (2005) 93–107
9. Dong, X., Halevy, A.Y., Madhavan, J.: Reference Reconciliation in Complex Information
Spaces. In: SIGMOD. (2005) 85–96
10. Palpanas, T., Chaudhry, J., Andritsos, P., Velegrakis, Y.: Entity Data Management in
OKKAM. In: SWAP DEXA Workshop. (2008) 729–733
11. Tahmasebi, N., Iofciu, T., Risse, T., Niederee, C., Siberski, W.: Terminology evolution in
web archiving: Open issues. In: International Web Archiving Workshop. (2008)
12. Lenzerini, M.: Data Integration: A Theoretical Perspective. In: PODS. (2002) 233–246
13. W3C: RDF vocabulary description language 1.0: RDF Schema.
http://www.w3.org/TR/rdf-schema/ (2004)
14. Dyreson, C.E., Evans, W.S., Lin, H., Snodgrass, R.T.: Efficiently supported temporal
granularities. IEEE Trans. Knowl. Data Eng. 12(4) (2000) 568–587
15. Keet, C.M., Artale, A.: Representing and reasoning over a taxonomy of part-whole relations.
Applied Ontology 3(1-2) (2008) 91–110
16. Pérez, J., Arenas, M., Gutierrez, C.: nSPARQL: A navigational language for RDF. In:
International Semantic Web Conference. (2008) 66–81
17. Snodgrass, R.T., Ahn, I.: Temporal Databases. IEEE Computer 19(9) (1986) 35–42
18. Chomicki, J.: Temporal query languages: a survey. In: Proceedings of the 1st International
Conference on Temporal Logic, LNAI 827. (1994) 506–534
19. Chien, S., Tsotras, V., Zaniolo, C.: Efficient management of multiversion documents by
object referencing. In: Proceedings of the 27th International Conference on Very Large Data
Bases, Rome, Italy (2001) 291–300
20. Marian, A., Abiteboul, S., Cobena, G., Mignet, L.: Change-centric management of versions
in an XML warehouse. In: Proceedings of the 27th VLDB Conference, Rome, Italy (2001)
581–590
21. Buneman, P., Khanna, S., Tajima, K., Tan, W.: Archiving scientific data. In: Proceedings of
the 2002 ACM SIGMOD International Conference on Management of Data, Madison, USA
(2002) 1–12
22. Gao, C., Snodgrass, R.: Temporal slicing in the evaluation of XML queries. In: Proceedings
of the 29th International Conference on Very Large Data Bases, Berlin, Germany (2003)
632–643
23. Katsuno, H., Mendelzon, A.O.: On the difference between updating a knowledge base and
revising it. In: International Conference on Principles of Knowledge Representation and
Reasoning. (1991) 387–394
24. Flouris, G., Manakanatas, D., Kondylakis, H., Plexousakis, D., Antoniou, G.: Ontology
change: classification and survey. Knowledge Eng. Review 23(2) (2008) 117–152
25. Konstantinidis, G., Flouris, G., Antoniou, G., Christophides, V.: On RDF/S ontology
evolution. In: SWDB-ODBIS. (2007) 21–42
26. Klein, M.C.A., Fensel, D.: Ontology versioning on the semantic web. In: SWWS. (2001)
75–91
27. Kauppinen, T., Hyvönen, E.: Modeling and reasoning about changes in ontology time
series. In: Ontologies: A Handbook of Principles, Concepts and Applications in Information
Systems. (2007) 319–338
