Calculating the Trust of Event Descriptions using Provenance

by Davide Ceolin, Paul Groth, Willem Robert Van Hage
Proceedings Of The SWPM 2010 Workshop At The 9th International Semantic Web Conference ISWC2010 (2010)

Abstract

Understanding real world events often calls for the integration of data from multiple often conflicting sources. Trusting the description of an event requires not only determining trust in the data sources but also in the integration process itself. In this work, we propose a trust algorithm for event data based on Subjective Logic that takes into account not only opinions about data sources but also how those sources were integrated. This algorithm is based on a mapping between a general event ontology, the Simple Event Model, and a model for describing provenance, the Open Provenance Model. We discuss the results of applying the algorithm to a use case from the maritime domain.

Davide Ceolin, Paul Groth, Willem Robert van Hage
VU University Amsterdam
Amsterdam, The Netherlands
Email: dceolin,pgroth,wrvhage@few.vu.nl
I. INTRODUCTION
Consider the hijacking of a freighter in the Gulf of Aden, a goal
not given in the semi-final of the World Cup, or a sudden rise
of the stock market. Understanding such events requires the
integration of data from multiple data sources using complex
data integration routines. For example, to build a description
of why a goal was not given, one may draw on the report of the
referee, the comments of managers and players, and video
from different camera angles. The veracity of the resulting
description of the event depends not only on the trust
one has in the original data sources (e.g. players, referees,
cameras) but also on the trust one has in the process used to
create the event description.
Therefore, in this work, we investigate the generation of
trust ratings for event descriptions. These trust ratings are
calculated with respect to not only the original sources but
also to the data integration process itself. Thus, the trust
calculations consider the whole of an event description’s
provenance. The trust algorithms presented here rely on the
novel combination of two existing representations, the Simple
Event Model (SEM) for event representations and the Open
Provenance Model (OPM) for representing the data integration
process itself. Based on a mapping of these models, we
develop a trust algorithm using subjective logic. We apply our
trust algorithm to a use case from maritime shipping. The
contributions of this paper are twofold:
1) A mapping of SEM to OPM.
2) An algorithm for computing trust ratings for event
descriptions based on their provenance.
The rest of this paper is organized as follows. We begin
with a description of a use case for data integration for
event descriptions, which we use as a running example. This
is followed by a discussion of both OPM and SEM and a
presentation of the mapping between these models. Based on
this mapping, we then present an algorithm for producing trust
ratings for event descriptions. After this we present initial
results applied to the use case. We end with a discussion of
related work and a conclusion.
II. USE CASE
Our use case comes from the maritime domain. It is of
vital importance for the coast guard, harbors and ships to know
where ships are and their vicinity to one another. Being able to
track ships helps avoid collisions, manage traffic in crowded
harbors, respond to emergencies, and facilitate navigation. To
enable this tracking, a common system called the Automatic
Identification System (AIS) has been developed.1 The
International Maritime Organization requires
that the system be installed on all ships over 300 tons. AIS
works by exchanging messages between local ships and radar
stations. These messages provide a range of information about
the ship including its geoposition, navigation status, speed,
radio call sign, the ship’s unique registered id (MMSI - Maritime
Mobile Service Identity), a permanent id (IMO - International
Maritime Organization Number) and the ship’s dimensions.
Such messages are subject to manipulation, corruption, and
errors impacting their reliability [1]. For example, the unique
registered id may be falsely programmed into the system, the
message may be corrupted during radio transmission, or users
may fail to update their navigation status.
An AIS message or series of AIS messages describe the
event of a ship’s movement or change in status. Often, one
would like to extract information about that event. Here, we
use a simple example of extracting what nation the ship is
registered to. This is known as the flag of the ship. Determining
it is actually a difficult problem, because both the MMSI number
and the IMO number report the country of origin, and these
may disagree because the MMSI can change when the ship
is reregistered. Indeed, one report identified 26 vessels using
the same MMSI number [1]. In addition, country information
may be garbled or incorrectly entered. Thus, if part of the event
description is a flag then it is important to be able to determine
whether to trust that flag information based on the information
sources and how those sources were combined. SEM is already
being used to represent ship movement events based on AIS
messages [2]. However, we need to add additional information
to represent the provenance of the description. For this, we
turn to a model designed specifically for provenance, namely,
OPM.
1 http://www.uais.org
III. MAPPING SEM AND OPM
In order to connect the description of an event to how that
description was created, we need to be able to interpret the
event description with respect to its provenance. To do so, we
provide a mapping from the model used for event descriptions
(SEM) to the model used for describing provenance (OPM).
To facilitate the explanation of this mapping, we first briefly
introduce both SEM and OPM.
A. SEM, the Simple Event Model
SEM [2], [3], [4] is a schema for the semantic representation
of events. It does not deal with the way data about
events is stored, but only with the events themselves. SEM
focuses on modeling the most common facets of events: who,
what, where, and when. These are represented respectively
by the SEM core classes sem:Actor, sem:Place, sem:Object
and sem:Time. SEM is a model that takes into account the
inherent messiness of the Web by making as little semantic
commitment (e.g. disjointness statements, functional properties)
as possible. Every instance of one of the core classes can
be assigned types from domain vocabularies. For example,
the sem:Event instance ex:world_cup_2010 can be assigned
a sem:eventType dbpedia:FIFA_Club_World_Cup. Any property
of SEM, including the type properties, is optional and
duplicable. SEM and Simple Knowledge Organization System
(SKOS) [5] mappings to related models (DOLCE-Lite,
CIDOC-CRM, SUMO, LODE, F, Dublin Core, FOAF, and
the CultureSampo and Queen Mary’s event models) can be
accessed online.2 Additionally, through sem:View an event
can have multiple, perhaps conflicting, descriptions.
B. OPM, the Open Provenance Model
OPM is a community developed model for the exchange
of provenance information [6]. It stems from a series of
interoperability challenges (Provenance Challenges) held by the
provenance research community to understand and exchange
provenance information between systems. While not as
comprehensive as some other provenance models such as ProPreO
[7], OPM provides a common technology-agnostic layer of
agreement between systems. OPM was used by 15 teams
during the Third Provenance Challenge [6]. These teams used
a variety of provenance management systems ranging from
those focused on workflow systems to those concentrating on
operating systems. Thus, by using OPM, we aim to be able to
apply our trust algorithm to a variety of systems.
OPM represents the provenance of an object as a directed
acyclic graph with the possibility for annotations on the graph.
The graph is interpreted as being causal. An OPM graph
captures the past execution of a process. The graph consists
of three types of nodes:
• An opm:Artifact, which is an immutable piece of state,
  for example, a file.
• An opm:Process, which performs actions upon artifacts
  and produces new artifacts. An example of a process
  would be the execution of the Unix command cat on two
  files to produce a new concatenated file.
• An opm:Agent, which controls or enables a process. An
  example of an agent would be the operating system that
  a process runs in or the person who started the process.

TABLE I
MAPPING BETWEEN OPM AND SEM
SEM         SKOS relation     OPM
sem:Event   skos:closeMatch   opm:Process
sem:Actor   skos:closeMatch   opm:Artifact
sem:Actor   skos:broadMatch   opm:Agent
sem:Place   skos:closeMatch   opm:Artifact
sem:Place   skos:broadMatch   opm:Agent
sem:Role    skos:closeMatch   opm:Role
sem:View    skos:closeMatch   opm:Account

2 http://semanticweb.cs.vu.nl/2009/11/sem/
These nodes are linked by five kinds of edges representing
dependency between nodes. An opm:Process uses
and generates opm:Artifacts, represented by opm:used and
opm:wasGeneratedBy edges. These artifacts can be given an
opm:Role with respect to an opm:Process, distinguishing them
from other artifacts. Note, an opm:Process can only produce
one opm:Artifact. Dependency between opm:Artifacts is
represented using opm:wasDerivedFrom while dependency between
opm:Processes is represented using the opm:wasTriggeredBy
edge. Finally, the control of an opm:Process by an opm:Agent
is expressed using the opm:wasControlledBy edge.
Each part of an OPM graph can be labeled with an account,
which allows the same execution to be explained from different
perspectives. For example, one could describe the generation
of an event description with more or less detail.
C. Mapping
Given an event description in SEM, we would like to
determine how its facets should map to OPM so that we can
describe the facet’s provenance using OPM. For example, if an
event occurred at a sem:Place, we could consider that place an
opm:Artifact. This idea is in line with the notion of sub-typing
within OPM [6]. We could say that a particular opm:Artifact
has a type of sem:Place. To represent the mapping, we use
SKOS, a W3C standard for describing and mapping
vocabularies (i.e. concept schemes). The use of SKOS follows the
practice of the W3C Provenance Incubator Group in defining
a set of Provenance Vocabulary Mappings [8]. We refer the
readers to [5] for the exact definitions of skos:closeMatch,
skos:relatedMatch and skos:broadMatch.
Our mapping focuses on the nodes within the OPM graph
and not the edges, because our aim is to describe the
provenance of both the event description and its facets. We now
discuss the mapping shown in Table I in more detail.
For the sake of space, we report only a mapping at class level.
A more comprehensive mapping detailed with justifications is
available on the web.3
Each sem:Event is an action with some duration, which maps
very closely to the notion of an opm:Process. SEM has the
notion of a sem:Actor, the entities or people who take part or
are involved in an event. If a sem:Actor is directly a cause
of an event or is vital for it to take place, we model it
as an opm:Artifact used by an opm:Process. For people who
were not directly involved but enabled the event to take place,
the sem:Actor is mapped to an opm:Agent. By way
of example, the crew on board a ship would be modeled as
opm:Artifacts while the CEO of the shipping company can be
seen as an opm:Agent controlling the event of sending an AIS
message. Similar reasoning applies to mapping sem:Place to
OPM.
The sem:Role signifies the role a particular SEM facet plays
in an event, just as an opm:Role signifies the role a particular
opm:Artifact plays with respect to an opm:Process.
Additionally, a sem:View allows for multiple descriptions of the same
event, which maps naturally to an opm:Account, which captures
different descriptions of the same execution. Finally, the time
of a sem:Event can be easily mapped to the time annotations
present on OPM edges.
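As an illustration only, the class-level mapping of Table I can be held in a small lookup structure. The sketch below is not part of SEM or OPM tooling; the names `SEM_TO_OPM` and `opm_candidates` are hypothetical, and the ordering rule (close matches before broad matches) is an assumption made for the example.

```python
# Class-level SEM -> OPM mapping of Table I, as (SKOS relation, OPM class) pairs.
SEM_TO_OPM = {
    "sem:Event": [("skos:closeMatch", "opm:Process")],
    "sem:Actor": [("skos:closeMatch", "opm:Artifact"),
                  ("skos:broadMatch", "opm:Agent")],
    "sem:Place": [("skos:closeMatch", "opm:Artifact"),
                  ("skos:broadMatch", "opm:Agent")],
    "sem:Role": [("skos:closeMatch", "opm:Role")],
    "sem:View": [("skos:closeMatch", "opm:Account")],
}


def opm_candidates(sem_class):
    """Return the OPM classes a SEM class may be viewed as, close matches first."""
    order = {"skos:closeMatch": 0, "skos:broadMatch": 1}
    pairs = sorted(SEM_TO_OPM.get(sem_class, []), key=lambda pair: order[pair[0]])
    return [opm_class for _, opm_class in pairs]


print(opm_candidates("sem:Actor"))  # ['opm:Artifact', 'opm:Agent']
```

Classes that Table I does not cover, such as sem:Time, simply yield an empty list here; the paper maps time to annotations on OPM edges rather than to a node class.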
IV. TRUST RATING ALGORITHM
We now describe our trust rating algorithm. The algorithm
works upon OPM graphs. We assume that the provenance of
each facet of an event description is captured. Before applying
the algorithm, the above mapping is applied in order to view
the facets of the SEM event description in OPM.
A. Subjective Logic
Subjective logic [9] is a probabilistic logic that provides
the basis for the evidential reasoning part of our trust model.
Subjective logic’s probabilities are based on the Beta
probability distribution [10]. These probabilities represent the level
of belief, disbelief and uncertainty about each proposition
we encounter, according to the evidence we own, and are
represented by means of “opinions” about such propositions.
The logic also provides operators for combining opinions.
These operators mirror the operators of propositional logic: the
combined opinion is an opinion about the proposition obtained
by applying the corresponding logical operator to the
propositions that the original opinions are about.
B. Opinions
The key concept of Subjective Logic is the
“opinion”, which is the probability of correctness of a
proposition according to a certain source. An opinion according
to source x about proposition y is represented as $\omega^x_y$. More
precisely, opinions are depicted as follows:
$$\omega^x_y(b, d, u, a)$$
which is a representation equivalent to the Beta probability
distribution, where:
$$b = \frac{\text{positive evidence}}{\text{total evidence} + n} \qquad
d = \frac{\text{negative evidence}}{\text{total evidence} + n} \qquad
u = \frac{n}{\text{total evidence} + n} \qquad
a = \frac{1}{n}$$
b, d, u are, respectively, belief, disbelief and uncertainty. a is the
a priori probability, that is, the probability that the proposition
is correct in the absence of evidence. n is the cardinality of the
set of possible outcomes, so it may be equal to 2, in case of
a boolean outcome, or higher.
3 http://bit.ly/c8A3A7
The expected value of the probability distribution
represented by an opinion is given by:
$$E = b + a \cdot u$$
The expected value E will be used as the trust value of a
proposition. Given the evidence that
we have collected about a certain proposition, E represents the
probability that the proposition is true. Therefore it
numerically quantifies our trust in the proposition.
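The definitions of b, d, u, a and E can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Opinion` class and its method names are hypothetical, the evidence counts are made up, and "total evidence" is assumed to mean positive plus negative evidence.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Opinion:
    """A subjective-logic opinion (b, d, u, a)."""
    b: Fraction  # belief
    d: Fraction  # disbelief
    u: Fraction  # uncertainty
    a: Fraction  # a priori probability

    @classmethod
    def from_evidence(cls, positive, negative, n):
        """Build an opinion from evidence counts over n possible outcomes."""
        # Assumption: total evidence = positive + negative.
        denominator = positive + negative + n
        return cls(Fraction(positive, denominator),
                   Fraction(negative, denominator),
                   Fraction(n, denominator),
                   Fraction(1, n))

    def expected_value(self):
        """E = b + a * u, used as the trust value."""
        return self.b + self.a * self.u


# Hypothetical counts: 8 positive and 2 negative observations, boolean outcome.
op = Opinion.from_evidence(positive=8, negative=2, n=2)
print(op.b, op.d, op.u, op.a)  # 2/3 1/6 1/6 1/2
print(op.expected_value())     # 3/4
```

Exact fractions are used so that b + d + u = 1 holds without rounding error.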
Consider the following example. There are 249 countries in
the world. Thus, the number of possible outcomes for a flag
is 249. For the sake of simplicity, we consider the 35 most used
flags, which cover 99% of ships.
Here we consider three sources of information about the
flag. Two sources say the flag is Italy. One source says the
flag is the USA. Each of these opinions is secure (fully certain)
according to its source, therefore they follow the pattern
$\omega^x_y(1, 0, 0, \frac{1}{n})$:
$$\omega^{s_1}_{italy}\left(1, 0, 0, \tfrac{1}{35}\right) \qquad
\omega^{s_2}_{italy}\left(1, 0, 0, \tfrac{1}{35}\right) \qquad
\omega^{s_3}_{usa}\left(1, 0, 0, \tfrac{1}{35}\right)$$
These are the opinions about the three sources, where n = 2
because, unlike the previous opinions, which represent the probability
that a given value is correct (in a multivalued distribution),
these opinions represent the probability that the source is
reliable (therefore in this case the probability distribution is
binomial):
$$\omega^x_{s_1}\left(\tfrac{8}{12}, \tfrac{2}{12}, \tfrac{2}{12}, \tfrac{1}{2}\right) \qquad
\omega^x_{s_2}\left(\tfrac{9}{12}, \tfrac{1}{12}, \tfrac{2}{12}, \tfrac{1}{2}\right) \qquad
\omega^x_{s_3}\left(\tfrac{5}{12}, \tfrac{5}{12}, \tfrac{2}{12}, \tfrac{1}{2}\right)$$
Procedure opinion_sources(Ai) of the algorithm in Fig. 1 (lines
24 - 29) builds opinions for a given artifact Ai.
C. Weighting (discounting) operators
Subjective Logic allows networks of opinions to be built. The
logic allows opinions to be transitive, but such opinions are
weighted by the reputation of the source when evaluated by
third parties. Given the opinion of z on y ($\omega^z_y$), and the opinion
of x on z ($\omega^x_z$), the opinion that x derives from z about y is
represented by $\omega^{x:z}_y$. The operator for weighting opinions is:
$$\omega^x_z \otimes \omega^z_y = \omega^{x:z}_y\left(b^x_z b^z_y,\; b^x_z d^z_y,\; d^x_z + u^x_z + b^x_z u^z_y,\; a^z_y\right)$$
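Following the previous example, the weighted opinions
become:
$$\omega^{x:s_1}_{italy}\left(\tfrac{8}{12}, 0, \tfrac{4}{12}, \tfrac{1}{35}\right) \qquad
\omega^{x:s_2}_{italy}\left(\tfrac{9}{12}, 0, \tfrac{3}{12}, \tfrac{1}{35}\right) \qquad
\omega^{x:s_3}_{usa}\left(\tfrac{5}{12}, 0, \tfrac{7}{12}, \tfrac{1}{35}\right)$$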
All the disbeliefs have value zero as a consequence of starting
from secure opinions.
On line 28 of the algorithm in Fig. 1, procedure
opinion_sources(Ai) returns opinions about artifact Ai weighted
by the reputation of the sources.
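The weighting operator can be checked against the running example with a short sketch. The function name `discount` and the tuple encoding (b, d, u, a) are illustrative choices, not taken from the paper.

```python
from fractions import Fraction


def discount(trust, opinion):
    """Weight `opinion` (z about y) by `trust` (x about z).

    Both arguments are (b, d, u, a) tuples; the result is the opinion
    that x derives from z about y.
    """
    b_xz, d_xz, u_xz, _a_xz = trust
    b_zy, d_zy, u_zy, a_zy = opinion
    return (b_xz * b_zy,
            b_xz * d_zy,
            d_xz + u_xz + b_xz * u_zy,
            a_zy)


# Opinion of x about source s1, and the secure opinion of s1 that the flag is Italy.
x_s1 = (Fraction(8, 12), Fraction(2, 12), Fraction(2, 12), Fraction(1, 2))
s1_italy = (Fraction(1), Fraction(0), Fraction(0), Fraction(1, 35))

weighted = discount(x_s1, s1_italy)
print(weighted)
```

For these inputs the result is (8/12, 0, 4/12, 1/35), printed in lowest terms as 2/3, 0, 1/3 and 1/35. The disbelief and uncertainty of x about the source both end up in the uncertainty of the derived opinion.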
D. Fusion operator
Finally, the logic provides a range of operators which allow
us to combine opinions about the same proposition (fusion).
The fusion of n opinions given by sources $x_1, \ldots, x_n$ about the
same proposition y is represented as $\omega^{x_1 \diamond \cdots \diamond x_n}_y$. The operator
works as follows:
$$\omega^{s_i}_y \oplus \omega^{s_j}_y = \omega^{s_i \diamond s_j}_y\left(
\frac{b^{s_i}_y u^{s_j}_y + b^{s_j}_y u^{s_i}_y}{u^{s_i}_y + u^{s_j}_y - u^{s_i}_y u^{s_j}_y},\;
\frac{d^{s_i}_y u^{s_j}_y + d^{s_j}_y u^{s_i}_y}{u^{s_i}_y + u^{s_j}_y - u^{s_i}_y u^{s_j}_y},\;
\frac{u^{s_i}_y u^{s_j}_y}{u^{s_i}_y + u^{s_j}_y - u^{s_i}_y u^{s_j}_y},\;
a^{s_i}_y\right)$$
Since the opinions of $s_i$ and $s_j$ have the same object, their a
priori probability is the same ($a^{s_i}_y = a^{s_j}_y$).
$\oplus$ is an operator that returns the cumulative fusion of opinions
[11] (since we assume that the opinions are independent, the
evidence underlying them is accumulated).
Continuing our example, by merging the previous opinions
regarding the two outcomes (Italy and USA), we obtain:
$$\omega^{x:s_1 \diamond x:s_2}_{italy}(0.77, 0.14, 0.09, 0.5) \qquad
\omega^{x:s_3}_{usa}(0.42, 0.42, 0.16, 0.5)$$
Line 19 of the algorithm in Fig. 1 iteratively merges opinions
about the artifact of interest.
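A minimal sketch of cumulative fusion for two opinions follows. The function name `fuse` and the tuple encoding are illustrative. Applied to the two opinions about sources s1 and s2 listed in Section IV.B, it yields approximately (0.77, 0.14, 0.09, 0.5).

```python
from fractions import Fraction


def fuse(op1, op2):
    """Cumulative fusion of two (b, d, u, a) opinions about the same proposition.

    Assumes the opinions are independent and that at least one of them
    has non-zero uncertainty (otherwise the denominator is zero).
    """
    b1, d1, u1, a1 = op1
    b2, d2, u2, _a2 = op2  # a2 is assumed equal to a1
    k = u1 + u2 - u1 * u2
    return ((b1 * u2 + b2 * u1) / k,
            (d1 * u2 + d2 * u1) / k,
            (u1 * u2) / k,
            a1)


x_s1 = (Fraction(8, 12), Fraction(2, 12), Fraction(2, 12), Fraction(1, 2))
x_s2 = (Fraction(9, 12), Fraction(1, 12), Fraction(2, 12), Fraction(1, 2))

fused = fuse(x_s1, x_s2)
print([round(float(v), 2) for v in fused])  # [0.77, 0.14, 0.09, 0.5]
```

Fusing more than two opinions can be done by folding `fuse` over the list, for example with `functools.reduce`.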
E. Trust Rating Algorithm
Here we present an algorithm for calculating the trust value
of an event facet, represented by artifacts. However, because
of its recursive nature, the algorithm is directly applicable to
event descriptions.
Given an artifact whose trust value we want to calculate, our
first step is to determine the opinion of any source that directly
generates the artifact’s value. The steps are as follows:
• take the amount of evidence given by each source about
  each possible value for the artifact. Usually each source
  gives one output, but if more are available, then the
  resulting opinion is stronger (see subsection IV.B);
• weight the opinions given by the sources according to the
  opinion on the source itself (in turn, based on previous
  evidence about its trustworthiness, see subsection IV.C);
• merge all the opinions (see subsection IV.D).
Generalizing, we can say that:
• given an artifact A;
• given a set of sources $s_1, \ldots, s_n$;
• given a function $v(s_i, A) = v_{s_i}(A)$;
• given opinions on the sources $\omega^x_{s_i}(b_{s_i}, d_{s_i}, u_{s_i}, a_{s_i})$.
We compute the opinion on an event facet from each source:
$$\omega^{x:s_i}_{v_{s_i}(A)}(b_{s_i}, 0, d_{s_i} + u_{s_i}, a_{s_i})$$
Once we have the opinions about the values from each
source, we merge them in order to obtain an opinion for each
value from all sources:
$$\bigoplus_{v_{s_i}} \omega^{x:s_i}_{v_{s_i}(A)}(b_{s_i}, 0, d_{s_i} + u_{s_i}, a_{s_i})$$

(1)  proc tv(Ai)
(2)    res := null
(3)    for Pk : Ai opm:wasGeneratedBy Pk do
(4)      for Aj : Pk opm:used Aj do
(5)        if Ai opm:wasDerivedFrom Aj
(6)        then
(7)          if res = null
(8)          then res := tv(Aj)
(9)          else res := F(Pk)(res, tv(Aj))
(10)         fi
(11)       fi
(12)     od
(13)   od
(15)   comment: res = ω^{∀Aj x:tv(Aj)}_{v(Ai)}
(16)   for si : ∃ v_si(Ai) ≠ ∅ do
(17)     if res = null
(18)     then res := opinion_sources(Ai)
(19)     else res := res ⊕ opinion_sources(Ai)
(20)     fi
(21)   od
(22)   return res
(23) end
(24) proc opinion_sources(Ai)
(25)   for si : v_si(Ai) ≠ null do
(26)     record_evidence(v_si(Ai))
(27)   od
(28)   return ω^{x:si}_{v(Ai)}
(29) end
(30) proc π(t, si, Ai)
(31)   e : e ∈ domain ∧ dist(e, v_si(Ai)) =
(32)     = min_{∀e' ∈ domain}(dist(e', v_si(Ai)))
(33)   d := dist(e, v_si(Ai))
(34)   record ω^{si}_{v_si(Ai)=e}(b'_si · 1/d, 0, (d'_si + u'_si) · (1 − 1/d), a'_si)
(35)   comment: b'_si, d'_si, u'_si, a'_si are the
(36)   comment: projections of b_si, d_si, u_si, a_si
(37) end
(38) proc dist
(39)   comment: distance between two points
(40)   comment: (e.g. Euclidean).
(41) proc record_evidence
(42)   comment: stores evidence in memory.
(43) proc record
(44)   comment: stores opinion in memory.
(45) proc ω
(46)   comment: returns an opinion
(47)   comment: based on stored evidence.
(48) comment: Possible values for F:
(49)   F(concat) = ∧
(50)   F(lookup(t)) = ∧ ∘ π(t)
Fig. 1. Trust Rating Algorithm

Fig. 2. Provenance and Trust graphs about the flag value of a ship. The left graph reconstructs the provenance of the flag field. The graph on the right,
starting from the first ancestors of the flag field, collects all the evidence about all the artifacts involved in the provenance trail (of the left graph) and gradually
merges them.
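The recursive structure of procedure tv in Fig. 1 can be sketched in Python as follows. This is an illustrative reading of the pseudocode under several assumptions, not the authors' implementation: the dict-based graph encoding, the `combine` callback (standing in for F(Pk)), the helper `no_combiner`, and all opinion values are hypothetical, and the source opinions are assumed to have been weighted already.

```python
from fractions import Fraction


def fuse(op1, op2):
    """Cumulative fusion of two independent (b, d, u, a) opinions."""
    b1, d1, u1, a1 = op1
    b2, d2, u2, _ = op2
    k = u1 + u2 - u1 * u2  # assumed non-zero in this sketch
    return ((b1 * u2 + b2 * u1) / k, (d1 * u2 + d2 * u1) / k, u1 * u2 / k, a1)


def tv(artifact, graph, combine):
    """Recursively compute an opinion for `artifact` (None if no evidence).

    `graph` is a hypothetical dict-based encoding of an acyclic OPM graph:
      generated_by[a] -> processes that generated artifact a
      used[p]         -> artifacts used by process p
      derived_from[a] -> artifacts that a was derived from
      opinions[a]     -> source opinions about a, already weighted
    `combine(process, op1, op2)` plays the role of F(Pk) in Fig. 1.
    """
    res = None
    # Part 1: walk back through the provenance of the artifact.
    for process in graph["generated_by"].get(artifact, []):
        for used in graph["used"].get(process, []):
            if used not in graph["derived_from"].get(artifact, []):
                continue
            sub = tv(used, graph, combine)
            if sub is None:  # no evidence about this ancestor
                continue
            res = sub if res is None else combine(process, res, sub)
    # Part 2: fuse in the opinions of sources reporting on the artifact itself.
    for opinion in graph["opinions"].get(artifact, []):
        res = opinion if res is None else fuse(res, opinion)
    return res


def no_combiner(process, op1, op2):
    """Placeholder: a conjunction operator would be supplied here."""
    raise NotImplementedError("supply a conjunction operator for " + process)


graph = {
    "generated_by": {"flag": ["lookup"]},
    "used": {"lookup": ["mmsi", "table"]},
    "derived_from": {"flag": ["mmsi"]},
    "opinions": {
        "mmsi": [(Fraction(2, 3), Fraction(0), Fraction(1, 3), Fraction(1, 35))],
        "flag": [(Fraction(3, 4), Fraction(0), Fraction(1, 4), Fraction(1, 35))],
    },
}
result = tv("flag", graph, no_combiner)
print(result)
```

In this toy graph the flag is derived from a single artifact, so `combine` is never called and the result is the fusion of the evidence about the MMSI with the direct evidence about the flag. A process with several derived-from inputs would need a real conjunction operator in place of `no_combiner`.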
F. Integration process
We want to consider not only sources that directly provide
the artifact value but also which process is used during
integration to generate the artifact. Therefore, in case the
artifact is not a leaf node, we need to merge the opinions
(if any) computed by taking into account the provenance of
the artifact. For example, in Fig. 2,
we see that the trust level of the root node depends on the
trust levels of the leaf nodes, combined according to how the
process manipulates them. Therefore, we use a functor
that allows us to apply the proper function to the trust values
of the input artifacts, according to the kind of process that
manipulates them.
Two examples are provided in Fig. 1: in case of a
concatenation process (that takes as inputs two strings and
outputs their concatenation), all the trust values contribute
equally to determining the outcome and therefore they are
merged by conjunction. In case of a lookup process (that takes
as inputs a key and a value table, and outputs the value in
the table corresponding to the key), before calculating
the conjunction of the trust values, we project them into the
space of the possible values, possibly smaller than the space
of plausible ones. Moreover, in case the value we face does
not fall into the range of possible values, we consider the
value or values closest to it that belong to the set of possible
values. Clearly, we weight these contributions according to their
distance to the given value.
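The selection of the closest possible value or values can be sketched as below. This covers only the nearest-value step (lines 31 to 33 of Fig. 1), not the subsequent weighting of the opinion. The function name `nearest_values`, the use of absolute difference as the distance, and the three-digit codes are assumptions made for illustration.

```python
def nearest_values(value, domain, dist=lambda x, y: abs(x - y)):
    """Return (distance, candidates): the members of `domain` closest to `value`."""
    best = min(dist(e, value) for e in domain)
    return best, [e for e in domain if dist(e, value) == best]


# Hypothetical three-digit country codes accepted by the lookup table.
valid_codes = [211, 247, 338]

print(nearest_values(247, valid_codes))  # (0, [247])
print(nearest_values(250, valid_codes))  # (3, [247])
```

When several domain values are equally close, all of them are returned, which matches the "value or values" wording above. Whether numeric closeness is a meaningful distance between country codes is a modelling choice; any other distance function can be passed as `dist`.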
V. APPLYING THE ALGORITHM
We now discuss how, by taking advantage of both prove-
nance and background knowledge, the trust algorithm can
produce more precise trust ratings.
One important feature of the algorithm is that, by means
of provenance, it also incorporates semantic information.
This way, we restrict the domain of possible values for each
field to the range of real, meaningful values. For instance, if the
nationality field of an MMSI is a 3 digit code, then there are $10^3$
possible values, since any digit would be equally probable in
each of the 3 positions. By taking into account the meaning
(semantics) of the MMSI, the cardinality of the set of
plausible values is restricted to 35 (considering the countries
which own 99% of the ships). This means that if we own 10
pieces of positive evidence and we restrict the plausibility set
from 1000 to 35, then the trust value rises from
$E = \frac{10}{1010} + \frac{1}{1000} \cdot \frac{1000}{1010} = 0.0109\ldots$
to $E = \frac{10}{45} + \frac{1}{35} \cdot \frac{35}{45} = 0.2444\ldots$ Note that the
MMSI field is retrieved via traversing the provenance graph.
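The effect of restricting the domain can be reproduced with a few lines of Python. The sketch assumes 10 pieces of positive evidence and no negative evidence; the function name `trust_value` is hypothetical.

```python
from fractions import Fraction


def trust_value(positive, negative, n):
    """E = b + a * u for evidence counts over n possible outcomes."""
    denominator = positive + negative + n
    b = Fraction(positive, denominator)
    u = Fraction(n, denominator)
    a = Fraction(1, n)
    return b + a * u


for n in (1000, 35):
    e = trust_value(10, 0, n)
    print(n, e, round(float(e), 4))
# 1000 11/1010 0.0109
# 35 11/45 0.2444
```

With the evidence held fixed, a smaller set of possible outcomes lowers the uncertainty term n / (total evidence + n) and raises the belief term, so the same evidence yields a higher trust value.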
Another important feature of the algorithm is the usage
of provenance information. Because of this, we enlarge the
evidence at our disposal for calculating trust values.
In fact, we do not limit ourselves to direct evidence about
the facets we have to evaluate, but also consider evidence
about the elements used in the process that led to those facets.
Therefore, we check whether these initial elements were
correct and whether they were combined properly in order to
produce the facet we are analyzing. Once we have this result,
we can compare it with evidence that refers directly to the facet
we are evaluating, obtaining a more precise
trust value.
Continuing the previous example, if we also have sources
that provide a value for the nation, knowing that the national
code is determined by looking it up in a trusted table, then by
applying the Trust Rating Algorithm, we obtain the following
trust value: $E = \frac{20}{45} + \frac{1}{35} \cdot \frac{35}{45} = 0.4667\ldots$
If we adopt a conservative approach and accept only facets
whose trust value is above a certain threshold, then this change
reduces the amount of errors due to false negatives.
VI. RELATED WORK
Trust is a widely explored topic within a variety of areas
within computer science including security, intelligent agents,
software engineering and distributed systems. Here, we focus
on those works directly touching upon the junction of trust,
provenance and the Semantic Web. For a readable overview
