
Linked Data Quality Assessment through Network Analysis

by Christophe Guéret, Paul Groth, Claus Stadler, Jens Lehmann
The 10th International Semantic Web Conference (2011)


Christophe Guéret¹, Paul Groth¹, Claus Stadler², and Jens Lehmann²
¹ Free University Amsterdam, De Boelelaan 1105, 1081HV Amsterdam
{c.d.m.gueret,p.t.groth}@vu.nl
² University of Leipzig, Johannisgasse 26, 04103 Leipzig
{cstadler,lehmann}@informatik.uni-leipzig.de
Abstract. Linked Data is at its core about the setting of links between
resources. Links provide enriched semantics, pointers to extra information
and enable the merging of data sets. However, as the amount of Linked Data
has grown, there has been the need to automate the creation of links, and
such automated approaches can create low-quality links or unsuitable network
structures. In particular, it is difficult to obtain an overall picture as
to whether the links introduced improve or diminish the quality of Linked
Data. In this work, we present an extensible framework that allows for the
assessment of Linked Data quality from a global perspective. We test the
framework on a set of known quality links and show that it effectively
detects quality changes.
Keywords: linked data, quality assurance, network analysis
1 Introduction
Linked Data features a distributed publication model that allows any data
publisher to semantically link to other resources on the Web. Because of this
open nature, several mechanisms have been introduced to semi-automatically
link resources on the Web of Data (e.g. Silk [4]). This partially automated
introduction of links raises the question of which links improve the quality
of the Web of Data and which just add clutter. The notion of link quality is
particularly important on the Web of Data because, unlike on the regular Web,
no human decides based on context whether a link is useful. Instead,
automated agents (which currently have fewer capabilities) must be able to
make these decisions. The quality of a link can be assessed in a number of
different ways. Here, we look at the global ramifications of link creation on
the Web of Data (or subsets of it).
In order to address this, we first must define quality at a global scale.
Given that the Web of Data is a network, we can assess its global properties
using network measures. These statistical techniques provide summaries of the
network along different dimensions. These dimensions can be used to get an
overall perspective on the quality of the network. Barabási [1] analyzed a
number of networks in nature and noted similar characteristics for all of
them, including the Web graph. Naturally, analysing quality via network
measures has limitations: it is always possible to create an artificial,
meaningless data set that would score high on several network criteria. For
this reason, we view our approach as an addition to, rather than a
replacement for, other quality assurance methods.
Unfortunately, many network measures that can be used to check the quality
of the Web of Data are computationally complex. This limits the applicability
of these measures as the Web of Data continues to expand. However, instead
of computing the exact value of network measures, one can compute local
approximations of them. We measure the quality of a set of Linked Data under
change using approximate network measures and compare the results against a
set of goal statistics. For each measure, we provide the capability to
identify the particular links that cause the deviation from the ideal. This
allows designers of link creation mechanisms to adjust their approaches using
fine-grained information. Importantly, these measures are encapsulated in a
framework, LINK-QA, which allows for the definition and addition of new
quality measures. We now describe the framework and some initial experiments
using it.
2 LINK-QA analysis framework
LINK-QA is a framework for assessing the quality of the Web of Data through
the analysis of its constituent parts. The framework is scalable and
extensible: the metrics applied are generic and share a common set of basic
requirements, making it easy to incorporate new metrics. Additionally,
metrics are computed using only the local network of a resource and are thus
parallelisable by design. It differs from other approaches [3,2] in that it
takes a network-centric approach. The framework consists of five components:
"Select", "Construct", "Extend", "Analyse" and "Compare". These components
are assembled together in the form of a workflow. We now discuss each of the
five components.
Select This component is responsible for selecting the set of resources to be
evaluated. This can be done through a variety of mechanisms, including
sampling the Web of Data, using a user-specified set of resources, or looking
at the set of resources to be linked by a link discovery algorithm. This set
of resources defines the network under consideration.
Construct Once a set of resources is selected, a local network (i.e. a small
neighborhood around a node) is constructed for each resource. The local
networks are created by querying the Web of Data. Practically, LINK-QA makes
use of either SPARQL endpoints or data files to create the graph surrounding
a resource. In particular, sampling is achieved by first sending a SPARQL
query to a list of endpoints. If no data is found, LINK-QA falls back on
de-referencing the resource.
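The endpoint-then-dereference strategy described for the Construct component can be sketched as follows. This is a minimal illustration and not the LINK-QA code: the function names, the shape of the query, and the two callables `run_query` and `dereference` (which stand in for an actual SPARQL client and an HTTP lookup) are all assumptions made for the example.

```python
def local_network_query(resource_iri: str, limit: int = 1000) -> str:
    """Build a SPARQL query for the triples in which one resource takes part.

    Hypothetical helper: the query shape is an assumption, not LINK-QA's own.
    """
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if not resource_iri or any(
        ch in '<>"{}|^`\\' or ord(ch) <= 0x20 for ch in resource_iri
    ):
        raise ValueError(f"not a usable IRI: {resource_iri!r}")
    iri = f"<{resource_iri}>"
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  {{ {iri} ?p ?o . BIND({iri} AS ?s) }}\n"
        "  UNION\n"
        f"  {{ ?s ?p {iri} . BIND({iri} AS ?o) }}\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_local_network(resource_iri, endpoints, run_query, dereference):
    """Return the set of (s, p, o) triples around a resource.

    run_query(endpoint, query) and dereference(iri) are caller-supplied
    callables (assumed for this sketch) that return iterables of triples.
    """
    query = local_network_query(resource_iri)
    for endpoint in endpoints:
        triples = list(run_query(endpoint, query))
        if triples:
            # Stop at the first endpoint that knows the resource.
            return set(triples)
    # No endpoint returned data: fall back on de-referencing the resource.
    return set(dereference(resource_iri))
```

Passing the network access in as callables keeps the sketch independent of any particular SPARQL library and makes the fallback order easy to check in isolation.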
Extend This component adds new edges that are provided as input to the
framework. These input edges are added to each local network where they
apply. Once these edges are added, we compute a set of new local networks
around the original set of selected resources. The aim here is to measure the
impact of these new edges on the overall network. This impact assessment is
done by the Compare component.
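As an illustration of the Extend step, the sketch below adds candidate links to a local network represented as a set of (subject, predicate, object) triples. The rule used to decide where a link "applies" (at least one of its ends is already a node of the local network) is an assumption made for this example; the text does not spell out the exact criterion, and the function names are hypothetical.

```python
def nodes_of(triples):
    """Collect every resource that appears as a subject or an object."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


def extend_network(local_triples, new_links):
    """Return a new local network with the applicable candidate links added.

    Assumption for this sketch: a candidate link applies to a local network
    when its subject or its object is already a node of that network.
    The input network is left unmodified so that it can still be analysed
    as the "before" state.
    """
    nodes = nodes_of(local_triples)
    applicable = {
        (s, p, o) for (s, p, o) in new_links if s in nodes or o in nodes
    }
    return set(local_triples) | applicable


# Example with made-up identifiers:
original = {("ex:a", "ex:knows", "ex:b")}
candidates = {
    ("ex:b", "owl:sameAs", "other:b"),   # touches ex:b, so it is added
    ("other:x", "owl:sameAs", "other:y"),  # unrelated, so it is skipped
}
extended = extend_network(original, candidates)
```

Keeping the original and the extended network as separate sets mirrors the before/after comparison that the framework performs later.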
Analyse Once the original local network and its extended local networks
have been created, an analysis consisting of two parts is performed. First, a
set of metrics m is computed on each node v_i within each local network. This
produces a set of metric results m_i for each node v_i. Then, these results
are aggregated into a distribution.
Currently, the following five metrics are used: degree, clustering
coefficient, the number of open owl:sameAs chains in the local network,
centrality and a measure of the richness of a resource description.
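Two of these metrics, degree and clustering coefficient, have standard textbook definitions, and the two-part analysis can be sketched for them as below. The sketch assumes that the local network is treated as an undirected simple graph and that the aggregation is a normalised histogram; both choices, as well as the function names, are assumptions for illustration rather than a description of the LINK-QA implementation.

```python
from collections import Counter
from itertools import combinations


def adjacency(triples):
    """Build an undirected adjacency map from (s, p, o) triples."""
    adj = {}
    for s, _, o in triples:
        if s == o:
            continue  # ignore self-loops in this sketch
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    return adj


def degree(adj, node):
    """Number of distinct neighbours of a node."""
    return len(adj.get(node, ()))


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours that are themselves connected."""
    neighbours = adj.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * linked / (k * (k - 1))


def distribution(values):
    """Aggregate per-node metric results into relative frequencies."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


# Example: a triangle a-b-c with one extra node d attached to c.
triples = {("a", "p", "b"), ("b", "p", "c"), ("a", "p", "c"), ("c", "p", "d")}
adj = adjacency(triples)
degree_distribution = distribution(degree(adj, v) for v in adj)
```

Because each metric only reads the adjacency map of one local network, the per-node computations are independent of each other, which is what makes the approach parallelisable.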
Compare The results coming from both analyses (before and after adding the
new edges) are compared to ideal distributions for the different metrics. The
comparison component assesses whether the set of new links globally improves
the quality of the network. Note that we can also run the framework without
any additional input edges. This would just provide a quality assessment of
the current network as represented by the selected resources.
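The comparison can be sketched as follows, with distributions represented as dictionaries mapping a metric value to its relative frequency. The distance used here (the sum of absolute differences between the two distributions) is an assumption chosen for simplicity; the text does not state which distance LINK-QA uses. The three status labels follow the legend of Table 1.

```python
def distance_to_ideal(observed, ideal):
    """Sum of absolute differences between two discrete distributions.

    Hypothetical distance chosen for this sketch; any other measure of
    dissimilarity between distributions could be substituted.
    """
    keys = set(observed) | set(ideal)
    return sum(abs(observed.get(k, 0.0) - ideal.get(k, 0.0)) for k in keys)


def status(before, after, ideal, tolerance=1e-9):
    """Classify the effect of the new links on one metric.

    "red":   the distance to the ideal increased
    "green": the distance to the ideal decreased
    "n.a.":  no change (within the given tolerance)
    """
    d_before = distance_to_ideal(before, ideal)
    d_after = distance_to_ideal(after, ideal)
    if d_after > d_before + tolerance:
        return "red"
    if d_after < d_before - tolerance:
        return "green"
    return "n.a."


# Example with made-up distributions over two metric values:
ideal = {1: 0.5, 2: 0.5}
before = {1: 1.0}
after = {1: 0.5, 2: 0.5}
result = status(before, after, ideal)
```

Calling `status(before, before, ideal)` corresponds to running the framework without additional input edges: the distance is computed but nothing changes.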
The implementation of LINK-QA is available as free software at
https://github.com/cgueret/LinkedData-QA. It takes as input a set of
resources, information from the Web of Data (i.e. SPARQL endpoints and/or
de-referencable resources) and a set of new triples to perform quality
assessment on. The implementation generates HTML reports for the results of
the quality assessment (see Figure 1).
Fig. 1. Example of report generated. Both this report and the links analysed
are available at https://github.com/cgueret/LinkedData-QA/tree/master/example
3 Experimental Results and Conclusion
The framework is designed to analyse the potential impact of a set of link
candidates prior to their publication on the Web of Data. We test the links
produced by one project using state-of-the-art link generation tools. The
European project LOD Around the Clock (LATC) aims to enable the use of the
Linked Open Data cloud for research and business purposes. LATC created and
manually checked a set of reference linking specifications for the Silk
engine [4]. Linking specifications are used by Silk to generate link sets.
The specifications, along with the link sets they produce, are publicly
available (footnote 3).
Table 1 shows the results obtained globally for 14 heterogeneous linking
specifications, establishing links among entities from DBpedia, GHO, LinkedCT
and Eunis. Changes are detected for 4 of the metrics roughly 80% of the time.
        Degree                Clustering            sameAs            Centrality            Description
Red     35.7% (1.87 ± 1.38%)  64.3% (0.21 ± 0.17%)  0.1% (6.67 ± 0%)  71.4% (0.21 ± 0.15%)  0%
Green   42.9% (2.14 ± 1.87%)  14.3% (0.05 ± 0.02%)  0%                0.1% (0.05 ± 0%)      78.6% (71.78 ± 12.60%)
N.A.    21.4%                 21.4%                 99.9%             28.5%                 21.4%
Table 1. Amount of change detected by the metrics, reported in terms of status
with respect to the ideal. Red: distance to ideal increased. Green: distance
to ideal decreased. N.A.: no change.
Links are created across different data sets with few established
connections. These changes contribute to obtaining a power-law distribution
of the degree and increasing the overall descriptive richness. From these
results, we conclude that LINK-QA is able to detect global changes related to
the addition of a set of heterogeneous links.
LINK-QA is an extensible framework for performing quality assessment on
the Web of Data. We analysed a set of links provided by well-known link creation
services and showed how the framework can be used to detect change in quality.
Going forward, we aim to develop a live service for running quality assessment
over the whole of the Web of Data on a periodic basis.
Acknowledgements This work was supported by the EU 7th Framework Pro-
gramme within projects LOD2 (GA no. 257943) and LATC (GA no. 256975). The
authors would like to thank Peter Mika for his input.
References
1. Barabási, A.L.: Linked. Perseus, Cambridge, Massachusetts (2002)
2. Bizer, C., Cyganiak, R.: Quality-driven information filtering using the
WIQA policy framework. Web Semantics: Science, Services and Agents on the
World Wide Web 7(1), 1–10 (Jan 2009)
3. Niu, X., Wang, H., Wu, G., Qi, G., Yu, Y.: Evaluating the stability and
credibility of ontology matching methods. In: 8th Extended Semantic Web
Conference (ESWC2011) (June 2011)
4. Volz, J., Bizer, C., Gaedke, M., Kobilarov, G.: Silk: A link discovery
framework for the web of data. In: Bizer, C., Heath, T., Berners-Lee, T.,
Idehen, K. (eds.) 2nd Linked Data on the Web Workshop LDOW2009. pp. 1–6.
CEUR-WS (2009)
3 https://github.com/LATC/24-7-platform/tree/master/link-specifications
