Why Linked Data is Not Enough for Scientists

by Sean Bechhofer, John Ainsworth, Jitenkumar Bhagat, Iain Buchan, Phillip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Carole Goble, Danius Michaelides, Paolo Missier, Stuart Owen, David Newman, David De Roure, Shoaib Sufi
Future Generation Computer Systems (2010)

Abstract

Scientific data stands to represent a significant portion of the linked open data cloud and science itself stands to benefit from the data fusion capability that this will afford. However, simply publishing linked data into the cloud does not necessarily meet the requirements of reuse. Publishing has requirements of provenance, quality, credit, attribution, methods in order to provide the reproducibility that allows validation of results. In this paper we make the case for a scientific data publication model on top of linked data and introduce the notion of Research Objects as first class citizens for sharing and publishing.

Sean Bechhofer1, John Ainsworth2, Jiten Bhagat1, Iain Buchan2, Philip Couch2,
Don Cruickshank3, David De Roure3,4, Mark Delderfield2, Ian Dunlop1,
Matthew Gamble1, Carole Goble1, Danius Michaelides3,
Paolo Missier1, Stuart Owen1, David Newman3, Shoaib Sufi1
1School of Computer Science, University of Manchester, UK
2School of Medicine, University of Manchester, UK
3School of Electronics and Computer Science, University of Southampton, UK
4Oxford e-Research Centre, University of Oxford, UK
I. INTRODUCTION
Changes are occurring in the ways in which scientific
research is conducted. Within wholly digital environments,
methods such as scientific workflows, research protocols,
standard operating procedures and algorithms for analysis or
simulation are used to manipulate and produce data.
Experimental or observational data and scientific models are
typically “born digital” with no physical counterpart. This
move to digital content is driving a sea-change in scientific
publication, and challenging traditional scholarly publication.
Shifts in dissemination mechanisms are thus leading towards
increasing use of electronic publication methods. Traditional
paper publications are, in the main, linear and human (rather
than machine) readable. A simple move from paper-based to
electronic publication does not, however, necessarily make a
scientific output decomposable. Nor does it guarantee that
outputs, results or methods are reusable.
Current scientific knowledge management serves society
poorly: for example, the time to get new knowledge
into practice can be more than a decade. The models used to
support medical decisions are not dynamically linked to the
body of knowledge that defines best practice. More than half of
the effects of medical treatments cannot be predicted from the
literature, because trials exclude women of child-bearing age,
people with other diseases or on other medications. Doctors
audit the outcomes of their treatments using research methods,
yet the results are not captured and put back into medical
research for the benefit of society [1].
As an example from the medical field, there are multiple
studies relating sleep patterns to work performance. Each study
has a slightly different design, and there is disagreement in
reviews as to whether or not the overall message separates out
cause from effect. Ideally the study data, context information,
and modelling methods would be extracted from each paper
and put together in a larger model, not just a review of
summary data. To do this well is intellectually harder than
running a primary study, one that measures things directly.
This need for broad-ranging “meta-science” and not just deep
“mega-science” is shared by many domains of research, not
just medicine.
Studies continue to show that research in all fields is
increasingly collaborative [2]. Most scientific and engineering
domains would benefit from being able to “borrow strength”
from the outputs of other research, not only in information to
reason over but also in data to incorporate in the modelling task
at hand. We thus see a need for a framework that facilitates
the reuse and exchange of digital knowledge. Linked Data [3]
provides a compelling approach to dissemination of scientific
data for reuse. However, simply publishing data out of context
would fail to respect research methodology, nor would it
respect the flow of rights and reputation of the researcher.
Scientific practice is based on publication of results being
associated with provenance to aid interpretation and trust, and
description of methods to support reproducibility.
In this paper, we discuss the notion of Research Objects,
semantically rich aggregations of resources that provide the
“units of knowledge” which supply structure for delivery of
information as Linked Data. A Research Object (RO) provides
a container for a principled aggregation of resources, produced
and consumed by common services and shareable within and
across organisational boundaries. An RO bundles together
essential information relating to experiments and investigations.
This includes not only the data used, and methods employed
to produce and analyse that data, but also the people involved
in the investigation. In the following sections, we look at the
motivation for linking up science, consider scientific practice
and look to three case studies to inform our discussion. Based
on this, we identify principles of ROs and map these to a set
of features. We discuss the implementation of ROs in the
2010 Sixth IEEE International Conference on e–Science
978-0-7695-4290-4/10 $26.00 © 2010 IEEE
DOI 10.1109/eScience.2010.21
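
The description of a Research Object in the Introduction, a container that aggregates the data used, the methods employed and the people involved, can be illustrated with a minimal sketch. This is not the authors' RO model or any published vocabulary: the class names, the three resource kinds, the `is_complete` check and all example.org URIs are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical resource kinds, taken from the prose: data, methods, people.
KINDS = ("data", "method", "person")


@dataclass(frozen=True)
class Resource:
    """One aggregated item, identified by a URI (all URIs here are made up)."""
    uri: str
    kind: str


@dataclass
class ResearchObject:
    """Illustrative container only; not the authors' actual RO model."""
    uri: str
    resources: List[Resource] = field(default_factory=list)

    def add(self, uri: str, kind: str) -> None:
        # Reject kinds outside the three named in the text.
        if kind not in KINDS:
            raise ValueError(f"unknown resource kind: {kind}")
        self.resources.append(Resource(uri, kind))

    def of_kind(self, kind: str) -> List[str]:
        # URIs of all aggregated resources of the given kind.
        return [r.uri for r in self.resources if r.kind == kind]

    def is_complete(self) -> bool:
        # True only when data, methods and people are all represented.
        return all(self.of_kind(k) for k in KINDS)


ro = ResearchObject("http://example.org/ro/sleep-study")
ro.add("http://example.org/data/observations.csv", "data")
ro.add("http://example.org/workflows/analysis", "method")
print(ro.is_complete())  # False: no person recorded yet
ro.add("http://example.org/people/investigator-1", "person")
print(ro.is_complete())  # True
```

The completeness check is one possible reading of "bundles together essential information": a bare dataset published on its own would fail it, which mirrors the argument that data published out of context is hard to reuse.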