Files are Siles: Extending File Systems with Semantic Annotations

by Bernhard Schandl
International Journal on Semantic (2010)

Available from eprints.cs.univie.ac.at

Files are Siles: Extending File Systems with Semantic Annotations*
Bernhard Schandl and Bernhard Haslhofer
University of Vienna, Department of Distributed and Multimedia Systems
{bernhard.schandl,bernhard.haslhofer}@univie.ac.at

Abstract. With the increasing storage capacity of personal computing
devices, the problems of information overload and information
fragmentation become apparent on users' desktops. For the Web, semantic
technologies aim at solving these problems by adding a machine-interpretable
information layer on top of existing resources. It has been shown that the
application of these technologies to desktop environments is helpful for
end users. Certain characteristics of the Semantic Web architecture that
are commonly accepted in the Web context, however, are not desirable
for desktops; e.g., incomplete information, broken links, or disruption
of content and annotations. To overcome these limitations we propose
in this paper the sile model, an intermediate data model that combines
characteristics of the Semantic Web and file systems. This model is
intended to be a conceptual foundation of the Semantic Desktop, and to
serve as underlying infrastructure on which applications and further
services can be built. We present one such service, namely a virtual file
system based on siles, which allows users to semantically annotate files
and directories but at the same time keeps full compatibility with
traditional hierarchical file systems; hence, users can continue to use
file-based applications. We discuss strategies for applying Semantic Web
vocabularies to the meaningful annotation of files. Further, we present a
prototypical implementation of our model and analyze the performance of
typical access operations, both on the file system level and on the
metadata level.

1 Introduction

Large amounts of information are stored on personal desktops. We use our
personal computing devices, both mobile and stationary, to communicate, to
write documents, to organize multimedia content, to search for and retrieve
information, and much more. With the increasing computing and storage power
of such devices, we face the problem of information overload: the amount of
data we generate and consume is constantly increasing, and because cheap
storage space is available, each and every bit of information is stored.
Another problem is even more prevalent on the desktop than on the Web:
information fragmentation. Data of different kinds are stored in
heterogeneous silos, and, contrary to the Web, where hyperlinks can be
defined between documents and across site boundaries, there exist only
limited means to define and retrieve relationships between different
desktop resources. In the best case such relationships can be represented
using additional infrastructure (e.g., relational databases or specific
applications), but these are usually not tightly integrated with file
systems.

* This paper is an extended version of [31].

The Semantic Web aims to deal with these problems by adding a layer on top
of the existing Web infrastructure, wherein descriptions of web resources
are expressed in the Resource Description Framework (RDF) using commonly
accepted vocabularies or ontologies. This allows machines to interpret the
published data and thus helps end users to find information more
efficiently. A large number of data sets¹ and vocabularies² have already
been published and form a solid data corpus that can be indexed by
(semantic) search engines and serves as a foundation for applications.
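
As a rough illustration of what such an RDF description looks like, the
following sketch represents two statements about a web resource as
(subject, predicate, object) triples and prints them in N-Triples syntax.
It uses plain Python tuples rather than an RDF library. The example.org
URIs, the literal value, and the helper names (Literal, to_ntriples) are
assumptions made for this example and do not come from the paper. Only the
Dublin Core terms namespace is a real, published vocabulary.

```python
# Minimal sketch: RDF statements as (subject, predicate, object) triples.
# The resource URIs and literal values below are made up for illustration.

DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


class Literal(str):
    """Marks an object value as a plain string literal rather than a URI."""


def to_ntriples(triples):
    """Serialize triples to N-Triples, one statement per line."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            # Escape backslashes and double quotes inside the literal.
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


doc = "http://example.org/reports/2010-survey"  # hypothetical web resource
triples = [
    (doc, DCTERMS + "title", Literal("Annual Survey")),
    (doc, DCTERMS + "creator", "http://example.org/people/alice"),
]

print(to_ntriples(triples))
```

Because both statements share the same subject URI and use a published
vocabulary for the predicates, a program that knows the vocabulary can
interpret the description without knowing which application produced it.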

Recent research in the field of the Semantic Desktop [6, 17, 20] has shown
that a number of features provided by Semantic Web technologies are also
suitable for the problem of information management on the desktop;
especially, the provision of unified identifiers, the ability to represent
data in an application-independent generic format, the flexibility to
describe resources using formalized vocabularies, and the possibility to
reason over these descriptions. It has also been shown [28, 13] that the
inclusion of semantic technologies on the desktop can significantly improve
the user's perceived quality of personal information management, especially
when they are applied over a longer time period.

However, there exist some significant conceptual differences between the
Web and the desktop. First, in contrast to the World Wide Web, the desktop
already has a well-established organization metaphor for data: file
systems, which have been in use for decades. In consequence, the vast
majority of personal information is stored in files, which are organized
using hierarchical, labelled collections (folders or directories) or, to a
far more limited extent, using metadata attached to or encoded within
files. Therefore it is crucial for the Semantic Desktop to integrate
smoothly with file systems in a way that allows for the annotation of files
without breaking the behavior of existing desktop applications. A second
major difference is the handling of broken links. While appearing and
disappearing web resources are, to a certain extent, accepted on the Web,
users rightfully expect their data on the desktop to remain consistent over
time.

Since the RDF data model exposes a number of shortcomings that may cause
problems for an efficient implementation of the Semantic Desktop (cf.
Section 3), we propose the sile model, a data model that acts as an
intermediate and integrative layer between file systems and Semantic Web
technologies. This model allows users and applications to annotate and
interrelate file-like desktop resources. It is designed as an
infrastructure on which applications and services can be built. One example
of such a service, a virtual file system, is presented in this paper.
Through this virtual file representation, the sile model can be used

¹ http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
² http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies