Files are Siles: Extending File Systems with Semantic Annotations
Available from eprints.cs.univie.ac.at
Bernhard Schandl and Bernhard Haslhofer
University of Vienna, Department of Distributed and Multimedia Systems
{bernhard.schandl,bernhard.haslhofer}@univie.ac.at
Abstract. With the increasing storage capacity of personal computing
devices, the problems of information overload and information
fragmentation become apparent on users' desktops. For the Web, semantic
technologies aim at solving this problem by adding a machine-interpretable
information layer on top of existing resources. It has been shown that
applying these technologies to desktop environments is helpful for end
users. However, certain characteristics of the Semantic Web architecture
that are commonly accepted in the Web context are not desirable for
desktops, e.g., incomplete information, broken links, or disruption of
content and annotations. To overcome these limitations, we propose in this
paper the sile model, an intermediate data model that combines
characteristics of the Semantic Web and file systems. This model is
intended to be a conceptual foundation of the Semantic Desktop and to
serve as the underlying infrastructure on which applications and further
services can be built. We present one such service, namely a virtual file
system based on siles, which allows users to semantically annotate files
and directories while remaining fully compatible with traditional
hierarchical file systems; hence, users can continue to use file-based
applications. We discuss strategies for applying Semantic Web vocabularies
to the meaningful annotation of files. Further, we present a prototypical
implementation of our model and analyze the performance of typical access
operations, both on the file system level and on the metadata level.
1 Introduction
Large amounts of information are stored on personal desktops. We use our
personal computing devices, both mobile and stationary, to communicate, to
write documents, to organize multimedia content, to search for and retrieve
information, and much more. With the increasing computing and storage power
of such devices, we face the problem of information overload: the amount of
data we generate and consume is constantly increasing, and because cheap
storage space is available, each and every bit of information is stored.
Another problem is even more prevalent on the desktop than on the Web:
information fragmentation. Data of different kinds are stored in
heterogeneous silos, and, contrary to the Web, where hyperlinks can be
defined between documents and across site boundaries, there exist only
limited means to define and retrieve relationships between different
desktop resources. In the best case such relationships can be represented
using additional infrastructure (e.g., relational databases or specific
applications), but these are usually not tightly integrated with file
systems.
Note: This paper is an extended version of [31].
The Semantic Web aims to deal with the problems mentioned above by
adding a layer on top of the existing Web infrastructure, wherein
descriptions of web resources are expressed in the Resource Description
Framework (RDF) using commonly accepted vocabularies or ontologies. This
allows machines to interpret the published data and thus helps end users to
find information more efficiently. A large number of data sets¹ and
vocabularies² have already been published and form a solid data corpus that
can be indexed by (semantic) search engines and serves as a foundation for
applications.
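As a minimal illustration of such descriptions, the following Python sketch models RDF-style statements as (subject, predicate, object) triples using only the standard library. It is not taken from this paper: the resource URI and the `describe` helper are hypothetical, and the Dublin Core element namespace is used merely as one example of a commonly accepted vocabulary.

```python
from typing import NamedTuple

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of a vocabulary term
    obj: str        # literal value or URI of another resource


def describe(triples, subject):
    """Return a {predicate: [objects]} description of one resource."""
    description = {}
    for t in triples:
        if t.subject == subject:
            description.setdefault(t.predicate, []).append(t.obj)
    return description


# Hypothetical resource URI, used only for illustration.
PAPER = "http://example.org/papers/siles"

triples = [
    Triple(PAPER, DC + "title", "Files are Siles"),
    Triple(PAPER, DC + "creator", "Bernhard Schandl"),
    Triple(PAPER, DC + "creator", "Bernhard Haslhofer"),
]

description = describe(triples, PAPER)
```

Because both the resource and the properties are identified by URIs, any program that knows the vocabulary can interpret the description without knowing which application produced it.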
Recent research in the field of the Semantic Desktop [6, 17, 20] has shown
that a number of features provided by Semantic Web technologies are also
suitable for the problem of information management on the desktop, in
particular the provision of unified identifiers, the ability to represent
data in an application-independent generic format, the flexibility to
describe resources using formalized vocabularies, and the possibility to
reason over these descriptions. It has also been shown [28, 13] that the
inclusion of semantic technologies on the desktop can significantly improve
the user's perceived quality of personal information management,
especially when they are applied over a longer period of time.
However, there exist some significant conceptual differences between the
Web and the desktop. First, in contrast to the World Wide Web, the desktop
already has a well-established organization metaphor for data: file
systems, which have been in use for decades. In consequence, the vast
majority of personal information is stored in files, which are organized
using hierarchical, labelled collections (folders or directories) or, to a
far more limited extent, using metadata attached to or encoded within
files. Therefore it is crucial for the Semantic Desktop to integrate
smoothly with file systems in a way that allows for the annotation of files
without breaking the behavior of existing desktop applications. A second
major difference is the handling of broken links. While appearing and
disappearing web resources are, to a certain extent, accepted on the Web,
users rightfully expect their data on the desktop to remain consistent over
time.
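The broken-link problem can be sketched in a few lines of Python. This is an illustrative assumption rather than the mechanism proposed in this paper: the file names, the relation names, and the `find_broken_links` helper are hypothetical, and relationships are simply kept in a list that is maintained separately from the files themselves.

```python
def find_broken_links(annotations, existing):
    """Return annotations whose source or target resource no longer exists."""
    return [
        (source, relation, target)
        for (source, relation, target) in annotations
        if source not in existing or target not in existing
    ]


# Hypothetical desktop resources and relationships between them.
existing = {"report.pdf", "figures.zip", "notes.txt"}
annotations = [
    ("report.pdf", "uses", "figures.zip"),
    ("notes.txt", "summarizes", "report.pdf"),
]

# Deleting a file without updating the annotation layer
# leaves dangling relationships behind.
existing.discard("report.pdf")
broken = find_broken_links(annotations, existing)
```

When annotations are stored apart from the resources they describe, every delete, move, or rename has to be propagated to the annotation store to avoid such dangling entries.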
Since the RDF data model exhibits a number of shortcomings that may
cause problems for an efficient implementation of the Semantic Desktop (cf.
Section 3), we propose the sile model, a data model that acts as an
intermediate and integrative layer between file systems and Semantic Web
technologies. This model allows users and applications to annotate and
interrelate file-like desktop resources. It is designed as an
infrastructure on which applications and services can be built. One example
of such a service, a virtual file system, is presented in this paper.
Through this virtual file representation, the sile model can be used
¹ http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
² http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies