AUTOMATIC DOCUMENT-LEVEL SEMANTIC METADATA ANNOTATION USING FOLKSONOMIES AND DOMAIN ONTOLOGIES

by Hend S Al-Khalifa
ACM SIGWEB Newsletter (2007)

Available from eprints.ecs.soton.ac.uk

AUTOMATIC DOCUMENT-LEVEL SEMANTIC METADATA
ANNOTATION USING FOLKSONOMIES AND DOMAIN
ONTOLOGIES

By
Hend S. Al-Khalifa

A thesis submitted for the degree of Doctor of Philosophy

In the Faculty of Engineering and Applied Science,
School of Electronics and Computer Science,
University of Southampton,
United Kingdom.

June, 2007

UNIVERSITY OF SOUTHAMPTON

ABSTRACT

FACULTY OF ENGINEERING AND APPLIED SCIENCE
SCHOOL OF ELECTRONICS AND COMPUTER SCIENCE

Doctor of Philosophy
AUTOMATIC DOCUMENT-LEVEL SEMANTIC METADATA ANNOTATION
USING FOLKSONOMIES AND DOMAIN ONTOLOGIES
by Hend S. Al-Khalifa

The last few years have witnessed rapid growth in Social Software, be it video
sharing such as YouTube, photo sharing such as Flickr, community building such as
MySpace, or social bookmarking such as del.icio.us. These websites contain valuable
user-generated metadata called folksonomies. Folksonomies are ad hoc, lightweight
knowledge representation artefacts that describe web resources in people’s own
vocabulary. The cheap metadata contained in such websites offers researchers
potential opportunities to exploit.

This thesis presents a novel tool that uses folksonomies to automatically generate
metadata with educational semantics, with the aim of providing semantic annotations
for bookmarked web resources and helping to make the vision of the Semantic Web a
reality. The tool comprises two components: the tag normalisation process and the
semantic annotation process. It uses the del.icio.us social bookmarking service as
its source of folksonomy tags.

The tool was applied to a case study consisting of a framework for evaluating the
usefulness of the generated semantic metadata within the context of a particular
eLearning application. This implementation of the tool was evaluated along three
dimensions: the quality, the searchability and the representativeness of the generated
semantic metadata. The results show that folksonomy tags were an acceptable source for
creating semantic metadata. Moreover, folksonomy tags demonstrated the power of
aggregating people’s intelligence.

The novel contribution of this work is the design of a tool that utilises folksonomy
tags to automatically generate metadata with fine-grained and additional educational
semantics.
Contents
Chapter 1 Introduction.........................................................................................1
1.1 Research Overview ........................................................................................1
1.2 Significance of the Research..........................................................................3
1.3 Research Hypotheses .....................................................................................3
1.4 Research Scope ..............................................................................................4
1.5 Contributions..................................................................................................5
1.6 Outline of the Research chapters....................................................................6
1.7 Declaration .....................................................................................................8
Chapter 2 Metadata and Learning Objects ........................................................9
2.1 Introduction ....................................................................................................9
2.2 Metadata Types ............................................................................................10
2.3 Metadata Principles......................................................................................11
2.4 Metadata Purposes and Applications ...........................................................12
2.5 How Is Metadata Generated/Created? .........................................................12
2.6 Metadata and the Semantic Web..................................................................13
2.7 Educational Metadata...................................................................................14
2.8 Application Profiles .....................................................................................15
2.9 Issues Associated with Educational Metadata .............................................17
2.10 Learning Objects ..........................................................................................18
2.11 Taxonomy of Learning Objects Types.........................................................19
2.11.1 Level of Granularity for Learning Objects.......................................20
2.12 Learning Objects and the Semantic Web.....................................................20
2.13 Chapter Summary.........................................................................................20
Chapter 3 Collaborative Tagging ......................................................................21
3.1 What is tagging?...........................................................................................21
3.1.1 Why do people tag?..........................................................................22
3.1.2 Folksonomy: A Definition ...............................................................24
3.2 Folksonomy Types .......................................................................................25
3.3 Folksonomies: Pros and Cons ......................................................................26
3.4 Folksonomy and Taxonomy.........................................................................27
3.5 Folksonomy and the Semantic Web.............................................................27
3.6 State-of-the-Art Folksonomy Research........................................................28
3.6.1 Research ...........................................................................................28
3.6.2 Workshops .......................................................................................34
3.6.3 Case Studies .....................................................................................35
3.6.4 Thesis ...............................................................................................36
3.6.5 Discussion ........................................................................................36
3.7 Chapter Summary.........................................................................................37
Chapter 4 Social Bookmarking Services ...........................................................38
4.1 The del.icio.us Social Bookmarking Service ...............................................39
4.1.1 The del.icio.us Data Model ..............................................................40
4.1.2 Anatomy of the del.icio.us Tags ......................................................43
4.1.3 Users’ Patterns in del.icio.us for a Domain of Interest ....................45
4.1.4 Social Bookmarking Services versus Search Engines .....................46
4.2 Chapter Summary.........................................................................................48
Chapter 5 The Semantic Web and Ontologies in Education...........................49
5.1 Ontologies: Definition and Design Principles..............................................50
5.2 Types of Ontologies .....................................................................................51
5.3 Ontology Languages ....................................................................................53
5.3.1 XML/DTD/XML Schema................................................................54
5.3.2 RDF/RDFS.......................................................................................55
5.3.3 OWL.................................................................................................56
5.4 Building Ontologies .....................................................................................57
5.4.1 Existing Ontologies on the Web ......................................................58
5.5 Ontologies in Education...............................................................................59
5.6 Ontology Applications in Education............................................................59
5.7 Chapter Summary.........................................................................................60
Chapter 6 Semantic Metadata Annotation .......................................................61
6.1 What is Semantics? ......................................................................................61
6.2 What is Annotation?.....................................................................................62
6.3 What is Semantic Metadata Annotation?.....................................................63
6.3.1 Categories and Levels of Semantic Annotation ...............................64
6.4 Semantic Annotation Research ....................................................................64
6.4.1 Platform Classification.....................................................................65
6.4.2 Semantic Annotation Frameworks...................................................66
6.4.3 Semantic Annotation Tools..............................................................67
6.5 Semantic Annotation Tools for eLearning...................................................72
6.5.1 Annotation Goals in Education ........................................................73
6.6 Discussion ....................................................................................................74
6.7 Chapter Summary.........................................................................................75
Chapter 7 Exploring the Value of Folksonomies..............................................76
7.1 Related Work ...............................................................................................77
7.1.1 Discussion ........................................................................................78
7.2 Experiment Setup and Test Data..................................................................79
7.3 The Comparison System Framework...........................................................80
7.4 Data Selection ..............................................................................................81
7.5 Other General Heuristics..............................................................................82
7.6 Results ..........................................................................................................83
7.6.1 Phase 1 .............................................................................................83
7.6.2 Phase 2 .............................................................................................86
7.6.3 Phase 3 .............................................................................................92
7.6.4 Phase 4 .............................................................................................94
7.7 Discussion ....................................................................................................95
7.8 Chapter Summary.........................................................................................96
Chapter 8 The FolksAnnotation Tool System Architecture............................98
8.1 Tags Extraction and Normalisation..............................................................99
8.1.1 Tags Sense Disambiguation Module..............................................102
8.1.2 Related Work in Tags Disambiguation ..........................................106
8.2 Semantic Annotation Pipeline....................................................................107
8.2.1 Inference Module ...........................................................................108
8.3 General Heuristic and the Resultant Semantic Metadata...........................111
8.4 Development Tools ....................................................................................114
8.5 Chapter Summary.......................................................................................114
Chapter 9 Domain Ontologies and Semantic Metadata ................................115
9.1 Introduction ................................................................................................115
9.2 Ontology Building......................................................................................116
9.2.1 Web Design Ontology....................................................................118
9.2.2 CSS Ontology ................................................................................120
9.2.3 Resource Type Ontology................................................................122
9.2.4 Ontology instances .........................................................................124
9.3 The Semantic Metadata..............................................................................125
9.3.1 The Metadata Elements..................................................................128
9.3.2 The Generated Elements ................................................................129
9.4 Chapter Summary.......................................................................................133
Chapter 10 Evaluation, Analysis and Discussion .............................................134
10.1 Descriptive Statistics..................................................................................135
10.2 Metadata Searchability Evaluation ............................................................136
10.2.1 Browsing & Querying ....................................................................137
10.2.2 Semantic Search .............................................................................141
10.3 Metadata Assignment Evaluation ..............................................................154
10.3.1 Metadata Representativeness .........................................................158
10.3.2 Metadata Quality and Validity.......................................................162
10.3.3 Discussion ......................................................................................171
10.4 Further Evaluations ....................................................................................172
10.4.1 Analysis of Unused Tags ...............................................................172
10.4.2 Folksonomy vs. Automatic Keyword Extraction Assignment.......178
10.4.3 Discussion ......................................................................................179
10.4.4 Niche Tags and The Long Tail.......................................................180
10.5 Evaluation Summary..................................................................................182
10.6 Chapter Summary.......................................................................................182
Chapter 11 Related Work...................................................................................183
11.1 Standard Metadata Techniques ..................................................................183
11.2 Semantic Metadata Techniques..................................................................184
11.3 Folksonomic metadata techniques .............................................................185
11.4 Discussion ..................................................................................................186
11.5 Chapter Summary.......................................................................................187
Chapter 12 Conclusion and Further Work.......................................................188
12.1 Research Justification.................................................................................188
12.2 Research Findings ......................................................................................189
12.3 Further work...............................................................................................190
12.3.1 Tool Enhancements........................................................................190
12.3.2 Metadata Descriptors Expansion and Enhancement ......................191
12.3.3 Ontologies Expansion ....................................................................191
12.3.4 Improving the Normalisation Pipeline...........................................192
12.3.5 Improving the Semantic Annotation Pipeline................................193
12.3.6 Further Evaluation Factors.............................................................194
12.4 Future Research Directions ........................................................................195
12.4.1 Personalisation, Adaptation and Recommender Systems ..............195
12.4.2 Web Services..................................................................................198
12.5 Conclusion .................................................................................................199
Appendix A. Metadata Questionnaire .....................................................................201
Appendix B. CSS Ontology .......................................................................................210
Appendix C. Web Design Ontology..........................................................................228
Appendix D. Resource Type Ontology.....................................................................234
References 244

List of Figures
Figure 1.1: The research scope..................................................................................... 4
Figure 3.1: The four regions for the Motivation of Tagging alongside some examples
of services that satisfy that motive [by (Hammond et al., 2005)] ...................... 23
Figure 3.2: The illustration on the left depicts broad folksonomy while the one on the
right depicts narrow folksonomy [by Vander Wal, 2005] ................................. 25
Figure 4.1: Excerpt from the del.icio.us service showing the tags (Blogs, internet, ...,
cool) for the URL of the article by Jonathan J. Harris, the last bookmarker
(pacoc, 3mins ago) and the number of people who bookmarked this URL (1494
other people)....................................................................................................... 39
Figure 4.2: The relation between the three del.icio.us components........................... 40
Figure 4.3: A Screenshot showing the ‘common tags’ portion.................................. 41
Figure 4.4: A Screenshot showing the ‘suggestions’ field and the ‘tags’ box........... 42
Figure 4.5: A Screenshot showing the ‘related tags’ portion..................................... 42
Figure 4.6: An excerpt from the del.icio.us service, showing that the resource
“jQuery: New Wave Javascript” is about ‘Programming’ in ‘Ajax’, that the
person who bookmarked it intends to read it (‘toRead’), that (s)he describes how
useful the resource was (‘Cool’), and that (s)he defines the type of the resource as a
‘library’ .............................................................................................................. 44
Figure 4.7: Set of Tags assigned to a website with the title “Layout-o-matic” ......... 46
Figure 5.1: The Semantic Web Language Layer Cake [Berners-Lee, 2000]............. 53
Figure 5.2: XML Snippet. .......................................................................................... 54
Figure 5.3: RDF serialization using XML. ................................................................ 55
Figure 5.4: RDF in N3. .............................................................................................. 55
Figure 5.5: RDF as a binary predicate. ...................................................................... 55
Figure 5.6: RDF as a graph. ....................................................................................... 55
Figure 6.1: Microformats diagram [by Microformats.org] ........................................ 71
Figure 7.1: The Comparison System Framework. ..................................................... 81
Figure 7.2: A visualization of the categorization results for the 10 web resources
layered on top of each other to create a ghost effect; (a) corresponds to the results
of the first indexer and (b) corresponds to the results of the second indexer. .......... 89
Figure 7.3: Histogram of the Percentage of Overlap (PoL) for 100 websites............ 93
Figure 7.4: A Venn diagram that shows Folksonomy (F), Yahoo TE (K) and the
human indexer (I) sets as three distinct circles and highlights the percentage of
the overlap between the three sets...................................................................... 95
Figure 8.1: Overview of the system illustrating the interplay of the different
components ........................................................................................................ 99
Figure 8.2: A screenshot of the finished normalization process for a bookmarked web
resource ............................................................................................................ 101
Figure 8.3: A screenshot showing the appearance of the ‘list’ tag in “Listamatic:
Rollover horizontal list” web resource............................................................. 104
Figure 8.4: A schema that depicts the semantic relationships between the ‘list’
instance and its neighbouring instances in the CSS ontology.......................... 104
Figure 8.5: A screenshot showing the appearance of the ‘list’ tag in “CSS - Contents
and compatibility” web resource...................................................................... 106
Figure 8.6: Pseudocode for the process of the semantic annotation. ....................... 107
Figure 8.7: The Reasoning rules pipeline ................................................................ 108
Figure 8.8: Level 1 reasoning rules excerpt for the difficulty level descriptor........ 109
Figure 8.9: Level 1 reasoning rules excerpt for instructional level descriptor......... 109
Figure 8.10: The folksonomy list for the ‘Nifty Corners’ web resource (date accessed
31-January-2007 @2:00 PM)........................................................................... 110
Figure 8.11: The pedagogical rules Editor............................................................... 111
Figure 8.12: A list of tags for a website about the Drag and Drop method; note the
position of the CSS tag in the list..................................................................... 112
Figure 8.13: A list of tags for a website about CSS; note the position of the CSS tag
in the list........................................................................................................... 112
Figure 8.14: The generated RDF Semantic metadata for the ‘Nifty Corners’ web
resource. ........................................................................................................... 113
Figure 9.1: ‘is-a’ diagram showing the hierarchical relationship between the main
concepts in the Web Design domain................................................................ 118
Figure 9.2: ‘is-a’ diagram showing the hierarchical relationships between the
concepts in the CSS domain............................................................................. 120
Figure 9.3: Resource Type ontology........................................................................ 123
Figure 9.4: An Excerpt of the RDF Graph used to describe a CSS web resource. .. 125
Figure 9.5: A screen shot showing the different inappropriate descriptions applied by
the del.icio.us users for the “CSS tests and experiments” web resource ......... 129
Figure 9.6: Template (a) and example (b) of the dc:description element. ............... 130
Figure 10.1: The Evaluation Framework ................................................................. 135
Figure 10.2: Browsing the CSS Ontology ............................................................... 138
Figure 10.3: Retrieved results after selecting the context "menu" to search the CSS
knowledge base ................................................................................................ 138
Figure 10.4: Browsing the Resource Type ontology ............................................... 139
Figure 10.5: Query filters selection.......................................................................... 140
Figure 10.6: Query form builds up........................................................................... 140
Figure 10.7: Query results after entering "menu" as an application and choosing
"easy" as difficulty level .................................................................................. 141
Figure 10.8: The Recall (A), Precision (B) and F-Measure (C) of ontology-based
folksonomy search against folksonomy search alone. The performance of both
techniques is shown for three different queries 1, 2 and 3............................... 144
Figure 10.9: The Recall (A), Precision (B) and F-Measure (C) of folksonomy search
results against the human expert search results. The performance of both
techniques is shown for eight different queries. .............................................. 150
Figure 10.10: Distribution of the Web designers’ group based on their professional
role.................................................................................................................... 156
Figure 10.11: Distribution of the specialists group based on their professional role
.......................................................................................................................... 157
Figure 10.12: Classification of unused tags ............................................................. 173
Figure 10.13: The long tail, colored in yellow [Wikipedia, 2007] .......................... 180
Figure 10.14: The Long Tail shape for the mapped tags used to semantically annotate
the “What Are CSS Sprites? > A Quick Example: Button Rollovers” web
resource ............................................................................................................ 181
Figure 12.1: The enhanced normalisation pipeline .................................................. 193
Figure 12.2: The enhanced semantic annotation pipeline........................................ 194
List of Tables
Table 2.1: Major educational metadata application profiles [from (Qin and
Hernández, 2006)]............................................................................................. 16
Table 7.1: Topics covered in the experiment data set ................................................ 82
Table 7.2: Average Inter-Rater agreement for the ten evaluated web resources in
phase 1................................................................................................................ 84
Table 7.3: The average mode values for each website in both Folksonomy (F) and
Yahoo TE (K) set along with the mean, mode and standard deviation for all 10
evaluated websites.............................................................................................. 85
Table 7.4: Average Inter-Rater agreement for the ten evaluated web resources in
phase 2................................................................................................................ 87
Table 7.5: The average mode values for each website in both Folksonomy (F) and
Yahoo TE (K) set along with the mean, mode and standard deviation for all 10
evaluated websites.............................................................................................. 88
Table 8.1: Tags used to annotate a sample web resource stored in the del.icio.us
service (before normalization) ......................................................................... 101
Table 8.2: Tags after applying the normalization process. ...................................... 101
Table 8.3: The Semantic Matrix for the ‘list’ instance; the row headings represent
the ambiguous word, while the column headings represent the neighbouring
instances in the ontology.................................................................................. 103
Table 9.1: Web Design Ontology Concepts............................................................. 119
Table 9.2: Properties of the Web Design ontology.................................................. 120
Table 9.3: CSS Ontology Concepts ......................................................................... 122
Table 9.4: Properties of the CSS ontology............................................................... 122
Table 9.5: Sample instances from the Web Design Ontology, CSS Ontology and
Resource Type Ontology.................................................................................. 124
Table 9.6: LOM descriptors used in the CSS Semantic Metadata........................... 126
Table 9.7: Extra descriptors used with the CSS semantic metadata ........................ 127
Table 9.8: Specific CSS descriptors with their RDF binding .................................. 128
Table 9.9: The result of the computed recommendation value for four examples .. 132
Table 10.20: Overall evaluation of the Application element for the specialists group
Table 10.21: Overall evaluation of the Technique element for the Web Designers group
Table 10.22: Overall evaluation of the Technique element for the specialists group
Table 10.23: Overall evaluation of the Property element for the Web Designers group
Table 10.24: Overall evaluation of the Property element for the specialists group
Table 10.25: Overall evaluation of the Element, Layout and Selector descriptors for the Web Designers group
Table 10.26: Overall evaluation of the Element, Layout and Selector descriptors for the specialists group
Table 10.27: Examples of patterns in people tags
Table 11.1: A Summary of automatic metadata generation in the eLearning domain
Definitions and Abbreviations Used
ARIADNE Alliance of Remote Instructional Authoring and Distribution Networks for Europe
AICC Aviation Industry CBT Committee
ADL Advanced Distributed Learning Initiative
API Application Programming Interface – a software interface that allows web applications to exchange data
CSS Cascading Style Sheets
DC Dublin Core
FOAF Friend Of A Friend
MERLOT Multimedia Educational Resource for Learning and Online Teaching
LOM Learning Object Metadata
IMS Instructional Management Systems
IEEE Institute of Electrical and Electronics Engineers
RDF Resource Description Framework
RSS Really Simple Syndication
OWL Web Ontology Language
Folksonomy A blend of the words folks + taxonomy; a neologism for the practice of collaborative categorisation using freely chosen keywords.
Social Software Software that lets people connect or collaborate through a computer network.
Web 2.0 A term often applied to a perceived ongoing transition of the World Wide Web from a collection of websites to a full-fledged computing platform serving web applications to end users.

The terms metadata elements, fields and descriptors are used interchangeably throughout this thesis.
Chapter 1
Introduction
1.1 Research Overview
Metadata standards are used in many areas, such as library science, database systems and file systems. They can be defined as formal specifications used to semantically annotate electronic materials of any kind. They have been developed to support both machine interoperability (information exchange) and resource discovery by human users (Stratakis et al., 2003).

The role of metadata has also expanded to include the Semantic Web. At the heart of the Semantic Web is the idea of adding formal metadata that describes the content, context and/or structure of a web resource (Berners-Lee et al., 2001).

Metadata are also used in the educational domain to describe learning materials (see
chapter 2). There are two widely accepted metadata standards in education (Stratakis
et al., 2003), namely:
1. DC (Dublin Core) educational version, and
2. IEEE-LOM (Institute of Electrical and Electronics Engineers/Learning Object Metadata).

Most eLearning developers do not adhere strictly to these standards but prefer to use “application profiles”, which more accurately reflect their applications’ metadata needs.
Duval et al. (2006) have defined application profiles as “… mixing and matching
metadata elements, in order to meet specific requirements for a particular context”.
Examples of application profiles include CanCore1, UK LOM2 and ARIADNE3.

To use application profiles, their elements need to be populated with appropriate descriptors. This brings us back to the main problems of creating standard metadata: the number of fields to be filled and the time required to fill them.

A possible solution is captured by “Electronic Forms Must Die” (Duval, 2004), Duval’s famous slogan evangelising the automation of metadata creation. Erik Duval, a well-known member of the IEEE-LOM standardisation board, recognised the need for a more automated process so that the burden of metadata creation can be taken on by machines.

However, even with automated metadata creation, the existing standards cannot represent semantic information about learning resources at a fine enough grain to allow appropriate learning materials to be selected from a number of resources within a domain. This leads the researcher to semantic metadata techniques that employ ontologies to capture domain-specific semantics.

Therefore, to remove the burden of metadata generation and to generate semantic metadata that captures domain-specific semantics, the researcher proposes the use of folksonomies.

Folksonomies, one of the signatures of Web 2.0, are considered a free source of unstructured metadata. They can reveal a lot about a web resource’s subject, its type and its possible applications. Social bookmarking services such as del.icio.us4 are, by definition, good sources of folksonomies.

1 http://www.cancore.org [last accessed 21/2/2007]
2 http://www.cetis.ac.uk/profiles/uklomcore [last accessed 21/2/2007]
3 http://www.ariadne-eu.org/ [last accessed 21/2/2007]
4 http://del.icio.us [last accessed 21/2/2007]

The problem of metadata granularity and the need to automate metadata generation are the two issues that led to the idea of using folksonomies to create semantic metadata, an idea that can be realised through the power of semantic metadata representations.

This thesis shows that folksonomies contain “good enough” indexing words to create semantic metadata with added value. As Peterson (2006) puts it, “The overall usefulness of folksonomies is not called into question; just how they can be refined without losing the openness that makes them so popular”. In this work, rather than attempting to refine the tagging process, the researcher has taken the open-vocabulary tags and mapped them against domain ontologies in order to derive structured semantic metadata from the folksonomy tags.
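The mapping step can be pictured with a minimal sketch. It assumes a toy domain ontology held as a Python dictionary from normalised labels to concept identifiers, and a deliberately naive normalisation (lower-casing, removing spaces, hyphens, underscores and dots, and stripping a trailing plural "s"). The ontology contents and the names `ONTOLOGY`, `normalise` and `map_tags` are hypothetical illustrations, not the tool developed in this thesis.

```python
import re
from collections import Counter

# Hypothetical toy ontology: normalised tag label -> ontology concept identifier.
ONTOLOGY = {
    "css": "WebDesign:CSS",
    "stylesheet": "WebDesign:CSS",
    "layout": "CSS:Layout",
    "selector": "CSS:Selector",
    "tutorial": "ResourceType:Tutorial",
    "reference": "ResourceType:Reference",
}


def normalise(tag: str) -> str:
    """Lower-case a raw tag, drop common separators and strip a naive plural 's'."""
    tag = re.sub(r"[\s_\-.]+", "", tag.lower())
    if tag.endswith("s") and tag[:-1] in ONTOLOGY:
        tag = tag[:-1]
    return tag


def map_tags(raw_tags: list[str]) -> Counter:
    """Count how many raw tags map to each ontology concept; unmatched tags are dropped."""
    concepts: Counter = Counter()
    for raw in raw_tags:
        concept = ONTOLOGY.get(normalise(raw))
        if concept is not None:
            concepts[concept] += 1
    return concepts


# Tags that several people gave one bookmarked URL (illustrative data).
concepts = map_tags(["CSS", "Tutorials", "style_sheet", "layout", "cool"])
```

Tags that match no concept (here `cool`) are simply dropped; a fuller treatment would also need stemming, compound splitting and disambiguation, which this sketch does not attempt.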
1.2 Significance of the Research
The significance of this research revolves around the following motives:
1- Proof-of-concept; to show that self-tagging (hereafter folksonomies) can be
considered a good source of metadata to semantically annotate web
resources; folksonomies can describe what a resource is about, and of which
type it is (e.g. reference, slides) so that it can be used in specific fields.
2- To benefit from the social aspect of the Web, in other words, to harness the
wisdom of the crowds. This can be achieved by customizing large social
bookmark services to serve different domain requirements. In this thesis it
will be the case of the educational domain.
3- Folksonomies are a new trend on the Web and their popularity is growing over time; however, little has been written about them academically. This thesis explores one aspect of folksonomies, their use in creating semantic metadata, and reports the results of the approach to the community.
1.3 Research Hypotheses
The hypotheses of this thesis can be stated as follows:
1. Folksonomies can be used in the process of semantic annotation of web
resources; this implies the following sub-hypothesis:
a. Folksonomies, as index keywords, hold more semantic value than
keywords automatically extracted by machines.
b. Searching by folksonomies mapped to ontologies retrieves more web resources than searching by folksonomies alone.
c. Folksonomy annotations cover more contextual dimensions than a
human subject-expert does.
2. The values of fine-grained metadata elements come from The Long Tail5.
1.4 Research Scope
Figure 1.1 gives a snapshot of the various technologies utilised in this thesis.
Figure 1.1: The research scope

From the Web 2.0 domain, the thesis exploits folksonomies, the lightweight knowledge representation artefacts used in most contemporary web applications.

From the Semantic Web domain, the thesis employs the power of ontologies to
generate semantic metadata using folksonomies.


5 A theory that states “in statistical distribution the accumulated minority can be more important than
the simple majority” (Grimes and Torres, 2006).
Chapter 2
Metadata and Learning Objects
2.1 Introduction
In the past, metadata was often neglected and treated as a second-class citizen. However, once the computer era emerged and people started using computers to store their data, techniques for retrieving these data became necessary. Since then, the concept of metadata has evolved within computer science: from simple file systems (file names and types) in the early 1960s, through database management systems (describing database fields) in the early 1970s, to the advent of metadata warehouses in the 21st century (Arun, 2004).

Metadata can take many forms and formats. It can be applied electronically to documents, applications and web services, or presented physically, such as notes in the margins of a textbook. Metadata can also be expressed in a wide range of languages (formal or natural) using a wide range of vocabularies (Corcho, 2006).

Metadata is a record of structured information about a resource, often defined as information about information or data about data; it is structured in a way that facilitates the management, discovery and retrieval of resources. Another useful definition is given by Haase (2004): “any data which conveys knowledge about an item without requiring examination of the item itself.”

A metadata record typically consists of a set of elements (fields) that describe the content of the resource, its intellectual property rights and its 'instantiation' (e.g. date created) (LTSO, 2004).

This chapter discusses metadata types, principles, applications and purposes. It also looks briefly at metadata in education and the Semantic Web, and finally presents a short discussion of learning objects and their types.
2.2 Metadata Types
Metadata can be as simple as a set of keywords or as complex as a structured record.
In principle, there are three types of metadata: descriptive, structural and
administrative metadata (NISO, 2004).

Descriptive metadata describes what a resource is about to foster discovery and
identification (e.g. title, author and keywords). Structural metadata describes how
resources are related (e.g. how chapters are structured in a book). Administrative
metadata describes how a resource can be managed (e.g. creation date, file type and
who is allowed to access the resource).
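As an illustration, the sketch below models a metadata record as element/value pairs and groups the elements into the three NISO types, following the examples in the paragraph above. The element names loosely follow Dublin Core and DCMI Terms (`isPartOf`), but the type assignment in `ELEMENT_TYPES`, the class and the sample resource are assumptions made for this example.

```python
from dataclasses import dataclass, field

# Illustrative assignment of element names to the three NISO metadata types;
# the names loosely follow Dublin Core / DCMI Terms.
ELEMENT_TYPES = {
    "title": "descriptive",
    "creator": "descriptive",
    "subject": "descriptive",
    "isPartOf": "structural",
    "hasPart": "structural",
    "date": "administrative",
    "format": "administrative",
    "rights": "administrative",
}


@dataclass
class MetadataRecord:
    """A metadata record: element/value pairs describing a single resource."""
    resource: str
    elements: dict[str, str] = field(default_factory=dict)

    def by_type(self) -> dict[str, dict[str, str]]:
        """Group elements by metadata type; elements not in ELEMENT_TYPES are skipped."""
        groups: dict[str, dict[str, str]] = {
            "descriptive": {}, "structural": {}, "administrative": {}}
        for name, value in self.elements.items():
            kind = ELEMENT_TYPES.get(name)
            if kind is not None:
                groups[kind][name] = value
        return groups


record = MetadataRecord(
    "http://example.org/css-guide",  # hypothetical resource
    {
        "title": "A CSS Layout Guide",
        "creator": "J. Smith",
        "subject": "CSS; layout",
        "isPartOf": "http://example.org/web-design-course",
        "date": "2007-02-21",
        "format": "text/html",
    },
)
groups = record.by_type()
```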

Similarly, a review of the literature on metadata and its evolution (Al-Khalifa and Davis, 2006c) shows that, based on recent research, metadata can be classified into:
1. Standard metadata: those are formal specifications used to semantically
annotate materials of any kind. They have been developed to support both
machine interoperability (information exchange) and resource discovery by
human users (Stratakis et al., 2003). Examples include Dublin Core (DC) and
IEEE-LOM.
2. Semantic metadata: “…the process of attaching semantic descriptions to
Web resources by linking them to a number of classes and properties defined
in Ontologies” (Scerri et al., 2005). Semantic metadata is discussed further in chapter 6.
3. Attention metadata: “… concerns collecting detailed information about the
relation between users and the content they access.” (Najjar et al., 2006).
Attention metadata uses the AttentionXML6 open standard to track user
interaction with web applications such as Blogs, Wikis, news, etc. The
collected data from log files includes information about the user’s
preferences, context, goals and interests (Najjar et al., 2006). Najjar et al. are
working on extending AttentionXML in order to collect rich data from
eLearning applications. Their new attention schema is called CAM
(Contextualized Attention Metadata) and it is used to collect and merge
attention metadata of users from different educational tools.
2.3 Metadata Principles
Another important aspect of metadata is its underlying principles. Duval et al. (2002) define principles in the metadata context as “concepts that are judged to be common to all domains of metadata and which might inform the design of any metadata schema or application”. These principles provide guidelines for developing practical solutions for semantic and machine interoperability in any domain, using any metadata standard.

The first principle is modularity, a key organising principle for managing multiple sources of content in metadata. It allows metadata schema designers to assemble data elements from different schemas rather than reinventing elements. Designers can likewise reuse vocabularies and other building blocks, combining them syntactically and semantically to improve interoperability.

The second principle is extensibility; this means that metadata schemas must be
flexible enough to accept the addition of new elements to accommodate application
needs. This also implies the notion of a base schema that has the basic elements
which can be exchanged by different applications and the notion of local schema that
has additional elements that tailor a given application to local or domain-specific
needs.

The third principle is refinement, which concerns the level of detail appropriate for metadata in a given application. It involves two notions:

6 http://developers.technorati.com/wiki/attentionxml [last accessed 24/2/2007]
discussed in chapter 5) to represent metadata records in a more flexible and scalable
manner.

The case for RDF as the preferred format for representing metadata is made in the seminal paper “Semantic Web Metadata for e-Learning - Some Architectural Guidelines” by Nilsson et al. (2002). Nilsson et al. highlighted some major differences between XML Schema, which most standard application profiles use, and RDF Schema. One important difference is that XML Schema describes the syntactic structure of XML documents, while RDF Schema describes the semantics of a vocabulary that can be reused in any setting. Moreover, when application profiles are built with XML, each new application requirement forces the developer to create a new application profile, whereas with RDF the developer only needs to add an extra RDF statement, without reconstructing the RDF schema. These are just two of the advantages of RDF over XML; for more on this topic, the reader is referred to Nilsson et al. (2002).
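The extensibility argument can be illustrated with a deliberately simplified sketch: metadata held as a set of (subject, predicate, object) statements in the spirit of RDF, next to a fixed profile that only accepts pre-declared fields. The prefixes `ex:`, `lom:` and `cssx:` and all function names are hypothetical, and plain strings stand in for real IRIs; an actual implementation would use an RDF library and a proper serialisation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement; plain strings stand in for IRIs and literals."""
    subject: str
    predicate: str
    obj: str


# A fixed, schema-bound profile: only these fields are valid in a record.
FIXED_PROFILE_FIELDS = {"dc:title", "dc:creator", "lom:learningResourceType"}


def validate_fixed(record: dict[str, str]) -> bool:
    """Accept a record only if every field it uses is declared in the fixed profile."""
    return set(record) <= FIXED_PROFILE_FIELDS


def values(graph: set[Triple], subject: str, predicate: str) -> set[str]:
    """Return the objects of all statements with the given subject and predicate."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


graph: set[Triple] = {
    Triple("ex:res1", "dc:title", "CSS Float Tutorial"),
    Triple("ex:res1", "lom:learningResourceType", "tutorial"),
}

# A new application requirement is met by adding one statement; nothing existing changes.
graph.add(Triple("ex:res1", "cssx:property", "float"))
```

Adding the `cssx:property` statement required no change to the existing statements, whereas the fixed profile rejects a record that uses the same new field until the profile itself is redefined.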
2.9 Issues Associated with Educational Metadata
A review of research that uses standard metadata in the eLearning domain shows that most researchers were dissatisfied with the capabilities of standard educational metadata. Recent complaints include:
• “The problem with metadata information like IEEE-LOM or IMS is mainly
number of fields to fill (more than 50 fields) and the amount of time a user
has to invest to describe a resource” (Yin et al., 2003).
• “LOM has a deficiency in semantic-awareness capability” (Lee et al., 2006).
• “… educational attributes of LOM are very difficult to produce” (Motelet and
Baloian, 2006).
• “… LOM and SCORM, have emerged to annotate and package learning
content. But they mainly deal with technical aspects and do not express much
information about pedagogy” (Dehors and Faron-Zucker, 2006).

2.11.1 Level of Granularity for Learning Objects
Learning objects can vary in size from a single slide in a PowerPoint presentation to
a whole certificate program as has been discussed in the previous section. Thus, to
deploy, reuse or author a learning object its level of granularity needs to be defined.

There are different levels of granularity for learning objects, and many papers, such as (Duval and Hodgins, 2003), (Redeker, 2003) and (Stratakis et al., 2003), have tried to define the boundaries between these levels. However, the issue remains fuzzy, and consensus is hard to achieve because learning object authors and pedagogical specialists have different perspectives.
2.12 Learning Objects and the Semantic Web
The current set of elements in the IEEE-LOM standard is not sufficient for the intelligent discovery and assembly of learning objects. To support such discovery, each learning object needs to specify how it relates to concepts in a particular domain and to clarify the types of learning outcome possible in that domain (i.e. an ontology is needed). With this kind of knowledge, Web agents can search for and retrieve learning objects more intelligently (Mohan and Brooks, 2003). The Semantic Web and ontologies in education are discussed further in Chapter 5.
2.13 Chapter Summary
This chapter has given an overview of metadata and learning objects as key players in the learning technologies discipline. Metadata is used to describe learning objects so that they can be easily discovered and retrieved. The chapter also discussed the importance of metadata in the Semantic Web, which is a major theme of this thesis.

Finally, the research in the area of metadata standards and application profiles has
resulted in a proposal for an initiative to create the first Arabic metadata application
profile called AraCore (Al-Khalifa and Davis, 2005).

reflecting what a user thinks is the appropriate term to describe a resource. Note that tag namespaces are user-created and usually uncontrolled.

There are many successful contemporary services on the Web that foster the concept of tagging, including del.icio.us18, flickr19 and furl20, to name but a few. Tagging services also fall into more specialised categories, such as social bookmarking (e.g. del.icio.us), photo sharing (e.g. flickr) and community-based news websites (e.g. Digg21). Tags also play a prominent role in the Windows Vista operating system, as reported on the Microsoft website22. In addition, Amazon23 asks its customers to use tags to annotate its products (e.g. books, toys), and Google24 uses tagging in its GMail25 service.

Tagging goes by several other names, used interchangeably for the act of people assigning descriptions to resources, among them: mob indexing, folk categorisation, social tagging, federated tagging, lazy tagging, folksonomy, tagsonomy, tagonomy, free tagging, distributed classification, post-coordinate indexing, collective indexing, user-generated tagging and ethnoclassification (Hammond et al., 2005). However, the most widely accepted and popular term is folksonomy, and it is therefore used throughout this thesis.
3.1.1 Why do people tag?
Hammond et al. (2005) identified four regions of motivation for tagging, as shown in Figure 3.1. The figure places tagging players along a horizontal axis denoting who created the content (the tagger or others) and a vertical axis denoting who uses the generated tags (the tagger or others).

18 http://del.icio.us [last accessed 18/2/2007]
19 http://www.flickr.com [last accessed 18/2/2007]
20 http://www.furl.net [last accessed 18/2/2007]
21 http://www.digg.com/ [last accessed 18/2/2007]
22 http://www.microsoft.com/windows/products/~/productivity.mspx [last accessed 18/2/2007]
23 http://www.amazon.com/gp/tagging/cloud [last accessed 18/2/2007]
24 http://www.google.com [last accessed 18/2/2007]
25 http://www.gmail.com [last accessed 18/2/2007]
Region 1 (self, self) represents individuals tagging their own content for their own benefit (as both content creators and consumers), without considering use by others; this tagging habit is evident in the Flickr photo-sharing service.

Figure 3.1: The four regions for the Motivation of Tagging alongside some examples of
services that satisfy that motive [by (Hammond et al., 2005)]

Region 2 (others, self) represents individuals tagging other people’s resources for their own use. An example of such a service is the social bookmarking system del.icio.us.
Region 3 (self, others) represents individuals tagging their own content for the benefit of other people. An example is the Technorati26 service, an Internet search engine for blogs.
Region 4 (others, others) represents people tagging other people’s resources for others to use. A well-known example is the Wikipedia27 website.
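The four regions amount to a two-way classification by (content creator, tag consumer). A minimal sketch of that classification, in which the enum and function names are hypothetical and the example services are those named in the text:

```python
from enum import Enum


class Party(Enum):
    SELF = "self"
    OTHERS = "others"


# (content creator, tag consumer) -> region number, with the example services named in the text.
REGIONS = {
    (Party.SELF, Party.SELF): 1,      # e.g. Flickr
    (Party.OTHERS, Party.SELF): 2,    # e.g. del.icio.us
    (Party.SELF, Party.OTHERS): 3,    # e.g. Technorati
    (Party.OTHERS, Party.OTHERS): 4,  # e.g. Wikipedia
}


def tagging_region(creator: Party, consumer: Party) -> int:
    """Return the motivation region for a tagging act, given who created the
    content and who uses the resulting tags."""
    return REGIONS[(creator, consumer)]
```

For example, a del.icio.us user bookmarking someone else's page for later reading corresponds to `tagging_region(Party.OTHERS, Party.SELF)`, which gives region 2.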


26 http://technorati.com/ [last accessed 18/2/2007]
27 http://www.wikipedia.org/ [last accessed 18/2/2007]
design, they discussed the dimensions of tagging-system design that may have an immediate and considerable effect on the content and usefulness of the tags the system generates (e.g. tagging rights, tagging support). When discussing user incentives, they argue that users’ motivations, whether personal or social, play a significant role in shaping the tags that emerge from social tagging systems. They also present a preliminary analysis of tag usage in the photo-sharing and tagging system ‘Flickr’ to suggest future directions for research on tagging systems.
Similarly, Wu et al. (2006) have proposed some enhancements that need to be
considered when designing collaborative tagging systems. They also highlighted
some key challenges encountered while building collaborative tagging systems and
have developed a comprehensive evaluation methodology to be used in assessing the
construction of collaborative tagging systems.

Ontology creation research:
Mika (2005) constructed a community-based ontology using del.icio.us as a data source. He created two lightweight ontologies from folksonomies: an actor-concept (i.e. user-concept) ontology and a concept-instance ontology. The goal of his experiment was to show that ontologies can be built using the context of the community in which they are created (the del.icio.us community). Despite the innovative approach that Mika follows, this thesis does not consider building ontologies from folksonomies. Similarly, Tom Gruber is working on a system called TagOntology to build ontologies from folksonomies; in his paper “Ontology of Folksonomy: A Mash-up of Apples and Oranges” he sheds light on design considerations that need to be taken into account when constructing ontologies from tags (Gruber, 2005).
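Mika's two lightweight ontologies can be approximated by folding bookmark triples into two tag-to-tag networks: tags linked because the same users applied them (actor-concept) and tags linked because they were applied to the same resources (concept-instance). The sketch below is a simplified reading under that assumption; the function name, the sample data and the use of shared-user or shared-URL counts as edge weights are illustrative rather than a reproduction of Mika's method.

```python
from collections import defaultdict
from itertools import combinations

Bookmark = tuple[str, str, str]  # (user, tag, url)


def concept_graph(bookmarks: list[Bookmark], by: str) -> dict[frozenset, int]:
    """Link pairs of tags that share a user (by="user") or a resource (by="url").

    Each edge weight counts the distinct users (or URLs) under which both tags occur.
    """
    if by not in ("user", "url"):
        raise ValueError("by must be 'user' or 'url'")
    key_index = 0 if by == "user" else 2
    tags_per_key: dict[str, set[str]] = defaultdict(set)
    for bookmark in bookmarks:
        tags_per_key[bookmark[key_index]].add(bookmark[1])
    weights: dict[frozenset, int] = defaultdict(int)
    for tags in tags_per_key.values():
        for a, b in combinations(sorted(tags), 2):
            weights[frozenset((a, b))] += 1
    return dict(weights)


bookmarks = [
    ("ann", "css", "u1"), ("ann", "layout", "u1"),
    ("bob", "css", "u2"), ("bob", "design", "u2"), ("bob", "layout", "u3"),
    ("cat", "css", "u1"), ("cat", "tutorial", "u1"),
]
actor_concept = concept_graph(bookmarks, by="user")     # tags linked via shared users
concept_instance = concept_graph(bookmarks, by="url")   # tags linked via shared URLs
```

In the sample data, `layout` and `tutorial` are linked only in the concept-instance network, because different users assigned them to the same URL; the two foldings thus expose different relationships from the same folksonomy.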

In addition, Ohmukai et al. (2005) proposed a social bookmark system, called
‘socialware’, using several representations of personal networks and metadata to
construct a community-based ontology. The personal network was constructed using
FOAF30, RSS31, and simple RDFS32 formats, while folksonomies were used as the
metadata.

30 Friend Of A Friend
31 Rich Site Summary
32 To be discussed in chapter 5
Their system allows users to browse friends’ bookmarks on their personal network and to map one of their own tags onto several tags from multiple friends, so that those tags become linked through the user. The authors argue that this technique allows efficient tag recommendation because the links are derived from personal interest and trust. They also used their social bookmark system to design an RDF-based metadata framework that supports open and distributed models.

Christiaens (2006) devised a mechanism to convert folksonomy tags into a taxonomy and then combine it with ontologies. The process of creating the taxonomy was not explained clearly in his paper; however, the author claimed that trying this approach in a system called Guide proved valuable. His idea originated from the need to bridge the gap between restricted vocabulary (i.e. ontologies) and free vocabulary (i.e. folksonomies).

Folksonomy patterns, linguistics and analysis research:
Golder and Huberman (2006), from HP Labs, analysed the structure of collaborative
tagging (aka folksonomies) to discover the regularities in user activity, tag
frequencies, the kind of tags used and bursts of popularity in bookmarked URLs in
the del.icio.us system. They also developed a dynamic model that predicts the stable
patterns in collaborative tagging and relates them to shared knowledge. Their results
show that a significant amount of tagging is done for personal use rather than public
benefit. However, even if information is tagged for personal use, other users can
benefit from it. They also state that, for most users, del.icio.us functions as a recommendation system even though it does not explicitly provide recommendations. This argument supports the design decision the researcher followed when developing her annotation tool.
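One way to look for the stable patterns Golder and Huberman describe is to track each tag's share of all tag assignments a URL has received as bookmarks accumulate, and to measure how much those shares move from one bookmark to the next. The sketch below is an illustrative measurement under that assumption, not their dynamic model; the function names and sample bookmarks are hypothetical.

```python
from collections import Counter


def tag_proportions(bookmarks: list[list[str]]) -> list[dict[str, float]]:
    """Share of each tag among all tag assignments after the 1st, 2nd, ..., n-th bookmark."""
    counts: Counter = Counter()
    history: list[dict[str, float]] = []
    for tags in bookmarks:
        counts.update(tags)
        total = sum(counts.values())
        history.append({tag: c / total for tag, c in counts.items()})
    return history


def max_change(prev: dict[str, float], curr: dict[str, float]) -> float:
    """Largest absolute change in any tag's share between two consecutive snapshots."""
    tags = set(prev) | set(curr)
    return max((abs(curr.get(t, 0.0) - prev.get(t, 0.0)) for t in tags), default=0.0)


# Successive bookmarks of one URL, each with the tags that bookmarker chose (illustrative).
history = tag_proportions([["css", "layout"], ["css"], ["css", "layout"], ["css", "design"]])
changes = [max_change(a, b) for a, b in zip(history, history[1:])]
```

On real del.icio.us data one would expect `max_change` to shrink as the number of bookmarks grows; a four-bookmark toy example like this one is far too small to show that.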

Sen et al. (2006) presented a user-centric model of vocabulary evolution in tagging
communities based on community influence and personal tendency. They collapsed
Golder’s classes into three general classes and used the modified classification metric
to evaluate the MovieLens recommender system. They also used four tag selection
algorithms to recommend tags to users of the MovieLens recommender system and
to evaluate the effect of the algorithms on vocabulary evolution, tag utility, tag
adaptation and user satisfaction. The modified categorisation that Sen et al. proposed
ontology has been used to facilitate interoperability between application-dependent
tag sets.

Folksonomy statistical research:
Hotho et al. (2006a) presented a new search algorithm for folksonomies, called FolkRank, which exploits the structure of the folksonomy. The algorithm supports the retrieval of resources in the del.icio.us social bookmarking service by ranking the popularity of tags. They demonstrated their findings on a large-scale data set (around 250k bookmarked resources) and showed that, for a given tag, the algorithm yields a set of related users and resources. FolkRank can therefore be used to generate recommendations within a folksonomy system. In the same vein, Dubinko et al. (2006) introduced the ‘interestingness’ algorithm, which characterises the most interesting tags associated with a sliding interval of time. They experimented with a large number of tags from the Flickr online photo-sharing community to visualise interesting tags over time. Hotho et al. (2006b) used the Dubinko et al. interestingness algorithm to rank interesting resources in the del.icio.us bookmarking service with an interval window size of one month. They compared its results with those of FolkRank and found that the interestingness algorithm is more sensitive to temporary changes in folksonomy tags than FolkRank, whereas FolkRank is more useful for long-term observations.
Both the Hotho et al. and the Dubinko et al. proposals for computing a recommendation (ranking) value from folksonomy tags seem practical and robust; however, their underlying algorithms are complex and require large data sets to produce reasonable values. These two requirements discouraged the researcher from using either algorithm to compute the recommendation value proposed for the folksonomic semantic metadata, as will be seen in Chapter 8.
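To give a feel for the computation behind FolkRank, the sketch below follows the general scheme described by Hotho et al. (2006a): the folksonomy is treated as an undirected graph of users, tags and resources whose edge weights count co-occurrences, an adapted PageRank ("weight spreading") is run once with a uniform preference vector and once with extra preference on a focus tag, and the difference between the two runs is the score. The damping factor of 0.7, the preference boost equal to the number of nodes, the 50 iterations and all function names are illustrative assumptions, and this plain-dictionary version ignores the optimisations a data set of the size mentioned above would require.

```python
from collections import defaultdict
from typing import Optional

Assignment = tuple[str, str, str]  # (user, tag, resource)
Graph = dict[str, dict[str, float]]


def build_graph(assignments: list[Assignment]) -> Graph:
    """Undirected tripartite graph: each (user, tag, resource) adds 1 to the
    user-tag, tag-resource and user-resource edge weights."""
    graph: Graph = defaultdict(lambda: defaultdict(float))
    for user, tag, res in assignments:
        u, t, r = "u:" + user, "t:" + tag, "r:" + res
        for a, b in ((u, t), (t, r), (u, r)):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def spread(graph: Graph, pref: dict[str, float], d: float = 0.7,
           iters: int = 50) -> dict[str, float]:
    """Weight spreading: repeatedly set w <- d * (one random-walk step of w) + (1 - d) * pref."""
    nodes = list(graph)
    degree = {v: sum(graph[v].values()) for v in nodes}
    w = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) * pref[v] for v in nodes}
        for v in nodes:
            # Node v passes its weight to neighbours in proportion to edge weight.
            share = d * w[v] / degree[v]
            for nb, weight in graph[v].items():
                new[nb] += share * weight
        w = new
    return w


def folkrank(assignments: list[Assignment], focus_tag: str, boost: Optional[float] = None,
             d: float = 0.7, iters: int = 50) -> dict[str, float]:
    """FolkRank-style score: ranking with extra preference on `focus_tag` minus the
    baseline ranking computed with a uniform preference vector."""
    graph = build_graph(assignments)
    focus = "t:" + focus_tag
    if focus not in graph:
        raise ValueError(f"unknown tag: {focus_tag!r}")
    nodes = list(graph)
    n = len(nodes)
    baseline_pref = {v: 1.0 / n for v in nodes}
    raw = {v: 1.0 for v in nodes}
    raw[focus] += float(n) if boost is None else boost
    total = sum(raw.values())
    focus_pref = {v: x / total for v, x in raw.items()}
    w0 = spread(graph, baseline_pref, d, iters)
    w1 = spread(graph, focus_pref, d, iters)
    return {v: w1[v] - w0[v] for v in nodes}


data = [("ann", "css", "r1"), ("ann", "layout", "r1"), ("bob", "css", "r2"),
        ("bob", "python", "r3"), ("cat", "python", "r3"), ("cat", "django", "r4")]
scores = folkrank(data, "css")
ranking = sorted(scores, key=scores.get, reverse=True)
```

Nodes close to the focus tag (here resource `r2`, bookmarked with `css`) receive positive scores, while distant nodes such as `r4` score lower.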

Similarly, Szekely and Torres (2005) from Harvard University have developed a
system called “gourmetvillage.org” that uses folksonomies as a vehicle for sharing
and classifying information in order to evaluate restaurants. The system is based on
two algorithms: ‘UserRank’ and ‘TagRank’. Szekely and Torres define UserRank as
“… an algorithm based on Google’s PageRank that provides a ranking of users
or with controlled vocabulary and title-based automatic indexing as in (Lin et al.,
2006), or finding the relationship between the Semantic Web and social tagging as in
(Campbell, 2006). The workshop also had a panel on the social classification of visual resources, which discussed the use of social tagging in museums and photo-sharing sites.

Despite the diverse themes and topics tackled in both workshops, WWW06 and CRW, neither proposed any use of folksonomy tags in the eLearning domain, for example using folksonomy tags to create semantic metadata for annotating learning resources. However, these workshops gave the researcher an insight into the lines of research that both computer science and library science researchers are pursuing.
3.6.3 Case Studies
Michlmayr (2005) conducted a case study on the properties of metadata provided by folksonomies, in the domain of social networks. In her paper, Michlmayr provided an in-depth study of the properties of tags produced by folksonomies and investigated how such metadata can serve as simulation data in peer-to-peer environments. To accomplish this, she developed a method for selecting subsets of folksonomy tags from the del.icio.us bookmark service that adhere to the principle of interest-based locality. Her results show that folksonomies can be used to simulate peers and their content in peer-to-peer environments.

Another case study was carried out by Lawrence and Schraefel (2006) on the amateur fiction community. The study analysed how folksonomies evolve inside these communities and considered how ontologies and folksonomies can be used together, adding the easy usability of free tagging to ontology descriptions and the richness of conceptual ontologies to folksonomies.
This chapter, however, is dedicated to analyzing the del.icio.us social bookmarking
service for two main reasons: 1) del.icio.us is the largest social bookmarking service
on the web; since its introduction in December 2003, it has gained great popularity, with more than 90,000 registered users and over a million unique tagged bookmarks (Menchen, 2005; Sieck, 2005); and 2) del.icio.us shares
the same characteristics and underlying concepts that other social bookmarking
services use, such as: tagging, web-based storage and the social nature of these Web
applications (Millen et al., 2005).

This chapter therefore starts with a comprehensive overview of the del.icio.us service, then analyses the anatomy of the tags stored within it, and concludes with a brief comparison between bookmarking services and search engines.
4.1 The del.icio.us Social Bookmarking Service
Every day hundreds of URLs are bookmarked online using the del.icio.us
bookmarking service. Each bookmarked URL is accompanied by a line of text
describing it and a set of tags assigned by people who bookmarked the web resource
(as shown in Figure 4.1).


Figure 4.1: Excerpt from the del.icio.us service showing the tags (Blogs, internet, ... ,cool)
for the URL of the article by Jonathan J. Harris, the last bookmarker (pacoc, 3mins ago) and
the number of people who bookmarked this URL (1494 other people).

Visitors and users of the del.icio.us service can browse the bookmarked URLs by user, by keyword (tag) or by a combination of both. By browsing others’ bookmarks, people can learn how other people tag their resources, which increases their awareness of the different ways tags are used. In addition, any user can create an inbox of other users’ bookmarks by subscribing to those users’ del.icio.us pages. Users can also subscribe to RSS feeds for a particular tag, a group of tags or other del.icio.us users.

43 http://www.citeulike.org/ [last accessed 15/2/2007]
4.1.1 The del.icio.us Data Model
The del.icio.us data model is composed mainly of three interconnected components, URLs, tags and users, as shown in Figure 4.2.


Figure 4.2: The relation between the three del.icio.us components

URLs are the main assets of the del.icio.us service. A bookmarked URL can have multiple tags and can be bookmarked by many users. It can point to a website, a Word file, a PDF document, or a video, audio or Flash resource.

Bookmarked URLs cover a variety of topics such as web development, media,
business and entertainment. Over time, tags accumulate in proportion to the number
of people who bookmark the same URL. Bookmarked URLs are listed in reverse
chronological order.

In the del.icio.us service, what makes a URL valuable is the set of tags that have
been assigned to it. Tags can be treated as a kind of metadata: they tell what a
resource is about without further investigation. So, as a URL gains popularity over
time, the bookmarking service can be thought of as a collaborative information
filtering (i.e. recommendation or voting) system for the best resources on the web.
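The three-component model above can be illustrated with a small Python sketch. It assumes that each bookmark can be reduced to a (user, URL, set of tags) record; the `Bookmark` class and `tag_counts` function are hypothetical and do not describe how del.icio.us actually stored its data. Counting, for each URL, how many bookmarkers used each tag gives the kind of per-tag number that accompanies tags on a del.icio.us page.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    """One user's bookmark of one URL (hypothetical record layout)."""
    user: str
    url: str
    tags: frozenset


def tag_counts(bookmarks):
    """Map each URL to a Counter of how many bookmarks used each tag."""
    counts = defaultdict(Counter)
    for bookmark in bookmarks:
        # tags is a set, so a tag is counted at most once per bookmark
        counts[bookmark.url].update(bookmark.tags)
    return counts


url = "http://example.org/layout"
bookmarks = [
    Bookmark("alice", url, frozenset({"css", "layout"})),
    Bookmark("bob", url, frozenset({"css", "webdesign"})),
    Bookmark("carol", url, frozenset({"css", "tools"})),
]
counts = tag_counts(bookmarks)
print(counts[url].most_common(1))  # [('css', 3)]
```

Because tags are held in a set, a user who repeats a tag within one bookmark is still counted once.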


Figure 4.4: A Screenshot showing the ‘suggestions’ field and the ‘tags’ box

Clicking a tag listed under a URL takes the user to a page that lists all the URLs
given the same tag. The page also displays a list of ‘related tags’ that have been
used together with the given tag, possibly in a different context (see Figure 4.5).

Figure 4.5: A Screenshot showing the ‘related tags’ portion
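del.icio.us did not publish how its ‘related tags’ list was computed. A common approximation, assumed here purely for illustration, is to rank tags by how often they co-occur with the selected tag on the same bookmark. The function name `related_tags` and the sample data are hypothetical.

```python
from collections import Counter


def related_tags(tag_sets, tag, top_n=5):
    """Rank tags by how often they co-occur with `tag` on the same bookmark."""
    co_occurrences = Counter()
    for tags in tag_sets:
        if tag in tags:
            # count every other tag that appears alongside the selected one
            co_occurrences.update(t for t in tags if t != tag)
    return co_occurrences.most_common(top_n)


bookmark_tags = [
    {"ajax", "javascript", "programming"},
    {"ajax", "javascript", "library"},
    {"css", "design"},
    {"ajax", "web2.0"},
]
print(related_tags(bookmark_tags, "ajax")[0])  # ('javascript', 2)
```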

Users are the engine of the del.icio.us service; their social efforts are what made
del.icio.us so widely used. del.icio.us provides each user with his/her own page that
shows his/her bookmarked web resources in chronological order together with the
associated tags. The page also lists all the tags used by the user.

In a pilot study to identify the occupations of del.icio.us users, Menchen (2005)
found that the predominant occupations among a sample of users were in the
information technology industry and in education or research. Another
indicator of the IT nature of URLs bookmarked in del.icio.us is an experiment

Figure 4.6: An excerpt from the del.icio.us service showing that the resource “jQuery: New
Wave Javascript” is about ‘Programming’ in ‘Ajax’, that the person who bookmarked it intends
to read it (‘toRead’), that (s)he describes how useful the resource was (‘Cool’), and that the
type of the resource is a ‘library’

Analysing people’s vocabulary reveals many patterns, which include:
• Specific words with distinct meanings (e.g. CSS, Mac, OS).
• Acronyms and abbreviations (e.g. UI means User Interface, CompSci means
Computer Science).
• Compound words or phrases (e.g. computerscience, computer_science,
computer.science or ComputerScience).
• Misspelled tags.
• Singular and plural forms.
• Synonyms.
• Capitalisation variants (e.g. CSS or Css or css).
• Non-English tags and symbols (e.g. @site).
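Some of these variations (capitalisation and compound-word separators) can be collapsed mechanically. The sketch below is a hypothetical illustration, not the normalisation process used in this thesis: it lower-cases a tag and strips whitespace, underscores, dots and hyphens, so that the compound-word and capitalisation examples above fall into the same groups.

```python
import re
from collections import defaultdict

# Hypothetical rule: whitespace, underscores, dots and hyphens are treated
# as word separators inside compound tags and are removed.
_SEPARATORS = re.compile(r"[\s_.\-]+")


def normalise_tag(tag: str) -> str:
    """Collapse capitalisation and compound-word variants of a tag."""
    return _SEPARATORS.sub("", tag).lower()


def group_variants(tags):
    """Group raw tags under their normalised form."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalise_tag(tag)].add(tag)
    return dict(groups)


raw = ["computerscience", "computer_science", "computer.science",
       "ComputerScience", "CSS", "Css", "css"]
print(sorted(group_variants(raw)))  # ['computerscience', 'css']
```

Misspellings, singular/plural forms, synonyms and abbreviations cannot be handled by string rules like these and would need additional lexical resources. Stripping dots also has side effects, for example merging ‘web2.0’ with ‘web20’.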

In other words, people’s vocabulary can be categorised into:
• Domain-specific tags, either broad or narrow (e.g. broad: Programming,
narrow: Javascript).
• Type of resource (e.g. article, tutorial, reference).
• Subjective tags (opinion or expression) that provide judgment-related context
(e.g. fun, funny, cool).
• Attitudes and functional tags (e.g. toread, 2read, tovisit, learn-later).
• Colloquial phrases and localisation (Motive, 2005).
• Others that only make sense to the tag creator, reflecting the fact that people
usually tag for themselves (WeBreakStuff, 2005; Stock, 2006).
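As a rough illustration of how such categories might be used, for example to set aside functional and subjective tags before treating the remaining tags as descriptors, the following sketch assigns a tag to a category by dictionary lookup. The word lists contain only the examples given above (plus ‘library’ from Figure 4.6) and are assumptions; a lookup of this kind cannot separate domain-specific tags from personal ones.

```python
# Illustrative word lists drawn from the examples in the text; a real
# system would need much larger, curated lists.
CATEGORY_WORDS = {
    "functional": {"toread", "2read", "tovisit", "learn-later"},
    "subjective": {"fun", "funny", "cool"},
    "resource type": {"article", "tutorial", "reference", "library"},
}


def categorise(tag: str) -> str:
    """Return the first category whose word list contains the tag."""
    lowered = tag.lower()
    for category, words in CATEGORY_WORDS.items():
        if lowered in words:
            return category
    # Unlisted tags may be domain-specific or personal; a lookup
    # cannot tell these apart without a domain vocabulary.
    return "unclassified"


for tag in ["toRead", "Cool", "library", "Ajax"]:
    print(tag, "->", categorise(tag))
```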

Furthermore, some users of del.icio.us have adopted a private convention to indicate
a tag’s hierarchy (i.e. a structural relation between tags, e.g. Dev/Perl). Also, another

Figure 4.7: Set of Tags assigned to a website with the title “Layout-o-matic45”

The researcher has also taken into consideration the importance of the number
associated with each tag, since this number represents how many people have used
that tag. A detailed discussion of this topic and the related design decisions is given
in Chapter 9.
4.1.4 Social Bookmarking Services versus Search Engines
A comparison between search engines and social bookmarking services is not quite
equitable, since each system provides a different service: search engines are purely
machine-centric, whereas social bookmarking services are purely human-centric.
However, one problem with search engines lies in the results they return, which
usually include ‘noise’ in the form of unrelated results. These results differ from tag
search results because search engines are not based on user-assigned keywords.


45 http://www.inknoise.com/experimental/layoutomatic.php
sensitive web resources. This can be seen in people’s experience of using
del.icio.us to promote their websites (e.g. Martino, 2005).
4.2 Chapter Summary
Every day, hundreds of URLs are bookmarked using the del.icio.us service; these
URLs represent what people think is worth bookmarking for later use. Among the
bookmarked URLs there are signs of web resources that could be nominated as
useful in an educational context.

To further investigate the usage and quality of folksonomies, two experiments are
presented. The first experiment explores the value of folksonomies compared to
automatically extracted keywords (Chapter 7). The second uses folksonomies in the
process of semantic annotation (Chapter 8).
There is also another classification of ontologies based on their generality (i.e. scope)
and expressiveness (i.e. level of detail) (Bruijn and Fensel, 2005). In terms of
generality there are three types of ontologies: top-level ontologies (e.g. CYC51,
WordNet52), which are shared by many people across different domains; domain
ontologies (e.g. UNSPSC53, the United Nations Standard Products and Services
Code for classifying products and services), which are shared between stakeholders in
a particular domain; and application ontologies (e.g. an ontology for a course),
which are used for a particular application.

The second, orthogonal classification of ontologies is based on their expressiveness.
Ontologies can be distinguished by levels of expressiveness such as thesaurus
(e.g. WordNet), controlled vocabulary (e.g. Dublin Core54), informal/formal
taxonomy (e.g. Yahoo directory55/UNSPSC), frames (e.g. RDFS), value restrictions
(e.g. OWL data-type), limited logic constraints (e.g. OWL DL56) and general logic
constraints (e.g. CyCL57, OWL DL).

Finally, Bruijn and Fensel (2005) note that levels of expressiveness can be grouped
into two distinct categories: light-weight ontologies, which include concepts and the
relations between them, and heavy-weight ontologies, which also include axioms
and constraints.

This thesis focuses on the use of application ontologies with a light-weight
level of expressiveness.

51 http://www.opencyc.org/ [last accessed 11/2/2007]
52 http://wordnet.princeton.edu/ [last accessed 11/2/2007]
53 http://www.unspsc.org [last accessed 11/2/2007]
54 http://dublincore.org/ [last accessed 11/2/2007]
55 http://dir.yahoo.com/ [last accessed 11/2/2007]
56 http://www.w3.org/TR/owl-guide/ [last accessed 11/2/2007]
57 http://www.cyc.com/cycdoc/ref/cycl-syntax.html [last accessed 11/2/2007]
5.3 Ontology Languages
Prior to Tim Berners-Lee’s Semantic Web initiative, many systems used other
languages to represent ontologies, such as SCL58, CyCL and LOOM59 (Bruijn and
Fensel, 2005). Although these languages offer powerful expression and reasoning
mechanisms, they lack native support for RDF (the key language of the Semantic
Web).

To express the semantics of a resource on the Web so that both humans and
machines can understand it, a set of formal languages is used. These languages can
be stacked on top of each other to form what Berners-Lee (2000) called “The
Semantic Web Language Layer Cake”.


Figure 5.1: The Semantic Web Language Layer Cake [Berners-Lee, 2000]60.

Figure 5.1 depicts the layers of the Semantic Web. The bottom layer, Unicode and
URI, forms the base for the layers above it. The second layer, XML and XML
Schema, forms the syntactic basis for the Semantic Web languages. The third layer,
RDF and RDF Schema, is the expressive language of the Semantic Web. The next
layer, OWL, is the ontology language of the Semantic Web. An overview of each of
the three languages used in the Semantic Web is presented in the following
sub-sections, with

58 Simple Common Logic (SCL) http://www.ihmc.us/users/phayes/SCL-december.html [last accessed
11/2/2007]
59 http://www.isi.edu/isd/LOOM/ [last accessed 11/2/2007]
60 http://www.w3.org/2000/Talks/1206-xml2k-tbl/slide10-0.html [last accessed 11/2/2007]
5.4 Building Ontologies
Ontologies can be generated either manually or semi-automatically (Gómez-Pérez
and Manzano-Macho, 2004). Manual ontology building is a tedious, time-consuming
and error-prone task. Semi-automatic building of ontologies is more appropriate for
speeding up the process of ontology generation.

The semi-automatic generation of ontologies is usually referred to as ontology
learning, which can be defined as
“the application of a set of methods and techniques used for building an
ontology from scratch by enriching, or adapting, an existing ontology in a
semi-automatic fashion using distributed and heterogeneous knowledge and
information sources, allowing a reduction in the time and effort needed in the
ontology development process” (Gómez-Pérez and Manzano-Macho, 2004,
p.187).

The process of ontology learning from text includes a number of methods that came
from complementary disciplines (e.g. Natural Language Processing ‘NLP’ and
machine learning) and is applied to different types of unstructured, semi-structured,
and fully structured data. These methods can be summarized as follows (Gómez-
Pérez and Manzano-Macho, 2004):
• Approaches based on linguistic techniques: these include NLP techniques
such as pattern-based extraction, semantic relatedness, etc. An example of a
system using this technique is SOAT (Wu and Hsu, 2002).
• Approaches based on statistical techniques: these methods rely on
calculating statistical measures (e.g. Term Frequency Inverse
Document Frequency ‘TFIDF’) to help the ontologist detect new concepts
and the relationships between them. An example of a system based on this
technique is WOLFIE (WOrd Learning From Interpreted Examples)
(Thompson and Mooney, 1999).
• Approaches based on machine learning algorithms: these include all
methods from the machine learning domain that assist the ontologist in
detecting new concepts and their relations, and in placing them in the
correct position in the taxonomy. An example of a system that uses this
technique is OntoLearn (Navigli et al., 2003).
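To make the statistical approach concrete, here is a minimal TF-IDF computation in Python. It uses one common variant (term frequency normalised by document length, multiplied by the natural log of the inverse document frequency) and naive whitespace tokenisation; both choices are assumptions, and the systems cited above use their own, more elaborate measures. Terms that score highly in a document are candidate concepts for the ontologist, while a term that occurs in every document scores zero.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: tf-idf score} dict per document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    doc_freq = Counter()
    for tokens in tokenised:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = ["ontology learning from text",
        "ontology learning with statistics",
        "ontology alignment"]
scores = tfidf(docs)
print(scores[0]["ontology"])  # 0.0 (appears in every document)
```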
Chapter 6
Semantic Metadata Annotation
Annotation is a mechanism to associate metadata with web resources (Bechhofer et
al., 2002). Annotating a web resource with semantic metadata provides meaning to
its content.

This chapter starts by clarifying the meanings of ‘semantics’, ‘annotation’ and
‘semantic metadata annotation’, as these three terms are the cornerstone for
understanding the rest of the chapter. Next, it discusses in detail the different
semantic annotation techniques and methods used in most semantic annotation tools.
Finally, the chapter ends with some concluding remarks concerning the development
of the FolksAnnotation tool.

6.1 What is Semantics?
Semantics [noun]: the study of meanings; the meaning or relationship of
meanings of a sign or set of signs; especially: connotative meaning (From
Merriam-Webster online Dictionary71).

Different areas of computer science have different interpretations of what
‘semantics’ means (Sheth et al., 2005; Lytras and Naeve, 2006). For instance, in the

71 http://www.m-w.com/dictionary/Semantics [last accessed 11/2/2007]
Annotizer tool (Handschuh et al., 2001). The most significant drawback of manual
annotation is that it is prone to errors due to factors such as the annotator’s
unfamiliarity with the domain and/or lack of motivation (Bayerl et al., 2003).
Manual annotation is also expensive in terms of time and effort.

Semi-automatic annotation systems analyse a text to identify instances and then relate
them to their corresponding ontological concepts. These systems are not completely
automatic, as human intervention is still required to clarify ambiguous terms. An
example of this type of annotation is SemTag (Dill et al., 2003a; Dill et al., 2003b).

Reeve and Han (2005) have claimed that fully automatic annotation tools do not
exist, on the grounds that human intervention is required at an early stage to
bootstrap the annotation process. However, the researcher will show in section
6.4.3.2 an example of a fully automatic annotation tool called C-PANKOW.
6.4.1 Platform Classification
As mentioned previously, Reeve and Han classified annotation platforms, according to
the type of annotation method used, into pattern-based, machine learning-based and
multi-strategy platforms.

Pattern-based annotation starts from a defined initial set of entities and finds
patterns for them in a corpus. When new entities are discovered along with new
patterns, the process is repeated until no more entities are discovered or the user
stops the process. This process can also use manual rules to find entities in text.
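The iterative loop described above can be sketched as follows. This is a toy illustration under strong assumptions: entities are single words, and a ‘pattern’ is simply the word immediately to the left of a known entity. The `max_iter` parameter stands in for the user stopping the process. Real pattern-based systems use far richer patterns, and patterns this crude would over-generalise quickly on real text.

```python
def bootstrap(sentences, seeds, max_iter=10):
    """Alternate between learning patterns and finding new entities.

    A 'pattern' here is just the word immediately to the left of a known
    entity (a deliberately crude, hypothetical choice).
    """
    tokenised = [s.lower().split() for s in sentences]
    entities, patterns = set(seeds), set()
    for _ in range(max_iter):
        # 1. Learn patterns from the contexts of known entities.
        for tokens in tokenised:
            for left, word in zip(tokens, tokens[1:]):
                if word in entities:
                    patterns.add(left)
        # 2. Apply the patterns to discover new candidate entities.
        new = {word for tokens in tokenised
               for left, word in zip(tokens, tokens[1:])
               if left in patterns and word not in entities}
        if not new:  # stop when no more entities are discovered
            break
        entities |= new
    return entities, patterns


corpus = ["the city of london is large",
          "the city of paris is old",
          "visit paris today",
          "visit rome today"]
found, learned = bootstrap(corpus, {"london"})
print(sorted(found), sorted(learned))  # ['london', 'paris', 'rome'] ['of', 'visit']
```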

Machine learning-based annotation uses two kinds of method: probabilistic and inductive.
Probabilistic annotation tools use statistical models to locate entities within text.
Induction tools use either linguistic or structural analysis to perform wrapper
induction74.

74 Wrapper induction is ‘a technique for automatically constructing wrappers from labeled examples
of a resource's content’. From http://www.cs.washington.edu/homes/weld/wrappers.html [last
accessed 28/2/2007]

Finally, multi-strategy annotation combines pattern-based and machine learning-based
methods; however, Reeve and Han claim that, at the time of their survey, no system
implemented the multi-strategy annotation method.
6.4.2 Semantic Annotation Frameworks
Uren et al., on the other hand, discuss two annotation frameworks: Annotea, the
W3C annotation project (Kahan et al., 2001), and CREAM (Handschuh and Staab,
2003), an annotation framework developed at the University of Karlsruhe.

Annotea (Koivunen, 2005; Kahan et al., 2001) is a free-text annotation tool that
associates statements with documents in a collaborative fashion. These statements
must have metadata fields such as author and creation time, and Annotea uses RDF
as the metadata format. The documents that can be annotated using Annotea are
limited to XML and HTML formats. The generated metadata can be stored either
locally (on the user’s machine) or on public RDF servers. Examples of tools based on
the Annotea framework are Amaya75 and Annozilla76.

The CREAM (Creating RElational, Annotation-based Metadata) framework
(Handschuh et al., 2001; Handschuh and Staab, 2003; Handschuh and Staab, 2003a)
allows the creation of relational metadata, metadata that comprises class instances
and relationship instances.

As an annotation framework, CREAM comprises the following modules required for
semantic annotation: a document viewer to visualise the web page content; an
ontology guide to help in the annotation process; a crawler to search the Semantic
Web for existing annotations of the instance being annotated; an annotation
inference server for querying annotated documents; and document management for
managing annotated documents. Furthermore, CREAM is capable of annotating the
deep web (i.e. databases); therefore when web pages are generated
75 http://www.w3.org/Amaya/ [last accessed 12/2/2007]
76 A browser based on Mozilla browser to create and view annotations associated with a web page,
http://annozilla.mozdev.org/ [last accessed 12/2/2007]
Piggybank91 (Huynh et al., 2005) is a Firefox plug-in to semantically annotate
websites. It also provides screen-scraper functionality (a screen-scraper is a client-side
program that extracts specific information from a web page, e.g. price, product
and colour from a commerce website). Piggybank converts the information collected
by the screen-scraper into RDF, and users can add their own tags (i.e. folksonomies)
to annotate websites and save these tags in RDF format. The saved RDF can be
stored either on the user’s computer or on a collaborative server called a Semantic
Bank.
6.5 Semantic Annotation Tools for eLearning
Few semantic annotation tools exist for annotating learning resources. In a survey
of semantic annotation tools for learning materials, Azouaou et al. (2004) tried to
identify specifications to serve as guidelines for developing semantic annotation
tools that fulfil the requirements of educational applications.

They first identified three main elements of the annotation activity:
• The author of the annotation (the annotator).
• The addressee of the annotation (the user of the annotation).
• The fact that the annotation is semantic or not.

Then, based on the previous characterization, they provided four properties of
annotation tools, which are:
• Automatic versus manual annotation.
• Cognitive versus non-cognitive annotation.
• Computational versus non-computational annotation.
• Semantic versus non-semantic annotation.

They also list the requirements for eLearning annotation tools, namely: usefulness
(which takes into account teaching/learning context); shareability (which enables

91 http://simile.mit.edu/wiki/Piggy_Bank [last accessed 12/2/2007]

In this thesis, the goal of semantic annotation is to classify and add information
to existing web resources so that they can be retrieved and searched by semantic
means, which makes these web resources amenable to machine processing.
6.6 Discussion
From the previous overview of the different aspects of semantic annotation (general
and domain-specific), several points can be highlighted:
• Most of the tools mentioned rely either on manual annotation by humans or on
(semi-)automatic annotation that uses Information Extraction (IE) and
Machine Learning (ML) techniques to extract valuable information from a
web resource. Both approaches have clear shortcomings. Manual annotation
is a human-dependent process, which requires significant effort and
sometimes leads to errors when handled by an incompetent annotator.
(Semi-)automatic annotation suffers from fluctuations in the accuracy and
quality of the produced semantic metadata.
• There are few semantic annotation tools dedicated to the eLearning domain.
This might be because the Semantic Web community has mainly focused on
building technologies that serve the needs of large industries/organisations
and/or research centres, rather than education.
• Many of the reviewed semantic annotation tools follow a content-level
semantic annotation approach, where the internal pieces of a web resource are
linked to ontological terms, i.e. these tools are designed to insert ontology-based
markup in web pages (Corcho, 2006). This thesis takes a different approach:
the implemented tool adopts a document-level semantic annotation approach,
where an overall description of a web resource is generated without
interlinking its content with ontological terms.
• Unlike the Piggybank plug-in, which is open to all domains and does not
comply with any ontology, the thesis tool uses pre-generated ontologies and
deals with a specific domain.

To conclude, the problem with most automatic semantic annotation tools is that they
require a ‘man in the middle’ process that uses extraction technologies, and this
phase consumes considerable processing time. Moreover, none of the previously
mentioned tools has used folksonomies as guides in the process of annotating web
resources. Therefore, to test the potential of using people’s metadata (aka
folksonomies) in semantic annotation, and to check how rich the generated semantic
metadata will be, this thesis explores the benefit of using the output of contemporary
web services whose main assets are tags to create semantic metadata.
6.7 Chapter Summary
The term ‘annotation’ has different interpretations depending on the context in which
it is used. Some think of it as private notes, others as comments or remarks by the
author or a visitor of a web page. Despite these different interpretations, annotation,
and in particular semantic annotation, is what makes the web amenable to machine
processing.

This chapter has discussed in some detail the different platforms, frameworks and
tools used for semantic annotations. It also highlighted some important guidelines
and requirements that need to be considered when designing an annotation tool for an
eLearning domain.

The vision of this thesis is to develop a semantic metadata annotation tool for use
in an educational context. The semantic descriptors will be drawn from folksonomy
tags, to show the added value of the folksonomy community in the process of
semantic annotation. The FolksAnnotation tool will not annotate the content of a web
resource; instead, it will assign document-level semantic metadata to the web
resource as a whole. The folksonomy-based annotation tool and its design decisions
will be the theme of the next chapter.
Chapter 7
Exploring the Value of
Folksonomies
While previous chapters have laid the foundation of the thesis work by
signposting the various technologies exploited in building the FolksAnnotation tool,
this chapter and the following ones cover the main contributions of this thesis by
discussing the various experiments conducted to test the thesis hypotheses.

In this chapter, the value of folksonomies compared to automatic indexing
mechanisms is explored by testing Hypothesis 1(a), which states:
“Folksonomies, as index keywords, hold more semantic value than
keywords automatically extracted by machines.”

The underlying assumption of this hypothesis is that most folksonomy tags are more
related to a professional indexer’s mindset than keywords extracted using automatic
keyword extraction techniques.

The main questions this experiment tries to answer are:
• Do folksonomies only represent a set of keywords that describe what a
document is about, or do they go beyond the functionality of index
keywords?
• What is the relationship between folksonomy tags, automatically extracted
index keywords and keywords assigned by a professional indexer?
• Where are folksonomies positioned in the spectrum from professionally
assigned keywords to context-based machine extracted keywords?

In order to find out whether folksonomies can improve on automatically extracted
keywords, it is important to examine the relationship between the two, and between
each of them and the keywords of a professional human indexer. Therefore, this
chapter starts by discussing similar works that have compared folksonomy tags to
other indexing mechanisms. Then the setup of the experiment and the data set
selection are explained. Finally, the chapter concludes by reporting and discussing
the results of the four phases of the experiment.
7.1 Related Work
Little research has compared folksonomies with other indexing mechanisms. Kipp
(2006) examined the differences and similarities between user keywords
(folksonomies), author-assigned keywords and keywords assigned by intermediaries
(such as librarians). She used a sample of journal articles tagged in the social
bookmarking sites citeulike93 and connotea94, which specialise in academic
articles. Her selection of articles was restricted to journals known to include
author-assigned keywords and to journals indexed in the Information Service for
Physics, Electronics, and Computing (INSPEC95) database, so that each selected
article would have three sets of keywords assigned by three different classes of
metadata creators. Her methods of analysis were concept clustering via the INSPEC
thesaurus and descriptive statistics, which she used to examine differences in context
and term usage between the three classes of metadata creators.

Kipp’s findings showed that many users’ terms were related to the author and
intermediary terms but were not part of the formal thesauri used by the
intermediaries; this was due to the use of broad terms not included in the thesaurus,
or to the use of newer terminology. Kipp concluded her paper by saying “User
tagging, with its lower apparent cost of production, could provide the additional
access points with less cost, but only if user tagging provides a similar or better
search context.”

93 http://citeulike.org/ [last accessed 5/2/2007]
94 http://connotea.org/ [last accessed 5/2/2007]
95 A database which provides an intermediary assigned controlled vocabulary for searchers.

Similarly, Lin et al. (2006) compared social tagging with controlled vocabularies and
title-based automatic indexing. Their data set was similar to Kipp’s but focused on the
medical field: they concentrated on articles in PubMed96 that have Medical Subject
Headings (MeSH) terms, and used the GATE97 text-processing engine to extract the
indexing keywords. Their results show little overlap among the three indexing
methods: 11% between social tagging and MeSH terms, and 19% between social
tagging and automated indexing.

In contrast, Tennis (2006) compared the differences and similarities between social
tagging and subject cataloguing using framework analysis. The framework analysis
compares 1) the processes and 2) the structures of indexing, and 3) the contexts in
which social tagging and subject cataloguing occur. After applying the framework
analysis, Tennis found that social tagging is quite different from subject cataloguing,
with only a superficial similarity in purpose between the two.
7.1.1 Discussion
The method that Kipp used does not compare folksonomies to keywords extracted
automatically using context-based extraction methods. This additional comparison is
important for measuring the relationship between automatic machine indexing
mechanisms, led by major search engines like Yahoo, and human indexing
mechanisms, and for establishing whether it is possible to replace folksonomies with
automatically extracted keywords. As for Tennis, he did not undertake an in-depth
analysis of folksonomy tags; instead, he theoretically applied modified rubrics from
library science to compare social tagging and subject cataloguing, which implies that
his work lacks an empirical basis. Finally, the work of Lin et al. is very similar to the
experiment described here, differing in the tools and data
96 http://www.ncbi.nlm.nih.gov/entrez/ [last accessed 5/2/2007]
97 http://gate.ac.uk/ [last accessed 5/2/2007]
overlapped keywords between the two sets. The tool then calculates the percentage
of overlap between the two sets using the following equation (1):
$$P = \frac{N}{|F_s| + |K_s| - N} \times 100 \qquad (1)$$

The above equation can also be expressed using set theory as (2):

$$P = \frac{|F_s \cap K_s|}{|F_s \cup K_s|} \times 100 \qquad (2)$$

Where:
$P$ Percentage of overlap
$N$ Number of overlapped keywords, i.e. $|F_s \cap K_s|$
$F_s$ Folksonomy set
$K_s$ Keyword set
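Equations (1) and (2) describe the Jaccard coefficient expressed as a percentage. A minimal Python sketch of the calculation follows; it assumes that both keyword lists have already been normalised (e.g. lower-cased) so that matching can use exact string equality. The function name and the sample sets are hypothetical.

```python
def overlap_percentage(folksonomy, keywords):
    """Percentage of overlap P between a folksonomy set and a keyword set."""
    f_s, k_s = set(folksonomy), set(keywords)
    n = len(f_s & k_s)                      # N: overlapped keywords
    union_size = len(f_s) + len(k_s) - n    # equals len(f_s | k_s), as in (1)
    if union_size == 0:
        return 0.0  # both sets empty: P defined as 0 here (an assumption)
    return n / union_size * 100


f = {"css", "web", "design", "layout", "tools"}
k = {"css", "layout", "generator", "web"}
print(overlap_percentage(f, k))  # 50.0
```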










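Figure 7.1: The Comparison System Framework (diagram components: Web Document (HTML/text), Yahoo API Term Extractor (REST/XML), Terms List, del.icio.us DB, Folksonomy Extractor, Folksonomy List, Comparison, Result).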
7.4 Data Selection
The test data used in this experiment was randomly collected from the del.icio.us103
social bookmarking service. One hundred bookmarked websites spanning various
topics from the popular tags webpage were selected, as shown in Table 7.1.



103 http://del.icio.us/tag/, Data was collected between 24/2 and 27/2 2006
2. Yahoo TE is limited to producing twenty terms, each of which may consist of one
or more words, to represent the best candidates for a website (as mentioned on
the service website). These terms were put in two forms: a) concatenated to
form compound words and b) split into single words. This was necessary so
that Yahoo TE keywords could match the del.icio.us style of single and
compound word tags.
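The two-form expansion in step 2 might look like the sketch below. Lower-casing is an added assumption so that the expanded forms can be compared with del.icio.us tags; the function name is hypothetical.

```python
def expand_terms(terms):
    """Put each extracted term in concatenated and split forms."""
    expanded = set()
    for term in terms:
        words = term.lower().split()  # lower-casing is an assumption
        expanded.add("".join(words))  # a) compound word, e.g. 'semanticweb'
        expanded.update(words)        # b) single words, e.g. 'semantic', 'web'
    return expanded


print(sorted(expand_terms(["Semantic Web", "CSS"])))
# ['css', 'semantic', 'semanticweb', 'web']
```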
7.6 Results
7.6.1 Phase 1
The role of phase one is to determine whether or not folksonomies carry more
semantic value than keywords extracted using Yahoo TE. In this phase the phrase
‘semantic value’ means that the tag or keyword used to describe a web resource is
relevant to its gist, i.e. the tag or keyword contributes to the description of the
resource meaning.

Thus, given the sets of keywords from Yahoo TE and del.icio.us, the two trained
indexers104 were asked to evaluate each keyword from both sets blindly105. The
indexers were provided with a five-category table to classify the keywords from both
sets, with the following values: "Strongly relevant" encoded 5, "Relevant" encoded 4,
"Undecided" encoded 3, "Irrelevant" encoded 2 and "Strongly irrelevant" encoded 1.

After 10 websites from the thesis data set had been evaluated, an inter-rater reliability
test was conducted for each evaluated web resource to measure the agreement
between the two indexers. This step is essential for measuring the consistency
between the two indexers.


104 Two non-professional colleagues were trained during the course of two weeks on the practice of
evaluating indexing keywords.
105 By blindly, the researcher means that neither indexer knew which keyword list belonged to
which set (i.e. folksonomy or Yahoo TE).
The inter-rater reliability test that the researcher used to measure the consistency of
classifying keywords into categories without any ordering (i.e. nominal data) was the
Kappa (k) coefficient, a widely accepted measure developed by Cohen (1960). The
value of the resulting Kappa coefficient indicates the degree of agreement between
the two raters. To interpret the resulting Kappa values, the researcher used the
interpretation of Landis and Koch (1977), where 0 ≤ k < 0.2 means slight agreement,
0.2 ≤ k < 0.4 means fair agreement, 0.4 ≤ k < 0.6 means moderate agreement,
0.6 ≤ k < 0.8 means substantial agreement, and 0.8 ≤ k ≤ 1.0 means almost perfect
agreement.
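For reference, unweighted Cohen's kappa is k = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's category proportions. The sketch below computes it for two raters and maps the result onto the Landis and Koch bands quoted above (plus their ‘poor’ label for negative values). The ratings are invented for illustration and are not data from this experiment. Returning 1.0 when chance agreement is 1 (both raters using one identical category throughout) is a convention chosen here, since kappa is undefined in that case.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # degenerate case, see the note above
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Label a kappa value using the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.2, "slight"), (0.4, "fair"),
                         (0.6, "moderate"), (0.8, "substantial")]:
        if kappa < upper:
            return label
    return "almost perfect"


a = [5, 4, 4, 3, 2]  # hypothetical ratings on the 1-5 scale
b = [5, 4, 3, 3, 2]
kappa = cohens_kappa(a, b)
print(round(kappa, 3), landis_koch(kappa))  # 0.737 substantial
```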

Table 7.2 shows the overall average degree of agreement between the two indexers
for the 10 evaluated web resources. The Kappa values for both sets fall within the
fair level of agreement, which is considered satisfactory (Bayerl et al., 2003) for the
purpose of this experiment. However, the results show that agreement between the
indexers about the folksonomy set (0.2005) is slightly lower than their agreement
about the Yahoo TE set (0.2162); the difference is statistically significant at
p < 0.001. The lower Kappa value for the folksonomy set was due to a slight
disagreement in evaluating one of the websites in that set, which affected the results
accordingly.
Set          Average inter-rater agreement (Kappa coefficient)
Folksonomy   0.2005
Yahoo TE     0.2162
Table 7.2: Average Inter-Rater agreement for the ten evaluated web resources in phase 1

The values summarized in Table 7.3 show the average mode value for each evaluated website across both indexers. For every site except sites 2, 5 and 8, the folksonomy value was higher than or equal to the Yahoo TE value. On further inspection of these three cases, the researcher found that what lowered the average mode value of the folksonomy set was the number of general tags used to describe these web resources, compared with the Yahoo TE keywords for the same resources. In contrast, Yahoo TE extracted more specific keywords (i.e. the same or narrower terms).
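The site-by-site comparison can be checked mechanically against Table 7.3. The Python sketch below simply transcribes the two columns of the table (the variable names are ours) and lists the sites where each set scored higher.

```python
# Per-site average mode values transcribed from Table 7.3.
folksonomy = [4.5, 4.0, 4.0, 4.0, 4.0, 4.5, 4.0, 4.0, 4.0, 4.5]  # F column
yahoo_te = [4.0, 4.5, 3.0, 2.5, 4.5, 3.0, 1.5, 4.5, 4.0, 4.0]  # K column

# Sites are numbered from 1, as in the table.
pairs = list(enumerate(zip(folksonomy, yahoo_te), start=1))
folksonomy_at_least_as_high = [site for site, (f, k) in pairs if f >= k]
yahoo_te_higher = [site for site, (f, k) in pairs if k > f]

print(folksonomy_at_least_as_high)  # [1, 3, 4, 6, 7, 9, 10]
print(yahoo_te_higher)  # [2, 5, 8]
```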

The results also show that the folksonomy and Yahoo TE sets have the same overall mode value across all sites (4 = relevant). The Yahoo TE values, however, varied considerably more than the folksonomy values: the value 4 appeared only 3 times in the Yahoo TE set (tied with 4.5, which also appeared 3 times), compared with 7 times in the folksonomy set.
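The frequency counts behind the mode can be reproduced from Table 7.3 with Python's standard library. In this sketch (variable names are ours), `statistics.multimode` (Python 3.8+) lists every value tied for most frequent, whereas `statistics.mode` returns only the first such value in data order.

```python
from collections import Counter
from statistics import mode, multimode

# Per-site average mode values transcribed from Table 7.3.
folksonomy = [4.5, 4.0, 4.0, 4.0, 4.0, 4.5, 4.0, 4.0, 4.0, 4.5]
yahoo_te = [4.0, 4.5, 3.0, 2.5, 4.5, 3.0, 1.5, 4.5, 4.0, 4.0]

for name, values in (("Folksonomy", folksonomy), ("Yahoo TE", yahoo_te)):
    # How often the value 4 (relevant) occurs across the 10 sites.
    print(name, "frequency of 4:", Counter(values)[4.0])
    # All values tied for most frequent, versus the single value mode() reports.
    print(name, "multimode:", multimode(values), "mode:", mode(values))
```

For the Yahoo TE column this shows 4 and 4.5 tied at three occurrences each, while `mode` reports 4.0 because it appears first.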

Moreover, the folksonomy set has a higher mean and a lower standard deviation, 4.15 (SD 0.24), indicating low variance in the two indexers' ratings of the folksonomy tags. The Yahoo TE set, at 3.55 (SD 1.01), shows much higher variance in the indexers' ratings. These results indicate that folksonomy tags match the human indexers' conception more closely than Yahoo TE keywords do. Furthermore, the difference between the two means was statistically significant at p < 0.001.
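The summary rows of Table 7.3 can be reproduced from the per-site values. The Python sketch below (variable names are ours) uses the standard `statistics` module; the reported standard deviations correspond to the sample standard deviation (n − 1 denominator), while the population version is printed only for comparison.

```python
from statistics import mean, pstdev, stdev

# Per-site average mode values transcribed from Table 7.3.
folksonomy = [4.5, 4.0, 4.0, 4.0, 4.0, 4.5, 4.0, 4.0, 4.0, 4.5]
yahoo_te = [4.0, 4.5, 3.0, 2.5, 4.5, 3.0, 1.5, 4.5, 4.0, 4.0]

for name, values in (("Folksonomy", folksonomy), ("Yahoo TE", yahoo_te)):
    # stdev divides by n - 1 (sample SD), matching the SD row of Table 7.3;
    # pstdev divides by n (population SD) and is shown only for comparison.
    print(f"{name}: mean {mean(values):.2f}, SD {stdev(values):.2f} "
          f"(population SD {pstdev(values):.2f})")
```

This prints a mean of 4.15 with SD 0.24 for the folksonomy set and a mean of 3.55 with SD 1.01 for Yahoo TE, as in the table.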

Site    Folksonomy (F)    Yahoo TE (K)
1       4.5               4
2       4                 4.5
3       4                 3
4       4                 2.5
5       4                 4.5
6       4.5               3
7       4                 1.5
8       4                 4.5
9       4                 4
10      4.5               4
Mean    4.15              3.55
SD      0.24              1.01
Mode    4                 4
Table 7.3: The average mode values for each website in the folksonomy (F) and Yahoo TE (K) sets, with the mean, standard deviation and mode over all 10 evaluated websites

The results of this phase gave the researcher an overall picture of the semantic relationships captured by the folksonomy tags and Yahoo TE keywords, as judged by the two indexers. To better understand the semantics of each classified keyword in the folksonomy and Yahoo TE sets, an in-depth analysis is carried out in phase 2.
