A Document Ontology and Agent-Based RDF Metadata Retrieval
Available from maya.cs.depaul.edu
Juan L. Dinos, J. Fernando Vega-Riveros
Electrical and Computer Engineering Department
University of Puerto Rico, Mayaguez Campus
Mayaguez, Puerto Rico 00680
Juan.Larry@ece.uprm.edu, fvega@ece.uprm.edu
Abstract
This paper presents the architecture of an agent-based
system for retrieving information resources in a higher
education environment. It also describes an ontology,
expressed in RDFS (Resource Description Framework
Schema), that represents the documents used by students
and faculty members. The RDFS recommendation offers
primitives for representing concepts, their relationships
and their attributes, all of which form our metadata.
The architecture uses three types of agents, each in charge
of different tasks such as user interaction, ontology
retrieval and metadata search. The agents were implemented
with the Java Agent Development Framework (JADE).
Introduction
The growth of the Internet has enabled access to very large
volumes of information resources located in different,
heterogeneous systems. This information is typically
accessed through browsers, which use HTML (Hypertext
Markup Language) to display it. The main disadvantage of
HTML is that it does not allow the information to be
structured adequately. The W3C has therefore developed
alternatives for encoding web information that give more
meaning to information resources and reduce their
ambiguity. These are RDF and RDF(S), which offer
primitives for defining knowledge models that are closer
to frame-based approaches (Gomez-Perez 2002). RDF(S) is
widely used as a representation format in many tools and
projects.
In addition to technologies like RDF (Lassila and
Swick 1999) and RDFS (Brickley and Guha 1999) for
describing document metadata, software agents offer
features such as autonomy, pro-activity and cooperation
that can be used to improve document retrieval. The vision
of intelligent agents is compelling, and many people
believe that these agents will become necessary as
complexity within the World Wide Web grows (Hendler 1999).
Such agents will help find information when given only a
few search words, achieving the desired results more
effectively.
Combining these technologies offers significant advantages
for the retrieval of RDF metadata. This paper is organized
as follows. The next section presents the motivation for
our work and choices and the formulation of the problem.
We then describe the document ontology, followed by a
section on how this ontology was represented in RDF. Next
we present the agent-based architecture for metadata
retrieval and describe the prototype built. Finally, we
present the conclusions of our work.
Motivation
Education is a knowledge-intensive activity in which much
remains tacit. Knowledge bases and knowledge systems must
be built to store and retrieve, in the right context, the
knowledge generated by students and teachers, in order to
improve learning and develop the skills and competences
needed for modern economic models. Universities and higher
education institutions are introducing Knowledge
Management into their organizations. Nevertheless, the
complex collaboration processes that take place in
teaching and learning still pose major challenges to
research, mainly in Knowledge Management (KM) itself and
in Artificial Intelligence (AI) as a supporting field,
because of the informality that in most cases surrounds
learning (Vega-Riveros 2004).
Today, technologies, initiatives and strategies that allow
knowledge and information resources to be identified and
described exist or are being developed, such as the DCMI
(Dublin Core Metadata Initiative) (Beckett, Miller and
Brickley 2002), RDF (Resource Description Framework)
(Lassila and Swick 1999) and OWL (Web Ontology Language)
(McGuinness and van Harmelen 2004). It is therefore
important that knowledge management technologies and
strategies be researched, developed and applied in
education to prepare the future workforce for the new
economic model and simultaneously enrich their learning
environment.
The recommendations mentioned above, such as DCMI, RDF and
RDFS, are important for describing the different types of
documents used in education, such as homework assignments,
essays and reports, which form an important body of
learning materials whose knowledge has
not been as openly shared by students and professors as
desirable. It is therefore important to develop ontologies
and tools that allow managing these documents so that they
are stored and retrieved in the right context.
This need motivates the development of tools that assist
students and professors in storing and retrieving the
documents that result from their teaching-learning process,
and thus foster knowledge-centered collaboration beyond the
confines of the classroom.
The Document Ontology
Students consult and generate different sources of
information and knowledge in their learning activities,
such as preparing homework assignments and studying for
tests. Many sources of information are already available
in digital format on the Internet or in databases, which
enables their search and use.
A suitable domain- or subject-based representation of
these documents and the relations that can exist between
topics would help to organize different types of documents
and facilitate a more effective search. Ontologies “would
allow us to represent the knowledge of the real world
through related entities and objects” (Gruber 1993).
Through ontologies we will be able to identify the concepts
or topics that students use when searching for a document.
Building an ontology allows us to define a common
vocabulary that can be used to organize the documents in a
repository. The ontology representation offers advantages
such as information sharing between people and software
agents, and knowledge reusability.
Ontology Representation
In a first phase, the types of documents that students use
were identified and organized in a taxonomic hierarchy.
Each class is characterized by properties that are shared
or inherited by all the elements in that class (Lassila et
al. 2000).
Figure 1 shows this taxonomy, where the class document is
the root of our hierarchy and characterizes documents in
the most abstract form. The other classes, such as
assignment, report and notes, are subclasses of the class
document.
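As an illustration only, the taxonomy described above can be sketched as a map from each class to its direct superclass. The class names come from the text; the function names and the Python representation are assumptions made for this sketch and are not part of the system described in the paper.

```python
# Hypothetical sketch of the taxonomy: each class maps to its direct superclass.
# Only the classes named in the text are listed; Figure 1 may contain more.
SUBCLASS_OF = {
    "Assignment": "Document",
    "Report": "Document",
    "Notes": "Document",
}


def is_subclass_of(cls, ancestor):
    """Return True if cls equals ancestor or reaches it via superclass links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)  # becomes None once the root is passed
    return False


def subclasses_of(ancestor):
    """Return, sorted by name, every class that lies strictly below ancestor."""
    return sorted(
        c for c in SUBCLASS_OF if c != ancestor and is_subclass_of(c, ancestor)
    )
```

Because the check follows superclass links upward, a deeper hierarchy (for example, a subclass of report) would be handled without changing the functions.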
Properties of a Document
In this taxonomy, some properties that are shared by all
types of documents and help to describe them were
identified. Properties like title, description and url are
used at each level of the document hierarchy.
In addition, the property hasConcept relates a document to
different concepts in the domain ontology. The concepts add
domain-specific descriptions to documents and in this way
enable the search for one or more documents related to a
concept. Figure 2 shows how the properties of a document
are related.
Figure 2. Properties of a Document
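A minimal sketch of how these shared properties might support concept-based search is given below. The property names title, description, url and hasConcept come from the text; the Document record, the find_by_concept function and all sample data are invented for illustration and do not reproduce the prototype.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Properties shared by every document type in the hierarchy."""
    title: str
    description: str
    url: str
    has_concept: set = field(default_factory=set)  # mirrors hasConcept


def find_by_concept(documents, concept):
    """Return the documents linked to the given concept, in input order."""
    return [d for d in documents if concept in d.has_concept]


# Illustrative data only: titles, URLs and concepts are invented.
docs = [
    Document("Lab report 1", "Filter design lab",
             "http://example.org/r1", {"Filters"}),
    Document("Homework 3", "Fourier series problems",
             "http://example.org/h3", {"Fourier", "Filters"}),
    Document("Lecture notes", "Introduction to sampling",
             "http://example.org/n1", {"Sampling"}),
]
```

Calling find_by_concept(docs, "Filters") returns the first two records, which is the kind of "one or more documents related to a concept" lookup that hasConcept is meant to enable.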
Representation of the Ontology in RDF
The Resource Description Framework (RDF) and RDF
Schema (RDFS) were developed by the W3C to allow the
semantic representation of information resources based on
XML, in a standardized, interoperable manner (Gomez-Perez
2002). RDFS (Brickley and Guha 1999) is a model formed by
three elements, “resource”, “property” and “value”, which
allow metadata to be expressed for the information
resources.
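The resource, property and value elements can be pictured as three-part statements. The sketch below stores such statements as plain tuples and queries them. The rdf: and rdfs: namespace URIs are the standard W3C ones; the example.org namespace, the sample statements and the helper functions are assumptions made for illustration, not the vocabulary used in the paper.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/doc#"  # hypothetical namespace for this sketch

# Each statement is a (resource, property, value) triple.
triples = [
    (EX + "Report", RDFS + "subClassOf", EX + "Document"),
    (EX + "report42", RDF + "type", EX + "Report"),
    (EX + "report42", EX + "title", "Filter design lab"),
    (EX + "report42", EX + "hasConcept", EX + "Filters"),
]


def values_of(resource, prop):
    """Return every value stated for the given resource and property."""
    return [v for (r, p, v) in triples if r == resource and p == prop]


def resources_with(prop, value):
    """Return every resource for which the given property has the given value."""
    return [r for (r, p, v) in triples if p == prop and v == value]
```

The first statement is schema-level (a class and its superclass), while the remaining three describe one document instance, which shows how the same three-part form can carry both the ontology and the metadata.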
The focus in this stage of the investigation was on how
to represent the ontology using these mark-up languages,
and how to enable the reusability and sharing of this
Figure 1. Representation of Document Ontology