Context metadata generation

by Lars Fredrik Høimyr Edvardsen
Context (2006)

Available from www.idi.ntnu.no
Context metadata registration v7.5 Page 1 of 27 Last saved 28 February 2006, 06:20


Context metadata generation
Sources for automatic and semi-automatic metadata creation for LOM; multiple
source metadata extraction and merging

Lars Fredrik Høimyr Edvardsen
Ph.D. student @ IME
IDI, NTNU, Norway
Trondheim, Lars.Edvardsen@idi.ntnu.no


Changes since version 7.3: As always there are small changes here
and there, but there are also some major ones. Some changes regarding
the introduction of It’s:learning at NTNU have been made in chapter
3.2. A bit has been added about the GUI in chapter 3.9, while the
whole of chapter 7.4.4, explaining how the first and second prototype
application GUIs will be created, has gone through major changes. As
always, chapter 8 concerning future work has been rewritten.


Abstract
The complexity of locating the “right” object increases as the number of available
objects rises. Metadata is used to describe objects, giving precise textual elements to
base queries upon. This requires that the objects are correctly described, and that the
metadata is generated in accordance with the defined metadata standard. IEEE LOM
descriptions can be complex and time-consuming to produce. There is a potential for
increased usability and practical use of such metadata standards by professionals and
non-professionals alike by actively using automatic and semi-automatic metadata
generation processes to produce metadata and to guide the user through the generation
process. Research is needed to determine what metadata can be produced automatically
and semi-automatically, and how, and which of these possibilities the users desire.
Practical experience will be gained through implementation and user tests of such a system.

1 Research on automatic metadata generation
1.1 Background
As the amount of information created and made available increases, the information
base needed to distinguish objects from one another, and to identify and retrieve the
“right” one, becomes increasingly important in order to separate the irrelevant from the
relevant in query results. Effort is hence needed to build up an adequate information
base on which to build querying applications.

Metadata is used to describe objects, giving precise textual descriptions to base
queries upon. This requires that the objects are correctly described, and that the
metadata is generated in accordance with the defined metadata schema. Traditionally all
metadata has been registered manually, mainly by highly trained professionals. With
an ever-increasing number of publications, their capacity has been reached.
There is a need for more and better tools for creating metadata that are not restricted
to use by professionals, because of their limited capacity and the fact that most
objects are never viewed as targets for professional efforts. If only professionals are to
be responsible for the metadata labeling, the current situation will continue, where only a
small and fixed number of objects are, and can be, registered in accordance with the
metadata schemas. Automatic generation of metadata records is needed in order to
build a structured base for queries on these objects. Automating the registration
process can increase the registration capacity and reduce the registration time per
object, with a potential to reduce the cost of registration per object and to allow human
resources to be reallocated to tasks that cannot be performed automatically to the
required standard, such as quality control. Because less effort is needed to create records,
the knowledge barrier to producing records can be lowered, allowing more people,
including non-professionals, to produce their own valid metadata records.

In addition, if non-professionals are to create metadata records, they can face the
situation of having numerous assorted schemas to deal with. This is because no
schema is designed to be all-embracing; that would have resulted in an unwieldy
schema. Instead, local schemas have been created, some general like DC, and others
subject-specific like LOM. At present, using any of these schemas requires that the user
knows the schema and has had training in filling it in correctly.
This makes metadata registration in accordance with a schema a profession, which is
not ideal in today’s information society.

At present, research is underway to develop more ways of automatically or
semi-automatically (automatically, with the possibility of manual correction) generating
metadata, a research field that started in the 1950s [Greenberg et al. 2005]. Current
projects look at the content of a computer file (here referred to as an object)
and extract metadata, which is stored outside the object in a dedicated metadata
record. Here all relevant information about the object is stored in one location, making
the metadata record the only information source needed for object retrieval, without
having to physically access the object each time a query is performed. In addition, the use
of metadata records allows queries for offline or restricted objects, e.g. objects that
need to be ordered. Full-text queries can still be useful, though they are not the
most accurate source to base queries upon.
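
The extraction step described above can be illustrated with a minimal sketch. It assumes
the object is an HTML file that carries metadata tags, and it uses Python's standard
html.parser module. The names MetaTagExtractor, extract_record and search, the sample file
and its field values are hypothetical and chosen only for illustration; they are not taken
from any of the projects cited here.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collects <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name, content = attr.get("name"), attr.get("content")
            if name and content:
                self.fields[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.fields.setdefault("title", data.strip())


def extract_record(object_id, html_text):
    """Build a metadata record that is stored outside the object itself."""
    parser = MetaTagExtractor()
    parser.feed(html_text)
    parser.close()
    return {"object": object_id, **parser.fields}


def search(records, field, term):
    """Query the records only; the objects themselves are never opened."""
    term = term.lower()
    return [r["object"] for r in records if term in r.get(field, "").lower()]


# Hypothetical sample object, invented for this sketch.
SAMPLE = """<html><head><title>Intro to Sorting</title>
<meta name="author" content="A. Student">
<meta name="keywords" content="algorithms, sorting"></head>
<body>Lecture text goes here.</body></html>"""

record = extract_record("lecture01.html", SAMPLE)
records = [record]
print(search(records, "keywords", "sorting"))
```

Once the record exists, the query runs against the list of records alone, which is why the
same approach would also work for objects that are offline or restricted.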

Dedicated metadata extractor applications are developed to create metadata records,
while other applications, search engines, are used for the user’s actual queries. The
metadata extractor application needs to understand the file format of the object, as a
whole or in the selected parts that contain metadata elements that can be collected.
Metadata harvesting relies on computer-understandable files, e.g. by using metadata tagging
[Greenberg et al. 2005]. After extraction, the extracted metadata needs to be stored in
a standardized manner, allowing search engines to fully understand and take
advantage of the extracted metadata records. At present much research focuses on
Dublin Core (DC) records due to their simple and universal nature and wide area of
use. This needs to be extended to cover more detailed and hence more
subject-specific metadata formats. The LOMGen project worked only with the “General”
and “Classification” categories [LOMGen 2004]. In this project the path chosen is
to describe learning objects based on the complete IEEE LOM standard, which is
down-scalable to DC, thereby establishing an information background to base
queries upon, not the querying service itself.
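
What "down-scalable to DC" could mean in practice is sketched below: a record organised
by LOM category is reduced to a flat set of DC elements, and LOM elements without a DC
counterpart are dropped. The mapping table is a simplified assumption made for this
illustration and covers only a handful of elements; it is not the complete mapping, and
the function name downscale_to_dc and the sample record are hypothetical.

```python
# Simplified, illustrative mapping from (LOM category, LOM element) to a
# DC element. This is an assumption for the sketch, not the full mapping.
LOM_TO_DC = {
    ("general", "title"): "title",
    ("general", "language"): "language",
    ("general", "description"): "description",
    ("general", "keyword"): "subject",
    ("technical", "format"): "format",
    ("rights", "description"): "rights",
}


def downscale_to_dc(lom_record):
    """Return a flat DC-style record built from a nested LOM-style record."""
    dc = {}
    for (category, element), dc_element in LOM_TO_DC.items():
        value = lom_record.get(category, {}).get(element)
        if value is not None:
            dc[dc_element] = value
    return dc


# Hypothetical LOM-style record, invented for this sketch.
lom = {
    "general": {
        "title": "Intro to Sorting",
        "language": "en",
        "keyword": ["sorting"],
    },
    "technical": {"format": "text/html"},
    "educational": {"difficulty": "easy"},  # no DC counterpart in this table
}

dc = downscale_to_dc(lom)
print(dc)
```

The reduction is one-way: the educational detail in the sample record is lost in the DC
view, which is one reason to register the richer LOM description first and derive the DC
record from it.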
