Automatic evaluation of metadata quality in digital repositories

by Xavier Ochoa, Erik Duval
International Journal on Digital Libraries (2009)

Abstract

Owing to the recent developments in automatic metadata generation and interoperability between digital repositories, the production of metadata is now vastly surpassing manual quality control capabilities. Abandoning quality control altogether is problematic, because low-quality metadata compromise the effectiveness of services that repositories provide to their users. To address this problem, we present a set of scalable quality metrics for metadata based on the Bruce & Hillman framework for metadata quality control. We perform three experiments to evaluate our metrics: (1) the degree of correlation between the metrics and manual quality reviews, (2) the discriminatory power between metadata sets and (3) the usefulness of the metrics as low-quality filters. Through statistical analysis, we found that several metrics, especially Text Information Content, correlate well with human evaluation and that the average of all the metrics is roughly as effective as people at flagging low-quality instances. The implications of this finding are discussed. Finally, we propose possible applications of the metrics to improve tools for the administration of digital repositories.


Available from www.springerlink.com

Towards Automatic Evaluation of Metadata
Quality in Digital Repositories
Xavier Ochoa1 and Erik Duval2
1 Information Technology Center, Escuela Superior Politécnica del Litoral,
Vía Perimetral Km. 30.5, Guayaquil - Ecuador
xavier@cti.espol.edu.ec
2 Dept. Computerwetenschappen, Katholieke Universiteit Leuven,
Celestijnenlaan 200A, B-3001, Heverlee, Belgium
Erik.Duval@cs.kuleuven.be
Abstract. Thanks to recent developments in the automatic generation of
metadata and interoperability between repositories, the production,
management and consumption of metadata are vastly surpassing the human
capacity to review or process this information. However, we need to
ensure that low quality metadata does not compromise the performance of
the services that the repository provides to its users. We contend there
is a need for automatic assessment of the quality of metadata in digital
repositories, so tools or users can be alerted about low quality records.
In this paper, we present several quality metrics for metadata based on
quality evaluation frameworks used for human quality review. We applied
these metrics to a sample of records from a real repository and compared
the results with the quality assessment given to the same records by a
group of human reviewers. Through correlation and regression analysis,
we found that one of the metrics, the text information content, could be
used as a predictor of the human evaluation. While these metrics are not
proposed as a definitive measurement of the complete multi-dimensional
quality of the metadata record, we present ways in which they can be
used to enhance the functionality of digital repositories.
Key words: Information Quality, Metrics, Metadata, Digital Libraries
1 Introduction
The quality of the metadata records stored in digital repositories is perceived
as an important issue for their operation [1] [2] and interoperability [3] [4]. The
main functionality of a digital repository, to provide access to resources, can be
severely affected by the quality of the metadata. For example, a learning resource
indexed with the title “Lesson 1 - Course CS20”, without any description or
keywords, will hardly appear in a search for materials about “Introduction to Java
Programming”, even if the described resource is, indeed, a good introductory text
to Java. The resource will just be part of the repository but will never be retrieved
in relevant searches. Secondary functions which metadata in a digital repository
must fulfill are also heavily compromised by low metadata quality: the metadata
record should contain enough information for the user to obtain a good idea of
what the described resource is about without the need to directly access the
object; incorrect or out-dated information about the URI of the resource could
lead to its inaccessibility; repositories with mainly low quality records which
belong to a federation could degrade the performance of distributed search; etc.
In consequence, the usefulness of a digital repository is strongly correlated to
the quality of the metadata that describe its resources.
Due to its importance, metadata quality assurance has always been an inte-
gral part of resource cataloging [5]. Some implementations of digital repositories,
nonetheless, have taken a relaxed approach to metadata quality assurance. Most
of them relied on the assumption that metadata was created by an expert in the
field or a professional cataloguer, and as such it should have an acceptable degree
of quality. In reality, experts in a given field are not necessarily experts in meta-
data creation, and hiring professional indexers to do the cataloging of resources
is usually not feasible for most repositories. As repositories grew exponentially
(through automatic metadata generation [6] or resource decomposition [7]) and
merged (through search federation [8] or metadata harvesting [9]), quality issues
became more apparent. This led to the adaptation of techniques developed to
review physical library records for assessing the quality of digital metadata. In addition,
new techniques that take advantage of computers’ ability to perform repetitive
calculations have been proposed to assure a minimum level of quality. A review of
previous literature on metadata quality evaluation for digital repositories reveals
these two general approaches:
– Manual Quality Evaluation. The majority of approaches (see Table 1)
manually review a statistically significant sample of metadata records against a
predefined set of quality parameters, similar to the sampling techniques used
for quality assurance of library cataloguing [10]. The human evaluations are
averaged and an estimation of the quality of the metadata in the repository
is obtained. While this remains the most meaningful way to measure
metadata quality in a digital repository, the method has two main
disadvantages. First, the manual quality estimation is only valid for the whole
repository at a given point in time. The quality of an individual metadata
record can only be obtained for the records contained in the sample.
Also, if a considerable number of new resources are inserted in the
repository, the assessment may no longer be accurate and the estimation must
be redone. Second, and more important, obtaining the quality estimation
is costly. Human experts must review an ever-increasing number of
objects. Dushay and Hillman [11] propose the use of visualization tools
to help metadata experts in the task, but it is still mainly a manual
activity. Because of this last disadvantage, manual review of metadata quality
remains a research activity with no practical implications for the functionality or
performance of the digital repository.
– Simple Statistical Quality Evaluation. Of the analyzed studies, three follow
a different approach (see Table 1). They collect statistical information
from all the metadata instances in the repository to obtain an estimation
of their quality. Hughes [12] calculates simple automatic metrics
(completeness, vocabulary use, etc.) at repository level for each of the repositories
in the Open Language Archive. Bui and Park [13] perform a wide study in
which more than one million records were reviewed, using completeness as the
quality measurement. Najjar et al [14], evaluating the actual use of different
metadata fields in the ARIADNE repository, compare the metadata fields
that are produced with the metadata fields that are consumed, providing a
simplistic estimation of the quality of the metadata in the repository. While
all of these approaches can automatically obtain a basic estimation of the
quality of each individual metadata record, without the cost involved in
manual quality review, they do not provide the same level of meaningfulness
as a human-generated estimation. They are mainly used as “interesting”
information about the repository without any other real application.
Table 1. Review of different quality evaluation studies

Study                  Approach     # of Records  Main focus of evaluation
---------------------  -----------  ------------  ------------------------------
Greenberg et al [15]   Manual       11            Quality of non-expert metadata
Shreeves et al [16]    Manual       140           Overall quality of records
Stvilia et al [4]      Manual       150           Identify quality problems
Wilson [17]            Manual       100           Quality of non-expert metadata
Moen et al [18]        Manual       80            Overall quality of records
Hughes [12]            Statistical  27,000        Completeness of records
Najjar et al [19]      Statistical  3,700         Usage of the metadata standard
Bui and Park [13]      Statistical  1,040,034     Completeness of records

An ideal measurement of metadata quality for exponentially growing repos-
itories should comply with two requirements: to be automatically calculated for
each one of the metadata records inserted in the repository and to provide a
meaningful measurement of the quality. None of the approaches reviewed could
claim to comply with these requirements. The main contribution of this work
is the proposal and evaluation of a set of automatically calculable metadata
metrics based on the same quality parameters used by human reviewers. This
new set of metrics can be transformed into an automated metadata quality evaluator
that can be used to build tools for any kind of digital repository and could
provide scalable and meaningful metadata quality assurance.
The structure of this paper is as follows. Section 2 reviews existing work
to select an operationalizable framework to measure metadata quality. In
Section 3, several quality metrics, based on the selected framework, are proposed.
Section 4 describes an experiment to establish the degree of correlation
between the values generated by the metrics and the quality ratings given
by human reviewers. Section 5 describes possible applications of the proposed
quality metrics. The paper concludes with conclusions and ideas for further work.
2 Measuring Metadata Quality
While there is wide agreement on the need for high quality metadata, there
is less consensus on what high quality metadata means and much less on how it
should be measured. This work considers quality as the measure of fitness for
a task. According to [20], the tasks that a metadata record should fulfill in a digital
repository are to help the user find, identify, select and obtain resources. The
quality of the metadata record is directly proportional to how much it
facilitates those tasks. The measurements of the quality of the metadata record
do not assess the quality of the metadata standard (these measurements should
be standard-agnostic) or of the vocabularies used. Neither do they evaluate the
quality of the resources themselves. This work provides metrics to estimate
the quality of the information entered by indexers (manually, automatically or by a
mixture of the two) about a digital asset and stored in a metadata record inside
a digital repository.
In order to reduce subjectivity in the assessment of information quality,
several researchers have developed quality evaluation frameworks. These
frameworks define several parameters that information should comply with in order to be
considered of high quality. Different frameworks vary widely in their scope and
goals. Some have been inspired by the Total Quality Management paradigm,
such as [21]. Others come from the field of text document evaluation, especially of
Web documents, such as [22]. Particularly interesting for our work, because they
are focused on metadata quality, are the frameworks that have evolved from the
research on library catalogs, for example [23].
While no consensus has been reached on conceptual or operational definitions
of metadata quality, there are three main works that could guide this kind of
evaluation. They summarize the recommendations made in previous information
quality frameworks to eliminate redundant or overly specific quality
parameters. Moen et al [18] identify 23 quality parameters. However, some of these
parameters (ease of use, ease of creation, protocols, etc.) are more focused on the
metadata standard or metadata generation tools. Gasser and Stvilia [24] use
most of Moen’s parameters (excluding those not related to metadata quality),
add several more, and group them in three dimensions of Information Quality
(IQ): Intrinsic IQ, Relational/Contextual IQ and Reputational IQ. Some of the
parameters (accuracy, naturalness, precision, etc.) are present in more than one
dimension. The Gasser and Stvilia framework describes 32 parameters in total.
Bruce and Hillman [25], based on previous Information Quality research,
condensed many of the quality parameters in order to improve their applicability.
They describe seven general characteristics of metadata quality: completeness,
accuracy, provenance, conformance to expectations, logical consistency and
coherence, timeliness, and accessibility. A relation between the frameworks of Bruce
& Hillman and Gasser & Stvilia is proposed in [16] and is summarized in Fig. 1.
This work uses the Bruce and Hillman framework because its compactness
makes it easier to operationalize the measurement of quality in a set of
automatically calculated metrics. Another advantage of this choice is that, because
the framework is deeply rooted in well-known Information Quality parameters,
there exists parallel research on how to operationalize them in metrics for quality
assurance of other types of information ([22], for example).
Fig. 1. Mapping between the Bruce & Hillman and the Gasser & Stvilia frameworks.
(Taken from [16])
3 Quality Metrics for Metadata in Digital Repositories
While Bruce and Hillman [25] devised their framework to guide human reviewers,
this work generates automated metrics to assess each one of the parameters
described in the framework. The metrics presented are standard-agnostic and can
be used for a wide range of digital repositories, such as digital libraries, learning
object repositories or museum catalogs. The goal of these metrics is to be easily
implementable in real environments and to produce a quality value that
correlates well with the relative quality value of the metadata record, for example,
the value produced by a human reviewer using the same quality framework.
3.1 Completeness Metrics
Completeness is the degree to which the metadata record contains all the
information needed to have a comprehensive representation of the described object.
This ideal representation varies according to the application and the community
of use. While most metadata standards (such as DC or LOM) define all of their fields
as optional, a working definition of the ideal representation could be considered
as the mandatory and suggested fields defined by a community of use in its
application profiles. A first approach to assess the completeness of a metadata
record is to count the number of fields that contain a non-null value. In the
case of multi-valued fields, the field is considered complete if at least one instance
exists. Equation 1 expresses the calculation of this metric.
$$Q_{Comp} = \frac{\sum_{i=1}^{N} P(i)}{N} \tag{1}$$
Where P (i) is 1 if the ith field has a non-null value, 0 otherwise. N is the
number of fields.
While straightforward, this metric does not reflect how humans measure the
completeness of a record. Not all data elements are relevant for all resources.
Moreover, not all metadata elements are equally relevant in all contexts. For
example, in a digital library context a human expert may assign a higher
degree of completeness to a metadata record that has a value for title but lacks
publication date than to a metadata record that includes the publication date
but lacks title. Human experts assign a weighting factor to the presence or
absence of each metadata element, representing its relative importance compared
to other fields. This weighting factor can easily be included in the calculation of
the completeness metric, as shown in Equation 2.
$$Q_{WComp} = \frac{\sum_{i=1}^{N} \alpha_i \cdot P(i)}{\sum_{i=1}^{N} \alpha_i} \tag{2}$$
Where αi is the relative importance of the ith field.
The α values should represent the importance (or relevance) of the metadata
element for some context or task. This implies that each community (or even
user) could have a different set of weighting factors to calculate the weighted
completeness. For example, αi could represent the relative frequency with which
users have filled in that metadata element to issue queries to the repository.
Alternatively, αi could represent the score that the ith field obtained in an experiment
measuring the amount of time that users spend reading each field while
selecting an appropriate resource.
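
As an illustration, Equations 1 and 2 could be computed along the following lines. This is only a minimal sketch: the record layout (a flat dictionary), the field names and the weight values are hypothetical, and treating empty strings and empty lists as null is an assumption not stated in the definitions above.

```python
from typing import Mapping, Sequence


def is_filled(value: object) -> bool:
    """P(i): 1 if the field holds a non-null value, 0 otherwise.

    Assumption: empty strings and empty collections count as null; a
    multi-valued field is complete if it has at least one instance.
    """
    if value is None:
        return False
    if isinstance(value, (str, list, tuple, set, dict)):
        return len(value) > 0
    return True


def completeness(record: Mapping[str, object], fields: Sequence[str]) -> float:
    """Equation 1: fraction of the expected fields that are filled."""
    return sum(is_filled(record.get(name)) for name in fields) / len(fields)


def weighted_completeness(record: Mapping[str, object],
                          weights: Mapping[str, float]) -> float:
    """Equation 2: filled fields weighted by their relative importance."""
    filled = sum(w for name, w in weights.items() if is_filled(record.get(name)))
    return filled / sum(weights.values())


# Hypothetical record and weights, for illustration only.
record = {"title": "Introduction to Java", "description": "",
          "date": None, "keywords": ["java"]}
fields = ["title", "description", "date", "keywords"]
weights = {"title": 4.0, "description": 3.0, "date": 1.0, "keywords": 2.0}

q_comp = completeness(record, fields)             # 2 of 4 fields filled
q_wcomp = weighted_completeness(record, weights)  # (4 + 2) / 10
```

With these illustrative weights the record scores higher on the weighted metric than on the simple one, because the two filled fields are the ones considered most important.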
3.2 Accuracy Metrics
Accuracy is the degree to which the metadata values are “correct”, i.e. how
well they describe the object. The correctness could be a binary value, either
“right” or “wrong”, for objective information like file size or document format,
but, in the case of subjective information, it is a more complex spectrum with
intermediate values (e.g. the title of a picture, or the description of the content
of a document). In general, the correctness, and thus the accuracy, could be
considered as the semantic distance between the information that a user could
extract from the metadata record and the information that the same user could
obtain from the resource itself. The shorter the distance, the higher the accuracy
of the metadata record.
While humans can assess the accuracy of a metadata record with relative ease,
computers require complex artificial intelligence algorithms to simulate the same
level of understanding. Nevertheless, there exist easy-to-calculate accuracy
metrics (proposed in the quality evaluations presented in [12] and [18]). These metrics
establish the number of easy-to-spot errors present in the metadata records, for
example whether the link to the resource is broken, whether the technical properties
(size and format) of the digital resource are wrong, or the number of typographical errors
in the text fields. This work proposes a more complex and meaningful way
to calculate the semantic difference between the metadata record and resources
that contain textual information. A multidimensional space is constructed in
which each word present in the text of the original resource defines a dimension.
The relative frequency with which a word appears in a text is considered the
value of that text in that word-dimension. Following those definitions, a vector
is created for the text contained in the original resource and another for the text present
in the textual fields of the metadata record (e.g. title, description, keywords).
Finally, a vector distance metric (such as the cosine distance) is applied to find the
semantic distance between both texts. To account for synonyms, a variation of
Latent Semantic Analysis (LSA) [26] can be used to semantically reduce the
dimensionality of the space before the distance calculation. Equation 3 presents the
semantic distance formula.
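$$Q_{Accu} = 1 - \frac{\sum_{i=1}^{N} \mathit{tfresource}_i \cdot \mathit{tfmetadata}_i}{\sqrt{\sum_{i=1}^{N} \mathit{tfresource}_i^{2} \cdot \sum_{i=1}^{N} \mathit{tfmetadata}_i^{2}}} \tag{3}$$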
Where tfresourcei and tfmetadatai are the relative frequencies of the ith
word in the text content of the resource and of the metadata, respectively. N is the total
number of different words in both texts.
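
The semantic distance of Equation 3 can be sketched as follows. The sketch assumes a very simple tokenizer (lower-cased alphanumeric runs), omits the optional LSA step, and treats an empty text as maximally distant; none of these choices is prescribed by the description above.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> dict:
    """Relative frequency of each word (assumed tokenizer: [a-z0-9]+ runs)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    total = len(words)
    return {word: count / total for word, count in Counter(words).items()}


def cosine_distance(tf_a: dict, tf_b: dict) -> float:
    """1 minus the cosine of the angle between two term-frequency vectors."""
    dot = sum(value * tf_b.get(word, 0.0) for word, value in tf_a.items())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: an empty text shares nothing with any other
    return 1.0 - dot / (norm_a * norm_b)


def q_accu(resource_text: str, metadata_text: str) -> float:
    """Equation 3 applied to the resource text and the metadata text."""
    return cosine_distance(term_frequencies(resource_text),
                           term_frequencies(metadata_text))


# Hypothetical texts, for illustration only.
resource = "Java classes and objects. Objects are instances of classes."
metadata = "Introduction to Java objects and classes"
semantic_distance = q_accu(resource, metadata)
```

Words that appear in only one of the two texts contribute nothing to the numerator, so texts with no words in common are at distance 1 and identical texts at distance 0.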
3.3 Provenance Metrics
Provenance measures the reputation that a metadata record has in a community.
For example, a user may trust metadata generated by a metadata expert
whom he knows more than metadata generated automatically by a software tool. While
the automatically generated metadata might be of better quality, provenance is
more related to the subjective perception that the user has about the origin of
the metadata.
In order to be able to capture this subjective perception, information about
the interaction between users and metadata is needed. This information should
be collected from different tools and stored in a repository (or a group of
interoperable repositories). While such usage information repositories are not
widespread at the time of writing, ongoing work on Attention.XML [27] and logging
repositories [28] suggests that this approach is feasible. The logging system could
include components that actively ask the user for his perception of the metadata
(for example, asking the user if the metadata shown has been useful) or that infer
that approval from the actions of the user (for example, which metadata records
have led to more downloads or actual use of objects). This explicit or implicit
information can be used to calculate an average reputation for each metadata
producer of the repository through Equations 4 to 6.
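$$\mathit{rate}(\mathit{resource}) = \begin{cases} \mathit{manualRate}, & \text{if manual information is present} \\ \dfrac{\mathit{NofDownloads}}{\mathit{NofRetrievals}} \cdot \mathit{maxrate}, & \text{if times retrieved} > 0 \\ 0, & \text{otherwise} \end{cases} \tag{4}$$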
Where manualRate is the human evaluation of the quality of the record and
maxrate is the maximum value that the review can have. NofRetrievals is the
number of times that a resource has appeared in a result list and NofDownloads
is the number of times that a resource has been downloaded by a user.
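$$\mathit{ProducerReputation}(\mathit{author}) = \frac{\sum_{i=1}^{N} \mathit{rate}(\mathit{resource}_i)}{N \cdot \mathit{maxrate}} \tag{5}$$
Where N is the number of metadata records created by an author.
$$Q_{Prov} = \mathit{ProducerReputation}(\mathit{resource.author}) \tag{6}$$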
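
A minimal sketch of Equations 4 to 6 is given below. The dictionary layout of the usage information and the maximum rate of 7 are assumptions made for the example, and giving an author without records a reputation of 0 is a choice the equations leave open.

```python
from typing import Mapping, Optional, Sequence

MAX_RATE = 7.0  # assumed maximum review score, for illustration only


def rate(manual_rate: Optional[float], downloads: int, retrievals: int,
         max_rate: float = MAX_RATE) -> float:
    """Equation 4: explicit rating if available, else download ratio."""
    if manual_rate is not None:
        return manual_rate
    if retrievals > 0:
        return downloads / retrievals * max_rate
    return 0.0


def producer_reputation(records: Sequence[Mapping[str, object]],
                        max_rate: float = MAX_RATE) -> float:
    """Equation 5: average rate of an author's records, scaled to [0, 1]."""
    if not records:
        return 0.0  # assumption: no records means no reputation yet
    total = sum(rate(r.get("manual_rate"), r.get("downloads", 0),
                     r.get("retrievals", 0), max_rate) for r in records)
    return total / (len(records) * max_rate)


# Hypothetical usage information for the records of one author.
author_records = [
    {"manual_rate": 7.0},                 # explicitly rated by a user
    {"downloads": 1, "retrievals": 2},    # inferred: 1/2 * 7 = 3.5
    {"downloads": 0, "retrievals": 0},    # never retrieved: 0
]

# Equation 6: every record inherits the reputation of its producer.
q_prov = producer_reputation(author_records)  # (7 + 3.5 + 0) / (3 * 7)
```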
3.4 Conformance to Expectation Metrics
Conformance to expectations measures the degree to which the metadata
record fulfills the requirements of a given community of users for a given task.
For example, the metadata fields that are filled should be the ones used by
the users in their searches (find task). The link to the actual resource should be
reachable from the location of the user (obtain task). The amount of information
contained in the record should be enough to identify and describe the resource (identify
and select tasks). As the weighted completeness and simple accuracy metrics can
estimate the first two expectations, a group of metrics is created to assess
the information content of the metadata record.
One of the main requirements of any community towards a metadata record
is that it should contain enough information to uniquely describe the resource it
refers to. Users would consider the metadata to be of high quality if, after
reading it, they know (or they think they know) what the resource is about and
whether it is interesting to them. While there is no computational algorithm that could
claim to measure the degree to which the metadata uniquely describes
a resource, the amount of useful (unique) information present in the metadata
record relative to the repository to which it belongs could be a good approximation.
A conformance-to-expectation metric could therefore be calculated by measuring
the Information Content of the metadata fields (Equation 7).
$$Q_{Conf} = \frac{\sum_{i=1}^{N} I_{content}(\mathit{field}_i)}{N} \tag{7}$$
Where N is the number of fields and Icontent(fieldi) is the estimation of
the amount of unique information contained in the ith field.
The information content is easy to calculate for categorical and free text
fields. For categorical fields, the Information Content is equal to 1 minus the
entropy of the value (the entropy is the negative log of the probability of the
value). For example, if the difficulty level of a learning object metadata record
is set to “high”, where the majority of the repository is set to “medium”, it will
provide more unique information about the resource and, thus, a higher score
(higher quality). On the other hand, if the record’s nominal fields only contain the
default values used in the repository, they will provide less unique information
about the resource and a lower quality score. For free text, the
“importance” of a word is directly proportional to how frequently
that word appears in the document and inversely proportional to how frequently
documents contain that word. This relation is handled by the
Term Frequency-Inverse Document Frequency [29] calculation. The frequency with which a word
appears in the document is multiplied by the negative log of the frequency with
which that word appears in all the documents in the corpus (this could be considered
a weighted entropy measurement). For example, if the title field of a
record is “Lecture in Java”, given that “lecture” and “java” are common words in
the repository, the record will have a lower score (lower quality) than a record in
which the title is “Introduction to Java objects and classes”, not only because
“objects” and “classes” are less frequent in the repository, but also because the
latter title contains more words. The calculation of the Information Content for
these two kinds of fields can be seen in Equations 8 and 9 respectively.
$$I_{content}(\text{categorical field}) = -\log(f(\mathit{value})) \tag{8}$$
Where f(value) is the relative frequency of value in all the elements in the
repository.
$$I_{content}(\text{freetext field}) = \frac{\sum_{i=1}^{N} \mathit{tf}(\mathit{word}_i) \cdot \log\left(\frac{1}{\mathit{df}(\mathit{word}_i)}\right)}{N} \tag{9}$$
Where tf(wordi) is the term frequency of the ith word, df(wordi) is the
document frequency of the ith word.
This metric could be refined by assigning a weighting factor to account for the
different importance of the fields (similar to what has been done in the Weighted
Completeness metric). In this way, for example, the Information Content of document
format will be less important than the Information Content of title. These
importance values could be determined by an expert or be based on the preferences
of the users.
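
Equations 7 to 9 could be sketched as shown below. Several details are assumptions made only for the example: the natural logarithm is used, tf is taken as the raw count of a word in the field, df as the fraction of repository documents containing the word, and N in Equation 9 as the number of distinct words in the field. The tokenizer and the sample values are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Sequence


def tokenize(text: str) -> list:
    """Assumed tokenizer: lower-cased alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def categorical_info(value: str, repository_values: Sequence[str]) -> float:
    """Equation 8: negative log of the relative frequency of the value."""
    count = Counter(repository_values)[value]
    if count == 0:
        raise ValueError("value does not occur in the repository")
    return -math.log(count / len(repository_values))


def free_text_info(text: str, corpus: Sequence[str]) -> float:
    """Equation 9: sum of tf * log(1/df) over the words, divided by N."""
    counts = Counter(tokenize(text))
    if not counts:
        return 0.0
    documents = [set(tokenize(doc)) for doc in corpus]
    total = 0.0
    for word, tf in counts.items():
        df = sum(word in doc for doc in documents) / len(documents)
        if df > 0:  # words absent from the corpus are skipped (assumption)
            total += tf * math.log(1.0 / df)
    return total / len(counts)


def q_conf(field_scores: Sequence[float]) -> float:
    """Equation 7: average Information Content over the fields."""
    return sum(field_scores) / len(field_scores)


# Hypothetical repository contents, for illustration only.
difficulty_values = ["medium", "medium", "medium", "high"]
titles = ["Lecture in Java", "Lecture in Java basics",
          "Introduction to Java objects and classes", "Lecture notes"]

rare_value = categorical_info("high", difficulty_values)
common_value = categorical_info("medium", difficulty_values)
generic_title = free_text_info("Lecture in Java", titles)
specific_title = free_text_info(titles[2], titles)
```

In this toy repository the rare difficulty value and the more specific title both obtain the higher score, matching the examples in the text.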
3.5 Consistency and Coherence Metrics
Logical consistency and coherence is the degree to which a metadata record
matches a standard definition and the values used in the fields correlate positively
with each other. For the particular case of metadata records, the first part of this
metric should measure how well the metadata record conforms to the metadata
standard. This calculation is trivial using any validating parser. The second
measurement is more subjective, trying to assess the internal consistency and coherence
of the information stored in the metadata record. While not mandatory,
certain combinations of values are suggested in the standards to maintain the internal
consistency of the record. For example, LOM recommends that if the value of
1.7 Structure is “atomic”, the 1.8 Aggregation Level should be marked as 1 (Raw
media); other Structure values could be paired with 2 (Lesson), 3 (Course) or 4
(Certificate) in the Aggregation Level field. The consistency of the record can
be estimated as 1 minus the fraction of rules that have been broken (Equations
10 and 11).
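$$\mathit{brokeRule}_i = \begin{cases} 0, & \text{if the record complies with the } i\text{th rule} \\ 1, & \text{otherwise} \end{cases} \tag{10}$$
$$Q_{Cons} = 1 - \frac{\sum_{i=1}^{N} \mathit{brokeRule}_i}{N} \tag{11}$$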
Where N is the number of rules in the metadata standard or community of
use.
The coherence of the record can be estimated by analyzing its free text fields.
A coherent metadata record describes the same topic in title, description and
keywords. To assess this coherence, a procedure similar to the one used in the
Accuracy metric is implemented. The semantic distance is calculated between
the different free text fields. The average distance is used as a measure of the
coherence quality (Equations 12 and 13). To cope with synonyms, a technique
like LSA should be applied before the semantic distance is calculated.
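$$\mathit{distance}(\mathit{field1}, \mathit{field2}) = 1 - \frac{\sum_{i=1}^{N} \mathit{tffield1}_i \cdot \mathit{tffield2}_i}{\sqrt{\sum_{i=1}^{N} \mathit{tffield1}_i^{2} \cdot \sum_{i=1}^{N} \mathit{tffield2}_i^{2}}} \tag{12}$$
$$Q_{Coh} = \frac{\sum_{i}^{N} \sum_{j}^{N} \begin{cases} \mathit{distance}(\mathit{field}_i, \mathit{field}_j), & \text{if } i < j \\ 0, & \text{otherwise} \end{cases}}{N} \tag{13}$$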
Where N is the number of textual fields that describe the object.
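
Equations 10 to 13 can be sketched as follows. Rules are represented here as hypothetical predicate functions over a flat record dictionary, the field names are invented for the example, the tokenizer is an assumption, and the LSA step is omitted. The coherence function follows Equation 13 as printed, dividing the sum of pairwise distances by the number of fields.

```python
import math
import re
from collections import Counter
from itertools import combinations
from typing import Callable, Mapping, Sequence

Rule = Callable[[Mapping[str, object]], bool]


def q_cons(record: Mapping[str, object], rules: Sequence[Rule]) -> float:
    """Equations 10 and 11: 1 minus the fraction of broken rules."""
    broken = sum(0 if rule(record) else 1 for rule in rules)
    return 1.0 - broken / len(rules)


def _tf(text: str) -> dict:
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w: c / len(words) for w, c in Counter(words).items()}


def distance(text_a: str, text_b: str) -> float:
    """Equation 12: cosine distance between two free text fields."""
    a, b = _tf(text_a), _tf(text_b)
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) *
                     sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def q_coh(texts: Sequence[str]) -> float:
    """Equation 13: pairwise distances (i < j) summed and divided by N."""
    return sum(distance(a, b) for a, b in combinations(texts, 2)) / len(texts)


# LOM-style rule from the text: an atomic Structure implies Aggregation Level 1.
def atomic_implies_level_1(record: Mapping[str, object]) -> bool:
    return record.get("structure") != "atomic" or record.get("aggregation_level") == 1


# Hypothetical second rule and record, for illustration only.
def has_language(record: Mapping[str, object]) -> bool:
    return bool(record.get("language"))


record = {"structure": "atomic", "aggregation_level": 2, "language": "en"}
consistency = q_cons(record, [atomic_implies_level_1, has_language])
```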
3.6 Timeliness Metrics
Timeliness relates to the degree to which a metadata record remains current
within a certain community. The currency of a metadata record could be
measured as how useful the metadata remains with the passing of time. For example,
if the metadata describing a resource was created 5 years ago, and the users
can still find, correctly evaluate and download the resource, then the metadata
could be considered current. On the other hand, if the metadata record misleads
the users, because the referred resource or its location has changed to the point
where the description in the metadata differs from the one contained in the
resource, the metadata record is obsolete and must be replaced.
An automated metric to assess the currency of the object is necessarily
composed of two components: the rate at which the original object changes and the
age of the metadata record. The rate of change can be estimated as the change in
the accuracy metric over a predefined period of time. This value is multiplied
by the number of periods elapsed since the creation of the metadata record. The
period can be chosen based on the perceived rate of change of the resources.
Equation 14 shows the calculation of this metric.
$$Q_{Curr} = \left(Q_{Accu,t_2} - Q_{Accu,t_1}\right) \cdot \frac{\mathit{age}}{t_2 - t_1} \tag{14}$$
Where t1 is the time of the initial accuracy measurement, t2 is the time of the
last accuracy measurement, and age is the time elapsed since the creation of the
metadata record.
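
As a worked example of Equation 14, the sketch below extrapolates the change in the accuracy metric over the age of the record. The time unit (days) and the sample values are hypothetical; the only requirement assumed is that t1, t2 and age use the same unit.

```python
def q_curr(accuracy_t1: float, accuracy_t2: float,
           t1: float, t2: float, age: float) -> float:
    """Equation 14: change in accuracy per period times elapsed periods."""
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    return (accuracy_t2 - accuracy_t1) * age / (t2 - t1)


# Hypothetical values: the accuracy metric moved from 0.10 to 0.15 over a
# 30-day period, and the record was created 300 days ago (10 periods).
currency = q_curr(accuracy_t1=0.10, accuracy_t2=0.15, t1=0, t2=30, age=300)
# (0.15 - 0.10) * 300 / 30, i.e. roughly 0.5
```

A value of 0 means that the accuracy metric did not change between the two measurements, however old the record is.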
3.7 Accessibility Metrics
Accessibility is the degree to which a metadata record can first be found and
later understood. The physical accessibility could be understood as how easy it is
to find the record in the repository. While it could be possible to automatically
assess this characteristic using the frequency of retrieval of the record in
different searches, it would just represent the accessibility with the current
searching tool and historical queries. A more interesting (and accurate) metric should
assess the potential accessibility regardless of the accessing tool. This work
proposes the use of the linkage of a record as its physical accessibility measure. The
linkage value of a record is equal to the number of other records that reference it
divided by the average number of links per record (Equation 15). Most metadata
standards have a field to relate resources among themselves.
$$Q_{Link} = \frac{\mathit{links}(\mathit{resource}) \cdot N}{\sum_{i=1}^{N} \mathit{links}(\mathit{resource}_i)} \tag{15}$$
Where links(resource) represents the number of pointers to the metadata
record. N is the number of resources in the repository.
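
The linkage value of Equation 15 could be computed as in the following sketch. The repository is modelled as a hypothetical mapping from each record identifier to the identifiers listed in its relation field; ignoring self-references, duplicate links and links to unknown records is an assumption.

```python
from collections import Counter
from typing import Mapping, Sequence


def inbound_links(relations: Mapping[str, Sequence[str]]) -> dict:
    """Count, for every record, how many other records reference it."""
    counts = Counter()
    for source, targets in relations.items():
        for target in set(targets):
            if target != source and target in relations:
                counts[target] += 1
    return {record_id: counts[record_id] for record_id in relations}


def q_link(record_id: str, links: Mapping[str, int]) -> float:
    """Equation 15: inbound links divided by the average links per record."""
    total = sum(links.values())
    if total == 0:
        return 0.0  # assumption: no links anywhere means no linkage
    return links[record_id] * len(links) / total


# Hypothetical repository of three records, for illustration only.
relations = {"a": [], "b": ["a"], "c": ["a", "b"]}
links = inbound_links(relations)  # {"a": 2, "b": 1, "c": 0}
linkage = {record_id: q_link(record_id, links) for record_id in relations}
```

A value of 1 corresponds to a record with an average number of inbound links; record "a" in this example is referenced twice as often as the average.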
The cognitive accessibility measures how easy it is for a user to understand the
information contained in the metadata record. In librarian reviews of this
characteristic [30] several simple metrics are used: measuring spelling errors,
conformance with the vocabulary, etc. However, they always include a human evaluation
of the difficulty of the text. The difficulty assessment could be automated using
one of the available readability indexes (for example, the Flesch Index [31]),
especially to analyze long text fields of the record (e.g. description). Readability
indexes in general count the number of words per sentence and the length of the
words to provide a value that suggests how easy a text is to read. For example,
a description where only acronyms or complex sentences are used will receive a
higher score (lower quality) than a description where normal words and simple
sentences are used. This calculation is presented in Equation 16.
$$Q_{Read} = \frac{\sum_{i=1}^{N} \mathit{Flesch}(\mathit{fieldtext}_i)}{100 \cdot N} \qquad (16)$$
Where N is the number of textual fields that describe the object, and Flesch()
is the calculation of the readability index.
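A minimal Python sketch of Equation 16 follows. It assumes that Flesch() is the standard Flesch Reading Ease formula, uses a deliberately crude vowel-group syllable counter, and clamps each score to the 0 to 100 range so that the division by 100 yields a value between 0 and 1. The clamping, the syllable heuristic and the function names are assumptions made for this example.

```python
import re

def count_syllables(word):
    """Very rough syllable estimate: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch(text):
    """Flesch Reading Ease, clamped to 0..100 (the clamping is an assumption)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    score = (206.835
             - 1.015 * (len(words) / len(sentences))
             - 84.6 * (syllables / len(words)))
    return min(100.0, max(0.0, score))

def q_read(field_texts):
    """Equation 16: mean Flesch score of the textual fields, scaled to 0..1."""
    if not field_texts:
        return 0.0
    return sum(flesch(t) for t in field_texts) / (100 * len(field_texts))

easy = q_read(["The cat sat on the mat. It was a good day."])
hard = q_read(["Interoperability necessitates comprehensive "
               "standardization methodologies."])
```

With this convention, short sentences made of short words (`easy`) score higher than a single sentence of long, polysyllabic words (`hard`).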
4 Evaluation of the Quality Metrics
An experiment was designed to evaluate the level of correlation between the qual-
ity metrics presented above and quality assessment scores provided by human
reviewers. During the experiment, several human subjects graded the quality of a
set of records sampled from the ARIADNE Learning Object repository [32]. We
selected metadata records about objects on Information Technologies that were
available in English. From this universe (425 records), we randomly selected 10
with metadata generated manually and 10 with metadata generated by an au-
tomated indexer. Following a common practice to reduce the subjectivity in the
evaluation of the quality of metadata, we used the same evaluation framework
described by Bruce and Hillman on which the metrics are based. The experiment
was carried out online using a web application. After being trained in how to
use the quality framework, the user was presented with a list of the 20 selected
objects in no specific order. When the user selected an object, a representation
of its LOM record was displayed. The user then downloaded the referred object
for inspection. Once the user had reviewed the metadata and the object, they were
asked to provide grades on a 7-point scale (from “Extremely low quality” to
“Extremely high quality”) for each of the 7 parameters. Only participants who
graded all the objects were considered in the experiment. The experiment was
available for 2 weeks. During that time, 22 participants successfully completed
the review of all 20 objects. Of those 22, 17 (77%) work with metadata
as part of their study/research activities; 11 (50%) were undergraduate students
in their last years, 9 (41%) were postgraduate students and 2 (9%) had a Ph.D.
degree. All of them had a full understanding of the nature and meaning of the
examined objects and their metadata, and knew how to use the evaluation frame-
work. Parallel to the human evaluation, an implementation of the quality metrics
described before was applied to the same set of data that was presented to the
reviewers. For the weighted completeness calculation, the weights were obtained
from how frequently the fields were used in searches of the ARIADNE repository
[14]. Because usage and historical information was lacking, the Provenance and
Timeliness metrics could not be calculated; both were excluded from the study.
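The weighting step can be sketched as follows. This example assumes that weighted completeness is the weight-normalized share of non-empty fields, and it uses made-up field names and search counts; neither the function names nor the numbers come from the ARIADNE data.

```python
def weights_from_search_counts(search_counts):
    """Turn raw per-field search counts into weights that sum to 1."""
    total = sum(search_counts.values())
    return {field: count / total for field, count in search_counts.items()}

def weighted_completeness(record, weights):
    """Add up the weights of the fields that hold a non-empty value."""
    filled = sum(w for field, w in weights.items() if record.get(field))
    return filled / sum(weights.values())

# Hypothetical counts of how often each field was used in searches.
counts = {"title": 60, "description": 30, "language": 10}
weights = weights_from_search_counts(counts)

# Hypothetical record with an empty description.
record = {"title": "Intro to SQL", "description": "", "language": "en"}
score = weighted_completeness(record, weights)
```

Here the missing description costs the record 30% of its score, whereas a missing language field would cost only 10%, which reflects how much each field matters to searchers.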
4.1 Data Analysis
Because of the inherent subjectivity in measuring quality, the first step in the
analysis of the data was to estimate the reliability of the human evaluation. In
this kind of experiment, the evaluation could be considered reliable if the vari-
ability between the grades given by different reviewers to a record is significantly
smaller than the variability between the average grades given to different objects.
To estimate this difference we use the Intra-Class Correlation (ICC) coefficient
[33] over the average quality grade (the sum of the value given to each of the
seven quality characteristics, divided by 7). We calculated the measure of ICC
using the two-way mixed model, given that all the reviewers grade the same
sample of objects. In this configuration, the ICC is equivalent to another widely
used reliability measure, Cronbach’s alpha. The result obtained was 0.909,
well above the 0.7 threshold needed to be considered acceptable. In other
words, the ICC suggests that the reviewers provided similar quality scores and
that further statistical analysis can be performed. The next step was to average
the grades given by all the human reviewers to each record and correlate this
average with the values obtained from calculating the quality metrics over the
same records. The results are presented in Table 2.
Table 2. Correlation between the human quality evaluation and the quality metrics

Metric                 Pearson   Sig. (2-tailed)
QComp                  -.395     .085
QWComp                 -.457     .043
QConf (Categorical)    -.182     .443
QConf (Textual)         .842     .000
QCoh                    .225     .593
QAccu                   .461     .020
QRead                   .257     .274
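The reliability check described in Section 4.1 can be sketched in a few lines of Python. Since the text notes that the ICC in this configuration is equivalent to Cronbach's alpha, the sketch computes alpha directly, treating each reviewer as an item and each record as an observation. The grade table is made up for illustration and is not the experimental data.

```python
from statistics import variance

def cronbach_alpha(grades):
    """Cronbach's alpha for a records x reviewers table of grades.

    `grades[r][j]` is the grade reviewer j gave to record r.
    """
    k = len(grades[0])                      # number of reviewers
    per_reviewer = list(zip(*grades))       # one column of grades per reviewer
    reviewer_vars = sum(variance(col) for col in per_reviewer)
    total_var = variance([sum(row) for row in grades])
    return k / (k - 1) * (1 - reviewer_vars / total_var)

# Made-up grades: 4 records (rows) graded by 3 reviewers (columns).
grades = [
    [2, 3, 2],
    [5, 5, 6],
    [4, 4, 5],
    [6, 7, 6],
]
alpha = cronbach_alpha(grades)
```

For this toy table alpha is roughly 0.96, above the 0.7 threshold, because the three reviewers rank the records almost identically.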
The Conformance to Expectation metric based on textual information content
(QConf-Textual) correlates strongly (0.842) with the average quality value given
by the human reviewers. The correlation is highly significant (p < 0.01), which
means that it is very unlikely to have been produced by chance. The correlation
is even easily visible when both scores (textual information content and the
average quality score) are plotted for the 20 examined objects (Figure 2). The
line on top is the average quality calculated from the scores given by the human
reviewers. The bottom line is the value that the QConf-Textual metric returned.
While the lines do not follow exactly the same pattern, much of the behavior of
the first line could be explained by the second. This result is consistent with
the findings of [22], which concludes that the Information Content of text is
highly correlated with the quality of web pages as perceived by human reviewers.
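The significance claim can be checked by hand with the usual t test for a correlation coefficient. The sketch below is an illustration, not the authors' analysis: it plugs the reported correlation (0.842) and sample size (20 records) into t = r * sqrt((n - 2) / (1 - r^2)) and compares the result with 2.878, the two-tailed critical value of Student's t at the 0.01 level for 18 degrees of freedom. The helper `pearson_r` is included only to show how the coefficient itself is obtained from paired scores.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def t_statistic(r, n):
    """t value used to test whether a correlation differs from zero."""
    return r * sqrt((n - 2) / (1 - r * r))

# Reported correlation for QConf-Textual over the 20 records.
t = t_statistic(0.842, 20)
significant_at_01 = abs(t) > 2.878   # two-tailed critical value, 18 d.f.
```

The resulting t value is about 6.6, far beyond the critical value, which agrees with the reported significance below 0.01.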
Digging deeper, a multivariate regression analysis shows that roughly 70% of the
variation in the score given by humans could be explained by the QConf-Textual
metric.
Fig. 2. Comparison between the average quality score and the textual information
content metric values
Another 10% of the variation could be explained by the origin of the metadata
(automatic or manual). The other tested metrics contribute very little (less than
5%) to explaining the human-assigned quality score. The results of the regression
can be seen in Table 3.
It can be concluded from the experiment that the Textual Information Content
can be used as a good predictor of the human evaluation of the metadata
record. This suggests that human reviewers mainly focus on the free-text fields
when they are examining the record. This result could have implications for the
way metadata is presented to end users. For example, only textual fields could
be shown by default, with categorical and numerical fields presented only at the
user’s request.
The failure of the other metrics to estimate the human-assigned quality score
does not imply that they are not useful. It is very difficult, even for human
experts, to assess quality as a whole. For example, just looking at the metadata
will not provide enough information on how findable a metadata record will be.
The preferences of the community, the kind of search algorithm and the other
objects present in the repository affect this parameter. More experimentation in
real-world settings is needed to obtain a more conclusive evaluation of the
remaining quality metrics.
Table 3. Multivariate regression for the average quality score

Model                                R      R2     Adjusted R2   Std. Error
QConf-Textual                        .892   .710   .694          .25406
QConf-Textual + Origin of Metadata   .905   .819   .768          .20623
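For readers unfamiliar with the Adjusted R2 column, the short sketch below applies the standard adjustment formula, 1 - (1 - R2)(n - 1)/(n - p - 1), to the single-predictor model. It assumes n = 20 (the number of graded records) and p = 1 predictor; the function name is made up for this example.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Penalize R2 for the number of predictors used in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# QConf-Textual model: R2 = .710, 20 records, 1 predictor.
first_model = adjusted_r2(0.710, 20, 1)
```

Rounded to three decimals, this gives .694, the value listed for the QConf-Textual model.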
5 Applications of Metadata Quality Metrics
The most important aspect of the metrics proposed in the paper is that they
can be automatically calculated. This aspect makes them suitable to be used
inside tools that can improve the functionality that a digital repository offers to
its users. Some example applications are:
– Automatic validation / correction of metadata. While experiments suggest
that automatic metadata generation has a similar quality level as human
generated metadata [34], the main objection against automatic generation
of metadata is how to provide it with some degree of quality assurance [35].
Metadata extraction mechanisms work most of the time, but sometimes they
produce useless records. Without quality assurance those mistakes will be
mixed with the whole repository decreasing its overall value. Reviewing man-
ually the output of an automatic generator is an unfeasible task. The meta-
data quality metrics proposed in this work could be used to implement an
automatic evaluator of metadata that can spot low-quality records (for
example, records that do not contain a meaningful description or whose title is
not coherent with the description) before they are inserted into the
repository. On the other hand, if the automatic evaluator of metadata is run over
human generated metadata it could guide an automatic generator of meta-
data to improve the content of low quality records. For example metadata
records that lack description could be improved with an automatic summary
created by the automatic generator of metadata from the textual content of
the resource.
– Visualization of repository-wide quality. The metric values can be used
to create visualizations of the repository in order to gain a better under-
standing of the distribution of the quality problems. For example, a treemap
visualization [36] could be used to find answers to different questions: Which
authors or sources of metadata cause quality problems? How has the quality
of the repository evolved over time? Which is the most critical problem of the
metadata in the repository?, etc. An example of such a visualization is shown
in Figure 3. The treemap represents the structure of the ARIADNE repository.
The global repository contains several local repositories and different
authors publish metadata in their local version of the repository. The boxes
represent the set of learning objects metadata records published by a given
author. The color of the boxes represents the average of the QConf-Textual
metric score of that set of records. The color scale goes from red/dark (low
quality) to yellow (medium quality), to green/light (high quality). This
visualization makes it easy to spot authors who provide good textual descriptions
for their objects.
Fig. 3. Visualization of the Textual Information Content of the ARIADNE Repository
– (Automatic) Selection of repositories for federated search. If the
repositories belonging to a federation publish their results for the quality
metrics, that information can be used by federated search engines to provide users
with a ranked list of repositories that could be searched, or to select them
automatically based on the preferences of the user. For example, users could
choose to query only repositories where the metadata quality is equal to or
higher than that of their local repository. Or, depending on the task to be
performed, the user could choose to return only objects that have a good
textual description. An initial implementation of this kind of application has
already been devised by Hughes [12] to provide a “star ranking” for
repositories of the Open Language Archive, but it is based mostly
on completeness metrics.
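Two of the applications listed above, filtering low-quality records and selecting repositories for federated search, can be sketched as follows. The sketch assumes that every metric has been normalized to the 0 to 1 range; the threshold of 0.3, the record and repository identifiers, and the function names are all hypothetical.

```python
from statistics import mean

def flag_low_quality(records, threshold=0.3):
    """Return ids of records whose average metric value is below threshold."""
    return [rid for rid, metrics in records.items()
            if mean(metrics.values()) < threshold]

def select_repositories(repo_quality, local_quality):
    """Keep repositories at least as good as the local one, best first."""
    eligible = {name: q for name, q in repo_quality.items()
                if q >= local_quality}
    return sorted(eligible, key=eligible.get, reverse=True)

# Hypothetical metric values for two records.
records = {
    "rec-1": {"q_conf_textual": 0.8, "q_read": 0.6},
    "rec-2": {"q_conf_textual": 0.1, "q_read": 0.2},
}
flagged = flag_low_quality(records)

# Hypothetical average quality published by three federated repositories.
repos = select_repositories({"A": 0.7, "B": 0.4, "C": 0.9}, local_quality=0.5)
```

In this toy example only `rec-2` is flagged for review, and repositories `C` and `A` (in that order) are offered to the user, while `B` is skipped because its published quality is below the local value.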
Prototypes of the metrics are being integrated into the ARIADNENext framework
[37] to provide a Quality Metrics Service. This service will cooperate with
other services present in the framework to provide new functionality to the
repository. SAmgI [34], the automatic metadata generation service, will use the
Quality Metric Service to evaluate its output. Also, the identification of
low-quality records will trigger the automatic generation of metadata to improve
them. The Quality Metric Service will interact with the Contextualized
Attention Metadata Repository [38] to obtain usage information to
be able to calculate some of its metrics. The Administration Module will make
use of visualization tools to assess the status of the repository. These
visualization tools will use the results of the Quality Metric Service to provide
graphical representations of existing quality issues.
6 Conclusions
Although quality of metadata for digital repositories is a very difficult concept
to measure as a whole, if it is divided into more concrete parameters, such as the
ones proposed by several quality frameworks, these can be operationalized in the
form of quality metrics. These metrics, while simple to calculate, could be effective
estimators of quality. In this work, the textual information content metric was
able to account for most of the information carried in a full quality evaluation
done by human experts.
The creation of quality metrics will allow metadata quality researchers not only
to obtain snapshots of the quality of a repository, but also to constantly
monitor the evolution of quality and how different events affect it, without the
need to run costly experiments involving humans. This could lead to the creation of
innovative applications based on metadata quality that would improve the final
user experience.
While much more research and experimentation in metadata quality metrics
is needed, it is clear that some type of automatic quality assurance based on
metrics must be provided in current, exponentially growing digital repositories
to avoid the degradation of their functionality.
7 Further Work
As the title of this work suggests, this is a first step towards the automatic
evaluation of metadata in digital repositories. More research is needed in order
to reach that goal:
– Non-synthetic evaluation of the metrics. The proposed experiment mainly
deals with the ability of the metadata to facilitate the identification and
selection of resources. The other two main tasks of the metadata (finding and
obtaining the resource) are not explored. Other kinds of experiments,
involving user attention and behavior instead of explicit quality evaluation,
are needed in order to assess the usefulness of the metrics.
– Research on basic laws of metadata creation for digital repositories. While
experimenting with many different quality metrics could lead to finding some
useful ones, more basic research on the fundamental properties of metadata
creation is needed in order to guide the selection of quality metrics. For
example, knowing what can be considered a complete metadata record for a given
community could help to improve the completeness metric.
– Quality translation in federated / distributed environments. Because quality
is not an absolute value but depends on the community of use, there
must be some form of translation between the metrics developed to serve a
given community when that information is used inside tools created for a
different community.
References
1. Barton, J., Currier, S., Hey, J.M.N.: Building quality assurance into metadata
creation: an analysis based on the learning objects and e-prints communities
of practice. (2004)
2. Beall, J.: Metadata and data quality problems in the digital library. JoDI:
Journal of Digital Information 6(3) (2005)
3. Liu, X., Maly, K., Zubair, M., Nelson, M.L.: Arc - an oai service provider for
digital library federation. D-Lib Magazine 7(4) (2001)
4. Stvilia, B., Gasser, L., Twidale, M.B., Shreeves, S.L., Cole, T.W.: Metadata
quality for federated collections. In Chengalur-Smith, I.N., Raschid, L., Long,
J., Seko, C., eds.: IQ, MIT (2004) 111–125
5. Thomas, S.E.: Quality in bibliographic control. Library Trends 44(3) (1996)
491–505
6. Cardinaels, K., Meire, M., Duval, E.: Automating metadata generation: the
simple indexing interface. In: WWW ’05: Proceedings of the 14th international
conference on World Wide Web, New York, NY, USA, ACM Press (2005) 548–
556
7. Verbert, K., Jovanovic, J., Gasevic, D., Duval, E.: Repurposing learning object
components. In: OTM Workshops. (2005) 1169–1178
8. Simon, B., Massart, D., van Assche, F., Ternier, S., Duval, E., Brantner, S.,
Olmedilla, D., Miklos, Z.: A simple query interface for interoperable learning
repositories. In Simon, B., Olmedilla, D., Saito, N., eds.: Proceedings of the
1st Workshop on Interoperability of Web-based Educational Systems, Chiba,
Japan, CEUR (2005) 11–18
9. Herbert, M.L.N.: Resource harvesting within the oai-pmh framework. D-Lib
Magazine 10(12)
10. Chapman, A., Massey, O.: A catalogue quality audit tool. Library and Infor-
mation Research News 26(82) (2002)
11. Dushay, N., Hillmann, D.: Analyzing metadata for effective use and re-use.
In: DCMI Metadata Conference and Workshop, Seattle, USA (2003)
12. Hughes, B.: Metadata quality evaluation: Experience from the open language
archives community. Digital Libraries: International Collaboration and Cross-
Fertilization (2004) 320–329
13. Bui, Y., ran Park., J.: An assessment of metadata quality: A case study of
the national science digital library metadata repository. In Moukdad, H., ed.:
CAIS/ACSI 2006 Information Science Revisited: Approaches to Innovation.
(2006)
14. Najjar, J., Ternier, S., Duval, E.: The actual use of metadata in ariadne: an
empirical analysis. In: Proceedings of the 3rd Annual ARIADNE Conference,
ARIADNE Foundation (2003) 1–6
15. Greenberg, J., Pattuelli, M.C., Parsia, B., Robertson, W.D.: Author-generated
dublin core metadata for web resources: A baseline study in an organization.
In: DC ’01: Proceedings of the International Conference on Dublin Core and
Metadata Applications 2001, National Institute of Informatics, Tokyo, Japan
(2001) 38–46
16. Shreeves, S.L., Knutson, E.M., Stvilia, B., Palmer, C.L., Twidale, M.B., Cole,
T.W.: Is ”quality” metadata ”shareable” metadata? the implications of lo-
cal metadata practices for federated collections. In: ACRL Twelfth National
Conference, Minneapolis, USA, ALA (2005)
17. Wilson, A.J.: Toward releasing the metadata bottleneck - a baseline evaluation
of contributor-supplied metadata. LIBRARY RESOURCES & TECHNICAL
SERVICES 51(1) (2007) 16–28
18. Moen, W.E., Stewart, E.L., McClure, C.R.: Assessing metadata quality: Find-
ings and methodological considerations from an evaluation of the u.s. gov-
ernment information locator service (gils). In: ADL ’98: Proceedings of the
Advances in Digital Libraries Conference, Washington, DC, USA, IEEE Com-
puter Society (1998) 246
19. Najjar, J., Ternier, S., Duval, E.: User behavior in learning object reposi-
tories: an empirical analysis. In: Proceedings of the ED-MEDIA 2004 World
Conference on Educational Multimedia, Hypermedia and Telecommunications,
AACE (2004) 4373–4379. URL: http://go.editlib.org/p/11705
20. O’Neill, E.T.: Frbr: Functional requirements for bibliographic records; appli-
cation of the entity-relationship model to humphry clinker. Library Resources
& Technical Services 46(4) (2002)
21. Strong, D.M., Lee, Y.W., Wang, R.Y.: Data quality in context. Communica-
tions of the ACM 40(5) (1997) 103–110
22. Zhu, X., Gauch, S.: Incorporating quality metrics in centralized/distributed
information retrieval on the world wide web. In: Research and Development
in Information Retrieval. (2000) 288–295
23. Ede, S.: Fitness for purpose: The future evolution of bibliographic records and
their delivery. Catalogue & Index 116 (1995)
24. Gasser, L., Stvilia, B.: A new framework for information quality. Technical
report, ISRN UIUCLIS–2001/1+AMAS. (2001)
25. Bruce, T.R., Hillmann, D. In: The continuum of metadata quality: defining,
expressing, exploiting. ALA Editions, Chicago, IL (2004) 238–256
26. Landauer, T., Foltz, P., Latham, D.: Introduction to latent semantic analysis.
Discourse Processes 25 (1998) 259–284
27. Wolpers, M., Najjar, J., Verbert, K., Duval, E.: Tracking actual usage: the
attention metadata approach. International Journal Educational Technology
and Society 11 (2007) In Press.
28. Broisin, J., Vidal, P., Meire, M., Duval, E.: Bridging the gap between learning
management systems and learning object repositories: Exploiting learning con-
text information. In: AICT-SAPIR-ELETE ’05 (AICT/SAPIR/ELETE’05),
Washington, DC, USA, IEEE Computer Society (2005) 478–483
29. Aizawa, A.: An information-theoretic perspective of tfidf measures. Inf. Pro-
cess. Manage. 39(1) (2003) 45–65
30. Guy, M., Powell, A., Day, M.: Improving the quality of metadata in eprint
archives. Ariadne 38 (2004)
31. McCallum, D.R., Peterson, J.L.: Computer-based readability indexes. In:
ACM 82: Proceedings of the ACM ’82 conference, New York, NY, USA, ACM
Press (1982) 44–48
32. Duval, E., Forte, E., Cardinaels, K., Verhoeven, B., Durm, R.V., Hendrikx,
K., Forte, M.W., Ebel, N., Macowicz, M., Warkentyne, K., Haenni, F.: The
ariadne knowledge pool system. Commun. ACM 44(5) (2001) 72–78
33. Shrout, P., Fleiss, J.: Intraclass correlations: uses in assessing rater reliability.
Psychol Bull 86 (1977) 420–428
34. Meire, M., Ochoa, X., Duval, E.: Samgi: Automatic metadata generation v2.0.
In: Proceedings of the ED-MEDIA 2004 World Conference on Educational
Multimedia, Hypermedia and Telecommunications. (2007) In Press.
35. Ochoa, X., Cardinaels, K., Meire, M., Duval, E.: Frameworks for the automatic
indexation of learning management systems content into learning object repos-
itories. In Kommers, P., Richards, G., eds.: Proceedings of the ED-MEDIA
2005 World Conference on Educational Multimedia, Hypermedia and Telecom-
munications. (2005)
36. Bederson, B.B., Shneiderman, B., Wattenberg, M.: Ordered and quantum
treemaps: Making effective use of 2d space to display hierarchies. ACM Trans.
Graph. 21(4) (2002) 833–854
37. AriadneFoundation: Ariadnenext architecture.
http://ariadne.cs.kuleuven.ac.be/mediawiki/index.php/ariadnenextarchitecture
(retrieved 2/04/2007)
38. Najjar, J., Duval, E.: Attention metadata: Collection and management. In
IEEE, ed.: WWW 2006, Edinburgh, Scotland (2006)
