MPEG7ADB: Automatic RDF annotation of audio files from low level MPEG-7 metadata
Available from citeseerx.ist.psu.edu
Giovanni Tummarello, Christian Morbidoni, Francesco Piazza, Paolo Puliti
DEIT - Università Politecnica delle Marche, Ancona (ITALY)
Abstract. MPEG-7, an ISO standard since 2001, was created in recognition of
the need for standardization of multimedia metadata. While efforts have
been made to link higher-level semantic content to the languages of the
Semantic Web, a large semantic gap remains between machine-extractable
metadata (Low Level Descriptors) and meaningful, concise RDF annotations.
In this paper we address this problem and present MPEG7ADB, a toolkit based
on computational intelligence and signal processing that can be used to
quickly create components capable of producing automatic RDF annotations
from MPEG-7 metadata coming from heterogeneous sources.
1 Introduction
While MPEG-7 and the tools of the Semantic Web (notably RDF/S) were developed
concurrently, the two efforts have been largely independent, resulting in several
integration challenges. At the data model level, MPEG-7 is based directly on XML
and XML Schema, while the tools of the Semantic Web use these only as an optional
syntax and rely conceptually on graph structures. At the semantic description level,
it is thanks to a later effort [8][24] that RDF/DAML+OIL mappings have been made
to allow interoperability. While such mappings are possible, their scope (semantic
scene description) is currently beyond what can be automated by machine. Previous
work has also shown [4] that pure XML tools are very ineffective for handling MPEG-7
data. Although the syntax is well specified by the standard, using MPEG-7 in a
general way is not simple. In fact, while it is relatively easy to create syntactically
compliant MPEG-7 annotations, the freedom in terms of structures and parameters is
such that understanding MPEG-7 produced by others is generally difficult or worse.
For the same reason, computational intelligence techniques, which are bound to play
a key role in the applications envisioned for the standard, are not easy to apply
directly, since MPEG-7 descriptions of identical objects can be very different from
each other when they come from different sources. Recognizing the intrinsic difficulty
of full interoperability, work is currently under way [3] to standardize subsets of the
base features as “profiles” for specific applications, generally trading generality and
expressivity for ease and lightness of implementation. Necessarily, this also means
giving up on interesting scenarios. In this paper we address the hard problem of
“semantic mismatch”, that is, we present techniques to “distill” concise RDF annotations
from raw, low-level MPEG-7 metadata. These techniques are implemented in a set of
tools (MPEG7ADB) with which it is possible to easily build powerful components for
the automatic RDF annotation of audio, feeding on MPEG-7 low level descriptors (LLDs).
2 The MPEG7ADB
[Figure 1 block labels: URI; Mpeg7 ACT Extraction; Mpeg7 ACT DB (fig. 1); Mpeg7 Projection (fig. 2); Computational intelligence inference (Clustering, Matching, Classification ..); Semantic Assertions (A Matches B, A is of type C); URI Filter (Local to global URIs); Ontologies; RDF Annotation Writer; RDF Annotation]
Figure 1. The overall structure of the proposed architecture.
A simplified representation of the proposed architecture (as currently implemented
by the MPEG7ADB project [7]) is depicted in Figure 1. URIs serve both as references
to the audio files and as the subjects of the annotations, which are produced in
standard RDF/OWL format.
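As a rough illustration of how a clip URI can become the subject of an annotation, the sketch below serializes the two kinds of assertion named in Figure 1 ("A matches B", "A is of type C") as N-Triples lines. It is not the MPEG7ADB writer: the namespace http://example.org/audio-onto#, the "matches" property, and the function names are assumptions made for this example; only the rdf:type URI is standard.

```python
# Standard RDF property used to state "A is of type C".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical vocabulary, invented for illustration only.
EX = "http://example.org/audio-onto#"


def triple(subject, predicate, obj):
    """Serialize one statement whose three terms are URIs as an N-Triples line."""
    return "<%s> <%s> <%s> ." % (subject, predicate, obj)


def annotate(clip_uri, class_uri=None, matches=()):
    """Build annotation lines that all have the clip URI as their subject."""
    lines = []
    if class_uri is not None:
        # "A is of type C"
        lines.append(triple(clip_uri, RDF_TYPE, class_uri))
    for other in matches:
        # "A matches B" (hypothetical property)
        lines.append(triple(clip_uri, EX + "matches", other))
    return lines


if __name__ == "__main__":
    for line in annotate("http://example.org/clips/a.wav",
                         class_uri=EX + "Speech",
                         matches=["http://example.org/clips/b.wav"]):
        print(line)
```

The same URI that locates the audio file is reused unchanged as the subject, which is the property the paragraph above relies on.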
When the database component is given the URI of a new audio clip to index, it
first tries to locate an appropriate MPEG-7 resource describing it. At this point
several alternative models of metadata lookup can be envisioned, including calls
to Web Services, queries on distributed P2P systems, or lookup in a local store or
cache. If this preliminary search fails to locate the MPEG-7 file, a similar mechanism
attempts to fetch the actual audio file, if the URI turns out to be a resolvable URL,
and processes it with the included, co-developed MPEG7ENC library [6].
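The lookup order described above can be sketched as a simple fallback chain. This is a minimal sketch under stated assumptions, not the MPEG7ADB code: locate_mpeg7, the resolver list, fetch_audio and encode are hypothetical names, and encode merely stands in for a call to an encoder such as MPEG7ENC, whose actual interface is not shown in this text.

```python
from typing import Callable, Iterable, Optional

# A resolver maps a clip URI to MPEG-7 XML text, or None when it has nothing.
Resolver = Callable[[str], Optional[str]]


def locate_mpeg7(uri: str,
                 resolvers: Iterable[Resolver],
                 fetch_audio: Callable[[str], Optional[bytes]],
                 encode: Callable[[bytes], str]) -> Optional[str]:
    """Return an MPEG-7 description for uri, or None if none can be obtained."""
    # Step 1: ask each metadata source in turn (cache, Web Service, P2P, ...).
    for resolve in resolvers:
        document = resolve(uri)
        if document is not None:
            return document
    # Step 2: no description found, so try to fetch the audio itself.
    audio = fetch_audio(uri)  # None when the URI is not a resolvable URL
    if audio is None:
        return None
    # Step 3: extract the description from the audio data.
    return encode(audio)


# Stand-in components, for illustration only.
cache = {"urn:example:clip1": "<Mpeg7>cached</Mpeg7>"}


def fake_fetch(uri):
    return b"PCM" if uri.startswith("http://") else None


def fake_encode(audio):
    return "<Mpeg7>encoded %d bytes</Mpeg7>" % len(audio)


if __name__ == "__main__":
    for u in ("urn:example:clip1", "http://example.org/a.wav", "urn:example:x"):
        print(u, "->", locate_mpeg7(u, [cache.get], fake_fetch, fake_encode))
```

Passing the sources as a list keeps the alternative lookup models interchangeable: a Web Service client or a P2P query only has to satisfy the same URI-to-document signature.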
Once a schema-valid MPEG-7 description has been retrieved, the raw sequences of data
belonging to Low Level Descriptors are mapped into flat array structures. These not
only serve as a convenient and compact container, but also provide abstraction from
some of the basic free parameters allowed by MPEG-7. As an example, the MPEG7
ACT type provides the basic time interpolation/integration capabilities needed to
handle cases in which LLDs have different sampling periods and different grouping
operators applied.
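To make the sampling-period issue concrete, the sketch below brings a uniformly sampled descriptor series onto a different period by linear interpolation, so that two LLDs extracted with different settings can be compared sample by sample. Linear interpolation is only one possible choice and is an assumption here; the function name resample and the use of milliseconds are likewise illustrative and are not taken from the MPEG7 ACT type.

```python
def resample(values, period, new_period):
    """Linearly interpolate a uniformly sampled series onto a new period.

    values     -- samples taken at t = 0, period, 2 * period, ...
    period     -- original sampling period (e.g. in milliseconds)
    new_period -- target sampling period, in the same unit
    """
    if not values:
        return []
    duration = (len(values) - 1) * period
    # Number of target samples that fit in the covered time span.
    count = int(duration / new_period + 1e-9) + 1
    last = len(values) - 1
    out = []
    for k in range(count):
        position = (k * new_period) / period  # fractional index in values
        i = min(int(position), last)
        j = min(i + 1, last)
        frac = position - i
        out.append(values[i] * (1.0 - frac) + values[j] * frac)
    return out


if __name__ == "__main__":
    coarse = [0.0, 10.0, 20.0]      # one sample every 10 ms
    print(resample(coarse, 10, 5))  # same span, one sample every 5 ms
```

Once every series shares one period, the flat arrays can be aligned index by index without consulting the original MPEG-7 parameters again.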
To exploit the benefits of computational intelligence (e.g. neural networks) and
perform clustering, matching, comparisons and classifications, each MPEG-7 resource
has to be projected onto a single vector of fixed dimension in a consistent and
mathematically justified way. The projection block performs this task, which is best
understood as driven by a “feature space request”. A “feature space” deemed suitable
for the desired computational intelligence task is composed of pairs, one per
dimension, of feature names and functions capable of projecting a series of scalars
or vectors into a single scalar value. Among these, the framework provides a full set
of classical statistical operators (mean, variance, higher-order moments, median,
percentiles, etc.) that can be
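A feature space of this kind can be sketched as a list of (feature name, function) pairs, one per output dimension. The code below is an illustrative sketch only: the project function and the dictionary layout are assumptions, the descriptor names are used purely as example keys, and the operators come from Python's standard statistics module rather than from the framework.

```python
import statistics

# One (feature name, projection function) pair per dimension of the vector.
# Each function reduces a whole series of scalars to a single scalar.
FEATURE_SPACE = [
    ("AudioPower", statistics.mean),
    ("AudioPower", statistics.pvariance),
    ("AudioSpectrumCentroid", statistics.median),
]


def project(series_by_name, feature_space):
    """Project one resource (name -> series of scalars) onto a fixed-size vector."""
    vector = []
    for name, reduce_fn in feature_space:
        vector.append(float(reduce_fn(series_by_name[name])))
    return vector


if __name__ == "__main__":
    clip = {
        "AudioPower": [1.0, 2.0, 3.0],
        "AudioSpectrumCentroid": [5.0, 1.0, 3.0, 9.0, 7.0],
    }
    # The series have different lengths, yet the result always has
    # len(FEATURE_SPACE) components.
    print(project(clip, FEATURE_SPACE))
```

Because the vector length depends only on the feature space and not on the clip duration, vectors from different resources can be fed directly to clustering or classification code.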


