Semantic Enrichment of Strategic Datacubes
- ISBN: 9781605582504
- DOI: 10.1145/1458432.1458447
Claudia Diamantini
Dipartimento di Ingegneria Informatica,
Gestionale e dell’Automazione
Università Politecnica delle Marche
via Brecce Bianche, 60131, Ancona, Italy
diamantini@diiga.univpm.it
Domenico Potena
Dipartimento di Ingegneria Informatica,
Gestionale e dell’Automazione
Università Politecnica delle Marche
via Brecce Bianche, 60131, Ancona, Italy
potena@diiga.univpm.it
ABSTRACT
In the information system view, the reference architecture for strategic and decision support is based on the Data Warehouse architecture, which enables flexible and multidimensional analysis of strategic indexes by means of OLAP tools and reports. In this paper we propose a novel model for semantic annotation of Data Warehouse schemas that takes into account domain ontologies as well as a mathematical ontology. Such an ontology describes the mathematical formulas underlying elements of the datacube schema, including the semantics of operands and operators. In particular, we discuss and apply the proposed model for the semantic annotation of the schema of a datacube, which is the basis for OLAP analysis and contains information derived from the Data Warehouse schema. The paper also provides an illustrative case study, together with some examples of analysis based on this kind of annotation.
Categories and Subject Descriptors
H.4.2 [Information Systems Applications]: Types of
Systems—Decision support
General Terms
Design, Theory
Keywords
Datacube, Data Warehouse, Mathematical Ontology, Semantic Annotation, Strategic Index
1. INTRODUCTION
In any organization, strategic planning is one of the most critical activities that top management has to deal with. As stated in [12, p.37], strategic planning is a complex process of “dynamic, continuous activities of self-analysis” oriented to the definition of a strategy. Indeed, at the strategic and decision levels, an organization develops complex planning and control cycles, where a model of the organization is compared against a “to-be” state, be it the realization of a given vision and mission, a reference best practice, or a periodic budget. The models considered at this level by strategy experts and top management are defined by a set of high-level measurable performance indexes, which are analyzed along a number of different dimensions.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DOLAP’08, October 30, 2008, Napa Valley, California, USA.
Copyright 2008 ACM 978-1-60558-250-4/08/10 ...$5.00.
Data Warehousing (DW) has been introduced as a technology to make managers’ work simpler and more effective, enabling flexible analysis of performance indexes by means of OLAP tools over multidimensional datacubes [3]. However, fully exploiting these tools requires a full understanding of the exact meaning of datacube elements, both in terms of the business concepts familiar to managers and of the way these elements are built. In fact, conceptual definitions can only refer to abstract concepts, while the actual implementation of these concepts in the datacube depends on the way indexes are computed. Consider for instance the “Return on Investment” (ROI) index: while the abstract, conceptual definition of ROI (the income that an investment provides in a year) is sufficiently shared among managers, the economic literature reports different ways to calculate ROI, known as ROI trees, depending on the economic framework adopted [9, 13]. Hence, an apparently shared conceptual definition can hide subtle operational misunderstandings.
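To make the ROI example concrete, the following sketch represents an index as a formula tree over base measures and evaluates two such trees on the same data. Both trees carry the same conceptual label, ROI, yet they return different values. The two formulas (net income over total assets, operating income over invested capital), the measure names and the figures are hypothetical illustrations, not the specific ROI trees discussed in [9, 13].

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Measure:
    """Leaf of a formula tree: a base measure stored in the datacube."""
    name: str


@dataclass(frozen=True)
class BinOp:
    """Internal node: a binary arithmetic operator applied to two sub-formulas."""
    op: str  # one of "+", "-", "*", "/"
    left: "Formula"
    right: "Formula"


Formula = Union[Measure, BinOp]


def evaluate(f: Formula, data: Dict[str, float]) -> float:
    """Recursively evaluate a formula tree against base-measure values."""
    if isinstance(f, Measure):
        return data[f.name]
    a = evaluate(f.left, data)
    b = evaluate(f.right, data)
    if f.op == "+":
        return a + b
    if f.op == "-":
        return a - b
    if f.op == "*":
        return a * b
    if f.op == "/":
        return a / b
    raise ValueError(f"unknown operator: {f.op!r}")


# Two hypothetical ROI definitions that share the same conceptual label.
roi_unit_a = BinOp("/", Measure("net_income"), Measure("total_assets"))
roi_unit_b = BinOp("/", Measure("operating_income"), Measure("invested_capital"))

# Hypothetical yearly figures, identical for both definitions.
figures = {
    "net_income": 120.0,
    "operating_income": 180.0,
    "total_assets": 1500.0,
    "invested_capital": 1200.0,
}

print("ROI (unit A):", evaluate(roi_unit_a, figures))
print("ROI (unit B):", evaluate(roi_unit_b, figures))
```

On these figures the first definition gives 0.08 and the second 0.15, although both are called ROI. Keeping the tree, and not only the resulting number, is what makes the calculation itself inspectable.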
Besides limiting comprehension, differences in how an index is calculated are also a source of semantic heterogeneity that limits interoperability. Indeed, although in theory enterprises could (and should) define a standardized set of indexes, this is not the case for many real enterprises. The reason lies in the autonomy of organization units, as found for example in public administrations, multi-division structures, franchising, etc. This autonomy can lead organization units to define their own measures and hence heterogeneous index definitions. As another source of heterogeneity, indexes can change over time, due to different analysis needs or to modified external and internal conditions, such as changes in enterprise rules or in national/international laws. In these scenarios, only advanced tools that make the calculation of an index explicit would allow managers to discover subtle discrepancies between apparently similar indexes, enhancing communication, comprehension, and reconciliation of analyses produced by different units or at different times. However, at present this information ultimately lies in IT workers’ minds and in the procedures implementing the ETL process. Hence, in order to fully support managers’ activities, information contained in [...] be semantically enriched, associating each piece of information with an explicit and formal description of its meaning and of its derivation process.

Nowadays, semantic enrichment is almost synonymous with annotating source data with formal descriptions of concepts in a domain ontology, and it is mainly considered in the Semantic Web [1] scenario. Since its original definition, the Semantic Web vision has pervaded research in databases and information systems, and it is applied to data and information structures as different from Web pages as scientific documents, databases, and business processes. In this paper we propose a model for semantic annotation of datacubes that takes into account domain ontologies as well as a mathematical ontology, i.e. a formal representation of formulas and a conceptualization of the mathematical domain. The choice to discuss the model for datacubes is guided by two main reasons: (i) datacubes are the basis for OLAP analysis and represent the information directly available to managers; (ii) since a datacube is derived from the DW on the basis of specific analysis requirements, a datacube schema can contain more information than the DW schema, e.g. derived and forecasting indexes. Hence, our approach could easily be applied to the whole DW schema as well.
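The combination of a domain ontology and a mathematical ontology can be sketched with a hypothetical encoding, which is not the annotation model of Section 5: each datacube measure is annotated with a domain-ontology concept and, if it is derived, with a formula whose operator is a mathematical-ontology term and whose operands are other annotated measures. Expanding these formulas recursively makes the derivation of an index explicit down to its base measures. The `ex:` and `math:` prefixes, the measure names and the `Annotation` class are assumptions made for illustration; the operator names only loosely echo MathML content elements.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical mathematical-ontology operators mapped to infix symbols.
MATH_OPERATORS = {
    "math:plus": "+",
    "math:minus": "-",
    "math:times": "*",
    "math:divide": "/",
}


@dataclass
class Annotation:
    """Semantic annotation of one datacube measure (hypothetical structure)."""
    concept: str                                     # domain-ontology concept
    formula: Optional[Tuple[str, List[str]]] = None  # (operator, operand measures)


# Hypothetical annotated schema: two base measures and two derived indexes.
schema: Dict[str, Annotation] = {
    "Revenue": Annotation("ex:Revenue"),
    "Cost": Annotation("ex:Cost"),
    "Profit": Annotation("ex:Profit", ("math:minus", ["Revenue", "Cost"])),
    "Margin": Annotation("ex:ProfitMargin", ("math:divide", ["Profit", "Revenue"])),
}


def expand(measure: str) -> str:
    """Rewrite a measure as an explicit expression over base measures only."""
    ann = schema[measure]
    if ann.formula is None:
        return measure  # base measure: nothing further to expand
    op, operands = ann.formula
    symbol = MATH_OPERATORS[op]
    return "(" + f" {symbol} ".join(expand(o) for o in operands) + ")"


def base_measures(measure: str) -> Set[str]:
    """Collect the base measures an index ultimately depends on."""
    ann = schema[measure]
    if ann.formula is None:
        return {measure}
    result: Set[str] = set()
    for operand in ann.formula[1]:
        result |= base_measures(operand)
    return result


print(expand("Margin"))                 # ((Revenue - Cost) / Revenue)
print(sorted(base_measures("Margin")))  # ['Cost', 'Revenue']
```

The label `ex:ProfitMargin` alone says what the index means to a manager; the expanded expression says how it is actually obtained, which is the information that otherwise stays hidden in ETL procedures.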
The rest of the paper is organized as follows. Section 2 discusses related work, and Section 3 gives a brief introduction to the notion of semantic annotation. Section 4 discusses the kind of information contained in a datacube and how it can be semantically enriched. Then, Section 5 proposes the semantic annotation model, and Section 6 presents an illustrative case study. Finally, Section 7 concludes the paper.
2. RELATED WORK
The work presented in this paper belongs to the research area of metadata management. Metadata and metadata management have been recognized as essential elements of a warehousing architecture since the beginning [3, 24]. In [3], Chaudhuri classifies metadata into three categories: administrative, business and operational. Administrative metadata are necessary to set up and use the warehouse; they include the description of source databases as well as the data warehouse schema. Business metadata include business terms, while operational metadata include information collected during the operation of the data warehouse. A distinction between low-level technical metadata and higher-level semantic (or business) metadata is now generally accepted. Another distinction regards descriptive versus transformational metadata [24]. The former category includes information related to structures (of data sources, data warehouses, and data marts), while the latter covers information associated with data processing.
Technical metadata management has been studied mainly for the purpose of integrating heterogeneous sources [2, 21]. More recently, the definition of meta-models has been considered within the Model Driven Architecture framework. For instance, the Object Management Group, facing the problem of interoperability and integration of data warehouses, introduced the Common Warehouse Metamodel (CWM) [6] as a way to exchange metadata via a shared metadata model. In [8] the Behavioural Meta-model, part of the CWM standard, is used to describe business indicators in order to abstract from the database implementation and enhance interoperability between different software products. This work goes in the direction of representing transformational metadata; however, it remains at the technical level, without considering the semantic level. Semantic metadata are intended for business end users, who are not familiar with warehouse description formats and need a business-oriented view of technical metadata. Semantic metadata enable more effective analysis and increase end users’ confidence. In the direction of linking technical and semantic metadata, [24] proposes a unifying UML model. Similarly, [15] introduces a weaving model between enterprise goals and DW data, in order to enhance the way users access and interpret data in the DW. Both models are defined at a conceptual level by UML class diagrams. Techniques borrowed from the Semantic Web domain, and in particular the use of domain ontologies, have recently been applied to DW for linking the technical and semantic levels. In particular, in [7] these techniques are used to represent hierarchical relationships among datacubes. Support for datacube design is considered in [20, 26], while improvements to OLAP functionalities are presented in [11]. However, as discussed above and in [4], strategic information has peculiar characteristics that require novel techniques and models, since the traditional mapping of terms to ontology concepts cannot by itself express the whole body of semantic information appearing at the strategic level. Rather, languages and techniques should be defined that are expressive enough to semantically describe transformational metadata.
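Why a plain term-to-concept mapping falls short can be shown with a small sketch: two organization units map their index to the same ontology concept, so a term-level check reports the two indexes as equivalent, while comparing the annotated formulas locates where the calculations diverge. Formulas are encoded here as nested tuples `(operator, operand, ...)`; this encoding, the concept URI, and the operator and measure names are hypothetical.

```python
from typing import List, Tuple, Union

# A formula is a measure name (str) or a tuple (operator, operand1, operand2, ...).
Formula = Union[str, tuple]


def formula_diff(f: Formula, g: Formula,
                 path: str = "root") -> List[Tuple[str, Formula, Formula]]:
    """Return the positions where two formula trees differ.

    Trees with the same operator and arity are compared operand by operand;
    any other mismatch is reported at the current position as a whole.
    """
    if f == g:
        return []
    if (isinstance(f, tuple) and isinstance(g, tuple)
            and f[0] == g[0] and len(f) == len(g)):
        differences: List[Tuple[str, Formula, Formula]] = []
        for i, (a, b) in enumerate(zip(f[1:], g[1:])):
            differences.extend(formula_diff(a, b, f"{path}.{i}"))
        return differences
    return [(path, f, g)]


# Hypothetical annotations produced by two autonomous organization units.
unit_a = {"concept": "ex:ReturnOnInvestment",
          "formula": ("math:divide", "NetIncome", "TotalAssets")}
unit_b = {"concept": "ex:ReturnOnInvestment",
          "formula": ("math:divide", "OperatingIncome", "TotalAssets")}

# Term-level mapping alone says the two indexes are the same...
print("same concept:", unit_a["concept"] == unit_b["concept"])
# ...while the formula-level comparison locates the discrepancy.
for where, a, b in formula_diff(unit_a["formula"], unit_b["formula"]):
    print(f"{where}: {a} vs {b}")
```

Here the term check prints `same concept: True`, while the formula comparison reports `root.0: NetIncome vs OperatingIncome`, i.e. the two indexes differ in their first operand (the numerator) only.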
3. SEMANTIC ANNOTATION
Webopedia (http://www.webopedia.com) defines an annotation as “A comment attached to a particular section of a document. Many computer applications enable you to enter annotations on text documents, spreadsheets, presentations, and other objects. This is a particularly effective way to use computers in a workgroup environment to edit and review work. . . ”.
Annotations are used to describe the content of “something” and can therefore be considered metadata. Annotations may be provided in different forms, ranging from completely unstructured text to formal structures. They can also be embedded in the annotated object or be reachable through links. We are especially interested in formal semantic annotation, that is, annotation expressed in a given structured language that also has a formal semantics. In fact, the more formal the semantics of the language, the more machine-readable the annotation.
We distinguish an annotation language, which describes the schema of the annotation, from the annotation content, which expresses the semantic information carried by the annotation itself. An annotation could simply be a link to a Web resource, and in this case the meaning of the link is evident. However, one may wish to describe further characteristics of the annotation (e.g. its author or the date when it was written). In this case a language to organize such characteristics is needed, and this language must be shared by both the annotation provider and the annotation consumer. This is typically done by exploiting XML-based languages. In Section 6 an example of such a language is given.
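As an illustration of the distinction between a bare link and an annotation with further characteristics, the sketch below uses Python's standard `xml.etree.ElementTree` module to wrap a link to an ontology concept in an XML element that also records the author and the date of the annotation, and then shows a consumer reading those fields back. The element and attribute names (`annotation`, `target`, `content`, `href`, `author`, `date`), the target name and the example URI are hypothetical; they are not the annotation language presented in Section 6.

```python
import xml.etree.ElementTree as ET
from datetime import date


def make_annotation(target: str, concept_uri: str, author: str,
                    written: date) -> ET.Element:
    """Build an annotation element: a link to a concept plus descriptive fields."""
    ann = ET.Element("annotation", {"target": target})
    # The annotation content proper: a link to an ontology concept.
    ET.SubElement(ann, "content", {"href": concept_uri})
    # Further characteristics of the annotation itself.
    ET.SubElement(ann, "author").text = author
    ET.SubElement(ann, "date").text = written.isoformat()
    return ann


# Provider side: serialize the annotation (all values are hypothetical).
annotation = make_annotation(
    target="SalesCube.Revenue",
    concept_uri="http://example.org/enterprise-ontology#Revenue",
    author="analyst01",
    written=date(2008, 6, 15),
)
xml_text = ET.tostring(annotation, encoding="unicode")
print(xml_text)

# Consumer side: parse the shared format and read the fields back.
parsed = ET.fromstring(xml_text)
print(parsed.get("target"), parsed.find("content").get("href"),
      parsed.findtext("author"), parsed.findtext("date"))
```

The consumer can only read `author` and `date` because it knows the element names in advance, which is exactly why the annotation language has to be shared by provider and consumer.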
In addition to the annotation definition language used, a common understanding of the annotation content is needed. Typically, such a common understanding is given by referring