Towards Knowledge Discovery in the Semantic Web

by Thomas Fischer, Johannes Ruhland
Multikonferenz Wirtschaftsinformatik 2010, Göttingen, 23.-25. Februar 2010: Kurzfassungen der Beiträge (2010)

MKWI 2010 – Business Intelligence

Thomas Fischer, Johannes Ruhland
Department of Information Systems,
Friedrich Schiller University Jena
1 Introduction
In the past, data mining and machine learning research has developed various
techniques to learn from data and to extract patterns that support decision
makers in tasks such as customer profiling, targeted marketing, store layout,
and fraud detection (Tan et al., 2005, p.1). In addition, the World Wide Web
increasingly offers distributed information that can be useful for strategic,
tactical, or operational decisions, including news, events, financial
information, information about competitors, and information about the social
networks of customers and employees. The Web thus has the potential for a high
impact on the competitive actions and competitive dynamics of enterprises,
which should utilize this information. However, the growing amount of these
distributed information resources leads to a dilemma: “... the more
distributed and independently managed that resources on the Web become, the
greater is their potential value, but the harder it is to extract value…”
(Singh and Huhns, 2005, p.7). On the one hand, the human capacity for
information processing is limited (Edelmann, 2000, p.168); on the other hand,
the amount of information available on the Web increases exponentially, which
leads to increasing information saturation (Krcmar, 2004, p.52). In this
context, it becomes more and more important to detect useful patterns in the
Web and thus to use it as a rich source for data mining (Berendt et al., 2002;
Han and Kamber, 2006, p.628) in addition to company-internal databases.
The extraction of information and interesting patterns from the Web is a
complex task, because the current Web is designed mainly for human consumption.
The available information is represented in mark-up languages such as XHTML¹
and its predecessors, which describe only the visual presentation.
Unfortunately, these languages are not sufficient to let software agents
“understand” the information they are processing. For instance, the character
string “Jena” tells a machine neither that it is the name of a city², nor that
it is also the name of a well-known Semantic Web framework³. Due to this
ambiguity, the discovery of useful patterns in such unstructured information is
very difficult and has been addressed by research on web mining (Stumme et al.,
2006).

1 http://www.w3.org/TR/xhtml1/

However, there have also been increasing efforts in the research community to
realize the vision of the so-called Semantic Web: “The Semantic Web is not a
separate Web but an extension of the current one, in which information is given
well-defined meaning, better enabling computers and people to work in
cooperation” (Berners-Lee et al., 2001). It therefore seems valuable to perform
data mining on information with a well-defined meaning in order to improve the
knowledge discovery process.
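The “Jena” ambiguity can be illustrated with a minimal sketch in which each
resource carries an explicit type statement. The sketch is an illustration
only, not part of the proposed approach: it uses plain Python tuples rather
than a Semantic Web toolkit, it assumes that the two URLs cited in the
footnotes may serve as resource identifiers, and the names prefixed with
"ex:" are hypothetical.

```python
# Illustrative sketch only. Assumptions: the two footnote URLs are used as
# resource identifiers, and every "ex:" name is invented for this example.
CITY = "http://www.jena.de/"
FRAMEWORK = "http://jena.sourceforge.net/"

# Each statement is a (subject, predicate, object) triple.
STATEMENTS = [
    (CITY, "ex:type", "ex:City"),
    (CITY, "ex:label", "Jena"),
    (FRAMEWORK, "ex:type", "ex:SemanticWebFramework"),
    (FRAMEWORK, "ex:label", "Jena"),
]


def resources_labelled(label, statements):
    """Return the subjects whose ex:label equals the given string."""
    return [s for s, p, o in statements if p == "ex:label" and o == label]


def types_of(resource, statements):
    """Return the set of types asserted for a resource."""
    return {o for s, p, o in statements if s == resource and p == "ex:type"}


def describe(label, statements):
    """Map every resource carrying the label to its asserted types."""
    return {
        resource: types_of(resource, statements)
        for resource in resources_labelled(label, statements)
    }


if __name__ == "__main__":
    # The bare string "Jena" matches two resources; the type statements
    # tell a program which one is the city and which one is the framework.
    for resource, types in describe("Jena", STATEMENTS).items():
        print(resource, sorted(types))
```

A lookup by the bare string returns both resources, whereas the attached type
statements let a program separate the city from the framework. This is the
kind of well-defined meaning that a mining algorithm could exploit.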
The use of data mining on Semantic Web information for business intelligence
has received little attention in the research community compared with the
overall research investment in this field. Furthermore, many open topics
remain to be addressed. In this paper we motivate this field of research with
a scenario that outlines the differences in the knowledge discovery process and
serves to derive requirements.
The remainder of this paper is organized as follows. Section 2 outlines the
research context. Section 3 describes a scenario for relational association
rules, which serves as a basis for an overview of the requirements for
knowledge discovery in the Semantic Web. Finally, Section 4 concludes this
paper.
2 Qualifying the Research Context
2.1 Semantic Web
The Semantic Web (Berners-Lee et al., 2001) focuses on the extension of the
current Web by machine-readable and “understandable” meta information. The
vocabulary of these meta-information statements is typically derived from one
or more ontologies, each of which is a shared conceptualization of a domain of
discourse (Gruber, 1993). The semantic (that is, machine-meaningful)
description of Web data has been driven by the research community through the
creation of different standards, for instance the Resource Description
Framework (RDF) (Klyne et al., 2004), the Resource Description Framework
Schema (RDF(S)) (Brickley et al., 2004), and the Web Ontology Language (OWL)
(Smith et al., 2004).⁴ These approaches provide a formal way to specify shared
vocabularies that can be used in statements about resources. Furthermore, they
utilize a syntax based on the Extensible Markup Lan-

2 http://www.jena.de/
3 http://jena.sourceforge.net/
4 There are also other ontology languages, but the mentioned standards are widely accepted in the
research literature. The proposed approach is independent of the ontology language, as long as the
language is based on description logics or first-order logic.
