Grid-based data mining in real-life business scenario

by Tianchao Li, Toni Bollinger, Nikolaus Breuer, HD Wehle
2004 IEEEWICACM (2004)

Abstract

This paper presents a Grid-based distributed and parallel data mining system targeting a real-life application scenario typical of the business realm: franchise supermarket basket analysis. Following a layered three-tier design, the system enables parallel association rule mining on a farm of Grid servers, offers a standard service interface for custom applications, and provides a user-friendly portal. The work presented in this paper reveals specific requirements for applying Grid-based data mining in the business realm, which is helpful for the design and implementation of a generic Grid-based data mining system. © 2004 IEEE.


Grid-based Data Mining in Real-life Business Scenario
Tianchao Li
Institut für Informatik, Technische Universität München, Germany
tianchao.li@in.tum.de
Toni Bollinger, Nikolaus Breuer, Hans-Dieter Wehle
IBM Development Laboratory Böblingen, Germany
{toni.bollinger,hdwehle,nbreuer}@de.ibm.com
1. Introduction
Data mining, which aims to extract information automatically from large data sets, is one of the most important business intelligence technologies. Because it is both computationally intensive and data intensive, data mining is a good application area for Grid technology.
The idea of data mining on the Grid is not new, but it has become a hot research topic only recently. The number of research efforts so far is still quite limited (for a short summary see [1]). Many of the existing systems, such as NASA's Information Power Grid [2], TeraGrid [3] and Discovery Net [4], either use non-standard data mining techniques or are restricted to a special domain in the scientific realm.
Implementing a Grid-based data mining system that targets real-life application scenarios in the business realm reveals the importance of specific requirements that are less evident in the scientific realm, and contributes to a generic design and implementation.
The rest of the paper is organized as follows. Section 2 describes a typical application scenario in the real-world business realm, the analysis of which reveals some specific requirements for implementing such a Grid-based data mining system. Section 3 describes the system infrastructure, which follows a layered design of three tiers: Grid tier, service tier, and client/portal tier. Section 4 summarizes the design and implementation, focusing on a general description of the workflow for submitting parallel mining tasks. Section 5 discusses future developments, and Section 6 concludes the paper.
2. A Real-life Business Application Scenario
2.1. Franchise Supermarket Basket Analysis
A typical application scenario for Grid-based data mining in real-life enterprises is presented in Figure 1: a franchise supermarket consisting of a headquarters, regional branches, and distributed member stores.
Figure 1. Franchise-supermarket scenario
Each store collects transaction data by scanning bar codes at the till when customers buy products. Most stores rely on their regional branch to store their transaction data, while some stores have local databases. For the
[Figure 1 node labels: Headquarter; America Branch Data Center; Germany Branch Data Center; Boston Store; New York Store; Miami Store; Chicago Store; Paris Store; Berlin Store; Stuttgart Store; Munich Store]
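To make the basket-analysis scenario concrete, the following is a minimal sketch of association rule mining over transaction data that is split across several locations. It assumes a simple count-and-merge scheme (each partition counts its itemsets locally, and the counts are then summed); this is an illustration only, not necessarily the algorithm used by the system described in this paper. The store names, baskets, thresholds, and function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size=2):
    """Count every itemset of up to max_size items within one data partition."""
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # canonical order, duplicates dropped
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return counts


def merge_counts(partial_counts):
    """Sum the per-partition counts into global counts."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def rules(counts, n_transactions, min_support, min_confidence):
    """Return sorted (antecedent, consequent, support, confidence) for item pairs."""
    found = []
    for itemset, n in counts.items():
        support = n / n_transactions
        if len(itemset) != 2 or support < min_support:
            continue
        # Try both directions of the pair: a -> b and b -> a.
        for a, b in (itemset, itemset[::-1]):
            confidence = n / counts[(a,)]
            if confidence >= min_confidence:
                found.append((a, b, support, confidence))
    return sorted(found)


# Hypothetical transaction data held in two separate locations.
stores = {
    "store_a": [["bread", "butter"], ["bread", "milk"], ["bread", "butter", "milk"]],
    "store_b": [["butter", "milk"], ["bread", "butter"]],
}

# Each partition could be counted on a different server; here it is sequential.
partials = [count_itemsets(baskets) for baskets in stores.values()]
total = merge_counts(partials)
n = sum(len(baskets) for baskets in stores.values())

for a, b, support, confidence in rules(total, n, min_support=0.4, min_confidence=0.7):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Because only itemset counts are merged, the raw transactions never have to leave the place where they are stored, which is one reason a count-and-merge approach suits data spread over branches and stores.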
Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI’04)
0-7695-2100-2/04 $ 20.00 IEEE
