
Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams

by Albert Bifet
Knowledge and Information Systems (2010)

Abstract

This book is a significant contribution to the subject of mining time-changing data streams and addresses the design of learning algorithms for this purpose. It introduces new contributions on several different aspects of the problem, identifying research opportunities and increasing the scope for applications. It also includes an in-depth study of stream mining and a theoretical analysis of proposed methods and algorithms. The first section is concerned with the use of an adaptive sliding window algorithm (ADWIN). Because ADWIN has rigorous performance guarantees, using it in place of counters or accumulators offers the possibility of extending such guarantees to learning and mining algorithms not initially designed for drifting data. Testing with several methods, including Naïve Bayes, clustering, decision trees and ensemble methods, is discussed as well. The second part of the book describes a formal study of connected acyclic graphs, or trees, from the point of view of closure-based mining, presenting efficient algorithms for subtree testing and for mining ordered and unordered frequent closed trees. Lastly, a general methodology to identify closed patterns in a data stream is outlined. This is applied to develop an incremental method, a sliding-window based method, and a method that mines closed trees adaptively from data streams. These are used to introduce classification methods for tree data streams.

3 Mining Evolving Data Streams
In order to deal with evolving data streams, the model learned from the
streaming data must be able to capture up-to-date trends and transient
patterns in the stream [Tsy04, WFYH03]. To do this, as we revise the model by
incorporating new examples, we must also eliminate the effects of outdated
examples representing outdated concepts. This is a nontrivial task. We also
propose a new experimental data stream framework for studying concept
drift.
3.1 Introduction
Dealing with time-changing data requires strategies for detecting and
quantifying change, forgetting stale examples, and revising the model. Fairly
generic strategies exist for detecting change and deciding when examples
are no longer relevant. Model revision strategies, on the other hand, are in
most cases method-specific.
Most strategies for dealing with time change contain hardwired
constants, or else require input parameters, concerning the expected speed or
frequency of the change; some examples are a priori definitions of sliding
window lengths, values of decay or forgetting parameters, explicit bounds
on maximum drift, etc. These choices represent preconceptions on how
fast or how often the data are going to evolve and, of course, they may
be completely wrong. Moreover, no fixed choice may be right, since the
stream may experience any combination of abrupt changes, gradual ones,
and long stationary periods. More generally, an approach based on fixed
parameters will be caught in the following tradeoff: the user would like
to use parameter values that give more accurate statistics (hence, more
precision) during periods of stability, but at the same time use the opposite
parameter values to react quickly to changes when they occur.
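The tradeoff can be made concrete with a small simulation. The sketch below is illustrative only: the Bernoulli stream, the change point, and the two window lengths (20 and 1000) are assumptions chosen for the example, and `sliding_mean` and `mean_abs_error` are hypothetical helper names.

```python
import random
from collections import deque

def sliding_mean(stream, window_len):
    """Return the fixed-window mean estimate after each item of the stream."""
    window = deque(maxlen=window_len)
    estimates = []
    for x in stream:
        window.append(x)  # the deque drops the oldest item once it is full
        estimates.append(sum(window) / len(window))
    return estimates

def mean_abs_error(estimates, truth, start, stop):
    """Average absolute gap between estimate and true mean on [start, stop)."""
    errors = [abs(e - t) for e, t in zip(estimates[start:stop], truth[start:stop])]
    return sum(errors) / len(errors)

random.seed(0)
# Assumed stream: Bernoulli(0.2) for 2000 items, then an abrupt jump to Bernoulli(0.8).
truth = [0.2] * 2000 + [0.8] * 2000
stream = [1 if random.random() < p else 0 for p in truth]

short_est = sliding_mean(stream, 20)     # reacts quickly, noisy when stable
long_est = sliding_mean(stream, 1000)    # precise when stable, slow to react

stable_short = mean_abs_error(short_est, truth, 1000, 2000)
stable_long = mean_abs_error(long_est, truth, 1000, 2000)
change_short = mean_abs_error(short_est, truth, 2000, 2200)
change_long = mean_abs_error(long_est, truth, 2000, 2200)
```

In this setup the long window gives the smaller error during the stationary stretch and the short window gives the smaller error just after the change, so neither fixed length is right for the whole stream.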
Many ad-hoc methods have been used to deal with drift, often tied to
particular algorithms. In this chapter we propose a more general approach
based on using two primitive design elements: change detectors and
estimators. The idea is to encapsulate all the statistical calculations having
to do with detecting change and keeping updated statistics from a stream
74 CHAPTER 4. ADAPTIVE SLIDING WINDOWS
ent values. Recall that upon receiving an unlabelled instance
I = (x_1 = v_1, ..., x_k = v_k), the Naïve Bayes predictor computes a
“probability” of I being in class c as:

\[
\Pr[C = c \mid I] \cong \Pr[C = c] \cdot \prod_{i=1}^{k} \Pr[x_i = v_i \mid C = c]
= \Pr[C = c] \cdot \prod_{i=1}^{k} \frac{\Pr[x_i = v_i \wedge C = c]}{\Pr[C = c]}
\]
The values Pr[x_i = v_j ∧ C = c] and Pr[C = c] are estimated from
the training data. Thus, the summary of the training data is simply a
3-dimensional table that stores for each triple (x_i, v_j, c) a count N_{i,j,c}
of training instances with x_i = v_j and class c, together with a 1-dimensional
table for the counts of C = c. This algorithm is naturally incremental: upon
receiving a new example (or a batch of new examples), simply increment the
relevant counts. Predictions can be made at any time from the current counts.
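As a minimal sketch of this count-based formulation, the class below keeps the 3-dimensional table as a dictionary keyed by (attribute index, value, class) and the class counts in a second dictionary. The name `CountingNaiveBayes` and the toy examples are hypothetical, and no smoothing is applied, so an unseen attribute value drives the score to zero.

```python
from collections import defaultdict

class CountingNaiveBayes:
    """Naive Bayes over categorical attributes, stored as plain counts."""

    def __init__(self):
        self.n = defaultdict(int)        # (i, v, c) -> N_{i,v,c}
        self.class_n = defaultdict(int)  # c -> number of examples with C = c
        self.total = 0                   # number of examples seen

    def learn(self, x, c):
        """Incremental update: bump the relevant counts for one labelled example."""
        self.total += 1
        self.class_n[c] += 1
        for i, v in enumerate(x):
            self.n[(i, v, c)] += 1

    def score(self, x, c):
        """Unnormalised Pr[C = c | x] computed from the current counts."""
        n_c = self.class_n.get(c, 0)
        if n_c == 0:
            return 0.0
        s = n_c / self.total  # Pr[C = c]
        for i, v in enumerate(x):
            # Pr[x_i = v and C = c] / Pr[C = c] reduces to N_{i,v,c} / N_c
            s *= self.n.get((i, v, c), 0) / n_c
        return s

    def predict(self, x):
        return max(self.class_n, key=lambda c: self.score(x, c))

nb = CountingNaiveBayes()
for x, c in [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
             (("rainy", "mild"), "yes"), (("rainy", "cool"), "yes")]:
    nb.learn(x, c)
prediction = nb.predict(("rainy", "mild"))
```

Because the model is nothing but counts, replacing each count with a different kind of estimator changes how the model forgets without touching the prediction rule.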
We compare two time-change management strategies. The first one
uses a static model to make predictions. This model is rebuilt every time
an external change detector module detects a change. We use the DDM
detection method and ADWIN as change detectors. The DDM method
generates a warning example some time before actually declaring change
(see Section 2.2.1 for the details); the examples received between the
warning and the change signal are used to rebuild the model. With ADWIN,
we use the examples currently stored in the window to rebuild the static model.
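The control flow of the first strategy, with a warning followed by a change signal, might look roughly like the sketch below. It is not DDM itself: `ScriptedDetector` is a hypothetical stand-in that replays a fixed schedule of signals so the example stays deterministic, and `LookupModel`, `RebuildOnChange`, and the state names "ok", "warning", and "drift" are likewise assumptions. Any detector exposing the same `update(error)` method could be plugged in.

```python
from collections import Counter, defaultdict

class LookupModel:
    """Static stand-in model: majority label seen for each input value."""
    def __init__(self, examples):
        self.counts = defaultdict(Counter)
        for x, y in examples:
            self.counts[x][y] += 1
    def predict(self, x):
        seen = self.counts.get(x)
        return seen.most_common(1)[0][0] if seen else 0  # default label 0 if unseen

class ScriptedDetector:
    """Hypothetical detector: emits signals on a fixed schedule, ignoring errors."""
    def __init__(self, warning_at, drift_at):
        self.warning_at, self.drift_at = warning_at, drift_at
        self.t = 0
    def update(self, error):
        self.t += 1
        if self.t == self.drift_at:
            return "drift"
        if self.warning_at <= self.t < self.drift_at:
            return "warning"
        return "ok"

class RebuildOnChange:
    """Predict with a static model; rebuild it when the detector declares change."""
    def __init__(self, build_model, detector, initial_examples):
        self.build_model = build_model
        self.detector = detector
        self.model = build_model(initial_examples)
        self.buffer = []      # examples received since the warning
        self.rebuilds = 0
    def process(self, x, y):
        y_hat = self.model.predict(x)
        state = self.detector.update(int(y_hat != y))
        if state == "ok":
            self.buffer.clear()          # no warning pending: nothing to keep
        else:
            self.buffer.append((x, y))   # warning or drift: keep the example
        if state == "drift":
            self.model = self.build_model(self.buffer)
            self.buffer = []
            self.rebuilds += 1
        return y_hat

# Assumed toy stream: label equals x for 100 items, then flips to 1 - x.
initial = [(i % 2, i % 2) for i in range(20)]
learner = RebuildOnChange(LookupModel, ScriptedDetector(105, 120), initial)
errors = 0
for i in range(300):
    x = i % 2
    y = x if i < 100 else 1 - x
    errors += int(learner.process(x, y) != y)
```

The model makes mistakes from the concept flip until the scripted change signal, after which it is rebuilt from the buffered examples only and follows the new concept.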
The second one is incremental: we simply create an instance A_{i,j,c} of
ADWIN for each count N_{i,j,c}, and one for each value c of C. When a labelled
example is processed, add a 1 to A_{i,j,c} if x_i = v_j ∧ C = c, and a 0 otherwise,
and similarly for N_c. When the value of Pr[x_i = v_j ∧ C = c] is required to
make a prediction, compute it using the estimate of N_{i,j,c} provided by A_{i,j,c}.
This estimate varies automatically as Pr[x_i = v_j ∧ C = c] changes in the data.
Note that different A_{i,j,c} may have windows of different lengths at the
same time. This will happen when the distribution is changing at different
rates for different attributes and values; there is no reason to sacrifice
accuracy in all of the counts N_{i,j,c} only because a few of them are changing
fast. This is the intuition for why this approach may give better results than
one monitoring the global error of the predictor: it has more accurate
information on at least some of the statistics that are used for the prediction.
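The incremental strategy can be sketched as follows, with one adaptive estimator per count. `SimpleAdwin` is a deliberately simplified stand-in for ADWIN, not the original algorithm: it stores raw items, scans every split point, caps the window at `max_len`, and only checks every `check_every` insertions. The Hoeffding-style cut threshold and all parameter values are assumptions of this sketch, as are the class names and the toy stream.

```python
import math
import random
from collections import deque

class SimpleAdwin:
    """Simplified adaptive window over raw items (not the bucketed original)."""
    def __init__(self, delta=0.002, max_len=1000, check_every=16):
        self.delta = delta
        self.window = deque(maxlen=max_len)  # length cap is an assumption
        self.check_every = check_every
        self._since_check = 0

    def add(self, x):
        self.window.append(x)
        self._since_check += 1
        if self._since_check >= self.check_every:
            self._since_check = 0
            cut = self._find_cut()
            while cut:  # drop oldest items until no split differs significantly
                for _ in range(cut):
                    self.window.popleft()
                cut = self._find_cut()

    def _find_cut(self):
        """Return how many oldest items to drop, or 0 if no split triggers."""
        n = len(self.window)
        if n < 2:
            return 0
        total = sum(self.window)
        log_term = math.log(4.0 * n / self.delta)  # assumed Hoeffding-style bound
        head = 0.0
        for n0, x in enumerate(self.window, start=1):
            if n0 == n:
                break
            head += x
            n1 = n - n0
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps = math.sqrt(log_term / (2.0 * m))
            if abs(head / n0 - (total - head) / n1) > eps:
                return n0
        return 0

    def estimate(self):
        return sum(self.window) / len(self.window) if self.window else 0.0

class AdaptiveNaiveBayes:
    """Naive Bayes whose probabilities come from one estimator per count."""
    def __init__(self, make_estimator=SimpleAdwin):
        self.make = make_estimator
        self.prior = {}   # c -> estimator of Pr[C = c]
        self.joint = {}   # (i, v, c) -> estimator of Pr[x_i = v and C = c]
        self.values = {}  # i -> set of values seen so far for attribute i

    def learn(self, x, c):
        if c not in self.prior:
            self.prior[c] = self.make()
        for k, est in self.prior.items():
            est.add(1 if k == c else 0)
        for i, v in enumerate(x):
            self.values.setdefault(i, set()).add(v)
            for u in self.values[i]:
                for k in self.prior:
                    key = (i, u, k)
                    if key not in self.joint:
                        self.joint[key] = self.make()
                    # 1 if this example matches the triple, 0 otherwise
                    self.joint[key].add(1 if (u == v and k == c) else 0)

    def score(self, x, c):
        p_c = self.prior[c].estimate()
        if p_c == 0.0:
            return 0.0
        s = p_c
        for i, v in enumerate(x):
            est = self.joint.get((i, v, c))
            s *= (est.estimate() if est is not None else 0.0) / p_c
        return s

    def predict(self, x):
        return max(self.prior, key=lambda c: self.score(x, c))

random.seed(1)
nb = AdaptiveNaiveBayes()
for _ in range(1000):  # first concept: the class equals x_0
    x = (random.randint(0, 1), random.randint(0, 1))
    nb.learn(x, x[0])
pred_before = nb.predict((1, 0))
for _ in range(200):   # second concept: the class is the opposite of x_0
    x = (random.randint(0, 1), random.randint(0, 1))
    nb.learn(x, 1 - x[0])
pred_after = nb.predict((1, 0))
```

In this toy stream only the estimators tied to attribute 0 see a change, so their windows shrink while the class-prior windows keep their full length, which is the situation the note above describes.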
4.4.1 Experiments on Synthetic Data
For the experiments with synthetic data we use a changing concept based
on a rotating hyperplane explained in [HSD01]. A hyperplane in d-dimen-
