Mining Multi-label Concept-Drifting Data Streams Using Dynamic Classifier Ensemble
Abstract
The problem of mining single-label data streams has been extensively studied in recent years. However, not enough attention has been paid to the problem of mining multi-label data streams. In this paper, we propose an improved binary relevance method to take advantage of dependence information among class labels, and propose a dynamic classifier ensemble approach for classifying multi-label concept-drifting data streams. The weighted majority voting strategy is used in our classification algorithm. Our empirical study on both synthetic data set and real-life data set shows that the proposed dynamic classifier ensemble with improved binary relevance approach outperforms dynamic classifier ensemble with binary relevance algorithm, and static classifier ensemble with binary relevance algorithm.
Mining Multi-label Concept-Drifting Data Streams Using Dynamic Classifier Ensemble
Streams Using Dynamic Classifier Ensemble
Wei Qu, Yang Zhang, Junping Zhu, and Qiang Qiu
College of Information Engineering, Northwest A&F University
Yangling, Shaanxi Province, P.R. China, 712100
{lex,zhangyang,junpinzhu,qiuqiang}@nwsuaf.edu.cn
Abstract. The problem of mining single-label data streams has been
extensively studied in recent years. However, not enough attention has
been paid to the problem of mining multi-label data streams. In this pa-
per, we propose an improved binary relevance method to take advantage
of dependence information among class labels, and propose a dynamic
classifier ensemble approach for classifying multi-label concept-drifting
data streams. The weighted majority voting strategy is used in our clas-
sification algorithm. Our empirical study on both synthetic data set and
real-life data set shows that the proposed dynamic classifier ensemble
with improved binary relevance approach outperforms dynamic classifier
ensemble with binary relevance algorithm, and static classifier ensemble
with binary relevance algorithm.
Keywords:Multi-label; Data Stream; Concept Drift; Binary Relevance;
Dynamic Classifier Ensemble.
1 Introduction
Modem organization generate tremendous amount of data by real-time produc-
tion systems at unprecedented rates, which is known as data streams [1]. Other
than the data volume, the underlying processes generating the data changes
during the time, sometimes radically [2]. These changes can induce more or less
changes in target concept, and is known as concept drift [3].
For the traditional classification tasks, each example is assigned to a single
label l from a set of disjoint class labels L [5], while for multi-label classification
tasks, each example can be assigned to a set of class labels Y ⊆ L. The problem
of mining single-label data streams has been extensively studied in recent years.
However, not enough attention has been paid to the problem of mining multi-
label data streams, and such data streams are common in the real world. For
example, a user is interested in the articles of certain topics over time, which
may belong to more than one text category. Similarly, in medical diagnosis, a
patient may suffer from multiple illnesses at the same time. During the medical
treatments at different periods, some of them are cured while others still remain.
This work is supported by Young Cadreman Supporting Program of Northwest A&F
University (01140301). Corresponding author: Zhang Yang.
Z.-H. Zhou and T. Washio (Eds.): ACML 2009, LNAI 5828, pp. 308–321, 2009.
c© Springer-Verlag Berlin Heidelberg 2009
In this paper, an dynamic classifier ensemble approach is proposed to tackle
multi-label data streams with concept drifting. We partition the incoming data
stream into chunks, and then we learn an improved binary relevance classifier
from each chunk. For classifying a test example, the weight of each base classifier
in the ensemble is set by the performance of the classifier on the neighbors of the
test example, which are found in most up-to-date chunk by k-nearest neighbor
algorithm. Our experiment on both synthetic data set and real-life data set
shows that the proposed approach has better performance than the comparing
algorithms.
The rest of the paper is organized as follows. Section 2 reviews the related
works. Section 3 presents our improved binary relevance methods. Section 4 out-
lines our algorithm for mining multi-label concept-drifting data streams. Section
5 gives our experiment result, and section 6 concludes this paper and gives our
future work.
2 Related Works
To the best of our knowledge, the only work on multi-label data stream classifica-
tion is our previous work [4], while a lot of works on classification of multi-label
data sets and classification of single-label data streams could be found in the
literature.
Multi-label Classification. Multi-label classification methods could be cat-
egorized into two groups [5]: problem transformation methods, and algorithm
adaptation methods. The first group of methods transforms the original multi-
label classification problems into one or more binary classification problems [6].
The second group of methods extends traditional algorithms to cope with multi-
label classification problems directly [6], including decision trees [7], boosting
[8], probabilistic methods [9], neural networks and support vector machines
[10,11,12,13], lazy methods [14,15] and associative methods [16]. Compared with
problem transformation methods, the algorithm adaptation methods is always
time-consuming, which makes it not suitable for mining multi-label data streams,
as data streams are always characterized by large volume of data.
Data Stream Classification. Incremental or online data stream approaches,
such as CVFDT [2], are always time and space consuming. Ensemble learning
is among the most popular and effective approaches to handle concept drift, in
which a set of concept descriptions built over different time intervals is main-
tained, predictions of which are combined using a form of voting, or the most rel-
evant description is selected [17]. Many ensemble based classification approaches
have been proposed by the research community [1,18,19,20,21,22]. Two initial
papers on classifying data streams by ensemble methods combine the results
of base classifiers by static majority voting [18] and static weighted voting [1].
And later, some dynamic methods [19,20,21,22] were proposed. In dynamic in-
tegration, each base classifier receive a weight proportional to its local accuracy
Sign up today - FREE
Mendeley saves you time finding and organizing research. Learn more
- All your research in one place
- Add and import papers easily
- Access it anywhere, anytime


