A selectively re-train approach based on clustering to classify concept-drifting data streams with skewed distribution

Abstract

Classification is an important and practical tool that uses a model built on historical data to predict class labels for newly arriving data. In the last few years, there have been many interesting studies on classification in data streams. However, most such studies assume that the data streams are relatively balanced and stable. In practice, skewed data streams (e.g., few positives but many negatives) are both important and typical, and they appear in many real-world applications. Concept drift and skewed distributions, two common properties of data streams, make learning in streams particularly difficult, and traditional data mining algorithms no longer work. In this paper, we propose a method (Selectively Re-train Approach Based on Clustering) that can deal with concept drift and skewed distributions simultaneously. We evaluate our algorithm on both synthetic and real data sets simulating skewed data streams. Empirical results show that the proposed method yields better performance than previous work. © 2014 Springer International Publishing.

Cite

Zhang, D., Shen, H., Hui, T., Li, Y., Wu, J., & Sang, Y. (2014). A selectively re-train approach based on clustering to classify concept-drifting data streams with skewed distribution. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8444 LNAI, pp. 413–424). Springer Verlag. https://doi.org/10.1007/978-3-319-06605-9_34
