A Survey of Data Stream Processing Tools

Marcin Gorawski; Anna Gorawska; Krzysztof Pasterak

Book Chapter

Springer International Publishing (2014), pp. 295–303

DOI: 10.1007/978-3-319-09465-6_31

Abstract

Data imbalance is known to significantly hinder the generalization performance of supervised learning algorithms. A common strategy to overcome this challenge is synthetic oversampling, in which synthetic minority-class examples are generated to balance the distribution between the majority and minority classes. We present a novel adaptive oversampling algorithm, Virtual, that combines the benefits of oversampling and active learning. Unlike traditional resampling methods, which require preprocessing of the data, Virtual generates synthetic minority-class examples during the training process; it therefore removes the need for an extra preprocessing stage. In the context of learning with Support Vector Machines, we demonstrate that Virtual outperforms competing oversampling techniques in terms of both generalization performance and computational complexity. © 2013 Springer International Publishing.

Citation (APA)

Gorawski, M., Gorawska, A., & Pasterak, K. (2014). A Survey of Data Stream Processing Tools. In Information Sciences and Systems 2014 (pp. 295–303). Springer International Publishing. https://doi.org/10.1007/978-3-319-09465-6_31