On classification of high-cardinality data streams

Charu C. Aggarwal; Philip S. Yu

Conference Proceedings

Proceedings of the 10th SIAM International Conference on Data Mining, SDM 2010 (2010) 802-813

DOI: 10.1137/1.9781611972801.70

Abstract

In massive-domain stream classification, each attribute can take on one of a very large number of possible values. Such streams arise in applications such as IP monitoring, superstore transactions, and financial data. Traditional stream classification models cannot be used in these cases because the storage required for intermediate model statistics grows rapidly with domain size. The one-pass constraint of data stream computation makes the problem even more challenging. There are no known methods for data stream classification in this setting. In this paper, we propose the use of massive-domain counting methods for effective modeling and classification. We show that this sketch-based approach yields accurate solutions while remaining efficient in both space and time. Copyright © by SIAM.
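The abstract does not say which counting structure or classifier the paper uses, so the following is only a minimal sketch under stated assumptions: a count-min sketch stands in for the "massive-domain counting method", and a naive-Bayes-style score stands in for the classifier. The names CountMinSketch and SketchNaiveBayes, the width and depth defaults, and the smoothing constants are all hypothetical choices for illustration. It shows how per-class statistics can be kept in fixed memory and updated in one pass, regardless of how many distinct attribute values appear.

```python
import hashlib
import math


class CountMinSketch:
    """Fixed-size approximate counter; estimates never fall below the true count."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One deterministic hash per row, obtained by salting with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode(), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key):
        # Collisions can only inflate a cell, so the minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._cells(key))


class SketchNaiveBayes:
    """Hypothetical classifier: one sketch per class label (an assumption)."""

    def __init__(self, labels, width=2048, depth=4):
        self.sketches = {label: CountMinSketch(width, depth) for label in labels}
        self.class_counts = {label: 0 for label in labels}

    def update(self, record, label):
        # Single pass: each record is seen once and then discarded.
        self.class_counts[label] += 1
        for attr, value in record.items():
            self.sketches[label].add(f"{attr}={value}")

    def predict(self, record):
        total = sum(self.class_counts.values())
        num_labels = len(self.class_counts)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Smoothed log prior plus smoothed log likelihood per attribute.
            score = math.log((n + 1) / (total + num_labels))
            for attr, value in record.items():
                count = self.sketches[label].estimate(f"{attr}={value}")
                count = min(count, n)  # clamp any overestimate to the class size
                score += math.log((count + 1) / (n + 2))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = SketchNaiveBayes(labels=["attack", "normal"])
for _ in range(50):
    model.update({"src_ip": "10.0.0.1", "port": 25}, "attack")
    model.update({"src_ip": "10.0.0.2", "port": 443}, "normal")
prediction = model.predict({"src_ip": "10.0.0.1", "port": 25})
```

Memory use here is fixed at width × depth counters per class, independent of the number of distinct values in the stream. The trade-off is that counts are approximate and can only be overestimated.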

Citation (APA)

Aggarwal, C. C., & Yu, P. S. (2010). On classification of high-cardinality data streams. In Proceedings of the 10th SIAM International Conference on Data Mining, SDM 2010 (pp. 802–813). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611972801.70
