Mining high-speed data streams

Pedro Domingos; Geoff Hulten

Conference Proceedings

Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2000) 71-80

DOI: 10.1145/347090.347107

1.8k Citations

865 Readers

Abstract

Many organizations today have more than very large databases; they have databases that grow without limit at a rate of several million records per day. Mining these continuous data streams brings unique opportunities, but also new challenges. This paper describes and evaluates VFDT, an anytime system that builds decision trees using constant memory and constant time per example. VFDT can incorporate tens of thousands of examples per second using off-the-shelf hardware. It uses Hoeffding bounds to guarantee that its output is asymptotically nearly identical to that of a conventional learner. We study VFDT's properties and demonstrate its utility through an extensive set of experiments on synthetic data. We apply VFDT to mining the continuous stream of Web access data from the whole University of Washington main campus.
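The abstract names Hoeffding bounds without stating them. As a rough illustration only, the sketch below computes the standard Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and uses it in a simplified split test of the kind a streaming decision-tree learner might apply. The function names `hoeffding_bound` and `can_split`, their parameters, and the example numbers are assumptions made for this sketch; they are not taken from the paper or from the VFDT implementation.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound.

    With probability at least 1 - delta, the true mean of a random variable
    whose values span `value_range` is within the returned epsilon of the
    mean observed over n independent samples.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of samples")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain: float, second_gain: float,
              value_range: float, delta: float, n: int) -> bool:
    """Hypothetical, simplified split test (an assumption of this sketch).

    Returns True when the observed gap between the two best candidate
    attributes exceeds the Hoeffding bound, so the leader is unlikely to
    change after seeing more examples.
    """
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Illustrative numbers only: gains in [0, 1], delta = 1e-7, 1000 examples.
    eps = hoeffding_bound(1.0, 1e-7, 1000)
    print(f"epsilon after 1000 examples: {eps:.4f}")
    print("split on a 0.15 gap:", can_split(0.30, 0.15, 1.0, 1e-7, 1000))
    print("split on a 0.05 gap:", can_split(0.30, 0.25, 1.0, 1e-7, 1000))
```

Because epsilon shrinks as n grows, a decision deferred at one point in the stream can be made later once enough examples have accumulated, which is consistent with the anytime, constant-memory behavior the abstract describes.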

Cite

Domingos, P., & Hulten, G. (2000). Mining high-speed data streams. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 71–80). Association for Computing Machinery (ACM). https://doi.org/10.1145/347090.347107
