WavingSketch: An Unbiased and Generic Sketch for Finding Top-k Items in Data Streams

Abstract

Finding top-k items in data streams is a fundamental problem in data mining. Existing algorithms that can achieve unbiased estimation suffer from poor accuracy. In this paper, we propose a new sketch, WavingSketch, which is much more accurate than existing unbiased algorithms. WavingSketch is generic, and we show how it can be applied to four applications: finding top-k frequent items, finding top-k heavy changes, finding top-k persistent items, and finding top-k Super-Spreaders. We theoretically prove that WavingSketch provides unbiased estimation, and then give an error bound for our algorithm. Our experimental results show that, compared with the state of the art, WavingSketch has 4.50 times higher insertion speed and up to 9 × 10^6 times (2 × 10^4 times on average) lower error rate in finding frequent items when memory is tight. For the other applications, WavingSketch also achieves up to 286 times lower error rate. All related code is open-sourced and available on GitHub anonymously.
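
The abstract does not describe WavingSketch's data structure, so the following is not the paper's algorithm. It is only a minimal illustration of what "unbiased estimation" means for a sketch, assuming a single Count Sketch-style signed counter in which every item is given a uniformly random sign of +1 or -1. The function names (`signed_counter_estimates`, `mean_estimates`) and the toy stream are hypothetical. Averaging the estimate over every possible sign assignment recovers each item's true frequency exactly, even though any single assignment can be far off.

```python
from collections import Counter
from itertools import product


def signed_counter_estimates(stream, signs):
    """Estimate every item's frequency from ONE shared signed counter."""
    counter = 0
    for item in stream:
        counter += signs[item]  # each arrival adds that item's sign (+1 or -1)
    # Multiplying by the item's own sign cancels its sign in the counter;
    # other items contribute noise whose sign depends on the assignment.
    return {item: counter * signs[item] for item in signs}


def mean_estimates(stream):
    """Average the estimates over every possible sign assignment."""
    items = sorted(set(stream))
    assignments = list(product((+1, -1), repeat=len(items)))
    totals = Counter()
    for choice in assignments:
        signs = dict(zip(items, choice))
        for item, estimate in signed_counter_estimates(stream, signs).items():
            totals[item] += estimate
    return {item: totals[item] / len(assignments) for item in items}


# Toy stream (an assumption for illustration): a x5, b x3, c x1.
stream = ["a"] * 5 + ["b"] * 3 + ["c"] * 1

# One particular assignment is noisy: with all signs +1 every estimate is 9.
print(signed_counter_estimates(stream, {"a": 1, "b": 1, "c": 1}))

# The mean over all assignments equals the true frequencies.
print(mean_estimates(stream))  # {'a': 5.0, 'b': 3.0, 'c': 1.0}
```

The gap between the single-assignment estimate and the mean is the variance that an unbiased sketch still has to control; the abstract's accuracy claims concern that error, not the bias.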

Cite

Li, J., Li, Z., Xu, Y., Jiang, S., Yang, T., Cui, B., … Zhang, G. (2020). WavingSketch: An Unbiased and Generic Sketch for Finding Top-k Items in Data Streams. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1574–1584). Association for Computing Machinery. https://doi.org/10.1145/3394486.3403208
