Out of Many We are One: Measuring Item Batch with Clock-Sketch

Abstract

An item batch is a consecutive sequence of identical items that are close in time in a data stream. It is a useful data stream pattern in caching, burst detection, APT detection, etc. Basic item batch measurement tasks include membership, cardinality, time span, and size. Currently, no algorithm is tailored for item batch measurement. The greatest challenge lies in accurately estimating the time gap between two consecutive identical items. In this paper, we propose Clock-sketch, a framework that introduces the well-known CLOCK algorithm into item batch measurement. The methodology of Clock-sketch is to clean outdated information as much as possible, while guaranteeing that the information of all items visited within the time window $\mathcal{T}$ is preserved. We conduct experiments on three real-world datasets that exhibit the item batch pattern. We compare the accuracy and throughput of Clock-sketch against the state of the art and two naive approaches that do not use the Clock-sketch technique. Results on item batch activeness show that Clock-sketch outperforms the state-of-the-art SWAMP, achieving a false positive rate 50 times lower when memory is small. All source code is open-sourced and released on GitHub.
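
To make the cleaning idea concrete, the following is a minimal Python sketch of a CLOCK-style membership (activeness) test. It is an illustration under stated assumptions, not the authors' implementation: the class name `ClockMembershipSketch`, its parameters, the choice of hash function, and the sweep rate are all hypothetical. The assumed design is a Bloom-filter-like array of small counters: an insertion sets the item's cells to the maximum value, and a pointer sweeps the array decrementing cells, paced so that a cell set to the maximum cannot reach zero within one window $\mathcal{T}$.

```python
import hashlib


class ClockMembershipSketch:
    """Illustrative CLOCK-style activeness filter (hypothetical design)."""

    def __init__(self, num_cells, num_hashes, bits_per_cell, window):
        if bits_per_cell < 2:
            raise ValueError("bits_per_cell must be at least 2")
        self.m = num_cells
        self.k = num_hashes
        self.max_val = (1 << bits_per_cell) - 1
        self.cells = [0] * num_cells
        # Assumed pacing: (max_val - 1) full sweeps per window, so a cell set
        # to max_val is decremented at most (max_val - 1) times within T.
        self.steps_per_time_unit = num_cells * (self.max_val - 1) / window
        self.pointer = 0
        self.steps_done = 0  # total pointer steps performed so far

    def _indexes(self, item):
        # k cell positions derived from seeded BLAKE2b digests.
        for seed in range(self.k):
            data = f"{seed}:{item}".encode()
            digest = hashlib.blake2b(data, digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.m

    def _advance(self, now):
        # Catch the pointer up to time `now` (timestamps must not decrease).
        target = int(now * self.steps_per_time_unit)
        while self.steps_done < target:
            if self.cells[self.pointer] > 0:
                self.cells[self.pointer] -= 1
            self.pointer = (self.pointer + 1) % self.m
            self.steps_done += 1

    def insert(self, item, now):
        self._advance(now)
        for i in self._indexes(item):
            self.cells[i] = self.max_val

    def is_active(self, item, now):
        # True if the item may have been seen within the last window.
        # False positives are possible (hash collisions, delayed cleaning).
        self._advance(now)
        return all(self.cells[i] > 0 for i in self._indexes(item))
```

Under these assumptions, an item inserted at time `t` is always reported active up to `t + window`, and its cells are cleared after at most one further sweep period unless the item reappears. Hash collisions with other recent items can still yield false positives.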

Chen, P., Chen, D., Zheng, L., Li, J., & Yang, T. (2021). Out of Many We are One: Measuring Item Batch with Clock-Sketch. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 261–273). Association for Computing Machinery. https://doi.org/10.1145/3448016.3452784
