Data Sketching for Real Time Analytics: Theory and Practice

Daniel Ting; Jonathan Malkin; Lee Rhodes

Conference Proceedings, Open Access

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2020) 3567-3568

DOI: 10.1145/3394486.3406480

Abstract

Speed, cost, and scale: these are three of the biggest challenges in analyzing big data. While modern data systems continue to push the boundaries of scale, the problems of speed and cost are fundamentally tied to the size of the data being scanned or processed. Processing thousands of queries that each access terabytes of data with sub-second latency remains infeasible. Data sketching techniques provide a means to drastically reduce this size, allowing for real-time or interactive data analysis at reduced cost, but with approximate answers. This tutorial covers a number of useful data sketching and sampling methods and demonstrates their use with the Apache DataSketches project. We focus particularly on common problems in analytics, such as counting distinct items, quantiles, histograms, heavy hitters, and aggregations with large group-bys. For these, we cover algorithms, techniques, and theory that can aid both practitioners and theorists in constructing sketches and designing systems that achieve desired error guarantees. For practitioners and implementers, we show how some of these sketches can be easily instantiated using the Apache DataSketches project.
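
To make the idea of trading exactness for size concrete, the following is a minimal sketch of one classic distinct-counting technique, the K-minimum-values (KMV) estimator. It is an illustration only: it is not the Apache DataSketches API and is not taken from the tutorial. The class name `KMVSketch`, the parameter `k`, and the choice of BLAKE2b as the hash function are all assumptions made for this example.

```python
import hashlib
import heapq


class KMVSketch:
    """Illustrative K-minimum-values sketch for approximate distinct counting."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []         # max-heap (stored as negated values) of the k smallest hashes
        self._members = set()   # hashes currently retained, used to skip duplicates

    @staticmethod
    def _hash01(item):
        # Map an item to a pseudo-uniform value in [0, 1) using a 64-bit hash.
        digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") / 2**64

    def update(self, item):
        h = self._hash01(item)
        if h in self._members:
            return  # already retained, so a repeated item changes nothing
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # exact while under capacity
        kth_smallest = -self._heap[0]
        # Standard KMV estimator: (k - 1) divided by the k-th smallest hash value.
        return (self.k - 1) / kth_smallest


if __name__ == "__main__":
    sketch = KMVSketch(k=1024)
    for i in range(100_000):
        sketch.update(f"user-{i}")
        sketch.update(f"user-{i}")  # duplicates do not affect the estimate
    print(round(sketch.estimate()))
```

The sketch retains at most `k` hash values no matter how many items it sees, which is the source of the size reduction described above. The price is an approximate answer, with relative standard error on the order of 1/sqrt(k) for this estimator.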

Citation (APA)

Ting, D., Malkin, J., & Rhodes, L. (2020). Data Sketching for Real Time Analytics: Theory and Practice. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 3567–3568). Association for Computing Machinery. https://doi.org/10.1145/3394486.3406480
