Data Streams as Random Permutations: the Distinct Element Problem

  • Helmi A
  • Lumbroso J
  • Martínez C
  • Viola A

Abstract

In this paper, we show that data streams can sometimes usefully be studied as random permutations. This simple observation allows a wealth of classical and recent results from combinatorics to be recycled, with minimal effort, as estimators for various statistics over data streams. We illustrate this by introducing RECORDINALITY, an algorithm which estimates the number of distinct elements in a stream by counting the number of $k$-records occurring in it. The algorithm has a number of interesting properties, such as providing a random sample of the set underlying the stream. To the best of our knowledge, a modified version of RECORDINALITY is the first cardinality estimation algorithm which, in the random-order model, uses neither sampling nor hashing.
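The core idea can be illustrated with a short sketch. The version below uses a hash function to impose a pseudo-random order on the stream (as in the hashed variant described in the abstract; the random-order variant dispenses with hashing). The helper name `recordinality` and the use of MD5 as the hash are illustrative choices, not the paper's code; the estimator formula $k(1+1/k)^{R-k+1}-1$, where $R$ is the number of $k$-records observed, follows the paper's analysis.

```python
import hashlib

def recordinality(stream, k):
    """Sketch of the Recordinality idea: estimate the number of distinct
    elements in a stream by counting k-records of the hashed values.
    (Illustrative implementation; hash collisions are assumed negligible.)"""
    def h(x):
        # 64-bit hash to simulate a random ordering of the elements
        return int.from_bytes(hashlib.md5(str(x).encode()).digest()[:8], "big")

    seen = set()   # the k largest hash values observed so far
    records = 0    # R: number of k-records seen
    for x in stream:
        hx = h(x)
        if hx in seen:
            continue           # duplicate of a currently held element
        if len(seen) < k:
            seen.add(hx)       # the first k distinct values are all records
            records += 1
        elif hx > min(seen):
            seen.remove(min(seen))   # new k-record: evict the smallest
            seen.add(hx)
            records += 1
    if records < k:
        return records         # fewer than k distinct elements: exact count
    # Estimator from the paper: n ≈ k * (1 + 1/k)^(R - k + 1) - 1
    return k * (1 + 1 / k) ** (records - k + 1) - 1
```

Note that `seen` doubles as the random sample mentioned in the abstract: at any point it holds (the hashes of) the $k$ elements with the largest hash values, a uniform random subset of the distinct elements under the random-order assumption.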

Cite

APA

Helmi, A., Lumbroso, J., Martínez, C., & Viola, A. (2012). Data Streams as Random Permutations: the Distinct Element Problem. Discrete Mathematics & Theoretical Computer Science, DMTCS Proceedings vol. AQ, ... (Proceedings). https://doi.org/10.46298/dmtcs.3002
