Estimating rarity and similarity over data stream windows

Abstract

In the windowed data stream model, we observe items arriving over time. At any time $t$, we consider the window of the last $N$ observations $a_{t-(N-1)}, a_{t-(N-2)}, \ldots, a_t$, where each $a_i \in \{1, \ldots, u\}$; we are required to support queries about the data in the window. A crucial restriction is that we are only allowed $o(N)$ (often polylogarithmic in $N$) storage space, so not all items within the window can be archived. We study two basic problems in the windowed data stream model. The first is estimating the rarity of items in the window. The second is estimating the similarity between two data stream windows using the Jaccard coefficient. The problems of estimating rarity and similarity have many applications in mining massive data sets. We present novel, simple algorithms for estimating rarity and similarity on windowed data streams, accurate to within a factor of $1 \pm \epsilon$ using space only logarithmic in the window size.
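To make the two quantities concrete, the following is a minimal exact baseline, not the algorithm from the paper. It stores the whole window and therefore uses O(N) space, which is exactly what the paper's small-space estimators avoid. The abstract does not define rarity; the sketch assumes the common definition of alpha-rarity as the fraction of distinct items in the window that occur exactly alpha times. The function names `window`, `rarity`, and `jaccard` are hypothetical, chosen only for illustration.

```python
from collections import Counter, deque


def window(stream, n):
    """Return the last n items of an iterable (exact, O(n) space)."""
    return list(deque(stream, maxlen=n))


def rarity(items, alpha=1):
    """Fraction of distinct items that occur exactly alpha times.

    Assumed definition: the abstract itself does not spell it out.
    """
    counts = Counter(items)
    if not counts:
        return 0.0
    rare = sum(1 for c in counts.values() if c == alpha)
    return rare / len(counts)


def jaccard(items_a, items_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of the distinct items."""
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)
```

For example, `rarity(window(stream, N))` gives the exact 1-rarity of the current window, and `jaccard(window(s1, N), window(s2, N))` gives the exact similarity of two windows. Exact values like these are useful mainly as ground truth when checking an approximate estimator.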

Cite

Datar, M., & Muthukrishnan, S. (2002). Estimating rarity and similarity over data stream windows. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2461, pp. 323–335). Springer Verlag. https://doi.org/10.1007/3-540-45749-6_31
