We consider the problem of tracking with small relative error an integer function f(n) defined by a distributed update stream f′(n) in the distributed monitoring model. In this model, there are k sites over which the updates f′(n) are distributed, and they must communicate with a central coordinator to maintain an estimate of f(n). Existing streaming algorithms with worst-case guarantees for this problem assume f(n) to be monotone; there are very large lower bounds on the space requirements for summarizing a distributed non-monotonic stream, often linear in the size n of the stream. However, the input streams attaining these lower bounds are highly variable, making relatively large jumps from one timestep to the next; in practice, the impact on f(n) of any single update f′(n) is usually small. What has heretofore been lacking is a framework for non-monotonic streams that admits algorithms whose worst-case performance is as good as existing algorithms for monotone streams and degrades gracefully for non-monotonic streams as those streams vary more quickly. In this paper we propose such a framework. We introduce a stream parameter, the "variability" v, deriving its definition in a way that shows it to be a natural parameter to consider for non-monotonic streams. It is also a useful parameter. From a theoretical perspective, we can adapt existing algorithms for monotone streams to work for non-monotonic streams, with only minor modifications, in such a way that they reduce to the monotone case when the stream happens to be monotone, and in such a way that we can refine the worst-case communication bounds from Θ(n) to O(v). From a practical perspective, we demonstrate that v can be small in practice by proving that v is O(log f(n)) for monotone streams and o(n) for streams that are "nearly" monotone or that are generated by random walks. We expect v to be o(n) for many other interesting input classes as well.
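The abstract does not spell out how v is defined. As a rough illustration only, the sketch below assumes v(n) is the sum over timesteps t ≤ n of min(1, |f′(t)/f(t)|), where f(t) is the running total of the updates; this formula is an assumption here and should be checked against the paper. The function name `variability` and the handling of f(t) = 0 (counted as a full unit) are likewise hypothetical choices made for the example.

```python
def variability(updates, start=0):
    """Assumed definition: sum over t of min(1, |f'(t) / f(t)|).

    `updates` is the sequence of integer updates f'(1), f'(2), ...
    f(t) is the running total after applying update t.
    Hypothetical convention: a step that lands on f(t) == 0 contributes 1.
    """
    total = start
    v = 0.0
    for delta in updates:
        total += delta
        if total == 0:
            v += 1.0  # assumed convention for division by zero
        else:
            v += min(1.0, abs(delta) / abs(total))
    return v


# A monotone stream of unit increments: contributions are 1, 1/2, 1/3, ...
monotone = [1] * 1000

# A stream that flips between 1 and 0: every update is large relative to f(t).
alternating = [1, -1] * 500

v_monotone = variability(monotone)
v_alternating = variability(alternating)
```

Under this assumed definition, the monotone stream yields a harmonic sum, which grows logarithmically in f(n) and is consistent with the O(log f(n)) claim, while the alternating stream accrues one unit per update and so has variability linear in n.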
Felber, D., & Ostrovsky, R. (2016). Variability in data streams. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Vol. 26-June-01-July-2016, pp. 251–260). Association for Computing Machinery. https://doi.org/10.1145/2902251.2902277