Sketching a large mathematical object means producing a compact data structure that approximately represents it. Much work has focused on sketching large vectors: building a small "sketch" of the vector from which key properties, such as the norm of the vector or estimates of individual entries, can be retrieved. The Count-Min (CM) Sketch is a sketch that allows a number of related quantities to be estimated with accuracy guarantees, including point queries and dot product queries. Such queries are at the core of many computations, so the structure can be used to answer a variety of other queries, such as frequent items (heavy hitters), quantile finding, join size estimation, and more. Since the sketch can process updates in the form of additions or subtractions to dimensions of the vector (which may correspond to insertions or deletions, or other transactions), it can operate over streams of updates at high rates. The data structure maintains linear projections of the vector onto a number of random vectors, which are defined implicitly by simple hash functions. Increasing the range of the hash functions increases the accuracy of the summary, and increasing the number of hash functions decreases the probability of a bad estimate. These tradeoffs are quantified precisely below. Because the sketch is a linear function of the vector, CM sketches can be scaled, added, and subtracted to produce summaries of the corresponding scaled and combined vectors.
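The structure described above can be illustrated with a minimal Python sketch. It is not the author's reference implementation: the class name `CountMinSketch`, its method names, the `width`/`depth`/`seed` parameters, and the choice of hash family (`((a*x + b) mod p) mod width` with random `a`, `b`) are all assumptions made for illustration. Here `width` plays the role of the hash range and `depth` the number of hash functions. The minimum-based estimates shown assume the underlying vector has non-negative entries.

```python
import random


class CountMinSketch:
    """Illustrative Count-Min sketch over integer-indexed vector dimensions."""

    PRIME = (1 << 61) - 1  # assumed modulus for the hash family (a Mersenne prime)

    def __init__(self, width, depth, seed=0):
        self.width = width    # range of each hash function
        self.depth = depth    # number of hash functions (one counter row each)
        self.seed = seed
        rng = random.Random(seed)
        # One (a, b) pair per row defines the hash h(x) = ((a*x + b) mod p) mod width.
        self.coeffs = [
            (rng.randrange(1, self.PRIME), rng.randrange(0, self.PRIME))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, index):
        a, b = self.coeffs[row]
        return ((a * index + b) % self.PRIME) % self.width

    def update(self, index, delta=1):
        """Add delta (possibly negative) to dimension `index` of the vector."""
        for row in range(self.depth):
            self.table[row][self._bucket(row, index)] += delta

    def point_query(self, index):
        """Estimate one entry; never an underestimate for non-negative vectors."""
        return min(
            self.table[row][self._bucket(row, index)]
            for row in range(self.depth)
        )

    def _check_compatible(self, other):
        same = (self.width, self.depth, self.seed) == (
            other.width, other.depth, other.seed)
        if not same:
            raise ValueError("sketches must share width, depth, and seed")

    def inner_product(self, other):
        """Estimate the dot product of two sketched non-negative vectors."""
        self._check_compatible(other)
        return min(
            sum(x * y for x, y in zip(r1, r2))
            for r1, r2 in zip(self.table, other.table)
        )

    def merged(self, other, scale=1):
        """Return the sketch of (self's vector + scale * other's vector)."""
        self._check_compatible(other)
        out = CountMinSketch(self.width, self.depth, self.seed)
        out.table = [
            [x + scale * y for x, y in zip(r1, r2)]
            for r1, r2 in zip(self.table, other.table)
        ]
        return out


# Usage: sketch two small streams, then query and combine them.
s = CountMinSketch(width=64, depth=4, seed=1)
for item, count in {3: 5, 17: 2, 42: 9}.items():
    s.update(item, count)
t = CountMinSketch(width=64, depth=4, seed=1)
t.update(3, 1)
estimate = s.point_query(42)      # at least 9, the true value
combined = s.merged(t)            # summarizes the sum of the two vectors
difference = s.merged(t, scale=-1)  # summarizes their difference
```

Two sketches can only be combined when they were built with the same hash functions, which is why `merged` and `inner_product` check that width, depth, and seed match. In the standard analysis for non-negative vectors, a width of ⌈e/ε⌉ and a depth of ⌈ln(1/δ)⌉ bound the overestimate of a point query by ε times the sum of the vector's entries, with probability at least 1 − δ.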
Cormode, G. (2016). Count-Min Sketch. In Encyclopedia of Algorithms (pp. 464–468). Springer New York. https://doi.org/10.1007/978-1-4939-2864-4_579