The frequent items problem is to process a stream of items and find all items that occur more than a given fraction of the time. It is one of the most heavily studied problems in data stream algorithms, dating back to the 1980s. Many applications rely directly or indirectly on finding the frequent items, and implementations are in use in large-scale industrial systems.

Informally, given a sequence of items, the problem is simply to find those items that occur most frequently. Typically, this is formalized as finding all items whose frequency exceeds a specified fraction of the total number of items. Variations arise when the items are given weights, and further when these weights can also be negative.

Definition 1. Given a stream $S$ of $n$ items $t_1, \ldots, t_n$, the frequency of an item $i$ is $f_i = |\{j \mid t_j = i\}|$. The exact $\varphi$-frequent items comprise the set $\{i \mid f_i > \varphi n\}$.

Example. The stream $S = (a, b, a, c, c, a, b, d)$ has $n = 8$ and $f_a = 3$, $f_b = 2$, $f_c = 2$, $f_d = 1$. For $\varphi = 0.2$, the threshold is $\varphi n = 1.6$, so the frequent items are $a$, $b$ and $c$.

A streaming algorithm that solves this problem exactly must use a linear amount of space, even for large values of $\varphi$. To see this, suppose we are given an algorithm that claims to solve the problem. We could insert a set $S$ of $N$ items, where every item has frequency 1, and then insert $N$ copies of a query item $i$. The stream now holds $2N$ items, and the frequency of $i$ is $N + 1$ if $i \in S$ and $N$ otherwise. So if $i$ is reported as a frequent item (occurring more than 50% of the time), then $i \in S$; otherwise $i \notin S$. Consequently, since set membership requires $\Omega(N)$ space, $\Omega(N)$ space is also required to solve the frequent items problem. Instead, an approximate version is defined based on a tolerance for error.
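Definition 1 and the lower-bound reduction can be illustrated with a minimal Python sketch. It is not a streaming algorithm: it keeps an exact count for every distinct item, which is precisely the linear space that the argument shows is unavoidable. The function names `exact_frequent_items` and `is_member` are hypothetical, chosen only for this illustration.

```python
from collections import Counter


def exact_frequent_items(stream, phi):
    """Return the set {i : f_i > phi * n}, using exact (linear-space) counts."""
    counts = Counter(stream)        # f_i for every distinct item i
    n = sum(counts.values())        # total stream length n
    return {item for item, f in counts.items() if f > phi * n}


def is_member(distinct_items, query):
    """Answer a set-membership query through the reduction in the text.

    Assumes `distinct_items` holds N distinct items (each with frequency 1).
    Appending N copies of `query` gives it frequency N + 1 if it was already
    present and N otherwise, out of 2N items in total.
    """
    n_items = len(distinct_items)
    stream = list(distinct_items) + [query] * n_items
    return query in exact_frequent_items(stream, 0.5)


# The example stream from the text, with phi = 0.2 (threshold 0.2 * 8 = 1.6).
stream = ["a", "b", "a", "c", "c", "a", "b", "d"]
print(sorted(exact_frequent_items(stream, 0.2)))  # ['a', 'b', 'c']
```

Because the threshold test is a strict inequality, an item absent from the set ends up at exactly 50% of the extended stream and is not reported, which is what makes `is_member` distinguish the two cases.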
Cormode, G. (2014). Misra-Gries Summaries. In Encyclopedia of Algorithms (pp. 1–5). Springer US. https://doi.org/10.1007/978-3-642-27848-8_572-1