Quantiles on Streams

Chiranjeeb Buragohain; Subhash Suri

Book Chapter

Springer New York, 2016, pp. 1–6

DOI: 10.1007/978-1-4899-7993-3_290-2

Abstract

SYNONYMS

Median; histogram; selection; order statistics

DEFINITION

Quantiles are order statistics of data: the φ-quantile (0 ≤ φ ≤ 1) of a set S is an element x such that φ|S| elements of S are less than or equal to x and the remaining (1 − φ)|S| are greater than x. This article describes data stream (single-pass) algorithms for computing approximations of such quantiles.

HISTORICAL BACKGROUND

The need to summarize data has been around since the earliest days of data processing. Large volumes of raw, unstructured data easily overwhelm the human ability to comprehend or digest them, and tools that help identify the major underlying trends or patterns in data have enormous value. Quantiles characterize the distributions of real-world data sets in ways that are less sensitive to outliers than simpler alternatives such as the mean and the variance. Consequently, quantiles are of interest to both database implementers and users: for instance, they are a fundamental tool for query optimization, for partitioning data in parallel database systems, and for statistical data analysis.

Quantiles are closely related to the familiar concepts of frequency distributions and histograms. The cumulative frequency distribution F(·) is commonly used to summarize the distribution of a (totally ordered) set S. Specifically, for any value x, F(x) is the number of values in S less than or equal to x:

$$F(x) = \bigl|\{\, y \in S : y \le x \,\}\bigr| \tag{1}$$

The quantile Q(φ), or the φ-th quantile, is simply the inverse of the normalized distribution F(x)/n, where n = |S|. Specifically, every element x of S satisfies

$$Q\!\left(\frac{F(x)}{n}\right) = x \tag{2}$$
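The relationship between F and Q can be made concrete with a minimal Python sketch of the exact, non-streaming quantities. It keeps all of S sorted in memory, so it is not one of the single-pass algorithms the article describes; it only pins down what those algorithms approximate. The function names `cdf_count` and `quantile`, the example data, and the convention of rounding the target rank up when φ|S| is not an integer are illustrative assumptions, not taken from the article.

```python
import bisect
import math
from fractions import Fraction
from typing import Sequence, Union


def cdf_count(sorted_s: Sequence[float], x: float) -> int:
    """F(x): how many values of S are less than or equal to x."""
    # bisect_right returns the index just past every copy of x, which in a
    # sorted sequence equals the number of elements less than or equal to x.
    return bisect.bisect_right(sorted_s, x)


def quantile(sorted_s: Sequence[float], phi: Union[float, Fraction]) -> float:
    """Q(phi): the smallest element x of S with F(x) >= phi * |S|.

    Assumed convention (not from the article): when phi * |S| is not an
    integer, round the target rank up. Pass phi as a Fraction to avoid
    floating-point error in phi * |S|.
    """
    n = len(sorted_s)
    if n == 0:
        raise ValueError("S must be non-empty")
    if not 0 <= phi <= 1:
        raise ValueError("phi must lie in [0, 1]")
    # 1-based rank of the answer; phi = 0 maps to the minimum element.
    rank = max(1, math.ceil(phi * n))
    return sorted_s[rank - 1]


# Hypothetical example data, sorted once up front.
S = sorted([7, 1, 9, 3, 3, 5, 8, 2])  # [1, 2, 3, 3, 5, 7, 8, 9]
n = len(S)

# Equation (2): Q(F(x)/n) == x for every element x of S.
for x in S:
    assert quantile(S, Fraction(cdf_count(S, x), n)) == x

print(quantile(S, Fraction(1, 2)))  # prints 3 (4 of the 8 values are <= 3)
```

Using `bisect_right` rather than `bisect_left` is what makes F count values equal to x, matching the "less than or equal to" in the φ-quantile definition; with duplicates (the two 3s above) this is exactly what keeps equation (2) true.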

Citation (APA)

Buragohain, C., & Suri, S. (2016). Quantiles on Streams. In Encyclopedia of Database Systems (pp. 1–6). Springer New York. https://doi.org/10.1007/978-1-4899-7993-3_290-2