A Supplement to Sampling-Based Methods for Query Size Estimation in a Database System

Abstract

Sampling-based methods for estimating relation sizes after relational operators such as selections, joins and projections have been intensively studied in recent years. Methods of this type can achieve high estimation accuracy and efficiency. Since the dominating overhead involved in a sampling-based method is the sampling cost, different variants of sampling methods have been proposed so as to minimize the sampling percentage while maintaining the estimation accuracy in terms of the confidence level and relative error (to be precisely defined later in Section 2). In order to determine the minimal sampling percentage, the overall characteristics of the data such as the mean and variance are needed. Currently, the representative sampling-based methods in the literature are based on the assumption that the overall characteristics of the data are unavailable, and thus a significant amount of effort is dedicated to estimating these characteristics so as to approach the optimal (minimal) sampling percentage. Estimating these characteristics incurs a cost and is itself subject to estimation error. In this short essay, we point out that the exact values of these characteristics of the data can be kept track of in a database system at negligible overhead. As a result, the minimal sampling percentage that ensures the specified relative error and confidence level can be precisely determined. © 1992, ACM. All rights reserved.
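
The abstract does not spell out how the characteristics are tracked or how the sampling percentage is derived from them, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes the tracked quantity is a single numeric attribute, that the system updates a running count, sum, and sum of squares on every insert and delete, and that the sample size comes from the textbook normal-approximation bound n ≥ (z·σ / (e·μ))², where e is the relative error and z is the normal quantile for the chosen confidence level. The names `ColumnStats` and `required_sample_size` are hypothetical.

```python
import math


class ColumnStats:
    """Exact running count, sum, and sum of squares for one numeric attribute.

    Hypothetical helper: each insert or delete costs O(1), which is one way
    the mean and variance could be kept exact at small overhead.
    """

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def insert(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def delete(self, x):
        self.n -= 1
        self.total -= x
        self.total_sq -= x * x

    def mean(self):
        return self.total / self.n

    def variance(self):
        # Population variance via E[x^2] - (E[x])^2; clamp tiny negative
        # values caused by floating-point rounding.
        m = self.mean()
        return max(self.total_sq / self.n - m * m, 0.0)


def required_sample_size(stats, rel_error, z):
    """Sample size from the normal approximation (an assumption of this
    sketch), capped at the number of tuples currently tracked."""
    n0 = (z / rel_error) ** 2 * stats.variance() / stats.mean() ** 2
    return min(stats.n, math.ceil(n0))
```

Dividing the returned sample size by `stats.n` gives a sampling percentage. Because the mean and variance here are exact rather than estimated from a pilot sample, no extra sampling pass is spent on obtaining them, which is the point the abstract makes.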

Cite

Ling, Y., & Sun, W. (1992). A Supplement to Sampling-Based Methods for Query Size Estimation in a Database System. ACM SIGMOD Record, 21(4), 12–15. https://doi.org/10.1145/141818.141820
