On the relative cost of sampling for join selectivity estimation

Peter J. Haas; Jeffrey F. Naughton; Arun N. Swami

Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS), 1994, pp. 14–24

DOI: 10.1145/182591.182594

Abstract

We compare the cost of estimating the selectivity of a 'star join' using the sampling procedure t-cross with the cost of simply computing the join and obtaining the exact answer. Our bounds on, and approximation to, the relative cost of sampling show how this cost depends on the sizes of the input relations, the number of input relations, and the precision criterion used by the estimation procedure. We also demonstrate the deleterious effect of dangling tuples and the mixed effect of data skew on the relative cost of sampling. These results provide insight into when sampling should or should not be used for join selectivity estimation.
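
To make the two alternatives in the abstract concrete, here is a minimal sketch that computes a star join's selectivity exactly and also estimates it from the join of independent per-relation samples (cross-product sampling). Several things here are assumptions, not taken from the paper. The data layout reads a 'star join' as a central relation R0 whose i-th attribute is equated with the join key of satellite relation Ri. The function names are hypothetical. The sampler uses fixed sample sizes and simple random sampling without replacement. The paper's t-cross procedure may sample differently and adds a precision-driven stopping rule, which this sketch omits. Selectivity is taken as |R0 ⋈ R1 ⋈ ... ⋈ Rk| / (|R0|·|R1|·...·|Rk|).

```python
import random
from collections import Counter
from math import prod

# Hypothetical layout (assumption): center[j][i] is the value of row j that
# joins with satellite i; satellites[i] lists the join-key values of R_i.

def star_join_size(center, satellites):
    """Exact join size: one hash-count pass per satellite, then one pass over the center."""
    counts = [Counter(sat) for sat in satellites]
    # A center row contributes the product of its match counts in every satellite;
    # Counter returns 0 for a missing key, so dangling rows contribute nothing.
    return sum(prod(c[row[i]] for i, c in enumerate(counts)) for row in center)

def exact_selectivity(center, satellites):
    """Join size divided by the size of the full cross product."""
    return star_join_size(center, satellites) / (len(center) * prod(len(s) for s in satellites))

def cross_sample_estimate(center, satellites, sizes, rng):
    """Join independent without-replacement samples of each relation and
    divide by the size of the samples' cross product (unbiased for selectivity)."""
    n0, *ns = sizes
    s_center = rng.sample(center, n0)
    s_sats = [rng.sample(sat, n) for sat, n in zip(satellites, ns)]
    return star_join_size(s_center, s_sats) / (n0 * prod(ns))

center = [(1, "x"), (1, "y"), (2, "x"), (3, "z")]
satellites = [[1, 1, 2], ["x", "y", "y"]]
print(exact_selectivity(center, satellites))  # 7/36, about 0.194
print(cross_sample_estimate(center, satellites, (2, 2, 2), random.Random(0)))
```

Every combination in the full cross product lands in the samples' cross product with probability Π(n_i/N_i), so the scaled count is unbiased. In the toy data, the center row (3, "z") is a dangling tuple: it matches nothing in either satellite.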

Cite (APA)

Haas, P. J., Naughton, J. F., & Swami, A. N. (1994). On the relative cost of sampling for join selectivity estimation. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (pp. 14–24). ACM. https://doi.org/10.1145/182591.182594