On the relative cost of sampling for join selectivity estimation

Abstract

We compare the cost of estimating the selectivity of a 'star join' using sampling procedure t-cross to the cost of simply computing the join and obtaining the exact answer. Our bounds and approximation for the relative cost of sampling show how this cost depends on the size of the input relations, the number of input relations, and the precision criterion used by the estimation procedure. We also demonstrate the deleterious effect of dangling tuples and the mixed effect of data skew on the relative cost of sampling. These results provide insight into when sampling should or should not be used for join selectivity estimation.
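
To make the comparison concrete, the following is a minimal sketch of estimating an equijoin's selectivity from samples versus computing it exactly. It is not the t-cross procedure from the paper. It is a generic cross-product sampling estimator, restricted to two relations rather than a full star join. The function names (`join_size`, `exact_selectivity`, `sampled_selectivity`) and the representation of a relation as a list of join-key values are assumptions made only for illustration.

```python
import random
from collections import Counter


def join_size(r_keys, s_keys):
    """Count the pairs (r, s) whose join keys are equal."""
    s_counts = Counter(s_keys)
    # A missing key contributes 0, so dangling tuples in r add nothing.
    return sum(s_counts[k] for k in r_keys)


def exact_selectivity(r_keys, s_keys):
    """Join size divided by the size of the full cross product."""
    return join_size(r_keys, s_keys) / (len(r_keys) * len(s_keys))


def sampled_selectivity(r_keys, s_keys, n_r, n_s, rng):
    """Estimate selectivity by joining simple random samples
    (drawn without replacement) of the two relations."""
    r_sample = rng.sample(r_keys, n_r)
    s_sample = rng.sample(s_keys, n_s)
    return join_size(r_sample, s_sample) / (n_r * n_s)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: join keys drawn uniformly from a small domain.
    r = [rng.randrange(100) for _ in range(5000)]
    s = [rng.randrange(100) for _ in range(5000)]
    print("exact:  ", exact_selectivity(r, s))
    print("sampled:", sampled_selectivity(r, s, 200, 200, rng))
```

In this sketch the sampled estimate touches only `n_r` and `n_s` tuples, while the exact computation scans both relations in full. How large the samples must be to meet a given precision criterion is the question the paper's bounds address.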

Cite

Haas, P. J., Naughton, J. F., & Swami, A. N. (1994). On the relative cost of sampling for join selectivity estimation. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (pp. 14–24). ACM. https://doi.org/10.1145/182591.182594
