We compare the cost of estimating the selectivity of a 'star join' using sampling procedure t-cross to the cost of simply computing the join and obtaining the exact answer. Our bounds and approximation for the relative cost of sampling show how this cost depends on the size of the input relations, the number of input relations, and the precision criterion used by the estimation procedure. We also demonstrate the deleterious effect of dangling tuples and the mixed effect of data skew on the relative cost of sampling. These results provide insight into when sampling should or should not be used for join selectivity estimation.
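To make the trade-off concrete, here is a minimal sketch in Python of estimating join selectivity from samples versus computing it exactly. It is not the paper's t-cross procedure. It assumes a two-relation equijoin rather than a star join, represents each relation as a list of join-key values, and uses a fixed sample size in place of a precision criterion. The function names `exact_selectivity` and `sampled_selectivity` are hypothetical.

```python
import random
from collections import Counter


def exact_selectivity(r, s):
    """Exact selectivity of the equijoin on key: |r JOIN s| / (|r| * |s|)."""
    s_counts = Counter(s)
    # Each key in r matches as many tuples as that key has occurrences in s.
    join_size = sum(s_counts[k] for k in r)
    return join_size / (len(r) * len(s))


def sampled_selectivity(r, s, n, rng):
    """Estimate selectivity by joining n-tuple random samples of each input.

    Assumption for illustration: simple random sampling without replacement
    and a fixed sample size n (with n <= len(r) and n <= len(s)).
    """
    r_sample = rng.sample(r, n)
    s_sample = rng.sample(s, n)
    # The selectivity of the sample join estimates that of the full join.
    return exact_selectivity(r_sample, s_sample)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic inputs: keys 50..99 in r have no partner in s (dangling tuples).
    r = [rng.randrange(100) for _ in range(10_000)]
    s = [rng.randrange(50) for _ in range(10_000)]
    print("exact:   ", exact_selectivity(r, s))
    print("estimate:", sampled_selectivity(r, s, 500, rng))
```

In this sketch the sampling path inspects only `n` tuples per relation, while the exact path reads both inputs in full. When many tuples are dangling or the selectivity is very small, few sampled pairs match, so a larger `n` is needed for the same relative precision.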
CITATION STYLE
Haas, P. J., Naughton, J. F., & Swami, A. N. (1994). On the relative cost of sampling for join selectivity estimation. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (pp. 14–24). ACM. https://doi.org/10.1145/182591.182594