On Random Sampling Over Joins

Abstract

A major bottleneck in implementing sampling as a primitive relational operation is the inefficiency of sampling the output of a query. It is not even known whether it is possible to generate a sample of a join tree without first evaluating the join tree completely. We undertake a detailed study of this problem and analyze it in a variety of settings. We present theoretical results that explain the difficulty of this problem and set limits on the efficiency that can be achieved. Based on new insights into the interaction between join and sampling, we develop join sampling techniques for the settings where our negative results do not apply. Our new sampling algorithms are significantly more efficient than those previously known. We present an experimental evaluation of our techniques on Microsoft's SQL Server 7.0.
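
To make the difficulty concrete: joining independent uniform samples of two relations does not, in general, yield a uniform sample of their join, because a tuple's chance of appearing in the result should scale with the number of join partners it has. The following is a minimal Python sketch of one way to sample a two-relation equi-join without materializing it, assuming the second relation fits in an in-memory index. The names `sample_join`, `key_r`, and `key_s` are hypothetical, and the sketch illustrates the general weighted-sampling idea rather than reproducing the authors' algorithms.

```python
import random
from collections import defaultdict


def sample_join(r_rows, s_rows, key_r, key_s, k, rng=None):
    """Draw k tuples (with replacement) uniformly from the equi-join of R and S.

    r_rows, s_rows: lists of rows (any objects).
    key_r, key_s:   functions extracting the join key from a row (assumed names).
    """
    rng = rng or random.Random()

    # Index S by join key so matching rows can be looked up directly.
    s_index = defaultdict(list)
    for s in s_rows:
        s_index[key_s(s)].append(s)

    # Weight each R row by its number of join partners in S.
    # .get() avoids creating empty entries in the defaultdict.
    weights = [len(s_index.get(key_r(r), ())) for r in r_rows]
    if sum(weights) == 0:
        return []  # the join is empty, so there is nothing to sample

    # Step 1: pick R rows with probability proportional to their weight.
    picked = rng.choices(r_rows, weights=weights, k=k)

    # Step 2: pair each picked R row with a uniformly chosen partner from S.
    # Each join tuple (r, s) is then chosen with probability
    # (w_r / W) * (1 / w_r) = 1 / W, where W is the join size.
    return [(r, rng.choice(s_index[key_r(r)])) for r in picked]


if __name__ == "__main__":
    R = [(1, "a"), (2, "b"), (3, "c")]
    S = [(1, "x"), (1, "y"), (2, "z")]
    for pair in sample_join(R, S, lambda r: r[0], lambda s: s[0], 5):
        print(pair)
```

The sketch still reads all of both relations once to build the index and the weights; what it avoids is computing the join output itself.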

Citation (APA)

Chaudhuri, S., Motwani, R., & Narasayya, V. (1999). On Random Sampling Over Joins. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 263–274). Association for Computing Machinery. https://doi.org/10.1145/304182.304206
