Two-level sampling for Join size estimation


Abstract

Join size estimation is a critical step in query optimization, and has been extensively studied in the literature. Among the many techniques, sampling-based approaches are particularly appealing, due to their ability to handle arbitrary selection predicates. In this paper, we propose a new sampling algorithm for join size estimation, called two-level sampling, which combines the advantages of three previous sampling methods while making further improvements. Both analytical and empirical comparisons show that the new algorithm outperforms all the previous algorithms on a variety of joins, including primary key-foreign key joins, many-to-many joins, and multi-table joins. The new sampling algorithm is also very easy to implement, requiring just one pass over the data. It only relies on some basic statistical information about the data, such as the 4-norms and the heavy hitters.
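To make the general idea of sampling-based join size estimation concrete, here is a minimal sketch of a simple baseline: independent Bernoulli sampling of both tables, followed by scaling the sample join size by 1/(p*q). This is not the two-level sampling algorithm proposed in the paper, whose details are not given in the abstract. The function names, the (key, value) row layout, and the predicate parameters are all assumptions made for illustration. The sketch shows why sampling copes with arbitrary selection predicates: the predicate is simply evaluated on the sampled rows.

```python
import random
from collections import Counter


def bernoulli_sample(rows, p, rng):
    """Keep each row independently with probability p (hypothetical helper)."""
    return [r for r in rows if rng.random() < p]


def join_size(left, right, pred_left=lambda r: True, pred_right=lambda r: True):
    """Exact size of the equi-join on r[0], after applying selection predicates."""
    # Count how often each join key occurs among the qualifying right rows.
    counts = Counter(r[0] for r in right if pred_right(r))
    # Every qualifying left row matches counts[key] right rows.
    return sum(counts[r[0]] for r in left if pred_left(r))


def estimate_join_size(left, right, p, q, rng,
                       pred_left=lambda r: True, pred_right=lambda r: True):
    """Baseline estimator: join two independent Bernoulli samples, then scale.

    A joining pair of rows survives sampling with probability p * q, so
    dividing the sample join size by p * q gives an unbiased estimate.
    """
    if not (0 < p <= 1 and 0 < q <= 1):
        raise ValueError("sampling rates must be in (0, 1]")
    s_left = bernoulli_sample(left, p, rng)
    s_right = bernoulli_sample(right, q, rng)
    return join_size(s_left, s_right, pred_left, pred_right) / (p * q)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic many-to-many data: keys drawn uniformly from 0..99.
    left = [(rng.randrange(100), i) for i in range(5000)]
    right = [(rng.randrange(100), i) for i in range(5000)]
    # An arbitrary selection predicate on the right table.
    even_value = lambda r: r[1] % 2 == 0
    print("exact:   ", join_size(left, right, pred_right=even_value))
    print("estimate:", estimate_join_size(left, right, 0.1, 0.1, rng,
                                          pred_right=even_value))
```

This baseline is unbiased, but its variance can be high when few sampled rows share join keys, which is the kind of weakness that motivates more refined sampling schemes.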

Citation (APA)

Chen, Y., & Yi, K. (2017). Two-level sampling for Join size estimation. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Vol. Part F127746, pp. 759–774). Association for Computing Machinery. https://doi.org/10.1145/3035918.3035921
