Two-level sampling for join size estimation

Yu Chen; Ke Yi

Conference Proceedings

Proceedings of the ACM SIGMOD International Conference on Management of Data (2017), Vol. Part F127746, pp. 759–774

DOI: 10.1145/3035918.3035921

Abstract

Join size estimation is a critical step in query optimization and has been extensively studied in the literature. Among the many techniques, sampling-based approaches are particularly appealing, due to their ability to handle arbitrary selection predicates. In this paper, we propose a new sampling algorithm for join size estimation, called two-level sampling, which combines the advantages of three previous sampling methods while making further improvements. Both analytical and empirical comparisons show that the new algorithm outperforms all the previous algorithms on a variety of joins, including primary key-foreign key joins, many-to-many joins, and multi-table joins. The new sampling algorithm is also very easy to implement, requiring just one pass over the data. It relies only on some basic statistical information about the data, such as the ℓk-norms and the heavy hitters.

Citation (APA)

Chen, Y., & Yi, K. (2017). Two-level sampling for join size estimation. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Vol. Part F127746, pp. 759–774). Association for Computing Machinery. https://doi.org/10.1145/3035918.3035921
