Abstract
Join size estimation is a critical step in query optimization, and has been extensively studied in the literature. Among the many techniques, sampling-based approaches are particularly appealing, due to their ability to handle arbitrary selection predicates. In this paper, we propose a new sampling algorithm for join size estimation, called two-level sampling, which combines the advantages of three previous sampling methods while making further improvements. Both analytical and empirical comparisons show that the new algorithm outperforms all the previous algorithms on a variety of joins, including primary key-foreign key joins, many-to-many joins, and multi-table joins. The new sampling algorithm is also very easy to implement, requiring just one pass over the data. It only relies on some basic statistical information about the data, such as the ℓ_k-norms and the heavy hitters.
Cite
Chen, Y., & Yi, K. (2017). Two-level sampling for join size estimation. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Vol. Part F127746, pp. 759–774). Association for Computing Machinery. https://doi.org/10.1145/3035918.3035921