Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries

Abstract

SPJ (select-project-join) queries form the backbone of many SQL queries used in practice. Accurate cardinality estimation for these queries is thus an important problem, with applications in query optimization, approximate query processing, and data analytics. However, this problem has not been rigorously addressed in the literature, even though cardinality estimation techniques for each of the three relational operators (selection, projection, and join) have been studied extensively over the past 30+ years, though not in combination. The major technical difficulty is that (distinct) projection seems hard to combine with the other two operators when it comes to cardinality estimation. In this paper, we give the first formal study of cardinality estimation for SP queries. Although this problem was studied in a prior work in 2001, that work offers no guarantee of optimality. We define a class of algorithms, which we call weighted distinct sampling, for estimating SP query sizes, and show how to find a near-optimal sampling strategy that differs from the optimum by only a lower-order term. We then extend it to handle SPJ queries, giving the first non-trivial solution for SPJ cardinality estimation. We have also performed an extensive experimental evaluation to complement our theoretical findings.

Cite

Qiu, Y., Wang, Y., Yi, K., Li, F., Wu, B., & Zhan, C. (2021). Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1465–1477). Association for Computing Machinery. https://doi.org/10.1145/3448016.3452821
