Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries

Yuan Qiu; Yilei Wang; Ke Yi; Feifei Li; Bin Wu; Chaoqun Zhan

Conference Proceedings · Open Access

Proceedings of the ACM SIGMOD International Conference on Management of Data (2021) 1465-1477

DOI: 10.1145/3448016.3452821

Abstract

SPJ (select-project-join) queries form the backbone of many SQL queries used in practice. Accurate cardinality estimation for these queries is thus an important problem, with applications in query optimization, approximate query processing, and data analytics. However, the problem has not been rigorously addressed in the literature, even though cardinality estimation techniques for each of the three relational operators (selection, projection, and join) have been studied extensively in isolation, but not in combination, over the past 30+ years. The major technical difficulty is that (distinct) projection appears hard to combine with the other two operators when it comes to cardinality estimation. In this paper, we give the first formal study of cardinality estimation for SP queries. Although the problem was studied in prior work in 2001, that work offers no guarantee of optimality. We define a class of algorithms, which we call weighted distinct sampling, for estimating SP query sizes, and show how to find a near-optimal sampling strategy that differs from the optimum only by a lower-order term. We then extend the approach to handle SPJ queries, giving the first non-trivial solution for SPJ cardinality estimation. We have also performed an extensive experimental evaluation to complement our theoretical findings.
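
To make the SP estimation problem concrete, the sketch below shows one generic way a hash-based distinct sample could answer "how many distinct projected values survive a selection?". It is an illustration only and is not taken from the paper: the function names (`unit_hash`, `build_sample`, `estimate_sp`), the per-value inclusion probability `prob`, and the Horvitz-Thompson style reweighting are all assumptions made here, and the abstract does not say how the authors choose their weights.

```python
import hashlib


def unit_hash(value, seed=0):
    """Deterministically map a value to a float in [0, 1)."""
    digest = hashlib.sha256(f"{seed}:{value!r}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def build_sample(rows, project, prob, seed=0):
    """Keep every row whose projected value is sampled.

    Hypothetical design: prob(v) is the inclusion probability of the
    distinct projected value v. Because the decision depends only on the
    hash of v, either all rows sharing v are kept or none are.
    """
    return [r for r in rows if unit_hash(project(r), seed) < prob(project(r))]


def estimate_sp(sample, project, predicate, prob):
    """Estimate the number of distinct projected values among rows that
    satisfy the predicate, weighting each sampled value by 1 / prob(v)."""
    qualifying = {project(r) for r in sample if predicate(r)}
    return sum(1.0 / prob(v) for v in qualifying)


# Toy relation of (customer, amount) rows; the query asks for
# COUNT(DISTINCT customer) WHERE amount > 10.
rows = [("a", 5), ("a", 20), ("b", 7), ("c", 30), ("c", 1)]
project = lambda r: r[0]
predicate = lambda r: r[1] > 10

# With inclusion probability 1 the sample is the whole table, so the
# estimate coincides with the exact answer.
full = build_sample(rows, project, prob=lambda v: 1.0)
exact = estimate_sp(full, project, predicate, prob=lambda v: 1.0)
```

Sampling by distinct value rather than by row is what lets the selection be applied after sampling: for any sampled value, all of its rows are available, so whether it qualifies under the predicate can be decided exactly. How to pick the inclusion probabilities well is the question the abstract attributes to the paper, and this sketch does not attempt to answer it.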

Citation (APA)

Qiu, Y., Wang, Y., Yi, K., Li, F., Wu, B., & Zhan, C. (2021). Weighted Distinct Sampling: Cardinality Estimation for SPJ Queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1465–1477). Association for Computing Machinery. https://doi.org/10.1145/3448016.3452821
