Abstract
Relational database systems rely on the join operator to assemble data for answering queries. Although the order of (natural) joins, here called the strategy for computing the joins, does not affect the final result, it does determine to a large extent the response time of the query. Query optimizers therefore try to pick an optimal strategy. In practice, optimizers usually restrict their search for an optimal strategy to strategies that are linear (e.g., of the form ((R1 ⋈ R2) ⋈ R3) ⋈ R4), or that avoid Cartesian products, or both. The purpose of this paper is to examine the conditions under which an optimizer can find an optimal strategy, despite having restricted the scope of its search. Specifically, sufficient conditions are given under which (1) a linear strategy that is optimum will not use Cartesian products, (2) there is an optimum strategy that does not use Cartesian products, and (3) there is an optimum strategy that is linear and that does not use Cartesian products. (Optimality is with respect to the number of tuples generated by a strategy.) The necessity of these conditions is illustrated through examples. The conditions do not assume uniformity in the distribution of attribute values, nor independence in the attributes. Instead, they are either a formalization of heuristic assumptions, or based on semantic constraints. For example, the conditions are satisfied if all join attributes form superkeys. The analytic framework can be adapted for database acyclicity, lossless joins, unions, and intersections. © 1993, ACM. All rights reserved.
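To make the terminology concrete, the following is a minimal Python sketch, not taken from the paper, that enumerates every join strategy for a small database and reports its cost. It assumes that "number of tuples generated" means the sum of the sizes of all intermediate results plus the final result; the paper's exact definition may differ. The names `Rel`, `join`, `strategies`, `evaluate`, and `is_linear`, and the toy relations, are hypothetical.

```python
from itertools import combinations

class Rel:
    """A relation with named attributes and set semantics (illustrative only)."""
    def __init__(self, attrs, rows):
        self.attrs = tuple(attrs)
        self.rows = {tuple(row) for row in rows}

def join(r, s):
    """Natural join; degenerates to a Cartesian product if no attribute is shared."""
    shared = [a for a in r.attrs if a in s.attrs]
    extra = [a for a in s.attrs if a not in r.attrs]
    rows = set()
    for x in r.rows:
        xd = dict(zip(r.attrs, x))
        for y in s.rows:
            yd = dict(zip(s.attrs, y))
            if all(xd[a] == yd[a] for a in shared):
                rows.add(x + tuple(yd[a] for a in extra))
    return Rel(r.attrs + tuple(extra), rows)

def strategies(names):
    """Yield every strategy (unordered binary tree) over the given relation names."""
    names = tuple(names)
    if len(names) == 1:
        yield names[0]
        return
    first, rest = names[0], names[1:]
    # Keep `first` in the left operand so mirror-image trees are not repeated.
    for k in range(len(rest)):
        for left_rest in combinations(rest, k):
            right = tuple(n for n in rest if n not in left_rest)
            for left_tree in strategies((first,) + left_rest):
                for right_tree in strategies(right):
                    yield (left_tree, right_tree)

def evaluate(strategy, db):
    """Return (result, tuples generated, whether a Cartesian product was used)."""
    if isinstance(strategy, str):
        return db[strategy], 0, False  # base relations generate no new tuples
    left, left_cost, left_cart = evaluate(strategy[0], db)
    right, right_cost, right_cart = evaluate(strategy[1], db)
    out = join(left, right)
    cartesian = not (set(left.attrs) & set(right.attrs))
    return out, left_cost + right_cost + len(out.rows), left_cart or right_cart or cartesian

def is_linear(strategy):
    """A strategy is linear if every join has at least one base relation as operand."""
    if isinstance(strategy, str):
        return True
    left, right = strategy
    return (isinstance(left, str) and is_linear(right)) or \
           (isinstance(right, str) and is_linear(left))

# Toy chain schema R1(a,b), R2(b,c), R3(c,d); the data is made up for illustration.
db = {
    "R1": Rel("ab", [(1, 1), (2, 1)]),
    "R2": Rel("bc", [(1, 1), (1, 2)]),
    "R3": Rel("cd", [(1, 1)]),
}
report = {}
for s in strategies(db):
    _, cost, cartesian = evaluate(s, db)
    report[s] = (cost, cartesian, is_linear(s))
    print(s, "cost =", cost, "cartesian =", cartesian, "linear =", is_linear(s))
```

On this toy data the three strategies cost 3, 6, and 4 tuples, and the cheapest one, `('R1', ('R2', 'R3'))`, is linear and avoids Cartesian products. The strategy that uses a Cartesian product, `(('R1', 'R3'), 'R2')`, still beats one of the product-free strategies, which shows why restricting the search is not automatically safe and why conditions of the kind the abstract describes are needed.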
Cite
Tay, Y. C. (1993). On the Optimality of Strategies for Multiple Join. Journal of the ACM (JACM), 40(5), 1067–1086. https://doi.org/10.1145/174147.174151