Abstract
The design of SQL is based on a three-valued logic (3VL), rather than the familiar two-valued Boolean logic (2VL). In addition totrue andfalse, 3VL addsunknown to handle nulls. Viewed as indispensable for SQL expressiveness, it is often criticized for unintuitive behavior of queries and for being a source of programmer mistakes. We show that, contrary to the widely held view, SQL could have been designed based on 2VL, without any loss of expressiveness. Similarly to SQL's WHERE clause, which only keeps true tuples, we conflate false and unknown for conditions involving nulls to obtain an equally expressive 2VL-based version of SQL. This applies to the core of the 1999 SQL Standard. Queries written under the 2VL semantics can be efficiently translated into the 3VL SQL and thus executed on any existing RDBMS. We show that 2VL enables additional optimizations. To gauge its applicability, we establish criteria under which 2VL and 3VL semantics coincide, and analyze common benchmarks such as TPC-H and TPC-DS to show that most of their queries are such. For queries that behave differently under 2VL and 3VL, we undertake a user study to show a consistent preference for the 2VL semantics.
Author supplied keywords
Cite
CITATION STYLE
Libkin, L., & Peterfreund, L. (2023). SQL Nulls and Two-Valued Logic. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (pp. 11–20). Association for Computing Machinery. https://doi.org/10.1145/3584372.3588661
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.