T5QL: Taming language models for SQL generation

2 citations · 30 Mendeley readers

Abstract

Automatic SQL generation has been an active research area, aiming to streamline access to databases by letting users express their intent in natural language instead of writing SQL. Current state-of-the-art (SOTA) methods for semantic parsing depend on large language models (LLMs) to achieve high predictive accuracy on benchmark datasets. This limits their applicability, since LLMs require expensive GPUs. Furthermore, SOTA methods are ungrounded and thus not guaranteed to always generate valid SQL. Here we propose T5QL, a new SQL generation method that improves performance on benchmark datasets when using smaller LMs, namely T5-Base, by ≈13pp compared against SOTA methods. Additionally, T5QL is guaranteed to always output valid SQL by using a context-free grammar to constrain SQL generation. Finally, we show that dividing semantic parsing into two tasks, candidate SQL generation and candidate re-ranking, is a promising research avenue that can reduce the need for large LMs.
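The validity guarantee described above rests on grammar-constrained decoding: at each generation step, the decoder may only pick among tokens the grammar allows, so the output is valid SQL by construction. The following is a minimal, hypothetical sketch of that idea with a toy token-level grammar and a stand-in scoring function; it is an illustration of the technique, not T5QL's actual implementation.

```python
# Toy token-level grammar: state -> {allowed token: next state}.
# A real system would derive this from a full SQL context-free grammar.
GRAMMAR = {
    "start": {"SELECT": "cols"},
    "cols":  {"*": "from", "name": "from", "id": "from"},
    "from":  {"FROM": "table"},
    "table": {"users": "end", "orders": "end"},
    "end":   {"<eos>": None},
}

def constrained_greedy_decode(score_fn, max_len=10):
    """Greedy decoding restricted to grammar-legal tokens.

    score_fn(prefix, token) -> float stands in for a language model's
    next-token score (e.g. a T5 logit).
    """
    state, tokens = "start", []
    while state is not None and len(tokens) < max_len:
        allowed = GRAMMAR[state]  # grammar mask: legal continuations only
        best = max(allowed, key=lambda t: score_fn(tokens, t))
        if best == "<eos>":
            break
        tokens.append(best)
        state = allowed[best]
    return " ".join(tokens)

# A deliberately adversarial scorer that prefers ungrammatical continuations
# (e.g. FROM everywhere); the grammar mask still forces a valid query.
def bad_scorer(prefix, token):
    return {"FROM": 5.0, "users": 3.0, "SELECT": 1.0, "*": 1.0}.get(token, 0.0)

print(constrained_greedy_decode(bad_scorer))  # SELECT * FROM users
```

Even with a scorer that strongly favors out-of-place tokens, the decoder can only emit grammar-valid queries, which is the property the paper exploits to make smaller LMs reliable.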

Citation (APA)
Arcadinho, S., Aparício, D., Veiga, H., & Alegria, A. (2022). T5QL: Taming language models for SQL generation. In GEM 2022 - 2nd Workshop on Natural Language Generation, Evaluation, and Metrics, Proceedings of the Workshop (pp. 276–286). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.gem-1.23
