T5QL: Taming language models for SQL generation

2 citations · 30 Mendeley readers

Abstract

Automatic SQL generation has been an active research area, aiming to streamline access to databases by letting users express their intent in natural language instead of writing SQL. Current state-of-the-art (SOTA) methods for semantic parsing depend on large language models (LLMs) to achieve high predictive accuracy on benchmark datasets. This limits their applicability, since LLMs require expensive GPUs. Furthermore, SOTA methods are ungrounded and thus not guaranteed to always generate valid SQL. Here we propose T5QL, a new SQL generation method that improves performance on benchmark datasets when using smaller LMs, namely T5-Base, by ≈13pp compared against SOTA methods. Additionally, T5QL is guaranteed to always output valid SQL by using a context-free grammar to constrain SQL generation. Finally, we show that dividing semantic parsing into two tasks, candidate SQL generation and candidate re-ranking, is a promising research avenue that can reduce the need for large LMs.
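The validity guarantee described above rests on grammar-constrained decoding: at each generation step, the decoder may only pick among tokens the grammar allows, so the output is valid SQL by construction. The following is a minimal, hypothetical sketch of that idea with a toy token-level grammar and a stand-in scoring function; it is an illustration of the technique, not T5QL's actual implementation.

```python
# Toy token-level grammar: state -> {allowed token: next state}.
# A real system would derive this from a full SQL context-free grammar.
GRAMMAR = {
    "start": {"SELECT": "cols"},
    "cols":  {"*": "from", "name": "from", "id": "from"},
    "from":  {"FROM": "table"},
    "table": {"users": "end", "orders": "end"},
    "end":   {"<eos>": None},
}

def constrained_greedy_decode(score_fn, max_len=10):
    """Greedy decoding restricted to grammar-legal tokens.

    score_fn(prefix, token) -> float stands in for a language model's
    next-token score (e.g. a T5 logit).
    """
    state, tokens = "start", []
    while state is not None and len(tokens) < max_len:
        allowed = GRAMMAR[state]  # grammar mask: legal continuations only
        best = max(allowed, key=lambda t: score_fn(tokens, t))
        if best == "<eos>":
            break
        tokens.append(best)
        state = allowed[best]
    return " ".join(tokens)

# A deliberately adversarial scorer that prefers ungrammatical continuations
# (e.g. FROM everywhere); the grammar mask still forces a valid query.
def bad_scorer(prefix, token):
    return {"FROM": 5.0, "users": 3.0, "SELECT": 1.0, "*": 1.0}.get(token, 0.0)

print(constrained_greedy_decode(bad_scorer))  # SELECT * FROM users
```

Even with a scorer that strongly favors out-of-place tokens, the decoder can only emit grammar-valid queries, which is the property the paper exploits to make smaller LMs reliable.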

Citation (APA)
Arcadinho, S., Aparício, D., Veiga, H., & Alegria, A. (2022). T5QL: Taming language models for SQL generation. In GEM 2022 - 2nd Workshop on Natural Language Generation, Evaluation, and Metrics, Proceedings of the Workshop (pp. 276–286). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.gem-1.23
