LearnedSQLGen: Constraint-aware SQL Generation using Reinforcement Learning

13Citations
Citations of this article
29Readers
Mendeley users who have this article in their library.

Abstract

Many database optimization problems, e.g., slow SQL diagnosis, database testing, optimizer tuning, require a large volume of SQL queries. Due to privacy issues, it is hard to obtain real SQL queries, and thus SQL generation is a very important task in database optimization. Existing SQL generation methods either randomly generate SQL queries or rely on human-crafted SQL templates to generate SQL queries, but they cannot meet various user specific requirements, e.g., slow SQL queries, SQL queries with large result sizes. To address this problem, this paper studies the problem of constraint-aware SQL generation, which, given a constraint (e.g., cardinality within [1k,2k]), generates SQL queries satisfying the constraint. This problem is rather challenging, because it is rather hard to capture the relationship from query constraint (e.g., cardinality and cost) to SQL queries and thus it is hard to guide a generation method to explore the SQL generation direction towards meeting the constraint. To address this challenge, we propose a reinforcement learning (RL) based framework LearnedSQLGen, for generating queries satisfying the constraint. LearnedSQLGen adopts an exploration-exploitation strategy that exploits the generation direction following the query constraint, which is learned from query execution feedback. We judiciously design the reward function in RL to guide the generation process accurately. We integrate a finite-state machine to generate valid SQL queries. Experimental results on three benchmarks showed that LearnedSQLGen significantly outperformed the baselines in terms of both accuracy (30% better) and efficiency (10-35 times).

Cite

CITATION STYLE

APA

Zhang, L., Chai, C., Zhou, X., & Li, G. (2022). LearnedSQLGen: Constraint-aware SQL Generation using Reinforcement Learning. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 945–958). Association for Computing Machinery. https://doi.org/10.1145/3514221.3526155

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free