Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion

11 citations · 47 readers (Mendeley)

Abstract

Text-to-SQL parsers map natural language questions to programs that are executable over tables to generate answers, and are typically evaluated on large-scale datasets like SPIDER (Yu et al., 2018). We argue that existing benchmarks fail to capture a certain out-of-domain generalization problem that is of significant practical importance: matching domain-specific phrases to composite operations over columns. To study this problem, we propose a synthetic dataset and a re-purposed train/test split of the SQUALL dataset (Shi et al., 2020) as new benchmarks to quantify domain generalization over column operations. Our results indicate that existing state-of-the-art parsers struggle on these benchmarks. We propose to address this problem by incorporating prior domain knowledge through preprocessing of table schemas, and design a method that consists of two components: schema expansion and schema pruning. This method can be easily applied to multiple existing base parsers, and we show that it significantly outperforms baseline parsers on this domain generalization problem, boosting the underlying parsers' overall performance by up to a 13.8% relative accuracy gain (5.1% absolute) on the new SQUALL data split.
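The two components named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the schema representation, the single subtraction operation, and the string-match pruning heuristic are all simplifying assumptions made here for demonstration.

```python
from itertools import combinations

def expand_schema(columns):
    """Schema expansion (sketch): add candidate composite columns built by
    applying an operation (here, only subtraction) over pairs of numeric
    columns, so a phrase like "point differential" can map to a column
    the parser can select directly."""
    expanded = list(columns)
    numeric = [c for c in columns if c["type"] == "number"]
    for a, b in combinations(numeric, 2):
        expanded.append({
            "name": f"{a['name']}_minus_{b['name']}",
            "type": "number",
            # SQL expression the composite column stands for
            "expr": f"{a['name']} - {b['name']}",
        })
    return expanded

def prune_schema(expanded, question):
    """Schema pruning (sketch): keep every original column, but keep a
    composite column only if its component column names appear in the
    question. A real system would use a learned or fuzzier matcher."""
    kept = []
    for c in expanded:
        if "expr" not in c:  # original column: always keep
            kept.append(c)
        else:
            parts = c["name"].split("_minus_")
            if all(p.replace("_", " ") in question.lower() for p in parts):
                kept.append(c)
    return kept
```

For example, expanding a schema with numeric columns `points_for` and `points_against` yields a candidate column `points_for_minus_points_against`; pruning against the question "What is the difference between points for and points against?" retains it, while unrelated composites for other questions would be dropped before parsing.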

Citation (APA)

Zhao, C., Su, Y., Pauls, A., & Platanios, E. A. (2022). Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 5568–5578). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.381
