Diverse Parallel Data Synthesis for Cross-Database Adaptation of Text-to-SQL Parsers

Abhijeet Awasthi; Ashutosh Sathe; Sunita Sarawagi

Conference Proceedings

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (2022) 11548-11562

DOI: 10.18653/v1/2022.emnlp-main.794

Abstract

Text-to-SQL parsers typically struggle with databases unseen at training time. Adapting parsers to new databases is challenging because natural language queries are lacking for the new schemas. We present REFILL, a framework for synthesizing high-quality and textually diverse parallel datasets for adapting a Text-to-SQL parser to a target schema. REFILL learns to retrieve and edit text queries from existing schemas and transfer them to the target schema. We show that retrieving diverse existing text, masking its schema-specific tokens, and refilling the masks with tokens relevant to the target schema leads to significantly more diverse text queries than standard SQL-to-Text generation methods achieve. Through experiments spanning multiple databases, we demonstrate that fine-tuning parsers on datasets synthesized using REFILL consistently outperforms prior data-augmentation methods.
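The mask-and-refill idea described in the abstract can be illustrated with a toy sketch. This is not the REFILL implementation: the paper describes learned retrieval and editing, whereas the code below uses plain string matching and in-order substitution as stand-ins. The function names `mask_schema_tokens` and `refill`, the `[MASK]` placeholder, and the example schema terms are all hypothetical, chosen only for illustration.

```python
import re

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mask_schema_tokens(text, schema_terms):
    """Replace each whole-word occurrence of a source-schema term with MASK."""
    # Longest terms first so multi-word names are masked before their substrings.
    for term in sorted(schema_terms, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(term)}\b", MASK, text, flags=re.IGNORECASE)
    return text


def refill(template, target_terms):
    """Fill the masks left to right with target-schema terms.

    A learned model would choose the fill-in tokens; here the caller
    supplies them in order (an assumption made to keep the sketch small).
    """
    n_masks = template.count(MASK)
    if n_masks != len(target_terms):
        raise ValueError(f"expected {n_masks} terms, got {len(target_terms)}")
    terms = iter(target_terms)
    return re.sub(re.escape(MASK), lambda _match: next(terms), template)


# A query written for a hypothetical source schema about singers ...
source_query = "What is the average age of all singers?"
template = mask_schema_tokens(source_query, ["age", "singers"])

# ... transferred to a hypothetical target schema about employees.
target_query = refill(template, ["salary", "employees"])
```

Here `template` keeps the schema-independent phrasing of the original question, which is the part that carries textual diversity over to the target schema.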

Cite

Awasthi, A., Sathe, A., & Sarawagi, S. (2022). Diverse Parallel Data Synthesis for Cross-Database Adaptation of Text-to-SQL Parsers. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 11548–11562). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.794
