Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play

Abstract

The task of context-dependent text-to-SQL aims to convert multi-turn user utterances into formal SQL queries. The task is challenging both because training data from which to learn complex contextual dependencies is scarce and because models must generalize to unseen databases. In this paper we explore augmenting the training datasets using self-play, which leverages contextual information to synthesize new interactions and adapt the model to new databases. We first design a SQL-to-text model conditioned on a sampled goal query, which represents a user's intent; this model then converses with a text-to-SQL semantic parser to generate new interactions. We then filter the synthesized interactions and retrain the models on the augmented data. We find that self-play improves the accuracy of a strong baseline on SParC and CoSQL, two widely used cross-domain text-to-SQL datasets. Our analysis shows that self-play simulates various conversational thematic relations, enhances cross-domain generalization, and improves beam search.
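The self-play loop the abstract describes can be sketched schematically as follows. This is a minimal illustrative sketch, not the authors' implementation: the two stub functions stand in for the neural SQL-to-text (simulated user) and text-to-SQL (parser) models, and all names and interfaces here are assumptions.

```python
# Hypothetical stand-ins for the paper's two neural models. In the real
# system these are trained SQL-to-text and text-to-SQL models; here they
# are toy functions so the loop structure is runnable.
def sql_to_text(goal_sql, history):
    # Simulated user: produce the next utterance working toward the goal query.
    return f"turn-{len(history)}: ask about {goal_sql}"

def text_to_sql(utterance, history):
    # Semantic parser: map the utterance (given the dialogue history) to SQL.
    return utterance.split("ask about ")[-1]

def self_play_episode(goal_sql, max_turns=3):
    """Let the simulated user and the parser converse until the parser's
    prediction matches the sampled goal query (or turns run out)."""
    history = []
    for _ in range(max_turns):
        utterance = sql_to_text(goal_sql, history)
        predicted_sql = text_to_sql(utterance, history)
        history.append((utterance, predicted_sql))
        if predicted_sql == goal_sql:  # goal reached: stop the episode
            break
    return history

def augment(goal_queries):
    """Filtering step: keep only episodes whose final prediction matches
    the goal query, then use them as extra training interactions."""
    synthetic = []
    for goal in goal_queries:
        episode = self_play_episode(goal)
        if episode and episode[-1][1] == goal:
            synthetic.append((goal, episode))
    return synthetic
```

In the paper's setup the accepted interactions are added to the training set and both models are retrained on the augmented data; the filtering keeps only dialogues where the parser recovers the sampled goal query.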

Citation (APA)

Liu, Q., Ye, Z., Yu, T., Blunsom, P., & Song, L. (2022). Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play. In Findings of the Association for Computational Linguistics: EMNLP 2022 (pp. 5637–5649). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-emnlp.141
