MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Abstract

Text-to-SQL semantic parsing is an important NLP task that greatly facilitates interaction between users and databases and has become a key component of many human-computer interaction systems. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MULTISPIDER, the largest multilingual text-to-SQL dataset, which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Using MULTISPIDER, we further identify the lexical and structural challenges of text-to-SQL (caused by language-specific properties and dialectal expressions) and their intensity across different languages. Experimental results under three typical settings (zero-shot, monolingual, and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. We conduct qualitative and quantitative analyses to understand the reasons for the performance drop in each language. Besides the dataset, we also propose a simple schema augmentation framework, SAVE (Schema-Augmentation-with-Verification), which significantly boosts overall performance by about 1.8% and closes the 29.5% performance gap across languages.

Cite

Dou, L., Gao, Y., Pan, M., Wang, D., Che, W., Zhan, D., & Lou, J. G. (2023). MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023 (Vol. 37, pp. 12745–12753). AAAI Press. https://doi.org/10.1609/aaai.v37i11.26499
