Automated Translation of Functional Big Data Queries to SQL

Guoqiang Zhang; Benjamin Mariano; Xipeng Shen; Işıl Dillig

Journal Article, Open Access

Proceedings of the ACM on Programming Languages (2023) 7(OOPSLA1)

DOI: 10.1145/3586047

Abstract

Big data analytics frameworks like Apache Spark and Flink enable users to implement queries over large, distributed databases using functional APIs. In recent years, these APIs have grown in popularity because their functional interfaces abstract away much of the minutiae of distributed programming required by traditional query languages like SQL. However, the convenience of these APIs comes at a cost because functional queries are often less efficient than their SQL counterparts. Motivated by this observation, we present a new technique for automatically transpiling functional queries to SQL. While our approach is based on the standard paradigm of counterexample-guided inductive synthesis, it uses a novel column-wise decomposition technique to split the synthesis task into smaller subquery synthesis problems. We have implemented this approach as a new tool called RDD2SQL for translating Spark RDD queries to SQL and empirically evaluate the effectiveness of RDD2SQL on a set of real-world RDD queries. Our results show that (1) most RDD queries can be translated to SQL, (2) our tool is very effective at automating this translation, and (3) performing this translation offers significant performance benefits.
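To make the translation task concrete, the following is a minimal sketch of one functional query, one candidate SQL translation, and a check that the two agree on a test input. It is an illustration only and not the RDD2SQL implementation: plain Python collections stand in for a Spark RDD, `sqlite3` stands in for the SQL engine, and the table `t`, its columns `name` and `amount`, and the helpers `functional_query`, `sql_query`, and `agree_on` are all hypothetical names chosen for this example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical input table of (name, amount) rows.
rows = [("ann", 5), ("bob", 12), ("ann", 20), ("cat", 3), ("bob", 8)]


def functional_query(data):
    """RDD-style pipeline: filter rows, map to key/value pairs, sum per key."""
    filtered = filter(lambda r: r[1] > 4, data)
    pairs = map(lambda r: (r[0], r[1]), filtered)
    totals = defaultdict(int)
    for key, value in pairs:  # stand-in for a reduce-by-key with addition
        totals[key] += value
    return sorted(totals.items())


# Candidate SQL translation of the pipeline above (an assumption of this
# sketch, written by hand rather than synthesized).
CANDIDATE_SQL = """
    SELECT name, SUM(amount)
    FROM t
    WHERE amount > 4
    GROUP BY name
    ORDER BY name
"""


def sql_query(data):
    """Run the candidate SQL over the same rows in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", data)
    result = conn.execute(CANDIDATE_SQL).fetchall()
    conn.close()
    return result


def agree_on(data):
    """Return True when both queries produce the same result on this input."""
    return functional_query(data) == sql_query(data)


print(agree_on(rows))
```

Agreement on a handful of inputs only tests a candidate; it does not prove equivalence. In a counterexample-guided loop, an input on which the two queries disagree would be fed back to the synthesizer to rule out that candidate.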

Cite

Zhang, G., Mariano, B., Shen, X., & Dillig, I. (2023). Automated Translation of Functional Big Data Queries to SQL. Proceedings of the ACM on Programming Languages, 7(OOPSLA1). https://doi.org/10.1145/3586047
