Automated Translation of Functional Big Data Queries to SQL

4Citations
Citations of this article
13Readers
Mendeley users who have this article in their library.

Abstract

Big data analytics frameworks like Apache Spark and Flink enable users to implement queries over large, distributed databases using functional APIs. In recent years, these APIs have grown in popularity because their functional interfaces abstract away much of the minutiae of distributed programming required by traditional query languages like SQL. However, the convenience of these APIs comes at a cost because functional queries are often less efficient than their SQL counterparts. Motivated by this observation, we present a new technique for automatically transpiling functional queries to SQL. While our approach is based on the standard paradigm of counterexample-guided inductive synthesis, it uses a novel column-wise decomposition technique to split the synthesis task into smaller subquery synthesis problems. We have implemented this approach as a new tool called RDD2SQL for translating Spark RDD queries to SQL and empirically evaluate the effectiveness of RDD2SQL on a set of real-world RDD queries. Our results show that (1) most RDD queries can be translated to SQL, (2) our tool is very effective at automating this translation, and (3) performing this translation offers significant performance benefits.

Cite

CITATION STYLE

APA

Zhang, G., Mariano, B., Shen, X., & Dillig, I. (2023). Automated Translation of Functional Big Data Queries to SQL. Proceedings of the ACM on Programming Languages, 7(OOPSLA1). https://doi.org/10.1145/3586047

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free