An adaptive data partitioning scheme for accelerating exploratory spark SQL queries

Chenghao Guo; Zhigang Wu; Zhenying He; X. Sean Wang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10177 LNCS 114-128

DOI: 10.1007/978-3-319-55753-3_8

Abstract

For data analysis, it is useful to explore a data set with a sequence of queries, often using the results of previous queries to shape the next ones. Consequently, data accessed by earlier queries are often reused, at least in part, by later ones. This reuse can be exploited to accelerate queries through data partitioning, a widely used technique that lets the system skip irrelevant data and thereby improves I/O performance. To obtain effective partitions that are likely to cover the future query workload, we propose an adaptive partitioning scheme that combines data-driven and user-driven metrics to guide data partitioning, together with a heuristic model that uses a metric plugin system to support different exploratory patterns. For partition storage and management, we propose an effective partition index structure for quickly finding the appropriate partitions to answer a query. The system helps improve the performance of exploratory queries.
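The abstract does not describe the partition index itself. As a rough illustration of the underlying idea (skipping partitions that cannot contain matching rows), the following Python sketch assumes each partition covers a contiguous range [lo, hi] of a single key column and keeps partitions sorted by their lower bound. `Partition`, `PartitionIndex`, and `lookup` are hypothetical names chosen for this example; they are not the authors' structure and not a Spark SQL API.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical partition holding rows whose key lies in [lo, hi]."""
    name: str
    lo: int
    hi: int


class PartitionIndex:
    """Toy range index over partitions (an illustrative assumption,
    not the index structure proposed in the paper)."""

    def __init__(self, partitions):
        # Sort by lower bound so a binary search can cut off every
        # partition that starts after the query range ends.
        self._parts = sorted(partitions, key=lambda p: p.lo)
        self._los = [p.lo for p in self._parts]

    def lookup(self, q_lo, q_hi):
        """Return partitions whose key range overlaps [q_lo, q_hi];
        all other partitions can be skipped without being read."""
        end = bisect_right(self._los, q_hi)  # partitions with lo <= q_hi
        return [p for p in self._parts[:end] if p.hi >= q_lo]


if __name__ == "__main__":
    index = PartitionIndex([
        Partition("p0", 0, 99),
        Partition("p1", 100, 199),
        Partition("p2", 200, 299),
        Partition("p3", 300, 399),
    ])
    # A predicate such as key BETWEEN 150 AND 250 touches only p1 and p2.
    hits = index.lookup(150, 250)
    print([p.name for p in hits])  # ['p1', 'p2']
```

A sorted list with binary search is the simplest choice for this sketch. If partitions overlap heavily, the filter still scans the whole prefix, and an interval tree would be a better fit.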

Citation (APA)

Guo, C., Wu, Z., He, Z., & Wang, X. S. (2017). An adaptive data partitioning scheme for accelerating exploratory spark SQL queries. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10177 LNCS, pp. 114–128). Springer Verlag. https://doi.org/10.1007/978-3-319-55753-3_8