An adaptive data partitioning scheme for accelerating exploratory Spark SQL queries

Abstract

In exploratory data analysis, users typically issue a sequence of queries against a data set, often using the results of earlier queries to shape later ones. As a result, data touched by earlier queries is frequently reused, at least in part, by subsequent queries. This reuse can be exploited to accelerate queries through data partitioning, a widely used technique that improves I/O performance by skipping irrelevant data. To obtain partitions that are likely to cover the future query workload, we propose an adaptive partitioning scheme that combines data-driven and user-driven metrics to guide partitioning, together with a heuristic model that uses a metric plugin system to support different exploratory patterns. For partition storage and management, we propose a partition index structure that quickly locates partitions suitable for answering a query. The resulting system improves the performance of exploratory queries.
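To make the idea of a partition index concrete, the following is a minimal sketch and not the authors' implementation. It assumes partitions are defined by a half-open value range on a single column, and that a query can be answered from any stored partition whose range fully covers the query's range. The names `Partition`, `PartitionIndex`, and `find_covering` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Partition:
    """Hypothetical stored partition covering [low, high) on one column."""
    column: str
    low: int
    high: int
    rows: int  # row count, used to prefer the smallest covering partition


class PartitionIndex:
    """Toy index: groups partitions by column and scans them linearly."""

    def __init__(self) -> None:
        self._by_column: Dict[str, List[Partition]] = {}

    def add(self, partition: Partition) -> None:
        self._by_column.setdefault(partition.column, []).append(partition)

    def find_covering(self, column: str, low: int, high: int) -> Optional[Partition]:
        """Return the smallest partition whose range contains [low, high), if any."""
        candidates = [
            p for p in self._by_column.get(column, [])
            if p.low <= low and high <= p.high
        ]
        # With no covering partition, the caller would fall back to a full scan.
        return min(candidates, key=lambda p: p.rows, default=None)


index = PartitionIndex()
index.add(Partition("age", 0, 100, rows=1000))
index.add(Partition("age", 20, 40, rows=200))
best = index.find_covering("age", 25, 35)  # both cover; the 200-row one is chosen
```

A query on `age` in `[25, 35)` is covered by both stored partitions, so the sketch picks the smaller one and the remaining data can be skipped. A real index would need a structure faster than a linear scan and would handle multi-column predicates.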

Cite

Guo, C., Wu, Z., He, Z., & Sean Wang, X. (2017). An adaptive data partitioning scheme for accelerating exploratory Spark SQL queries. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10177 LNCS, pp. 114–128). Springer Verlag. https://doi.org/10.1007/978-3-319-55753-3_8
