GHive: Accelerating Analytical Query Processing in Apache Hive via CPU-GPU Heterogeneous Computing


Abstract

As a popular distributed data warehouse system, Apache Hive has been widely used for big data analytics in many organizations. Meanwhile, exploiting the massive parallelism of GPUs to accelerate online analytical processing (OLAP) has been extensively explored in the database community. In this paper, we present GHive, which enhances CPU-based Hive via CPU-GPU heterogeneous computing. GHive is designed for business intelligence applications and provides the same API as Hive for compatibility. To run SQL queries jointly on both CPU and GPU, GHive relies on three key techniques: (i) a novel data model, gTable, which is column-based and enables efficient data movement between CPU memory and GPU memory; (ii) a GPU-based operator library, Panda, which provides a complete set of SQL operators with extensively optimized GPU implementations; and (iii) a hardware-aware MapReduce job placement scheme, which uses a cost-based approach to place each job on either the GPU or the CPU. In our experiments, GHive outperforms Hive in both query processing speed and operating expense on the Star Schema Benchmark (SSB).
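
To make the idea of cost-based job placement in (iii) concrete, here is a minimal illustrative sketch. It is not GHive's cost model, which the abstract does not describe. The sketch assumes a job's cost is simply data volume divided by throughput, and that a GPU job additionally pays for copying its input from CPU memory to GPU memory. All names (`JobEstimate`, `cpu_cost`, `gpu_cost`, `place`) and all throughput figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JobEstimate:
    """Hypothetical per-job statistics; not taken from the GHive paper."""
    name: str
    input_bytes: float           # bytes the job must read
    cpu_bytes_per_sec: float     # assumed CPU processing throughput
    gpu_bytes_per_sec: float     # assumed GPU processing throughput
    transfer_bytes_per_sec: float  # assumed CPU-to-GPU copy bandwidth


def cpu_cost(job: JobEstimate) -> float:
    """Estimated seconds to run the job on the CPU (compute only)."""
    return job.input_bytes / job.cpu_bytes_per_sec


def gpu_cost(job: JobEstimate) -> float:
    """Estimated seconds on the GPU: copy the input over, then compute."""
    transfer = job.input_bytes / job.transfer_bytes_per_sec
    compute = job.input_bytes / job.gpu_bytes_per_sec
    return transfer + compute


def place(job: JobEstimate) -> str:
    """Pick the device with the lower estimated cost; ties go to the CPU."""
    return "GPU" if gpu_cost(job) < cpu_cost(job) else "CPU"


# Two made-up jobs: one where the GPU speedup outweighs the copy cost,
# and one where the copy cost dominates.
scan_heavy = JobEstimate("scan_heavy", 1e9, 1e9, 1e10, 1e10)
copy_bound = JobEstimate("copy_bound", 1e9, 1e9, 2e9, 1e9)
placements = {job.name: place(job) for job in (scan_heavy, copy_bound)}
```

Under these assumed numbers, the first job is placed on the GPU because its transfer and compute time together are lower than the CPU time, while the second stays on the CPU because the copy alone takes as long as running it there.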

Cite

Liu, H., Tang, B., Zhang, J., Deng, Y., Yan, X., Zheng, X., … Luo, Z. (2022). GHive: Accelerating Analytical Query Processing in Apache Hive via CPU-GPU Heterogeneous Computing. In SoCC 2022 - Proceedings of the 13th Symposium on Cloud Computing (pp. 158–172). Association for Computing Machinery, Inc. https://doi.org/10.1145/3542929.3563503
