GHive: Accelerating Analytical Query Processing in Apache Hive via CPU-GPU Heterogeneous Computing

Haotian Liu; Bo Tang; Jiashu Zhang; Yangshen Deng; Xiao Yan; Xinying Zheng; Qiaomu Shen; Dan Zeng; Zunyao Mao; Chaozu Zhang; Zhengxin You; Zhihao Wang; Runzhe Jiang; Fang Wang; Man Lung Yiu; Huan Li; Mingji Han; Qian Li; Zhenghai Luo

Conference Proceedings (Open Access)

SoCC 2022 - Proceedings of the 13th Symposium on Cloud Computing (2022) 158-172

DOI: 10.1145/3542929.3563503

Abstract

As a popular distributed data warehouse system, Apache Hive is widely used for big data analytics in many organizations. Meanwhile, exploiting the massive parallelism of GPUs to accelerate online analytical processing (OLAP) has been extensively explored in the database community. In this paper, we present GHive, which enhances CPU-based Hive via CPU-GPU heterogeneous computing. GHive is designed for business intelligence applications and provides the same API as Hive for compatibility. To run SQL queries jointly on both CPU and GPU, GHive comes with three key techniques: (i) a novel data model, gTable, which is column-based and enables efficient data movement between CPU memory and GPU memory; (ii) a GPU-based operator library, Panda, which provides a complete set of SQL operators with extensively optimized GPU implementations; and (iii) a hardware-aware MapReduce job placement scheme, which places jobs judiciously on either GPU or CPU via a cost-based approach. In our experiments, GHive outperforms Hive in both query processing speed and operating expense on the Star Schema Benchmark (SSB).
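
The abstract does not spell out the cost model behind the job placement scheme, so the following is only a minimal sketch of what a cost-based CPU/GPU placement decision can look like. Every name and number here (`Job`, `Hardware`, `cpu_cost`, `gpu_cost`, `place`, the throughput and bandwidth figures) is a hypothetical assumption for illustration and is not taken from GHive. The sketch assumes a job's GPU cost is a fixed startup overhead plus the time to move the columns it reads from CPU memory to GPU memory plus GPU compute time, and that the job goes to whichever device has the lower estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical description of one MapReduce job (not GHive's actual model)."""
    name: str
    rows: int           # number of input rows the job scans
    bytes_per_row: int  # combined width of the columns the job actually reads


@dataclass(frozen=True)
class Hardware:
    """Hypothetical hardware profile; all figures are illustrative assumptions."""
    cpu_rows_per_sec: float
    gpu_rows_per_sec: float
    transfer_bytes_per_sec: float  # CPU memory -> GPU memory bandwidth
    gpu_fixed_overhead_sec: float  # startup cost paid once per GPU job


def cpu_cost(job: Job, hw: Hardware) -> float:
    """Estimated seconds to run the job on the CPU."""
    return job.rows / hw.cpu_rows_per_sec


def gpu_cost(job: Job, hw: Hardware) -> float:
    """Estimated seconds to run the job on the GPU, including data movement."""
    transfer = job.rows * job.bytes_per_row / hw.transfer_bytes_per_sec
    compute = job.rows / hw.gpu_rows_per_sec
    return hw.gpu_fixed_overhead_sec + transfer + compute


def place(job: Job, hw: Hardware) -> str:
    """Pick the device with the lower estimated cost; ties go to the CPU."""
    return "GPU" if gpu_cost(job, hw) < cpu_cost(job, hw) else "CPU"


hw = Hardware(
    cpu_rows_per_sec=1e8,
    gpu_rows_per_sec=2e9,
    transfer_bytes_per_sec=1e10,
    gpu_fixed_overhead_sec=0.05,
)
small = Job("small_filter", rows=1_000_000, bytes_per_row=8)
large = Job("large_join", rows=1_000_000_000, bytes_per_row=8)

for job in (small, large):
    print(job.name, place(job, hw))
```

Under these assumed numbers, the small job stays on the CPU because the fixed GPU overhead dominates, while the large job moves to the GPU because its compute savings outweigh the transfer time. Because only the columns a job reads count toward `bytes_per_row`, a column-based layout keeps the transfer term small in this sketch.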

Cite

Liu, H., Tang, B., Zhang, J., Deng, Y., Yan, X., Zheng, X., … Luo, Z. (2022). GHive: Accelerating Analytical Query Processing in Apache Hive via CPU-GPU Heterogeneous Computing. In SoCC 2022 - Proceedings of the 13th Symposium on Cloud Computing (pp. 158–172). Association for Computing Machinery, Inc. https://doi.org/10.1145/3542929.3563503
