As a popular distributed data warehouse system, Apache Hive has been widely used for big data analytics in many organizations. Meanwhile, exploiting the massive parallelism of GPUs to accelerate online analytical processing (OLAP) has been extensively explored in the database community. In this paper, we present GHive, which enhances CPU-based Hive via CPU-GPU heterogeneous computing. GHive is designed for business intelligence applications and provides the same API as Hive for compatibility. To run SQL queries jointly on both CPU and GPU, GHive comes with three key techniques: (i) a novel data model, gTable, which is column-based and enables efficient data movement between CPU memory and GPU memory; (ii) a GPU-based operator library, Panda, which provides a complete set of SQL operators with extensively optimized GPU implementations; and (iii) a hardware-aware MapReduce job placement scheme, which uses a cost-based approach to place each job on either the GPU or the CPU. In our experiments on the Star Schema Benchmark (SSB), GHive outperforms Hive in both query processing speed and operating expense.
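The abstract states only that job placement is cost-based and hardware-aware; it does not give GHive's actual cost model. The sketch below is a hypothetical illustration of the general idea. A job runs on the GPU only if its estimated GPU compute time, plus the time to move its columnar input into GPU memory, beats its estimated CPU time. The names `JobEstimate`, `transfer_seconds`, `place`, and the bandwidth constant are all assumptions introduced for illustration, not part of GHive.

```python
from dataclasses import dataclass

# Hypothetical host-to-device bandwidth (bytes/second); an assumption, not a GHive value.
PCIE_BYTES_PER_SEC = 12e9


@dataclass
class JobEstimate:
    """Hypothetical per-job estimates; GHive's real cost model is not described in the abstract."""
    name: str
    input_bytes: int      # bytes of columnar input the job reads
    cpu_seconds: float    # estimated execution time on CPU
    gpu_seconds: float    # estimated kernel execution time on GPU


def transfer_seconds(nbytes: int, bandwidth: float = PCIE_BYTES_PER_SEC) -> float:
    # Time to copy the job's input from CPU memory to GPU memory.
    return nbytes / bandwidth


def place(job: JobEstimate) -> str:
    # The GPU option pays for data movement as well as compute.
    gpu_total = job.gpu_seconds + transfer_seconds(job.input_bytes)
    return "GPU" if gpu_total < job.cpu_seconds else "CPU"


if __name__ == "__main__":
    jobs = [
        JobEstimate("scan", 24_000_000_000, cpu_seconds=3.0, gpu_seconds=0.5),
        JobEstimate("tiny_lookup", 1_200_000_000, cpu_seconds=0.05, gpu_seconds=0.01),
    ]
    for job in jobs:
        print(job.name, "->", place(job))
    # scan -> GPU
    # tiny_lookup -> CPU
```

In this toy model, a large scan amortizes its transfer cost and goes to the GPU, while a small job whose transfer time dominates stays on the CPU.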
Liu, H., Tang, B., Zhang, J., Deng, Y., Yan, X., Zheng, X., … Luo, Z. (2022). GHive: Accelerating Analytical Query Processing in Apache Hive via CPU-GPU Heterogeneous Computing. In SoCC 2022 - Proceedings of the 13th Symposium on Cloud Computing (pp. 158–172). Association for Computing Machinery, Inc. https://doi.org/10.1145/3542929.3563503