Efficient query algorithm of Coallocation-Parallel-Hash-Join in the cloud data center

Yao Shen; Ping Lu; Xiaolin Qin; Yuming Qian; Sheng Wang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9483 306-320

DOI: 10.1007/978-3-319-27051-7_26

Abstract

In the hybrid architecture of a cloud data center, data partitioning is an important factor affecting query performance. Costly join operations executed with hybrid MapReduce require large-scale data transfer across nodes, which incurs heavy network and I/O overhead. To reduce this data traffic and improve the efficiency of join queries, this paper proposes an efficient Coallocation Parallel Hash Join (CPHJ) algorithm. First, CPHJ designs a consistent multi-redundant hashing algorithm that distributes tables with a join relationship across the cluster according to their join attributes, which both improves data locality during join processing and ensures data availability. Then, on the basis of the consistent multi-redundant hashing algorithm, a parallel join query algorithm called ParallelHashJoin is proposed that effectively improves the efficiency of join queries. CPHJ has been applied in the data warehouse system of Alibaba, and experimental results indicate that the query efficiency of CPHJ is nearly five times that of the Hive system.
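The two ideas in the abstract, placing joinable tables by their join key with redundancy and then joining locally on each node, can be illustrated with a minimal sketch. This is not the authors' CPHJ implementation, which the abstract does not detail. It assumes a plain consistent-hash ring with virtual nodes, and every name here (`Ring`, `place`, `local_join`, `coallocated_join`, the `vnodes` and `replicas` parameters) is hypothetical. To avoid duplicate output from replicated rows, the sketch lets only the primary owner of a key emit matches for it.

```python
import bisect
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def stable_hash(value):
    """64-bit hash that is stable across processes (built-in hash() is salted)."""
    return int.from_bytes(hashlib.md5(str(value).encode()).digest()[:8], "big")


class Ring:
    """Consistent-hash ring mapping a join key to `replicas` distinct nodes."""

    def __init__(self, nodes, vnodes=64, replicas=2):
        self.replicas = min(replicas, len(nodes))
        # Each physical node appears `vnodes` times on the ring to even out load.
        self.points = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.hashes = [h for h, _ in self.points]

    def owners(self, join_key):
        """Walk clockwise from the key's position; the first owner is the primary."""
        start = bisect.bisect_right(self.hashes, stable_hash(join_key))
        found = []
        for step in range(len(self.points)):
            node = self.points[(start + step) % len(self.points)][1]
            if node not in found:
                found.append(node)
                if len(found) == self.replicas:
                    break
        return found


def place(ring, rows, key_of):
    """Store every row on all owners of its join key (redundant placement)."""
    store = defaultdict(list)
    for row in rows:
        for node in ring.owners(key_of(row)):
            store[node].append(row)
    return store


def local_join(node, ring, left_rows, right_rows, left_key, right_key):
    """Hash join on one node, restricted to keys for which it is the primary."""
    table = defaultdict(list)
    for row in left_rows:  # build phase
        if ring.owners(left_key(row))[0] == node:
            table[left_key(row)].append(row)
    # Probe phase: right rows whose key has another primary find no match here.
    return [(l, r) for r in right_rows for l in table.get(right_key(r), [])]


def coallocated_join(ring, nodes, left_store, right_store, left_key, right_key):
    """Run the per-node joins concurrently; no rows move between nodes."""
    def run(node):
        return local_join(
            node, ring,
            left_store.get(node, []), right_store.get(node, []),
            left_key, right_key,
        )

    with ThreadPoolExecutor() as pool:
        return [pair for part in pool.map(run, nodes) for pair in part]
```

Because both tables are placed with the same ring and the same key, rows that can join always share a node, so the join phase needs no shuffle. If a primary node were lost, a replica holds the same rows; handling that failover is outside this sketch.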

Citation (APA)

Shen, Y., Lu, P., Qin, X., Qian, Y., & Wang, S. (2015). Efficient query algorithm of Coallocation-Parallel-Hash-Join in the cloud data center. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9483, pp. 306–320). Springer Verlag. https://doi.org/10.1007/978-3-319-27051-7_26