Efficient query algorithm of Coallocation-Parallel-Hash-Join in the cloud data center

Abstract

In the hybrid architecture of a cloud data center, data partitioning is an important factor in query performance. Costly join operations executed with hybrid MapReduce incur heavy network transmission and I/O overhead, because they require large-scale data transfer across nodes. To reduce data traffic and improve the efficiency of join queries, this paper proposes an efficient Coallocation Parallel Hash Join (CPHJ) algorithm. First, CPHJ designs a consistent multi-redundant hashing algorithm that distributes tables with a join relationship across the cluster according to their join attributes, which improves data locality in join query processing while also ensuring data availability. Then, on the basis of the consistent multi-redundant hashing algorithm, a parallel join query algorithm called ParallelHashJoin is proposed, which effectively improves the efficiency of join queries. CPHJ has been applied in the data warehouse system of Alibaba, and experimental results indicate that the query efficiency of CPHJ is nearly five times that of the Hive system.
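The two ideas in the abstract, co-locating joined tables by hashing on the join key with redundancy and then joining locally on each node, can be illustrated with a small sketch. This is not the authors' implementation: the `Ring` class, the `place`, `local_hash_join`, and `colocated_join` functions, the virtual-node count, and the "primary owner emits the result" rule for avoiding duplicate output are all assumptions made for illustration, and the per-node loop runs sequentially here where a real system would run it in parallel.

```python
import bisect
import hashlib
from collections import defaultdict


def _hash(value):
    """Stable 64-bit hash (hashlib, so results do not vary between runs)."""
    return int.from_bytes(hashlib.md5(str(value).encode()).digest()[:8], "big")


class Ring:
    """Hypothetical consistent-hash ring mapping a join key to `replicas` distinct nodes."""

    def __init__(self, nodes, replicas=2, vnodes=64):
        self.replicas = min(replicas, len(nodes))
        # Each node gets `vnodes` points on the ring to even out the distribution.
        self.points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.hashes = [h for h, _ in self.points]

    def owners(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect(self.hashes, _hash(key))
        owners = []
        for offset in range(len(self.points)):
            node = self.points[(start + offset) % len(self.points)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == self.replicas:
                    break
        return owners


def place(ring, rows, key_index):
    """Store each row on every owner of its join key; return node -> rows."""
    storage = defaultdict(list)
    for row in rows:
        for node in ring.owners(row[key_index]):
            storage[node].append(row)
    return storage


def local_hash_join(left_rows, right_rows, left_key, right_key):
    """Classic build/probe hash join over rows already on one node."""
    build = defaultdict(list)
    for row in left_rows:
        build[row[left_key]].append(row)
    return [
        left_row + right_row
        for right_row in right_rows
        for left_row in build.get(right_row[right_key], [])
    ]


def colocated_join(ring, left, right, left_key, right_key):
    """Join node by node with no data movement between nodes.

    Assumption: only the primary owner of a key emits its matches,
    so replicas do not produce duplicate output rows.
    """
    result = []
    for node in set(left) | set(right):
        primary_left = [
            row for row in left.get(node, [])
            if ring.owners(row[left_key])[0] == node
        ]
        result.extend(
            local_hash_join(primary_left, right.get(node, []), left_key, right_key)
        )
    return result


ring = Ring(["n1", "n2", "n3"], replicas=2)
users = place(ring, [(1, "x"), (3, "y")], key_index=0)
orders = place(ring, [(1, "a"), (2, "b"), (1, "c")], key_index=0)
joined = sorted(colocated_join(ring, users, orders, 0, 0))
```

Because both tables are placed with the same ring and the same join key, every node that holds a `users` row also holds all matching `orders` rows, so the join needs no shuffle; the extra replicas exist for availability if a node is lost.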

Cite

Shen, Y., Lu, P., Qin, X., Qian, Y., & Wang, S. (2015). Efficient query algorithm of Coallocation-Parallel-Hash-Join in the cloud data center. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9483, pp. 306–320). Springer Verlag. https://doi.org/10.1007/978-3-319-27051-7_26
