With the arrival of the big data age, how to handle and process huge amounts of data has become a hot research topic. The MapReduce parallel computing framework is increasingly used for large-scale data analysis. Although the join operation has been studied extensively in traditional relational databases, join algorithms in MapReduce remain inefficient. In this paper, we describe a number of well-known join algorithms in MapReduce and present an experimental comparison of them on a Hadoop cluster. We also propose an optimization algorithm for the map-side chain. © 2013 Springer Science+Business Media New York.
CITATION STYLE
Zhang, L., Xu, S., & Peng, C. (2013). Join optimization for large-scale data analysis in MapReduce. In Lecture Notes in Electrical Engineering (Vol. 236 LNEE, pp. 651–657). Springer Verlag. https://doi.org/10.1007/978-1-4614-7010-6_73