Optimization analysis of hadoop

Abstract

Hadoop is a distributed data processing platform that supports the MapReduce parallel computing framework. In practice, there is often a need to accelerate Hadoop under certain circumstances, such as Hive jobs. By writing the current time to logs at specially selected points, we traced the workflow of a typical MapReduce job generated by Hive and collected timing statistics for every phase of the job. Using different data quantities, we compared the proportion of time spent in each phase and located the bottleneck points of Hadoop. We offer two major pieces of optimization advice: (1) for big jobs that produce a large number of intermediate results, focus on using the combine function and optimizing network and disk I/O; (2) for short jobs, focus on optimizing the map function and disk I/O.
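The first recommendation hinges on the combine step, which pre-aggregates each mapper's output before it is shuffled across the network. As a rough illustration only (not the authors' code, and a pure-Python simulation rather than a real Hadoop job), the following sketch shows how a word-count combiner shrinks the intermediate record count without changing the final reduce result:

```python
from collections import Counter
from itertools import chain

def map_phase(lines):
    # One mapper output list per input split: emit a (word, 1)
    # pair for every word occurrence, as a word-count mapper would.
    return [[(word, 1) for word in line.split()] for line in lines]

def combine(pairs):
    # Local pre-aggregation on a single mapper's output: one record
    # per distinct key instead of one record per occurrence.
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return list(counts.items())

def reduce_phase(all_pairs):
    # Global aggregation over everything that was shuffled.
    counts = Counter()
    for key, value in all_pairs:
        counts[key] += value
    return dict(counts)

lines = ["a a b", "b b c a"]
mapped = map_phase(lines)

# Without a combiner, every single (word, 1) pair is shuffled.
no_combine = list(chain.from_iterable(mapped))

# With a combiner, each mapper's output is aggregated first,
# so fewer records cross the network and hit the disk.
with_combine = list(chain.from_iterable(combine(m) for m in mapped))

print(len(no_combine))    # shuffled records without combine
print(len(with_combine))  # shuffled records with combine
print(reduce_phase(no_combine) == reduce_phase(with_combine))
```

In a real Hadoop job the same effect is obtained by registering a combiner class on the job (in the classic Java API, via `Job.setCombinerClass`), which is why the paper singles it out for jobs with heavy intermediate output: the savings apply to both network transfer and intermediate disk I/O.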


CITATION STYLE

APA

Li, J., Shi, S., & Wang, H. (2016). Optimization analysis of hadoop. In Communications in Computer and Information Science (Vol. 623, pp. 520–532). Springer Verlag. https://doi.org/10.1007/978-981-10-2053-7_46
