Improved resource provisioning in Hadoop

M. Divya; B. Annappa

Conference Proceedings

Smart Innovation, Systems and Technologies (2016) 44 39-49

DOI: 10.1007/978-81-322-2529-4_4

Abstract

Extensive use of the Internet is generating large amounts of data, and the mechanisms needed to handle and analyze these data are becoming more complicated day by day. The Hadoop platform provides a solution for processing huge data sets on large clusters of nodes. The scheduler plays a vital role in improving the performance of Hadoop. In this paper, MRPPR (MapReduce Performance Parameter based Resource aware Hadoop Scheduler) is proposed. In MRPPR, performance parameters of the Map task, such as the time required to parse the data, map, sort and merge the result, and of the Reduce task, such as the time to merge, parse and reduce, are considered in order to categorize a job as CPU bound, Disk I/O bound or Network I/O bound. Based on the node status obtained from the TaskTracker’s response, nodes in the cluster are classified as CPU busy, Disk I/O busy or Network I/O busy. A cost model is proposed that schedules a job to a node based on these classifications, to minimize the makespan and attain effective resource utilization. A performance improvement of 25–30% is achieved with the proposed scheduler.
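
The classification and matching idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how phase timings map to resources or what the cost model looks like, so the phase-to-resource mapping (`PHASE_RESOURCE`), the function names, the utilization figures and the penalty-based cost are all hypothetical choices made only for illustration.

```python
# Hypothetical mapping from a measured phase to the resource it mainly
# stresses. The abstract names the phases but not this grouping.
PHASE_RESOURCE = {
    "map_parse": "cpu",
    "map": "cpu",
    "map_sort": "disk",
    "map_merge": "disk",
    "reduce_merge": "network",  # assumption: stands in for shuffle-heavy work
    "reduce_parse": "cpu",
    "reduce": "cpu",
}


def classify_job(phase_times):
    """Label a job by the resource that accounts for most of its phase time."""
    totals = {"cpu": 0.0, "disk": 0.0, "network": 0.0}
    for phase, seconds in phase_times.items():
        totals[PHASE_RESOURCE[phase]] += seconds
    return max(totals, key=totals.get)


def classify_node(utilization):
    """Label a node by its most heavily used resource (its 'busy' class)."""
    return max(utilization, key=utilization.get)


def cost(job_class, utilization, penalty=1.0):
    """Illustrative cost: load on the job's bottleneck resource, plus a
    penalty when the node is already busy on that same resource."""
    base = utilization[job_class]
    same_class = classify_node(utilization) == job_class
    return base + (penalty if same_class else 0.0)


def pick_node(job_class, nodes):
    """Choose the node with the lowest illustrative cost for this job class."""
    return min(nodes, key=lambda name: cost(job_class, nodes[name]))


# Made-up phase timings (seconds) and node utilizations (fractions).
job = {
    "map_parse": 2.0, "map": 3.0, "map_sort": 6.0, "map_merge": 4.0,
    "reduce_merge": 1.0, "reduce_parse": 1.0, "reduce": 2.0,
}
nodes = {
    "n1": {"cpu": 0.2, "disk": 0.9, "network": 0.1},
    "n2": {"cpu": 0.8, "disk": 0.3, "network": 0.2},
}
job_class = classify_job(job)
chosen = pick_node(job_class, nodes)
```

With these made-up numbers the job spends most of its time in sort and merge, so it is labeled disk bound and is steered away from the disk-busy node `n1` toward `n2`. A real scheduler would derive the timings from task statistics and the node status from TaskTracker heartbeats.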

Citation (APA)

Divya, M., & Annappa, B. (2016). Improved resource provisioning in Hadoop. In Smart Innovation, Systems and Technologies (Vol. 44, pp. 39–49). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-81-322-2529-4_4