Improved resource provisioning in Hadoop


Abstract

Extensive use of the Internet is generating large amounts of data, and the mechanisms needed to handle and analyze these data are growing more complicated by the day. The Hadoop platform provides a solution for processing huge volumes of data on large clusters of nodes. The scheduler plays a vital role in improving the performance of Hadoop. In this paper, MRPPR, a MapReduce Performance Parameter based Resource aware Hadoop Scheduler, is proposed. In MRPPR, the performance parameters of a Map task (the time required to parse the data, map, sort, and merge the result) and of a Reduce task (the time required to merge, parse, and reduce) are considered in order to categorize a job as CPU bound, Disk I/O bound, or Network I/O bound. Based on the node status obtained from the TaskTracker's response, nodes in the cluster are classified as CPU busy, Disk I/O busy, or Network I/O busy. A cost model is proposed that schedules a job to a node based on these classifications, to minimize the makespan and attain effective resource utilization. A performance improvement of 25–30% is achieved with the proposed scheduler.
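
The abstract does not give the classification rule or the cost model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes, purely for illustration, that parse, map, and reduce time count as CPU work, that map-side sort and merge count as disk I/O, and that the reduce-side merge counts as network I/O. The names `MapTimes`, `ReduceTimes`, `classify_job`, and `pick_node`, and the 0/1 cost, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MapTimes:
    parse_s: float  # time spent parsing input records
    map_s: float    # time spent in the map function
    sort_s: float   # time spent sorting map output
    merge_s: float  # time spent merging spilled output


@dataclass
class ReduceTimes:
    merge_s: float   # time spent merging fetched map output
    parse_s: float   # time spent parsing merged records
    reduce_s: float  # time spent in the reduce function


def classify_job(m: MapTimes, r: ReduceTimes) -> str:
    """Label a job by the resource that dominates its measured phase times.

    The phase-to-resource mapping below is an assumption made for this sketch.
    """
    totals = {
        "CPU": m.parse_s + m.map_s + r.parse_s + r.reduce_s,
        "DISK_IO": m.sort_s + m.merge_s,
        "NETWORK_IO": r.merge_s,
    }
    return max(totals, key=totals.get)


def pick_node(job_class: str, node_status: dict) -> str:
    """Pick the node with the lowest cost for this job class.

    Hypothetical cost: 1 if the node is already busy on the resource the job
    needs, else 0. Ties are broken by node name so the result is deterministic.
    """
    def cost(node: str) -> int:
        return 1 if node_status[node] == job_class else 0

    return min(sorted(node_status), key=cost)


if __name__ == "__main__":
    job = classify_job(
        MapTimes(parse_s=2, map_s=10, sort_s=3, merge_s=1),
        ReduceTimes(merge_s=4, parse_s=1, reduce_s=5),
    )
    nodes = {"node1": "CPU", "node2": "DISK_IO", "node3": "NETWORK_IO"}
    print(job, pick_node(job, nodes))
```

In this sketch a CPU bound job is steered away from a CPU busy node, which is the intuition the abstract describes; a real cost model would presumably weigh measured utilization rather than a binary label.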

Cite

Divya, M., & Annappa, B. (2016). Improved resource provisioning in Hadoop. In Smart Innovation, Systems and Technologies (Vol. 44, pp. 39–49). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-81-322-2529-4_4
