System status aware Hadoop scheduling methods for job performance improvement

Masatoshi Kawarasaki; Hyuma Watanabe

Journal Article

IEICE Transactions on Information and Systems (2015) E98D(7) 1275-1285

DOI: 10.1587/transinf.2014EDP7385


Abstract

MapReduce and its open-source implementation Hadoop are now widely deployed for big data analysis. Because MapReduce runs over a large cluster of machines, data transfer often becomes a bottleneck in job processing. In this paper, we explore the influence of data transfer on job processing performance and analyze the mechanism by which job performance deteriorates when data transfer causes congestion at disk I/O and/or network I/O. Based on this analysis, we extend Hadoop's Heartbeat messages to carry the real-time system status of each machine, such as disk I/O and link usage rate. This enhancement makes Hadoop's scheduler aware of each machine's workload so that it can make more accurate scheduling decisions. Experiments evaluate the effectiveness of the enhanced scheduling methods, and the several proposed scheduling policies are compared and discussed.
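The idea of a scheduler that reads per-machine load from heartbeat reports can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Hadoop's actual Heartbeat classes: the `Heartbeat` fields, the `pick_node` function, the 0.8 threshold, and the rule of ranking nodes by the larger of disk and link utilisation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical heartbeat payload; field names are assumptions, not Hadoop's."""
    node: str
    free_slots: int
    disk_util: float  # fraction of disk I/O capacity in use, 0.0 to 1.0
    link_util: float  # fraction of network link capacity in use, 0.0 to 1.0


def pick_node(heartbeats: Iterable[Heartbeat], threshold: float = 0.8) -> Optional[str]:
    """Return the least-loaded node that has a free slot, or None.

    Load is taken as the larger of disk and link utilisation, so a node
    congested on either resource is skipped. The threshold is illustrative.
    """
    candidates = [
        hb for hb in heartbeats
        if hb.free_slots > 0 and max(hb.disk_util, hb.link_util) < threshold
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda hb: max(hb.disk_util, hb.link_util)).node


reports = [
    Heartbeat("node-a", free_slots=2, disk_util=0.95, link_util=0.10),
    Heartbeat("node-b", free_slots=1, disk_util=0.30, link_util=0.40),
    Heartbeat("node-c", free_slots=0, disk_util=0.05, link_util=0.05),
]
print(pick_node(reports))  # node-b
```

Here `node-a` is passed over because its disk is congested and `node-c` because it has no free slot, so the task goes to `node-b`. A scheduler that sees only free slots would have no way to tell `node-a` from `node-b`.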

Cite

Kawarasaki, M., & Watanabe, H. (2015). System status aware Hadoop scheduling methods for job performance improvement. IEICE Transactions on Information and Systems, E98D(7), 1275–1285. https://doi.org/10.1587/transinf.2014EDP7385
