A data distribution aware task scheduling strategy for MapReduce system

Leitao Guo; Hongwei Sun; Zhiguo Luo

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009) 5931 LNCS 694-699

DOI: 10.1007/978-3-642-10665-1_74

6 Citations

17 Readers

Abstract

MapReduce is a parallel programming system for processing massive data. It automatically splits a MapReduce job into multiple tasks and schedules them on a cluster of PCs. This paper describes a data-distribution-aware task scheduling strategy for MapReduce. When worker nodes request tasks, the strategy computes each node's priority from factors such as the number of times the node has requested tasks and the number of tasks it can execute locally. It also computes each task's priority from factors such as the number of data copies associated with the task and the time the task has been waiting. Because scheduling is driven by both node and task priorities, the strategy takes the data distribution in the system fully into account and, with high probability, assigns Map tasks to nodes that already hold the required data, which reduces network overhead and improves system efficiency. © 2009 Springer-Verlag.
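
The abstract names the inputs to the two priorities but not the formulas, so the following Python sketch is only an illustration of the general idea, not the authors' method. The linear scoring, the weights (`w_req`, `w_local`, `w_rep`, `w_wait`), the rule that fewer data copies raises a task's priority, and all class and function names are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    request_count: int = 0  # how many times this node has asked for a task
    local_blocks: set = field(default_factory=set)  # ids of data blocks stored here


@dataclass
class MapTask:
    task_id: str
    block_id: str          # input block the task reads
    replica_count: int     # copies of that block in the cluster (assumed >= 1)
    waiting_time: float    # time the task has waited to be scheduled


def node_priority(node, pending, w_req=1.0, w_local=1.0):
    """Hypothetical score: more requests and more locally runnable tasks rank higher."""
    local = sum(1 for t in pending if t.block_id in node.local_blocks)
    return w_req * node.request_count + w_local * local


def task_priority(task, w_rep=1.0, w_wait=0.1):
    """Hypothetical score: fewer data copies and longer waits rank higher."""
    return w_rep / task.replica_count + w_wait * task.waiting_time


def schedule(requesting_nodes, pending):
    """Give each requesting node at most one task, preferring data-local tasks.

    Returns a dict mapping node name to the assigned task id.
    """
    assignments = {}
    remaining = list(pending)
    # Serve nodes in descending priority, scored against the initial task list.
    ranked = sorted(requesting_nodes,
                    key=lambda n: node_priority(n, pending),
                    reverse=True)
    for node in ranked:
        if not remaining:
            break
        local = [t for t in remaining if t.block_id in node.local_blocks]
        # Fall back to a non-local task only when no local one is left.
        candidates = local or remaining
        chosen = max(candidates, key=task_priority)
        remaining.remove(chosen)
        assignments[node.name] = chosen.task_id
    return assignments
```

In this sketch a node that holds the input block of a pending task always receives a local task, and a non-local assignment happens only as a fallback, which is one simple way to reflect the abstract's goal of placing Map tasks on nodes that already hold the data.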

Citation (APA)

Guo, L., Sun, H., & Luo, Z. (2009). A data distribution aware task scheduling strategy for MapReduce system. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5931 LNCS, pp. 694–699). https://doi.org/10.1007/978-3-642-10665-1_74
