A Data-Aware Scheduling Strategy for Executing Large-Scale Distributed Workflows

Abstract

Task scheduling is a key component of the efficient execution of data-intensive applications in distributed environments, where many machines must be coordinated to reduce execution time and bandwidth consumption. This paper presents ADAGE, a data-aware scheduler designed to efficiently execute data-intensive workflows on large-scale computers. The proposed scheduler is based on three key features: i) critical path analysis, which identifies the critical tasks of a workflow and reduces data transfers between nodes; ii) work giving, a new dynamic planning strategy that migrates tasks from overloaded to unloaded nodes; and iii) task replication, which executes task replicas on different nodes to improve both execution time and fault tolerance. Experiments performed on a distributed computing environment composed of up to 1,024 processing nodes show that ADAGE achieves better performance than existing scheduling systems, obtaining an average reduction of up to 66% in execution time.
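
As an illustration of the first feature, the following is a minimal sketch of generic critical path analysis on a workflow DAG. It is not ADAGE's implementation: the function name `critical_path`, the task names, and the per-task cost estimates are all hypothetical, and data-transfer costs are ignored for brevity.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, deps):
    """Return (path, length) of the longest chain of dependent tasks.

    durations: task -> estimated cost (hypothetical units)
    deps:      task -> set of tasks that must finish first
    """
    graph = {task: deps.get(task, ()) for task in durations}
    order = TopologicalSorter(graph).static_order()  # predecessors come first

    finish = {}  # earliest finish time of each task
    prev = {}    # predecessor that determines that finish time
    for task in order:
        best = max(deps.get(task, ()), key=lambda p: finish[p], default=None)
        prev[task] = best
        finish[task] = durations[task] + (finish[best] if best is not None else 0)

    # Walk back from the task that finishes last.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    path.reverse()
    return path, length


# Hypothetical five-task workflow.
durations = {"load": 2, "clean": 3, "train": 5, "plot": 1, "report": 1}
deps = {
    "clean": {"load"},
    "train": {"clean"},
    "plot": {"clean"},
    "report": {"train", "plot"},
}
path, length = critical_path(durations, deps)
print(path, length)
```

In this toy workflow, `plot` is off the critical path, so a scheduler could delay or migrate it without lengthening the overall execution, whereas any delay to `train` delays the whole workflow.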

Cite

Giampa, S., Belcastro, L., Marozzo, F., Talia, D., & Trunfio, P. (2021). A Data-Aware Scheduling Strategy for Executing Large-Scale Distributed Workflows. IEEE Access, 9, 47354–47364. https://doi.org/10.1109/ACCESS.2021.3067815
