Task scheduling is a key component for the efficient execution of data-intensive applications in distributed environments, where many machines must be coordinated to reduce execution times and bandwidth consumption. This paper presents ADAGE, a data-aware scheduler designed to efficiently execute data-intensive workflows on large-scale computing systems. The proposed scheduler is based on three key features: i) critical path analysis, for discovering the critical tasks of a workflow and reducing data transfers between nodes; ii) work giving, a new dynamic planning strategy for migrating tasks from overloaded to underloaded nodes; and iii) task replication, which executes replicas of a task on different nodes to improve both execution time and fault tolerance. Experiments performed on a distributed computing environment of up to 1,024 processing nodes show that ADAGE outperforms existing scheduling systems, obtaining an average reduction of up to 66% in execution time.
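Critical path analysis, the first feature listed above, identifies the chain of dependent tasks that bounds a workflow's total execution time. The sketch below shows only the textbook longest-path computation on a task DAG, assuming per-task execution-time estimates are available. The names `critical_path`, `cost` and `deps` are hypothetical, and this is not ADAGE's actual algorithm: the abstract does not say how ADAGE estimates costs or weighs data transfers.

```python
from graphlib import TopologicalSorter


def critical_path(cost, deps):
    """Return (length, path) of the longest cost-weighted path in a task DAG.

    cost: dict mapping task -> estimated execution time (hypothetical estimates)
    deps: dict mapping task -> iterable of predecessor tasks
    """
    # Register every task, including those with no predecessors.
    sorter = TopologicalSorter()
    for task in cost:
        sorter.add(task, *deps.get(task, ()))

    finish = {}  # earliest finish time of each task
    parent = {}  # predecessor on the longest path leading to each task
    for task in sorter.static_order():  # predecessors always come first
        preds = deps.get(task, ())
        # The latest-finishing predecessor determines when this task can start.
        best = max(preds, key=lambda p: finish[p], default=None)
        start = finish[best] if best is not None else 0
        finish[task] = start + cost[task]
        parent[task] = best

    # The critical path ends at the task with the latest finish time;
    # walk the parent links backwards to recover it.
    last = max(finish, key=finish.get)
    length = finish[last]
    path = []
    node = last
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return length, path


if __name__ == "__main__":
    # Diamond-shaped workflow: A feeds B and C, which both feed D.
    cost = {"A": 2, "B": 3, "C": 1, "D": 2}
    deps = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
    length, path = critical_path(cost, deps)
    print(length, path)  # → 7 ['A', 'B', 'D']
```

Once the critical tasks are known, a data-aware scheduler could, for example, try to place consecutive critical tasks on the same node so that their intermediate data need not cross the network; the abstract states that critical path analysis serves to reduce data transfers, but not the exact placement rule ADAGE applies.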
Giampa, S., Belcastro, L., Marozzo, F., Talia, D., & Trunfio, P. (2021). A Data-Aware Scheduling Strategy for Executing Large-Scale Distributed Workflows. IEEE Access, 9, 47354–47364. https://doi.org/10.1109/ACCESS.2021.3067815