Reliable DAG scheduling on grids with rewinding and migration

Israel Hernandez; Murray Cole

Conference Proceedings (Open Access)

GridNets 2007 - Proceedings of the 1st International Conference on Networks for Grid Applications (2007)

DOI: 10.4108/gridnets.2007.2137

Abstract

Fault tolerance is an important issue in Grid computing because the availability of Grid resources cannot be guaranteed. Effective scheduling methods must include fault-tolerant mechanisms that preserve the execution of DAG applications despite processor failures. To address this, we designed the DAG rewinding mechanism, an event-driven process executed when a failure is detected at a rescheduling point. The rewinding mechanism preserves the execution of the application by recomputing and migrating those tasks that would otherwise disrupt the forward execution of succeeding tasks. In effect, it rewinds the progress of the application to a previous state, so that execution continues despite the failed processor(s). This paper extends our earlier work by adding the rewinding mechanism to our dynamic scheduling methods GTP and GTP/c, and shows how to integrate it within our dynamic execution models.
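The idea of rewinding can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `plan_rewind`, the data layout, and the assumption that a task's output exists only on the processor that ran it (no replicas) are all hypothetical choices made for illustration. Under those assumptions, a finished task must be recomputed when its processor has failed and some task that still has to run depends on its output, directly or through other recomputed tasks.

```python
from collections import deque

def plan_rewind(succs, placement, done, failed):
    """Hypothetical sketch: decide which tasks to recompute and migrate.

    succs     -- dict: task -> list of successor tasks (the DAG)
    placement -- dict: task -> processor it ran on or is assigned to
    done      -- set of tasks that had finished before the failure
    failed    -- set of failed processors
    Assumes each output lives only on the processor that produced it.
    """
    # Build the predecessor map from the successor map.
    preds = {t: set() for t in succs}
    for t, children in succs.items():
        for c in children:
            preds[c].add(t)

    # Start from every task that still has to run.
    frontier = deque(t for t in succs if t not in done)
    rewind = set()
    while frontier:
        t = frontier.popleft()
        for p in preds[t]:
            # A finished predecessor on a failed processor lost its output,
            # so it must run again, and it in turn needs its own inputs.
            if p in done and placement.get(p) in failed and p not in rewind:
                rewind.add(p)
                frontier.append(p)

    # Tasks to move: everything rewound, plus unfinished tasks
    # that were assigned to a failed processor.
    migrate = rewind | {t for t in succs
                        if t not in done and placement.get(t) in failed}
    return rewind, migrate

# Diamond DAG: a -> b, a -> c, b -> d, c -> d
succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
placement = {"a": "p1", "b": "p2", "c": "p1", "d": "p2"}
rewind, migrate = plan_rewind(succs, placement, {"a", "b", "c"}, {"p1"})
```

In this example processor `p1` fails after `a`, `b` and `c` have finished. Task `c` is rewound because `d` still needs its output, and `a` is rewound because the recomputed `c` needs it; `b` is left alone because its output survives on `p2`. Choosing the processors that receive the migrated tasks is left out of the sketch.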

Citation (APA)

Hernandez, I., & Cole, M. (2007). Reliable DAG scheduling on grids with rewinding and migration. In GridNets 2007 - Proceedings of the 1st International Conference on Networks for Grid Applications. Association for Computing Machinery, Inc. https://doi.org/10.4108/gridnets.2007.2137
