Eeny meeny miny moe: Choosing the fault tolerance technique for my cloud workflow

Abstract

Scientific workflows are models composed of activities, data, and dependencies that represent a computer simulation. Workflows are managed by Scientific Workflow Management Systems (SWfMSs). Such workflows commonly demand many computational resources, since their execution may involve a number of different programs processing a huge volume of data. Thus, High Performance Computing (HPC) environments combined with parallelization techniques provide the support needed to execute such experiments. Some resources provided by clouds can be used to build HPC environments. Although clouds offer advantages such as elasticity and availability, failures are a reality rather than a possibility. Thus, SWfMSs must be fault-tolerant. Several types of fault tolerance techniques are used in SWfMSs, such as checkpoint-restart and replication, but which technique best fits a specific workflow? This work aims at analyzing several fault tolerance techniques in SWfMSs and recommending a suitable one for the user’s workflow using machine learning techniques and provenance data, thus improving resiliency.

de Jesus, L. A., Drummond, L. M. A., & de Oliveira, D. (2018). Eeny meeny miny moe: Choosing the fault tolerance technique for my cloud workflow. In Communications in Computer and Information Science (Vol. 796, pp. 321–336). Springer Verlag. https://doi.org/10.1007/978-3-319-73353-1_23
