Examining the Challenges of Scientific Workflows
Computer (2007)
- ISSN: 00189162
- DOI: 10.1109/MC.2007.421
Available from eprints.ecs.soton.ac.uk
Abstract
Workflows have recently emerged as a paradigm for representing and managing complex distributed scientific computations, thereby accelerating the pace of scientific progress. A recent workshop on the Challenges of Scientific Workflows, sponsored by the National Science Foundation and held on May 1-2, 2006, brought together domain scientists, computer scientists, and social scientists to discuss the requirements of future scientific applications and the challenges that they present to current workflow technologies. This paper reports on the discussions and recommendations of the workshop; the full report can be found at http://www.isi.edu/nsf-workflows06.
Yolanda Gil (1), Ewa Deelman (1), Mark Ellisman (2), Thomas Fahringer (3), Geoffrey Fox (4), Dennis Gannon (4), Carole Goble (5), Miron Livny (6), Luc Moreau (7), Jim Myers (8)

(1) USC Information Sciences, (2) University of California San Diego, (3) Innsbruck University, (4) Indiana University, (5) Manchester University, (6) University of Wisconsin Madison, (7) University of Southampton, (8) National Center for Supercomputing Applications

gil@isi.edu, deelman@isi.edu, mellisman@ucsd.edu, tf@dps.uibk.ac.at, gcf@grids.ucs.indiana.edu, gannon@cs.indiana.edu, carole.goble@manchester.ac.uk, miron@cs.wisc.edu, L.Moreau@ecs.soton.ac.uk, jimmyers@ncsa.uiuc.edu

1. Introduction

Significant scientific advances are increasingly achieved through complex sets of computations and data analyses. These computations may comprise thousands of steps, where each step may integrate diverse models and data sources developed by different groups. The applications and data may also be distributed in the execution environment. The assembly and management of such complex distributed computations present many challenges, and increasingly ambitious scientific inquiry is continuously pushing the limits of current technology.
Workflows have recently emerged as a paradigm for representing and managing complex distributed scientific computations, thereby accelerating the pace of scientific progress [1,2,3,4,5,6]. Scientific workflows capture the individual data transformations and analysis steps as well as the mechanisms to carry them out in a distributed environment. Each step in the workflow specifies a process or computation to be executed (e.g., a software program to be executed, a web service to be invoked). The steps are linked according to the data flow and dependencies among them. The representation of these computational workflows contains many details required to carry out each analysis step, including the use of specific execution and storage resources in distributed environments. Workflow systems can exploit these explicit representations of the complex computational processes to manage their lifecycle and to automate their execution.

Workflows can capture complex analysis processes at various levels of abstraction, and also provide the provenance information necessary for scientific reproducibility, result publication, and result sharing among collaborators. By providing formalism and by supporting automation, workflows have the potential to accelerate and transform the scientific analysis process. Existing workflow systems have been demonstrated in a variety of scientific applications, where workflows composed of thousands of components processed large distributed data sets on high-end computing resources. Some workflow systems have been deployed for routine use in scientific collaboratories.

Figure 1 shows an image created by an astronomy application, Montage [7]. Montage uses workflow technologies [8] to generate science-grade mosaics of the sky. Such mosaics were recently used to verify a bar in the M31 galaxy [9]. Although there have been hints of a bar in M31 from optical data, none of the analyses were convincing because the effects of interstellar extinction at optical wavelengths were severe. However, the universe is much more transparent in the infrared, and this enabled astronomers to overcome the effects of interstellar extinction. There was one more problem: the variable background in the infrared images hid the structure of
IEEE Computer, vol. 40, no. 12, pp. 24-32, December, 2007.



