Running workflows in the cloud with Pegasus

Project Details

Project Lead
Gideon Juve 
Project Manager
Gideon Juve 
Project Members
Sepideh Azarnoosh  
Supporting Experts
Hyungro Lee  
Institution
University of Southern California, Information Sciences Institute  
Discipline
Computer Science (401) 

Abstract

In this work we intend to study the benefits and drawbacks of using cloud computing for scientific workflows. In particular, we are interested in the benefits of specifying the execution environment of a workflow application as a virtual machine (VM) image. Using VM images has the potential to reduce the complexity of deploying workflow applications in distributed environments and to allow scientists to easily reproduce their experiments. In addition, we intend to investigate the challenges of on-demand provisioning for scientific workflows in the cloud.

Intellectual Merit

Cloud computing is an important platform for future computational science applications. It is particularly well suited to loosely coupled applications such as scientific workflows, which do not require the high-speed interconnects and large, shared file systems typical of existing HPC systems. However, many current-generation workflow tools were developed for the grid and may not be ready for use in the cloud. Although the cloud has many potential benefits, it also brings additional challenges. We plan to investigate the use of clouds for workflows to determine what tools and techniques the workflow community will need to develop so that scientists using workflow technologies can take advantage of cloud computing.

Broader Impacts

Many science applications in physics, astronomy, molecular biology, and earth science use the Pegasus workflow management system. The groups behind these applications are interested in the potential of cloud computing to improve the speed, quality, and reproducibility of their computational workloads. We intend to apply what we learn from using FutureGrid to develop tools and techniques that help scientists do their work better.

Scale of Use

I only need a few VMs, with no more than 128 cores in use at a time.

Results

Gideon Juve, Ewa Deelman, Automating Application Deployment in Infrastructure Clouds, 3rd IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2011), 2011.

Jens-S. Vöckler, Gideon Juve, Ewa Deelman, Mats Rynge, G. Bruce Berriman, Experiences Using Cloud Computing for a Scientific Workflow Application, Proceedings of the 2nd Workshop on Scientific Cloud Computing (ScienceCloud 2011), 2011.

Gideon Juve, Ewa Deelman, Wrangler: Virtual Cluster Provisioning for the Cloud, short paper, Proceedings of the 20th International Symposium on High Performance Distributed Computing (HPDC 2011), 2011.