ReCAP: Workflow reproducibility using Cloud-aware Provenance

Project Details

Project Lead
Khawar Ahmad 
Project Manager
Khawar Ahmad 
Institution
University of the West of England (UWE), FET, CCCS  
Discipline
Computer Science (401) 

Abstract

Recording the transformations, analyses and interpretations of data in scientific workflows is vital for the repeatability and reliability of those workflows. This provenance has been captured effectively in Grid-based scientific workflow systems. However, the recent adoption of Cloud-based scientific workflows presents an opportunity to investigate the suitability of existing approaches, or to propose new ones, for collecting provenance information from the Cloud and using it for workflow repeatability on Cloud infrastructure. The dynamic nature of the Cloud makes this difficult: unlike the Grid, the Cloud provisions resources on demand. This research presents a novel approach that can help mitigate this challenge. The approach collects Cloud infrastructure information along with workflow provenance and establishes a mapping between them. This mapping is later used to re-provision resources on the Cloud. A workflow execution is repeated by: (a) capturing the Cloud infrastructure information (virtual machine configuration) along with the workflow provenance, and (b) re-provisioning similar resources on the Cloud and re-executing the workflow on them.
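Steps (a) and (b) can be illustrated with a minimal Python sketch. It is not the project's implementation: the class names, the fields, the choice of joining jobs to virtual machines by IP address, and the `provision` callback are all assumptions made for illustration. In practice the callback would wrap whatever Cloud API is used to start a VM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class VMConfig:
    """Cloud infrastructure information for one VM (hypothetical fields)."""
    flavor: str
    vcpus: int
    ram_mb: int
    disk_gb: int
    image: str


@dataclass(frozen=True)
class JobRecord:
    """One workflow provenance entry: a job and the host it ran on (hypothetical)."""
    job_id: str
    host_ip: str


@dataclass(frozen=True)
class ProvenanceLink:
    """Mapping between a workflow job and the configuration of its VM."""
    job_id: str
    host_ip: str
    vm: VMConfig


def link_jobs_to_vms(jobs: List[JobRecord],
                     vms_by_ip: Dict[str, VMConfig]) -> List[ProvenanceLink]:
    """Step (a): attach the captured VM configuration to each job record.

    Assumption: jobs and VMs are joined on the host IP address.
    """
    links = []
    for job in jobs:
        if job.host_ip not in vms_by_ip:
            raise KeyError(f"no VM information captured for host {job.host_ip}")
        links.append(ProvenanceLink(job.job_id, job.host_ip, vms_by_ip[job.host_ip]))
    return links


def reprovision(links: List[ProvenanceLink],
                provision: Callable[[VMConfig], str]) -> Dict[str, str]:
    """Step (b): start one similar VM per original host and place jobs on it.

    `provision` is a hypothetical callback that starts a VM with the given
    configuration and returns the address of the new host.
    """
    new_host_for: Dict[str, str] = {}  # original host -> newly provisioned host
    placement: Dict[str, str] = {}     # job id -> newly provisioned host
    for link in links:
        if link.host_ip not in new_host_for:
            new_host_for[link.host_ip] = provision(link.vm)
        placement[link.job_id] = new_host_for[link.host_ip]
    return placement
```

Keying the re-provisioning on the original host rather than on the configuration keeps the number of VMs the same as in the original run, even when several VMs shared one configuration.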

Intellectual Merit

This research is the work of my PhD study and is still in progress. The idea has been discussed in the following paper: Hasham, K.; Munir, K.; Shamdasani, J.; McClatchey, R., "Scientific Workflow Repeatability through Cloud-Aware Provenance," in 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing (UCC), pp. 951-956, 8-11 Dec. 2014.

Broader Impacts

The aim of this research is to assist researchers in reproducing workflow-based experiment executions on Cloud infrastructure. This in turn helps achieve reproducible research for workflow-based experiments executed on the Cloud.

Scale of Use

A few VMs (in the range of 40-50 instances) to run test workflows, such as Montage, in experiments on workflow execution on the Cloud.