Federating HPC, Cyberinfrastructure and Clouds using CometCloud

Project Details

Project Lead
Javier Diaz Montes 
Project Manager
Javier Diaz Montes 
Project Members
Moustafa AbdelBaky, Ivan Rodero, Mengsong Zou, Daihou Wang, Jaroslaw Zola, Hoang Bui, Mehmet Aktas, Alejandro Pelaez, Rafael Tolosana Calasanz, Manuel Diaz-Granados, Ali Reza Zamani  
Institution
Rutgers, The State University of New Jersey, Rutgers Discovery Informatics Institute (RDI2) / NSF Center for Cloud and Autonomic Computing (CAC)  
Discipline
Computer Science (401) 
Subdiscipline
11.04 Information Sciences and Systems 

Abstract

Clouds are rapidly joining high-performance Grids as viable computational platforms for scientific exploration and discovery, and it is clear that production computational infrastructures will integrate both paradigms in the near future. As a result, understanding which usage modes are meaningful in such a hybrid infrastructure is critical. For example, some application workflows can benefit from hybrid usage modes to reduce time to solution, reduce costs (in terms of currency or resource allocation), or handle unexpected runtime situations (e.g., unexpected delays in scheduling queues or unexpected failures). The primary goal of this project is to use CometCloud to create a federation that integrates resources from different infrastructures, namely FutureGrid, XSEDE, Grid’5000, Rutgers Discovery Informatics Institute resources and, possibly, resources from other collaborating institutions. Moreover, the infrastructure will enable on-demand scale-out to public clouds such as Amazon EC2. In this way, we will be able to expose the different cyber-infrastructure ecosystems to scientific and engineering applications as Cloud services. Additionally, scientists will be able to create their own applications on top of CometCloud using different programming models such as master/worker, workflows or map/reduce.
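As an illustration of the on-demand scale-out described above, the following is a minimal sketch of how a federation might decide how many public-cloud workers to add when local resources cannot meet a deadline. It is not the CometCloud API: the function name, its parameters, and the assumption that all tasks take the same time are hypothetical and chosen only for this example.

```python
import math


def extra_cloud_workers(pending_tasks, task_seconds, local_workers, deadline_seconds):
    """Estimate how many additional cloud workers are needed to meet a deadline.

    Hypothetical helper for illustration only; assumes every task takes
    `task_seconds` and that each worker runs one task at a time.
    """
    if task_seconds <= 0 or deadline_seconds <= 0:
        raise ValueError("task_seconds and deadline_seconds must be positive")
    if pending_tasks <= 0:
        return 0

    # Whole tasks a single worker can finish before the deadline.
    tasks_per_worker = math.floor(deadline_seconds / task_seconds)
    if tasks_per_worker == 0:
        raise ValueError("deadline is shorter than a single task")

    # Total workers required, then subtract what is already available locally.
    workers_needed = math.ceil(pending_tasks / tasks_per_worker)
    return max(0, workers_needed - local_workers)


if __name__ == "__main__":
    # 100 one-minute tasks, 4 local workers, 10-minute deadline.
    print(extra_cloud_workers(100, 60, 4, 600))
```

A real policy would also have to account for queue wait times, heterogeneous resource speeds, cost budgets, and failures, which this sketch deliberately ignores.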

Intellectual Merit

The proposed federation will be built on top of the CometCloud [1] dynamic infrastructure, which has been effectively used to federate US cyber-infrastructure such as XSEDE, OSG, FutureGrid, NERSC and Amazon EC2 resources. CometCloud is an autonomic computing engine that enables dynamic and on-demand federation of computational resources as well as the deployment and robust execution of applications on federated infrastructures. It combines highly heterogeneous and dynamic Cloud/Grid infrastructures, enabling the integration of public/private Clouds and autonomic Cloudbursts, i.e., dynamic scale-out to Clouds to address extreme requirements such as heterogeneous and dynamic workloads and spikes in demand [2]. The CometCloud programming layer provides a platform for application development and management. It supports a range of paradigms including MapReduce, Workflow, and Master/Worker/BOT.

[1] CometCloud web site. http://nsfcac.rutgers.edu/CometCloud/
[2] H. Kim, Y. el-Khamra, S. Jha, I. Rodero and M. Parashar, “Autonomic Management of Application Workflow on Hybrid Computing Infrastructure”, Scientific Programming, 19(2-3): 75-89 (2011).

Broader Impacts

We intend to explore different computational models to better support science. Furthermore, users will be able to develop and run their applications on top of CometCloud and use the whole federation, as long as they have access to the underlying resources.

Scale of Use

We will need a few VMs to set up the environment and prepare the actual experiments, which may take some weeks. We will then perform a set of runs at different scales using different systems and different configurations. Results will be analyzed between experiments so that only key scenarios of interest are evaluated. This process may take some months, but the use of the resources will not be continuous.