Cloud-Based Support for Distributed Multiscale Applications

Project Details

Project Lead
Katarzyna Rycerz 
Project Manager
Katarzyna Rycerz 
Project Members
Pawel Pierzchala, Maciej Malawski, Marcin Nowak, Pawel Koperek, Darko Chadievski, Wojciech Kruczkowski  
Supporting Experts
Javier Diaz Montes, Saliya Ekanayake  
Institution
AGH, Krakow, Institute of Computer Science  
Discipline
Computer Science (401) 

Abstract

Multiscale modeling is one of the most significant challenges that science faces today. The goal of our research is to build an environment that supports the composition of multiscale simulations from single-scale models encapsulated as scientific software components and distributed across grid and cloud e-infrastructures. We plan to construct a programming and execution environment for such applications. We are going to investigate and integrate solutions from:
1. virtual experiment frameworks, such as the GridSpace Platform (http://dice.cyfronet.pl/gridspace);
2. tools supporting multiscale computing, such as MUSCLE (http://muscle.berlios.de);
3. Cloud, Grid and HPC infrastructures.

We plan to extend the capabilities of the GridSpace (GS) platform, developed as a basis for the Virtual Laboratory in the ViroLab project (http://www.virolab.org) and currently being further developed in the MAPPER project (http://www.mapper-project.eu). GS is a framework enabling researchers to conduct virtual experiments on Grid-based resources, Cloud resources and HPC infrastructures. We have already performed several experiments using GridSpace with multiscale simulations:
1. modules taken from the AMUSE framework (http://www.amusecode.org) were orchestrated by a GridSpace experiment and communicated using the High Level Architecture (IEEE standard 1516) [ComHLA];
2. modules of a computational biology application were orchestrated by a GridSpace experiment and communicated using MUSCLE.

Both experiments ran on the same local cluster with the Portable Batch System (PBS). Thanks to FutureGrid resources, we hope to be able to run multiscale simulations on Cloud resources and compare the results.

As case studies, we plan to investigate the following applications:
1. In-stent Restenosis: an application based on complex automata that simulates biological responses of cellular tissue to the treatment of atherosclerosis [ISR].
2. The Nanopolymer application, which uses the LAMMPS Molecular Dynamics Simulator (http://lammps.sandia.gov/).
3. The Brain Aneurysm application from the VPH-Share project, which attempts to model cerebral blood flow dynamics (http://uva.computationalscience.nl/research/projects/vph-share).

References:
[GS] E. Ciepiela, D. Harezlak, J. Kocot, T. Bartynski, M. Kasztelnik, P. Nowakowski, T. Gubała, M. Malawski, M. Bubak: Exploratory Programming in the Virtual Laboratory. In: Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 621–628
[ISR] A. G. Hoekstra, A. Caiazzo, E. Lorenz, J.-L. Falcone, B. Chopard: Complex Automata: Multi-scale Modeling with Coupled Cellular Automata. In: A. G. Hoekstra, J. Kroc, P. M. A. Sloot (Eds.), Modelling Complex Systems with Cellular Automata, Springer Verlag, July 2010
[ComHLA] K. Rycerz, M. Bubak, P. M. A. Sloot: HLA Component Based Environment For Distributed Multiscale Simulations. In: T. Priol, M. Vanneschi (Eds.), From Grids to Service and Pervasive Computing, Springer, 2008, pp. 229-239

Intellectual Merit

Using cloud computing for scientific applications remains an open field. In our research we focus on multiscale simulations and how they can benefit from cloud solutions. We have already conducted preliminary experiments using GridSpace with cloud computing [ICCS11]. Moreover, we have experimented with multiscale simulations using various supporting tools (MUSCLE, HLA), local resource management systems on a cluster (PBS) and scripting languages (distributed Ruby). We have designed and implemented an environment supporting the building and execution of multiscale applications consisting of HLA-based components in the Grid environment, with tests on the DAS3 infrastructure (http://www.cs.vu.nl/das3/) [HLA11]. We have also developed a tool based on distributed Ruby that orchestrates the setup of MUSCLE-based multiscale applications using PBS allocation.

In this project we would like to:
1. design a system supporting the execution of multiscale applications on cloud resources;
2. comparatively evaluate the performance of the local cluster approach and cloud-based solutions, namely the performance of their resource management mechanisms and the execution performance of multiscale applications;
3. study the differences between various cloud computing stacks and assess the relevant programming models;
4. design a user-friendly interface suitable for scientists working on multiscale problems (computational biologists, physicists) without a computer science background.

The aim is to extend our knowledge of scientific multiscale applications and of how their requirements can be matched to the features of cloud computing. We plan to investigate a solution that allows each module of a multiscale simulation to run on a different VM and to communicate using various mechanisms, either direct (as in MUSCLE or HLA) or indirect (file system, database, messaging systems). Part of this work will be performed in the scope of M.Sc. theses prepared by students from the AGH University of Science and Technology. The results will also be published in peer-reviewed journals and conferences.

References:
[ICCS11] M. Malawski, J. Meizner, M. Bubak, P. Gepner: Component Approach to Computational Applications on Clouds, accepted for publication at the ICCS 2011 conference
[HLA11] K. Rycerz, M. Bubak: Collaborative Environment For HLA Component-Based Distributed Multiscale Simulations, accepted for publication in: W. Dubitzky et al., “Large Scale Computing Technologies for Complex System Applications”, Wiley & Sons

Broader Impacts

The proposed activity will impact several domains of research infrastructure, teaching and society. The proposed environment will contribute towards better use of novel cyberinfrastructure technologies, such as cloud computing, for a new class of multiscale applications. We plan to incorporate the results into the academic curricula of the partner universities, where the involvement of undergraduate and graduate students is significant. Finally, the applications we are supporting are of great importance to society, since they tackle vital problems of modern medicine, including cardiovascular issues related to restenosis and aneurysms. The results of this project will be published in peer-reviewed journals (e.g. Future Generation Computer Systems, International Journal of High Performance Computing Applications, International Journal for Multiscale Computational Engineering) and conferences (e.g. International Conference on Computational Science). Additionally, the work conducted within the project will be used in two Master of Science theses. The results of the project will also be used as educational aids in computer science courses at AGH and UvA.

Scale of Use

We would like to use the Eucalyptus installation on the India and Sierra clusters and compare the results with HPC jobs (PBS). For the in-stent restenosis application we plan to run about 8-10 VMs for a single experiment (run on 8-10 nodes). For prototyping and development, we plan to run a set of simple experiments that will not consume many resources. For performance tests we plan to conduct a larger experiment (execution time on the order of 72 hours, 4 GB of output data). For the nanopolymer application we will need ca. 32 nodes. For the aneurysm simulation application we will need ca. 128 nodes. Additionally, we would like to compare Eucalyptus with Nimbus, OpenStack and OpenNebula. We would require approximately 12 months to develop and test the whole system.

Results


One part of our research investigated the usability of business metrics for scaling policies in the cloud using the SAMM monitoring and management system. This system makes autonomous decisions on the actions to be applied to the monitored systems, based on a retrospective analysis of their behavior over a period of time. The development work and tests were carried out in the FutureGrid project environment, using the India Eucalyptus cluster. The following virtual machine types were provided:
small - 1 CPU, 512 MB of RAM, 5 GB of storage space
medium - 1 CPU, 1024 MB of RAM, 5 GB of storage space
large - 2 CPUs, 6000 MB of RAM, 10 GB of storage space
xlarge - 2 CPUs, 12000 MB of RAM, 10 GB of storage space
xlarge - 8 CPUs, 20000 MB of RAM, 10 GB of storage space

The test involved a numerical integration algorithm implemented in a master-slave paradigm.
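
As an illustration of the master-slave pattern applied to numerical integration, the following minimal Python sketch splits the integration interval into equal chunks, lets workers integrate each chunk with the midpoint rule, and has the master sum the partial results. It is not the code used in the experiments: the function names, the choice of the midpoint rule, and the use of a local thread pool in place of slave virtual machines are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def integrate_chunk(f, a, b, steps):
    """Slave task: midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def master_integrate(f, a, b, n_slaves=4, steps_per_slave=1000):
    """Master: split [a, b] into equal chunks, dispatch them, sum the partial results."""
    width = (b - a) / n_slaves
    bounds = [(a + k * width, a + (k + 1) * width) for k in range(n_slaves)]
    # Hypothetical stand-in: local threads play the role of slave VMs here.
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        futures = [pool.submit(integrate_chunk, f, lo, hi, steps_per_slave)
                   for lo, hi in bounds]
        return sum(fut.result() for fut in futures)


if __name__ == "__main__":
    # Integral of x^2 over [0, 3]; the exact value is 9.
    print(master_integrate(lambda x: x * x, 0.0, 3.0))
```

Because every chunk is independent, the number of slaves can change between runs without altering the master's logic, which is what makes this kind of workload convenient for testing automatic scaling.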

The cluster consists of 50 nodes, and each node can run up to 8 small instances. Slave nodes in our application do not use much storage space or memory. To manage the computing power at a fine-grained level, we decided to use small instances for them. The Master node application had higher memory requirements, so we deployed it on a medium instance. To evaluate the quality of our approach, we compared two strategies of automatic scaling. The first one exploits a generic metric: the CPU usage. The second strategy uses a business metric: the average time that computation requests spend waiting in the Slave Dispatcher's queue for processing. Upper or lower limits for such a metric could be explicitly included in a Service Level Agreement, e.g. the service provider might be obliged to ensure that a request will not wait for processing for longer than one minute. When the computing power was not sufficient to process a task, additional virtual machines were launched. The infrastructure was shared with other users, so, for example, the startup time of virtual machines varied over time.
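
The business-metric strategy can be sketched as a simple threshold rule on the average queue wait time. This is only an illustrative sketch, not SAMM's actual decision logic: the function names, the lower limit of 10 seconds, the one-VM-at-a-time step, and the VM bounds are assumptions. Only the one-minute upper limit comes from the SLA example in the text.

```python
from statistics import mean


def average_wait(wait_times_s):
    """Average time (seconds) requests spent waiting in the dispatcher queue."""
    return mean(wait_times_s) if wait_times_s else 0.0


def scaling_decision(avg_wait_s, current_vms,
                     upper_limit_s=60.0,     # SLA example from the text: one minute
                     lower_limit_s=10.0,     # assumed value, not from the source
                     min_vms=1, max_vms=8):  # assumed bounds
    """Return the number of slave VMs to run after this evaluation step."""
    if avg_wait_s > upper_limit_s and current_vms < max_vms:
        return current_vms + 1   # requests wait too long: launch one more VM
    if avg_wait_s < lower_limit_s and current_vms > min_vms:
        return current_vms - 1   # capacity is idle: release one VM
    return current_vms           # within limits: keep the current size


if __name__ == "__main__":
    recent_waits = [75.0, 90.0, 62.0]  # hypothetical sample, in seconds
    print(scaling_decision(average_wait(recent_waits), current_vms=2))
```

A CPU-usage strategy could reuse the same rule with CPU utilization as the input and different limits; the difference lies in which quantity the limits are expressed in, and only the wait-time limits map directly onto an SLA clause.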

Our experiments on the FutureGrid infrastructure allowed us to infer that using the average wait time metric has a positive impact on the system when its operation is considered from the business point of view. Since end users are mostly interested in keeping the waiting time as short as possible, the amount of resources involved should be increased according to this demand. Improving this factor increases the potential business value of the presented service. SAMM scaled the system automatically not only from the technical point of view but also from the business value perspective.

The results of our research are presented in two papers [1,2].

References

[1] Koperek, P., Funika, W. Automatic Scaling in Cloud Computing Environments Based on Business Metrics, in: Proc. of the International Conference on Parallel Processing and Applied Mathematics (PPAM'2011), 11-14 September 2011, Torun, Poland, LNCS, Springer, 2012 (to be published)

[2] Funika, W., Koperek, P. Scalable Resource Provisioning in the Cloud Using Business Metrics, in: Proc. of the Fifth International Conference on Advanced Engineering Computing and Applications in Sciences (ADVCOMP'2011), 20-25 November 2011, Lisbon, Portugal, 2011 (to be published)

The other part of our research investigated the possibility of using FutureGrid (FG) resources for multiscale MUSCLE-based applications, in particular the in-stent restenosis application. We have already developed a system based on the Amazon AWS API and are in the process of testing it on FG resources (Eucalyptus). This work is still ongoing.