Evaluation of MPI Collectives for HPC Applications on Distributed Virtualized Environments

Project ID
Project Categories
Non-Life Science
NSF Grant Number
Natural hazards such as earthquakes, tsunamis, and hurricanes affect society at many levels. As a result, there is a critical need for accurate and timely information that enables effective planning for, and response to, potential natural disasters. In this project, our main use case is the Weather Research and Forecasting (WRF) model, a state-of-the-art numerical modeling application developed by the National Center for Atmospheric Research (NCAR) for both operational forecasting and atmospheric research. A measure of its significance and success is that many meteorological services and researchers worldwide are rapidly adopting it. WRF demands a large amount of computational power to produce practical (i.e., accurate and high-resolution) simulations. Because the window of time between the detection of a hurricane and its arrival is relatively small, these simulations need to execute as rapidly as possible, which in turn requires that inter-process communication delay be kept as small as possible. On distributed virtualized environments, however, communication overhead increases because of the virtualization abstraction layer. We intend to explore how MPI collective communication is affected by this abstraction and where the bottlenecks occur. This will help identify which shortcomings need to be addressed before HPC applications like WRF can be introduced to distributed virtualized environments (i.e., clouds), where scalability and on-demand resource provisioning may prove vital for urgent computing needs.
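To illustrate why added per-message latency matters for collectives, the following is a minimal sketch based on the standard latency-bandwidth (alpha-beta) cost model for a binomial-tree broadcast. It is not part of the project's methodology. The function name and the latency and bandwidth values are hypothetical placeholders chosen only for illustration; they are not measurements from FutureGrid or any other system.

```python
def binomial_bcast_time(p, nbytes, alpha, beta):
    """Estimate the time of a binomial-tree broadcast among p processes.

    alpha: per-message latency in seconds
    beta:  per-byte transfer time in seconds (1 / bandwidth)
    The tree needs ceil(log2(p)) rounds, and each round sends one message.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    rounds = (p - 1).bit_length()  # exact integer ceil(log2(p)); 0 when p == 1
    return rounds * (alpha + nbytes * beta)


# Hypothetical parameters, for illustration only (not measured values).
ALPHA_NATIVE = 2e-6    # assumed per-message latency on bare metal (s)
ALPHA_VIRTUAL = 20e-6  # assumed per-message latency inside a VM (s)
BETA = 1e-9            # assumed per-byte time, i.e. about 1 GB/s

for nbytes in (8, 1024, 1024 * 1024):
    native = binomial_bcast_time(64, nbytes, ALPHA_NATIVE, BETA)
    virtual = binomial_bcast_time(64, nbytes, ALPHA_VIRTUAL, BETA)
    print(f"{nbytes:>8} bytes: modeled slowdown {virtual / native:.2f}x")
```

Under this model, small messages are dominated by the latency term, so a latency penalty introduced by the abstraction layer shows up almost in full, while large messages are dominated by the bandwidth term. Actual behavior depends on the MPI implementation's choice of collective algorithm and on the virtualization stack, which is what the proposed measurements are meant to establish.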
Use of FutureSystems
FutureGrid will serve as our distributed, virtualized environment. The FutureGrid infrastructure provides a high-performance computing platform on which virtual machines can take advantage of high-speed switches and fast CPUs and buses. We plan to provision a virtualized cluster on FutureGrid and evaluate the performance of the WRF application at large scale, using FutureGrid resources as a high-performance, scalable, virtualized cluster.
Scale of Use
We will need a few VMs to set up the environment and prepare the actual experiments. This may take a few weeks.
We will then perform a set of runs at different scales, using different systems and different configurations. Results will be analyzed between experiments so that only the key scenarios of interest are evaluated further. This process may take a few months, but resource use will not be continuous over that period.