Cloud Virtualization Environment Analysis towards High Performance Storage Solutions

Project Details

Project Lead
Malek Musleh 
Project Manager
Malek Musleh 
Project Members
Vijay Pai, John Paul Walters
Supporting Experts
Purdue University, ECE  
Engineering, n.e.c. (114) 
14.09 Computer Engineering 


InfiniBand, a high-bandwidth, low-latency network interconnect, has for most of the last decade been regarded as the fabric of choice for HPC clusters. It is deployed in many commodity clusters and, as of the June 2011 TOP500 list, is used as the communication network in 41.2% of all systems. However, even as InfiniBand usage continues to grow, several factors continue to hinder full utilization of the technology’s capabilities. Recent advances in virtualization tools, namely Single Root I/O Virtualization (SR-IOV), have significantly reduced the performance overhead of virtualization. Specifically, SR-IOV exposes the host machine’s physical cards to multiple virtual machines (VMs), rather than requiring full emulation or passthrough of the device into the VM. This advancement allows VMs to share the same physical device concurrently, and to do so with minimal overhead compared to previous virtualization techniques, which impose a 10-15% performance penalty. We intend to perform an evaluation study of network interconnect performance and overhead under different virtualization environments. Specifically, we plan to begin by evaluating InfiniBand performance of HPC and Cloud applications on both native and virtual machines. Further analysis will include evaluating parallel file system (PFS) implementations, such as Lustre and GlusterFS, towards providing more effective cloud environment integration.
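
As a rough illustration of how a virtualization penalty such as the 10-15% figure above could be quantified, the following minimal Python sketch computes the relative overhead of a virtualized run against a native baseline. The function name `relative_overhead`, its parameters, and the sample numbers are hypothetical and chosen only for illustration; they are not measurements or tooling from this project.

```python
def relative_overhead(native, virtualized, higher_is_better=True):
    """Return the fractional performance penalty of a virtualized run.

    native and virtualized are measurements of the same metric taken on
    a native host and inside a VM. Set higher_is_better=False for
    metrics such as latency or run time, where smaller values are better.
    """
    if native <= 0:
        raise ValueError("native measurement must be positive")
    if higher_is_better:
        # Throughput-style metric: fraction of native performance lost.
        return (native - virtualized) / native
    # Time-style metric: extra time relative to the native baseline.
    return (virtualized - native) / native


# Hypothetical sample values, not measured data.
bandwidth_penalty = relative_overhead(native=3000.0, virtualized=2640.0)
latency_penalty = relative_overhead(native=2.0, virtualized=2.2,
                                    higher_is_better=False)
print(f"bandwidth penalty: {bandwidth_penalty:.1%}")
print(f"latency penalty:   {latency_penalty:.1%}")
```

Normalizing by the native measurement keeps bandwidth-style and latency-style metrics on the same scale, so overheads from different benchmarks can be compared directly.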

Intellectual Merit

Our project will be the first to provide a comprehensive evaluation of the impact of virtualization on both application and network performance. Isolating the measured performance overheads, then categorizing them and sampling their frequency, provides multiple benefits. First, developers will have a better sense of where the performance bottlenecks are and how they can be optimized. Second, application users will be able to gauge how certain applications will perform under different environmental constraints.

Broader Impacts

Our project may pave the way for the HPC community to adopt virtualization techniques if it is shown that virtualized performance is on par with native performance for the majority of execution time. It will also allow current users of virtualization, primarily the Cloud IT community, to better schedule and allocate resources depending on the expected virtualization impact.

Scale of Use

We want to run a set of comparisons on the entire set of allocated machines, for which we may need 2-3 weeks. This time period should include installation and setup time as well.