V3VEE Project

Project Details

Project Lead
Peter Dinda 
Project Manager
Peter Dinda 
Project Members
Zheng Cui, Lei Xia, Kyle Hale, Maciej Swiech
Supporting Experts
Gregor von Laszewski  
Institution
Northwestern University, EECS  
Discipline
Computer Science (401) 
Subdiscipline
14.09 Computer Engineering 

Abstract

We plan to use FutureGrid to help evaluate virtualization technologies for high performance computing. In particular, we seek a testbed for scaling studies involving our Palacios VMM and its components (e.g., the VNET/P overlay).

Intellectual Merit

The V3VEE project (v3vee.org) is creating a virtual machine monitor framework for modern architectures (those with hardware virtualization support) that will permit the compile-time creation of VMMs with different structures, including those optimized for computer architecture research and use in high performance computing. V3VEE began as an NSF-funded collaborative project between Northwestern University and the University of New Mexico. It currently involves five DOE-funded partner institutions: Northwestern University, the University of New Mexico, the University of Pittsburgh, Sandia National Laboratories, and Oak Ridge National Laboratory. V3VEE is a community resource development effort that anyone can contribute to.

Broader Impacts

The infrastructure developed in the V3VEE project is used extensively in research and education. The codebase is freely available and BSD-licensed. Underrepresented groups are impacted through Northwestern's AGEP program and through UNM, a minority-serving university. More information on the project can be found at http://v3vee.org.

Scale of Use

There are four phases I currently envision:

  1. We will want to log in to the various resources (or have someone do so) to interrogate the hardware. Palacios and VNET/P have some specific hardware requirements, and we must first determine which FG hardware would work. This will take only about an hour per environment, and we need only a single representative machine in each cluster.
  2. We will initially need a small number of machines (two, say) to bring up Palacios and VNET/P in the FG environment. This will let us create either a configuration (kernel module + images + tools) or a whole OS image that can then be replicated. The time this takes depends heavily on the hurdles encountered and could be anywhere from a day to a month. We would know very quickly whether things will go fast.
  3. Scaling studies. Here, we would use a single cluster (the largest possible) to study the performance of benchmarks and applications as a function of scale. The experimental protocol and the test suite we would likely use are described in L. Xia, et al., VNET/P: Bridging the Cloud and High Performance Computing Through Fast Overlay Networking, HPDC 2012 (and tech report), available from v3vee.org. Based on our experience running on Red Storm, I anticipate this would take several days.
  4. Possibly, we would want to do some cross-cluster experiments. This would depend on the challenges of porting in phase 2, and if we do it, it would consume a couple of clusters for a day or two.
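As an illustration of the hardware interrogation in phase 1, the following is a minimal sketch of one check that could be run on a representative node: whether the CPU advertises hardware virtualization support. It assumes an x86 Linux node that exposes `/proc/cpuinfo`; the function names are hypothetical, and the sketch does not cover whatever further requirements Palacios and VNET/P may have (for example, network hardware).

```python
# Sketch only: reports whether an x86 Linux node advertises hardware
# virtualization support. Function names are illustrative assumptions,
# not part of Palacios or FutureGrid tooling.

VIRT_FLAGS = {"vmx", "svm"}  # vmx = Intel VT-x, svm = AMD-V


def virtualization_flags(cpuinfo_text):
    """Return the virtualization flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux each logical CPU has a "flags : ..." line.
        if sep and key.strip() == "flags":
            found.update(set(value.split()) & VIRT_FLAGS)
    return found


def describe(flags):
    """Turn the detected flag set into a one-line summary."""
    if "vmx" in flags:
        return "Intel VT-x (vmx) present"
    if "svm" in flags:
        return "AMD-V (svm) present"
    return "no hardware virtualization flag found"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(describe(virtualization_flags(f.read())))
    except OSError as exc:
        print(f"could not read /proc/cpuinfo: {exc}")
```

A flag in `/proc/cpuinfo` only shows that the CPU advertises the feature; whether it is usable on a given node (for instance, not disabled in firmware) would still need to be confirmed during bring-up.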

Results

All V3VEE project papers, presentations, and the Palacios codebase are available from v3vee.org. The most relevant papers for this proposal are:

  1. L. Xia, Z. Cui, J. Lange, Y. Tang, P. Dinda, P. Bridges, VNET/P: Bridging the Cloud and High Performance Computing Through Fast Overlay Networking, Proceedings of the 21st ACM Symposium on High-performance Parallel and Distributed Computing (HPDC 2012), accepted, to appear. (also TR version)
  2. J. Lange, P. Dinda, K. Hale, L. Xia, An Introduction to the Palacios Virtual Machine Monitor---Version 1.3, Technical Report NWU-EECS-11-10, Department of Electrical Engineering and Computer Science, Northwestern University, November, 2011.
  3. J. Lange, K. Pedretti, P. Dinda, P. Bridges, C. Bae, P. Soltero, A. Merritt, Minimal Overhead Virtualization of a Large Scale Supercomputer, Proceedings of the 2011 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE 2011), March, 2011.
  4. J. Lange, K. Pedretti, T. Hudson, P. Dinda, Z. Cui, L. Xia, P. Bridges, A. Gocke, S. Jaconette, M. Levenhagen, R. Brightwell, Palacios and Kitten: New High Performance Operating Systems for Scalable Virtualized and Native Supercomputing, Proceedings of the 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), April, 2010.