Optimizing Shared Resource Contention in HPC Clusters

Project Information

Computer Science (401) 

Contention for shared resources in HPC clusters arises in two ways. When jobs execute concurrently on the same multicore node, they compete for allocated CPU time, shared caches, the memory bus, memory controllers, etc. When jobs access the cluster interconnect concurrently, they compete for it as their processes exchange data. In a virtualized environment, the cluster scheduler also uses the cluster network to migrate job virtual machines across the nodes. We argue that contention for shared cluster resources severely degrades workload performance and stability and hence must be addressed. We also found that state-of-the-art HPC cluster schedulers are not contention-aware. The goal of this work is the design, implementation and evaluation of a scheduling framework that optimizes shared resource contention in a virtualized HPC cluster environment.

Intellectual Merit

The proposed research demonstrates how shared resource contention in HPC clusters can be addressed via contention-aware scheduling of HPC jobs. The proposed framework comprises a novel scheduling algorithm and a set of open-source software that includes original code and patches to widely used tools in the field. The solution (a) allows online monitoring of the cluster workload and (b) provides a way to make and enforce contention-aware scheduling decisions in practice.
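As an illustration only, the idea of a contention-aware placement decision can be sketched with a simple greedy heuristic. This is not the scheduling algorithm of the proposed framework, which is not described here. The intensity metric (for example, last-level cache misses per 1000 instructions as reported by a monitoring daemon), the `Node` class and the `place_processes` function are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of a cluster node: free core slots and current load."""
    name: str
    slots: int                                  # cores available for placement
    jobs: list = field(default_factory=list)    # process names placed so far
    load: float = 0.0                           # summed intensity of placed processes


def place_processes(intensity, nodes):
    """Greedy spread of processes over nodes.

    `intensity` maps a process name to an assumed memory-intensity score.
    The most intensive processes are placed first, each onto the node with
    the lowest accumulated intensity that still has a free slot, so that
    intensive processes tend not to share a node.
    """
    ordered = sorted(intensity.items(), key=lambda kv: kv[1], reverse=True)
    for name, score in ordered:
        candidates = [n for n in nodes if len(n.jobs) < n.slots]
        if not candidates:
            raise RuntimeError("not enough free slots for all processes")
        target = min(candidates, key=lambda n: n.load)
        target.jobs.append(name)
        target.load += score
    return {n.name: n.jobs for n in nodes}


if __name__ == "__main__":
    # Made-up intensity scores, not measurements.
    demo = {"p0": 30.0, "p1": 25.0, "p2": 2.0, "p3": 1.0}
    print(place_processes(demo, [Node("node1", 2), Node("node2", 2)]))
```

In this sketch the two most intensive processes land on different nodes, and the less intensive ones fill the remaining slots. A real scheduler would also have to account for communication between processes and for migration cost, which the heuristic ignores.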

Broader Impacts

This research suggests a way to upgrade the HPC infrastructure used by U.S. academic institutions, industry and government. The goal of the upgrade is better performance for general cluster workloads.

Project Contact

Project Lead
Sergey Blagodurov (blagodurov) 
Project Manager
Sergey Blagodurov (blagodurov) 

Resource Requirements

Hardware Systems
  • alamo (Dell optiplex at TACC)
  • hotel (IBM iDataPlex at U Chicago)
  • india (IBM iDataPlex at IU)
  • sierra (IBM iDataPlex at SDSC)
  • xray (Cray XM5 at IU)
Use of FutureGrid

I would like to perform experiments on the FutureGrid hardware to test the efficiency of the proposed solution. To do that, I plan to do the following:

1) Create an image that contains a Linux distribution, a set of widely used software tools and the code of my framework:
   (a) The Linux distribution is preferably Gentoo (but can be a different one). In it, I need to install a number of standard Linux packages, turn on some kernel options and modify the default Linux perf tool with my framework patches.
   (b) The cluster software I use is Torque as the resource manager and Maui as the cluster scheduler. I need to modify both with my framework patches. For the scheduling algorithm, I use Matlab and/or Choco.
   (c) I also need to install a user-level monitoring daemon (original code).
2) Book 16-64 nodes from any FutureGrid resource that supports HPC (e.g. india, alamo, etc.) in exclusive mode for several hours.
3) Netboot them bare-metal with the created image (via xCAT), thus creating a small HPC cluster out of the booked nodes.
4) Run an MPI workload (mostly SPEC MPI 2007) on this cluster with and without contention-awareness.
5) Gather the workload execution time and the amount of resources consumed. The ability to measure power consumption via tools like HP iLO3 or Dell iDRAC is highly desirable.
6) Analyze the results and make modifications as necessary.

I am planning to run only multinode experiments on FutureGrid resources. I can perform small modifications and testing on my lab machines. It is the scale (16-64 nodes) that I am looking for in FutureGrid. I do require bare-metal access to the nodes as opposed to VM access with Grid Appliances. The reasons are:

(a) I have made various modifications to the source code of Linux, Maui and Torque to implement my solution, so I would like to recreate the experimental setup from my lab machines as fully as possible.
(b) It is usually very hard to get access to the bare-metal hardware counters from guest OSes (running on Xen or KVM). My solution relies on the counters.
(c) I would like to use a container-based virtualization solution with OpenVZ as opposed to Xen or KVM.
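
The comparison of runs with and without contention-awareness could be summarized along the following lines. This is a minimal sketch, assuming each run yields one execution time per benchmark. The function name `compare_runs`, the benchmark labels and the input format are hypothetical and are not part of the framework.

```python
import math


def compare_runs(baseline, contention_aware):
    """Compare execution times (seconds) of two runs of the same workload.

    Both arguments map a benchmark label to its execution time. Returns the
    per-benchmark improvement in percent (positive means the contention-aware
    run was faster) and the geometric mean of the per-benchmark speedups.
    """
    improvement = {}
    speedups = []
    for bench, base_time in baseline.items():
        aware_time = contention_aware[bench]
        improvement[bench] = 100.0 * (base_time - aware_time) / base_time
        speedups.append(base_time / aware_time)
    geomean = math.prod(speedups) ** (1.0 / len(speedups))
    return improvement, geomean


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not experimental results.
    base = {"bench_a": 100.0, "bench_b": 200.0}
    aware = {"bench_a": 80.0, "bench_b": 200.0}
    per_bench, overall = compare_runs(base, aware)
    print(per_bench, overall)
```

The geometric mean is used here because it is the customary way to aggregate speedup ratios across benchmarks. Power or resource consumption figures could be compared with the same function by passing those values instead of execution times.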

Scale of Use

Book 16-64 nodes from any FutureGrid resource that supports HPC (e.g. india, alamo, etc.) in exclusive mode for several hours.

Project Timeline

01/30/2012 - 18:26