HPC Scheduling

Project Details

Project Lead
Kenneth Yoshimoto 
Project Manager
Kenneth Yoshimoto 
Project Members
Subhashini Sivagnanam, Ismael Farfán  
Institution
UCSD, SDSC  
Discipline
Biosciences, n.e.c. (617) 
Subdiscipline
11.01 Computer and Information Sciences, General 

Abstract

Catalina is an open source external scheduler for use with resource managers such as LoadLeveler, PBS, and SLURM. Its capabilities include system reservations, user reservations, user-settable reservations, standing reservations, priority-ordered queueing, and scheduling policies. Grid Universal Remote (GUR) and Master Control Program (MCP) are open source metascheduling tools currently used in TeraGrid. Catalina has been used in production for almost a decade on NSF supercomputers (Blue Horizon, DataStar, SDSC IA64) for local scheduling. To accommodate network topologies such as a 3D torus with leaf nodes, we would extend Catalina with topology-aware scheduling, which would greatly benefit the scheduling of workloads on 3D-torus machines.

GUR is a tool for conducting multi-site, coordinated calculations. It has been used to create synchronized reservations across separate compute clusters to run single MPI jobs. Originally, GUR had the capability to stage data and remotely compile code. These capabilities are being revamped. GUR requires user-settable reservations on the local scheduler. Moab has this feature, as does the Catalina Scheduler. If virtualization is available, we would use that feature to bring up virtual Catalina clusters for testing and development. http://www.teragrid.org/userinfo/jobs/gur.php

MCP is a command line utility that provides automatic resource selection for running a single parallel job on high performance computing resources. MCP optimizes job start times by submitting copies of a job and canceling the extra job requests after one copy begins to execute. We would use FutureGrid to explore using MCP for ensemble runs. http://www.teragrid.org/userinfo/jobs/mcp.php

We would like to use FutureGrid as a test bed to further develop Catalina/GUR/MCP with access to a wide variety of experimental and production architectures.

Software requirements include Python 2.2 or greater (2.6.4 is the current version), Expect, a C compiler, and OpenSSH for communication. It would be nice to have the Remote Login kit from CTSS, but we can get by with OpenSSH instead.
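
The MCP strategy described in the abstract (submit copies of a job, then cancel the extras once one copy starts) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the SimulatedQueue class, its wait_minutes field, and run_first_to_start are hypothetical names invented for this example and are not part of MCP, which talks to real resource managers.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedQueue:
    """Stand-in for one resource's batch queue (hypothetical, not an MCP API)."""
    name: str
    wait_minutes: int  # simulated time until a submitted copy would start
    cancelled: list = field(default_factory=list)

    def submit(self, job_name):
        # A real tool would invoke the site's submit command here.
        return f"{self.name}:{job_name}"

    def cancel(self, job_id):
        self.cancelled.append(job_id)


def run_first_to_start(job_name, queues):
    """Submit one copy per queue, keep the copy that starts first, cancel the rest."""
    submitted = {q.name: (q, q.submit(job_name)) for q in queues}
    # In this simulation, the copy with the shortest wait starts first.
    winner = min(queues, key=lambda q: q.wait_minutes)
    for name, (queue, job_id) in submitted.items():
        if name != winner.name:
            queue.cancel(job_id)
    return submitted[winner.name][1]


queues = [
    SimulatedQueue("siteA", 90),
    SimulatedQueue("siteB", 15),
    SimulatedQueue("siteC", 240),
]
started = run_first_to_start("ensemble-member-1", queues)
print(started)
```

For an ensemble run, the same function could be called once per ensemble member, so that each member lands on whichever resource frees up first.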

Intellectual Merit

Catalina, GUR, and MCP development will lead to novel strategies of job scheduling on local clusters and metascheduling across geographically distributed clusters. Currently, GUR and MCP are the TeraGrid CTSS components deployed for co-scheduling and metascheduling. Catalina is the scheduler for the Trestles machine.

Broader Impacts

Catalina scheduling would benefit all jobs with improved turnaround, better utilization, and increased performance (from correct topology scheduling for efficient communication). GUR and MCP have been integrated with Globus infrastructure and are therefore suitable for the wide range of compute resources accessible through that middleware. Any scientific application that requires co-scheduling or metascheduling can benefit from GUR and MCP capabilities. Improvements in GUR and MCP would allow more efficient use of grids.

Scale of Use

A moderate to high number of very small footprint VMs to simulate an HPC cluster. (I'm not actually going to run work on the VMs, so they don't need much CPU time or memory.)

Results


Topology scheduling:  To support 3D-torus switch clusters, such as the SDSC Gordon cluster,
topology-aware scheduling code was added to the Catalina Scheduler (http://www.sdsc.edu/catalina).
When the relationship of nodes to each other, rather than simply individual node attributes,
affects job performance, it becomes necessary to generate schedules using relationship information.
A method to do this for arbitrary switch topologies was added to the Catalina Scheduler. The FutureGrid
Nimbus capability was used to create virtual clusters for scheduler development and testing.
Without this facility, this development effort would not have been possible.
Topology-aware scheduling will allow more efficient allocation of resources to jobs, depending on
job switch requirements.
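
To make the idea of scheduling on node relationships concrete, here is a minimal sketch for the 3D-torus case. It is not the method implemented in Catalina, which handles arbitrary switch topologies. It is a simple greedy heuristic, and the names torus_hops and pick_compact_nodes, the coordinate layout, and the sample free-node list are all assumptions made for illustration.

```python
from itertools import combinations


def torus_hops(a, b, dims):
    """Hop count between two coordinates on a torus with wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def pick_compact_nodes(free_nodes, count, dims):
    """Greedy choice: for each free node as a seed, take its nearest free
    neighbours and keep the group with the smallest worst-case hop count."""
    if count > len(free_nodes):
        raise ValueError("not enough free nodes for this job")
    best_group, best_span = None, None
    for seed in free_nodes:
        # Ties in distance are broken by coordinate order to stay deterministic.
        group = sorted(free_nodes, key=lambda n: (torus_hops(seed, n, dims), n))[:count]
        span = max((torus_hops(a, b, dims) for a, b in combinations(group, 2)), default=0)
        if best_span is None or span < best_span:
            best_group, best_span = group, span
    return sorted(best_group), best_span


dims = (4, 4, 4)  # hypothetical 4x4x4 torus
free = [(0, 0, 0), (3, 0, 0), (0, 3, 0), (2, 2, 2), (1, 2, 2)]
nodes, span = pick_compact_nodes(free, 3, dims)
print(nodes, span)
```

Because of the wraparound links, (0, 0, 0) and (3, 0, 0) are one hop apart on a 4x4x4 torus, so a scheduler that looked only at individual node attributes would miss that these nodes form a tight group.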

Virtual Machine Scheduling:  Use of Virtual Machines in batch systems to increase scheduling
flexibility has been proposed by others.  With the goal of developing a prototype system
capable of doing this, existing OpenNebula facilities were used to start development of
a SLURM-based (http://www.schedmd.com) VM scheduling system, with suspend/resume
and migrate capabilities.  Because we needed access to low-level OpenNebula functions,
FutureGrid very graciously accommodated us with direct access to compute resources.
Using these resources, we were able to develop the system to the point where SLURM
and a prototype external scheduler are able to convert a job script to a set of OpenNebula
VM specifications and start the job with those VMs.
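
The conversion from a job script to VM specifications might look roughly like the sketch below. It does not reproduce the prototype. It assumes the script uses long-form `#SBATCH --key=value` directives with memory given in plain megabytes, and it maps them to the CPU, VCPU, and MEMORY template attributes. The function names, the default values, and the one-VM-per-node mapping are hypothetical choices for this example.

```python
import re


def parse_sbatch(script_text):
    """Collect long-form '#SBATCH --key=value' directives into a dict."""
    options = {}
    for line in script_text.splitlines():
        match = re.match(r"#SBATCH\s+--([\w-]+)=(\S+)", line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def vm_specs_for_job(script_text, job_name="job"):
    """Assumed mapping: one VM per requested node, sized from the directives."""
    opts = parse_sbatch(script_text)
    nodes = int(opts.get("nodes", 1))
    cpus = int(opts.get("cpus-per-task", 1))
    memory_mb = int(opts.get("mem", 1024))  # assumes plain megabytes, no suffix
    return [
        {"NAME": f"{job_name}-vm{i}", "CPU": cpus, "VCPU": cpus, "MEMORY": memory_mb}
        for i in range(nodes)
    ]


def render_template(spec):
    """Render one spec as KEY = value lines, quoting string values."""
    lines = []
    for key, value in spec.items():
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"{key} = {rendered}")
    return "\n".join(lines)


script = """#!/bin/bash
#SBATCH --nodes=2
#SBATCH --cpus-per-task=4
#SBATCH --mem=2048
srun ./my_app
"""
specs = vm_specs_for_job(script, job_name="demo")
print(render_template(specs[0]))
```

A real system would also need disk images, network settings, and handling for memory suffixes such as `2G`, all of which are omitted here.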