CCTools Scalability Testing

Project Details

Project Lead
Douglas Thain 
Project Manager
Dinesh Rajan Pandiarajan 
Project Members
Dinesh Rajan Pandiarajan, Peter Sempolinski, Patrick Donnelly, Michael Albrecht, Chris Bauschka, Iheanyi Ekechukwu, Joe Fetsch, Li Yu, Kyle Mulholland, Rob Wirthman, Nate Wickham, Casey Robinson, Nick Jaeger, Benjamin Tovar, Nicholas Hazekamp  
Supporting Experts
Gregor von Laszewski  
Institution
University of Notre Dame, Department of Computer Science and Engineering  
Discipline
Computer Science (401) 

Abstract

This FutureGrid (FG) allocation will enable extended scalability and correctness testing of the Cooperative Computing Tools (CCTools), a software project supported by the NSF SI2 program. The CCTools software enables non-privileged users to harness hundreds to thousands of cores from multiple clusters, clouds, and grids simultaneously. The main components of the package are Parrot, a virtual file system that interfaces with multiple distributed storage systems, and Makeflow, a workflow engine that interfaces with multiple computing systems. Using existing services (such as the NMI Build and Test Lab), we can currently perform basic verification of portability across operating systems. However, full functionality testing requires regular access to a reproducible distributed system to verify, for example, that the software achieves the desired throughput at the scale of 1000 cores. Using FG, we will establish a distributed testing methodology that brings rigorous quality control to our software development process.
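As a concrete sketch of the two components (the file names, host, and batch system below are illustrative assumptions, not artifacts of this project), a Makeflow workflow is expressed in a Make-like rule language, and Parrot lets an unmodified program read remote data through the ordinary file system interface:

    # example.makeflow: a rule names its output, its inputs, and a command.
    output.dat: simulate.py input.dat
        python simulate.py input.dat > output.dat

    # The same workflow can be retargeted to another batch system by flag:
    #   makeflow -T condor example.makeflow
    # Parrot interposes on system calls, so remote data appears as files:
    #   parrot_run cat /http/www.example.com/index.html

Because the rule language is declarative, the same workflow file can be run unchanged against each supported execution backend, which is the kind of cross-platform testing this allocation targets.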

Intellectual Merit

To our knowledge, there is no well-established methodology -- much less software -- for evaluating the correctness of distributed systems at scale in a continuous integration environment. This project will break new ground in the distributed testing and evaluation of complex software.
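A minimal sketch of the kind of pass/fail check such a continuous-integration methodology could automate, assuming a hypothetical reference workflow (benchmark.makeflow) and an operator-chosen time budget (the script is illustrative and not part of the CCTools distribution):

    # scale_check.py: illustrative CI check for a distributed workflow run.
    import subprocess
    import sys
    import time

    WORKFLOW = "benchmark.makeflow"  # hypothetical reference workflow
    TIME_BUDGET = 3600               # seconds allowed at the target scale

    def run_scale_test() -> bool:
        """Run the workflow on a distributed backend and verify both
        correctness (exit status) and throughput (wall-clock budget)."""
        start = time.time()
        result = subprocess.run(["makeflow", "-T", "wq", WORKFLOW])
        elapsed = time.time() - start
        if result.returncode != 0:
            print("FAIL: workflow exited with nonzero status")
            return False
        if elapsed > TIME_BUDGET:
            print(f"FAIL: took {elapsed:.0f}s, budget was {TIME_BUDGET}s")
            return False
        print(f"PASS: completed in {elapsed:.0f}s")
        return True

    if __name__ == "__main__":
        sys.exit(0 if run_scale_test() else 1)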

Broader Impacts

This FG allocation will enhance the impact of an existing NSF award, which supports a variety of high-impact scientific applications in fields such as bioinformatics, biometrics, data mining, high energy physics, and molecular dynamics. These applications run on a wide variety of infrastructure, ranging from national-scale resources (XSEDE and OSG) to local private clusters.

Scale of Use

Continuous build activities require up to 10 VMs running at all times. Distributed scalability and correctness testing requires bursts to hundreds of VMs for a day every few weeks, and bursts to thousands of VMs for a few days several times a year.