Testing of Network Facing Services for the Open Science Grid

Project Details

Project Lead
Igor Sfiligoi 
Project Manager
Igor Sfiligoi 
Project Members
Igor Sfiligoi  
Institution
University of California San Diego, Physics Department  
Discipline
Computer Science (401) 
Subdiscipline
11.07 Computer Science 

Abstract

The Open Science Grid (OSG) selects and distributes software for its user community. One of its activities is validating the behavior of network-facing services beyond the currently deployed scenarios. The effect of unreliable networking is among the most difficult things to test, which makes FutureGrid an excellent platform for this activity. We plan to make extensive use of its Network Impairment Device to verify the behavior of the main network-facing services used in OSG.

Intellectual Merit

The Open Science Grid (OSG) is a production distributed computing environment that relies on several network-facing services for its operation. The network traffic arriving at these services is chaotic in nature, being driven by O(1k) users who run O(100k) jobs containing user-provided applications, scheduled over extended periods of time. These services must therefore handle such traffic patterns gracefully in order to provide value to the users. OSG actively tests the software providing these services under both expected and edge conditions, to verify its behavior and report any significant problems to the software providers. One aspect of this testing is verifying behavior in the presence of network problems. Such problems are very hard to simulate in production environments, but should be easy to reproduce with FutureGrid's Network Impairment Device. The net result will be a better understanding of how the network-facing services used in OSG behave in such edge conditions, and possibly improved software that addresses any problems found.

Broader Impacts

The Open Science Grid (OSG) provides computing resources for a wide breadth of sciences. By exercising the critical components of OSG in extreme conditions, and reporting any problems found to the software providers, we will significantly reduce the chance of those services experiencing downtime in production environments, thus increasing the amount of science being produced.

Scale of Use

I need the Network Impairment Device and a few nodes on each side. I will use the system for a few days at a time, with possibly significant idle periods in between.