STAMPEDE

Project Details

Project Lead
Dan Gunter 
Project Manager
Dan Gunter 
Project Members
Gaurang Mehta, Taghrid Samak, Ahmed El-Hassany, Karan Vahi  
Institution
LBNL, Dan Gunter  
Discipline
Computer Science (401) 
Subdiscipline
30.06 Systems Science 

Abstract

Large-scale applications today use distributed resources to support their computations and, as part of their execution, generate large amounts of log information. Until now, we have used the NetLogger analysis tools to perform offline log analysis. Stampede extends this offline workflow log analysis capability into a comprehensive middleware solution that will allow users of complex scientific applications to track the status of their jobs in real time, to detect execution anomalies automatically, and to perform online troubleshooting without logging in to remote nodes or searching through thousands of log files.

Intellectual Merit

The system will be able to capture application-level logs from jobs as they are executing on the cyberinfrastructure. At the same time, it will also collect log information from the underlying cyberinfrastructure services, such as resource management and data transfer. These end-to-end logs will be combined and brokered through a subscription interface. External components will use the subscription interface to provide monitoring services.
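
As a rough illustration of the subscription idea described above, the following minimal Python sketch shows an in-memory broker that routes log events to subscribers by event-name prefix. It is not Stampede's actual interface: the class name LogBroker, the event names (job.start, job.fail, transfer.end), and the event fields are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, str]
Callback = Callable[[Event], None]


class LogBroker:
    """Hypothetical in-memory broker (not the Stampede API).

    Routes each published log event to every subscriber whose
    prefix matches the start of the event's name.
    """

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, prefix: str, callback: Callback) -> None:
        # An empty prefix matches every event name.
        self._subscribers[prefix].append(callback)

    def publish(self, event: Event) -> int:
        # Deliver the event and return how many callbacks received it.
        delivered = 0
        for prefix, callbacks in self._subscribers.items():
            if event["event"].startswith(prefix):
                for callback in callbacks:
                    callback(event)
                    delivered += 1
        return delivered


broker = LogBroker()
failures: List[Event] = []
all_events: List[Event] = []

# One monitoring component cares only about job failures.
broker.subscribe("job.fail", failures.append)
# Another component (for example, an archiver) receives everything.
broker.subscribe("", all_events.append)

# Events from the application and from infrastructure services
# flow through the same broker (all values here are made up).
broker.publish({"event": "job.start", "source": "application", "job_id": "j1"})
broker.publish({"event": "transfer.end", "source": "data-transfer", "job_id": "j1"})
broker.publish({"event": "job.fail", "source": "resource-manager", "job_id": "j1"})
```

In this sketch, application-level and infrastructure-level events share one stream, and each external component sees only the subset it subscribed to. A real deployment would presumably use a networked message broker rather than in-process callbacks.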

Broader Impacts

We build on an important class of applications, scientific workflows, which are used today in a number of scientific disciplines, including astronomy, biology, ecology, earthquake science, gravitational-wave physics, and many others. These workflows run on today's large-scale infrastructure, such as the OSG or the TeraGrid. The solution will be modular, distributed, and reusable across a broad class of applications and workflow systems.

Scale of Use

From one to hundreds of VMs for hours at a time. See also http://pegasus.isi.edu/projects/stampede