Maintenance and Outages

Status Type Title Start of Outage Anticipated End of Outage Resolution
Resolved Outage Juliet unavailable

The new kernel hung during boot. The problem appeared to be related to the InfiniBand driver (mlx5_core).

The node was booted with the previous kernel and is working for now. Future kernel updates will need to be tested to determine whether the issue has been fixed.

Resolved Outage Romeo r-006

The system partition table was recovered. Data appears to be intact. The node is available for use.

Resolved Maintenance Maintenance Day
Resolved Maintenance Storage battery replacement

The battery was replaced. Because fgstor01 and fgstor02 are both connected to storage controller B, both servers had to be rebooted to recover the disconnected iSCSI devices.

Resolved Outage Tango cluster unavailable

The DHCP service had not started automatically.

Resolved Outage Power outage March 8, 2020
Ongoing Outage Victor v-016 reserved
Outage Nodes Reserved on Romeo, Juliet
Ongoing Outage Hardware Issues