From ghouchin at andrew.cmu.edu Wed Apr 22 19:38:58 2020
From: ghouchin at andrew.cmu.edu (Gregory Houchins)
Date: Wed, 22 Apr 2020 19:38:58 -0400
Subject: Fwd: data center cooling issue

All jobs have been cancelled, all compute nodes have been shut down, and
login to the headnode will be restricted to prevent hardware damage.

--
Gregory Houchins | WH3402 | 412-268-2486
Arjuna System Administrator
PhD Candidate, Physics
Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213

---------- Forwarded message ---------
From: Bryan Webb
Date: Wed, Apr 22, 2020 at 7:07 PM
Subject: data center cooling issue
To: CMU-CoE GPU temp email
Cc: Clint Perrone, Ed Hanna

Folks,

We appear to be experiencing cooling problems in the data center at the
moment. We are awaiting more details from a facility engineer, but nodes
on Bridges are already overheating.

It may be best for you to shut down your clusters as much as possible, as
soon as possible, using your remote management capabilities.

..Bryan

--
Bryan R. Webb, Systems and Facilities Administrator
Pittsburgh Supercomputing Center (Carnegie Mellon University)
office: 412-268-5134

From ghouchin at andrew.cmu.edu Wed Apr 22 22:02:49 2020
From: ghouchin at andrew.cmu.edu (Gregory Houchins)
Date: Wed, 22 Apr 2020 22:02:49 -0400
Subject: data center cooling issue

Cluster has returned to service.

-Greg

On Wed, Apr 22, 2020 at 7:38 PM Gregory Houchins wrote:

> All jobs have been cancelled, all compute nodes have been shut down, and
> login to the headnode will be restricted to prevent hardware damage.
>
> --
> Gregory Houchins | WH3402 | 412-268-2486
> Arjuna System Administrator
> PhD Candidate, Physics
> Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213
>
> ---------- Forwarded message ---------
> From: Bryan Webb
> Date: Wed, Apr 22, 2020 at 7:07 PM
> Subject: data center cooling issue
> To: CMU-CoE GPU temp email
> Cc: Clint Perrone, Ed Hanna
>
> Folks,
>
> We appear to be experiencing cooling problems in the data center at the
> moment. We are awaiting more details from a facility engineer, but nodes
> on Bridges are already overheating.
>
> It may be best for you to shut down your clusters as much as possible, as
> soon as possible, using your remote management capabilities.
>
> ..Bryan
>
> --
> Bryan R. Webb, Systems and Facilities Administrator
> Pittsburgh Supercomputing Center (Carnegie Mellon University)
> office: 412-268-5134

From trose at andrew.cmu.edu Thu Apr 23 16:59:31 2020
From: trose at andrew.cmu.edu (Timothy Rose)
Date: Thu, 23 Apr 2020 20:59:31 +0000
Subject: Storage space very low on Arjuna
Message-ID: <9a01d1ff27bf40f0867a259f180524b5@andrew.cmu.edu>

Dear Arjuna users,

Please check whether there are any folders on Arjuna that you aren't
actively using and could move elsewhere to free up some space. Arjuna has
very limited storage capacity and so cannot support long-term storage.
Space usage on /home is currently greater than 99%. When it reaches 100%,
jobs will crash until space is freed.

If you have any questions or concerns, feel free to reach out to us.

Thanks for your cooperation on this shared resource.