Difference between revisions of "Previous messages"

From SciNet Users Documentation
Revision as of 00:55, 1 February 2019

January 21, 4:00 PM: HPSS is back in service. Thank you for your patience.

January 18, 5:00 PM: We completed practically all of the HPSS upgrades (software and hardware); however, the main client node, archive02, is presenting an issue we have not yet been able to resolve. We will try to resume work over the weekend with cool heads, or on Monday. Sorry, this is an unforeseen delay. Jobs in the queue will remain there, and we will delay the scratch purging by 1 week.

January 16, 11:00 PM: HPSS is being upgraded, as announced.

January 16, 8:00 PM: Systems are coming back up and should be accessible to users now.

January 15, 8:00 AM: Data centre downtime in effect.

  • Downtime Announcement for January 15 and 16, 2019

The SciNet data centre will need to undergo a two-day maintenance shutdown in order to perform electrical work, repairs, and maintenance. The electrical work is in preparation for the upcoming installation of an emergency power generator and a larger UPS, which will result in increased resilience to power glitches and outages. The shutdown is scheduled to start on Tuesday, January 15, 2019, at 7 AM and will last until some time in the evening of Wednesday, January 16, 2019. There will be no access to any of the SciNet systems (Niagara, P7, P8, BGQ, SGC, HPSS, Teach cluster, or the filesystems) during this time. Check back here for up-to-date information on the status of the systems.

Note: this downtime was originally scheduled for Dec. 18, 2018, but has been postponed and combined with the annual maintenance downtime.

  • December 24, 2018, 11:35 AM EST: Most systems are operational again. If you had compute jobs running yesterday at around 3:30 PM, they likely crashed; please check them and resubmit if needed.
  • December 24, 2018, 10:40 AM EST: Repairs have been made, and the file systems are starting to be mounted on the cluster.
  • December 23, 2018, 3:38 PM EST: There are issues with the file systems (home, scratch, and project). We are investigating; it looks like a hardware issue that we are trying to work around. Note that the absence of /home means you cannot log in with ssh keys. All compute jobs crashed around 3:30 PM EST on Dec 23. Once the system is properly up again, please resubmit your jobs. Unfortunately, at this time of year, it is not possible to give an estimate of when the system will be operational again.
  • Thu Nov 22 14:20:00 EST 2018: HPSS back in service.
  • Thu Nov 22 08:55:00 EST 2018: HPSS offline for scheduled maintenance.
  • Tue Nov 20 16:30:00 EST 2018: HPSS will be offline on Thursday at 9 AM for installation of new LTO8 drives in the tape library.
  • Tue Oct 9 12:16:00 EDT 2018: BGQ compute nodes are up.
  • Sun Oct 7 20:24:26 EDT 2018: SGC and the BGQ front end are available; the BGQ compute nodes are down due to a cooling issue.
  • Sat Oct 6 23:16:44 EDT 2018: There were some problems bringing up SGC and BGQ; they will remain offline for now.
  • Sat Oct 6 18:36:35 EDT 2018: Electrical work finished, power restored. Systems are coming online.
  • July 18, 2018: login.scinet.utoronto.ca is now disabled, GPC $SCRATCH and $HOME are decommissioned.
  • July 12, 2018: There was a short power interruption around 10:30 am which caused most of the systems (Niagara, SGC, BGQ) to reboot and any running jobs to fail.
  • July 11, 2018: P7s moved to the BGQ filesystem; P8s moved to the Niagara filesystem.
  • May 24, 2018, 9:25 PM EDT: The data centre is up, and all systems are operational again.
  • May 24, 2018, 7:00 AM EDT: The data centre is under annual maintenance. All systems are offline. Systems are expected to be back late this afternoon; check for updates on this page.
  • May 18, 2018: Announcement: Annual scheduled maintenance downtime: Thursday May 24, starting 7:00 AM
  • May 16, 2018: Cooling restored, systems online
  • May 16, 2018: Cooling issue at datacentre again, all systems down
  • May 15, 2018: Cooling restored, systems coming online
  • May 15, 2018: Cooling issue at datacentre, all systems down
  • May 4, 2018: HPSS is now operational on Niagara.
  • May 3, 2018: Burst Buffer is available upon request.
  • May 3, 2018: The Globus endpoint for Niagara is available: computecanada#niagara.
  • May 1, 2018: System status moved here.
  • Apr 23, 2018: GPC-compute is decommissioned; GPC-storage is available until May 30, 2018.
  • April 10, 2018: Niagara commissioned.