Difference between revisions of "Main Page"

From SciNet Users Documentation
(Tutorials, Manuals, etc.)
 
(131 intermediate revisions by 9 users not shown)
Line 7: Line 7:
 
<!-- Use "Up" or "Down"; these are templates. -->
 
<!-- Use "Up" or "Down"; these are templates. -->
 
{|style="width:100%"  
 
{|style="width:100%"  
|{{Up|Niagara|Niagara_Quickstart}}
+
|{{Up |Niagara|Niagara_Quickstart}}
|{{Up|HPSS|HPSS}}
+
|{{Up |Mist|Mist}}
|{{Up|Mist|Mist}}
+
|{{Up |Teach|Teach}}
|{{Up|Teach|Teach}}
+
|{{Up |Rouge|Rouge}}
 
|-
 
|-
|{{Up|Jupyter Hub|Jupyter_Hub}}
+
|{{Up |Jupyter Hub|Jupyter_Hub}}
|{{Up|Scheduler|Niagara_Quickstart#Submitting_jobs}}
+
|{{up |Scheduler|Niagara_Quickstart#Submitting_jobs}}
|{{Up|File system|Niagara_Quickstart#Storage_and_quotas}}
+
|{{Up |File system|Niagara_Quickstart#Storage_and_quotas}}
|{{Up|Burst Buffer|Burst_Buffer}}
+
|{{Up |Burst Buffer|Burst_Buffer}}
 
|-
 
|-
|{{Up|Login Nodes|Niagara_Quickstart#Logging_in}}  
+
|{{Up |HPSS|HPSS}}
|{{Meh|External Network|Niagara_Quickstart#Logging_in}}  
+
|{{Up |Login Nodes|Niagara_Quickstart#Logging_in}}  
 +
|{{Up |External Network|Niagara_Quickstart#Logging_in}}  
 
|{{Up|Globus|Globus}}
 
|{{Up|Globus|Globus}}
 
|}
 
|}
  
 
<!-- Current Messages: -->
 
<!-- Current Messages: -->
<b> August 24, 2020, 6:35 PM EST: </b> We have partial connectivity back, but are still investigating.
+
<b>July 24, 2021, 6:00 PM EDT:</b> There appear to be file system issues, which may affect users' ability to log in.  We are investigating.
  
<b> August 24, 2020, 3:15 PM EST: </b> There are issues connecting to the data centre. We're investigating.
+
<b> July 23rd, 2021, 9:00 AM EDT:</b> <b> Security update: </b> Due to a severe vulnerability in the Linux kernel (CVE-2021-33909), our team is currently patching and rebooting all login nodes and compute nodes, as well as the JupyterHub. There should be no effect on running jobs; however, sessions on login and datamover nodes will be disrupted.
  
<b> August 21, 2020, 6:00 PM EST: </b> The pump has been repaired, cooling is restored, systems are up.  <br/>Scratch purging is postponed until the evening of Friday Aug 28th, 2020.
+
<b> July 20th, 2021, 7:00 PM EDT:</b> <b> SLURM configuration</b> - Changed the default behaviour to kill a job step if any task exits with a non-zero exit code. If your code is able to handle failures gracefully, please add srun's option --no-kill to recover the previous default behaviour.
  
<b>August 19, 2020, 4:40 PM EST:</b> Update: The current estimate is to have the cooling restored on Friday and we hope to have the systems available for users on Saturday August 22, 2020.
+
<b> July 20th, 2021, 7:00 PM EDT:</b> Maintenance finished, systems are back online.  
  
<b>August 17, 2020, 4:00 PM EST:</b> Unfortunately after taking the pump apart it was determined there was a more serious failure of the main drive shaft, not just the seal. As a new one will need to be sourced or fabricated we're estimating that it will take at least a few more days to get the part and repairs done to restore cooling. Sorry for the inconvenience. 
+
<b>SciNet Downtime July 20th, 2021 (Tuesday):</b> There will be a maintenance shutdown of the SciNet data center on Tuesday July 20th, starting at 7 am EDT. There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time. We expect to be able to bring the systems back online in the evening of July 20th.  The status of the Niagara cluster can be checked on status.computecanada.ca. For up-to-date and more detailed information on the status of all the SciNet systems, you can always check back here.
  
<b>August 15, 2020, 1:00 PM EST:</b> Due to the availability of the parts needed to repair the failed pump and cooling system, it is unlikely that systems can be restored before Monday afternoon.
+
<b>June 28th, 2021, 4:06 PM:</b> Mist OS upgrade is complete.
  
<b>August 15, 2020, 00:04 EST:</b> A primary pump seal in the cooling infrastructure has blown, and parts availability cannot be determined until tomorrow. All systems are shut down as there is no cooling. If parts are available, systems may be back late tomorrow at the earliest. Check here for updates.
+
<b>May 27, 2021:</b> The datamover addresses have changed to improve high-bandwidth connectivity and cybersecurity. The new addresses are 142.1.174.227 for nia-datamover1.scinet.utoronto.ca, and 142.1.174.228 for nia-datamover2.scinet.utoronto.ca.
  
<b>August 14, 2020, 21:04 EST:</b> Tomorrow's /scratch purge has been postponed.
+
If you have jobs that need to connect to a software license server using an ssh tunnel through nia-gw (which actually resolves to datamover1 or datamover2), you may need to ask the system administrators of that license server to allow incoming connections from the new addresses above.
  
<b>August 14, 2020, 21:00 EST:</b> Staff are at the datacenter. It looks like one of the pumps has a seal that is leaking badly.
 
 
<b>August 14, 2020, 20:37 EST:</b> We seem to be undergoing a thermal shutdown at the datacenter.
 
 
<b>August 14, 2020, 20:20 EST:</b> Network problems to niagara/mist. We are investigating.
 
 
 
<!--  When removing system status entries, please archive them to: https://docs.scinet.utoronto.ca/index.php/Previous_messages -->
 
<!--  When removing system status entries, please archive them to: https://docs.scinet.utoronto.ca/index.php/Previous_messages -->
 
{|style="border-spacing: 10px;width: 100%"
 
{|style="border-spacing: 10px;width: 100%"
Line 52: Line 47:
 
* [[Niagara Quickstart]]
 
* [[Niagara Quickstart]]
 
* [[HPSS | HPSS archival storage]]
 
* [[HPSS | HPSS archival storage]]
* [[SOSCIP_GPU | SOSCIP GPU cluster]]
 
 
* [[Mist| Mist Power 9 GPU cluster]]
 
* [[Mist| Mist Power 9 GPU cluster]]
 
* [[Teach|Teach cluster]]
 
* [[Teach|Teach cluster]]
Line 63: Line 57:
 
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]
 
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]
 
* [[Modules specific to Niagara|Software Modules specific to Niagara]]  
 
* [[Modules specific to Niagara|Software Modules specific to Niagara]]  
 +
* [[Modules for Mist]]
 
* [[Commercial software]]
 
* [[Commercial software]]
 
* [[Burst Buffer]]
 
* [[Burst Buffer]]

Latest revision as of 21:55, 27 July 2021

System Status

Niagara | Mist | Teach | Rouge
Jupyter Hub | Scheduler | File system | Burst Buffer
HPSS | Login Nodes | External Network | Globus

July 24, 2021, 6:00 PM EDT: There appear to be file system issues, which may affect users' ability to log in. We are investigating.

July 23rd, 2021, 9:00 AM EDT: Security update: Due to a severe vulnerability in the Linux kernel (CVE-2021-33909), our team is currently patching and rebooting all login nodes and compute nodes, as well as the JupyterHub. There should be no effect on running jobs; however, sessions on login and datamover nodes will be disrupted.

July 20th, 2021, 7:00 PM EDT: SLURM configuration - Changed the default behaviour to kill a job step if any task exits with a non-zero exit code. If your code is able to handle failures gracefully, please add srun's option --no-kill to recover the previous default behaviour.
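
For codes that prefer to absorb a failing task themselves, one possible pattern is a small wrapper that records the failure and still exits with code zero, so that the task never reports a non-zero exit code to the scheduler. The sketch below is only an illustration under that assumption: the function `run_tolerant` and the log file name `failed_tasks.log` are hypothetical and are not part of SciNet's or SLURM's tooling.

```python
import subprocess
import sys


def run_tolerant(cmd, log_path="failed_tasks.log"):
    """Run cmd; on a non-zero exit, append a record to log_path.

    Always returns 0, so a caller that exits with this value never
    reports a failure. The real exit code is preserved in the log.
    (Hypothetical helper, for illustration only.)
    """
    result = subprocess.run(cmd)
    if result.returncode != 0:
        with open(log_path, "a") as log:
            log.write(f"{result.returncode}\t{' '.join(cmd)}\n")
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python tolerant.py ./my_task arg1 arg2
    sys.exit(run_tolerant(sys.argv[1:]))
```

Whether masking exit codes this way is appropriate depends on the application; the log file should be checked after the job so that failed tasks are not silently lost.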

July 20th, 2021, 7:00 PM EDT: Maintenance finished, systems are back online.

SciNet Downtime July 20th, 2021 (Tuesday): There will be a maintenance shutdown of the SciNet data center on Tuesday July 20th, starting at 7 am EDT. There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time. We expect to be able to bring the systems back online in the evening of July 20th. The status of the Niagara cluster can be checked on status.computecanada.ca. For up-to-date and more detailed information on the status of all the SciNet systems, you can always check back here.

June 28th, 2021, 4:06 PM: Mist OS upgrade is complete.

May 27, 2021: The datamover addresses have changed to improve high-bandwidth connectivity and cybersecurity. The new addresses are 142.1.174.227 for nia-datamover1.scinet.utoronto.ca, and 142.1.174.228 for nia-datamover2.scinet.utoronto.ca.

If you have jobs that need to connect to a software license server using an ssh tunnel through nia-gw (which actually resolves to datamover1 or datamover2), you may need to ask the system administrators of that license server to allow incoming connections from the new addresses above.
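
Before asking a license server's administrators to update their allow list, it may help to confirm that the datamover host names resolve to the new addresses from your side. The following is a minimal sketch using Python's standard `socket` module; the helper name `check_addresses` and its `resolve` parameter are assumptions made for illustration, while the host names and addresses are those announced above.

```python
import socket

# New datamover addresses, as announced above.
EXPECTED = {
    "nia-datamover1.scinet.utoronto.ca": "142.1.174.227",
    "nia-datamover2.scinet.utoronto.ca": "142.1.174.228",
}


def check_addresses(expected, resolve=socket.gethostbyname):
    """Return {hostname: (expected_ip, resolved_ip, matches)}.

    resolved_ip is None when the name cannot be resolved.
    The resolve argument exists so the lookup can be replaced in tests.
    """
    report = {}
    for host, want in expected.items():
        try:
            got = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            got = None
        report[host] = (want, got, got == want)
    return report
```

Calling `check_addresses(EXPECTED)` performs real DNS lookups and returns one entry per host; any entry whose last element is `False` points to a stale or unresolved address.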

QuickStart Guides

Tutorials, Manuals, etc.