Difference between revisions of "Main Page"

From SciNet Users Documentation
(System Status)
(Tutorials, Manuals, etc.)
 
(191 intermediate revisions by 11 users not shown)
Line 7: Line 7:
 
<!-- Use "Up" or "Down"; these are templates. -->
 
<!-- Use "Up" or "Down"; these are templates. -->
 
{|style="width:100%"  
 
{|style="width:100%"  
|{{Up|Niagara|Niagara_Quickstart}}
+
|{{Up |Niagara|Niagara_Quickstart}}
|{{Up|HPSS|HPSS}}
+
|{{Up |Mist|Mist}}
|{{Up|Mist|Mist}}
+
|{{Up |Teach|Teach}}
|{{Up|Teach|Teach}}
+
|{{Up |Rouge|Rouge}}
 
|-
 
|-
|{{Up|Jupyter Hub|Jupyter_Hub}}
+
|{{Up |Jupyter Hub|Jupyter_Hub}}
|{{Up|Scheduler|Niagara_Quickstart#Submitting_jobs}}
+
|{{up |Scheduler|Niagara_Quickstart#Submitting_jobs}}
|{{Up|File system|Niagara_Quickstart#Storage_and_quotas}}
+
|{{Up |File system|Niagara_Quickstart#Storage_and_quotas}}
|{{Up|Burst Buffer|Burst_Buffer}}
+
|{{Up |Burst Buffer|Burst_Buffer}}
 
|-
 
|-
|{{Up|Login Nodes|Niagara_Quickstart#Logging_in}}  
+
|{{Up |HPSS|HPSS}}
|{{Up|External Network|Niagara_Quickstart#Logging_in}}  
+
|{{Up |Login Nodes|Niagara_Quickstart#Logging_in}}  
 +
|{{Up |External Network|Niagara_Quickstart#Logging_in}}  
 
|{{Up|Globus|Globus}}
 
|{{Up|Globus|Globus}}
 
|}
 
|}
 +
 
<!-- Current Messages: -->
 
<!-- Current Messages: -->
 +
<b>July 24, 2021, 6:00 PM EDT:</b> There appear to be file system issues, which may affect users' ability to log in.  We are investigating.
 +
 +
<b> July 23rd, 2021, 9:00 AM EDT:</b> <b> Security update: </b> Due to a severe vulnerability in the Linux kernel (CVE-2021-33909), our team is currently patching and rebooting all login nodes and compute nodes, as well as the JupyterHub.  There should be no effect on running jobs; however, sessions on login and datamover nodes will be disrupted.
 +
 +
<b> July 20th, 2021, 7:00 PM EDT:</b> <b> SLURM configuration</b> - Changed the default behaviour to kill a job step if any task exits with a non-zero exit code. If your code is able to handle failures gracefully, please add srun's option --no-kill to recover the previous default behaviour.
  
<b> June 29, 6:21:00 PM:</b> Systems are available again.
+
<b> July 20th, 2021, 7:00 PM EDT:</b> Maintenance finished, systems are back online.  
  
<b> June 29, 12:30:00  PM:</b> Power Outage caused thermal shutdown.
+
<b>SciNet Downtime July 20th, 2021 (Tuesday):</b> There will be a maintenance shutdown of the SciNet data center on Tuesday July 20th, starting at 7 am EDT. There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time.  We expect to be able to bring the systems back online in the evening of July 20th.  The status of the Niagara cluster can be checked on status.computecanada.ca. For up-to-date and more detailed information on the status of all the SciNet systems, you can always check back here.
  
<b>June 20, 2020, 10:24 PM:</b> File systems are back up.  Unfortunately, all running jobs would have died and users are asked to resubmit them.
+
<b>June 28th, 2021, 4:06 PM:</b> Mist OS upgrade is complete.
  
<b>June 20, 2020, 9:48 PM:</b> An issue with the file systems is causing trouble. We are investigating the cause.
+
<b>May 27, 2021:</b> Datamover addresses have changed to improve high-bandwidth connectivity and cybersecurity. The new addresses are 142.1.174.227 for nia-datamover1.scinet.utoronto.ca, and 142.1.174.228 for nia-datamover2.scinet.utoronto.ca.
  
<b>June 15, 2020, 10:30 PM:</b> A <b>power glitch</b> caused some compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.
+
If you have jobs that need to connect to a software license server using an ssh tunnel through nia-gw (which actually resolves to datamover1 or datamover2), you may need to ask the system administrators of that license server to allow incoming connections from the new addresses above.
  
 
<!--  When removing system status entries, please archive them to: https://docs.scinet.utoronto.ca/index.php/Previous_messages -->
 
<!--  When removing system status entries, please archive them to: https://docs.scinet.utoronto.ca/index.php/Previous_messages -->
Line 40: Line 47:
 
* [[Niagara Quickstart]]
 
* [[Niagara Quickstart]]
 
* [[HPSS | HPSS archival storage]]
 
* [[HPSS | HPSS archival storage]]
* [[SOSCIP_GPU | SOSCIP GPU cluster]]
 
 
* [[Mist| Mist Power 9 GPU cluster]]
 
* [[Mist| Mist Power 9 GPU cluster]]
 
* [[Teach|Teach cluster]]
 
* [[Teach|Teach cluster]]
Line 51: Line 57:
 
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]
 
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]
 
* [[Modules specific to Niagara|Software Modules specific to Niagara]]  
 
* [[Modules specific to Niagara|Software Modules specific to Niagara]]  
 +
* [[Modules for Mist]]
 
* [[Commercial software]]
 
* [[Commercial software]]
 
* [[Burst Buffer]]
 
* [[Burst Buffer]]

Latest revision as of 21:55, 27 July 2021

System Status

Niagara: Up | Mist: Up | Teach: Up | Rouge: Up
Jupyter Hub: Up | Scheduler: Up | File system: Up | Burst Buffer: Up
HPSS: Up | Login Nodes: Up | External Network: Up | Globus: Up

July 24, 2021, 6:00 PM EDT: There appear to be file system issues, which may affect users' ability to log in. We are investigating.

July 23rd, 2021, 9:00 AM EDT: Security update: Due to a severe vulnerability in the Linux kernel (CVE-2021-33909), our team is currently patching and rebooting all login nodes and compute nodes, as well as the JupyterHub. There should be no effect on running jobs; however, sessions on login and datamover nodes will be disrupted.

July 20th, 2021, 7:00 PM EDT: SLURM configuration - Changed the default behaviour to kill a job step if any task exits with a non-zero exit code. If your code is able to handle failures gracefully, please add srun's option --no-kill to recover the previous default behaviour.
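As a rough illustration of the change above, the sketch below assembles an srun command line in Python and adds the --no-kill option named in the announcement only when the program is known to handle failed tasks itself. The helper name build_srun_command and the program ./my_app are hypothetical, and the exact effect of the option should be confirmed against "man srun" on the system.

```python
import shlex


def build_srun_command(program_args, tolerate_task_failures=False):
    """Return an srun command line as a list of arguments.

    program_args: the program and its arguments (hypothetical example: ["./my_app"]).
    tolerate_task_failures: if True, add the --no-kill option that the
    announcement names for recovering the previous default behaviour.
    """
    command = ["srun"]
    if tolerate_task_failures:
        # Option name taken from the announcement; not verified here.
        command.append("--no-kill")
    command.extend(program_args)
    return command


if __name__ == "__main__":
    # ./my_app and input.dat are placeholder names for illustration only.
    args = build_srun_command(["./my_app", "input.dat"], tolerate_task_failures=True)
    print(shlex.join(args))  # prints: srun --no-kill ./my_app input.dat
```

Leaving tolerate_task_failures at its default keeps the new behaviour, in which a job step is killed as soon as any task exits with a non-zero exit code.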

July 20th, 2021, 7:00 PM EDT: Maintenance finished, systems are back online.

SciNet Downtime July 20th, 2021 (Tuesday): There will be a maintenance shutdown of the SciNet data center on Tuesday July 20th, starting at 7 am EDT. There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time. We expect to be able to bring the systems back online in the evening of July 20th. The status of the Niagara cluster can be checked on status.computecanada.ca. For up-to-date and more detailed information on the status of all the SciNet systems, you can always check back here.

June 28th, 2021, 4:06 PM: Mist OS upgrade is complete.

May 27, 2021: Datamover addresses have changed to improve high-bandwidth connectivity and cybersecurity. The new addresses are 142.1.174.227 for nia-datamover1.scinet.utoronto.ca, and 142.1.174.228 for nia-datamover2.scinet.utoronto.ca.

If you have jobs that need to connect to a software license server using an ssh tunnel through nia-gw (which actually resolves to datamover1 or datamover2), you may need to ask the system administrators of that license server to allow incoming connections from the new addresses above.
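A license server administrator might check an existing allow list against the new datamover addresses along the following lines. This is only a sketch: the function name missing_from_allow_list and the sample allow-list entries are hypothetical, and only the two hostnames and addresses come from the announcement above.

```python
import ipaddress

# New datamover addresses, as given in the May 27, 2021 announcement.
NEW_DATAMOVER_ADDRESSES = {
    "nia-datamover1.scinet.utoronto.ca": "142.1.174.227",
    "nia-datamover2.scinet.utoronto.ca": "142.1.174.228",
}


def missing_from_allow_list(allowed_entries):
    """Return the datamover hostnames whose new address is not covered.

    allowed_entries: iterable of strings, each a single IP address or a
    CIDR network (hypothetical examples: "142.1.174.227", "142.1.174.0/24").
    """
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allowed_entries]
    missing = []
    for hostname, address in NEW_DATAMOVER_ADDRESSES.items():
        ip = ipaddress.ip_address(address)
        if not any(ip in network for network in networks):
            missing.append(hostname)
    return missing


if __name__ == "__main__":
    # Hypothetical allow list that only contains the first new address.
    print(missing_from_allow_list(["142.1.174.227"]))
```

An empty result means both new addresses are already permitted; any hostname returned still needs to be added on the license server side.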

QuickStart Guides

Tutorials, Manuals, etc.