Difference between revisions of "Main Page"

From SciNet Users Documentation
(System Status)
(Tutorials, Manuals, etc.)
 
(223 intermediate revisions by 11 users not shown)
Line 7: Line 7:
 
<!-- Use "Up" or "Down"; these are templates. -->
 
<!-- Use "Up" or "Down"; these are templates. -->
 
{|style="width:100%"  
 
{|style="width:100%"  
|{{Up|Niagara|Niagara_Quickstart}}
+
|{{Up |Niagara|Niagara_Quickstart}}
|{{Up|HPSS|HPSS}}
+
|{{Up |Mist|Mist}}
|{{Up|SOSCIP&nbsp;GPU|SOSCIP_GPU}}
+
|{{Up |Teach|Teach}}
|{{Up|Mist|Mist}}
+
|{{Up |Rouge|Rouge}}
 
|-
 
|-
|{{Up|Teach|Teach}}
+
|{{Up |Jupyter Hub|Jupyter_Hub}}
|{{Up|Jupyter Hub|Jupyter_Hub}}
+
|{{up |Scheduler|Niagara_Quickstart#Submitting_jobs}}
|{{Up|Scheduler|Niagara_Quickstart#Submitting_jobs}}
+
|{{Up |File system|Niagara_Quickstart#Storage_and_quotas}}
|{{Up|File system|Niagara_Quickstart#Storage_and_quotas}}
+
|{{Up |Burst Buffer|Burst_Buffer}}
 
|-
 
|-
|{{Up|Login Nodes|Niagara_Quickstart#Logging_in}}  
+
|{{Up |HPSS|HPSS}}
|{{Up|External Network|Niagara_Quickstart#Logging_in}}  
+
|{{Up |Login Nodes|Niagara_Quickstart#Logging_in}}  
 +
|{{Up |External Network|Niagara_Quickstart#Logging_in}}  
 
|{{Up|Globus|Globus}}
 
|{{Up|Globus|Globus}}
|{{Up|Burst Buffer|Burst_Buffer}}
 
 
|}
 
|}
 +
 
<!-- Current Messages: -->
 
<!-- Current Messages: -->
 +
<b>July 24, 2021, 6:00 PM EDT:</b> There appear to be file system issues, which may affect users' ability to log in.  We are investigating.
 +
 +
<b> July 23rd, 2021, 9:00 AM EDT:</b> <b> Security update: </b> Due to a severe vulnerability in the Linux kernel (CVE-2021-33909), our team is currently patching and rebooting all login nodes and compute nodes, as well as the JupyterHub.  There should be no effect on running jobs; however, sessions on login and datamover nodes will be disrupted.
  
<b>April 28, 2020, 7:20 AM:</b> A power glitch this morning caused all compute nodes to be rebooted: jobs running at the time have failed; users are asked to resubmit these jobs.
+
<b> July 20th, 2021, 7:00 PM EDT:</b> <b> SLURM configuration</b> - Changed the default behaviour to kill a job step if any task exits with a non-zero exit code. If your code is able to handle failures gracefully, please add srun's option --no-kill to recover the previous default behaviour.
 
<b>April 20, 2020: Security Incident at Cedar; implications for Niagara users</b>
 
  
Last week, it became evident that the Cedar GP cluster had been
+
<b> July 20th, 2021, 7:00 PM EDT:</b> Maintenance finished, systems are back online.  
compromised for several weeks.  The passwords of at least two
 
Compute Canada users were known to the attackers. One of these was
 
used to escalate privileges on Cedar, as explained on
 
https://status.computecanada.ca/view_incident?incident=423.
 
  
These accounts were used to log in to Niagara as well, but Niagara
+
<b>SciNet Downtime July 20th, 2021 (Tuesday):</b> There will be a maintenance shutdown of the SciNet data center on Tuesday July 20th, starting at 7 am EDT. There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time.  We expect to be able to bring the systems back online in the evening of July 20th.  The status of the Niagara cluster can be checked on status.computecanada.ca. For up-to-date and more detailed information on the status of all the SciNet systems, you can always check back here.
did not have the same security loophole as Cedar (which has been
 
fixed), and no further escalation was observed on Niagara.
 
  
Reassuring as that may sound, it is not known how the passwords of
+
<b>June 28th, 2021, 4:06 PM:</b> Mist OS upgrade is complete.
the two user accounts were obtained. Given this uncertainty, the
 
SciNet team *strongly* recommends that you change your password on
 
https://ccdb.computecanada.ca/security/change_password, and remove
 
any SSH keys and regenerate new ones (see
 
https://docs.scinet.utoronto.ca/index.php/SSH_keys).
 
  
<b> SciNet/Niagara Downtime Announcement, May 6-7, 2020</b>
+
<b>May 27, 2021:</b> Datamover addresses have changed to improve high-bandwidth connectivity and cybersecurity. The new addresses are 142.1.174.227 for nia-datamover1.scinet.utoronto.ca, and 142.1.174.228 for nia-datamover2.scinet.utoronto.ca.
  
All resources at SciNet will undergo a two-day maintenance shutdown on May 6th and 7th 2020, starting at 7 am EDT on Wednesday May 6th.  There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) or systems hosted at the SciNet data centre.  We expect to be able to bring the systems back online the evening of May 7th.
+
If you have jobs that need to connect to a software license server using an ssh tunnel through nia-gw (which actually resolves to datamover1 or datamover2), you may need to ask the system administrators of that license server to allow incoming connections from the new addresses above.
  
 
<!--  When removing system status entries, please archive them to: https://docs.scinet.utoronto.ca/index.php/Previous_messages -->
 
<!--  When removing system status entries, please archive them to: https://docs.scinet.utoronto.ca/index.php/Previous_messages -->
Line 56: Line 47:
 
* [[Niagara Quickstart]]
 
* [[Niagara Quickstart]]
 
* [[HPSS | HPSS archival storage]]
 
* [[HPSS | HPSS archival storage]]
* [[SOSCIP_GPU | SOSCIP GPU cluster]]
 
 
* [[Mist| Mist Power 9 GPU cluster]]
 
* [[Mist| Mist Power 9 GPU cluster]]
 
* [[Teach|Teach cluster]]
 
* [[Teach|Teach cluster]]
Line 67: Line 57:
 
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]
 
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]
 
* [[Modules specific to Niagara|Software Modules specific to Niagara]]  
 
* [[Modules specific to Niagara|Software Modules specific to Niagara]]  
 +
* [[Modules for Mist]]
 
* [[Commercial software]]
 
* [[Commercial software]]
 
* [[Burst Buffer]]
 
* [[Burst Buffer]]
 
* [[SSH Tunneling]]
 
* [[SSH Tunneling]]
 +
* [[SSH#Two-Factor_authentication|Two-Factor Authentication]]
 
* [[Visualization]]
 
* [[Visualization]]
 
* [[Running Serial Jobs on Niagara]]
 
* [[Running Serial Jobs on Niagara]]
 
* [[Jupyter Hub]]
 
* [[Jupyter Hub]]
 
|}
 
|}

Latest revision as of 21:55, 27 July 2021

System Status

Niagara: Up | Mist: Up | Teach: Up | Rouge: Up
Jupyter Hub: Up | Scheduler: Up | File system: Up | Burst Buffer: Up
HPSS: Up | Login Nodes: Up | External Network: Up | Globus: Up

July 24, 2021, 6:00 PM EDT: There appear to be file system issues, which may affect users' ability to log in. We are investigating.

July 23rd, 2021, 9:00 AM EDT: Security update: Due to a severe vulnerability in the Linux kernel (CVE-2021-33909), our team is currently patching and rebooting all login nodes and compute nodes, as well as the JupyterHub. There should be no effect on running jobs; however, sessions on login and datamover nodes will be disrupted.

July 20th, 2021, 7:00 PM EDT: SLURM configuration - Changed the default behaviour to kill a job step if any task exits with a non-zero exit code. If your code is able to handle failures gracefully, please add srun's option --no-kill to recover the previous default behaviour.

July 20th, 2021, 7:00 PM EDT: Maintenance finished, systems are back online.

SciNet Downtime July 20th, 2021 (Tuesday): There will be a maintenance shutdown of the SciNet data center on Tuesday July 20th, starting at 7 am EDT. There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time. We expect to be able to bring the systems back online in the evening of July 20th. The status of the Niagara cluster can be checked on status.computecanada.ca. For up-to-date and more detailed information on the status of all the SciNet systems, you can always check back here.

June 28th, 2021, 4:06 PM: Mist OS upgrade is complete.

May 27, 2021: Datamover addresses have changed to improve high-bandwidth connectivity and cybersecurity. The new addresses are 142.1.174.227 for nia-datamover1.scinet.utoronto.ca, and 142.1.174.228 for nia-datamover2.scinet.utoronto.ca.

If you have jobs that need to connect to a software license server using an ssh tunnel through nia-gw (which actually resolves to datamover1 or datamover2), you may need to ask the system administrators of that license server to allow incoming connections from the new addresses above.
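
For anyone who maintains such a license server, the check against the new addresses might be sketched as follows. This is only an illustration under stated assumptions: the function name `missing_from_allowlist` and the allowlist format (a list of IP addresses or CIDR ranges) are hypothetical and not part of any SciNet tooling; only the two hostnames and addresses are taken from the announcement above.

```python
import ipaddress

# Hostnames and addresses as quoted in the May 27, 2021 announcement.
DATAMOVER_ADDRESSES = {
    "nia-datamover1.scinet.utoronto.ca": "142.1.174.227",
    "nia-datamover2.scinet.utoronto.ca": "142.1.174.228",
}


def missing_from_allowlist(allowlist):
    """Return datamover hostnames whose address no allowlist entry covers.

    `allowlist` is a hypothetical list of strings, each a single IP
    address ("142.1.174.227") or a CIDR range ("142.1.174.0/24").
    """
    # A bare address parses as a single-host network (/32 for IPv4).
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allowlist]
    missing = []
    for host, addr in DATAMOVER_ADDRESSES.items():
        ip = ipaddress.ip_address(addr)
        if not any(ip in net for net in networks):
            missing.append(host)
    return missing
```

Calling `missing_from_allowlist` with the entries currently configured on the license server returns an empty list when both datamovers are covered, and otherwise names the hosts that still need to be added.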

QuickStart Guides

Tutorials, Manuals, etc.