https://docs.scinet.utoronto.ca/api.php?action=feedcontributions&user=Bmundim&feedformat=atom
SciNet Users Documentation - User contributions [en] 2024-03-29T01:25:14Z User contributions MediaWiki 1.35.12
https://docs.scinet.utoronto.ca/index.php?title=Main_Page&diff=5256 Main Page 2023-12-07T15:22:36Z <p>Bmundim: </p>
<hr />
<div>__NOTOC__<br />
{| style="border-spacing:10px; width: 95%"<br />
| style="padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px"|<br />
<br />
==System Status==<br />
<br />
<!-- Use "Up", "Partial" or "Down"; these are templates. --><br />
{|style="width:100%" <br />
|{{Up |Niagara|Niagara_Quickstart}}<br />
|{{up |Mist|Mist}}<br />
|{{Up |Teach|Teach}}<br />
|{{up |Rouge|Rouge}}<br />
|-<br />
|{{Up |Jupyter Hub|Jupyter_Hub}}<br />
|{{Up |Scheduler|Niagara_Quickstart#Submitting_jobs}}<br />
|{{Up |File system|Niagara_Quickstart#Storage_and_quotas}}<br />
|{{Up |Burst Buffer|Burst_Buffer}}<br />
|-<br />
|{{Up |HPSS|HPSS}}<br />
|{{Up |Login Nodes|Niagara_Quickstart#Logging_in}} <br />
|{{Up |External Network|Niagara_Quickstart#Logging_in}} <br />
|{{Up |Globus |Globus}}<br />
|}<br />
<br />
'''Thu Dec 7 10:01:24 EST 2023:''' Niagara's scheduler is rebooting for security patches.<br />
<br />
'''Wed Dec 6 13:06:46 EST 2023:''' The transition of endpoint computecanada#niagara from Globus GCSv4 to GCSv5 is complete. computecanada#niagara-GCSv4 has been deactivated.<br />
<br />
'''Mon Dec 4 16:35:07 EST 2023:''' Endpoint computecanada#niagara has now been upgraded to Globus GCSv5. The old endpoint is still available as computecanada#niagara-GCSv4 on nia-datamover2, only until Wednesday, at which time we'll disable it as well.<br />
<br />
'''Mon Dec 4 11:54:49 EST 2023:''' The nia-datamover1 node will be offline this Monday afternoon for the Globus GCSv5 upgrade. Endpoint computecanada#niagara-GCSv4 will still be available via nia-datamover2.<br />
<br />
'''Tue Nov 28 16:29:14 EST 2023:''' The computecanada#hpss Globus endpoint is now running GCSv5. We'll find a window of opportunity next week to upgrade computecanada#niagara to GCSv5 as well.<br />
<br />
'''Tue Nov 28 14:20:30 EST 2023:''' The computecanada#hpss Globus endpoint will be offline for the next few hours for the GCSv5 upgrade.<br />
<br />
'''Fri Nov 10, 2023, 18:00 EST:''' The HPSS upgrade is finished. We didn't have time to update Globus to GCSv5, so we'll find a window of opportunity to do this next week. <br />
<br />
Please be advised that starting this <b>Friday morning, Nov 10, we'll be upgrading the HPSS system from version 8.3 to 9.3 and the HPSS Globus server from GCSv4 to GCSv5.</b> If everything goes well, we expect to be back online by the end of the day. <br />
<br />
<!-- When removing system status entries, please archive them to: --><br />
[[Previous messages]]<br />
<br />
{|style="border-spacing: 10px;width: 100%"<br />
|valign="top" style="margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%" |<br />
<br />
== QuickStart Guides ==<br />
* [[Niagara Quickstart]]<br />
* [[HPSS | HPSS archival storage]]<br />
* [[Mist| Mist Power 9 GPU cluster]]<br />
* [[Teach|Teach cluster]]<br />
* [[FAQ | FAQ (frequently asked questions)]]<br />
* [[Acknowledging SciNet]]<br />
| valign="top" style="margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%" |<br />
<br />
== Tutorials, Manuals, etc. ==<br />
* [https://education.scinet.utoronto.ca SciNet education material]<br />
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]<br />
* [[Modules specific to Niagara|Software Modules specific to Niagara]] <br />
* [[Modules for Mist]] <br />
* [[Commercial software]]<br />
* [[Burst Buffer]]<br />
* [[SSH#SSH Keys|SSH keys]]<br />
* [[SSH Tunneling]]<br />
* [[SSH#Two-Factor_authentication|Two-Factor Authentication]]<br />
* [[Visualization]]<br />
* [[Running Serial Jobs on Niagara]]<br />
* [[Jupyter Hub]]<br />
|}</div>
https://docs.scinet.utoronto.ca/index.php?title=Slurm&diff=5142 Slurm 2023-10-25T21:00:30Z <p>Bmundim: /* EDR/HDR Infiniband Topology */</p>
<hr />
<div>The queueing system used at SciNet is based around the [https://slurm.schedmd.com Slurm Workload Manager]. This "scheduler", Slurm, determines which jobs will be run on which compute nodes, and when. This page outlines how to submit jobs, how to interact with the scheduler, and some of the most common Slurm commands.<br />
<br />
Some common questions about the queuing system can be found on the [[FAQ]] as well.<br />
<br />
= Submitting jobs =<br />
<br />
You submit jobs from a Niagara login node. This is done by passing a script to the sbatch command:<br />
<br />
nia-login07:~$ sbatch jobscript.sh<br />
<br />
This puts the job, described by the job script, into the queue. The scheduler will run the job on the compute nodes in due course. A typical submission script is as follows.<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
<br />
Some notes about this example:<br />
* The first line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>).<br />
* In this case, SLURM looks for 2 nodes with 40 cores on which to run 80 tasks, for 1 hour.<br />
* Note that the mpirun flag "--ppn" (processors per node) is ignored. Slurm takes care of this detail.<br />
* Once the scheduler finds a spot to run the job, it runs the script:<br />
** It changes to the submission directory;<br />
** Loads modules;<br />
** Runs the <code>mpi_example</code> application.<br />
* To use hyperthreading, just change --ntasks-per-node=40 to --ntasks-per-node=80, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).<br />
<br />
To create a job script appropriate for your work, you must modify the commands above to instruct Slurm to run the commands you need run.<br />
<br />
== Things to remember ==<br />
<br />
There are some things to always bear in mind when crafting your submission script:<br />
* Scheduling is by node, so in multiples of 40 cores. You are expected to use all 40 cores! If you are running serial jobs, and need assistance bundling your work into multiples of 40, please see the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page.<br />
* Jobs must write to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access. Download data you need before submitting your job.<br />
* Your job script will not remember the modules you have loaded, so it needs to contain "module load" commands of all the required modules (see examples below).<br />
* Jobs will run under your group's RRG allocation. If your group does not have an allocation, your job will run under your group's RAS allocation (previously called 'default' allocation). Note that groups with an allocation cannot run under a default allocation.<br />
* The maximum [[Wallclock_time | walltime]] for all users is 24 hours. The minimum and default walltime is 15 minutes.<br />
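A minimal skeleton that reflects these points (the module and application names are placeholders for your own):<br />

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40   # scheduling is by node: use all 40 cores
#SBATCH --time=1:00:00         # walltime between 15 minutes and 24 hours

cd $SLURM_SUBMIT_DIR           # submit from $SCRATCH; home is read-only on compute nodes

module load intel/2018.2       # reload every module your program needs

./my_application               # placeholder for your actual program
```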
<br />
= Scheduling details =<br />
<br />
We now present the details of how to write a job script, and some extra commands which you might find useful.<br />
<br />
== SLURM nomenclature: jobs, nodes, tasks, cpus, cores, threads ==<br />
<br />
SLURM has a somewhat different way of referring to things like MPI processes and thread tasks, as compared to our previous scheduler, MOAB. The SLURM nomenclature is reflected in the names of scheduler options (i.e., resource requests). SLURM strictly enforces those requests, so it is important to get this right.<br />
<br />
{| class="wikitable"<br />
!term <br />
!meaning <br />
!SLURM term<br />
!related scheduler options <br />
|-<br />
|job<br />
|scheduled piece of work for which specific resources were requested.<br />
|job<br />
|<tt>sbatch, salloc</tt><br />
|-<br />
|node<br />
|basic computing component with several cores (40 for Niagara) that share memory <br />
|node<br />
|<tt>--nodes -N</tt><br />
|-<br />
|mpi process<br />
|one of a group of running programs using Message Passing Interface for parallel computing<br />
|task<br />
|<tt>--ntasks -n --ntasks-per-node</tt><br />
|-<br />
|core ''or'' physical cpu<br />
|A fully functional independent physical execution unit.<br />
| - <br />
| -<br />
|-<br />
|logical cpu<br />
|An execution unit that the operating system can assign work to. Operating systems can be configured to overload physical cores with multiple logical cpus using hyperthreading.<br />
|cpu<br />
|<tt>--cpus-per-task</tt><br />
|-<br />
|thread<br />
|one of possibly multiple simultaneous execution paths within a program, which can share memory.<br />
| -<br />
| <tt>--cpus-per-task</tt> '''and''' <tt>OMP_NUM_THREADS</tt><br />
|-<br />
|hyperthread<br />
|a thread run in a collection of threads that is larger than the number of physical cores.<br />
| -<br />
| -<br />
|}<br />
<br />
== Scheduling by Node ==<br />
<br />
* On many systems that use SLURM, the scheduler will deduce from the job script specifications (the number of tasks and the number of cpus-per-node) what resources should be allocated. On Niagara, this is a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours.<br />
** No other users are running anything on them.<br />
** You can ssh into them, while your job is running, to see how things are going.<br />
* Whatever you request of the scheduler, your request will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes. Each node has about 202GB of RAM available.<br />
* You should try to use all the cores on the nodes allocated to your job. Since there are 40 cores per node, your job should use N x 40 cores. If this is not the case, we may contact you to help you optimize your workflow. Again, users who have serial jobs should consult the [[Running Serial Jobs on Niagara | serial jobs]] page.<br />
<br />
== Hyperthreading: Logical CPUs vs. cores ==<br />
<br />
Hyperthreading, a technology that leverages more of the physical hardware by pretending there are twice as many logical cores as real cores, is enabled on Niagara.<br />
The operating system and scheduler see 80 logical CPUs.<br />
<br />
Using 80 logical CPUs versus 40 real cores typically gives about a 5-10% speedup, depending on your application (your mileage may vary).<br />
<br />
Because Niagara is scheduled by node, hyperthreading is actually fairly easy to use:<br />
* Ask for a certain number of nodes, N, for your job.<br />
* You know that you get 40 x N cores, so you will use (at least) a total of 40 x N MPI processes or threads (mpirun, srun, and the OS will automatically spread these over the real cores).<br />
* But you should also test if running 80 x N MPI processes or threads gives you any speedup.<br />
* Regardless, your usage will be counted as 40 x N x (walltime in years).<br />
<br />
Many applications which are communication-heavy can benefit from the use of hyperthreading.<br />
<br />
= Submission script details =<br />
<br />
This section outlines some details of how to interact with the scheduler, and how it implements Niagara's scheduling policies.<br />
<br />
== Queues ==<br />
<br />
There are 3 queues available on SciNet systems. These queues have different limits; see the [[#Limits | Limits]] section for further details.<br />
<br />
=== Compute ===<br />
<br />
The compute queue is the default queue. Most jobs will run in this queue. If no flags are specified in the submission script this is the queue where your job will land.<br />
<br />
=== Debug ===<br />
<br />
The Debug queue is a high-priority queue, used for short-term testing of your code. Do NOT use the debug queue for production work. You can use the debug queue in one of two ways. To submit a standard job script to the debug queue, add the line<br />
#SBATCH -p debug<br />
to your submission script. This will put the job into the debug queue, and it should run in short order.<br />
<br />
To request an interactive debug session, where you retain control over the command line prompt, at a login node type the command<br />
nia-login07:~$ salloc -p debug --nodes 1 --time=1:00:00<br />
This will request 1 node for 1 hour. You can similarly request a debug session using the 'debugjob' command:<br />
nia-login07:~$ debugjob N<br />
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; when N=4 (the maximum), it gives you 30 minutes.<br />
<br />
=== Archive ===<br />
<br />
The archivelong and archiveshort queues are only used by the [[HPSS]] system. See that page for details on how to use these queues.<br />
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the jobs runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Jobs in queue<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs ||compute || 50 || 1000 || 1 node (40&nbsp;cores) || default:&nbsp;20&nbsp;nodes&nbsp;(800&nbsp;cores) <br> with&nbsp;allocation:&nbsp;1000&nbsp;nodes&nbsp;(40000&nbsp;cores)|| 15 minutes || 24 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40 cores) || 4 nodes (160 cores)|| N/A || min(1, 1.5/n<sub>node</sub>) hours<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (max 5 total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.<br />
<br />
== Slurm Accounts ==<br />
<br />
To be able to prioritise jobs based on groups and allocations, the Slurm scheduler uses the concept of ''accounts''. Each group that has a Resource for Research Groups (RRG) or Research Platforms and Portals (RPP) allocation (awarded through an annual competition by Compute Canada) has an account that starts with <tt>rrg-</tt> or <tt>rpp-</tt>. Slurm assigns a 'fairshare' priority to these accounts based on the size of the award in core-years. Groups without an RRG or RPP can use Niagara using a so-called Rapid Access Service (RAS), and have an account that starts with <tt>def-</tt>.<br />
<br />
On Niagara, most users will only ever use one account, and those users do not need to specify the account to Slurm. However, users who are part of collaborations may be able to use multiple accounts, i.e., that of their sponsor and that of their collaborator, but this means that they need to select the right account when running jobs. <br />
<br />
To select the account, just add <br />
<br />
#SBATCH -A [account]<br />
<br />
to the job scripts, or pass <tt>-A [account]</tt> to <tt>salloc</tt> or <tt>debugjob</tt>. <br />
<br />
To see which accounts you have access to, or what their names are, use the command<br />
<br />
sshare -U<br />
<br />
It has been noted that, in some cases, using the '-A' flag does not result in the appropriate account being used. To get around this, specify the account when sbatch is invoked:<br />
sbatch -A account myjobscript.sh<br />
<br />
== Slurm environment variables ==<br />
<br />
There are many environment variables built into Slurm. These are some which you may find useful:<br />
* SLURM_SUBMIT_DIR: directory from which the job was submitted.<br />
* SLURM_SUBMIT_HOST: host from which the job was submitted.<br />
* SLURM_JOB_ID: the job's id.<br />
* SLURM_JOB_NUM_NODES: number of nodes in the job.<br />
* SLURM_JOB_NODELIST: list of nodes assigned to the job.<br />
* SLURM_JOB_ACCOUNT: account associated with the job.<br />
<br />
Any of these environment variables can be accessed from within your job script.<br />
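For instance, a job script might log this context at the top of its output. This is a sketch; the <tt>:-</tt> fallbacks make it also run outside of a Slurm job, where these variables are unset:<br />

```shell
#!/bin/bash
# Log basic job context at the start of a job script.
echo "job ${SLURM_JOB_ID:-unknown} submitted from ${SLURM_SUBMIT_DIR:-$PWD}"
echo "running on ${SLURM_JOB_NUM_NODES:-0} node(s): ${SLURM_JOB_NODELIST:-none}"
```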
<br />
== Passing Variables to submission scripts ==<br />
It is possible to pass values through environment variables into your SLURM submission scripts.<br />
To pass along all variables already defined in your shell, just add the following directive to the submission script,<br />
<br />
#SBATCH --export=ALL<br />
<br />
and you will have access to any predefined environment variable.<br />
<br />
A better way is to specify explicitly which variables you want to pass to the submission script,<br />
<br />
sbatch --export=i=15,j='test' jobscript.sbatch<br />
<br />
You can even set the job name and output files using environment variables, eg.<br />
<br />
i="simulation"<br />
j=14<br />
sbatch --job-name=$i.$j.run --output=$i.$j.out --export=i=$i,j=$j jobscript.sbatch<br />
<br />
(The latter only works on the command line; you cannot use environment variables in <tt>#SBATCH</tt> lines in the job script.)<br />
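Inside <tt>jobscript.sbatch</tt>, the exported variables are then available like any ordinary environment variable. A sketch (the output file naming scheme is hypothetical, and the fallbacks only cover the case where no value was exported):<br />

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=0:15:00
# i and j were passed in via: sbatch --export=i=15,j='test' jobscript.sbatch
echo "running with i=${i:-unset} and j=${j:-unset}"
outfile="result.${i:-0}.${j:-run}.txt"   # hypothetical naming scheme
echo "results would go to $outfile"
```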
<br />
== Command line arguments ==<br />
<br />
Command line arguments can also be used with job scripts, in the same way as command line arguments for shell scripts. All command line arguments given to sbatch that follow the job script name will be passed to the job script. In fact, SLURM will not look at any of these arguments, so you must place all sbatch arguments before the script name, e.g.:<br />
<br />
sbatch -p debug jobscript.sbatch FirstArgument SecondArgument ...<br />
<br />
In this example, <tt>-p debug</tt> is interpreted by SLURM, while in your submission script you can access <tt>FirstArgument</tt>, <tt>SecondArgument</tt>, etc., by referring to <code>$1, $2, ...</code>.<br />
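A sketch of a job script that reads those positional arguments:<br />

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=0:15:00
# Invoked as: sbatch -p debug jobscript.sbatch FirstArgument SecondArgument
echo "first argument:  ${1:-none given}"
echo "second argument: ${2:-none given}"
```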
<br />
== Job arrays ==<br />
<br />
Sometimes you need to run the same job script many times, but just tweaking one value each time. One way of accomplishing this is using job arrays. Job arrays are invoked using the "-a" flag with sbatch:<br />
sbatch -a 1-100 myjobscript.sh<br />
This will submit 100 instances of myjobscript.sh. Within the job script you can distinguish which of those instances is running using the environment variable SLURM_ARRAY_TASK_ID.<br />
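A sketch of an array job script, mapping each task ID to a hypothetical per-task input file:<br />

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=1:00:00
# Submitted as: sbatch -a 1-100 myjobscript.sh
idx=${SLURM_ARRAY_TASK_ID:-1}   # which of the 100 instances am I?
input="input.${idx}.dat"        # hypothetical per-task input file
echo "array task $idx processing $input"
```

The fallback value of 1 lets you test the script outside of an array job.<br />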
<br />
Note that Niagara [[#Limits | currently]] has a limit of 1000 submitted jobs for users within groups with allocations, and 200 submitted jobs without an allocation.<br />
<br />
== Job dependencies ==<br />
<br />
You can make one job dependent on the successful completion of another job using the following command:<br />
sbatch --dependency=afterok:JOBID myjobscript.sh<br />
This will make the current job submission not start until the parent job, with jobid JOBID, successfully completes. There are many job dependency options available. Visit the [https://slurm.schedmd.com/sbatch.html#OPT_dependency Slurm sbatch page ] for the full list. <br />
<br />
If the parent job fails (that is, ends with a non-zero exit code) the dependent job can never be scheduled and will be automatically cancelled.<br />
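To script such a chain, sbatch's <tt>--parsable</tt> option is handy: it makes sbatch print only the job ID of the submitted job. A sketch (first_step.sh and second_step.sh are placeholder job scripts):<br />

```shell
#!/bin/bash
# Submit a two-step pipeline where step 2 only runs if step 1 succeeds.
if command -v sbatch >/dev/null 2>&1; then
    jid=$(sbatch --parsable first_step.sh)            # prints just the job ID
    sbatch --dependency="afterok:$jid" second_step.sh
else
    echo "sbatch not found; run this on a login node"
fi
```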
<br />
== Email Notification ==<br />
Email notification works, but you need to add the email address and type of notification you may want to receive in your submission script, eg.<br />
<br />
#SBATCH --mail-user=YOUR.email.ADDRESS<br />
#SBATCH --mail-type=ALL<br />
<br />
The sbatch man page (type <tt>man sbatch</tt> on Niagara) explains all possible mail-types.<br />
<br />
== Job Location Constraints ==<br />
<br />
=== Node types ===<br />
<br />
With the expansion of Niagara there are now two node types: 1548 nodes with Intel 6148 "skylake" CPUs, and 468 nodes with Intel 6248 "cascadelake" CPUs. By default a job will be placed on the first available nodes but will not span node types. You can specify a node type by adding one of the following directives to your submission script.<br />
<br />
#SBATCH --constraint=skylake <br />
#SBATCH --constraint=cascade<br />
<br />
=== EDR/HDR Infiniband Topology ===<br />
<br />
The Infiniband high-speed network used for job communication and file I/O on Niagara consists of 5 1:1 subscribed "wings" that are connected together in a dragonfly topology with adaptive routing enabled. Four wings (dragonfly[1-4]) consist of EDR-based skylake nodes, and dragonfly5 contains all of the HDR100 cascadelake nodes. By default multi-node jobs will run on the first available nodes, which could be all within one wing or span multiple wings, but not node types. For most scalable parallel programs the performance difference should not be very significant; however, if you wish to keep your jobs from spanning wings you can use the following.<br />
<br />
#SBATCH --constraint=[dragonfly1|dragonfly2|dragonfly3|dragonfly4|dragonfly5]<br />
<br />
= Monitoring jobs =<br />
<br />
There are many options available for monitoring your jobs. The most basic of which is the squeue command:<br />
<br />
nia-login07:~$ squeue -u USERNAME<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
292047 compute myjob4 username PD 0:00 4 (Priority)<br />
292048 compute myjob3 username PD 0:00 4 (Priority)<br />
266829 compute myjob2 username R 18:56:17 2 nia[1397-1398]<br />
266828 compute myjob1 username R 18:56:46 1 nia1298<br />
<br />
Here you can see that we have two running jobs ('R') and two pending jobs ('PD'). The nodes being used are listed.<br />
<br />
== Job status ==<br />
<br />
To get an estimate of when a job will start, use the command<br />
squeue --start -j JOBID<br />
Note that this is only an estimate, and tends not to be very accurate.<br />
<br />
Information about a specific job can be found using the <br />
squeue -j JOBID<br />
or alternatively<br />
scontrol show job JOBID<br />
which is more verbose.<br />
<br />
== SSHing to a node ==<br />
<br />
Once your job has started, the node belongs to you. As such you may, from a login node, SSH into the node to check the performance of your job. The first step is to find out which nodes are being used (see above). Once you have your list of nodes, you can SSH into them directly. Once there, you can run the 'top' or 'free' commands to check both CPU and memory usage.<br />
<br />
== jobperf ==<br />
<br />
The jobperf script will give you feedback on the performance of your currently-running job:<br />
nia-login07:~$ jobperf 123456<br />
----------------------------------------------------------------------------------------------------<br />
RUNNING IDLE USER MEMORY(MB) PROCESS NAMES<br />
HOSTNAME # %CPU %MEM DISK SLEEP NAME RAMDISK USED AVAIL (excl:bash,sh,ssh,sshd)<br />
----------------------------------------------------------------------------------------------------<br />
nia1013 71 174% 0.5% 0 22 ejspence 0 15060 178017 14*gmx_mpi mpiexec slurm_script<br />
nia1014 79 192% 0.1% 0 18 ejspence 0 14803 178274 13*gmx_mpi<br />
nia1295 79 188% 0.4% 0 18 ejspence 0 15199 177878 13*gmx_mpi<br />
----------------------------------------------------------------------------------------------------<br />
<br />
Here you can see both the CPU and memory usage of the job, for all nodes being used.<br />
<br />
== Other commands ==<br />
<br />
Some other commands that can be useful for dealing with your jobs:<br />
* <code>scancel -i JOBID</code> cancels a specific job.<br />
* <code>sacct</code> gives information about your recent jobs.<br />
* <code>sinfo -p compute</code> gives a list of available nodes.<br />
* <code>qsum</code> gives a summary of the queue by user.<br />
<br />
= Example submission scripts =<br />
<br />
Here we present some examples of how to create submission scripts for running parallel jobs. Serial job examples can be found on the [[Running_Serial_Jobs_on_Niagara | serial jobs page]].<br />
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=8<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li><p>First line indicates that this is a bash script.</p></li><br />
<li><p>Lines starting with <code>#SBATCH</code> go to SLURM.</p></li><br />
<li><p>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>)</p></li><br />
<li><p>In this case, SLURM looks for 8 nodes with 40 cores on which to run 320 tasks, for 1 hour.</p></li><br />
<li><p>Note that the mpirun flag "--ppn" (processes per node) is ignored.</p></li><br />
<li><p>Once it finds such nodes, it runs the script:</p><br />
<ul><br />
<li>Changes to the submission directory;</li><br />
<li>Loads modules;</li><br />
<li>Runs the <code>mpi_example</code> application.</li><br />
</ul><br />
<li>To use hyperthreading, just change --ntasks-per-node=40 to --ntasks-per-node=80, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* First line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>openmp_job</code>).<br />
* In this case, SLURM looks for one node with 40 cores, on which to run one task with 40 threads, for 1 hour.<br />
* Once it finds such a node, it runs the script:<br />
** Changes to the submission directory;<br />
** Loads modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Main_Page&diff=5130Main Page2023-10-05T16:16:05Z<p>Bmundim: </p>
<hr />
<div>__NOTOC__<br />
{| style="border-spacing:10px; width: 95%"<br />
| style="padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px"|<br />
<br />
==System Status==<br />
<br />
<!-- Use "Up", "Partial" or "Down"; these are templates. --><br />
{|style="width:100%" <br />
|{{up |Niagara|Niagara_Quickstart}}<br />
|{{up |Mist|Mist}}<br />
|{{Up |Teach|Teach}}<br />
|{{up |Rouge|Rouge}}<br />
|-<br />
|{{Up |Jupyter Hub|Jupyter_Hub}}<br />
|{{Up |Scheduler|Niagara_Quickstart#Submitting_jobs}}<br />
|{{Up |File system|Niagara_Quickstart#Storage_and_quotas}}<br />
|{{Up |Burst Buffer|Burst_Buffer}}<br />
|-<br />
|{{Up |HPSS|HPSS}}<br />
|{{Up |Login Nodes|Niagara_Quickstart#Logging_in}} <br />
|{{Up |External Network|Niagara_Quickstart#Logging_in}} <br />
|{{Up |Globus |Globus}}<br />
|}<br />
<br />
'''Thu Oct 05, 2023, 12:05 PM EDT:''' Niagara scheduler is back online.<br />
<br />
'''Thu Oct 05, 2023, 11:50 AM EDT:''' Niagara scheduler is temporarily under maintenance for security updates. <br />
<br />
'''Tue Oct 31, 2023, 12:00 PM EDT - Fri Nov 3, 2023, 12:00 PM EDT:''' Three-day reservation for the "Niagara at Scale" event. Only "Niagara at Scale" projects will run on the compute nodes. Users are encouraged to submit small and short jobs that could run before this event. Throughout the event, users can still login, access their data, and submit jobs, but these jobs will not run until after the event. Note that the debugjob queue will remain available to everyone as well.<br />
<br />
<!-- When removing system status entries, please archive them to: --><br />
[[Previous messages]]<br />
<br />
{|style="border-spacing: 10px;width: 100%"<br />
|valign="top" style="margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%" |<br />
<br />
== QuickStart Guides ==<br />
* [[Niagara Quickstart]]<br />
* [[HPSS | HPSS archival storage]]<br />
* [[Mist| Mist Power 9 GPU cluster]]<br />
* [[Teach|Teach cluster]]<br />
* [[FAQ | FAQ (frequently asked questions)]]<br />
* [[Acknowledging SciNet]]<br />
| valign="top" style="margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%" |<br />
<br />
== Tutorials, Manuals, etc. ==<br />
* [https://education.scinet.utoronto.ca SciNet education material]<br />
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]<br />
* [[Modules specific to Niagara|Software Modules specific to Niagara]] <br />
* [[Modules for Mist]] <br />
* [[Commercial software]]<br />
* [[Burst Buffer]]<br />
* [[SSH#SSH Keys|SSH keys]]<br />
* [[SSH Tunneling]]<br />
* [[SSH#Two-Factor_authentication|Two-Factor Authentication]]<br />
* [[Visualization]]<br />
* [[Running Serial Jobs on Niagara]]<br />
* [[Jupyter Hub]]<br />
|}</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Running_Serial_Jobs_on_Niagara&diff=5016Running Serial Jobs on Niagara2023-07-28T20:25:50Z<p>Bmundim: /* Serial jobs of varying duration */</p>
<hr />
<div>===General considerations===<br />
<br />
====Use whole nodes...====<br />
<br />
When you submit a job to Niagara, it is run on one (or more than one) entire node - meaning that your job is occupying at least 40 processors for the duration of its run. The SciNet systems are usually fully utilized, with many researchers waiting in the queue for computational resources, so we require that you make full use of the nodes that your job is allocated, so other researchers don't have to wait unnecessarily, and so that your jobs get as much work done as possible.<br />
<br />
Often, the best way to make full use of the node is to run one large parallel computation; but sometimes it is beneficial to run several serial codes at the same time. On this page, we discuss ways to run suites of serial computations at once, as efficiently as possible, using the full resources of the node.<br />
<br />
====... memory permitting====<br />
<br />
When running multiple serial jobs on the same node, it is essential to have a good idea of how much memory the jobs will require. The Niagara compute nodes have about 200GB of memory available to user jobs running on the 40 cores, i.e., a bit over 4GB per core. So the jobs also have to be bunched in ways that will fit into 200GB. If they use more than this, it will crash the node, inconveniencing you and other researchers waiting for that node.<br />
<br />
If 40 serial jobs would not fit within the 200GB limit -- i.e., each individual job requires significantly more than ~4GB -- then you may run fewer jobs at a time so that they do fit. Note that in that case, the jobs are likely candidates for parallelization, and you can contact us at [mailto:support@scinet.utoronto.ca &lt;support@scinet.utoronto.ca&gt;] and arrange a meeting with one of the technical analysts to help you with that.<br />
<br />
If the memory requirements allow it, you could actually run more than 40 jobs at the same time, up to 80, exploiting the [[Niagara_Quickstart#Hyperthreading:_Logical_CPUs_vs._cores | HyperThreading]] feature of the Intel CPUs. It may seem counter-intuitive, but running 80 simultaneous jobs on 40 cores has, for certain types of tasks, increased some users' overall throughput.<br />
<br />
====Is your job really serial?====<br />
<br />
While your program may not be explicitly parallel, it may use some of Niagara's threaded libraries for numerical computations, which can make use of multiple processors. In particular, Niagara's [[Python]] and [[R_Statistical_Package | R]] modules are compiled with aggressive optimization and using threaded numerical libraries which by default will make use of multiple cores for computations such as large matrix operations. This can greatly speed up individual runs, but by less (usually much less) than a factor of 40. If you do have many such threaded computations to do, you often get more calculations done per unit time if you turn off the threading and run multiple such computations at once (provided that fits in memory, as explained above). You can turn off threading of these libraries with the shell script line <tt>export OMP_NUM_THREADS=1</tt>; that line will be included in the scripts below. <br />
<br />
If your calculations implicitly use threading, you may want to experiment to see what gives you the best performance - you may find that running 4 (or even 8) jobs with 10 threads each (<tt>OMP_NUM_THREADS=10</tt>), or 2 jobs with 20 threads, gives better performance than 40 jobs with 1 thread (and almost certainly better than 1 job with 40 threads). We'd encourage you to perform exactly such a scaling test to find the combination of number of threads per process and processes per job that maximizes your throughput; for a small up-front investment in time you may significantly speed up all the computations you need to do.<br />
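Such a scaling test can be scripted. A minimal sketch, in which <tt>sleep 0</tt> is merely a placeholder for your own threaded computation:<br />

```shell
# Sketch of a thread-count scaling test.  Replace "sleep 0" with your
# own threaded computation and compare the reported wall times.
for t in 1 2 4 10 20 40; do
    export OMP_NUM_THREADS=$t
    start=$(date +%s)
    sleep 0    # placeholder workload
    echo "$t thread(s): $(( $(date +%s) - start )) s elapsed"
done
```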
<br />
===Serial jobs of similar duration===<br />
<br />
The most straightforward way to run multiple serial jobs is to bunch the serial jobs in groups of 40 or more that will take roughly the same amount of time, and create a job script that looks a <br />
bit like this<br />
<source lang="bash"><br />
#!/bin/bash<br />
# SLURM submission script for multiple serial jobs on Niagara<br />
#<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name serialx40<br />
<br />
# Turn off implicit threading in Python, R<br />
export OMP_NUM_THREADS=1<br />
<br />
# EXECUTION COMMAND; ampersand off 40 jobs and wait<br />
(cd serialjobdir01 && ./doserialjob01 && echo "job 01 finished") &<br />
(cd serialjobdir02 && ./doserialjob02 && echo "job 02 finished") &<br />
(cd serialjobdir03 && ./doserialjob03 && echo "job 03 finished") &<br />
(cd serialjobdir04 && ./doserialjob04 && echo "job 04 finished") &<br />
(cd serialjobdir05 && ./doserialjob05 && echo "job 05 finished") &<br />
(cd serialjobdir06 && ./doserialjob06 && echo "job 06 finished") &<br />
(cd serialjobdir07 && ./doserialjob07 && echo "job 07 finished") &<br />
(cd serialjobdir08 && ./doserialjob08 && echo "job 08 finished") &<br />
(cd serialjobdir09 && ./doserialjob09 && echo "job 09 finished") &<br />
(cd serialjobdir10 && ./doserialjob10 && echo "job 10 finished") &<br />
(cd serialjobdir11 && ./doserialjob11 && echo "job 11 finished") &<br />
(cd serialjobdir12 && ./doserialjob12 && echo "job 12 finished") &<br />
(cd serialjobdir13 && ./doserialjob13 && echo "job 13 finished") &<br />
(cd serialjobdir14 && ./doserialjob14 && echo "job 14 finished") &<br />
(cd serialjobdir15 && ./doserialjob15 && echo "job 15 finished") &<br />
(cd serialjobdir16 && ./doserialjob16 && echo "job 16 finished") &<br />
(cd serialjobdir17 && ./doserialjob17 && echo "job 17 finished") &<br />
(cd serialjobdir18 && ./doserialjob18 && echo "job 18 finished") &<br />
(cd serialjobdir19 && ./doserialjob19 && echo "job 19 finished") &<br />
(cd serialjobdir20 && ./doserialjob20 && echo "job 20 finished") &<br />
(cd serialjobdir21 && ./doserialjob21 && echo "job 21 finished") &<br />
(cd serialjobdir22 && ./doserialjob22 && echo "job 22 finished") &<br />
(cd serialjobdir23 && ./doserialjob23 && echo "job 23 finished") &<br />
(cd serialjobdir24 && ./doserialjob24 && echo "job 24 finished") &<br />
(cd serialjobdir25 && ./doserialjob25 && echo "job 25 finished") &<br />
(cd serialjobdir26 && ./doserialjob26 && echo "job 26 finished") &<br />
(cd serialjobdir27 && ./doserialjob27 && echo "job 27 finished") &<br />
(cd serialjobdir28 && ./doserialjob28 && echo "job 28 finished") &<br />
(cd serialjobdir29 && ./doserialjob29 && echo "job 29 finished") &<br />
(cd serialjobdir30 && ./doserialjob30 && echo "job 30 finished") &<br />
(cd serialjobdir31 && ./doserialjob31 && echo "job 31 finished") &<br />
(cd serialjobdir32 && ./doserialjob32 && echo "job 32 finished") &<br />
(cd serialjobdir33 && ./doserialjob33 && echo "job 33 finished") &<br />
(cd serialjobdir34 && ./doserialjob34 && echo "job 34 finished") &<br />
(cd serialjobdir35 && ./doserialjob35 && echo "job 35 finished") &<br />
(cd serialjobdir36 && ./doserialjob36 && echo "job 36 finished") &<br />
(cd serialjobdir37 && ./doserialjob37 && echo "job 37 finished") &<br />
(cd serialjobdir38 && ./doserialjob38 && echo "job 38 finished") &<br />
(cd serialjobdir39 && ./doserialjob39 && echo "job 39 finished") &<br />
(cd serialjobdir40 && ./doserialjob40 && echo "job 40 finished") &<br />
wait<br />
</source><br />
<br />
There are four important things to take note of here. First, the <tt>'''wait'''</tt><br />
command at the end is crucial; without it the job will terminate <br />
immediately, killing the 40 programs you just started.<br />
<br />
Second is that every serial job is running in its own directory; this is important because writing to the same directory from different processes can lead to slow down because of directory locking. How badly your job suffers from this depends on how much I/O your serial jobs are doing, but with 40 jobs on a node, it can quickly add up.<br />
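The per-subjob directory setup need not be done by hand; for example, assuming the directory naming of the script above:<br />

```shell
# Create one working directory per subjob; "seq -w" zero-pads the
# numbers so the names match serialjobdir01 ... serialjobdir40.
for i in $(seq -w 1 40); do
    mkdir -p "serialjobdir$i"
done
```

Each <tt>doserialjobNN</tt> executable, or its input files, can then be placed in its own directory before submitting.<br />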
<br />
Third is that it is important to group the programs by how long they <br />
will take. If (say) <tt>dojob08</tt> takes 2 hours and the rest only take 1, <br />
then for one hour 39 of the 40 cores on that Niagara node are wasted; they are <br />
sitting idle but are unavailable for other users, and the utilization of <br />
this node over the whole run is only 51%. This is the sort of thing <br />
we'll notice, and users who don't make efficient use of the machine will <br />
have their ability to use Niagara resources reduced. If you have many serial jobs of varying length, <br />
use the submission script to balance the computational load, as explained [[ #Serial jobs of varying duration | below]].<br />
<br />
Fourth, if memory requirements allow it, you should try to run more than 40 jobs at once, with a maximum of 80 jobs.<br />
<br />
Finally, writing out 80 cases (or even just 40, as in the above example) can become highly tedious, as can keeping track of all these subjobs. You should consider using a tool that automates this, like:<br />
<br />
===GNU Parallel===<br />
<br />
GNU parallel is a really nice tool written by Ole Tange to run multiple serial jobs in<br />
parallel. It allows you to keep the processors on each 40-core node busy, if you provide enough jobs to do.<br />
<br />
GNU parallel is accessible on Niagara in the module<br />
<tt>gnu-parallel</tt>:<br />
<source lang="bash"><br />
module load NiaEnv/2019b gnu-parallel<br />
</source><br />
This also switches to the newer NiaEnv/2019b stack. The current version of the GNU parallel module in that stack is 20191122. In the older stack, NiaEnv/2018a (which is loaded by default), the version of GNU parallel is 20180322. <br />
<br />
The command <tt>man parallel_tutorial</tt> shows much of GNU parallel's functionality, while <tt>man parallel</tt> gives the details of its syntax.<br />
<br />
The citation for GNU Parallel is: O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.<br />
<br />
It is easiest to demonstrate the usage of GNU parallel by<br />
examples. First, suppose you have 80 jobs to do (similar to the above case), that these jobs' durations vary quite a bit, and that the average job duration is around 5 hours. You could use the following script (but don't, see below):<br />
<source lang="bash"><br />
#!/bin/bash<br />
# SLURM submission script for multiple serial jobs on Niagara<br />
#<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=12:00:00<br />
#SBATCH --job-name gnu-parallel-example<br />
<br />
# Turn off implicit threading in Python, R<br />
export OMP_NUM_THREADS=1<br />
<br />
module load NiaEnv/2019b gnu-parallel<br />
<br />
# EXECUTION COMMAND - DON'T USE THIS ONE<br />
parallel -j $SLURM_TASKS_PER_NODE <<EOF<br />
cd serialjobdir01 && ./doserialjob01 && echo "job 01 finished"<br />
cd serialjobdir02 && ./doserialjob02 && echo "job 02 finished"<br />
...<br />
cd serialjobdir80 && ./doserialjob80 && echo "job 80 finished"<br />
EOF<br />
</source><br />
<br />
The <tt>-j $SLURM_TASKS_PER_NODE</tt> parameter sets the number of jobs to run at the same time on each compute node, and uses the slurm value, which coincides with the <tt>--ntasks-per-node</tt> parameter. For gnu-parallel modules starting from version 20191122, if you omit the option <tt>-j $SLURM_TASKS_PER_NODE</tt>, you will get as many simultaneous subjobs as the <tt>--ntasks-per-node</tt> parameter specified in the <tt>#SBATCH</tt> part of the job script.<br />
<br />
Each line in the input given to parallel is a separate subjob, so 80 jobs are lined up to run. Initially, 40 subjobs are given to the 40 processors on the node. When one of the processors is done with its assigned subjob, it will get a next subjob instead of sitting idle until the other processors are done. While you would expect that on average this script should take 10 hours (each processor on average has to complete two jobs of 5 hours), there's a good chance that one of the processors gets two jobs that take more than 5 hours, so the job script requests 12 hours to be safe. How much more time you should ask for in practice depends on the spread in expected run times of the separate jobs.<br />
<br />
===Serial jobs of varying duration===<br />
<br />
The script above works, and can be extended to more subjobs, which is especially important if you have to do a lot (100+) of relatively short serial runs '''of which the walltime varies'''. But it gets tedious to write out all the cases. You could write a script to automate this, but you do not have to, because GNU Parallel already has ways of generating subjobs, as we will show below.<br />
<br />
GNU Parallel can also keep track of which subjobs succeeded, failed, or never started. For that, you just add <tt>--joblog</tt> to the parallel command followed by a filename to which to write the status:<br />
<br />
<source lang="bash" line start=17><br />
# EXECUTION COMMAND - DON'T USE THIS ONE<br />
parallel --joblog slurm-$SLURM_JOBID.log -j $SLURM_TASKS_PER_NODE <<EOF<br />
cd serialjobdir01 && ./doserialjob01<br />
cd serialjobdir02 && ./doserialjob02<br />
...<br />
cd serialjobdir80 && ./doserialjob80<br />
EOF<br />
</source><br />
<br />
In this case, the job log gets written to "slurm-$SLURM_JOBID.log", where "<tt>$SLURM_JOBID</tt>" will be replaced by the job number. The joblog can also be used to retry failed jobs (more below).<br />
<br />
Next, we can generate that set of subjobs instead of writing them out by hand. The following does the trick:<br />
<br />
<source lang="bash" line start=17><br />
# EXECUTION COMMAND <br />
parallel --joblog slurm-$SLURM_JOBID.log -j $SLURM_TASKS_PER_NODE "cd serialjobdir{} && ./doserialjob{}" ::: {01..80}<br />
</source><br />
<br />
This works as follows: <tt>"cd serialjobdir{} && ./doserialjob{}"</tt> is a template command, with placeholders {}. <tt>:::</tt> indicates that a set of parameters follows, which are substituted into the template, thus generating the commands for each subjob. After the <tt>:::</tt> we can place a space-separated set of arguments, which in this case are generated using the bash-specific construct for a range, <tt>{01..80}</tt>.<br />
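You can preview on the command line what the range expands to and, with the gnu-parallel module loaded, what commands the template generates:<br />

```shell
# Preview the bash brace expansion used above (shortened to 5 entries).
printf '%s\n' {01..05}
# With GNU parallel available, its --dry-run option would print the
# generated subjob commands without running them, e.g.:
#   parallel --dry-run "cd serialjobdir{} && ./doserialjob{}" ::: {01..05}
```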
<br />
The final script now looks like this:<br />
<source lang="bash"><br />
#!/bin/bash<br />
# SLURM submission script for multiple serial jobs on Niagara<br />
#<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=12:00:00<br />
#SBATCH --job-name gnu-parallel-example<br />
<br />
# Turn off implicit threading in Python, R<br />
export OMP_NUM_THREADS=1<br />
<br />
module load NiaEnv/2019b gnu-parallel <br />
<br />
# EXECUTION COMMAND <br />
parallel --joblog slurm-$SLURM_JOBID.log "cd serialjobdir{} && ./doserialjob{}" ::: {01..80}<br />
</source><br />
<br />
Notes:<br />
* As before, GNU Parallel keeps 40 jobs running at a time, and if one finishes, starts the next. This is an easy way to do ''load balancing''.<br />
* The <tt>-j</tt> option was omitted, which works if using GNU Parallel module version 20191122 or higher. Otherwise, you need to add the <tt>-j $SLURM_TASKS_PER_NODE</tt> flag to the parallel command. <br />
* Doing many serial jobs often entails doing many disk reads and writes, which can be detrimental to the performance. In that case, running from the ramdisk may be an option. <br />
** When using a ramdisk, make sure you copy your results from the ramdisk back to the scratch after the runs, or when the job is killed because time has run out.<br />
** More details on how to setup your script to use the ramdisk can be found on the [[User_Ramdisk | Ramdisk page]].<br />
* This script optimizes resource utility, but can only use 1 node (40 cores) at a time. The next section addresses how to use more nodes.<br />
* On the command line, the option "--bar" shows a progress bar; when running as a job, however, you would not see this status bar.<br />
* The <tt>--joblog</tt> parameter also keeps track of failed or unfinished jobs, so you can later try to redo those with the same command, but with the option "--resume" added.<br />
* If it happens that your serial jobs are running out of memory and being killed by the system, the <tt>--memfree size</tt> option can be helpful. It sets the minimum memory that must be free before another subjob is started. On Niagara, <tt>size</tt> could be set to <tt>15000M</tt>, for example, to match what the RealMemory slurm configuration provides to users on compute nodes. You might have to adjust it if your jobs also use the ramdisk to hold data.<br />
<br />
===Version for more than 1 node at once===<br />
<br />
If you have many hundreds of serial jobs that you want to run concurrently and the nodes are available, then the approach above, while useful, would require tens of scripts to be submitted separately. Alternatively, it is possible to request more than one node and to use the following routine to distribute your processes amongst the cores.<br />
<br />
Although it is not recommended to use GNU parallel modules before version 20191122, if you do, the script should look like this:<br />
<source lang="bash"><br />
#!/bin/bash<br />
# SLURM submission script for multiple serial jobs on multiple Niagara nodes<br />
#<br />
#SBATCH --nodes=4<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=12:00:00<br />
#SBATCH --job-name gnu-parallel-multinode-example<br />
<br />
# Turn off implicit threading in Python, R<br />
export OMP_NUM_THREADS=1<br />
<br />
module load gnu-parallel<br />
<br />
HOSTS=$(scontrol show hostnames $SLURM_NODELIST | tr '\n' ,)<br />
NCORES=40<br />
<br />
parallel --env OMP_NUM_THREADS,PATH,LD_LIBRARY_PATH --joblog slurm-$SLURM_JOBID.log -j $NCORES -S $HOSTS --wd $PWD "cd serialjobdir{} && ./doserialjob{}" ::: {001..800}<br />
<br />
</source><br />
<br />
* The parameter <tt>-S $HOSTS</tt> divides the work over different nodes. <tt>$HOSTS</tt> should be a comma separated list of the node names. These node names are also stored in <tt>$SLURM_NODELIST</tt>, but with a syntax that allows for ranges, which GNU parallel does not understand. The <tt>scontrol</tt> command in the script above fixes that.<br />
* Alternatively, GNU Parallel can be passed a file with the list of nodes to which to ssh, using <tt>--sshloginfile</tt>, but your job script would first have to create that file.<br />
* The parameter <tt>-j $NCORES</tt> tells <tt>parallel</tt> to run 40 subjobs simultaneously on each of the nodes (note: do not use the similarly named variable $SLURM_TASKS_PER_NODE as its format is incompatible with GNU parallel).<br />
* The parameter <tt>--wd $PWD</tt> sets the working directory on the other nodes to the working directory on the first node. <span style="color:red;">The <tt>--wd</tt> argument is essential:</span> without this, the run tries to start from the wrong place and will most likely fail.<br />
* If you need an environment variable to be transferred from the job script to the remotely running subjobs, use the <tt>--env ENVIRONMENTVARIABLE</tt> argument for the parallel command. The example above copies the most common variables that a remote command may need.<br />
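<br />
To see what the <tt>HOSTS</tt> line produces, the transformation can be mimicked without slurm: <tt>scontrol show hostnames</tt> prints one node name per line, and <tt>tr</tt> then joins them with commas. The node names below are hypothetical stand-ins:<br />

```shell
# Mimic the one-name-per-line output of "scontrol show hostnames $SLURM_NODELIST"
# (real node names come from slurm; these are made up for illustration)
printf 'nia0001\nnia0002\nnia0003\nnia0004\n' > nodelist.txt

# Join the names with commas, as the job script above does
HOSTS=$(tr '\n' , < nodelist.txt)
echo "$HOSTS"
```

Note that <tt>tr</tt> leaves a trailing comma in the list, as does the job script above.<br />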
<br />
Instead of this script, which uses an old version of GNU parallel, we recommend using GNU parallel module versions 20191122 and later, available in NiaEnv/2019b, <br />
which facilitate automatic distribution of subjobs over nodes. For these newer versions of the module, the script can look like this:<br />
<source lang="bash"><br />
#!/bin/bash<br />
# SLURM submission script for multiple serial jobs on multiple Niagara nodes<br />
#<br />
#SBATCH --nodes=4<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=12:00:00<br />
#SBATCH --job-name gnu-parallel-multinode-example<br />
<br />
# Turn off implicit threading in Python, R<br />
export OMP_NUM_THREADS=1<br />
<br />
module load NiaEnv/2019b gnu-parallel<br />
<br />
parallel --joblog slurm-$SLURM_JOBID.log --wd $PWD "cd serialjobdir{} && ./doserialjob{}" ::: {001..800}<br />
<br />
</source><br />
* The automatic detection of the number of tasks per node and of the node names that GNU Parallel can use works through the environment variable <tt>$PARALLEL</tt>, which is set by the gnu-parallel module.<br />
* The parameter <tt>--wd $PWD</tt> sets the working directory on the other nodes to the working directory on the first node. <span style="color:red;">The <tt>--wd</tt> argument is essential:</span> without this, the run tries to start from the wrong place and will most likely fail.<br />
* If you need an environment variable to be transferred from the job script to the remotely running subjobs, use the <tt>--env ENVIRONMENTVARIABLE</tt> argument for the parallel command. The <tt>$PARALLEL</tt> environment variable is already set to copy the most common variables <tt>$PATH</tt>, <tt>$LD_LIBRARY_PATH</tt>, and <tt>$OMP_NUM_THREADS</tt>.<br />
<br />
Of course, this is just an example of what you could do with GNU parallel. How you set up your specific run depends on how each of the runs would be started. One could, for instance, also prepare a file of commands to run and make that the input to parallel.<br />
<br />
Submitting several bunches to single nodes, as in the section above, is a more fail-safe way of proceeding, since a node failure would only affect one of these bunches, rather than all runs. <br />
<br />
We reiterate that if memory requirements allow it, you should try to run more than 40 jobs at once, with a maximum of 80 jobs. The way the above example job scripts are written, you simply change <tt>#SBATCH --ntasks-per-node=40</tt> to <tt>#SBATCH --ntasks-per-node=80</tt> to accomplish this.<br />
<br />
===More on GNU parallel=== <br />
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/ .<br />
* After loading the <tt>gnu-parallel</tt> module, type <tt>man parallel_tutorial</tt><br />
* After loading the <tt>gnu-parallel</tt> module, type <tt>man parallel</tt><br/>The man page can also be found at http://www.gnu.org/software/parallel/man.html .<br />
* Watch a [https://www.youtube.com/watch?v=2tVpUfND3LI&t=1852s recording of a Compute Ontario Colloquium on GNU parallel].<br />
<br />
===GNU Parallel Reference===<br />
<br />
The author of GNU parallel requests that, when using GNU parallel for a publication, you cite:<br />
<br />
* O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Running_Serial_Jobs_on_Niagara&diff=5013Running Serial Jobs on Niagara2023-07-28T20:17:00Z<p>Bmundim: /* Serial jobs of varying duration */</p>
<hr />
<div>===General considerations===<br />
<br />
====Use whole nodes...====<br />
<br />
When you submit a job to Niagara, it is run on one (or more than one) entire node - meaning that your job is occupying at least 40 processors for the duration of its run. The SciNet systems are usually fully utilized, with many researchers waiting in the queue for computational resources, so we require that you make full use of the nodes that your job is allocated, so other researchers don't have to wait unnecessarily, and so that your jobs get as much work done as possible.<br />
<br />
Often, the best way to make full use of the node is to run one large parallel computation; but sometimes it is beneficial to run several serial codes at the same time. On this page, we discuss ways to run suites of serial computations at once, as efficiently as possible, using the full resources of the node.<br />
<br />
====... memory permitting====<br />
<br />
When running multiple serial jobs on the same node, it is essential to have a good idea of how much memory the jobs will require. The Niagara compute nodes have about 200GB of memory available to user jobs running on the 40 cores, i.e., a bit over 4GB per core. So the jobs also have to be bunched in ways that will fit into 200GB. If they use more than this, it will crash the node, inconveniencing you and other researchers waiting for that node.<br />
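<br />
As a concrete illustration of this budgeting, here is the arithmetic for a hypothetical per-job requirement of 7GB (both numbers below are illustrative placeholders, not measured values):<br />

```shell
# Back-of-the-envelope memory budgeting for bunching serial jobs on one node
mem_avail_gb=200      # approximate memory available to user jobs on a Niagara node
mem_per_job_gb=7      # hypothetical memory requirement of one serial job
max_jobs=$(( mem_avail_gb / mem_per_job_gb ))
if (( max_jobs > 80 )); then max_jobs=80; fi   # cap at 80 (hyperthreading limit)
echo "run at most $max_jobs simultaneous jobs per node"
```

Measure your job's actual peak memory use (e.g. from a short test run) before picking the bunch size.<br />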
<br />
If 40 serial jobs would not fit within the 200GB limit -- i.e. each individual job requires significantly in excess of ~4GB -- then it is acceptable to run fewer jobs so that they do fit. Note that in that case, the jobs are likely candidates for parallelization, and you can contact us at [mailto:support@scinet.utoronto.ca <support@scinet.utoronto.ca>] and arrange a meeting with one of the technical analysts to help you with that.<br />
<br />
If the memory requirements allow it, you could actually run more than 40 jobs at the same time, up to 80, exploiting the [[Niagara_Quickstart#Hyperthreading:_Logical_CPUs_vs._cores | HyperThreading]] feature of the Intel CPUs. It may seem counter-intuitive, but running 80 simultaneous jobs on 40 cores has, for certain types of tasks, increased some users' overall throughput.<br />
<br />
====Is your job really serial?====<br />
<br />
While your program may not be explicitly parallel, it may use some of Niagara's threaded libraries for numerical computations, which can make use of multiple processors. In particular, Niagara's [[Python]] and [[R_Statistical_Package | R]] modules are compiled with aggressive optimization and using threaded numerical libraries which by default will make use of multiple cores for computations such as large matrix operations. This can greatly speed up individual runs, but by less (usually much less) than a factor of 40. If you do have many such threaded computations to do, you often get more calculations done per unit time if you turn off the threading and run multiple such computations at once (provided that fits in memory, as explained above). You can turn off threading of these libraries with the shell script line <tt>export OMP_NUM_THREADS=1</tt>; that line will be included in the scripts below. <br />
<br />
If your calculations implicitly use threading, you may want to experiment to see what gives you the best performance - you may find that running 4 (or even 8) jobs with 10 threads each (<tt>OMP_NUM_THREADS=10</tt>), or 2 jobs with 20 threads, gives better performance than 40 jobs with 1 thread (and almost certainly better than 1 job with 40 threads). We encourage you to perform exactly such a scaling test to find the combination of number of threads per process and processes per job that maximizes your throughput; for a small up-front investment in time you may significantly speed up all the computations you need to do.<br />
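<br />
Such a scaling test can be scripted. The sketch below only varies <tt>OMP_NUM_THREADS</tt> and prints a timing line; <tt>./mycode</tt> is a hypothetical stand-in for your own threaded program and is left commented out here:<br />

```shell
# Sketch of a thread-scaling test (bash); ./mycode is a hypothetical placeholder
for nthreads in 1 2 4 10 20 40; do
    export OMP_NUM_THREADS=$nthreads
    start=$SECONDS
    # ./mycode              # replace with your actual threaded computation
    echo "OMP_NUM_THREADS=$nthreads took $((SECONDS - start))s"
done
```

Comparing the reported times then tells you which threads-per-process setting gives the most work done per core-hour.<br />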
<br />
===Serial jobs of similar duration===<br />
<br />
The most straightforward way to run multiple serial jobs is to bunch the serial jobs in groups of 40 or more that will take roughly the same amount of time, and create a job script that looks a <br />
bit like this<br />
<source lang="bash"><br />
#!/bin/bash<br />
# SLURM submission script for multiple serial jobs on Niagara<br />
#<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name serialx40<br />
<br />
# Turn off implicit threading in Python, R<br />
export OMP_NUM_THREADS=1<br />
<br />
# EXECUTION COMMAND; ampersand off 40 jobs and wait<br />
(cd serialjobdir01 && ./doserialjob01 && echo "job 01 finished") &<br />
(cd serialjobdir02 && ./doserialjob02 && echo "job 02 finished") &<br />
(cd serialjobdir03 && ./doserialjob03 && echo "job 03 finished") &<br />
(cd serialjobdir04 && ./doserialjob04 && echo "job 04 finished") &<br />
(cd serialjobdir05 && ./doserialjob05 && echo "job 05 finished") &<br />
(cd serialjobdir06 && ./doserialjob06 && echo "job 06 finished") &<br />
(cd serialjobdir07 && ./doserialjob07 && echo "job 07 finished") &<br />
(cd serialjobdir08 && ./doserialjob08 && echo "job 08 finished") &<br />
(cd serialjobdir09 && ./doserialjob09 && echo "job 09 finished") &<br />
(cd serialjobdir10 && ./doserialjob10 && echo "job 10 finished") &<br />
(cd serialjobdir11 && ./doserialjob11 && echo "job 11 finished") &<br />
(cd serialjobdir12 && ./doserialjob12 && echo "job 12 finished") &<br />
(cd serialjobdir13 && ./doserialjob13 && echo "job 13 finished") &<br />
(cd serialjobdir14 && ./doserialjob14 && echo "job 14 finished") &<br />
(cd serialjobdir15 && ./doserialjob15 && echo "job 15 finished") &<br />
(cd serialjobdir16 && ./doserialjob16 && echo "job 16 finished") &<br />
(cd serialjobdir17 && ./doserialjob17 && echo "job 17 finished") &<br />
(cd serialjobdir18 && ./doserialjob18 && echo "job 18 finished") &<br />
(cd serialjobdir19 && ./doserialjob19 && echo "job 19 finished") &<br />
(cd serialjobdir20 && ./doserialjob20 && echo "job 20 finished") &<br />
(cd serialjobdir21 && ./doserialjob21 && echo "job 21 finished") &<br />
(cd serialjobdir22 && ./doserialjob22 && echo "job 22 finished") &<br />
(cd serialjobdir23 && ./doserialjob23 && echo "job 23 finished") &<br />
(cd serialjobdir24 && ./doserialjob24 && echo "job 24 finished") &<br />
(cd serialjobdir25 && ./doserialjob25 && echo "job 25 finished") &<br />
(cd serialjobdir26 && ./doserialjob26 && echo "job 26 finished") &<br />
(cd serialjobdir27 && ./doserialjob27 && echo "job 27 finished") &<br />
(cd serialjobdir28 && ./doserialjob28 && echo "job 28 finished") &<br />
(cd serialjobdir29 && ./doserialjob29 && echo "job 29 finished") &<br />
(cd serialjobdir30 && ./doserialjob30 && echo "job 30 finished") &<br />
(cd serialjobdir31 && ./doserialjob31 && echo "job 31 finished") &<br />
(cd serialjobdir32 && ./doserialjob32 && echo "job 32 finished") &<br />
(cd serialjobdir33 && ./doserialjob33 && echo "job 33 finished") &<br />
(cd serialjobdir34 && ./doserialjob34 && echo "job 34 finished") &<br />
(cd serialjobdir35 && ./doserialjob35 && echo "job 35 finished") &<br />
(cd serialjobdir36 && ./doserialjob36 && echo "job 36 finished") &<br />
(cd serialjobdir37 && ./doserialjob37 && echo "job 37 finished") &<br />
(cd serialjobdir38 && ./doserialjob38 && echo "job 38 finished") &<br />
(cd serialjobdir39 && ./doserialjob39 && echo "job 39 finished") &<br />
(cd serialjobdir40 && ./doserialjob40 && echo "job 40 finished") &<br />
wait<br />
</source><br />
<br />
There are four important things to take note of here. First, the <tt>'''wait'''</tt><br />
command at the end is crucial; without it the job will terminate <br />
immediately, killing the 40 programs you just started.<br />
<br />
Second is that every serial job is running in its own directory; this is important because writing to the same directory from different processes can lead to slowdowns because of directory locking. How badly your job suffers from this depends on how much I/O your serial jobs are doing, but with 40 jobs on a node, it can quickly add up.<br />
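<br />
Setting up one directory per subjob is quick with bash brace expansion; the layout below matches the hypothetical <tt>serialjobdirNN</tt> naming used in the example script:<br />

```shell
# Create one working directory per subjob (bash brace expansion,
# zero-padded to match the serialjobdirNN names in the example)
mkdir -p serialjobdir{01..40}

# Each subjob then reads and writes only inside its own directory,
# avoiding contention from directory locking; count them to verify
ls -d serialjobdir* | wc -l
```
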
<br />
Third is that it is important to group the programs by how long they <br />
will take. If (say) <tt>dojob08</tt> takes 2 hours and the rest only take 1, <br />
then for one hour 39 of the 40 cores on that Niagara node are wasted; they are <br />
sitting idle but are unavailable for other users, and the utilization of <br />
this node over the whole run is only 51%. This is the sort of thing <br />
we'll notice, and users who don't make efficient use of the machine will <br />
have their ability to use Niagara resources reduced. If you have many serial jobs of varying length, <br />
use the submission script to balance the computational load, as explained [[ #Serial jobs of varying duration | below]].<br />
<br />
Fourth, if memory requirements allow it, you should try to run more than 40 jobs at once, with a maximum of 80 jobs.<br />
<br />
Finally, writing out 80 cases (or even just 40, as in the above example) can become highly tedious, as can keeping track of all these subjobs. You should consider using a tool that automates this, like:<br />
<br />
===GNU Parallel===<br />
<br />
GNU parallel is a really nice tool written by Ole Tange to run multiple serial jobs in<br />
parallel. It allows you to keep the processors on each 40-core node busy, if you provide enough jobs to do.<br />
<br />
GNU parallel is accessible on Niagara in the module<br />
<tt>gnu-parallel</tt>:<br />
<source lang="bash"><br />
module load NiaEnv/2019b gnu-parallel<br />
</source><br />
This also switches to the newer NiaEnv/2019b stack. The current version of the GNU parallel module in that stack is 20191122. In the older stack, NiaEnv/2018a (which is loaded by default), the version of GNU parallel is 20180322. <br />
<br />
The command <tt>man parallel_tutorial</tt> shows much of GNU parallel's functionality, while <tt>man parallel</tt> gives the details of its syntax.<br />
<br />
The citation for GNU Parallel is: O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.<br />
<br />
It is easiest to demonstrate the usage of GNU parallel by<br />
examples. First, suppose you have 80 jobs to do (similar to the above case), and that the duration of these jobs varies quite a bit, but that the average job duration is around 5 hours. You could use the following script (but don't, see below):<br />
<source lang="bash"><br />
#!/bin/bash<br />
# SLURM submission script for multiple serial jobs on Niagara<br />
#<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=12:00:00<br />
#SBATCH --job-name gnu-parallel-example<br />
<br />
# Turn off implicit threading in Python, R<br />
export OMP_NUM_THREADS=1<br />
<br />
module load NiaEnv/2019b gnu-parallel<br />
<br />
# EXECUTION COMMAND - DON'T USE THIS ONE<br />
parallel -j $SLURM_TASKS_PER_NODE <<EOF<br />
cd serialjobdir01 && ./doserialjob01 && echo "job 01 finished"<br />
cd serialjobdir02 && ./doserialjob02 && echo "job 02 finished"<br />
...<br />
cd serialjobdir80 && ./doserialjob80 && echo "job 80 finished"<br />
EOF<br />
</source><br />
<br />
The <tt>-j $SLURM_TASKS_PER_NODE</tt> parameter sets the number of jobs to run at the same time on each compute node, and uses the slurm value, which coincides with the <tt>--ntasks-per-node</tt> parameter. For gnu-parallel modules starting from version 20191122, if you omit the option <tt>-j $SLURM_TASKS_PER_NODE</tt>, you will get as many simultaneous subjobs as the <tt>--ntasks-per-node</tt> parameter you specify in the <tt>#SBATCH</tt> part of the job script.<br />
<br />
Each line in the input given to parallel is a separate subjob, so 80 jobs are lined up to run. Initially, 40 subjobs are given to the 40 processors on the node. When one of the processors is done with its assigned subjob, it will get a next subjob instead of sitting idle until the other processors are done. While you would expect that on average this script should take 10 hours (each processor on average has to complete two jobs of 5 hours), there's a good chance that one of the processors gets two jobs that take more than 5 hours, so the job script requests 12 hours to be safe. How much more time you should ask for in practice depends on the spread in expected run times of the separate jobs.<br />
<br />
===Serial jobs of varying duration===<br />
<br />
The script above works, and can be extended to more subjobs, which is especially important if you have to do a lot (100+) of relatively short serial runs '''of which the walltime varies'''. But it gets tedious to write out all the cases. You could write a script to automate this, but you do not have to, because GNU Parallel already has ways of generating subjobs, as we will show below.<br />
<br />
GNU Parallel can also keep track of the subjobs that succeeded, failed, or never started. For that, you just add <tt>--joblog</tt> to the parallel command followed by a filename to which to write the status:<br />
<br />
<source lang="bash" line start=17><br />
# EXECUTION COMMAND - DON'T USE THIS ONE<br />
parallel --joblog slurm-$SLURM_JOBID.log -j $SLURM_TASKS_PER_NODE <<EOF<br />
cd serialjobdir01 && ./doserialjob01<br />
cd serialjobdir02 && ./doserialjob02<br />
...<br />
cd serialjobdir80 && ./doserialjob80<br />
EOF<br />
</source><br />
<br />
In this case, the job log gets written to "slurm-$SLURM_JOBID.log", where "<tt>$SLURM_JOBID</tt>" will be replaced by the job number. The joblog can also be used to retry failed jobs (more below).<br />
<br />
Second, we can generate that set of subjobs instead of writing them out by hand. The following does the trick:<br />
<br />
<source lang="bash" line start=17><br />
# EXECUTION COMMAND <br />
parallel --joblog slurm-$SLURM_JOBID.log -j $SLURM_TASKS_PER_NODE "cd serialjobdir{} && ./doserialjob{}" ::: {01..80}<br />
</source><br />
<br />
This works as follows: <tt>"cd serialjobdir{} && ./doserialjob{}"</tt> is a template command, with placeholders <tt>{}</tt>. <tt>:::</tt> indicates that a set of parameters follows that are to be put into the template, thus generating the commands for each subjob. After the <tt>:::</tt> we can place a space-separated set of arguments, which in this case are generated using the bash-specific construct for a range, <tt>{01..80}</tt>.<br />
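<br />
The substitution can be previewed in plain bash by expanding a small range and printing the command each subjob would run:<br />

```shell
# Preview what the template "cd serialjobdir{} && ./doserialjob{}" generates
# for ::: {01..80}, using a small range; {01..03} is bash brace expansion
# with zero-padding, expanding to: 01 02 03
for i in {01..03}; do
    echo "cd serialjobdir$i && ./doserialjob$i"
done
```

GNU Parallel can also print the generated commands itself without executing them, via its <tt>--dry-run</tt> option.<br />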
<br />
The final script now looks like this:<br />
<source lang="bash"><br />
#!/bin/bash<br />
# SLURM submission script for multiple serial jobs on Niagara<br />
#<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=12:00:00<br />
#SBATCH --job-name gnu-parallel-example<br />
<br />
# Turn off implicit threading in Python, R<br />
export OMP_NUM_THREADS=1<br />
<br />
module load NiaEnv/2019b gnu-parallel <br />
<br />
# EXECUTION COMMAND <br />
parallel --joblog slurm-$SLURM_JOBID.log "cd serialjobdir{} && ./doserialjob{}" ::: {01..80}<br />
</source><br />
<br />
Notes:<br />
* As before, GNU Parallel keeps 40 jobs running at a time, and if one finishes, starts the next. This is an easy way to do ''load balancing''.<br />
* The <tt>-j</tt> option was omitted; this works with GNU Parallel module version 20191122 or higher. With older versions, you need to add the <tt>-j $SLURM_TASKS_PER_NODE</tt> flag to the parallel command. <br />
* Doing many serial jobs often entails many disk reads and writes, which can be detrimental to performance. In that case, running from the ramdisk may be an option. <br />
** When using a ramdisk, make sure you copy your results from the ramdisk back to scratch after the runs finish, or before the job is killed because its time has run out.<br />
** More details on how to setup your script to use the ramdisk can be found on the [[User_Ramdisk | Ramdisk page]].<br />
* This script optimizes resource utilization, but can only use 1 node (40 cores) at a time. The next section addresses how to use more nodes.<br />
* On the command line, the option <tt>--bar</tt> can be useful to show progress; when running as a batch job, however, you would not see this status bar. <br />
* The <tt>--joblog</tt> parameter also keeps track of failed or unfinished jobs, so you can later try to redo those with the same command, but with the option "--resume" added.<br />
* If your serial jobs are running out of memory and being killed by the system, the <tt>--memfree size</tt> option can be helpful. It sets the minimum amount of memory that must be free before another subjob is started. On Niagara, <tt>size</tt> could be set to <tt>15000M</tt>, for example, to match the RealMemory slurm configuration on the compute nodes. You might have to adjust it if your jobs also use the ramdisk to hold data.<br />
<br />
===Version for more than 1 node at once===<br />
<br />
If you have many hundreds of serial jobs that you want to run concurrently and the nodes are available, then the approach above, while useful, would require tens of scripts to be submitted separately. Alternatively, it is possible to request more than one node and to use the following routine to distribute your processes amongst the cores.<br />
<br />
Although it is not recommended to use GNU parallel modules before version 20191122, if you do, the script should look like this:<br />
<source lang="bash"><br />
#!/bin/bash<br />
# SLURM submission script for multiple serial jobs on multiple Niagara nodes<br />
#<br />
#SBATCH --nodes=4<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=12:00:00<br />
#SBATCH --job-name gnu-parallel-multinode-example<br />
<br />
# Turn off implicit threading in Python, R<br />
export OMP_NUM_THREADS=1<br />
<br />
module load gnu-parallel<br />
<br />
HOSTS=$(scontrol show hostnames $SLURM_NODELIST | tr '\n' ,)<br />
NCORES=40<br />
<br />
parallel --env OMP_NUM_THREADS,PATH,LD_LIBRARY_PATH --joblog slurm-$SLURM_JOBID.log -j $NCORES -S $HOSTS --wd $PWD "cd serialjobdir{} && ./doserialjob{}" ::: {001..800}<br />
<br />
</source><br />
<br />
* The parameter <tt>-S $HOSTS</tt> divides the work over different nodes. <tt>$HOSTS</tt> should be a comma separated list of the node names. These node names are also stored in <tt>$SLURM_NODELIST</tt>, but with a syntax that allows for ranges, which GNU parallel does not understand. The <tt>scontrol</tt> command in the script above fixes that.<br />
* Alternatively, GNU Parallel can be passed a file with the list of nodes to which to ssh, using <tt>--sshloginfile</tt>, but your job script would first have to create that file.<br />
* The parameter <tt>-j $NCORES</tt> tells <tt>parallel</tt> to run 40 subjobs simultaneously on each of the nodes (note: do not use the similarly named variable $SLURM_TASKS_PER_NODE as its format is incompatible with GNU parallel).<br />
* The parameter <tt>--wd $PWD</tt> sets the working directory on the other nodes to the working directory on the first node. <span style="color:red;">The <tt>--wd</tt> argument is essential:</span> without this, the run tries to start from the wrong place and will most likely fail.<br />
* If you need an environment variable to be transferred from the job script to the remotely running subjobs, use the <tt>--env ENVIRONMENTVARIABLE</tt> argument for the parallel command. The example above copies the most common variables that a remote command may need.<br />
<br />
Instead of this script, which uses an old version of GNU parallel, we recommend using GNU parallel module versions 20191122 and later, available in NiaEnv/2019b, <br />
which facilitate automatic distribution of subjobs over nodes. For these newer versions of the module, the script can look like this:<br />
<source lang="bash"><br />
#!/bin/bash<br />
# SLURM submission script for multiple serial jobs on multiple Niagara nodes<br />
#<br />
#SBATCH --nodes=4<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=12:00:00<br />
#SBATCH --job-name gnu-parallel-multinode-example<br />
<br />
# Turn off implicit threading in Python, R<br />
export OMP_NUM_THREADS=1<br />
<br />
module load NiaEnv/2019b gnu-parallel<br />
<br />
parallel --joblog slurm-$SLURM_JOBID.log --wd $PWD "cd serialjobdir{} && ./doserialjob{}" ::: {001..800}<br />
<br />
</source><br />
* The automatic detection of the number of tasks per node and of the node names that GNU Parallel can use works through the environment variable <tt>$PARALLEL</tt>, which is set by the gnu-parallel module.<br />
* The parameter <tt>--wd $PWD</tt> sets the working directory on the other nodes to the working directory on the first node. <span style="color:red;">The <tt>--wd</tt> argument is essential:</span> without this, the run tries to start from the wrong place and will most likely fail.<br />
* If you need an environment variable to be transferred from the job script to the remotely running subjobs, use the <tt>--env ENVIRONMENTVARIABLE</tt> argument for the parallel command. The <tt>$PARALLEL</tt> environment variable is already set to copy the most common variables <tt>$PATH</tt>, <tt>$LD_LIBRARY_PATH</tt>, and <tt>$OMP_NUM_THREADS</tt>.<br />
<br />
Of course, this is just an example of what you could do with GNU parallel. How you set up your specific run depends on how each of the runs would be started. One could, for instance, also prepare a file of commands to run and make that the input to parallel.<br />
<br />
Submitting several bunches to single nodes, as in the section above, is a more fail-safe way of proceeding, since a node failure would only affect one of these bunches, rather than all runs. <br />
<br />
We reiterate that if memory requirements allow it, you should try to run more than 40 jobs at once, with a maximum of 80 jobs. The way the above example job scripts are written, you simply change <tt>#SBATCH --ntasks-per-node=40</tt> to <tt>#SBATCH --ntasks-per-node=80</tt> to accomplish this.<br />
<br />
===More on GNU parallel=== <br />
* The documentation for GNU parallel can be found at http://www.gnu.org/software/parallel/ .<br />
* After loading the <tt>gnu-parallel</tt> module, type <tt>man parallel_tutorial</tt><br />
* After loading the <tt>gnu-parallel</tt> module, type <tt>man parallel</tt><br/>The man page can also be found at http://www.gnu.org/software/parallel/man.html .<br />
* Watch a [https://www.youtube.com/watch?v=2tVpUfND3LI&t=1852s recording of a Compute Ontario Colloquium on GNU parallel].<br />
<br />
===GNU Parallel Reference===<br />
<br />
The author of GNU parallel requests that, when using GNU parallel for a publication, you cite:<br />
<br />
* O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Main_Page&diff=4977Main Page2023-06-21T20:04:50Z<p>Bmundim: /* System Status */</p>
<hr />
<div>__NOTOC__<br />
{| style="border-spacing:10px; width: 95%"<br />
| style="padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px"|<br />
<br />
==System Status==<br />
<br />
<!-- Use "Up", "Partial" or "Down"; these are templates. --><br />
{|style="width:100%" <br />
|{{Up |Niagara|Niagara_Quickstart}}<br />
|{{Up |Mist|Mist}}<br />
|{{Up |Teach|Teach}}<br />
|{{Up |Rouge|Rouge}}<br />
|-<br />
|{{Up |Jupyter Hub|Jupyter_Hub}}<br />
|{{Up |Scheduler|Niagara_Quickstart#Submitting_jobs}}<br />
|{{Up |File system|Niagara_Quickstart#Storage_and_quotas}}<br />
|{{Up |Burst Buffer|Burst_Buffer}}<br />
|-<br />
|{{Up |HPSS|HPSS}}<br />
|{{Up |Login Nodes|Niagara_Quickstart#Logging_in}} <br />
|{{Up |External Network|Niagara_Quickstart#Logging_in}} <br />
|{{Up |Globus |Globus}}<br />
|}<br />
<br />
'''Wed Jun 21 16:03:45 EDT 2023:''' Niagara's scheduler maintenance is finished.<br />
<br />
'''Wed Jun 21 15:42:00 EDT 2023:''' Niagara's scheduler is rebooting in 10 minutes for a short maintenance down time.<br />
<br />
'''Wed Jun 21, 2023, 11:25 AM EDT:''' Maintenance is finished and Teach cluster is accessible again.<br />
<br />
'''Tue Jun 20, 2023, 9:55 AM EDT:''' Teach cluster is powered off for maintenance.<br />
<br />
<!-- When removing system status entries, please archive them to: --><br />
[[Previous messages]]<br />
<br />
{|style="border-spacing: 10px;width: 100%"<br />
|valign="top" style="margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%" |<br />
<br />
== QuickStart Guides ==<br />
* [[Niagara Quickstart]]<br />
* [[HPSS | HPSS archival storage]]<br />
* [[Mist| Mist Power 9 GPU cluster]]<br />
* [[Teach|Teach cluster]]<br />
* [[FAQ | FAQ (frequently asked questions)]]<br />
* [[Acknowledging SciNet]]<br />
| valign="top" style="margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%" |<br />
<br />
== Tutorials, Manuals, etc. ==<br />
* [https://education.scinet.utoronto.ca SciNet education material]<br />
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]<br />
* [[Modules specific to Niagara|Software Modules specific to Niagara]] <br />
* [[Modules for Mist]] <br />
* [[Commercial software]]<br />
* [[Burst Buffer]]<br />
* [[SSH#SSH Keys|SSH keys]]<br />
* [[SSH Tunneling]]<br />
* [[SSH#Two-Factor_authentication|Two-Factor Authentication]]<br />
* [[Visualization]]<br />
* [[Running Serial Jobs on Niagara]]<br />
* [[Jupyter Hub]]<br />
|}</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&diff=4251Niagara Quickstart2022-10-03T22:47:37Z<p>Bmundim: /* Specifications */</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Niagara.jpg|center|300px|thumb]]<br />
|name=Niagara<br />
|installed=Jan 2018/March 2020<br />
|operatingsystem= CentOS 7.9 <br />
|loginnode= niagara.scinet.utoronto.ca<br />
|nnodes= 2,024 nodes (80,960 cores)<br />
|rampernode=188 GiB / 202 GB <br />
|corespernode=40 (80 hyperthreads)<br />
|interconnect=Mellanox Dragonfly+<br />
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)<br />
|queuetype=Slurm<br />
}}<br />
<br />
=Specifications=<br />
<br />
The Niagara cluster is a large cluster of 2,024 Lenovo SD530 servers, each with 40 Intel "Skylake" cores at 2.4 GHz (1,548 nodes) or 40 Intel "CascadeLake" cores at 2.5 GHz (476 nodes). <br />
The peak performance of the cluster is about 3.6 PFlops (6.25 PFlops theoretical). It was the 53rd fastest supercomputer on the [https://www.top500.org/list/2018/06/?page=1 TOP500 list of June 2018], and is at number 113 on the [https://www.top500.org/lists/top500/list/2021/06/ current list (June 2021)]. <br />
<br />
Each node of the cluster has 188 GiB / 202 GB of RAM (at least 4 GiB/core for user jobs, and roughly 170 GiB/node at most). Being designed for large parallel workloads, it has a fast interconnect consisting of EDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a minimum of 15 minutes and a maximum of 24 hours and favours large jobs.<br />
<br />
* See the [https://www.youtube.com/watch?v=l-E2CFGh0BE&feature=youtu.be "Intro to Niagara"] recording<br />
<br />
More detailed hardware characteristics of the Niagara supercomputer can be found [https://docs.alliancecan.ca/wiki/Niagara on this page].<br />
<br />
Note: Documentation about the "GPU expansion to Niagara" called "Mist" can be found on [[Mist | its own page]].<br />
<br />
= Getting started on Niagara =<br />
<br />
Access to Niagara is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.<br />
<br />
If you have an active Alliance account but you do not have access to Niagara yet (e.g. because you are new to SciNet or belong to a group whose primary PI does not have an allocation as granted in the annual [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions {{Alliance}} RAC]), go to the [https://ccdb.computecanada.ca/services/opt_in opt-in page on the CCDB site]. After clicking the "Join" button, it usually takes only one or two business days for access to be granted. <br />
<br />
Please read this document carefully. The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource. If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].<br />
<br />
== Logging in ==<br />
<br />
Niagara runs CentOS 7, which is a type of Linux. You will need to be familiar with Linux systems to work on Niagara. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&search=scmp101&include=all&filter=Filter Introduction to Linux Shell] class.<br />
<br />
As with all SciNet and {{Alliance}} compute systems, access to Niagara is done via [[SSH]] (secure shell) only. As of January 22, 2022, authentication is only allowed via SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use them securely.<br />
<br />
Open a terminal window (e.g. Connecting with [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] on Windows or Connecting with [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm]), then SSH into the Niagara login nodes with your {{Alliance}} credentials:<br />
<br />
$ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@niagara.scinet.utoronto.ca<br />
<br />
or<br />
<br />
$ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@niagara.computecanada.ca<br />
<br />
The first time you log in to Niagara, please make sure you are actually accessing Niagara by checking that the login node's SSH host key fingerprint matches [[SSH_Changes_in_May_2019 | (see here how)]]. This check prevents you from falling victim to [https://en.wikipedia.org/wiki/Man-in-the-middle_attack man-in-the-middle attacks.]<br />
<br />
* The Niagara login nodes are where you develop, edit, compile, prepare and submit jobs.<br />
* These login nodes are not part of the Niagara compute cluster, but have the same architecture, operating system, and software stack.<br />
* The optional <code>-Y</code> is needed to open windows from the Niagara command-line onto your local X server.<br />
* You can only connect 4 times in a 2-minute window to the login nodes. <br />
* To run on Niagara's compute nodes, you must [[#Submitting_jobs | submit a batch job]].<br />
<br />
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.<br />
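<br />
If you log in often, you can avoid retyping the key path and options by adding a host entry to the <tt>~/.ssh/config</tt> file on your own machine. The alias below is just an example; substitute your own username and private key path:<br />
<pre><br />
Host niagara<br />
    HostName niagara.scinet.utoronto.ca<br />
    User MYALLIANCEUSERNAME<br />
    IdentityFile /path/to/ssh_private_key<br />
    ForwardX11 yes<br />
    ForwardX11Trusted yes<br />
</pre><br />
After this, the command <code>ssh niagara</code> suffices (the two ForwardX11 lines correspond to the <code>-Y</code> flag and can be omitted if you do not need graphics).<br />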
<br />
== Your various directories ==<br />
<br />
By virtue of your access to Niagara you are granted storage space on the system. There are several directories available to you, each indicated by an associated environment variable.<br />
<br />
=== home and scratch ===<br />
<br />
You have a home and scratch directory on the system, the paths to which are stored in the environment variables $HOME and $SCRATCH. The locations are of the form<br />
<br />
$HOME=/home/g/groupname/myallianceusername<br />
$SCRATCH=/scratch/g/groupname/myallianceusername<br />
<br />
where groupname is the name of your PI's group, and myallianceusername is your {{Alliance}} username. For example:<br />
<br />
nia-login07:~$ pwd<br />
/home/s/scinet/rzon<br />
nia-login07:~$ cd $SCRATCH<br />
nia-login07:rzon$ pwd<br />
/scratch/s/scinet/rzon<br />
<br />
NOTE: home is read-only on compute nodes.<br />
<br />
=== project and archive/nearline ===<br />
<br />
Users from groups with [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions RAC storage allocation] will also have a project directory and possibly an archive (a.k.a. "nearline") directory, the paths to which are stored in the environment variables $PROJECT and $ARCHIVE. They follow the naming convention:<br />
<br />
$PROJECT=/project/g/groupname/myallianceusername<br />
$ARCHIVE=/archive/g/groupname/myallianceusername<br />
<br />
NOTE: Currently archive space is available only via [[HPSS]], and is not accessible on the Niagara login, compute, or datamover nodes.<br />
<br />
'''''IMPORTANT: Future-proof your scripts'''''<br />
<br />
When writing your scripts, use the environment variables (<tt>$HOME</tt>, <tt>$SCRATCH</tt>, <tt>$PROJECT</tt>, <tt>$ARCHIVE</tt>) instead of the actual paths! The paths may change in the future.<br />
<br />
=== Storage and quotas ===<br />
<br />
You should familiarize yourself with the [[Data_Management#Purpose_of_each_file_system | various file systems]], what purpose they serve, and how to properly use them. This table summarizes the various file systems. See the [[Data_Management | Data Management]] page for more details.<br />
<br />
{| class="wikitable"<br />
! location<br />
!colspan="2"| quota<br />
!align="right"| block size<br />
! expiration time<br />
! backed up<br />
! on login nodes<br />
! on compute nodes<br />
|-<br />
| $HOME<br />
|colspan="2"| 100 GB / 250,000 files per user<br />
|align="right"| 1 MB<br />
| <br />
| yes<br />
| yes<br />
| read-only<br />
|-<br />
|rowspan="2"| $SCRATCH<br />
|colspan="2"| 25 TB / 6,000,000 files per user<br />
|align="right" rowspan="2" | 16 MB<br />
|rowspan="2"| 2 months<br />
|rowspan="2"| no<br />
|rowspan="2"| yes<br />
|rowspan="2"| yes<br />
|-<br />
|align="right"|50-500TB per group<br />
|align="right"|[[Data_Management#Quotas_and_purging | depending on group size]]<br />
|-<br />
| $PROJECT<br />
|colspan="2"| by group allocation<br />
|align="right"| 16 MB<br />
| <br />
| yes<br />
| yes<br />
| yes<br />
|-<br />
| $ARCHIVE<br />
|colspan="2"| by group (nearline) allocation<br />
|align="right"| <br />
|<br />
| dual-copy<br />
| no<br />
| no<br />
|-<br />
| $BBUFFER<br />
|colspan="2"| 10 TB per user<br />
|align="right"| 1 MB<br />
| very short<br />
| no<br />
| yes<br />
| yes<br />
|}<br />
<br />
== Moving data to Niagara ==<br />
<br />
If you need to move data to Niagara for analysis, or when you need to move data off of Niagara, use the following guidelines:<br />
* If your data is less than 10GB, move the data using the login nodes.<br />
* If your data is greater than 10GB, move the data using the datamover nodes nia-datamover1.scinet.utoronto.ca and nia-datamover2.scinet.utoronto.ca.<br />
<br />
Details of how to use the datamover nodes can be found on the [[Data_Management#Moving_data | Data Management ]] page.<br />
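<br />
For example, a transfer of a local directory <tt>mydata</tt> to your Niagara scratch space via a datamover node might look like this (the username and target path are placeholders):<br />
<br />
 $ rsync -avP mydata MYALLIANCEUSERNAME@nia-datamover1.scinet.utoronto.ca:/scratch/g/groupname/myallianceusername/<br />
<br />
The <tt>-P</tt> flag shows progress and keeps partially transferred files, so an interrupted transfer can be resumed by re-running the same command.<br />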
<br />
= Loading software modules =<br />
<br />
You have two options for running code on Niagara: use existing software, or [[Niagara_Quickstart#Compiling_on_Niagara:_Example | compile your own]]. This section focuses on the former.<br />
<br />
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be [[Using_modules | found on the modules page]].<br />
<br />
Common module subcommands are:<br />
<br />
* <code>module load <module-name></code>: load the default version of a particular software.<br />
* <code>module load <module-name>/<module-version></code>: load a specific version of a particular software.<br />
* <code>module purge</code>: unload all currently loaded modules.<br />
* <code>module spider</code> (or <code>module spider <module-name></code>): list available software packages.<br />
* <code>module avail</code>: list loadable software packages.<br />
* <code>module list</code>: list loaded modules.<br />
<br />
Along with modifying common environment variables, such as PATH and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.<br />
<br />
There are handy abbreviations for the module commands. <code>ml</code> is the same as <code>module list</code>, and <code>ml <module-name></code> is the same as <code>module load <module-name></code>.<br />
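<br />
For instance, a session to find and load a package might look like this (the package and version numbers are purely illustrative; use <code>module spider</code> to see what is actually available):<br />
<br />
 nia-login07:~$ module spider gsl          # list available versions of the GSL<br />
 nia-login07:~$ module load gcc/8.3.0      # load a prerequisite compiler module<br />
 nia-login07:~$ module load gsl/2.5        # load a specific version of the package<br />
 nia-login07:~$ module list                # verify what is now loaded<br />
<br />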
== Software stacks: NiaEnv and CCEnv ==<br />
<br />
On Niagara, there are two available software stacks:<br />
<br />
=== NiaEnv ===<br />
<br />
A [https://docs.scinet.utoronto.ca/index.php/Modules_specific_to_Niagara Niagara software stack] tuned and compiled for this machine. This stack is available by default, but if not, can be reloaded with<br />
<pre>module load NiaEnv</pre><br />
This loads the default set of modules, which is currently the 2019b epoch. Before September 1, the default was NiaEnv/2018a. Users are encouraged to use the 2019b stack, but to make sure old job scripts or older software installations in your home directory continue to work, you may need to use<br />
<pre>module load NiaEnv/2018a</pre><br />
You can override the system default for the epoch version by creating a file called <b><tt>.modulerc</tt></b> in your home directory with the line <b><tt>module-version NiaEnv/VERSION default</tt></b>, e.g. like so:<br />
<pre><br />
echo "module-version NiaEnv/2019b default" > $HOME/.modulerc<br />
</pre><br />
After this, subsequent logins and jobs will use the 2019b stack even when the system default is different.<br />
Similarly, you can make an older epoch your personal default, like so:<br />
<pre><br />
echo "module-version NiaEnv/2018a default" > $HOME/.modulerc<br />
</pre><br />
<br />
No modules are loaded by default on Niagara except NiaEnv.<br />
<br />
=== CCEnv ===<br />
<br />
The same [https://docs.alliancecan.ca/wiki/Modules software stack as on {{Alliance}}'s General Purpose clusters] is also available, with:<br />
<pre>module load CCEnv</pre><br />
Or, if you want the same default modules loaded as on Béluga, then do<br />
<pre>module load CCEnv StdEnv</pre><br />
or, if you want the same default modules loaded as on Cedar and Graham, do<br />
<pre>module load CCEnv arch/avx2 StdEnv/2020</pre><br />
<br />
== Tips for loading software ==<br />
<br />
* We advise '''''against''''' loading modules in your .bashrc. This can lead to very confusing behaviour under certain circumstances. Our guidelines for .bashrc files can be found [[bashrc guidelines|here]].<br />
* Instead, load modules by hand when needed, or by sourcing a separate script.<br />
* Load run-specific modules inside your job submission script.<br />
* Short names give default versions; e.g. <code>intel</code> → <code>intel/2018.2</code>. It is usually better to be explicit about the versions, for future reproducibility.<br />
* Modules often require other modules to be loaded first. Solve these dependencies by using [[Using_modules#Module_spider | <code>module spider</code>]].<br />
<br />
= Available compilers and interpreters =<br />
<br />
* For most compiled software, one should use the Intel compilers (<tt>icc</tt> for C, <tt>icpc</tt> for C++, and <tt>ifort</tt> for Fortran). Loading an <tt>intel</tt> module makes these available. <br />
* The GNU compiler suite (<tt>gcc, g++, gfortran</tt>) is also available, if you load one of the <tt>gcc</tt> modules.<br />
* To compile mpi code, you must additionally load an <tt>openmpi</tt> or <tt>intelmpi</tt> module.<br />
* Open source interpreted, interactive software is also available:<br />
** [[Python]]<br />
** [[R]]<br />
** Julia<br />
** [[Octave]]<br />
<br />
Please visit the corresponding page for details on using these tools. For information on running MATLAB applications on Niagara, visit [[MATLAB| this page]].<br />
<br />
= Using Commercial Software =<br />
<br />
May I use commercial software on Niagara?<br />
* Possibly, but you have to bring your own license for it. You can connect to an external license server using [[SSH_Tunneling | ssh tunneling]].<br />
* SciNet and {{the Alliance}} have an extremely large and broad user base of thousands of users, so we cannot provide licenses for everyone's favorite software.<br />
* Thus, the only freely available commercial software installed on Niagara is software that can benefit everyone: Compilers, math libraries and debuggers.<br />
* That means no [[MATLAB]], Gaussian, IDL, etc.<br />
* Open source alternatives like Octave, [[Python]], and [[R]] are available.<br />
* We are happy to help you to install commercial software for which you have a license.<br />
* In some cases, if you have a license, you can use software in the {{Alliance}} stack.<br />
The list of commercial software which is installed on Niagara, for which you will need a license to use, can be found on the [[Commercial_software | commercial software page]].<br />
<br />
= Compiling on Niagara: Example =<br />
<br />
Suppose one wants to compile an application from two C source files, appl.c and module.c, which use the Math Kernel Library. This is an example of how this would be done:<br />
<source lang="bash"><br />
nia-login07:~$ module load NiaEnv/2019b<br />
nia-login07:~$ module list<br />
Currently Loaded Modules:<br />
1) NiaEnv/2019b (S)<br />
Where:<br />
S: Module is Sticky, requires --force to unload or purge<br />
<br />
nia-login07:~$ module load intel/2019u4<br />
<br />
nia-login07:~$ ls<br />
appl.c module.c<br />
<br />
nia-login07:~$ icc -c -O3 -xHost -o appl.o appl.c<br />
nia-login07:~$ icc -c -O3 -xHost -o module.o module.c<br />
nia-login07:~$ icc -o appl module.o appl.o -mkl<br />
<br />
nia-login07:~$ ./appl<br />
</source><br />
Note:<br />
* The optimization flags -O3 -xHost allow the Intel compiler to use instructions specific to the CPU architecture that is present (instead of for more generic x86_64 CPUs).<br />
* Linking with the Intel Math Kernel Library (MKL) is easy when using the Intel compiler: it just requires the -mkl flag.<br />
* If compiling with gcc, the optimization flags would be -O3 -march=native. To link with the MKL from gcc, it is suggested to use the [https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor MKL link line advisor].<br />
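For comparison, a sketch of the same compilation with the GNU compilers; the gcc module version shown is illustrative, and the MKL link step is omitted (consult the link line advisor for the correct flags):<br />
<source lang="bash"><br />
nia-login07:~$ module load NiaEnv/2019b gcc/8.3.0<br />
nia-login07:~$ gcc -c -O3 -march=native -o appl.o appl.c<br />
nia-login07:~$ gcc -c -O3 -march=native -o module.o module.c<br />
</source><br />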
<br />
= Testing and Debugging =<br />
<br />
You should test your code before submitting it to the cluster, both to check that it is correct and to find out what resources it needs.<br />
* Small test jobs can be run on the login nodes. Rule of thumb: tests should run no more than a couple of minutes, taking at most about 1-2GB of memory, and use no more than a couple of cores.<br />
* You can run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after <code>module load ddt</code>.<br />
* For short tests that do not fit on a login node, or for which you need a dedicated node, request an interactive debug job with the <tt>debugjob</tt> command:<br />
nia-login07:~$ debugjob --clean N<br />
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; when N=4 (the maximum), it gives you 22 minutes. The <tt>--clean</tt> argument is optional but recommended, as it starts the session without any modules loaded, thus mimicking more closely what happens when you submit a job script.<br />
<br />
Finally, if your debugjob process takes more than 1 hour, you can request an interactive job from the regular queue using the salloc command. Note, however, that this may take some time to run, since it will be part of the regular queue, and will be run when the scheduler decides.<br />
nia-login07:~$ salloc --nodes N --time=M:00:00 --x11<br />
where N is again the number of nodes, and M is the number of hours you wish the job to run.<br />
The <tt>--x11</tt> flag is required if you need to use graphics while testing your code through salloc, e.g. when using a debugger such as [[Parallel Debugging with DDT|DDT]] or DDD. See the [[Testing_With_Graphics | Testing with graphics]] page for the options in that case.<br />
<br />
= Submitting jobs =<br />
<br />
<!-- == Progressive approach to run jobs on niagara == --><br />
<!-- We would like to emphasize the need for users to adopt a more progressive and explicit approach for testing, running and scaling up of jobs on niagara. [[Progressive_Approach | '''Here is a set of steps we suggest that you follow.''']] --><br />
<br />
Once you have compiled and tested your code or workflow on the Niagara login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. Your jobs will run on some of Niagara's 2,024 compute nodes. When and where your job runs is determined by the scheduler.<br />
<br />
Niagara uses SLURM as its job scheduler. More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
nia-login07:scratch$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course. Note that you must submit your job from a login node. You cannot submit jobs from the datamover nodes.<br />
<br />
In most cases, you should not submit from your $HOME directory, but rather, from your $SCRATCH directory, so that the output of your compute job can be written out (as mentioned above, $HOME is read-only on the compute nodes).<br />
<br />
Jobs will run under your group's RRG allocation, or, if your group has none, under a RAS allocation (previously called a "default" allocation).<br />
<br />
Some example job scripts can be found below.<br />
<br />
Keep in mind:<br />
* Scheduling is by node, so in multiples of 40 cores.<br />
* Your job's maximum walltime is 24 hours. <br />
* Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access.<br />
* Your job script will not remember the modules you have loaded, so it needs to contain "module load" commands of all the required modules (see examples below). <br />
* [[Data_Management#Moving_data | Move your data]] to Niagara before you submit your job.<br />
<br />
== Scheduling by Node ==<br />
<br />
On many systems that use SLURM, the scheduler deduces what resources should be allocated from the specification of the number of tasks and the number of cpus per task. On Niagara, things are a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.<br />
** No other users are running anything on them.<br />
** You can [[SSH]] into them to see how things are going.<br />
* Whatever your requests to the scheduler, it will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes and 202GB is the amount of memory on the node.<br />
* If you run serial jobs you must still use all 40 cores on the node. Visit the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page for examples of how to do this.<br />
* Since there are 40 cores per node, your job should use N x 40 cores. If it does not, we will contact you to help you optimize your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] for assistance.<br />
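<br />
As a minimal sketch of the pattern, the following job script keeps all 40 cores of one node busy with independent serial tasks; <tt>./serial_example</tt> is a placeholder for your own program, and the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page describes more robust approaches (e.g. GNU Parallel):<br />
<source lang="bash"><br />
#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name=serial_bundle<br />
<br />
# Start 40 independent serial tasks in the background, one per core.<br />
for ((i = 0; i < 40; i++)); do<br />
    ./serial_example $i > output_$i.txt 2>&1 &<br />
done<br />
<br />
# Keep the job script alive until all background tasks have finished.<br />
wait<br />
</source><br />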
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Limit on Running jobs<br />
!Limit on Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs ||compute || 50 || 1000 || 1 node (40&nbsp;cores) || default:&nbsp;20&nbsp;nodes&nbsp;(800&nbsp;cores) <br> with&nbsp;allocation:&nbsp;1000&nbsp;nodes&nbsp;(40000&nbsp;cores)|| 15 minutes || 24 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40&nbsp;cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort, vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.<br />
<br />
== File Input/Output Tips ==<br />
<br />
It is important to understand the file systems, so as to perform your file I/O (Input/Output) responsibly. Refer to the [[Data_Management | Data Management]] page for details about the file systems.<br />
* Your files can be seen on all Niagara login and compute nodes.<br />
* $HOME, $SCRATCH, and $PROJECT all use the parallel file system called GPFS.<br />
* GPFS is a high-performance file system which provides rapid reads and writes to large data sets in parallel from many nodes.<br />
* Accessing data sets which consist of many, small files leads to poor performance on GPFS.<br />
* Avoid reading and writing lots of small amounts of data to disk. Many small files on the system waste space and are slower to access, read and write. If you must write many small files, use [[User_Ramdisk | ramdisk]].<br />
* Write data out in a binary format. This is faster and takes less space.<br />
* The [[Burst Buffer]] is another option for I/O heavy-jobs and for speeding up [[Checkpoints|checkpoints]].<br />
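<br />
As an illustration of the ramdisk tip, a job that must produce many small files could stage them in node-local memory and bundle the results before the job ends (all paths and program names below are placeholders):<br />
<source lang="bash"><br />
# inside your job script, after loading modules<br />
WORKDIR=/dev/shm/$USER/myjob   # ramdisk: node-local memory, fast for small files<br />
mkdir -p "$WORKDIR"<br />
cd "$WORKDIR"<br />
<br />
./myprogram                    # writes its many small files into $WORKDIR<br />
<br />
# bundle everything into a single file on scratch, then clean up<br />
tar czf "$SCRATCH/myjob_results.tar.gz" -C "$WORKDIR" .<br />
rm -rf "$WORKDIR"<br />
</source><br />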
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name=mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load NiaEnv/2019b<br />
module load intel/2019u4<br />
module load openmpi/4.0.1<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script from your scratch directory with the command:<br />
<br />
nia-login07:scratch$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li>First line indicates that this is a bash script.</li><br />
<li>Lines starting with <code>#SBATCH</code> go to SLURM.</li><br />
<li>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>)</li><br />
<li>In this case, SLURM looks for 2 nodes each running 40 tasks (for a total of 80 tasks), for 1 hour</li><br />
<li>Note that the mpirun flag "--ppn" (processors per node) is ignored.</li><br />
<li>Once it has found such nodes, it runs the script:<br />
<ul><br />
<li>Changes to the submission directory;</li><br />
<li>Loads modules;</li><br />
<li>Runs the <code>mpi_example</code> application (SLURM will inform mpirun or srun on how many processes to run).<br />
</li><br />
</ul><br />
<li>To use hyperthreading, just change <code>--ntasks-per-node=40</code> to <code>--ntasks-per-node=80</code>, and add <code>--bind-to none</code> to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name=openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load NiaEnv/2019b<br />
module load intel/2019u4<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script from your scratch directory with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* The first line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (to which it gives the name <code>openmp_job</code>).<br />
* In this case, SLURM looks for one node on which to run a single task with 40 cores, for 1 hour.<br />
* Once it has found such a node, it runs the script, which:<br />
** Changes to the submission directory;<br />
** Loads the modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.<br />
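As a small aside, the <code>OMP_NUM_THREADS</code> line can be given a fallback so the same script also runs outside the scheduler. This is a minimal sketch, not the official recipe; the fallback value of 40 is an assumption matching Niagara's physical cores per node:<br />

```bash
# Take the thread count from Slurm when inside a job; otherwise fall
# back to 40 (Niagara's physical core count per node).
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-40}
echo "Running with $OMP_NUM_THREADS OpenMP threads"
```

Outside a job, where <code>SLURM_CPUS_PER_TASK</code> is unset, this uses 40 threads.<br />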
<br />
== Monitoring queued jobs ==<br />
<br />
Once the job is incorporated into the queue, there are some commands you can use to monitor its progress.<br />
<br />
<ul><br />
<li><p><code>squeue</code> or <code>sqc</code> (a caching version of squeue) to show the job queue (<code>squeue -u $USER</code> for just your jobs);</p></li><br />
<li><p><code>squeue -j JOBID</code> to get information on a specific job</p><br />
<p>(alternatively, <code>scontrol show job JOBID</code>, which is more verbose).</p></li><br />
<li><p><code>squeue --start -j JOBID</code> to get an estimate for when a job will run; these tend not to be very accurate predictions.</p></li><br />
<li><p><code>scancel -i JOBID</code> to cancel the job.</p></li><br />
<li><p><code>jobperf JOBID</code> to get an instantaneous view of the cpu and memory usage of the nodes of the job while it is running.</p></li><br />
<li><p><code>sacct</code> to get information on your recent jobs.</p></li><br />
</ul><br />
<br />
Further instructions for monitoring your jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]]. The [https://my.scinet.utoronto.ca my.SciNet] site is also a very useful tool for monitoring your current and past usage.<br />
<br />
= Visualization =<br />
Information about how to use visualization tools on Niagara is available on the [[Visualization]] page.<br />
<br />
= Support =<br />
<br />
* [mailto:support@scinet.utoronto.ca support@scinet.utoronto.ca]<br />
* [mailto:niagara@computecanada.ca niagara@computecanada.ca]</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&diff=4248Niagara Quickstart2022-10-03T22:46:47Z<p>Bmundim: /* Specifications */</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Niagara.jpg|center|300px|thumb]]<br />
|name=Niagara<br />
|installed=Jan 2018/March 2020<br />
|operatingsystem= CentOS 7.9 <br />
|loginnode= niagara.scinet.utoronto.ca<br />
|nnodes= 2,024 nodes (80,960 cores)<br />
|rampernode=188 GiB / 202 GB <br />
|corespernode=40 (80 hyperthreads)<br />
|interconnect=Mellanox Dragonfly+<br />
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)<br />
|queuetype=Slurm<br />
}}<br />
<br />
=Specifications=<br />
<br />
The Niagara cluster is a large cluster of 2,024 Lenovo SD530 servers each with 40 Intel "Skylake" cores at 2.4 GHz (1548 nodes) or 40 Intel "CascadeLake" cores at 2.5 GHz (476 nodes). <br />
The peak performance of the cluster is about 3.6 PFlops (6.25 PFlops theoretical). It was the 53rd fastest supercomputer on the [https://www.top500.org/list/2018/06/?page=1 TOP500 list of June 2018], and is at number 113 on the [https://www.top500.org/lists/top500/list/2021/06/ current list (June 2021)]. <br />
<br />
Each node of the cluster has 188 GiB / 202 GB of RAM (at least 4 GiB per core for user jobs, with roughly 170 GiB per node available to jobs). Being designed for large parallel workloads, it has a fast interconnect consisting of EDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a minimum of 15 minutes and a maximum of 24 hours and favours large jobs.<br />
<br />
* See the [https://www.youtube.com/watch?v=l-E2CFGh0BE&feature=youtu.be "Intro to Niagara"] recording<br />
<br />
More detailed hardware characteristics of the Niagara supercomputer can be found [https://docs.alliancecan.ca/wiki/Niagara on this page].<br />
<br />
Note: Documentation about the "GPU expansion to Niagara" called "Mist" can be found on [[Mist | its own page]].<br />
<br />
= Getting started on Niagara =<br />
<br />
Access to Niagara is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.<br />
<br />
If you have an active Alliance account but you do not have access to Niagara yet (e.g. because you are new to SciNet or belong to a group whose primary PI does not have an allocation as granted in the annual [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions {{Alliance}} RAC]), go to the [https://ccdb.computecanada.ca/services/opt_in opt-in page on the CCDB site]. After clicking the "Join" button, it usually takes only one or two business days for access to be granted. <br />
<br />
Please read this document carefully. The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource. If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].<br />
<br />
== Logging in ==<br />
<br />
Niagara runs CentOS 7, which is a type of Linux. You will need to be familiar with Linux systems to work on Niagara. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&search=scmp101&include=all&filter=Filter Introduction to Linux Shell] class.<br />
<br />
As with all SciNet and {{Alliance}} compute systems, access to Niagara is done via [[SSH]] (secure shell) only. As of January 22 2022, authentication is only allowed via SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use them securely.<br />
<br />
Open a terminal window (e.g. using [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm] on Windows), then SSH into the Niagara login nodes with your {{Alliance}} credentials:<br />
<br />
$ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@niagara.scinet.utoronto.ca<br />
<br />
or<br />
<br />
$ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@niagara.computecanada.ca<br />
<br />
The first time you log in to Niagara, please make sure you are actually accessing Niagara by checking that the login node's SSH host key fingerprint matches [[SSH_Changes_in_May_2019 | (see here how)]]. This check prevents you from falling victim to [https://en.wikipedia.org/wiki/Man-in-the-middle_attack man-in-the-middle attacks].<br />
<br />
* The Niagara login nodes are where you develop, edit, compile, prepare and submit jobs.<br />
* These login nodes are not part of the Niagara compute cluster, but have the same architecture, operating system, and software stack.<br />
* The optional <code>-Y</code> is needed to open windows from the Niagara command-line onto your local X server.<br />
* You can only connect 4 times in a 2-minute window to the login nodes. <br />
* To run on Niagara's compute nodes, you must [[#Submitting_jobs | submit a batch job]].<br />
<br />
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.<br />
<br />
== Your various directories ==<br />
<br />
By virtue of your access to Niagara you are granted storage space on the system. There are several directories available to you, each indicated by an associated environment variable.<br />
<br />
=== home and scratch ===<br />
<br />
You have a home and scratch directory on the system, the paths to which are stored in the environment variables $HOME and $SCRATCH. The locations are of the form<br />
<br />
$HOME=/home/g/groupname/myallianceusername<br />
$SCRATCH=/scratch/g/groupname/myallianceusername<br />
<br />
where groupname is the name of your PI's group, and myallianceusername is your {{Alliance}} username. For example:<br />
<br />
nia-login07:~$ pwd<br />
/home/s/scinet/rzon<br />
nia-login07:~$ cd $SCRATCH<br />
nia-login07:rzon$ pwd<br />
/scratch/s/scinet/rzon<br />
<br />
NOTE: home is read-only on compute nodes.<br />
<br />
=== project and archive/nearline ===<br />
<br />
Users from groups with [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions RAC storage allocation] will also have a project directory and possibly an archive (a.k.a. "nearline") directory, the paths to which are stored in the environment variables $PROJECT and $ARCHIVE. They follow the naming convention:<br />
<br />
$PROJECT=/project/g/groupname/myallianceusername<br />
$ARCHIVE=/archive/g/groupname/myallianceusername<br />
<br />
NOTE: Currently archive space is available only via [[HPSS]], and is not accessible on the Niagara login, compute, or datamover nodes.<br />
<br />
'''''IMPORTANT: Future-proof your scripts'''''<br />
<br />
When writing your scripts, use the environment variables (<tt>$HOME</tt>, <tt>$SCRATCH</tt>, <tt>$PROJECT</tt>, <tt>$ARCHIVE</tt>) instead of the actual paths! The paths may change in the future.<br />
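For example, a job script fragment could construct its working directory like this (a minimal sketch; the <code>myrun</code> subdirectory name is hypothetical, and the fallback line exists only so the snippet can run outside Niagara, where <code>$SCRATCH</code> is not set):<br />

```bash
# Build paths from the environment variables, never from hard-coded
# locations such as /scratch/g/groupname/myallianceusername.
SCRATCH="${SCRATCH:-$PWD/scratch}"  # fallback for illustration only
RUNDIR="$SCRATCH/myrun"             # "myrun" is a hypothetical name
mkdir -p "$RUNDIR"
cd "$RUNDIR"
```

If the file system layout ever changes, such a script keeps working without edits.<br />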
<br />
=== Storage and quotas ===<br />
<br />
You should familiarize yourself with the [[Data_Management#Purpose_of_each_file_system | various file systems]], what purpose they serve, and how to properly use them. This table summarizes the various file systems. See the [[Data_Management | Data Management]] page for more details.<br />
<br />
{| class="wikitable"<br />
! location<br />
!colspan="2"| quota<br />
!align="right"| block size<br />
! expiration time<br />
! backed up<br />
! on login nodes<br />
! on compute nodes<br />
|-<br />
| $HOME<br />
|colspan="2"| 100 GB / 250,000 files per user<br />
|align="right"| 1 MB<br />
| <br />
| yes<br />
| yes<br />
| read-only<br />
|-<br />
|rowspan="2"| $SCRATCH<br />
|colspan="2"| 25 TB / 6,000,000 files per user<br />
|align="right" rowspan="2" | 16 MB<br />
|rowspan="2"| 2 months<br />
|rowspan="2"| no<br />
|rowspan="2"| yes<br />
|rowspan="2"| yes<br />
|-<br />
|align="right"|50-500TB per group<br />
|align="right"|[[Data_Management#Quotas_and_purging | depending on group size]]<br />
|-<br />
| $PROJECT<br />
|colspan="2"| by group allocation<br />
|align="right"| 16 MB<br />
| <br />
| yes<br />
| yes<br />
| yes<br />
|-<br />
| $ARCHIVE<br />
|colspan="2"| by group (nearline) allocation<br />
|align="right"| <br />
|<br />
| dual-copy<br />
| no<br />
| no<br />
|-<br />
| $BBUFFER<br />
|colspan="2"| 10 TB per user<br />
|align="right"| 1 MB<br />
| very short<br />
| no<br />
| yes<br />
| yes<br />
|}<br />
<br />
== Moving data to Niagara ==<br />
<br />
If you need to move data to Niagara for analysis, or when you need to move data off of Niagara, use the following guidelines:<br />
* If your data is less than 10GB, move it using the login nodes.<br />
* If your data is greater than 10GB, move it using the datamover nodes nia-datamover1.scinet.utoronto.ca and nia-datamover2.scinet.utoronto.ca.<br />
<br />
Details of how to use the datamover nodes can be found on the [[Data_Management#Moving_data | Data Management ]] page.<br />
<br />
= Loading software modules =<br />
<br />
You have two options for running code on Niagara: use existing software, or [[Niagara_Quickstart#Compiling_on_Niagara:_Example | compile your own]]. This section focuses on the former.<br />
<br />
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be [[Using_modules | found on the modules page]].<br />
<br />
Common module subcommands are:<br />
<br />
* <code>module load <module-name></code>: load the default version of a particular software.<br />
* <code>module load <module-name>/<module-version></code>: load a specific version of a particular software.<br />
* <code>module purge</code>: unload all currently loaded modules.<br />
* <code>module spider</code> (or <code>module spider <module-name></code>): list available software packages.<br />
* <code>module avail</code>: list loadable software packages.<br />
* <code>module list</code>: list loaded modules.<br />
<br />
Along with modifying common environment variables, such as PATH and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as the package's /include and /lib.<br />
<br />
There are handy abbreviations for the module commands. <code>ml</code> is the same as <code>module list</code>, and <code>ml <module-name></code> is the same as <code>module load <module-name></code>.<br />
== Software stacks: NiaEnv and CCEnv ==<br />
<br />
On Niagara, there are two available software stacks:<br />
<br />
=== NiaEnv ===<br />
<br />
A [https://docs.scinet.utoronto.ca/index.php/Modules_specific_to_Niagara Niagara software stack] tuned and compiled for this machine. This stack is available by default, but if not, can be reloaded with<br />
<pre>module load NiaEnv</pre><br />
This loads the default set of modules, which is currently the 2019b epoch. Before September 1, the default was NiaEnv/2018a. Users are encouraged to use the 2019b stack, but to make sure old job scripts or older software installations in your home directory continue to work, you may need to use<br />
<pre>module load NiaEnv/2018a</pre><br />
You can override the system default for the epoch version by creating a file called <b><tt>.modulerc</tt></b> in your home directory with the line <b><tt>module-version NiaEnv/VERSION default</tt></b>, e.g. like so:<br />
<pre><br />
echo "module-version NiaEnv/2019b default" > $HOME/.modulerc<br />
</pre><br />
After this, subsequent logins and jobs will use the 2019b stack even when the system default is different.<br />
Similarly, you can make an older epoch your personal default, like so:<br />
<pre><br />
echo "module-version NiaEnv/2018a default" > $HOME/.modulerc<br />
</pre><br />
<br />
No modules are loaded by default on Niagara except NiaEnv.<br />
<br />
=== CCEnv ===<br />
<br />
The same [https://docs.alliancecan.ca/wiki/Modules software stack as on {{Alliance}}'s General Purpose clusters] is also available, with:<br />
<pre>module load CCEnv</pre><br />
Or, if you want the same default modules loaded as on Béluga, then do<br />
<pre>module load CCEnv StdEnv</pre><br />
or, if you want the same default modules loaded as on Cedar and Graham, do<br />
<pre>module load CCEnv arch/avx2 StdEnv/2020</pre><br />
<br />
== Tips for loading software ==<br />
<br />
* We advise '''''against''''' loading modules in your .bashrc. This can lead to very confusing behaviour under certain circumstances. Our guidelines for .bashrc files can be found [[bashrc guidelines|here]].<br />
* Instead, load modules by hand when needed, or by sourcing a separate script.<br />
* Load run-specific modules inside your job submission script.<br />
* Short names give default versions; e.g. <code>intel</code> → <code>intel/2018.2</code>. It is usually better to be explicit about the versions, for future reproducibility.<br />
* Modules often require other modules to be loaded first. Solve these dependencies by using [[Using_modules#Module_spider | <code>module spider</code>]].<br />
<br />
= Available compilers and interpreters =<br />
<br />
* For most compiled software, one should use the Intel compilers (<tt>icc</tt> for C, <tt>icpc</tt> for C++, and <tt>ifort</tt> for Fortran). Loading an <tt>intel</tt> module makes these available. <br />
* The GNU compiler suite (<tt>gcc, g++, gfortran</tt>) is also available, if you load one of the <tt>gcc</tt> modules.<br />
* To compile mpi code, you must additionally load an <tt>openmpi</tt> or <tt>intelmpi</tt> module.<br />
* Open source interpreted, interactive software is also available:<br />
** [[Python]]<br />
** [[R]]<br />
** Julia<br />
** [[Octave]]<br />
<br />
Please visit the corresponding page for details on using these tools. For information on running MATLAB applications on Niagara, visit [[MATLAB| this page]].<br />
<br />
= Using Commercial Software =<br />
<br />
May I use commercial software on Niagara?<br />
* Possibly, but you have to bring your own license for it. You can connect to an external license server using [[SSH_Tunneling | ssh tunneling]].<br />
* SciNet and {{the Alliance}} have an extremely large and broad user base of thousands of users, so we cannot provide licenses for everyone's favorite software.<br />
* Thus, the only freely available commercial software installed on Niagara is software that can benefit everyone: Compilers, math libraries and debuggers.<br />
* That means no [[MATLAB]], Gaussian, IDL, and so on.<br />
* Open source alternatives like Octave, [[Python]], and [[R]] are available.<br />
* We are happy to help you to install commercial software for which you have a license.<br />
* In some cases, if you have a license, you can use software in the {{Alliance}} stack.<br />
The list of commercial software which is installed on Niagara, for which you will need a license to use, can be found on the [[Commercial_software | commercial software page]].<br />
<br />
= Compiling on Niagara: Example =<br />
<br />
Suppose one wants to compile an application from two C source files, appl.c and module.c, which use the Math Kernel Library. This is an example of how this would be done:<br />
<source lang="bash"><br />
nia-login07:~$ module load NiaEnv/2019b<br />
nia-login07:~$ module list<br />
Currently Loaded Modules:<br />
1) NiaEnv/2019b (S)<br />
Where:<br />
S: Module is Sticky, requires --force to unload or purge<br />
<br />
nia-login07:~$ module load intel/2019u4<br />
<br />
nia-login07:~$ ls<br />
appl.c module.c<br />
<br />
nia-login07:~$ icc -c -O3 -xHost -o appl.o appl.c<br />
nia-login07:~$ icc -c -O3 -xHost -o module.o module.c<br />
nia-login07:~$ icc -o appl module.o appl.o -mkl<br />
<br />
nia-login07:~$ ./appl<br />
</source><br />
Note:<br />
* The optimization flags -O3 -xHost allow the Intel compiler to use instructions specific to the CPU architecture that is present (instead of compiling for more generic x86_64 CPUs).<br />
* Linking with the Intel Math Kernel Library (MKL) is easy when using the Intel compilers; it just requires the -mkl flag.<br />
* If compiling with gcc, the optimization flags would be -O3 -march=native. For the way to link with the MKL, it is suggested to use the [https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor MKL link line advisor].<br />
<br />
= Testing and Debugging =<br />
<br />
You really should test your code before you submit it to the cluster, to know whether it is correct and what kind of resources it needs.<br />
* Small test jobs can be run on the login nodes. Rule of thumb: tests should run no more than a couple of minutes, taking at most about 1-2GB of memory, and use no more than a couple of cores.<br />
* You can run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after <code>module load ddt</code>.<br />
* For short tests that do not fit on a login node, or for which you need a dedicated node, request an interactive debug job with the debugjob command:<br />
nia-login07:~$ debugjob --clean N<br />
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; when N=4 (the maximum), it gives you 22 minutes. The <tt>--clean</tt> argument is optional but recommended as it will start the session without any modules loaded, thus mimicking more closely what happens when you submit a job script.<br />
<br />
Finally, if your debugjob process takes more than 1 hour, you can request an interactive job from the regular queue using the salloc command. Note, however, that this may take some time to start, since it will be part of the regular queue and will run when the scheduler decides.<br />
nia-login07:~$ salloc --nodes N --time=M:00:00 --x11<br />
where N is again the number of nodes, and M is the number of hours you wish the job to run.<br />
The <tt>--x11</tt> is required if you need to use graphics while testing your code through salloc, e.g. when using a debugger such as [[Parallel Debugging with DDT|DDT]] or DDD. See the [[Testing_With_Graphics | Testing with graphics]] page for the options in that case.<br />
<br />
= Submitting jobs =<br />
<br />
<!-- == Progressive approach to run jobs on niagara == --><br />
<!-- We would like to emphasize the need for users to adopt a more progressive and explicit approach for testing, running and scaling up of jobs on niagara. [[Progressive_Approach | '''Here is a set of steps we suggest that you follow.''']] --><br />
<br />
Once you have compiled and tested your code or workflow on the Niagara login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. Your jobs will run on some of Niagara's 2,024 compute nodes. When and where your job runs is determined by the scheduler.<br />
<br />
Niagara uses SLURM as its job scheduler. More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
nia-login07:scratch$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course. Note that you must submit your job from a login node. You cannot submit jobs from the datamover nodes.<br />
<br />
In most cases, you should not submit from your $HOME directory, but rather, from your $SCRATCH directory, so that the output of your compute job can be written out (as mentioned above, $HOME is read-only on the compute nodes).<br />
<br />
Jobs will run under your group's RRG allocation or, if your group has none, under a RAS allocation (previously called a "default" allocation).<br />
<br />
Some example job scripts can be found below.<br />
<br />
Keep in mind:<br />
* Scheduling is by node, so in multiples of 40 cores.<br />
* Your job's maximum walltime is 24 hours. <br />
* Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access.<br />
* Your job script will not remember the modules you have loaded, so it needs to contain "module load" commands for all the required modules (see examples below). <br />
* [[Data_Management#Moving_data | Move your data]] to Niagara before you submit your job.<br />
<br />
== Scheduling by Node ==<br />
<br />
On many systems that use SLURM, the scheduler will deduce from the specifications of the number of tasks and the number of CPUs per task what resources should be allocated. On Niagara, things are a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.<br />
** No other users are running anything on them.<br />
** You can [[SSH]] into them to see how things are going.<br />
* Whatever your requests to the scheduler, they will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler have no effect: your job always gets N x 202GB of RAM, where N is the number of nodes and 202GB is the amount of memory on the node.<br />
* If you run serial jobs you must still use all 40 cores on the node. Visit the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page for examples of how to do this.<br />
* Since there are 40 cores per node, your job should use N x 40 cores. If you do not, we will contact you to help you optimize your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] for assistance.<br />
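As an illustration of the last two bullets, a job script that bundles 40 independent serial tasks onto one node could look like the following sketch (hypothetical; <code>echo</code> stands in for your actual serial program, and the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page describes more scalable approaches):<br />

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=1:00:00
#SBATCH --job-name=serial_bundle

# Launch one serial task per core in the background, then wait for all
# of them, so the job uses all 40 cores of the node.
NTASKS=${SLURM_NTASKS_PER_NODE:-40}
for ((i = 1; i <= NTASKS; i++)); do
  # Replace this echo with your serial program and its per-task input.
  echo "task $i" > "task_$i.log" &
done
wait
```

The <code>wait</code> at the end is essential: without it, the script (and hence the job) would finish before the background tasks do.<br />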
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Limit on Running jobs<br />
!Limit on Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs ||compute || 50 || 1000 || 1 node (40&nbsp;cores) || default:&nbsp;20&nbsp;nodes&nbsp;(800&nbsp;cores) <br> with&nbsp;allocation:&nbsp;1000&nbsp;nodes&nbsp;(40000&nbsp;cores)|| 15 minutes || 24 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40&nbsp;cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.<br />
<br />
== File Input/Output Tips ==<br />
<br />
It is important to understand the file systems, so as to perform your file I/O (Input/Output) responsibly. Refer to the [[Data_Management | Data Management]] page for details about the file systems.<br />
* Your files can be seen on all Niagara login and compute nodes.<br />
* $HOME, $SCRATCH, and $PROJECT all use the parallel file system called GPFS.<br />
* GPFS is a high-performance file system which provides rapid reads and writes to large data sets in parallel from many nodes.<br />
* Accessing data sets that consist of many small files leads to poor performance on GPFS.<br />
* Avoid reading and writing lots of small amounts of data to disk. Many small files on the system waste space and are slower to access, read and write. If you must write many small files, use [[User_Ramdisk | ramdisk]].<br />
* Write data out in a binary format. This is faster and takes less space.<br />
* The [[Burst Buffer]] is another option for I/O heavy-jobs and for speeding up [[Checkpoints|checkpoints]].<br />
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name=mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load NiaEnv/2019b<br />
module load intel/2019u4<br />
module load openmpi/4.0.1<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script from your scratch directory with the command:<br />
<br />
nia-login07:scratch$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li>The first line indicates that this is a bash script.</li><br />
<li>Lines starting with <code>#SBATCH</code> go to SLURM.</li><br />
<li>sbatch reads these lines as a job request (to which it gives the name <code>mpi_job</code>).</li><br />
<li>In this case, SLURM looks for 2 nodes, each running 40 tasks (for a total of 80 tasks), for 1 hour.</li><br />
<li>Note that the mpirun flag "--ppn" (processes per node) is ignored.</li><br />
<li>Once it has found such nodes, it runs the script, which:<br />
<ul><br />
<li>Changes to the submission directory;</li><br />
<li>Loads the modules;</li><br />
<li>Runs the <code>mpi_example</code> application (SLURM will inform mpirun or srun how many processes to run).<br />
</li><br />
</ul><br />
<li>To use hyperthreading, just change <code>--ntasks-per-node=40</code> to <code>--ntasks-per-node=80</code>, and add <code>--bind-to none</code> to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name=openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load NiaEnv/2019b<br />
module load intel/2019u4<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script from your scratch directory with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* The first line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (to which it gives the name <code>openmp_job</code>).<br />
* In this case, SLURM looks for one node on which to run a single task with 40 cores, for 1 hour.<br />
* Once it has found such a node, it runs the script, which:<br />
** Changes to the submission directory;<br />
** Loads the modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.<br />
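As a small aside, the <code>OMP_NUM_THREADS</code> line can be given a fallback so the same script also runs outside the scheduler. This is a minimal sketch, not the official recipe; the fallback value of 40 is an assumption matching Niagara's physical cores per node:<br />

```bash
# Take the thread count from Slurm when inside a job; otherwise fall
# back to 40 (Niagara's physical core count per node).
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-40}
echo "Running with $OMP_NUM_THREADS OpenMP threads"
```

Outside a job, where <code>SLURM_CPUS_PER_TASK</code> is unset, this uses 40 threads.<br />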
<br />
== Monitoring queued jobs ==<br />
<br />
Once the job is incorporated into the queue, there are some commands you can use to monitor its progress.<br />
<br />
<ul><br />
<li><p><code>squeue</code> or <code>sqc</code> (a caching version of squeue) to show the job queue (<code>squeue -u $USER</code> for just your jobs);</p></li><br />
<li><p><code>squeue -j JOBID</code> to get information on a specific job</p><br />
<p>(alternatively, <code>scontrol show job JOBID</code>, which is more verbose).</p></li><br />
<li><p><code>squeue --start -j JOBID</code> to get an estimate for when a job will run; these tend not to be very accurate predictions.</p></li><br />
<li><p><code>scancel -i JOBID</code> to cancel the job.</p></li><br />
<li><p><code>jobperf JOBID</code> to get an instantaneous view of the cpu and memory usage of the nodes of the job while it is running.</p></li><br />
<li><p><code>sacct</code> to get information on your recent jobs.</p></li><br />
</ul><br />
<br />
Further instructions for monitoring your jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]]. The [https://my.scinet.utoronto.ca my.SciNet] site is also a very useful tool for monitoring your current and past usage.<br />
<br />
= Visualization =<br />
Information about how to use visualization tools on Niagara is available on the [[Visualization]] page.<br />
<br />
= Support =<br />
<br />
* [mailto:support@scinet.utoronto.ca support@scinet.utoronto.ca]<br />
* [mailto:niagara@computecanada.ca niagara@computecanada.ca]</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&diff=4245Niagara Quickstart2022-10-03T22:45:04Z<p>Bmundim: </p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Niagara.jpg|center|300px|thumb]]<br />
|name=Niagara<br />
|installed=Jan 2018/March 2020<br />
|operatingsystem= CentOS 7.9 <br />
|loginnode= niagara.scinet.utoronto.ca<br />
|nnodes= 2,024 nodes (80,960 cores)<br />
|rampernode=188 GiB / 202 GB <br />
|corespernode=40 (80 hyperthreads)<br />
|interconnect=Mellanox Dragonfly+<br />
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)<br />
|queuetype=Slurm<br />
}}<br />
<br />
=Specifications=<br />
<br />
The Niagara cluster is a large cluster of 2,024 Lenovo SD530 servers each with 40 Intel "Skylake" cores at 2.4 GHz or 40 Intel "CascadeLake" cores at 2.5 GHz. <br />
The peak performance of the cluster is about 3.6 PFlops (6.25 PFlops theoretical). It was the 53rd fastest supercomputer on the [https://www.top500.org/list/2018/06/?page=1 TOP500 list of June 2018], and is at number 113 on the [https://www.top500.org/lists/top500/list/2021/06/ current list (June 2021)]. <br />
<br />
Each node of the cluster has 188 GiB / 202 GB RAM per node (at least 4 GiB/core for user jobs). Being designed for large parallel workloads, it has a fast interconnect consisting of EDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a minimum of 15 minutes and a maximum of 24 hours and favours large jobs.<br />
<br />
* See the [https://www.youtube.com/watch?v=l-E2CFGh0BE&feature=youtu.be "Intro to Niagara"] recording<br />
<br />
More detailed hardware characteristics of the Niagara supercomputer can be found [https://docs.alliancecan.ca/wiki/Niagara on this page].<br />
<br />
Note: Documentation about the "GPU expansion to Niagara" called "Mist" can be found on [[Mist | its own page]].<br />
<br />
= Getting started on Niagara =<br />
<br />
Access to Niagara is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.<br />
<br />
If you have an active Alliance account but you do not have access to Niagara yet (e.g. because you are new to SciNet or belong to a group whose primary PI does not have an allocation as granted in the annual [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions {{Alliance}} RAC]), go to the [https://ccdb.computecanada.ca/services/opt_in opt-in page on the CCDB site]. After clicking the "Join" button, it usually takes only one or two business days for access to be granted. <br />
<br />
Please read this document carefully. The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource. If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].<br />
<br />
== Logging in ==<br />
<br />
Niagara runs CentOS 7, which is a type of Linux. You will need to be familiar with Linux systems to work on Niagara. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&search=scmp101&include=all&filter=Filter Introduction to Linux Shell] class.<br />
<br />
As with all SciNet and {{Alliance}} compute systems, access to Niagara is done via [[SSH]] (secure shell) only. As of January 22 2022, authentication is only allowed via SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use them securely.<br />
<br />
Open a terminal window (e.g. using [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm] on Windows), then SSH into the Niagara login nodes with your {{Alliance}} credentials:<br />
<br />
$ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@niagara.scinet.utoronto.ca<br />
<br />
or<br />
<br />
$ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@niagara.computecanada.ca<br />
<br />
The first time you login to Niagara, please make sure you are actually accessing Niagara by checking whether the login node's ssh host key fingerprint matches [[SSH_Changes_in_May_2019 | (See here how)]]. This check prevents you from falling victim to [https://en.wikipedia.org/wiki/Man-in-the-middle_attack man-in-the-middle attacks.]<br />
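If you connect often, a host entry in your local <tt>~/.ssh/config</tt> saves retyping the key path and username; a minimal sketch (the alias <tt>niagara</tt> and the key path are just examples, adjust them to your setup):<br />
<pre><br />
Host niagara<br />
    HostName niagara.scinet.utoronto.ca<br />
    User MYALLIANCEUSERNAME<br />
    IdentityFile ~/.ssh/ssh_private_key<br />
    ForwardX11 yes<br />
</pre><br />
With such an entry in place, <tt>ssh niagara</tt> is enough.<br />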
<br />
* The Niagara login nodes are where you develop, edit, compile, prepare and submit jobs.<br />
* These login nodes are not part of the Niagara compute cluster, but have the same architecture, operating system, and software stack.<br />
* The optional <code>-Y</code> is needed to open windows from the Niagara command-line onto your local X server.<br />
* You can only connect 4 times in a 2-minute window to the login nodes. <br />
* To run on Niagara's compute nodes, you must [[#Submitting_jobs | submit a batch job]].<br />
<br />
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.<br />
<br />
== Your various directories ==<br />
<br />
By virtue of your access to Niagara you are granted storage space on the system. There are several directories available to you, each indicated by an associated environment variable.<br />
<br />
=== home and scratch ===<br />
<br />
You have a home and scratch directory on the system, the paths to which are stored in the environment variables $HOME and $SCRATCH. The locations are of the form<br />
<br />
$HOME=/home/g/groupname/myallianceusername<br />
$SCRATCH=/scratch/g/groupname/myallianceusername<br />
<br />
where groupname is the name of your PI's group, and myallianceusername is your {{Alliance}} username. For example:<br />
<br />
nia-login07:~$ pwd<br />
/home/s/scinet/rzon<br />
nia-login07:~$ cd $SCRATCH<br />
nia-login07:rzon$ pwd<br />
/scratch/s/scinet/rzon<br />
<br />
NOTE: home is read-only on compute nodes.<br />
<br />
=== project and archive/nearline ===<br />
<br />
Users from groups with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions RAC storage allocation] will also have a project directory and possibly an archive (a.k.a. "nearline") directory, the paths to which are stored in the environment variables $PROJECT and $ARCHIVE. They follow the naming convention:<br />
<br />
$PROJECT=/project/g/groupname/myallianceusername<br />
$ARCHIVE=/archive/g/groupname/myallianceusername<br />
<br />
NOTE: Currently archive space is available only via [[HPSS]], and is not accessible on the Niagara login, compute, or datamover nodes.<br />
<br />
'''''IMPORTANT: Future-proof your scripts'''''<br />
<br />
When writing your scripts, use the environment variables (<tt>$HOME</tt>, <tt>$SCRATCH</tt>, <tt>$PROJECT</tt>, <tt>$ARCHIVE</tt>) instead of the actual paths! The paths may change in the future.<br />
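For instance, a job script fragment that puts its output under <tt>$SCRATCH</tt> could start like this (the project and run names are placeholders; the fallback on the first line exists only so the fragment can be tried outside Niagara, where <tt>$SCRATCH</tt> is already set):<br />

```shell
#!/bin/bash
# On Niagara, $SCRATCH is predefined; this fallback is only for trying
# the fragment on another machine.
: "${SCRATCH:=$PWD/scratch-demo}"

# Refer to storage through the environment variable, never the literal path.
OUTDIR="$SCRATCH/myproject/run01"   # hypothetical run directory
mkdir -p "$OUTDIR"
cd "$OUTDIR"
```

If the paths ever change, only $SCRATCH changes; the script keeps working.<br />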
<br />
=== Storage and quotas ===<br />
<br />
You should familiarize yourself with the [[Data_Management#Purpose_of_each_file_system | various file systems]], what purpose they serve, and how to properly use them. This table summarizes the various file systems. See the [[Data_Management | Data Management]] page for more details.<br />
<br />
{| class="wikitable"<br />
! location<br />
!colspan="2"| quota<br />
!align="right"| block size<br />
! expiration time<br />
! backed up<br />
! on login nodes<br />
! on compute nodes<br />
|-<br />
| $HOME<br />
|colspan="2"| 100 GB / 250,000 files per user<br />
|align="right"| 1 MB<br />
| <br />
| yes<br />
| yes<br />
| read-only<br />
|-<br />
|rowspan="2"| $SCRATCH<br />
|colspan="2"| 25 TB / 6,000,000 files per user<br />
|align="right" rowspan="2" | 16 MB<br />
|rowspan="2"| 2 months<br />
|rowspan="2"| no<br />
|rowspan="2"| yes<br />
|rowspan="2"| yes<br />
|-<br />
|align="right"|50-500TB per group<br />
|align="right"|[[Data_Management#Quotas_and_purging | depending on group size]]<br />
|-<br />
| $PROJECT<br />
|colspan="2"| by group allocation<br />
|align="right"| 16 MB<br />
| <br />
| yes<br />
| yes<br />
| yes<br />
|-<br />
| $ARCHIVE<br />
|colspan="2"| by group (nearline) allocation<br />
|align="right"| <br />
|<br />
| dual-copy<br />
| no<br />
| no<br />
|-<br />
| $BBUFFER<br />
|colspan="2"| 10 TB per user<br />
|align="right"| 1 MB<br />
| very short<br />
| no<br />
| yes<br />
| yes<br />
|}<br />
<br />
== Moving data to Niagara ==<br />
<br />
If you need to move data to Niagara for analysis, or when you need to move data off of Niagara, use the following guidelines:<br />
* If your data is less than 10GB, move the data using the login nodes.<br />
* If your data is greater than 10GB, move the data using the datamover nodes nia-datamover1.scinet.utoronto.ca and nia-datamover2.scinet.utoronto.ca .<br />
<br />
Details of how to use the datamover nodes can be found on the [[Data_Management#Moving_data | Data Management ]] page.<br />
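For example, a large transfer from your own machine through a datamover node could look like this (the file name and destination path are placeholders):<br />
<br />
 $ scp -i /path/to/ssh_private_key large_dataset.tar MYALLIANCEUSERNAME@nia-datamover1.scinet.utoronto.ca:/scratch/g/groupname/myallianceusername/<br />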
<br />
= Loading software modules =<br />
<br />
You have two options for running code on Niagara: use existing software, or [[Niagara_Quickstart#Compiling_on_Niagara:_Example | compile your own]]. This section focuses on the former.<br />
<br />
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be [[Using_modules | found on the modules page]].<br />
<br />
Common module subcommands are:<br />
<br />
* <code>module load <module-name></code>: load the default version of a particular software.<br />
* <code>module load <module-name>/<module-version></code>: load a specific version of a particular software.<br />
* <code>module purge</code>: unload all currently loaded modules.<br />
* <code>module spider</code> (or <code>module spider <module-name></code>): list available software packages.<br />
* <code>module avail</code>: list loadable software packages.<br />
* <code>module list</code>: list loaded modules.<br />
<br />
Along with modifying common environment variables, such as PATH and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.<br />
<br />
There are handy abbreviations for the module commands. <code>ml</code> is the same as <code>module list</code>, and <code>ml <module-name></code> is the same as <code>module load <module-name></code>.<br />
<br />
== Software stacks: NiaEnv and CCEnv ==<br />
<br />
On Niagara, there are two available software stacks:<br />
<br />
=== NiaEnv ===<br />
<br />
A [https://docs.scinet.utoronto.ca/index.php/Modules_specific_to_Niagara Niagara software stack] tuned and compiled for this machine. This stack is loaded by default; if it has been unloaded, it can be reloaded with<br />
<pre>module load NiaEnv</pre><br />
This loads the default set of modules, which is currently the 2019b epoch. Before September 1, the default was NiaEnv/2018a. Users are encouraged to use the 2019b stack, but to make sure old job scripts or older software installations in your home directory continue to work, you may need to use<br />
<pre>module load NiaEnv/2018a</pre><br />
You can override the system default for the epoch version by creating a file called <b><tt>.modulerc</tt></b> in your home directory with the line <b><tt>module-version NiaEnv/VERSION default</tt></b>, e.g. like so:<br />
<pre><br />
echo "module-version NiaEnv/2019b default" > $HOME/.modulerc<br />
</pre><br />
After this, subsequent logins and jobs will use the 2019b stack even when the system default is different.<br />
Similarly, you can make an older epoch your personal default, like so:<br />
<pre><br />
echo "module-version NiaEnv/2018a default" > $HOME/.modulerc<br />
</pre><br />
<br />
No modules are loaded by default on Niagara except NiaEnv.<br />
<br />
=== CCEnv ===<br />
<br />
To use the same [https://docs.alliancecan.ca/wiki/Modules software stack that is available on {{Alliance}}'s General Purpose clusters], load:<br />
<pre>module load CCEnv</pre><br />
Or, if you want the same default modules loaded as on Béluga, then do<br />
<pre>module load CCEnv StdEnv</pre><br />
or, if you want the same default modules loaded as on Cedar and Graham, do<br />
<pre>module load CCEnv arch/avx2 StdEnv/2020</pre><br />
<br />
== Tips for loading software ==<br />
<br />
* We advise '''''against''''' loading modules in your .bashrc. This can lead to very confusing behaviour under certain circumstances. Our guidelines for .bashrc files can be found [[bashrc guidelines|here]].<br />
* Instead, load modules by hand when needed, or by sourcing a separate script.<br />
* Load run-specific modules inside your job submission script.<br />
* Short names give default versions; e.g. <code>intel</code> → <code>intel/2018.2</code>. It is usually better to be explicit about the versions, for future reproducibility.<br />
* Modules often require other modules to be loaded first. Solve these dependencies by using [[Using_modules#Module_spider | <code>module spider</code>]].<br />
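For example, the "separate script" approach might look like this (the file name and the listed module versions are just examples; check <code>module spider</code> for what is available):<br />
<pre><br />
# setup_modules.sh -- load the modules this project needs.<br />
# Use as:  source $HOME/setup_modules.sh<br />
module load NiaEnv/2019b<br />
module load intel/2019u4<br />
module load openmpi/4.0.1<br />
</pre><br />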
<br />
= Available compilers and interpreters =<br />
<br />
* For most compiled software, one should use the Intel compilers (<tt>icc</tt> for C, <tt>icpc</tt> for C++, and <tt>ifort</tt> for Fortran). Loading an <tt>intel</tt> module makes these available. <br />
* The GNU compiler suite (<tt>gcc, g++, gfortran</tt>) is also available, if you load one of the <tt>gcc</tt> modules.<br />
* To compile MPI code, you must additionally load an <tt>openmpi</tt> or <tt>intelmpi</tt> module.<br />
* Open source interpreted, interactive software is also available:<br />
** [[Python]]<br />
** [[R]]<br />
** Julia<br />
** [[Octave]]<br />
<br />
Please visit the corresponding page for details on using these tools. For information on running MATLAB applications on Niagara, visit [[MATLAB| this page]].<br />
<br />
= Using Commercial Software =<br />
<br />
May I use commercial software on Niagara?<br />
* Possibly, but you have to bring your own license for it. You can connect to an external license server using [[SSH_Tunneling | ssh tunneling]].<br />
* SciNet and {{the Alliance}} have an extremely large and broad user base of thousands of users, so we cannot provide licenses for everyone's favorite software.<br />
* Thus, the only freely available commercial software installed on Niagara is software that can benefit everyone: Compilers, math libraries and debuggers.<br />
* That means no [[MATLAB]], Gaussian, IDL, etc.<br />
* Open source alternatives like Octave, [[Python]], and [[R]] are available.<br />
* We are happy to help you to install commercial software for which you have a license.<br />
* In some cases, if you have a license, you can use software in the {{Alliance}} stack.<br />
The list of commercial software which is installed on Niagara, for which you will need a license to use, can be found on the [[Commercial_software | commercial software page]].<br />
<br />
= Compiling on Niagara: Example =<br />
<br />
Suppose one wants to compile an application from two C source files, appl.c and module.c, which use the Math Kernel Library. This is an example of how this would be done:<br />
<source lang="bash"><br />
nia-login07:~$ module load NiaEnv/2019b<br />
nia-login07:~$ module list<br />
Currently Loaded Modules:<br />
1) NiaEnv/2019b (S)<br />
Where:<br />
S: Module is Sticky, requires --force to unload or purge<br />
<br />
nia-login07:~$ module load intel/2019u4<br />
<br />
nia-login07:~$ ls<br />
appl.c module.c<br />
<br />
nia-login07:~$ icc -c -O3 -xHost -o appl.o appl.c<br />
nia-login07:~$ icc -c -O3 -xHost -o module.o module.c<br />
nia-login07:~$ icc -o appl module.o appl.o -mkl<br />
<br />
nia-login07:~$ ./appl<br />
</source><br />
Note:<br />
* The optimization flags -O3 -xHost allow the Intel compiler to use instructions specific to the CPU architecture that is present (instead of for more generic x86_64 CPUs).<br />
* Linking with the Intel Math Kernel Library (MKL) is easy when using the intel compiler; it just requires the -mkl flag.<br />
* If compiling with gcc, the optimization flags would be -O3 -march=native. For the way to link with the MKL, it is suggested to use the [https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor MKL link line advisor].<br />
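For reference, the gcc version of the compilation steps above might look as follows (the MKL link line is omitted on purpose; take it from the link line advisor):<br />
<br />
 nia-login07:~$ module load gcc<br />
 nia-login07:~$ gcc -c -O3 -march=native -o appl.o appl.c<br />
 nia-login07:~$ gcc -c -O3 -march=native -o module.o module.c<br />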
<br />
= Testing and Debugging =<br />
<br />
You really should test your code before you submit it to the cluster, both to check that it is correct and to find out what resources you need.<br />
* Small test jobs can be run on the login nodes. Rule of thumb: tests should run no more than a couple of minutes, taking at most about 1-2GB of memory, and use no more than a couple of cores.<br />
* You can run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after <code>module load ddt</code>.<br />
* For short tests that do not fit on a login node, or for which you need a dedicated node, request an interactive debug job with the debugjob command:<br />
nia-login07:~$ debugjob --clean N<br />
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; when N=4 (the maximum), it gives you 22 minutes. The <tt>--clean</tt> argument is optional but recommended as it will start the session without any modules loaded, thus mimicking more closely what happens when you submit a job script.<br />
<br />
Finally, if your debugjob process takes more than 1 hour, you can request an interactive job from the regular queue using the salloc command. Note, however, that this may take some time to run, since it will be part of the regular queue, and will be run when the scheduler decides.<br />
nia-login07:~$ salloc --nodes N --time=M:00:00 --x11<br />
where N is again the number of nodes, and M is the number of hours you wish the job to run.<br />
The <tt>--x11</tt> flag is required if you need to use graphics while testing your code through salloc, e.g. when using a debugger such as [[Parallel Debugging with DDT|DDT]] or DDD. See the [[Testing_With_Graphics | Testing with graphics]] page for the options in that case.<br />
<br />
= Submitting jobs =<br />
<br />
<!-- == Progressive approach to run jobs on niagara == --><br />
<!-- We would like to emphasize the need for users to adopt a more progressive and explicit approach for testing, running and scaling up of jobs on niagara. [[Progressive_Approach | '''Here is a set of steps we suggest that you follow.''']] --><br />
<br />
Once you have compiled and tested your code or workflow on the Niagara login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. Your jobs will run on some of Niagara's 2,024 compute nodes. When and where your job runs is determined by the scheduler.<br />
<br />
Niagara uses SLURM as its job scheduler. More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
nia-login07:scratch$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course. Note that you must submit your job from a login node. You cannot submit jobs from the datamover nodes.<br />
<br />
In most cases, you should not submit from your $HOME directory, but rather, from your $SCRATCH directory, so that the output of your compute job can be written out (as mentioned above, $HOME is read-only on the compute nodes).<br />
<br />
Jobs will run under your group's RRG allocation, or, if your group has none, under a RAS allocation (previously called a `default' allocation).<br />
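If you have access to more than one allocation, you can state which account to charge explicitly with sbatch's <tt>-A</tt> (<tt>--account</tt>) option; the account name below is a placeholder:<br />
<br />
 nia-login07:scratch$ sbatch -A rrg-groupname jobscript.sh<br />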
<br />
Some example job scripts can be found below.<br />
<br />
Keep in mind:<br />
* Scheduling is by node, so in multiples of 40 cores.<br />
* Your job's maximum walltime is 24 hours. <br />
* Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access.<br />
* Your job script will not remember the modules you have loaded, so it needs to contain "module load" commands of all the required modules (see examples below). <br />
* [[Data_Management#Moving_data | Move your data]] to Niagara before you submit your job.<br />
<br />
== Scheduling by Node ==<br />
<br />
On many systems that use SLURM, the scheduler will deduce from the specifications of the number of tasks and the number of CPUs per task what resources should be allocated. On Niagara, things are a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.<br />
** No other users are running anything on them.<br />
** You can [[SSH]] into them to see how things are going.<br />
* Whatever your requests to the scheduler, it will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes and 202GB is the amount of memory on the node.<br />
* If you run serial jobs you must still use all 40 cores on the node. Visit the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page for examples of how to do this.<br />
* Since there are 40 cores per node, your job should use N x 40 cores. If you do not, we will contact you to help you optimize your workflow. Or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.<br />
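The pattern on that page boils down to launching one serial task per core in the background and waiting for all of them. A minimal sketch (here a placeholder <code>echo</code> stands in for the real serial executable, and only 8 tasks are launched instead of 40):<br />

```shell
#!/bin/bash
# Sketch: run independent serial tasks concurrently within a single job.
# On Niagara you would launch 40 of these (or 80 with hyperthreading);
# 'echo' below is a stand-in for your real serial program.
mkdir -p serial-demo
for i in $(seq 1 8); do
  ( echo "result of task $i" > "serial-demo/out.$i" ) &
done
wait   # the job script must not exit before the background tasks finish
```

See the serial jobs page for robust versions of this pattern, including GNU Parallel.<br />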
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Limit on Running jobs<br />
!Limit on Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs ||compute || 50 || 1000 || 1 node (40&nbsp;cores) || default:&nbsp;20&nbsp;nodes&nbsp;(800&nbsp;cores) <br> with&nbsp;allocation:&nbsp;1000&nbsp;nodes&nbsp;(40000&nbsp;cores)|| 15 minutes || 24 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40&nbsp;cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort, vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.<br />
<br />
== File Input/Output Tips ==<br />
<br />
It is important to understand the file systems, so as to perform your file I/O (Input/Output) responsibly. Refer to the [[Data_Management | Data Management]] page for details about the file systems.<br />
* Your files can be seen on all Niagara login and compute nodes.<br />
* $HOME, $SCRATCH, and $PROJECT all use the parallel file system called GPFS.<br />
* GPFS is a high-performance file system which provides rapid reads and writes to large data sets in parallel from many nodes.<br />
* Accessing data sets which consist of many, small files leads to poor performance on GPFS.<br />
* Avoid reading and writing lots of small amounts of data to disk. Many small files on the system waste space and are slower to access, read and write. If you must write many small files, use [[User_Ramdisk | ramdisk]].<br />
* Write data out in a binary format. This is faster and takes less space.<br />
* The [[Burst Buffer]] is another option for I/O heavy-jobs and for speeding up [[Checkpoints|checkpoints]].<br />
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name=mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load NiaEnv/2019b<br />
module load intel/2019u4<br />
module load openmpi/4.0.1<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script from your scratch directory with the command:<br />
<br />
nia-login07:scratch$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li>First line indicates that this is a bash script.</li><br />
<li>Lines starting with <code>#SBATCH</code> go to SLURM.</li><br />
<li>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>)</li><br />
<li>In this case, SLURM looks for 2 nodes each running 40 tasks (for a total of 80 tasks), for 1 hour</li><br />
<li>Note that the mpirun flag "--ppn" (processors per node) is ignored.</li><br />
<li>Once it has found such nodes, it runs the script:<br />
<ul><br />
<li>Change to the submission directory;</li><br />
<li>Loads modules;</li><br />
<li>Runs the <code>mpi_example</code> application (SLURM will inform mpirun or srun on how many processes to run).<br />
</li><br />
</ul><br />
<li>To use hyperthreading, just change <code>--ntasks-per-node=40</code> to <code>--ntasks-per-node=80</code>, and add <code>--bind-to none</code> to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name=openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load NiaEnv/2019b<br />
module load intel/2019u4<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script from your scratch directory with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* First line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>openmp_job</code>).<br />
* In this case, SLURM looks for one node on which to run a single task using 40 cores, for 1 hour.<br />
* Once it has found such a node, it runs the script:<br />
** Change to the submission directory;<br />
** Loads modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.<br />
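For completeness: MPI and OpenMP can also be combined in one job. The sketch below follows the same pattern as the two scripts above; the 8 tasks &times; 5 threads split per node is just one way to fill the 40 cores, and <code>hybrid_example</code> is a hypothetical binary:<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=8<br />
#SBATCH --cpus-per-task=5<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name=hybrid_job<br />
#SBATCH --output=hybrid_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load NiaEnv/2019b<br />
module load intel/2019u4<br />
module load openmpi/4.0.1<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
mpirun ./hybrid_example<br />
</source><br />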
<br />
== Monitoring queued jobs ==<br />
<br />
Once the job is incorporated into the queue, there are some commands you can use to monitor its progress.<br />
<br />
<ul><br />
<li><p><code>squeue</code> or <code>sqc</code> (a caching version of squeue) to show the job queue (<code>squeue -u $USER</code> for just your jobs);</p></li><br />
<li><p><code>squeue -j JOBID</code> to get information on a specific job</p><br />
<p>(alternatively, <code>scontrol show job JOBID</code>, which is more verbose).</p></li><br />
<li><p><code>squeue --start -j JOBID</code> to get an estimate for when a job will run; these tend not to be very accurate predictions.</p></li><br />
<li><p><code>scancel -i JOBID</code> to cancel the job.</p></li><br />
<li><p><code>jobperf JOBID</code> to get an instantaneous view of the cpu and memory usage of the nodes of the job while it is running.</p></li><br />
<li><p><code>sacct</code> to get information on your recent jobs.</p></li><br />
</ul><br />
<br />
Further instructions for monitoring your jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]]. The [https://my.scinet.utoronto.ca my.SciNet] site is also a very useful tool for monitoring your current and past usage.<br />
<br />
= Visualization =<br />
Information about how to use visualization tools on Niagara is available on the [[Visualization]] page.<br />
<br />
= Support =<br />
<br />
* [mailto:support@scinet.utoronto.ca support@scinet.utoronto.ca]<br />
* [mailto:niagara@computecanada.ca niagara@computecanada.ca]</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Main_Page&diff=3295Main Page2021-10-08T14:15:33Z<p>Bmundim: /* Tutorials, Manuals, etc. */ Link to ssh keys wiki</p>
<hr />
<div>__NOTOC__<br />
{| style="border-spacing:10px; width: 95%"<br />
| style="padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px"|<br />
<br />
==System Status==<br />
<br />
<!-- Use "Up" or "Down"; these are templates. --><br />
{|style="width:100%" <br />
|{{Up |Niagara|Niagara_Quickstart}}<br />
|{{Up |Mist|Mist}}<br />
|{{Up |Teach|Teach}}<br />
|{{Up |Rouge|Rouge}}<br />
|-<br />
|{{Up |Jupyter Hub|Jupyter_Hub}}<br />
|{{Up |Scheduler|Niagara_Quickstart#Submitting_jobs}}<br />
|{{Up |File system|Niagara_Quickstart#Storage_and_quotas}}<br />
|{{Up |Burst Buffer|Burst_Buffer}}<br />
|-<br />
|{{Up|HPSS|HPSS}}<br />
|{{Up |Login Nodes|Niagara_Quickstart#Logging_in}} <br />
|{{Up |External Network|Niagara_Quickstart#Logging_in}} <br />
|{{Up |Globus |Globus}}<br />
|}<br />
<br />
<!-- Current Messages: --><br />
<b>Mon Sep 27 16:11 EDT 2021 </b> HPSS is back online.<br />
<br />
<b>Wed Sep 23 17:23 EDT 2021 </b> Systems being brought back online. HPSS may be down for some more days. <br />
<br />
<!-- When removing system status entries, please archive them to: https://docs.scinet.utoronto.ca/index.php/Previous_messages --><br />
{|style="border-spacing: 10px;width: 100%"<br />
|valign="top" style="margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%" |<br />
<br />
== QuickStart Guides ==<br />
* [[Niagara Quickstart]]<br />
* [[HPSS | HPSS archival storage]]<br />
* [[Mist| Mist Power 9 GPU cluster]]<br />
* [[Teach|Teach cluster]]<br />
* [[FAQ | FAQ (frequently asked questions)]]<br />
* [[Acknowledging SciNet]]<br />
| valign="top" style="margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%" |<br />
<br />
== Tutorials, Manuals, etc. ==<br />
* [https://education.scinet.utoronto.ca SciNet education material]<br />
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]<br />
* [[Modules specific to Niagara|Software Modules specific to Niagara]] <br />
* [[Modules for Mist]] <br />
* [[Commercial software]]<br />
* [[Burst Buffer]]<br />
* [[SSH keys]]<br />
* [[SSH Tunneling]]<br />
* [[SSH#Two-Factor_authentication|Two-Factor Authentication]]<br />
* [[Visualization]]<br />
* [[Running Serial Jobs on Niagara]]<br />
* [[Jupyter Hub]]<br />
|}</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&diff=3272Niagara Quickstart2021-10-05T20:50:02Z<p>Bmundim: /* Logging in */</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Niagara.jpg|center|300px|thumb]]<br />
|name=Niagara<br />
|installed=Jan 2018/March 2020<br />
|operatingsystem= CentOS 7.6 <br />
|loginnode= niagara.scinet.utoronto.ca<br />
|nnodes= 2,024 nodes (80,960 cores)<br />
|rampernode=188 GiB / 202 GB <br />
|corespernode=40 (80 hyperthreads)<br />
|interconnect=Mellanox Dragonfly+<br />
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)<br />
|queuetype=Slurm<br />
}}<br />
<br />
=Specifications=<br />
<br />
The Niagara cluster is a large cluster of 2,024 Lenovo SD530 servers, each with 40 Intel "Skylake" cores at 2.4 GHz or 40 Intel "CascadeLake" cores at 2.5 GHz. <br />
The peak performance of the cluster is about 3.6 PFlops (6.25 PFlops theoretical). It was the 53rd fastest supercomputer on the [https://www.top500.org/list/2018/06/?page=1 TOP500 list of June 2018], and is at number 113 on the [https://www.top500.org/lists/top500/list/2021/06/ current list (June 2021)]. <br />
<br />
Each node of the cluster has 188 GiB / 202 GB of RAM (at least 4 GiB/core for user jobs). Being designed for large parallel workloads, it has a fast interconnect consisting of EDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a minimum of 15 minutes and a maximum of 24 hours, and that favours large jobs.<br />
<br />
* See the [https://www.youtube.com/watch?v=l-E2CFGh0BE&feature=youtu.be "Intro to Niagara"] recording<br />
<br />
More detailed hardware characteristics of the Niagara supercomputer can be found [https://docs.computecanada.ca/wiki/Niagara on this page].<br />
<br />
Note: Documentation about the "GPU expansion to Niagara" called "Mist" can be found on [[Mist | its own page]].<br />
<br />
= Getting started on Niagara =<br />
<br />
Access to Niagara is not enabled automatically for everyone with a Compute Canada account, but anyone with an active Compute Canada account can get their access enabled.<br />
<br />
If you have an active Compute Canada account but you do not have access to Niagara yet (e.g. because you are new to SciNet or belong to a group whose primary PI does not have an allocation as granted in the annual [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions Compute Canada RAC]), go to the [https://ccdb.computecanada.ca/services/opt_in opt-in page on the CCDB site]. After clicking the "Join" button, it usually takes only one or two business days for access to be granted. <br />
<br />
Please read this document carefully. The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource. If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].<br />
<br />
== Logging in ==<br />
<br />
Niagara runs CentOS 7, which is a type of Linux. You will need to be familiar with Linux systems to work on Niagara. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&search=scmp101&include=all&filter=Filter Introduction to Linux Shell] class.<br />
<br />
As with all SciNet and CC (Compute Canada) compute systems, access to Niagara is via [[SSH]] (secure shell) only. Open a terminal window (e.g. with [https://docs.computecanada.ca/wiki/Connecting_with_PuTTY PuTTY] on Windows or with [https://docs.computecanada.ca/wiki/Connecting_with_MobaXTerm MobaXTerm]), then SSH into the Niagara login nodes with your CC credentials:<br />
<br />
$ ssh -Y MYCCUSERNAME@niagara.scinet.utoronto.ca<br />
<br />
or<br />
<br />
$ ssh -Y MYCCUSERNAME@niagara.computecanada.ca<br />
<br />
The first time you log in to Niagara, please make sure you are actually accessing Niagara by checking that the login node ssh host key fingerprint matches. [[SSH_Changes_in_May_2019 | See here how]]. This check prevents you from falling victim to [https://en.wikipedia.org/wiki/Man-in-the-middle_attack man-in-the-middle attacks.]<br />
<br />
* The Niagara login nodes are where you develop, edit, compile, prepare and submit jobs.<br />
* These login nodes are not part of the Niagara compute cluster, but have the same architecture, operating system, and software stack.<br />
* The optional <code>-Y</code> is needed to open windows from the Niagara command-line onto your local X server.<br />
* You can only connect 4 times in a 2-minute window to the login nodes. <br />
* To run on Niagara's compute nodes, you must [[#Submitting_jobs | submit a batch job]].<br />
<br />
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.<br />
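If you log in often, you can store the hostname, your username, and the X11 setting in your local SSH configuration so that a plain <code>ssh niagara</code> suffices. A minimal sketch of <code>~/.ssh/config</code> on your own computer (the <code>Host</code> alias and the username are placeholders):<br />

```
# ~/.ssh/config (on your own computer, not on Niagara)
Host niagara
    HostName niagara.scinet.utoronto.ca
    User MYCCUSERNAME
    # The next two lines correspond to the "ssh -Y" option:
    ForwardX11 yes
    ForwardX11Trusted yes
```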
<br />
== Your various directories ==<br />
<br />
By virtue of your access to Niagara you are granted storage space on the system. There are several directories available to you, each indicated by an associated environment variable.<br />
<br />
=== home and scratch ===<br />
<br />
You have a home and scratch directory on the system, the paths to which are stored in the environment variables $HOME and $SCRATCH. The locations are of the form<br />
<br />
$HOME=/home/g/groupname/myccusername<br />
$SCRATCH=/scratch/g/groupname/myccusername<br />
<br />
where groupname is the name of your PI's group, and myccusername is your CC username. For example:<br />
<br />
nia-login07:~$ pwd<br />
/home/s/scinet/rzon<br />
nia-login07:~$ cd $SCRATCH<br />
nia-login07:rzon$ pwd<br />
/scratch/s/scinet/rzon<br />
<br />
NOTE: home is read-only on compute nodes.<br />
<br />
=== project and archive/nearline ===<br />
<br />
Users from groups with [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions RAC storage allocation] will also have a project directory and possibly an archive (a.k.a. "nearline") directory, the paths to which are stored in the environment variables $PROJECT and $ARCHIVE. They follow the naming convention:<br />
<br />
$PROJECT=/project/g/groupname/myccusername<br />
$ARCHIVE=/archive/g/groupname/myccusername<br />
<br />
NOTE: Currently archive space is available only via [[HPSS]], and is not accessible on the Niagara login, compute, or datamover nodes.<br />
<br />
'''''IMPORTANT: Future-proof your scripts'''''<br />
<br />
When writing your scripts, use the environment variables (<tt>$HOME</tt>, <tt>$SCRATCH</tt>, <tt>$PROJECT</tt>, <tt>$ARCHIVE</tt>) instead of the actual paths! The paths may change in the future.<br />
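For example, a job script can build all of its paths from these variables instead of hard-coding <code>/scratch/...</code>. A minimal sketch (the <code>myproject</code> directory name is only an illustration, and the fallback values merely make the snippet runnable outside Niagara):<br />

```shell
#!/bin/bash
# Build all run paths from the environment variables rather than
# hard-coding /scratch/... ; "myproject" is just an example name.
# The ${SCRATCH:-...} fallback only makes this sketch runnable anywhere.
RUNDIR="${SCRATCH:-$PWD/scratch_demo}/myproject/run_${SLURM_JOB_ID:-manual}"
mkdir -p "$RUNDIR"
cd "$RUNDIR"
echo "output will be written under $RUNDIR"
```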
<br />
=== Storage and quotas ===<br />
<br />
You should familiarize yourself with the [[Data_Management#Purpose_of_each_file_system | various file systems]], what purpose they serve, and how to properly use them. This table summarizes the various file systems. See the [[Data_Management | Data Management]] page for more details.<br />
<br />
{| class="wikitable"<br />
! location<br />
!colspan="2"| quota<br />
!align="right"| block size<br />
! expiration time<br />
! backed up<br />
! on login nodes<br />
! on compute nodes<br />
|-<br />
| $HOME<br />
|colspan="2"| 100 GB / 250,000 files per user<br />
|align="right"| 1 MB<br />
| <br />
| yes<br />
| yes<br />
| read-only<br />
|-<br />
|rowspan="2"| $SCRATCH<br />
|colspan="2"| 25 TB / 6,000,000 files per user<br />
|align="right" rowspan="2" | 16 MB<br />
|rowspan="2"| 2 months<br />
|rowspan="2"| no<br />
|rowspan="2"| yes<br />
|rowspan="2"| yes<br />
|-<br />
|align="right"|50-500TB per group<br />
|align="right"|[[Data_Management#Quotas_and_purging | depending on group size]]<br />
|-<br />
| $PROJECT<br />
|colspan="2"| by group allocation<br />
|align="right"| 16 MB<br />
| <br />
| yes<br />
| yes<br />
| yes<br />
|-<br />
| $ARCHIVE<br />
|colspan="2"| by group (nearline) allocation<br />
|align="right"| <br />
|<br />
| dual-copy<br />
| no<br />
| no<br />
|-<br />
| $BBUFFER<br />
|colspan="2"| 10 TB per user<br />
|align="right"| 1 MB<br />
| very short<br />
| no<br />
| yes<br />
| yes<br />
|}<br />
<br />
=== Moving data to Niagara ===<br />
<br />
If you need to move data to Niagara for analysis, or when you need to move data off of Niagara, use the following guidelines:<br />
* If your data is less than 10 GB, move it using the login nodes.<br />
* If your data is greater than 10 GB, move it using the datamover nodes nia-datamover1.scinet.utoronto.ca and nia-datamover2.scinet.utoronto.ca.<br />
<br />
Details of how to use the datamover nodes can be found on the [[Data_Management#Moving_data | Data Management ]] page.<br />
<br />
= Loading software modules =<br />
<br />
You have two options for running code on Niagara: use existing software, or [[Niagara_Quickstart#Compiling_on_Niagara:_Example | compile your own]]. This section focuses on the former.<br />
<br />
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be [[Using_modules | found on the modules page]].<br />
<br />
Common module subcommands are:<br />
<br />
* <code>module load <module-name></code>: load the default version of a particular software.<br />
* <code>module load <module-name>/<module-version></code>: load a specific version of a particular software.<br />
* <code>module purge</code>: unload all currently loaded modules.<br />
* <code>module spider</code> (or <code>module spider <module-name></code>): list available software packages.<br />
* <code>module avail</code>: list loadable software packages.<br />
* <code>module list</code>: list loaded modules.<br />
<br />
Along with modifying common environment variables such as PATH and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.<br />
<br />
There are handy abbreviations for the module commands. <code>ml</code> is the same as <code>module list</code>, and <code>ml <module-name></code> is the same as <code>module load <module-name></code>.<br />
== Software stacks: NiaEnv and CCEnv ==<br />
<br />
On Niagara, there are two available software stacks:<br />
<br />
=== NiaEnv ===<br />
<br />
A [https://docs.scinet.utoronto.ca/index.php/Modules_specific_to_Niagara Niagara software stack] tuned and compiled for this machine. This stack is available by default, but if not, can be reloaded with<br />
<pre>module load NiaEnv</pre><br />
This loads the default set of modules, which is currently the 2019b epoch. Before September 1, the default was NiaEnv/2018a. Users are encouraged to use the 2019b stack, but to make sure old job scripts or older software installations in your home directory continue to work, you may need to use<br />
<pre>module load NiaEnv/2018a</pre><br />
You can override the system default for the epoch version by creating a file called <b><tt>.modulerc</tt></b> in your home directory with the line <b><tt>module-version NiaEnv/VERSION default</tt></b>, e.g. like so:<br />
<pre><br />
echo "module-version NiaEnv/2019b default" > $HOME/.modulerc<br />
</pre><br />
After this, subsequent logins and jobs will use the 2019b stack even when the system default is different.<br />
Similarly, you can make an older epoch your personal default, like so:<br />
<pre><br />
echo "module-version NiaEnv/2018a default" > $HOME/.modulerc<br />
</pre><br />
<br />
No modules are loaded by default on Niagara except NiaEnv.<br />
<br />
=== CCEnv ===<br />
<br />
The same [https://docs.computecanada.ca/wiki/Modules software stack available on Compute Canada's General Purpose clusters] [https://docs.computecanada.ca/wiki/Graham Graham] and [https://docs.computecanada.ca/wiki/Cedar Cedar] can be used on Niagara too, with:<br />
<pre>module load CCEnv</pre><br />
Or, if you want the same default modules loaded as on Béluga, then do<br />
<pre>module load CCEnv StdEnv</pre><br />
or, if you want the same default modules loaded as on Cedar and Graham, do<br />
<pre>module load CCEnv arch/avx2 StdEnv</pre><br />
<br />
== Tips for loading software ==<br />
<br />
* We advise '''''against''''' loading modules in your .bashrc. This can lead to very confusing behaviour under certain circumstances. Our guidelines for .bashrc files can be found [[bashrc guidelines|here]].<br />
* Instead, load modules by hand when needed, or by sourcing a separate script.<br />
* Load run-specific modules inside your job submission script.<br />
* Short names give default versions; e.g. <code>intel</code> → <code>intel/2018.2</code>. It is usually better to be explicit about the versions, for future reproducibility.<br />
* Modules often require other modules to be loaded first. Solve these dependencies by using [[Using_modules#Module_spider | <code>module spider</code>]].<br />
<br />
= Available compilers and interpreters =<br />
<br />
* For most compiled software, one should use the Intel compilers (<tt>icc</tt> for C, <tt>icpc</tt> for C++, and <tt>ifort</tt> for Fortran). Loading an <tt>intel</tt> module makes these available. <br />
* The GNU compiler suite (<tt>gcc, g++, gfortran</tt>) is also available, if you load one of the <tt>gcc</tt> modules.<br />
* To compile MPI code, you must additionally load an <tt>openmpi</tt> or <tt>intelmpi</tt> module.<br />
* Open source interpreted, interactive software is also available:<br />
** [[Python]]<br />
** [[R]]<br />
** Julia<br />
** [[Octave]]<br />
<br />
Please visit the corresponding page for details on using these tools. For information on running MATLAB applications on Niagara, visit [[MATLAB| this page]].<br />
<br />
= Using Commercial Software =<br />
<br />
May I use commercial software on Niagara?<br />
* Possibly, but you have to bring your own license for it. You can connect to an external license server using [[SSH_Tunneling | ssh tunneling]].<br />
* SciNet and Compute Canada have an extremely large and broad user base of thousands of users, so we cannot provide licenses for everyone's favorite software.<br />
* Thus, the only freely available commercial software installed on Niagara is software that can benefit everyone: Compilers, math libraries and debuggers.<br />
* That means no [[MATLAB]], Gaussian, IDL, etc.<br />
* Open source alternatives like Octave, [[Python]], and [[R]] are available.<br />
* We are happy to help you to install commercial software for which you have a license.<br />
* In some cases, if you have a license, you can use software in the Compute Canada stack.<br />
The list of commercial software which is installed on Niagara, for which you will need a license to use, can be found on the [[Commercial_software | commercial software page]].<br />
<br />
= Compiling on Niagara: Example =<br />
<br />
Suppose one wants to compile an application from two C source files, appl.c and module.c, which use the Math Kernel Library. This is an example of how this would be done:<br />
<source lang="bash"><br />
nia-login07:~$ module load NiaEnv/2019b<br />
nia-login07:~$ module list<br />
Currently Loaded Modules:<br />
1) NiaEnv/2019b (S)<br />
Where:<br />
S: Module is Sticky, requires --force to unload or purge<br />
<br />
nia-login07:~$ module load intel/2019u4<br />
<br />
nia-login07:~$ ls<br />
appl.c module.c<br />
<br />
nia-login07:~$ icc -c -O3 -xHost -o appl.o appl.c<br />
nia-login07:~$ icc -c -O3 -xHost -o module.o module.c<br />
nia-login07:~$ icc -o appl module.o appl.o -mkl<br />
<br />
nia-login07:~$ ./appl<br />
</source><br />
Note:<br />
* The optimization flags -O3 -xHost allow the Intel compiler to use instructions specific to the CPU architecture that is present (instead of instructions for more generic x86_64 CPUs).<br />
* Linking with the Intel Math Kernel Library (MKL) is easy when using the Intel compiler; it just requires the -mkl flag.<br />
* If compiling with gcc, the optimization flags would be -O3 -march=native. For the way to link with the MKL, it is suggested to use the [https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor MKL link line advisor].<br />
<br />
= Testing and Debugging =<br />
<br />
You should test your code before submitting it to the cluster, both to check that it is correct and to find out what resources you need.<br />
* Small test jobs can be run on the login nodes. Rule of thumb: tests should run for no more than a couple of minutes, take at most about 1-2 GB of memory, and use no more than a couple of cores.<br />
* You can run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after <code>module load ddt</code>.<br />
* For short tests that do not fit on a login node, or for which you need a dedicated node, request an interactive debug job with the debugjob command:<br />
nia-login07:~$ debugjob --clean N<br />
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; if N=4 (the maximum), it gives you 22 minutes. The <tt>--clean</tt> argument is optional but recommended as it will start the session without any modules loaded, thus mimicking more closely what happens when you submit a job script.<br />
<br />
Finally, if your debugjob process takes more than 1 hour, you can request an interactive job from the regular queue using the salloc command. Note, however, that it may take some time for such a job to start, since it will be part of the regular queue and will run when the scheduler decides.<br />
nia-login07:~$ salloc --nodes N --time=M:00:00 --x11<br />
where N is again the number of nodes, and M is the number of hours you wish the job to run.<br />
The <tt>--x11</tt> flag is required if you need to use graphics while testing your code through salloc, e.g. when using a debugger such as [[Parallel Debugging with DDT|DDT]] or DDD. See the [[Testing_With_Graphics | Testing with graphics]] page for the options in that case.<br />
<br />
= Submitting jobs =<br />
<br />
<!-- == Progressive approach to run jobs on niagara == --><br />
<!-- We would like to emphasize the need for users to adopt a more progressive and explicit approach for testing, running and scaling up of jobs on niagara. [[Progressive_Approach | '''Here is a set of steps we suggest that you follow.''']] --><br />
<br />
Once you have compiled and tested your code or workflow on the Niagara login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. Your jobs will run on some of Niagara's 2,024 compute nodes. When and where your job runs is determined by the scheduler.<br />
<br />
Niagara uses SLURM as its job scheduler. More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
nia-login07:scratch$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course. Note that you must submit your job from a login node. You cannot submit jobs from the datamover nodes.<br />
<br />
In most cases, you should not submit from your $HOME directory, but rather, from your $SCRATCH directory, so that the output of your compute job can be written out (as mentioned above, $HOME is read-only on the compute nodes).<br />
<br />
Jobs will run under your group's RRG allocation, or, if your group has none, under a RAS allocation (previously called a 'default' allocation).<br />
<br />
Some example job scripts can be found below.<br />
<br />
Keep in mind:<br />
* Scheduling is by node, so in multiples of 40 cores.<br />
* Your job's maximum walltime is 24 hours. <br />
* Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access.<br />
* Your job script will not remember the modules you have loaded, so it needs to contain "module load" commands of all the required modules (see examples below). <br />
* [[Data_Management#Moving_data | Move your data]] to Niagara before you submit your job.<br />
<br />
== Scheduling by Node ==<br />
<br />
On many systems that use SLURM, the scheduler deduces what resources should be allocated from the specified number of tasks and cpus per task. On Niagara things are a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.<br />
** No other users are running anything on them.<br />
** You can [[SSH]] into them to see how things are going.<br />
* Whatever your requests to the scheduler, it will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes and 202GB is the amount of memory on the node.<br />
* If you run serial jobs you must still use all 40 cores on the node. Visit the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page for examples of how to do this.<br />
* Since there are 40 cores per node, your job should use N x 40 cores. If you do not, we will contact you to help you optimize your workflow. Or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.<br />
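The simplest pattern for bundling serial work, sketched below, is to background one task per core and wait for all of them to finish. Here a plain <code>echo</code> stands in for a real serial executable; for anything more involved, use the approaches on the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page:<br />

```shell
#!/bin/bash
# Launch one serial task per core in the background, then wait for all
# of them. "echo" is a placeholder for a real serial executable.
for i in $(seq 1 40); do
    echo "result of task $i" > "output.$i" &
done
wait
ls output.* | wc -l    # reports 40 output files
```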
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Limit on Running jobs<br />
!Limit on Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs ||compute || 50 || 1000 || 1 node (40&nbsp;cores) || default:&nbsp;20&nbsp;nodes&nbsp;(800&nbsp;cores) <br> with&nbsp;allocation:&nbsp;1000&nbsp;nodes&nbsp;(40000&nbsp;cores)|| 15 minutes || 24 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40&nbsp;cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.<br />
<br />
== File Input/Output Tips ==<br />
<br />
It is important to understand the file systems, so as to perform your file I/O (Input/Output) responsibly. Refer to the [[Data_Management | Data Management]] page for details about the file systems.<br />
* Your files can be seen on all Niagara login and compute nodes.<br />
* $HOME, $SCRATCH, and $PROJECT all use the parallel file system called GPFS.<br />
* GPFS is a high-performance file system which provides rapid reads and writes to large data sets in parallel from many nodes.<br />
* Accessing data sets which consist of many, small files leads to poor performance on GPFS.<br />
* Avoid reading and writing lots of small amounts of data to disk. Many small files on the system waste space and are slower to access, read and write. If you must write many small files, use [[User_Ramdisk | ramdisk]].<br />
* Write data out in a binary format. This is faster and takes less space.<br />
* The [[Burst Buffer]] is another option for I/O heavy-jobs and for speeding up [[Checkpoints|checkpoints]].<br />
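For instance, a workflow that produces many small result files can bundle them into a single archive before further storage or transfer; a minimal sketch:<br />

```shell
# Bundle many small files into one archive; a single large file is far
# friendlier to GPFS than thousands of small ones.
mkdir -p results
for i in $(seq 1 100); do echo "data $i" > "results/part.$i"; done

tar -czf results.tar.gz results/   # one compressed file
rm -r results                      # keep only the archive
tar -tzf results.tar.gz | head     # list the archive contents
```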
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name=mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load NiaEnv/2019b<br />
module load intel/2019u4<br />
module load openmpi/4.0.1<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script from your scratch directory with the command:<br />
<br />
nia-login07:scratch$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li>First line indicates that this is a bash script.</li><br />
<li>Lines starting with <code>#SBATCH</code> go to SLURM.</li><br />
<li>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>)</li><br />
<li>In this case, SLURM looks for 2 nodes each running 40 tasks (for a total of 80 tasks), for 1 hour</li><br />
<li>Note that the mpirun flag "--ppn" (processors per node) is ignored.</li><br />
<li>Once it finds such nodes, it runs the script:<br />
<ul><br />
<li>Changes to the submission directory;</li><br />
<li>Loads modules;</li><br />
<li>Runs the <code>mpi_example</code> application (SLURM will inform mpirun or srun on how many processes to run).<br />
</li><br />
</ul><br />
<li>To use hyperthreading, just change <code>--ntasks-per-node=40</code> to <code>--ntasks-per-node=80</code>, and add <code>--bind-to none</code> to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name=openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load NiaEnv/2019b<br />
module load intel/2019u4<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script from your scratch directory with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* First line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>openmp_job</code>).<br />
* In this case, SLURM looks for one node with 40 cores, to be used by a single task, for 1 hour.<br />
* Once it finds such a node, it runs the script:<br />
** Changes to the submission directory;<br />
** Loads modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.<br />
<br />
== Monitoring queued jobs ==<br />
<br />
Once the job is incorporated into the queue, there are some commands you can use to monitor its progress.<br />
<br />
<ul><br />
<li><p><code>squeue</code> or <code>sqc</code> (a caching version of squeue) to show the job queue (<code>squeue -u $USER</code> for just your jobs);</p></li><br />
<li><p><code>squeue -j JOBID</code> to get information on a specific job</p><br />
<p>(alternatively, <code>scontrol show job JOBID</code>, which is more verbose).</p></li><br />
<li><p><code>squeue --start -j JOBID</code> to get an estimate for when a job will run; these tend not to be very accurate predictions.</p></li><br />
<li><p><code>scancel -i JOBID</code> to cancel the job.</p></li><br />
<li><p><code>jobperf JOBID</code> to get an instantaneous view of the cpu and memory usage of the nodes of the job while it is running.</p></li><br />
<li><p><code>sacct</code> to get information on your recent jobs.</p></li><br />
</ul><br />
<br />
Further instructions for monitoring your jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]]. The [https://my.scinet.utoronto.ca my.SciNet] site is also a very useful tool for monitoring your current and past usage.<br />
<br />
= Visualization =<br />
Information about how to use visualization tools on Niagara is available on the [[Visualization]] page.<br />
<br />
= Support =<br />
<br />
* [mailto:support@scinet.utoronto.ca support@scinet.utoronto.ca]<br />
* [mailto:niagara@computecanada.ca niagara@computecanada.ca]</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&diff=3271Niagara Quickstart2021-10-05T20:48:39Z<p>Bmundim: /* Logging in */</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Niagara.jpg|center|300px|thumb]]<br />
|name=Niagara<br />
|installed=Jan 2018/March 2020<br />
|operatingsystem= CentOS 7.6 <br />
|loginnode= niagara.scinet.utoronto.ca<br />
|nnodes= 2,024 nodes (80,960 cores)<br />
|rampernode=188 GiB / 202 GB <br />
|corespernode=40 (80 hyperthreads)<br />
|interconnect=Mellanox Dragonfly+<br />
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)<br />
|queuetype=Slurm<br />
}}<br />
<br />
=Specifications=<br />
<br />
The Niagara cluster is a large cluster of 2,024 Lenovo SD530 servers, each with 40 Intel "Skylake" cores at 2.4 GHz or 40 Intel "CascadeLake" cores at 2.5 GHz. <br />
The peak performance of the cluster is about 3.6 PFlops (6.25 PFlops theoretical). It was the 53rd fastest supercomputer on the [https://www.top500.org/list/2018/06/?page=1 TOP500 list of June 2018], and is at number 113 on the [https://www.top500.org/lists/top500/list/2021/06/ current list (June 2021)]. <br />
<br />
Each node of the cluster has 188 GiB / 202 GB of RAM (at least 4 GiB/core for user jobs). Being designed for large parallel workloads, it has a fast interconnect consisting of EDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a minimum of 15 minutes and a maximum of 24 hours, and that favours large jobs.<br />
<br />
* See the [https://www.youtube.com/watch?v=l-E2CFGh0BE&feature=youtu.be "Intro to Niagara"] recording<br />
<br />
More detailed hardware characteristics of the Niagara supercomputer can be found [https://docs.computecanada.ca/wiki/Niagara on this page].<br />
<br />
Note: Documentation about the "GPU expansion to Niagara" called "Mist" can be found on [[Mist | its own page]].<br />
<br />
= Getting started on Niagara =<br />
<br />
Access to Niagara is not enabled automatically for everyone with a Compute Canada account, but anyone with an active Compute Canada account can get their access enabled.<br />
<br />
If you have an active Compute Canada account but you do not have access to Niagara yet (e.g. because you are new to SciNet or belong to a group whose primary PI does not have an allocation as granted in the annual [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions Compute Canada RAC]), go to the [https://ccdb.computecanada.ca/services/opt_in opt-in page on the CCDB site]. After clicking the "Join" button, it usually takes only one or two business days for access to be granted. <br />
<br />
Please read this document carefully. The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource. If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].<br />
<br />
== Logging in ==<br />
<br />
Niagara runs CentOS 7, which is a type of Linux. You will need to be familiar with Linux systems to work on Niagara. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&search=scmp101&include=all&filter=Filter Introduction to Linux Shell] class.<br />
<br />
As with all SciNet and CC (Compute Canada) compute systems, access to Niagara is done via [[SSH]] (secure shell) only. Open a terminal window (e.g. Connecting with [https://docs.computecanada.ca/wiki/Connecting_with_PuTTY PuTTY] on Windows or Connecting with [https://docs.computecanada.ca/wiki/Connecting_with_MobaXTerm MobaXTerm]), then SSH into the Niagara login nodes with your CC credentials:<br />
<br />
$ ssh -Y MYCCUSERNAME@niagara.scinet.utoronto.ca<br />
<br />
or<br />
<br />
$ ssh -Y MYCCUSERNAME@niagara.computecanada.ca<br />
<br />
The first time you login to Niagara, please make sure you are actually accessing Niagara by checking if the login node ssh host key fingerprint matches. [[SSH_Changes_in_May_2019 | See here how]]. This check prevents you from falling victim to [https://en.wikipedia.org/wiki/Man-in-the-middle_attack man-in-the-middle attacks].<br />
<br />
* The Niagara login nodes are where you develop, edit, compile, prepare and submit jobs.<br />
* These login nodes are not part of the Niagara compute cluster, but have the same architecture, operating system, and software stack.<br />
* The optional <code>-Y</code> flag enables X11 forwarding, which is needed to open windows from the Niagara command-line onto your local X server.<br />
* You can only connect 4 times in a 2-minute window to the login nodes. <br />
* To run on Niagara's compute nodes, you must [[#Submitting_jobs | submit a batch job]].<br />
<br />
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.<br />
<br />
== Your various directories ==<br />
<br />
By virtue of your access to Niagara you are granted storage space on the system. There are several directories available to you, each indicated by an associated environment variable.<br />
<br />
=== home and scratch ===<br />
<br />
You have a home and scratch directory on the system, the paths to which are stored in the environment variables $HOME and $SCRATCH. The locations are of the form<br />
<br />
$HOME=/home/g/groupname/myccusername<br />
$SCRATCH=/scratch/g/groupname/myccusername<br />
<br />
where groupname is the name of your PI's group, and myccusername is your CC username. For example:<br />
<br />
nia-login07:~$ pwd<br />
/home/s/scinet/rzon<br />
nia-login07:~$ cd $SCRATCH<br />
nia-login07:rzon$ pwd<br />
/scratch/s/scinet/rzon<br />
<br />
NOTE: home is read-only on compute nodes.<br />
<br />
=== project and archive/nearline ===<br />
<br />
Users from groups with [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions RAC storage allocation] will also have a project directory and possibly an archive (a.k.a. "nearline") directory, the paths to which are stored in the environment variables $PROJECT and $ARCHIVE. They follow the naming convention:<br />
<br />
$PROJECT=/project/g/groupname/myccusername<br />
$ARCHIVE=/archive/g/groupname/myccusername<br />
<br />
NOTE: Currently archive space is available only via [[HPSS]], and is not accessible on the Niagara login, compute, or datamover nodes.<br />
<br />
'''''IMPORTANT: Future-proof your scripts'''''<br />
<br />
When writing your scripts, use the environment variables (<tt>$HOME</tt>, <tt>$SCRATCH</tt>, <tt>$PROJECT</tt>, <tt>$ARCHIVE</tt>) instead of the actual paths! The paths may change in the future.<br />
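<br />
For example, a job script can build its working path from <tt>$SCRATCH</tt> rather than spelling out the mount point. The following is only a sketch: "myproject" and "run01" are made-up names, and the fallback on the first line matters only when the snippet is run outside Niagara, where $SCRATCH is not set by the system.<br />
<source lang="bash">
: "${SCRATCH:=/tmp/scratch-demo}"   # fallback for use outside Niagara only
OUTDIR="$SCRATCH/myproject/run01"   # portable: follows $SCRATCH wherever it points
mkdir -p "$OUTDIR"
cd "$OUTDIR"
# Avoid hard-coding the physical path, e.g. /scratch/s/scinet/rzon/myproject,
# since the mount points may change in the future.
</source><br />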
<br />
=== Storage and quotas ===<br />
<br />
You should familiarize yourself with the [[Data_Management#Purpose_of_each_file_system | various file systems]], what purpose they serve, and how to properly use them. This table summarizes the various file systems. See the [[Data_Management | Data Management]] page for more details.<br />
<br />
{| class="wikitable"<br />
! location<br />
!colspan="2"| quota<br />
!align="right"| block size<br />
! expiration time<br />
! backed up<br />
! on login nodes<br />
! on compute nodes<br />
|-<br />
| $HOME<br />
|colspan="2"| 100 GB / 250,000 files per user<br />
|align="right"| 1 MB<br />
| <br />
| yes<br />
| yes<br />
| read-only<br />
|-<br />
|rowspan="2"| $SCRATCH<br />
|colspan="2"| 25 TB / 6,000,000 files per user<br />
|align="right" rowspan="2" | 16 MB<br />
|rowspan="2"| 2 months<br />
|rowspan="2"| no<br />
|rowspan="2"| yes<br />
|rowspan="2"| yes<br />
|-<br />
|align="right"|50-500TB per group<br />
|align="right"|[[Data_Management#Quotas_and_purging | depending on group size]]<br />
|-<br />
| $PROJECT<br />
|colspan="2"| by group allocation<br />
|align="right"| 16 MB<br />
| <br />
| yes<br />
| yes<br />
| yes<br />
|-<br />
| $ARCHIVE<br />
|colspan="2"| by group (nearline) allocation<br />
|align="right"| <br />
|<br />
| dual-copy<br />
| no<br />
| no<br />
|-<br />
| $BBUFFER<br />
|colspan="2"| 10 TB per user<br />
|align="right"| 1 MB<br />
| very short<br />
| no<br />
| yes<br />
| yes<br />
|}<br />
<br />
=== Moving data to Niagara ===<br />
<br />
If you need to move data to Niagara for analysis, or when you need to move data off of Niagara, use the following guidelines:<br />
* If your data is less than 10GB, move the data using the login nodes.<br />
* If your data is greater than 10GB, move the data using the datamover nodes nia-datamover1.scinet.utoronto.ca and nia-datamover2.scinet.utoronto.ca .<br />
<br />
Details of how to use the datamover nodes can be found on the [[Data_Management#Moving_data | Data Management ]] page.<br />
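<br />
As a sketch, a transfer of a large file from your own machine to Niagara through a datamover node could look as follows; "bigdata.tar" is a placeholder name, and you must substitute your own username, group and directory in the path.<br />
<source lang="bash">
# Run this on your local machine, not on Niagara.
rsync -av --progress bigdata.tar \
    MYCCUSERNAME@nia-datamover1.scinet.utoronto.ca:/scratch/g/groupname/myccusername/
</source><br />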
<br />
= Loading software modules =<br />
<br />
You have two options for running code on Niagara: use existing software, or [[Niagara_Quickstart#Compiling_on_Niagara:_Example | compile your own]]. This section focuses on the former.<br />
<br />
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be [[Using_modules | found on the modules page]].<br />
<br />
Common module subcommands are:<br />
<br />
* <code>module load <module-name></code>: load the default version of a particular software.<br />
* <code>module load <module-name>/<module-version></code>: load a specific version of a particular software.<br />
* <code>module purge</code>: unload all currently loaded modules.<br />
* <code>module spider</code> (or <code>module spider <module-name></code>): list available software packages.<br />
* <code>module avail</code>: list loadable software packages.<br />
* <code>module list</code>: list loaded modules.<br />
<br />
Along with modifying common environment variables, such as PATH, and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.<br />
<br />
There are handy abbreviations for the module commands. <code>ml</code> is the same as <code>module list</code>, and <code>ml <module-name></code> is the same as <code>module load <module-name></code>.<br />
== Software stacks: NiaEnv and CCEnv ==<br />
<br />
On Niagara, there are two available software stacks:<br />
<br />
=== NiaEnv ===<br />
<br />
A [https://docs.scinet.utoronto.ca/index.php/Modules_specific_to_Niagara Niagara software stack] tuned and compiled for this machine. This stack is available by default, but if not, can be reloaded with<br />
<pre>module load NiaEnv</pre><br />
This loads the default set of modules, which is currently the 2019b epoch. Before September 1, the default was NiaEnv/2018a. Users are encouraged to use the 2019b stack, but to make sure old job scripts or older software installations in your home directory continue to work, you may need to use<br />
<pre>module load NiaEnv/2018a</pre><br />
You can override the system default for the epoch version by creating a file called <b><tt>.modulerc</tt></b> in your home directory with the line <b><tt>module-version NiaEnv/VERSION default</tt></b>, e.g. like so:<br />
<pre><br />
echo "module-version NiaEnv/2019b default" > $HOME/.modulerc<br />
</pre><br />
After this, subsequent logins and jobs will use the 2019b stack even when the system default is different.<br />
Similarly, you can make an older epoch your personal default, like so:<br />
<pre><br />
echo "module-version NiaEnv/2018a default" > $HOME/.modulerc<br />
</pre><br />
<br />
No modules are loaded by default on Niagara except NiaEnv.<br />
<br />
=== CCEnv ===<br />
<br />
The same [https://docs.computecanada.ca/wiki/Modules software stack available on Compute Canada's General Purpose clusters] [https://docs.computecanada.ca/wiki/Graham Graham] and [https://docs.computecanada.ca/wiki/Cedar Cedar] can be used on Niagara too, with:<br />
<pre>module load CCEnv</pre><br />
Or, if you want the same default modules loaded as on Béluga, then do<br />
<pre>module load CCEnv StdEnv</pre><br />
or, if you want the same default modules loaded as on Cedar and Graham, do<br />
<pre>module load CCEnv arch/avx2 StdEnv</pre><br />
<br />
== Tips for loading software ==<br />
<br />
* We advise '''''against''''' loading modules in your .bashrc. This can lead to very confusing behaviour under certain circumstances. Our guidelines for .bashrc files can be found [[bashrc guidelines|here]].<br />
* Instead, load modules by hand when needed, or by sourcing a separate script.<br />
* Load run-specific modules inside your job submission script.<br />
* Short names give default versions; e.g. <code>intel</code> → <code>intel/2018.2</code>. It is usually better to be explicit about the versions, for future reproducibility.<br />
* Modules often require other modules to be loaded first. Solve these dependencies by using [[Using_modules#Module_spider | <code>module spider</code>]].<br />
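<br />
One convenient way to follow these guidelines is to collect your module commands in a small separate script that you source when needed. A minimal sketch, in which the file name "setup_env.sh" is just an example:<br />
<source lang="bash">
# Create a reusable environment script with pinned module versions.
cat > setup_env.sh <<'EOF'
module load NiaEnv/2019b
module load intel/2019u4 openmpi/4.0.1
EOF
# Load the modules by hand, or from a job script, with:
#   source setup_env.sh
</source><br />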
<br />
= Available compilers and interpreters =<br />
<br />
* For most compiled software, one should use the Intel compilers (<tt>icc</tt> for C, <tt>icpc</tt> for C++, and <tt>ifort</tt> for Fortran). Loading an <tt>intel</tt> module makes these available. <br />
* The GNU compiler suite (<tt>gcc, g++, gfortran</tt>) is also available, if you load one of the <tt>gcc</tt> modules.<br />
* To compile MPI code, you must additionally load an <tt>openmpi</tt> or <tt>intelmpi</tt> module.<br />
* Open source interpreted, interactive software is also available:<br />
** [[Python]]<br />
** [[R]]<br />
** Julia<br />
** [[Octave]]<br />
<br />
Please visit the corresponding page for details on using these tools. For information on running MATLAB applications on Niagara, visit [[MATLAB| this page]].<br />
<br />
= Using Commercial Software =<br />
<br />
May I use commercial software on Niagara?<br />
* Possibly, but you have to bring your own license for it. You can connect to an external license server using [[SSH_Tunneling | ssh tunneling]].<br />
* SciNet and Compute Canada have an extremely large and broad user base of thousands of users, so we cannot provide licenses for everyone's favorite software.<br />
* Thus, the only freely available commercial software installed on Niagara is software that can benefit everyone: Compilers, math libraries and debuggers.<br />
* That means no [[MATLAB]], Gaussian, IDL, and so on.<br />
* Open source alternatives like Octave, [[Python]], and [[R]] are available.<br />
* We are happy to help you to install commercial software for which you have a license.<br />
* In some cases, if you have a license, you can use software in the Compute Canada stack.<br />
The list of commercial software which is installed on Niagara, for which you will need a license to use, can be found on the [[Commercial_software | commercial software page]].<br />
<br />
= Compiling on Niagara: Example =<br />
<br />
Suppose one wants to compile an application from two C source files, appl.c and module.c, which use the Math Kernel Library. This is an example of how this would be done:<br />
<source lang="bash"><br />
nia-login07:~$ module load NiaEnv/2019b<br />
nia-login07:~$ module list<br />
Currently Loaded Modules:<br />
1) NiaEnv/2019b (S)<br />
Where:<br />
S: Module is Sticky, requires --force to unload or purge<br />
<br />
nia-login07:~$ module load intel/2019u4<br />
<br />
nia-login07:~$ ls<br />
appl.c module.c<br />
<br />
nia-login07:~$ icc -c -O3 -xHost -o appl.o appl.c<br />
nia-login07:~$ icc -c -O3 -xHost -o module.o module.c<br />
nia-login07:~$ icc -o appl module.o appl.o -mkl<br />
<br />
nia-login07:~$ ./appl<br />
</source><br />
Note:<br />
* The optimization flags -O3 -xHost allow the Intel compiler to use instructions specific to the CPU architecture that is present (instead of generating code for more generic x86_64 CPUs).<br />
* Linking with the Intel Math Kernel Library (MKL) is easy when using the Intel compiler; it just requires the -mkl flag.<br />
* If compiling with gcc, the optimization flags would be -O3 -march=native. To link with the MKL when using gcc, it is suggested to use the [https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor MKL link line advisor].<br />
<br />
= Testing and Debugging =<br />
<br />
You should test your code before submitting it to the cluster, both to check that it is correct and to find out what resources it needs.<br />
* Small test jobs can be run on the login nodes. Rule of thumb: tests should run no more than a couple of minutes, taking at most about 1-2GB of memory, and use no more than a couple of cores.<br />
* You can run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after <code>module load ddt</code>.<br />
* For short tests that do not fit on a login node, or for which you need a dedicated node, request an interactive debug job with the debugjob command:<br />
nia-login07:~$ debugjob --clean N<br />
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; if N=4 (the maximum), it gives you 22 minutes. The <tt>--clean</tt> argument is optional but recommended, as it starts the session without any modules loaded, thus mimicking more closely what happens when you submit a job script.<br />
<br />
Finally, if your debugjob process takes more than 1 hour, you can request an interactive job from the regular queue using the salloc command. Note, however, that this may take some time to run, since it will be part of the regular queue, and will be run when the scheduler decides.<br />
nia-login07:~$ salloc --nodes N --time=M:00:00 --x11<br />
where N is again the number of nodes, and M is the number of hours you wish the job to run.<br />
The <tt>--x11</tt> flag is required if you need to use graphics while testing your code through salloc, e.g. when using a debugger such as [[Parallel Debugging with DDT|DDT]] or DDD. See the [[Testing_With_Graphics | Testing with graphics]] page for the options in that case.<br />
<br />
= Submitting jobs =<br />
<br />
<!-- == Progressive approach to run jobs on niagara == --><br />
<!-- We would like to emphasize the need for users to adopt a more progressive and explicit approach for testing, running and scaling up of jobs on niagara. [[Progressive_Approach | '''Here is a set of steps we suggest that you follow.''']] --><br />
<br />
Once you have compiled and tested your code or workflow on the Niagara login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. Your jobs will run on some of Niagara's 1548 compute nodes. When and where your job runs is determined by the scheduler.<br />
<br />
Niagara uses SLURM as its job scheduler. More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
nia-login07:scratch$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course. Note that you must submit your job from a login node. You cannot submit jobs from the datamover nodes.<br />
<br />
In most cases, you should not submit from your $HOME directory, but rather, from your $SCRATCH directory, so that the output of your compute job can be written out (as mentioned above, $HOME is read-only on the compute nodes).<br />
<br />
Jobs will run under your group's RRG allocation, or, if your group has none, under a RAS allocation (previously called a "default" allocation).<br />
<br />
Some example job scripts can be found below.<br />
<br />
Keep in mind:<br />
* Scheduling is by node, so in multiples of 40 cores.<br />
* Your job's maximum walltime is 24 hours. <br />
* Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access.<br />
* Your job script will not remember the modules you have loaded, so it needs to contain "module load" commands for all the required modules (see examples below).<br />
* [[Data_Management#Moving_data | Move your data]] to Niagara before you submit your job.<br />
<br />
== Scheduling by Node ==<br />
<br />
On many systems that use SLURM, the scheduler will deduce from the specified number of tasks and cpus per task what resources should be allocated. On Niagara, things are a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.<br />
** No other users are running anything on them.<br />
** You can [[SSH]] into them to see how things are going.<br />
* Whatever your requests to the scheduler, it will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes and 202GB is the amount of memory on the node.<br />
* If you run serial jobs you must still use all 40 cores on the node. Visit the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page for examples of how to do this.<br />
* Since there are 40 cores per node, your job should use N x 40 cores. If you do not, we will contact you to help you optimize your workflow. Or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.<br />
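<br />
A minimal sketch of bundling serial work onto one node with shell background processes follows; the <tt>echo</tt> command is a stand-in for a real serial program such as <tt>./serial_code input.$i</tt>, and the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page describes more robust approaches (e.g. GNU Parallel).<br />
<source lang="bash">
NTASKS=40                      # one serial task per core on a Niagara node
for i in $(seq 1 $NTASKS); do
  echo "result of task $i" > "task_$i.out" &   # stand-in for a real serial command
done
wait   # the job script must not exit before all background tasks finish
</source><br />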
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Limit on Running jobs<br />
!Limit on Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs ||compute || 50 || 1000 || 1 node (40&nbsp;cores) || default:&nbsp;20&nbsp;nodes&nbsp;(800&nbsp;cores) <br> with&nbsp;allocation:&nbsp;1000&nbsp;nodes&nbsp;(40000&nbsp;cores)|| 15 minutes || 24 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40&nbsp;cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.<br />
<br />
== File Input/Output Tips ==<br />
<br />
It is important to understand the file systems, so as to perform your file I/O (Input/Output) responsibly. Refer to the [[Data_Management | Data Management]] page for details about the file systems.<br />
* Your files can be seen on all Niagara login and compute nodes.<br />
* $HOME, $SCRATCH, and $PROJECT all use the parallel file system called GPFS.<br />
* GPFS is a high-performance file system which provides rapid reads and writes to large data sets in parallel from many nodes.<br />
* Accessing data sets which consist of many, small files leads to poor performance on GPFS.<br />
* Avoid reading and writing lots of small amounts of data to disk. Many small files on the system waste space and are slower to access, read and write. If you must write many small files, use [[User_Ramdisk | ramdisk]].<br />
* Write data out in a binary format. This is faster and takes less space.<br />
* The [[Burst Buffer]] is another option for I/O heavy-jobs and for speeding up [[Checkpoints|checkpoints]].<br />
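<br />
For example, a workflow with many small input files can bundle them into a single archive on GPFS, and have the job unpack that archive to ramdisk before computing. This is a hedged sketch: the file names are placeholders, and <tt>/dev/shm</tt> is assumed as the ramdisk location (see the [[User_Ramdisk | ramdisk]] page).<br />
<source lang="bash">
# Done once, e.g. on a login node: bundle the small files into one archive.
echo "tiny input" > input_0001.txt     # placeholder for many small files
tar -cf inputs.tar input_0001.txt      # in practice: tar -cf inputs.tar input_*.txt
# In the job script: one large sequential read from GPFS, then fast
# small-file I/O on the node-local ramdisk.
WORK="/dev/shm/${USER:-demo}/stage"
mkdir -p "$WORK"
tar -xf inputs.tar -C "$WORK"
</source><br />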
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name=mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load NiaEnv/2019b<br />
module load intel/2019u4<br />
module load openmpi/4.0.1<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script from your scratch directory with the command:<br />
<br />
nia-login07:scratch$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li>First line indicates that this is a bash script.</li><br />
<li>Lines starting with <code>#SBATCH</code> go to SLURM.</li><br />
<li>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>)</li><br />
<li>In this case, SLURM looks for 2 nodes each running 40 tasks (for a total of 80 tasks), for 1 hour</li><br />
<li>Note that the mpirun flag "--ppn" (processes per node) is ignored.</li><br />
<li>Once it finds such nodes, it runs the script:<br />
<ul><br />
<li>Change to the submission directory;</li><br />
<li>Loads modules;</li><br />
<li>Runs the <code>mpi_example</code> application (SLURM will inform mpirun or srun on how many processes to run).<br />
</li><br />
</ul><br />
<li>To use hyperthreading, just change <code>--ntasks-per-node=40</code> to <code>--ntasks-per-node=80</code>, and add <code>--bind-to none</code> to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
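<br />
For reference, the hyperthreaded variant described in the last point changes only these lines of the example script (shown as a sketch for the OpenMPI case):<br />
<source lang="bash">
#SBATCH --ntasks-per-node=80          # use all 80 hardware threads per node

mpirun --bind-to none ./mpi_example   # --bind-to none is needed for OpenMPI only
</source><br />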
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name=openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load NiaEnv/2019b<br />
module load intel/2019u4<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script from your scratch directory with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* First line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>openmp_job</code>) .<br />
* In this case, SLURM looks for one node on which to run one task with 40 cores, for 1 hour.<br />
* Once it finds such a node, it runs the script:<br />
** Change to the submission directory;<br />
** Loads modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.<br />
<br />
== Monitoring queued jobs ==<br />
<br />
Once the job is incorporated into the queue, there are some commands you can use to monitor its progress.<br />
<br />
<ul><br />
<li><p><code>squeue</code> or <code>sqc</code> (a caching version of squeue) to show the job queue (<code>squeue -u $USER</code> for just your jobs);</p></li><br />
<li><p><code>squeue -j JOBID</code> to get information on a specific job</p><br />
<p>(alternatively, <code>scontrol show job JOBID</code>, which is more verbose).</p></li><br />
<li><p><code>squeue --start -j JOBID</code> to get an estimate for when a job will run; these tend not to be very accurate predictions.</p></li><br />
<li><p><code>scancel -i JOBID</code> to cancel the job.</p></li><br />
<li><p><code>jobperf JOBID</code> to get an instantaneous view of the cpu and memory usage of the nodes of the job while it is running.</p></li><br />
<li><p><code>sacct</code> to get information on your recent jobs.</p></li><br />
</ul><br />
<br />
Further instructions for monitoring your jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]]. The [https://my.scinet.utoronto.ca my.SciNet] site is also a very useful tool for monitoring your current and past usage.<br />
<br />
= Visualization =<br />
Information about how to use visualization tools on Niagara is available on [[Visualization]] page.<br />
<br />
= Support =<br />
<br />
* [mailto:support@scinet.utoronto.ca support@scinet.utoronto.ca]<br />
* [mailto:niagara@computecanada.ca niagara@computecanada.ca]</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=SSH_Tunneling&diff=3133SSH Tunneling2021-06-29T20:04:56Z<p>Bmundim: Change datamovers IPs.</p>
<hr />
<div>=What is SSH tunneling?=<br />
SSH tunnelling is a method to use a gateway computer to connect two<br />
computers that cannot connect directly.<br />
<br />
SSH tunneling is necessary in certain cases, because compute nodes on [[Niagara Quickstart|Niagara]] do not have direct access to<br />
the internet, nor can the compute nodes be contacted directly from the internet.<br />
<br />
The following use cases require SSH tunnels:<br />
<br />
# Running commercial software on a compute node that needs to contact a license server over the internet.<br />
# Running visualization software on a compute node that needs to be contacted by client software on a user's local computer.<br />
# Running a Jupyter notebook on a compute node that needs to be contacted by the web browser on a user's local computer.<br />
# Connecting to the Cedar database server from somewhere other than the Cedar head node, e.g., your desktop.<br />
<br />
In the first case, the license server is situated outside of<br />
the compute cluster and is rarely under a user's control, whereas<br />
in the other cases, the server is on the compute node but the<br />
challenge is to connect to it from the outside. We will therefore<br />
consider these two kinds of cases separately.<br />
<br />
= Contacting a license server from a compute node =<br />
<br />
Certain commercially-licensed programs must connect to a license server machine <br />
somewhere on the internet via a predetermined port. If the compute node where <br />
the program is running has no access to the internet, then a ''gateway server'' <br />
which does have access must be used to forward communications, on that port, <br />
from the compute node to the license server. To enable this one must set up <br />
an ''SSH tunnel''. Such an arrangement is also called ''port forwarding''.<br />
<br />
In most cases, creating an SSH tunnel in a batch job requires just two or <br />
three commands in your job script. You will need the following information:<br />
<br />
# The IP address, or the name, of the license server. Let's call this LICSERVER.<br />
# The port number of the license service. Let's call this LICPORT. <br />
<br />
You should obtain this information from whoever maintains the license server.<br />
That server also must allow connections from the login nodes; for<br />
Niagara, the outgoing IP address will either be 142.1.174.227 or 142.1.174.228.<br />
<br />
With this information, one can now setup the SSH tunnel. <br />
<br />
The gateway server on Niagara is called nia-gw. <br />
<br />
You need to choose a port number to use on the compute node; let's call it COMPUTEPORT.<br />
<br />
The ssh command to issue in the job script is then:<br />
<br />
<source lang="bash"><br />
ssh nia-gw -L COMPUTEPORT:LICSERVER:LICPORT -n -N -f<br />
</source><br />
<br />
In this command, the string following the -L parameter specifies the port forwarding information, the parameter -n prevents ssh from reading input (it could not in a compute job anyway), the<br />
parameter -N tells ssh not to open a shell on the gateway, and the<br />
parameter -f tells ssh to run in the background, allowing the job<br />
script to proceed past this ssh command.<br />
<br />
A further command to add to the job script should tell the software<br />
that the license server is on port COMPUTEPORT on the server<br />
'localhost'. Here, 'localhost' is not a placeholder, rather, it is the literal name<br />
to use - 'localhost' is a standard host name pseudonym by which a<br />
computer can refer to itself. Exactly how to inform your software to use this port on 'localhost' will<br />
depend on the specific application and the type of license server,<br />
but often it is simply a matter of setting an environment variable in<br />
the job script like<br />
<br />
<source lang="bash"><br />
export MLM_LICENSE_FILE=COMPUTEPORT@localhost<br />
</source><br />
<br />
== Example job script==<br />
<br />
The following job script sets up an ssh tunnel to contact a<br />
license server licenseserver.institution.ca at port 9999:<br />
<br />
<source lang="bash"><br />
#!/bin/bash<br />
#SBATCH --nodes 1<br />
#SBATCH --ntasks 40<br />
#SBATCH --time 3:00:00<br />
<br />
ssh nia-gw -L 9999:licenseserver.institution.ca:9999 -N -f<br />
export MLM_LICENSE_FILE=9999@localhost<br />
<br />
module load thesoftware/2.0<br />
mpirun thesoftware ..... <br />
</source><br />
<br />
= Contacting a visualization, Jupyterhub, database or other server running on a compute node =<br />
<br />
SSH tunnelling can also be used in the context of Compute Canada to allow a user's computer to connect to a compute node on a cluster through an encrypted tunnel that is routed via the login node of that cluster. This technique allows graphical output of applications like a Jupyter notebook or visualization software to be displayed transparently on the user's local workstation even while they are running on a compute node. When connecting to a database server that is reachable only through the head node, SSH tunnelling can be used to forward an arbitrary port from the compute network to the head node of the cluster and bind it to the database server.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Rouge&diff=3017Rouge2021-05-07T20:02:59Z<p>Bmundim: /* Single-GPU job script */</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[File:Amd1.jpeg|center|300px|thumb]] <br />
|name=Rouge<br />
|installed=March 2021<br />
|operatingsystem= Linux (Centos 7.6)<br />
|loginnode= rouge-login01<br />
|nnodes=20 <br />
|gpuspernode=8 MI50-32GB<br />
|rampernode=512 GB<br />
|corespernode=48 <br />
|interconnect=Infiniband (2xEDR)<br />
|vendorcompilers=rocm/gcc<br />
|queuetype=slurm<br />
}}<br />
<br />
= Specifications=<br />
<br />
The Rouge cluster was donated to the University of Toronto by AMD as part of their [https://www.amd.com/en/corporate/hpc-fund#:~:text=The%20goal%20of%20the%20AMD,potential%20threats%20to%20global%20health COVID-19 HPC Fund ] support program. The cluster consists of 20 x86_64 nodes each with a single AMD EPYC 7642 48-Core CPU running at 2.3GHz with 512GB of RAM and 8 Radeon Instinct MI50 GPUs per node.<br />
<br />
The nodes are interconnected with 2xHDR100 Infiniband for internode communications and disk I/O to the SciNet Niagara filesystems. In total this cluster contains 960 CPU cores and 160 GPUs. <br />
<br />
Access and support requests should be sent to '''support@scinet.utoronto.ca'''.<br />
<br />
= Getting started on Rouge =<br />
<br />
<!-- <br />
Rouge can be accessed directly.<br />
<pre><br />
ssh -Y MYCCUSERNAME@rouge.scinet.utoronto.ca<br />
--><br />
<br />
<br />
Rouge login node '''rouge-login01''' can be accessed via the Niagara cluster.<br />
<pre><br />
ssh -Y MYCCUSERNAME@niagara.scinet.utoronto.ca<br />
ssh -Y rouge-login01<br />
</pre><br />
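If you log in often, the two-step hop can be configured as a single command in your local <tt>~/.ssh/config</tt> using the standard OpenSSH <tt>ProxyJump</tt> option. This is an optional sketch; the host aliases <tt>niagara</tt> and <tt>rouge</tt> are just illustrative names:<br />
<pre><br />
Host niagara<br />
    HostName niagara.scinet.utoronto.ca<br />
    User MYCCUSERNAME<br />
<br />
Host rouge<br />
    HostName rouge-login01<br />
    User MYCCUSERNAME<br />
    ProxyJump niagara<br />
</pre><br />
With this in place, <tt>ssh rouge</tt> on your workstation tunnels through the Niagara login node automatically.<br />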
<br />
== Storage ==<br />
<br />
The filesystem for Rouge is currently shared with Niagara cluster. See [https://docs.scinet.utoronto.ca/index.php/Niagara_Quickstart#Your_various_directories Niagara Storage] for more details.<br />
<br />
= Loading software modules =<br />
<br />
You have two options for running code on Rouge: use existing software, or compile your own. This section focuses on the former.<br />
<br />
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be [[Using_modules | found on the modules page]].<br />
<br />
Common module subcommands are:<br />
<br />
* <code>module load <module-name></code>: load the default version of a particular software.<br />
* <code>module load <module-name>/<module-version></code>: load a specific version of a particular software.<br />
* <code>module purge</code>: unload all currently loaded modules.<br />
* <code>module spider</code> (or <code>module spider <module-name></code>): list available software packages.<br />
* <code>module avail</code>: list loadable software packages.<br />
* <code>module list</code>: list loaded modules.<br />
<br />
Along with modifying common environment variables, such as PATH, and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.<br />
<br />
There are handy abbreviations for the module commands. <code>ml</code> is the same as <code>module list</code>, and <code>ml <module-name></code> is the same as <code>module load <module-name></code>.<br />
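For example, a short session combining these commands might look as follows. The <tt>gcc/10.3.0</tt> module is one listed below; the <tt>SCINET_GCC_ROOT</tt> variable follows the naming pattern described above, but the exact contents of that directory are illustrative:<br />
<pre><br />
ml gcc/10.3.0                # same as: module load gcc/10.3.0<br />
ml                           # same as: module list<br />
echo $SCINET_GCC_ROOT        # installation prefix of the loaded gcc module<br />
ls $SCINET_GCC_ROOT/include $SCINET_GCC_ROOT/lib<br />
module purge                 # unload everything again<br />
</pre><br />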
<br />
= Available compilers and interpreters =<br />
<br />
* The <tt>Rocm</tt> module has to be loaded first for GPU software.<br />
* To compile mpi code, you must additionally load an <tt>openmpi</tt> module.<br />
<br />
=== ROCm ===<br />
<br />
The currently installed ROCm Toolkit is '''4.1.0'''.<br />
<pre><br />
module load rocm/<version><br />
</pre><br />
*A compiler (GCC or rocm-clang) module must be loaded in order to use ROCm to build any code.<br />
<br />
The current AMD driver version is 5.9.15. Use '''rocm-smi -a''' for full details.<br />
<br />
=== Compilers ===<br />
<br />
Available compiler modules are:<br />
<pre><br />
gcc/10.3.0<br />
rocm-clang/4.1.0<br />
hipify-clang/12.0.0<br />
aocc/3.0.0<br />
</pre><br />
<br />
=== OpenMPI ===<br />
The <tt>openmpi/<version></tt> module is available with different compilers.<br />
<br />
= Software =<br />
<br />
== Singularity Containers ==<br />
<pre><br />
/scinet/rouge/amd/containers/gromacs.rocm401.ubuntu18.sif<br />
/scinet/rouge/amd/containers/lammps.rocm401.ubuntu18.sif<br />
/scinet/rouge/amd/containers/namd.rocm401.ubuntu18.sif<br />
/scinet/rouge/amd/containers/openmm.rocm401.ubuntu18.sif<br />
</pre><br />
<br />
== GROMACS ==<br />
The HIP version of GROMACS 2020.3 (which performs better than the OpenCL version) is provided by AMD in a container. Currently it is suggested to use a single GPU for all simulations.<br />
Job example:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --time=1:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --gpus-per-node=1<br />
<br />
export SINGULARITY_HOME=$SLURM_SUBMIT_DIR<br />
<br />
singularity exec -B /home -B /scratch --env OMP_PLACES=cores /scinet/rouge/amd/containers/gromacs.rocm401.ubuntu18.sif gmx mdrun -pin off -ntmpi 1 -ntomp 6 ......<br />
<br />
# Setting '-ntomp 4' may give better performance; run your own benchmark. Do not set it larger than 6 for a single-GPU job.<br />
# If you see the warning 'GPU update with domain decomposition lacks substantial testing and should be used with caution.', add '-update cpu' to override.<br />
</pre><br />
<br />
== NAMD ==<br />
The HIP version of NAMD (3.0a) is provided by AMD in a container. Currently it is suggested to use a single GPU for all simulations.<br />
Job example:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --time=1:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --gpus-per-node=1<br />
<br />
export SINGULARITY_HOME=$SLURM_SUBMIT_DIR<br />
<br />
singularity exec -B /home -B /scratch --env LD_LIBRARY_PATH=/opt/rocm/lib:/.singularity.d/libs /scinet/rouge/amd/containers/namd.rocm401.ubuntu18.sif namd2 +idlepoll +p 12 stmv.namd<br />
# Do not set the +p flag larger than 12; there are only 6 cores (12 threads) per single-GPU job.<br />
</pre><br />
<br />
== PyTorch ==<br />
Install PyTorch into a python virtual environment:<br />
<pre><br />
module load python gcc<br />
mkdir -p ~/.virtualenvs<br />
virtualenv --system-site-packages ~/.virtualenvs/pytorch-rocm<br />
source ~/.virtualenvs/pytorch-rocm/bin/activate<br />
pip3 install torch -f https://download.pytorch.org/whl/rocm4.0.1/torch_stable.html<br />
pip3 install ninja && pip3 install 'git+https://github.com/pytorch/vision.git@v0.9.1'<br />
</pre><br />
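Once installed, you can check from an interactive (debug) session that the ROCm build of PyTorch sees a GPU. Note that ROCm builds of PyTorch reuse the <tt>torch.cuda</tt> interface, so the check is the same as on NVIDIA systems:<br />
<pre><br />
module load python gcc<br />
source ~/.virtualenvs/pytorch-rocm/bin/activate<br />
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"<br />
</pre><br />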
Run PyTorch job with single GPU:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --time=1:00:00<br />
#SBATCH --nodes=1<br />
#SBATCH --gpus-per-node=1<br />
<br />
module load python gcc<br />
source ~/.virtualenvs/pytorch-rocm/bin/activate<br />
python code.py<br />
</pre><br />
<br />
= Testing and debugging =<br />
<br />
Test your code before submitting it to the cluster, both to verify that it is correct and to determine what resources it needs.<br />
* Small test jobs can be run on the login node. Rule of thumb: tests should run no more than a couple of minutes, use at most about 1-2 GB of memory, and use no more than one GPU and a few cores.<br />
<br />
* For short tests that do not fit on a login node, or for which you need a dedicated node, request an interactive debug job with the <tt>debugjob</tt> command:<br />
<br />
<pre><br />
rouge-login01:~$ debugjob --clean -g G<br />
</pre> <br />
<br />
where G is the number of gpus. If G=1, this gives an interactive session for 2 hours, whereas G=4 gets you a node with 4 gpus for 30 minutes, and with G=8 (the maximum) gets you a full node with 8 gpus for 30 minutes. The <tt>--clean</tt> argument is optional but recommended as it will start the session without any modules loaded, thus mimicking more closely what happens when you submit a job script.<br />
<br />
= Submitting jobs =<br />
Once you have compiled and tested your code or workflow on the Rouge login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. Your jobs will run on one of Rouge's 20 compute nodes. When and where your job runs is determined by the scheduler.<br />
<br />
Rouge uses SLURM as its job scheduler. <br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
<pre><br />
rouge-login01:scratch$ sbatch jobscript.sh<br />
</pre><br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course. In most cases, you should not submit from your $HOME directory, but rather, from your $SCRATCH directory, so that the output of your compute job can be written out (as mentioned above, $HOME is read-only on the compute nodes).<br />
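Standard SLURM commands can be used to follow your job after submission; for example (the job ID shown is illustrative):<br />
<pre><br />
rouge-login01:scratch$ sbatch jobscript.sh<br />
Submitted batch job 123456<br />
rouge-login01:scratch$ squeue -u $USER     # show your queued and running jobs<br />
rouge-login01:scratch$ scancel 123456      # cancel the job if needed<br />
</pre><br />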
<br />
Example job scripts can be found below.<br />
Keep in mind:<br />
* Scheduling is by GPU, each with 6 CPU cores.<br />
* Your job's maximum walltime is 24 hours. <br />
* Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access.<br />
* Your job script will not remember the modules you have loaded, so it needs to contain "module load" commands of all the required modules (see examples below).<br />
<br />
== Single-GPU job script ==<br />
For a single-GPU job, each job gets 1/8 of a node: 1 GPU, 6 CPU cores (12 threads), and ~64 GB of CPU memory. '''Users should never request CPUs or memory explicitly.''' If running an MPI program, set --ntasks to the number of MPI ranks. '''Do NOT set --ntasks for non-MPI programs.'''<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --gpus-per-node=1<br />
#SBATCH --time=1:00:0<br />
<br />
module load <modules you need><br />
Run your program<br />
</pre><br />
<br />
== Full-node job script ==<br />
'''If you are not sure the program can be executed on multiple GPUs, please follow the single-GPU job instructions above or contact SciNet support.'''<br />
<br />
A multi-GPU job should request a minimum of one full node (8 GPUs). Users need to specify the "compute_full_node" partition in order to get all resources on a node. <br />
*An example for a 1-node job:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --gpus-per-node=8<br />
#SBATCH --ntasks=8 # this only affects MPI jobs<br />
#SBATCH --time=1:00:00<br />
#SBATCH -p compute_full_node<br />
<br />
module load <modules you need><br />
Run your program<br />
</pre></div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Installing_your_own_Python_Modules&diff=2717Installing your own Python Modules2020-07-16T16:26:32Z<p>Bmundim: /* Usage of your virtual environment by others */ typo</p>
<hr />
<div>There are many optional and conflicting packages for Python that users could potentially want (see e.g. http://pypi.python.org/pypi). Therefore, users need to install these additional packages locally in their home directories. In fact, there is no choice, as users do not have permissions to install packages system-wide.<br />
<br />
Python provides a number of ways to install packages, the most common of which are the <tt>pip</tt> and <tt>conda</tt> commands. By default, these commands would install in the same directory as the one in which the python executable lives, <br />
but python provides a number of ways for users to install libraries in their home directories instead. <br />
<br />
One way to do this is with <tt>pip</tt> using the <tt>--user</tt> option, but that approach is now mostly superseded by virtual environments.<br />
<br />
Virtual environments are a standard in Python to create isolated Python environments. This is useful when certain modules or certain versions of modules are not available in the default python environment.<br />
<br />
Virtual environments can be used either with the [[Python#Regular_Python | regular python modules]] or the [[Python#Intel_Python | intelpython/anaconda]] modules.<br />
<br />
== Using Virtualenv in Regular Python ==<br />
<br />
===Creation===<br />
First load a python module:<br />
<br />
module load NiaEnv/2019b python/3.6.8<br />
<br />
Then create a directory for the virtual environments.<br />
One can put a virtual environment anywhere, but this directory structure is recommended:<br />
<br />
mkdir ~/.virtualenvs<br />
cd ~/.virtualenvs<br />
<br />
Now we create our first virtualenv, called <code>myenv</code> (choose any name you like):<br />
<br />
virtualenv --system-site-packages ~/.virtualenvs/myenv<br />
<br />
The "--system-site-packages" flag will use the system-installed versions of packages rather than installing them anew (the list of these packages can be found on the [[Python]] wiki page). This will result in fewer files created in your virtual environment. After that you can activate that virtual environment:<br />
<br />
source ~/.virtualenvs/myenv/bin/activate <br />
<br />
As you are in the virtualenv now, you can just type <code>pip install <required module></code> to install any module into your virtual environment. <br />
<br />
To go back to the normal python installation simply type <br />
<br />
deactivate<br />
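Putting the steps above together, a complete session might look as follows (the package <tt>requests</tt> is just an example):<br />
<br />
 module load NiaEnv/2019b python/3.6.8<br />
 virtualenv --system-site-packages ~/.virtualenvs/myenv<br />
 source ~/.virtualenvs/myenv/bin/activate<br />
 pip install requests<br />
 python -c "import requests; print(requests.__version__)"<br />
 deactivate<br />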
<br />
===Command line and job usage===<br />
<br />
You need to activate the appropriate environment every time you log in, and at the start of all your jobs scripts. However, the installation of packages only needs to be done once. In the NiaEnv/2019b stack, it is *not* necessary to load the python module before activating the environment, while in the NiaEnv/2018a stack, you need to load the python module before activating the environment. <br />
<br />
===Usage of your virtual environment by others===<br />
<br />
Sharing a virtual environment with another user is easy. As long as the directory containing the virtual environment is readable by that other user (which on Niagara is the default when that user is in the same group as the directory), then they simply have to source the activate file in the bin directory of that environment, e.g.<br />
<br />
source /home/g/group/user/.virtualenvs/myenv/bin/activate<br />
<br />
===Usage in the Jupyter Hub===<br />
<br />
You can use your virtual environment in Niagara's [[Jupyter_Hub]], but there are a few additional steps required to get the JupyterHub to know about your environment and to make it available as one of its possible "kernels" for new notebooks.<br />
<br />
After having activated your environment, execute the following commands:<br />
<br />
pip install ipykernel<br />
python -m ipykernel install --name NAME --user<br />
venv2jup<br />
<br />
The first command installs the package needed to interface with jupyter as a kernel, the second puts an entry in the <tt>.share/jupyter</tt> directory, in which the jupyterhub looks for possible kernels. The final command corrects some paths and checks that everything is set up properly. This procedure works for NiaEnv/2019b, but may fail for NiaEnv/2018a.<br />
<br />
For conda environments that were installed in .conda/venv, the jupyter notebook should pick them up automatically.<br />
<br />
== Using Virtual Environments in Intelpython/Anaconda ==<br />
<br />
===Creation===<br />
<br />
One can use the same kind of virtual environments for the intelpython and conda modules as for regular modules. However,<br />
environments are built into Anaconda, see [https://conda.io/docs/user-guide/tasks/manage-environments.html]. These "conda environments" are not the same as regular virtual environments, as they can contain general packages, such as compilers. The latter feature means that conda environments are much more flexible, but also that they do not cooperate well with other software modules on Niagara.<br />
<br />
First, you just need to load a conda-like module, e.g.<br />
<br />
module load NiaEnv/2019b intelpython3<br />
<br />
Then, you create a virtual environment<br />
<br />
conda create -n myPythonEnv python=3.6<br />
<br />
(conda puts the environment in the directory <tt>$HOME/.conda/venv/myPythonEnv</tt>)<br />
<br />
Next, you activate your conda environment:<br />
<br />
source activate myPythonEnv<br />
<br />
At this point you are in your own environment and can just do the installation of any package that you need, e.g.<br />
<br />
pip install myFAVpackage<br />
<br />
or<br />
conda install myFAVpackage<br />
<br />
To go back to the normal python installation, type <br />
<br />
source deactivate<br />
<br />
===Command line and job usage===<br />
<br />
You need to load the intelpython/anaconda module and activate the appropriate environment every time you log in, and at the start of all your jobs scripts. However, the installation of packages only needs to be done once. <br />
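For example, a job script using a conda environment might start like this (a sketch; adapt the module and environment names to your own setup):<br />
<br />
 #!/bin/bash<br />
 #SBATCH --nodes=1<br />
 #SBATCH --time=1:00:00<br />
 <br />
 module load NiaEnv/2019b intelpython3<br />
 source activate myPythonEnv<br />
 python code.py<br />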
<br />
===Usage in the Jupyter Hub===<br />
<br />
You can use conda environments in Niagara's [[Jupyter_Hub]]. If they were installed in <tt>.conda/venv</tt>, the jupyter notebook should pick them up automatically.<br />
<br />
==Installing the Scientific Python Suite==<br />
<br />
For many scientific codes the packages ''numpy'', ''scipy'', ''matplotlib'', ''pandas'' and ''ipython'' are used. Versions of these are already in the python modules (except for the regular python modules in the NiaEnv/2018a stack).<br />
<br />
However, if you need different versions, you could start your virtual environment without <tt>--system-site-packages</tt>. In that case, for regular python modules, please install versions of package with an <tt>intel-</tt> prefix, if they exists, so that you will get the most optimized version of the package.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Installing_your_own_Python_Modules&diff=2716Installing your own Python Modules2020-07-16T16:21:25Z<p>Bmundim: /* Creation */ typo</p>
<hr />
<div>There are many optional and conflicting packages for Python that users could potentially want (see e.g. http://pypi.python.org/pypi). Therefore, users need to install these additional packages locally in their home directories. In fact, there is no choice, as users do not have permissions to install packages system-wide.<br />
<br />
Python provides a number of ways to install packages, the most common of which are the <tt>pip</tt> and <tt>conda</tt> commands. By default, these commands would install in the same directory as the one in which the python executable lives, <br />
but python provides a number of ways for users to install libraries in their home directories instead. <br />
<br />
One way to do this with <tt>pip</tt> using the <tt>--user</tt> option, but that approach is now mostly superseded by virtual environments.<br />
<br />
Virtual environments are a standard in Python to create isolated Python environments. This is useful when certain modules or certain versions of modules are not available in the default python environment.<br />
<br />
Virtual environments can be used either with the [[Python#Regular_Python | regular python modules]] or the [[Python#Intel_Python | intelpython/anaconda]] modules.<br />
<br />
== Using Virtualenv in Regular Python ==<br />
<br />
===Creation===<br />
First load a python module:<br />
<br />
module load NiaEnv/2019b python/3.6.8<br />
<br />
Then create a directory for the virtual environments.<br />
One can put a virtual environment anywhere, but this directory structure is recommended:<br />
<br />
mkdir ~/.virtualenvs<br />
cd ~/.virtualenvs<br />
<br />
Now we create our first virtualenv called <code>myEnv</code> choose any name you like:<br />
<br />
virtualenv --system-site-packages ~/.virtualenvs/myenv<br />
<br />
The "--system-site-packages" flag will use the system-installed versions of packages rather than installing them anew (the list of these packages can be found on the [[Python]] wiki page). This will result in fewer files created in your virtual environment. After that you can activate that virtual environment:<br />
<br />
source ~/.virtualenvs/myenv/bin/activate <br />
<br />
As you are in the virtualenv now, you can just type <code>pip install <required module></code> to install any module into your virtual environment. <br />
<br />
To go back to the normal python installation simply type <br />
<br />
deactivate<br />
<br />
===Command line and job usage===<br />
<br />
You need to activate the appropriate environment every time you log in, and at the start of all your jobs scripts. However, the installation of packages only needs to be done once. In the NiaEnv/2019b stack, it is *not* necessary to load the python module before activating the environment, while in the NiaEnv/2018a stack, you need to load the python module before activating the environment. <br />
<br />
===Usage of your virtual environment by others===<br />
<br />
Sharing a virtual environment with another user is easy. As long as the directory containing the virtual environment is readable by that other user (which on Niagara is the default when that user is in the same group as the directory), then they simple have to source the activate file in the bin directory of that environment, e.g.<br />
<br />
source /home/g/group/user/.virtualenvs/myenv/bin/activate <br />
<br />
===Usage in the Jupyter Hub===<br />
<br />
You can use your virtual environment in Niagara's [[Jupyter_Hub]], but there are two additional steps required to get the JupterHub to know about your environment and to make it as one of its possible "kernels" for new notebooks.<br />
<br />
After having activated your environment, execute the following two commands<br />
<br />
pip install ipykernel<br />
python -m ipykernel install --name NAME --user<br />
venv2jup<br />
<br />
The first installs the packages needed to interface with jupyter as a kernel, the latter puts an entry in the <tt>.share/jupyter</tt> directory, in which the jupyterhub looks for possible kernels. The final command corrects some paths and checks if all is setup properly. This procedure works for NiaEnv/2019b, but may fail for NiaEnv/2018a.<br />
<br />
For conda environments that were installed in .conda/venv, the jupyter notebook should pick them up automatically.<br />
<br />
== Using Virtual Environments in Intelpython/Anaconda ==<br />
<br />
===Creation===<br />
<br />
One can use the same kind of virtual environments for the intelpython and conda modules as for regular modules. However, environment support is also built into Anaconda, see [https://conda.io/docs/user-guide/tasks/manage-environments.html]. These "conda environments" are not the same as regular virtual environments: they can contain general packages, such as compilers. This makes conda environments much more flexible, but it also means they do not cooperate well with other software modules on Niagara.<br />
<br />
First, you just need to load a conda-like module, e.g.<br />
<br />
module load NiaEnv/2019b intelpython3<br />
<br />
Then, you create a virtual environment<br />
<br />
conda create -n myPythonEnv python=3.6<br />
<br />
(conda puts the environment in the directory <tt>$HOME/.conda/venv/myPythonEnv</tt>)<br />
<br />
Next, you activate your conda environment:<br />
<br />
source activate myPythonEnv<br />
<br />
At this point you are in your own environment and can just do the installation of any package that you need, e.g.<br />
<br />
pip install myFAVpackage<br />
<br />
or<br />
conda install myFAVpackage<br />
<br />
To go back to the normal python installation, type <br />
<br />
source deactivate<br />
<br />
===Command line and job usage===<br />
<br />
You need to load the intelpython/anaconda module and activate the appropriate environment every time you log in, and at the start of all your jobs scripts. However, the installation of packages only needs to be done once. <br />
<br />
===Usage in the Jupyter Hub===<br />
<br />
You can use conda environments in Niagara's [[Jupyter_Hub]]. If they were installed in .conda/venv, the jupyter notebook should pick them up automatically.<br />
<br />
==Installing the Scientific Python Suite==<br />
<br />
For many scientific codes the packages ''numpy'', ''scipy'', ''matplotlib'', ''pandas'' and ''ipython'' are used. Versions of these are already in the python modules (except for the regular python modules in the NiaEnv/2018a stack).<br />
<br />
However, if you need different versions, you can start your virtual environment without <tt>--system-site-packages</tt>. In that case, for regular python modules, please install versions of packages with an <tt>intel-</tt> prefix, if they exist, so that you get the most optimized version of the package.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Python&diff=2714Python2020-07-16T16:10:22Z<p>Bmundim: /* Intel Python */ typo</p>
<hr />
<div>[http://www.python.org/ Python] is a programming language that continues to grow in popularity for scientific computing. It is very fast to write code in, but the resulting software is much slower than C or Fortran; one should be wary of doing too much compute-intensive work in Python. <br />
__FORCETOC__ <br />
<br />
= Python on Niagara =<br />
<br />
We currently have two families of Python installed on [[Niagara_Quickstart|Niagara]]. <br />
<br />
* Regular Python<br />
* Intel Python (a variant of anaconda)<br />
<br />
Here we describe the differences between these packages.<br />
<br />
Note that it is highly recommended that you use the NiaEnv/2019b stack by loading the corresponding module, i.e.:<br />
<br />
module load NiaEnv/2019b<br />
<br />
If you do not, you are using the 2018a stack whose python setup is less optimal.<br />
<br />
== Regular Python ==<br />
<br />
Python versions 2.7 and 3.6 have been installed from source and are optimized for Niagara. We call these 'regular' python versions because they are not dependent on other distribution mechanisms like (ana)conda. Such distributions do not play well with the rest of the software stack, so the 'regular' python modules should be your first choice.<br />
<br />
In the [[Niagara_Quickstart#Software_stacks:_NiaEnv_and_CCEnv | Niagara Software Stack]] version 2019b, i.e., NiaEnv/2019b, the specific versions are 2.7.15 and 3.6.8, so you can load python 2 or python 3 using<br />
<br />
module load python/2.7.15<br />
module load python/3.6.8<br />
<br />
Both these installations come with the following optimized python packages preinstalled:<br />
<br />
virtualenv<br />
intel-numpy<br />
intel-scipy<br />
intel-scikit-learn<br />
ipp<br />
daal<br />
jinja2<br />
cython <br />
matplotlib<br />
ipython<br />
numba<br />
numexpr<br />
pandas<br />
line_profiler<br />
memory_profiler<br />
funcsigs<br />
pycosat<br />
pyeditline<br />
pyOpenSSL<br />
PySocks<br />
PyYAML<br />
requests <br />
xgboost<br />
<br />
In this list, an <tt>intel-PACKAGE</tt> package provides an Intel-optimized version of <tt>PACKAGE</tt>, often using Intel's high performance Math Kernel Library. You use these packages in python the same way you would the non-optimized versions, i.e., <tt>import PACKAGE</tt>.<br />
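<br />
For instance, code written against stock numpy runs unchanged against intel-numpy. A minimal sketch (it assumes numpy is importable in the active environment):<br />

```python
# intel-numpy is imported as plain "numpy"; switching between the
# optimized and stock builds requires no code changes.
import numpy as np

a = np.arange(6, dtype=float).reshape(2, 3)
b = a.T @ a          # matrix product, dispatched to the optimized BLAS when available
print(b.shape)
```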
<br />
In the previous NiaEnv/2018a stack, the regular python versions did not have these packages, and users needed to install them in their own home directory. This was wasteful in terms of storage and occasionally led to quota issues, so we highly recommend using the NiaEnv/2019b packages.<br />
<br />
<br />
Additional packages in these modules should be installed in virtual environments.<br />
<br />
== Intel Python ==<br />
<br />
The Intel Python modules are based on the Anaconda package, a python distribution that aims to simplify package management. Intel has modified the package and optimized the libraries to use the MKL libraries, which should make them faster than the Anaconda modules for some calculations. These modifications have also been incorporated in the <tt>intel-PACKAGE</tt> packages included in the regular python modules discussed above, but with Intel Python you also get the conda command. You can load the python 2 version or the python 3 version of intel python with<br />
<br />
module load intelpython2<br />
module load intelpython3<br />
<br />
Packages in this module can be installed in so-called conda environments (see below), although virtualenv also works. <br />
<br />
A word of caution: conda environments are very wasteful when it comes to the number of files that they store in your home directory, and there is a good chance you will hit your quota of 250,000 files with only a few conda environments. Moreover, since conda is a package manager in its own right, it does not always work well in combination with the rest of the software stack.<br />
<br />
== Miniconda and Anaconda ==<br />
<br />
If you are looking for anaconda or miniconda, you should find that intelpython is a good substitute. In the NiaEnv/2019b stack, we no longer provide anaconda modules, but we do have the aliases conda2 and conda3 for intelpython2 and intelpython3. <br />
<br />
We advise against installing your own anaconda or miniconda in your home directory. Instead, start from one of the intelpython modules and use conda environments, or, even better, start from a regular python module and create a virtualenv in which you can install your own packages. Installing your own anaconda or miniconda would put many more files in your $HOME directory, which might cause trouble with the quota on the number of files.<br />
<br />
= Installing your own Python Modules =<br />
<br />
If you need to install your own Python modules, either in regular python or with conda, you should set up a virtual or conda environment. Visit the [[Installing your own Python Modules]] page for instructions on how to set this up.<br />
<br />
We urge you to remove any conda or virtual environments that you are not using, to help reduce the number of files on the $HOME file system.<br />
<br />
{{:Installing your own Python Modules}}<br />
<br />
= Running serial Python jobs =<br />
<br />
As with all serial jobs, if your Python computation does not use multiple cores, you should bundle them up so the 40 cores of a node are all performing work. Examples of this can be found on [[Running_Serial_Jobs_on_Niagara|this]] page.<br />
<br />
= Using a Jupyter Notebook =<br />
<br />
You may develop your Python scripts in a Jupyter Notebook on Niagara. A node has been set aside as a Jupyter Hub. See [[Jupyter_Hub | this page]] for details on how to access that node, and develop your code.<br />
<br />
= Producing Matplotlib Figures on Niagara Compute Nodes and in Job Scripts =<br />
<br />
The conventional way of producing figures from python using matplotlib, i.e., <br />
<br />
import matplotlib.pyplot as plt<br />
plt.plot(.....)<br />
plt.savefig(...)<br />
<br />
will not work on the Niagara compute nodes. The reason is that pyplot will try to open the figure in a window on the screen, but the compute nodes do not have screens or window managers. There is an easy workaround, however, that sets up a different 'backend' to matplotlib, one that does not try to open a window, as follows:<br />
<br />
import matplotlib as mpl<br />
mpl.use('Agg')<br />
import matplotlib.pyplot as plt<br />
plt.plot(.....)<br />
plt.savefig(...)<br />
<br />
It is essential that the <tt>mpl.use('Agg')</tt> command precedes the importing of pyplot.<br />
<br />
= Using mpi4py =<br />
<br />
Several of the Python installations contain mpi4py preinstalled. However, using mpi4py requires loading an MPI module. There are several combinations of compiler/MPI/python modules which can be used.<br />
<br />
== Using intelpython ==<br />
<br />
With either the NiaEnv/2019b or NiaEnv/2018a stack (the most recent software stack is always recommended), the intelpython modules all have mpi4py, and should all work if an MPI module is also loaded. An example of this, using the NiaEnv/2019b stack, might be<br />
<br />
$ module load NiaEnv/2019b<br />
$ module load intel/2019u4 intelmpi/2019u4<br />
$ module load intelpython3/2019u4<br />
<br />
Other combinations of compilers (intel/gcc) or MPI module (intelmpi/openmpi) will also work with intelpython.<br />
<br />
== Using anaconda ==<br />
<br />
Under the NiaEnv/2018a stack anaconda is available as a module. This module does not come with mpi4py, but can be installed using the usual steps: <br />
<br />
$ module load gcc/7.3.0 openmpi/3.1.1<br />
$ module load anaconda3/2018.12<br />
$<br />
$ conda create -n myenv<br />
$ <br />
$ source activate myenv<br />
(myenv) $ <br />
(myenv) $ conda install mpi4py<br />
(myenv) $<br />
<br />
== Using the source Python module ==<br />
<br />
The Python module compiled from source does not come with mpi4py. If you need to use this module you will need to install mpi4py in your own storage space, preferably in a virtual environment.<br />
<br />
== Error messages ==<br />
<br />
If you get an error of this type:<br />
<br />
pml_ucx.c:285 Error: UCP worker does not support MPI_THREAD_MULTIPLE<br />
<br />
Add the following lines to your Python script, BEFORE you import the mpi4py package:<br />
<br />
import mpi4py.rc<br />
mpi4py.rc.threads = False<br />
<br />
= SciNet's Python Classes =<br />
<br />
There is a dizzying amount of documentation available for programming in Python on the [http://python.org/ Python.org webpage]. That being said, each fall, SciNet runs two 4-week classes on using Python for research:<br />
* [https://support.scinet.utoronto.ca/education/browse.php?category=-1&search=scmp142&include=all&filter=Filter SCMP142]: Introduction to Programming with Python. This class is intended for those with little-to-no programming experience who wish to learn how to program.<br />
* [https://support.scinet.utoronto.ca/education/browse.php?category=-1&search=scmp112&include=all&filter=Filter SCMP112]: Introduction to Scientific Computing with Python. This class focusses on using Python to perform research computing.<br />
<br />
An excellent set of material for teaching scientists to program in Python is also available at the [https://v4.software-carpentry.org/python/index.html Software Carpentry homepage].</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Slurm&diff=2633Slurm2020-06-12T15:49:44Z<p>Bmundim: /* EDR/HDR Infiniband Topology */</p>
<hr />
<div>The queueing system used at SciNet is based around the [https://slurm.schedmd.com Slurm Workload Manager]. This "scheduler", Slurm, determines which jobs will be run on which compute nodes, and when. This page outlines how to submit jobs, how to interact with the scheduler, and some of the most common Slurm commands.<br />
<br />
Some common questions about the queuing system can be found on the [[FAQ]] as well.<br />
<br />
= Submitting jobs =<br />
<br />
You submit jobs from a Niagara login node. This is done by passing a script to the sbatch command:<br />
<br />
nia-login07:~$ sbatch jobscript.sh<br />
<br />
This puts the job, described by the job script, into the queue. The scheduler will run the job on the compute nodes in due course. A typical submission script is as follows.<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
<br />
Some notes about this example:<br />
* The first line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>).<br />
* In this case, SLURM looks for 2 nodes with 40 cores on which to run 80 tasks, for 1 hour.<br />
* Note that the mpirun flag "--ppn" (processors per node) is ignored. Slurm takes care of this detail.<br />
* Once the scheduler finds a spot to run the job, it runs the script:<br />
** It changes to the submission directory;<br />
** Loads modules;<br />
** Runs the <code>mpi_example</code> application.<br />
* To use hyperthreading, just change --ntasks-per-node=40 to --ntasks-per-node=80, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).<br />
<br />
To create a job script appropriate for your work, you must modify the commands above to instruct Slurm to run the commands you need run.<br />
<br />
== Things to remember ==<br />
<br />
There are some things to always bear in mind when crafting your submission script:<br />
* Scheduling is by node, so in multiples of 40 cores. You are expected to use all 40 cores! If you are running serial jobs, and need assistance bundling your work into multiples of 40, please see the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page.<br />
* Jobs must write to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access. Download data you need before submitting your job.<br />
* Your job script will not remember the modules you have loaded, so it needs to contain "module load" commands of all the required modules (see examples below).<br />
* Jobs will run under your group's RRG allocation. If your group does not have an allocation, your job will run under your group's RAS allocation (previously called `default' allocation). Note that groups with an allocation cannot run under a default allocation.<br />
* The maximum [[Wallclock_time | walltime]] for all users is 24 hours. The minimum and default walltime is 15 minutes.<br />
<br />
= Scheduling details =<br />
<br />
We now present the details of how to write a job script, and some extra commands which you might find useful.<br />
<br />
== SLURM nomenclature: jobs, nodes, tasks, cpus, cores, threads ==<br />
<br />
SLURM has a somewhat different way of referring to things like MPI processes and thread tasks, as compared to our previous scheduler, MOAB. The SLURM nomenclature is reflected in the names of scheduler options (i.e., resource requests). SLURM strictly enforces those requests, so it is important to get this right.<br />
<br />
{| class="wikitable"<br />
!term <br />
!meaning <br />
!SLURM term<br />
!related scheduler options <br />
|-<br />
|job<br />
|scheduled piece of work for which specific resources were requested.<br />
|job<br />
|<tt>sbatch, salloc</tt><br />
|-<br />
|node<br />
|basic computing component with several cores (40 for Niagara) that share memory <br />
|node<br />
|<tt>--nodes -N</tt><br />
|-<br />
|mpi process<br />
|one of a group of running programs using Message Passing Interface for parallel computing<br />
|task<br />
|<tt>--ntasks -n --ntasks-per-node</tt><br />
|-<br />
|core ''or'' physical cpu<br />
|A fully functional independent physical execution unit.<br />
| - <br />
| -<br />
|-<br />
|logical cpu<br />
|An execution unit that the operating system can assign work to. Operating systems can be configured to overload physical cores with multiple logical cpus using hyperthreading.<br />
|cpu<br />
|<tt>--cpus-per-task</tt><br />
|-<br />
|thread<br />
|one of possibly multiple simultaneous execution paths within a program, which can share memory.<br />
| -<br />
| <tt>--cpus-per-task</tt> '''and''' <tt>OMP_NUM_THREADS</tt><br />
|-<br />
|hyperthread<br />
|a thread run in a collection of threads that is larger than the number of physical cores.<br />
| -<br />
| -<br />
|}<br />
<br />
== Scheduling by Node ==<br />
<br />
* On many systems that use SLURM, the scheduler will deduce from the job script specifications (the number of tasks and the number of cpus-per-node) what resources should be allocated. On Niagara, this is a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours.<br />
** No other users are running anything on them.<br />
** You can ssh into them, while your job is running, to see how things are going.<br />
* Whatever you request of the scheduler, your request will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes. Each node has about 202GB of RAM available.<br />
* You should try to use all the cores on the nodes allocated to your job. Since there are 40 cores per node, your job should use N x 40 cores. If this is not the case, we will contact you to help you optimize your workflow. Again, users who have serial jobs should consult the [[Running Serial Jobs on Niagara | serial jobs]] page.<br />
<br />
== Hyperthreading: Logical CPUs vs. cores ==<br />
<br />
Hyperthreading, a technology that leverages more of the physical hardware by presenting twice as many logical cores as real cores, is enabled on Niagara.<br />
The operating system and scheduler see 80 logical CPUs.<br />
<br />
Using 80 logical CPUs versus 40 real cores typically gives about a 5-10% speedup, depending on your application (your mileage may vary).<br />
<br />
Because Niagara is scheduled by node, hyperthreading is actually fairly easy to use:<br />
* Ask for a certain number of nodes, N, for your job.<br />
* You know that you get 40 x N cores, so you will use (at least) a total of 40 x N MPI processes or threads (mpirun, srun, and the OS will automatically spread these over the real cores).<br />
* But you should also test if running 80 x N MPI processes or threads gives you any speedup.<br />
* Regardless, your usage will be counted as 40 x N x (walltime in years).<br />
<br />
Many applications which are communication-heavy can benefit from the use of hyperthreading.<br />
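<br />
Since usage is charged for the 40 physical cores per node regardless of hyperthreading, the core-years accounting from the last bullet can be sketched as follows (the 2-node, 24-hour job is just an example):<br />

```python
# Usage accounting sketch: charged core-years depend only on the number
# of nodes and the walltime, not on whether 40 or 80 tasks per node ran.
nodes = 2
walltime_hours = 24
core_years = nodes * 40 * walltime_hours / (365 * 24)
print(round(core_years, 3))   # 0.219
```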
<br />
= Submission script details =<br />
<br />
This section outlines some details of how to interact with the scheduler, and how it implements Niagara's scheduling policies.<br />
<br />
== Queues ==<br />
<br />
There are 3 queues available on SciNet systems. These queues have different limits; see the [[#Limits | Limits]] section for further details.<br />
<br />
=== Compute ===<br />
<br />
The compute queue is the default queue. Most jobs will run in this queue. If no flags are specified in the submission script this is the queue where your job will land.<br />
<br />
=== Debug ===<br />
<br />
The Debug queue is a high-priority queue, used for short-term testing of your code. Do NOT use the debug queue for production work. You can use the debug queue in one of two ways. To submit a standard job script to the debug queue, add the line<br />
#SBATCH -p debug<br />
to your submission script. This will put the job into the debug queue, and it should run in short order.<br />
<br />
To request an interactive debug session, where you retain control over the command line prompt, at a login node type the command<br />
nia-login07:~$ salloc -p debug --nodes 1 --time=1:00:00<br />
This will request 1 node for 1 hour. You can similarly request a debug session using the 'debugjob' command:<br />
nia-login07:~$ debugjob N<br />
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; when N=4 (the maximum), it gives you 30 minutes.<br />
<br />
=== Archive ===<br />
<br />
The archivelong and archiveshort queues are only used by the [[HPSS]] system. See that page for details on how to use these queues.<br />
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the jobs runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs ||compute || 50 || 1000 || 1 node (40&nbsp;cores) || default:&nbsp;20&nbsp;nodes&nbsp;(800&nbsp;cores) <br> with&nbsp;allocation:&nbsp;1000&nbsp;nodes&nbsp;(40000&nbsp;cores)|| 15 minutes || 24 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40 cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (max 5 total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.<br />
<br />
== Slurm Accounts ==<br />
<br />
To be able to prioritise jobs based on groups and allocations, the Slurm scheduler uses the concept of ''accounts''. Each group that has a Resource for Research Groups (RRG) or Research Platforms and Portals (RPP) allocation (awarded through an annual competition by Compute Canada) has an account that starts with <tt>rrg-</tt> or <tt>rpp-</tt>. Slurm assigns a 'fairshare' priority to these accounts based on the size of the award in core-years. Groups without an RRG or RPP can use Niagara using a so-called Rapid Access Service (RAS), and have an account that starts with <tt>def-</tt>.<br />
<br />
On Niagara, most users will only ever use one account, and those users do not need to specify the account to Slurm. However, users that are part of collaborations may be able to use multiple accounts, i.e., that of their sponsor and that of their collaborator, but this means that they need to select the right account when running jobs. <br />
<br />
To select the account, just add <br />
<br />
#SBATCH -A [account]<br />
<br />
to the job scripts, or use the <tt>-A [account]</tt> to <tt>salloc</tt> or <tt>debugjob</tt>. <br />
<br />
To see which accounts you have access to, or what their names are, use the command<br />
<br />
sshare -U<br />
<br />
It has been noted that, in some cases, using the '-A' flag does not result in the appropriate account being used. To get around this, specify the account when sbatch is invoked:<br />
sbatch -A account myjobscript.sh<br />
<br />
== Slurm environment variables ==<br />
<br />
There are many environment variables built into Slurm. These are some which you may find useful:<br />
* SLURM_SUBMIT_DIR: directory from which the job was submitted.<br />
* SLURM_SUBMIT_HOST: host from which the job was submitted.<br />
* SLURM_JOB_ID: the job's id.<br />
* SLURM_JOB_NUM_NODES: number of nodes in the job.<br />
* SLURM_JOB_NODELIST: list of nodes assigned to the job.<br />
* SLURM_JOB_ACCOUNT: account associated with the job.<br />
<br />
Any of these environment variables can be accessed from within your job script.<br />
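<br />
For example, a job script can record where and how it is running. In a real job the variables below are set by Slurm; the <tt>:-unset</tt> fallbacks are only there so the snippet can be run on a login node for illustration:<br />

```shell
# Print basic job information from Slurm's environment variables.
echo "job id:        ${SLURM_JOB_ID:-unset}"
echo "nodes:         ${SLURM_JOB_NUM_NODES:-unset}"
echo "submitted in:  ${SLURM_SUBMIT_DIR:-unset}"
```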
<br />
== Passing Variables to submission scripts ==<br />
It is possible to pass values through environment variables into your SLURM submission scripts.<br />
For doing so with already defined variables in your shell, just add the following directive in the submission script,<br />
<br />
#SBATCH --export=ALL<br />
<br />
and you will have access to any predefined environment variable.<br />
<br />
A better way is to specify explicitly which variables you want to pass into the submission script,<br />
<br />
sbatch --export=i=15,j='test' jobscript.sbatch<br />
<br />
You can even set the job name and output files using environment variables, e.g.<br />
<br />
i="simulation"<br />
j=14<br />
sbatch --job-name=$i.$j.run --output=$i.$j.out --export=i=$i,j=$j jobscript.sbatch<br />
<br />
(The latter only works on the command line; you cannot use environment variables in <tt>#SBATCH</tt> lines in the job script.)<br />
<br />
== Command line arguments ==<br />
<br />
Command line arguments can also be used for job scripts in the same way as command line arguments for shell scripts. All command line arguments given to sbatch that follow the job script name will be passed to the job script. In fact, SLURM will not look at any of these arguments, so you must place all sbatch arguments before the script name, e.g.:<br />
<br />
sbatch -p debug jobscript.sbatch FirstArgument SecondArgument ...<br />
<br />
In this example, <tt>-p debug</tt> is interpreted by SLURM, while in your submission script you can access <tt>FirstArgument</tt>, <tt>SecondArgument</tt>, etc., by referring to <code>$1, $2, ...</code>.<br />
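<br />
You can preview this behaviour without the scheduler by running the script directly with bash (the script name and arguments here are illustrative):<br />

```shell
# A job script receives trailing sbatch arguments as $1, $2, ...
cat > demo_args.sh <<'EOF'
#!/bin/bash
echo "first argument:  $1"
echo "second argument: $2"
EOF

# On Niagara: sbatch -p debug demo_args.sh FirstArgument SecondArgument
# Locally, the same positional handling can be checked with bash:
bash demo_args.sh FirstArgument SecondArgument
```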
<br />
== Job arrays ==<br />
<br />
Sometimes you need to run the same job script many times, but just tweaking one value each time. One way of accomplishing this is using job arrays. Job arrays are invoked using the "-a" flag with sbatch:<br />
sbatch -a 1-100 myjobscript.sh<br />
This will submit 100 instances of myjobscript.sh. Within the job script you can distinguish which of those instances is running using the environment variable SLURM_ARRAY_TASK_ID.<br />
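<br />
For example, each array instance can select its own input file from the task id. In a real array job Slurm sets SLURM_ARRAY_TASK_ID for each instance; the default value and file names below are only illustrative:<br />

```shell
# Pick an input file based on this instance's array task id.
task_id=${SLURM_ARRAY_TASK_ID:-1}   # set by Slurm in a real array job
input="input_${task_id}.dat"
echo "instance ${task_id} processes ${input}"
```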
<br />
Note that Niagara [[#Limits | currently]] has a limit of 1000 submitted jobs for users within groups with allocations, and 200 submitted jobs without an allocation.<br />
<br />
== Job dependencies ==<br />
<br />
You can make one job dependent on the successful completion of another job using the following command:<br />
sbatch --dependency=afterok:JOBID myjobscript.sh<br />
This will make the current job submission not start until the parent job, with jobid JOBID, successfully completes. There are many job dependency options available. Visit the [https://slurm.schedmd.com/sbatch.html#OPT_dependency Slurm sbatch page ] for the full list. <br />
<br />
If the parent job fails (that is, ends with a non-zero exit code) the dependent job can never be scheduled and will be automatically cancelled.<br />
<br />
== Email Notification ==<br />
Email notification works, but you need to add the email address and the type of notification you wish to receive in your submission script, e.g.<br />
<br />
#SBATCH --mail-user=YOUR.email.ADDRESS<br />
#SBATCH --mail-type=ALL<br />
<br />
The sbatch man page (type <tt>man sbatch</tt> on Niagara) explains all possible mail-types.<br />
<br />
== Job Location Constraints ==<br />
<br />
=== Node types ===<br />
<br />
With the expansion of Niagara there are now two node types, 1548 Intel 6148 "skylake" CPU based nodes, and 468 Intel 6248 "cascadelake" CPU based nodes. By default a job will be placed on the first available nodes but will not span node types. You can specify a node type using one of the following directives to your submission script.<br />
<br />
#SBATCH --constraint=skylake <br />
#SBATCH --constraint=cascade<br />
<br />
=== EDR/HDR Infiniband Topology ===<br />
<br />
The Infiniband high speed network used for job communication and file I/O on Niagara consists of 5 1:1 subscribed "wings" that are connected together in a dragonfly topology with adaptive routing enabled. 4 wings (dragonfly[1-4]) consist of EDR-based skylake nodes, and dragonfly5 contains all of the HDR100 cascadelake nodes. By default multi-node jobs will run on the first available nodes, which could be all within 1 wing or span multiple wings, but not span node types. For most scalable parallel programs the performance difference should not be very significant; however, if you wish to keep your jobs from spanning wings you can use the following.<br />
<br />
#SBATCH --constraint=[dragonfly1|dragonfly2|dragonfly3|dragonfly4|dragonfly5]<br />
<br />
= Monitoring jobs =<br />
<br />
There are many options available for monitoring your jobs. The most basic of which is the squeue command:<br />
<br />
nia-login07:~$ squeue -u USERNAME<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
292047 compute myjob4 username PD 0:00 4 (Priority)<br />
292048 compute myjob3 username PD 0:00 4 (Priority)<br />
266829 compute myjob2 username R 18:56:17 2 nia[1397-1398]<br />
266828 compute myjob1 username R 18:56:46 1 nia1298<br />
<br />
Here you can see that we have two running jobs ('R') and two pending jobs ('PD'). The nodes being used are listed.<br />
<br />
== Job status ==<br />
<br />
To get an estimate of when a job will start, use the command<br />
squeue --start -j JOBID<br />
Note that this is only an estimate, and tends not to be very accurate.<br />
<br />
Information about a specific job can be found using the <br />
squeue -j JOBID<br />
or alternatively<br />
scontrol show job JOBID<br />
which is more verbose.<br />
<br />
== SSHing to a node ==<br />
<br />
Once your job has started, the node belongs to you. As such you may, from a login node, SSH into the node to check the performance of your job. The first step is to find out which nodes are being used (see above). Once you have your list of nodes, you can SSH into them directly. Once there, you can run the 'top' or 'free' commands to check both CPU and memory usage.<br />
<br />
== jobperf ==<br />
<br />
The jobperf script will give you feedback on the performance of your currently-running job:<br />
nia-login07:~$ jobperf 123456<br />
----------------------------------------------------------------------------------------------------<br />
RUNNING IDLE USER MEMORY(MB) PROCESS NAMES<br />
HOSTNAME # %CPU %MEM DISK SLEEP NAME RAMDISK USED AVAIL (excl:bash,sh,ssh,sshd)<br />
----------------------------------------------------------------------------------------------------<br />
nia1013 71 6999% 0.5% 0 22 ejspence 0 15060 178017 14*gmx_mpi mpiexec slurm_script<br />
nia1014 79 7677% 0.1% 0 18 ejspence 0 14803 178274 13*gmx_mpi<br />
nia1295 79 7517% 0.4% 0 18 ejspence 0 15199 177878 13*gmx_mpi<br />
----------------------------------------------------------------------------------------------------<br />
<br />
Here you can see both the CPU and memory usage of the job, for all nodes being used.<br />
<br />
== Other commands ==<br />
<br />
Some other commands that can be useful for dealing with your jobs:<br />
* <code>scancel -i JOBID</code> cancels a specific job.<br />
* <code>sacct</code> gives information about your recent jobs.<br />
* <code>sinfo -p compute</code> gives a list of available nodes.<br />
* <code>qsum</code> gives a summary of the queue by user.<br />
<br />
= Example submission scripts =<br />
<br />
Here we present some examples of how to create submission scripts for running parallel jobs. Serial job examples can be found on the [[Running_Serial_Jobs_on_Niagara | serial jobs page]].<br />
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=8<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li><p>First line indicates that this is a bash script.</p></li><br />
<li><p>Lines starting with <code>#SBATCH</code> go to SLURM.</p></li><br />
<li><p>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>)</p></li><br />
<li><p>In this case, SLURM looks for 8 nodes with 40 cores each, on which to run 320 tasks, for 1 hour.</p></li><br />
<li><p>Note that the mpirun flag "--ppn" (processors per node) is ignored.</p></li><br />
<li><p>Once it has found such nodes, it runs the script:</p><br />
<ul><br />
<li>Change to the submission directory;</li><br />
<li>Loads modules;</li><br />
<li>Runs the <code>mpi_example</code> application.</li><br />
</ul><br />
<li>To use hyperthreading, just change --ntasks-per-node=40 to --ntasks-per-node=80, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
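Putting those hyperthreading changes together, the script would become (a sketch of the variant described above; same module versions assumed, OpenMPI case shown):<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=8<br />
#SBATCH --ntasks-per-node=80<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job_ht<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
# --bind-to none is needed for OpenMPI when oversubscribing physical cores;<br />
# with IntelMPI it can be omitted.<br />
mpirun --bind-to none ./mpi_example<br />
</source><br />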
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* First line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>openmp_job</code>).<br />
* In this case, SLURM looks for a single node with 40 cores to run one task, for 1 hour.<br />
* Once it has found such a node, it runs the script:<br />
** Change to the submission directory;<br />
** Loads modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&diff=2296Niagara Quickstart2019-09-11T18:55:21Z<p>Bmundim: /* Logging in */</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Niagara.jpg|center|300px|thumb]]<br />
|name=Niagara<br />
|installed=Jan 2018<br />
|operatingsystem= CentOS 7.4 <br />
|loginnode= niagara.scinet.utoronto.ca<br />
|nnodes= 1548 nodes (61,920 cores)<br />
|rampernode=188 GiB / 202 GB <br />
|corespernode=40 (80 hyperthreads)<br />
|interconnect=Mellanox Dragonfly+<br />
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)<br />
|queuetype=Slurm<br />
}}<br />
<br />
=Specifications=<br />
<br />
The Niagara cluster is a large cluster of 1548 Lenovo SD530 servers each with 40 Intel "Skylake" cores at 2.4 GHz. <br />
The peak performance of the cluster is 3.02 PFlops delivered / 4.75 PFlops theoretical. It was the 53rd fastest supercomputer on the [https://www.top500.org/list/2018/06/?page=1 TOP500 list of June 2018], and is at number 69 on the [https://www.top500.org/list/2019/06/?page=1 current list]. <br />
<br />
Each node of the cluster has 188 GiB / 202 GB of RAM (at least 4 GiB/core for user jobs). Being designed for large parallel workloads, it has a fast interconnect consisting of EDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a minimum of 15 minutes and a maximum of 24 hours and favours large jobs.<br />
<br />
* See the [https://support.scinet.utoronto.ca/education/go.php/370/content.php/cid/1383/ "Intro to Niagara"] recording<br />
<br />
More detailed hardware characteristics of the Niagara supercomputer can be found [https://docs.computecanada.ca/wiki/Niagara on this page].<br />
<br />
= Getting started on Niagara =<br />
<br />
Access to Niagara is not enabled automatically for everyone with a Compute Canada account, but anyone with an active Compute Canada account can get their access enabled.<br />
<br />
If you have an active Compute Canada account but you do not have access to Niagara yet (e.g. because you are new to SciNet or belong to a group whose primary PI does not have an allocation as granted in the annual [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions Compute Canada RAC]), go to the [https://ccdb.computecanada.ca/services/opt_in opt-in page on the CCDB site]. After clicking the "Join" button, it usually takes only one or two business days for access to be granted. <br />
<br />
Please read this document carefully. The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource. If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].<br />
<br />
== Logging in ==<br />
<br />
Niagara runs CentOS 7, which is a type of Linux. You will need to be familiar with Linux systems to work on Niagara. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&search=scmp101&include=all&filter=Filter Introduction to Linux Shell] class.<br />
<br />
As with all SciNet and CC (Compute Canada) compute systems, access to Niagara is done via [[SSH]] (secure shell) only. Open a terminal window (e.g. Connecting with [https://docs.computecanada.ca/wiki/Connecting_with_PuTTY PuTTY] on Windows or Connecting with [https://docs.computecanada.ca/wiki/Connecting_with_MobaXTerm MobaXTerm]), then SSH into the Niagara login nodes with your CC credentials:<br />
<br />
$ ssh -Y MYCCUSERNAME@niagara.scinet.utoronto.ca<br />
<br />
or<br />
<br />
$ ssh -Y MYCCUSERNAME@niagara.computecanada.ca<br />
<br />
The first time you login to Niagara, please make sure to check if the login node ssh host key fingerprint<br />
matches. [[SSH_Changes_in_May_2019 | See here how]].<br />
<br />
* The Niagara login nodes are where you develop, edit, compile, prepare and submit jobs.<br />
* These login nodes are not part of the Niagara compute cluster, but have the same architecture, operating system, and software stack.<br />
* The optional <code>-Y</code> is needed to open windows from the Niagara command-line onto your local X server.<br />
* You can only connect 4 times in a 2-minute window to the login nodes. <br />
* To run on Niagara's compute nodes, you must [[#Submitting_jobs | submit a batch job]].<br />
<br />
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.<br />
<br />
== Your various directories ==<br />
<br />
By virtue of your access to Niagara you are granted storage space on the system. There are several directories available to you, each indicated by an associated environment variable.<br />
<br />
=== home and scratch ===<br />
<br />
You have a home and scratch directory on the system, the paths to which are stored in the environment variables $HOME and $SCRATCH. The locations are of the form<br />
<br />
$HOME=/home/g/groupname/myccusername<br />
$SCRATCH=/scratch/g/groupname/myccusername<br />
<br />
where groupname is the name of your PI's group, and myccusername is your CC username. For example:<br />
<br />
nia-login07:~$ pwd<br />
/home/s/scinet/rzon<br />
nia-login07:~$ cd $SCRATCH<br />
nia-login07:rzon$ pwd<br />
/scratch/s/scinet/rzon<br />
<br />
NOTE: home is read-only on compute nodes.<br />
<br />
=== project and archive/nearline ===<br />
<br />
Users from groups with [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions RAC storage allocation] will also have a project directory and possibly an archive (a.k.a. "nearline") directory, the paths to which are stored in the environment variables $PROJECT and $ARCHIVE. They follow the naming convention:<br />
<br />
$PROJECT=/project/g/groupname/myccusername<br />
$ARCHIVE=/archive/g/groupname/myccusername<br />
<br />
NOTE: Currently archive space is available only via [[HPSS]], and is not accessible on the Niagara login, compute, or datamover nodes.<br />
<br />
'''''IMPORTANT: Future-proof your scripts'''''<br />
<br />
When writing your scripts, use the environment variables (<tt>$HOME</tt>, <tt>$SCRATCH</tt>, <tt>$PROJECT</tt>, <tt>$ARCHIVE</tt>) instead of the actual paths! The paths may change in the future.<br />
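For instance, a job script might build its working directory from these variables instead of a literal path (a minimal sketch; the directory name is made up, and the /tmp fallback only exists so the snippet runs outside Niagara):<br />
<source lang="bash"><br />
# Derive a per-run work directory from $SCRATCH rather than hard-coding it<br />
workdir="${SCRATCH:-/tmp}/myrun_${SLURM_JOB_ID:-test}"<br />
mkdir -p "$workdir"<br />
cd "$workdir"<br />
pwd<br />
</source><br />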
<br />
=== Storage and quotas ===<br />
<br />
You should familiarize yourself with the [[Data_Management#Purpose_of_each_file_system | various file systems]], what purpose they serve, and how to properly use them. This table summarizes the various file systems. See the [[Data_Management | Data Management]] page for more details.<br />
<br />
{| class="wikitable"<br />
! location<br />
!colspan="2"| quota<br />
!align="right"| block size<br />
! expiration time<br />
! backed up<br />
! on login nodes<br />
! on compute nodes<br />
|-<br />
| $HOME<br />
|colspan="2"| 100 GB / 250,000 files per user<br />
|align="right"| 1 MB<br />
| <br />
| yes<br />
| yes<br />
| read-only<br />
|-<br />
|rowspan="2"| $SCRATCH<br />
|colspan="2"| 25 TB / 6,000,000 files per user<br />
|align="right" rowspan="2" | 16 MB<br />
|rowspan="2"| 2 months<br />
|rowspan="2"| no<br />
|rowspan="2"| yes<br />
|rowspan="2"| yes<br />
|-<br />
|align="right"|50-500TB per group<br />
|align="right"|[[Data_Management#Quotas_and_purging | depending on group size]]<br />
|-<br />
| $PROJECT<br />
|colspan="2"| by group allocation<br />
|align="right"| 16 MB<br />
| <br />
| yes<br />
| yes<br />
| yes<br />
|-<br />
| $ARCHIVE<br />
|colspan="2"| by group (nearline) allocation<br />
|align="right"| <br />
|<br />
| dual-copy<br />
| no<br />
| no<br />
|-<br />
| $BBUFFER<br />
|colspan="2"| 10 TB per user<br />
|align="right"| 1 MB<br />
| very short<br />
| no<br />
| yes<br />
| yes<br />
|}<br />
<br />
=== Moving data to Niagara ===<br />
<br />
If you need to move data to Niagara for analysis, or to move data off Niagara, use the following guidelines:<br />
* If your data is less than 10GB, move the data using the login nodes.<br />
* If your data is greater than 10GB, move the data using the datamover nodes nia-datamover1.scinet.utoronto.ca and nia-datamover2.scinet.utoronto.ca .<br />
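A transfer script could apply the 10GB rule above mechanically, for example (a sketch; the data directory and remote path are hypothetical, and GNU du is assumed for the byte count):<br />
<source lang="bash"><br />
data=mydata                                    # hypothetical directory to transfer<br />
size=$(du -sb "$data" | cut -f1)               # total size in bytes (GNU du)<br />
if [ "$size" -lt $((10 * 1000 * 1000 * 1000)) ]; then<br />
    host=niagara.scinet.utoronto.ca            # small transfer: a login node is fine<br />
else<br />
    host=nia-datamover1.scinet.utoronto.ca     # large transfer: use a datamover node<br />
fi<br />
echo "would run: scp -r $data myccusername@$host:\$SCRATCH/"<br />
</source><br />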
<br />
Details of how to use the datamover nodes can be found on the [[Data_Management#Moving_data | Data Management ]] page.<br />
<br />
= Loading software modules =<br />
<br />
You have two options for running code on Niagara: use existing software, or [[Niagara_Quickstart#Compiling_on_Niagara:_Example | compile your own]]. This section focuses on the former.<br />
<br />
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be [[Using_modules | found on the modules page]].<br />
<br />
Common module subcommands are:<br />
<br />
* <code>module load <module-name></code>: load the default version of a particular software.<br />
* <code>module load <module-name>/<module-version></code>: load a specific version of a particular software.<br />
* <code>module purge</code>: unload all currently loaded modules.<br />
* <code>module spider</code> (or <code>module spider <module-name></code>): list available software packages.<br />
* <code>module avail</code>: list loadable software packages.<br />
* <code>module list</code>: list loaded modules.<br />
<br />
Along with modifying common environment variables, such as PATH and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.<br />
<br />
There are handy abbreviations for the module commands. <code>ml</code> is the same as <code>module list</code>, and <code>ml <module-name></code> is the same as <code>module load <module-name></code>.<br />
== Software stacks: NiaEnv and CCEnv ==<br />
<br />
On Niagara, there are two available software stacks:<br />
<br />
<ol style="list-style-type: decimal;"><br />
<li><p>A [https://docs.scinet.utoronto.ca/index.php/Modules_specific_to_Niagara Niagara software stack] tuned and compiled for this machine. This stack is available by default, but if not, can be reloaded with</p><br />
<code>module load NiaEnv</code><br />
This loads the 2018a 'epoch' (set of modules). A newer epoch is available by loading NiaEnv/2019b. This epoch will become the default later in 2019, but you can already make NiaEnv/2019b your default by creating a file called <tt>.modulerc</tt> in your home directory with the line "<tt>module-version NiaEnv/2019b default</tt>"</li><br />
<li><p>The same [https://docs.computecanada.ca/wiki/Modules software stack available on Compute Canada's General Purpose clusters] [https://docs.computecanada.ca/wiki/Graham Graham] and [https://docs.computecanada.ca/wiki/Cedar Cedar]:</p><br />
<code>module load CCEnv</code><br />
<p>Or, if you want the same default modules loaded as on Béluga, then do<br />
</p><p><br />
<code>module load CCEnv StdEnv</code><br/><br />
or, if you want the same default modules loaded as on Cedar and Graham, do<br/><br />
<code>module load CCEnv arch/avx2 StdEnv</code><br />
</p><br />
</li></ol><br />
No modules are loaded by default (except NiaEnv, which does not do anything except make other modules visible).<br />
<br />
== Tips for loading software ==<br />
<br />
* We advise '''''against''''' loading modules in your .bashrc. This can lead to very confusing behaviour under certain circumstances. Our guidelines for .bashrc files can be found [[bashrc guidelines|here]].<br />
* Instead, load modules by hand when needed, or by sourcing a separate script.<br />
* Load run-specific modules inside your job submission script.<br />
* Short names give default versions; e.g. <code>intel</code> → <code>intel/2018.2</code>. It is usually better to be explicit about the versions, for future reproducibility.<br />
* Modules often require other modules to be loaded first. Solve these dependencies by using [[Using_modules#Module_spider | <code>module spider</code>]].<br />
<br />
= Available compilers and interpreters =<br />
<br />
* For most compiled software, one should use the Intel compilers (<tt>icc</tt> for C, <tt>icpc</tt> for C++, and <tt>ifort</tt> for Fortran). Loading an <tt>intel</tt> module makes these available. <br />
* The GNU compiler suite (<tt>gcc, g++, gfortran</tt>) is also available, if you load one of the <tt>gcc</tt> modules.<br />
* To compile mpi code, you must additionally load an <tt>openmpi</tt> or <tt>intelmpi</tt> module.<br />
* Open source interpreted, interactive software is also available:<br />
** [[Python]]<br />
** [[R]]<br />
** Julia<br />
** [[Octave]]<br />
<br />
Please visit the corresponding page for details on using these tools. For information on running MATLAB applications on Niagara, visit [[MATLAB| this page]].<br />
<br />
= Using Commercial Software =<br />
<br />
May I use commercial software on Niagara?<br />
* Possibly, but you have to bring your own license for it. You can connect to an external license server using [[SSH_Tunneling | ssh tunneling]].<br />
* SciNet and Compute Canada have an extremely large and broad user base of thousands of users, so we cannot provide licenses for everyone's favorite software.<br />
* Thus, the only freely available commercial software installed on Niagara is software that can benefit everyone: Compilers, math libraries and debuggers.<br />
* That means no [[MATLAB]], Gaussian, IDL, etc.<br />
* Open source alternatives like Octave, [[Python]], and [[R]] are available.<br />
* We are happy to help you to install commercial software for which you have a license.<br />
* In some cases, if you have a license, you can use software in the Compute Canada stack.<br />
The list of commercial software which is installed on Niagara, for which you will need a license to use, can be found on the [[Commercial_software | commercial software page]].<br />
<br />
= Compiling on Niagara: Example =<br />
<br />
Suppose one wants to compile an application from two C source files, appl.c and module.c, which use the Math Kernel Library. This is an example of how this would be done:<br />
<source lang="bash"><br />
nia-login07:~$ module list<br />
Currently Loaded Modules:<br />
1) NiaEnv/2018a (S)<br />
Where:<br />
S: Module is Sticky, requires --force to unload or purge<br />
<br />
nia-login07:~$ module load intel/2018.2<br />
<br />
nia-login07:~$ ls<br />
appl.c module.c<br />
<br />
nia-login07:~$ icc -c -O3 -xHost -o appl.o appl.c<br />
nia-login07:~$ icc -c -O3 -xHost -o module.o module.c<br />
nia-login07:~$ icc -o appl module.o appl.o -mkl<br />
<br />
nia-login07:~$ ./appl<br />
</source><br />
Note:<br />
* The optimization flags -O3 -xHost allow the Intel compiler to use instructions specific to the CPU architecture that is present (instead of for more generic x86_64 CPUs).<br />
* Linking with the Intel Math Kernel Library (MKL) is easy when using the Intel compilers: it just requires the -mkl flag.<br />
* If compiling with gcc, the optimization flags would be -O3 -march=native. For the way to link with the MKL, it is suggested to use the [https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor MKL link line advisor].<br />
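For comparison, the same two-step build with gcc would look as follows (a runnable sketch with stand-in source files, and without the MKL link step, for which the link line advisor should be consulted):<br />
<source lang="bash"><br />
# Stand-in sources playing the role of appl.c and module.c from the example above<br />
cat > module.c <<'EOF'<br />
int compute(void) { return 42; }<br />
EOF<br />
cat > appl.c <<'EOF'<br />
#include <stdio.h><br />
int compute(void);<br />
int main(void) { printf("%d\n", compute()); return 0; }<br />
EOF<br />
<br />
gcc -c -O3 -march=native -o appl.o appl.c<br />
gcc -c -O3 -march=native -o module.o module.c<br />
gcc -o appl module.o appl.o<br />
./appl    # prints 42<br />
</source><br />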
<br />
= Testing and Debugging =<br />
<br />
You really should test your code before you submit it to the cluster, both to check that it is correct and to find out what kind of resources you need.<br />
* Small test jobs can be run on the login nodes. Rule of thumb: tests should run no more than a couple of minutes, taking at most about 1-2GB of memory, and use no more than a couple of cores.<br />
* You can run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after <code>module load ddt</code>.<br />
* For short tests that do not fit on a login node, or for which you need a dedicated node, request an interactive debug job with the debugjob command:<br />
nia-login07:~$ debugjob --clean N<br />
where N is the number of nodes. If N=1, this gives an interactive session of one hour; when N=4 (the maximum), it gives you 22 minutes. The <tt>--clean</tt> argument is optional but recommended, as it will start the session without any modules loaded, thus mimicking more closely what happens when you submit a job script.<br />
<br />
Finally, if your debugjob process takes more than 1 hour, you can request an interactive job from the regular queue using the salloc command. Note, however, that this may take some time to run, since it will be part of the regular queue, and will be run when the scheduler decides.<br />
nia-login07:~$ salloc --nodes N --time=M:00:00 --x11<br />
where N is again the number of nodes, and M is the number of hours you wish the job to run.<br />
The <tt>--x11</tt> is required if you need to use graphics while testing your code through salloc, e.g. when using a debugger such as [[Parallel Debugging with DDT|DDT]] or DDD. See the [[Testing_With_Graphics | Testing with graphics]] page for the options in that case.<br />
<br />
= Submitting jobs =<br />
<br />
<!-- == Progressive approach to run jobs on niagara == --><br />
<!-- We would like to emphasize the need for users to adopt a more progressive and explicit approach for testing, running and scaling up of jobs on niagara. [[Progressive_Approach | '''Here is a set of steps we suggest that you follow.''']] --><br />
<br />
Once you have compiled and tested your code or workflow on the Niagara login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. Your jobs will run on some of Niagara's 1548 compute nodes. When and where your job runs is determined by the scheduler.<br />
<br />
Niagara uses SLURM as its job scheduler. More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
nia-login07:scratch$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course.<br />
<br />
In most cases, you should not submit from your $HOME directory, but rather, from your $SCRATCH directory, so that the output of your compute job can be written out (as mentioned above, $HOME is read-only on the compute nodes).<br />
<br />
Jobs will run under your group's RRG allocation, or, if your group has none, under a RAS allocation (previously called a 'default' allocation).<br />
<br />
Some example job scripts can be found below.<br />
<br />
Keep in mind:<br />
* Scheduling is by node, so in multiples of 40 cores.<br />
* Your job's maximum walltime is 24 hours. <br />
* Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access.<br />
* Your job script will not remember the modules you have loaded, so it needs to contain "module load" commands of all the required modules (see examples below). <br />
* [[Data_Management#Moving_data | Move your data]] to Niagara before you submit your job.<br />
<br />
== Scheduling by Node ==<br />
<br />
On many systems that use SLURM, the scheduler deduces what resources should be allocated from specifications such as the number of tasks and the number of cpus-per-task. On Niagara, things are a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.<br />
** No other users are running anything on them.<br />
** You can [[SSH]] into them to see how things are going.<br />
* Whatever your requests to the scheduler, it will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes and 202GB is the amount of memory on the node.<br />
* If you run serial jobs you must still use all 40 cores on the node. Visit the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page for examples of how to do this.<br />
* Since there are 40 cores per node, your job should use N x 40 cores. If you do not, we will contact you to help you optimize your workflow. Or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.<br />
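A minimal sketch of what using all 40 cores with serial work looks like — one background task per core, with a stand-in shell function in place of a real serial executable (the serial jobs page has more robust patterns):<br />
<source lang="bash"><br />
serial_task() { echo "result of task $1"; }   # stand-in for your real serial program<br />
<br />
for i in $(seq 1 40); do<br />
    serial_task "$i" > "out.$i" &   # one independent task per core<br />
done<br />
wait                                # keep the job alive until all tasks finish<br />
</source><br />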
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Limit on Running jobs<br />
!Limit on Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs ||compute || 50 || 1000 || 1 node (40&nbsp;cores) || default:&nbsp;20&nbsp;nodes&nbsp;(800&nbsp;cores) <br> with&nbsp;allocation:&nbsp;1000&nbsp;nodes&nbsp;(40000&nbsp;cores)|| 15 minutes || 24 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40&nbsp;cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.<br />
<br />
== File Input/Output Tips ==<br />
<br />
It is important to understand the file systems, so as to perform your file I/O (Input/Output) responsibly. Refer to the [[Data_Management | Data Management]] page for details about the file systems.<br />
* Your files can be seen on all Niagara login and compute nodes.<br />
* $HOME, $SCRATCH, and $PROJECT all use the parallel file system called GPFS.<br />
* GPFS is a high-performance file system which provides rapid reads and writes to large data sets in parallel from many nodes.<br />
* Accessing data sets which consist of many, small files leads to poor performance on GPFS.<br />
* Avoid reading and writing lots of small amounts of data to disk. Many small files on the system waste space and are slower to access, read and write. If you must write many small files, use [[User_Ramdisk | ramdisk]].<br />
* Write data out in a binary format. This is faster and takes less space.<br />
* The [[Burst Buffer]] is another option for I/O heavy-jobs and for speeding up [[Checkpoints|checkpoints]].<br />
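The binary-format point is easy to see from file sizes alone (a sketch; perl's pack stands in for whatever binary writer your code uses):<br />
<source lang="bash"><br />
# Write the integers 1..100000 as text, and as raw 4-byte binary<br />
seq 1 100000 > nums.txt<br />
perl -e 'print pack("V*", 1..100000)' > nums.bin<br />
ls -l nums.txt nums.bin   # nums.bin is 400,000 bytes; nums.txt is almost 50% larger<br />
</source><br />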
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name=mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script from your scratch directory with the command:<br />
<br />
nia-login07:scratch$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li>First line indicates that this is a bash script.</li><br />
<li>Lines starting with <code>#SBATCH</code> go to SLURM.</li><br />
<li>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>)</li><br />
<li>In this case, SLURM looks for 2 nodes each running 40 tasks (for a total of 80 tasks), for 1 hour</li><br />
<li>Note that the mpirun flag "--ppn" (processors per node) is ignored.</li><br />
<li>Once it has found such nodes, it runs the script:<br />
<ul><br />
<li>Change to the submission directory;</li><br />
<li>Loads modules;</li><br />
<li>Runs the <code>mpi_example</code> application (SLURM will inform mpirun or srun on how many processes to run).<br />
</li><br />
</ul><br />
<li>To use hyperthreading, just change <code>--ntasks-per-node=40</code> to <code>--ntasks-per-node=80</code>, and add <code>--bind-to none</code> to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name=openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script from your scratch directory with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* First line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>openmp_job</code>).<br />
* In this case, SLURM looks for a single node with 40 cores to run one task, for 1 hour.<br />
* Once it has found such a node, it runs the script:<br />
** Change to the submission directory;<br />
** Loads modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.<br />
<br />
== Monitoring queued jobs ==<br />
<br />
Once the job is incorporated into the queue, there are some commands you can use to monitor its progress.<br />
<br />
<ul><br />
<li><p><code>squeue</code> or <code>sqc</code> (a caching version of squeue) to show the job queue (<code>squeue -u $USER</code> for just your jobs);</p></li><br />
<li><p><code>squeue -j JOBID</code> to get information on a specific job</p><br />
<p>(alternatively, <code>scontrol show job JOBID</code>, which is more verbose).</p></li><br />
<li><p><code>squeue --start -j JOBID</code> to get an estimate for when a job will run; these tend not to be very accurate predictions.</p></li><br />
<li><p><code>scancel -i JOBID</code> to cancel the job.</p></li><br />
<li><p><code>jobperf JOBID</code> to get an instantaneous view of the cpu and memory usage of the nodes of the job while it is running.</p></li><br />
<li><p><code>sacct</code> to get information on your recent jobs.</p></li><br />
</ul><br />
<br />
Further instructions for monitoring your jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]]. The [https://my.scinet.utoronto.ca my.SciNet] site is also a very useful tool for monitoring your current and past usage.<br />
<br />
= Visualization =<br />
Information about how to use visualization tools on Niagara is available on the [[Visualization]] page.<br />
<br />
= Support =<br />
<br />
* [mailto:support@scinet.utoronto.ca support@scinet.utoronto.ca]<br />
* [mailto:niagara@computecanada.ca niagara@computecanada.ca]</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Teach&diff=2239Teach2019-07-02T20:14:22Z<p>Bmundim: /* Submit a Job */</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] <br />
|name=Teach Cluster <br />
|installed=(orig Feb 2013), Oct 2018<br />
|operatingsystem= Linux (Centos 7.4)<br />
|loginnode= teach01 (from <tt>teach.scinet</tt>)<br />
|nnodes=42 <br />
|rampernode=64 Gb <br />
|corespernode=16 <br />
|interconnect=Infiniband (QDR)<br />
|vendorcompilers=icc/gcc<br />
|queuetype=slurm<br />
}}<br />
<br />
== Teaching Cluster ==<br />
<br />
SciNet has assembled some older compute hardware into a small cluster provided primarily for teaching purposes. It is configured similarly to the production [[Niagara_Quickstart | Niagara ]] system, but uses repurposed hardware. This system should not be used for production work; as such, the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time. Questions about its use or problems should be sent to '''support@scinet.utoronto.ca'''.<br />
<br />
== Specifications==<br />
<br />
The cluster consists of 42 repurposed x86_64 nodes, each with 16 cores (from two octa-core Intel Xeon (Sandy Bridge) E5-2650 CPUs) running at 2.0GHz, with 64GB of RAM per node. <br />
The nodes are interconnected with 2.6:1 blocking QDR Infiniband for MPI communications and disk I/O to the SciNet Niagara filesystems. In total this cluster contains 672 cores.<br />
<br />
== Login/Devel Node ==<br />
<br />
Log in via ssh with your SciNet account to '''<tt>teach.scinet.utoronto.ca</tt>''', which will bring you directly to '''<tt>teach01</tt>''', the gateway/devel node for this cluster. <br />
From '''<tt>teach01</tt>''' you can compile, do short tests, and submit your jobs to the queue.<br />
The first time you log in to the Teach cluster, please make sure to check that the login node's ssh key fingerprint<br />
matches. [[Teach_fingerprints | See here how]].<br />
<br />
== Interactive jobs ==<br />
<br />
The login node teach01 is shared between students of a <br />
number of different courses. Use this node to develop and compile <br />
code, to run short tests, and to submit computations to the scheduler. <br />
<br />
For an interactive session on a compute node of the Teach cluster, use the 'debugjob' command: <br />
teach01:~$ debugjob -n C<br />
where C is the number of cores. An interactive session defaults to four hours when using at most one node (C<=16), and becomes 60 minutes when using four nodes (i.e., 48<C<=64), which is the maximum number of nodes allowed for an interactive session by debugjob.<br />
<br />
For a short interactive session on one or more dedicated compute nodes of the Teach cluster, use the 'debugjob' command as follows: <br />
teach01:~$ debugjob N<br />
where N is the number of nodes. On the Teach cluster, this is equivalent to <tt>debugjob -n 16*N </tt>. The positive integer <tt>N</tt> can be at most 4.<br />
<br />
If no arguments are given to <tt>debugjob</tt>, it allocates a single core on a Teach compute node.<br />
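The node/core equivalence above is simple arithmetic: with 16 cores per node and no hyperthreading, <tt>debugjob N</tt> requests the same resources as <tt>debugjob -n 16*N</tt>. A tiny shell helper (purely illustrative, not part of the cluster software) makes that explicit:

```shell
# On Teach, one node has 16 cores and there is no hyperthreading,
# so `debugjob N` is equivalent to `debugjob -n $((16*N))`.
nodes_to_cores() { echo $(( $1 * 16 )); }

nodes_to_cores 1   # prints 16 (one node; 4-hour default session)
nodes_to_cores 4   # prints 64 (the debugjob maximum of four nodes)
```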
<br />
== Submit a Job ==<br />
<br />
Teach uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
teach01:~scratch$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course.<br />
<br />
It is worth mentioning some differences between the Niagara and Teach clusters:<br />
* On the Teach cluster, $HOME is writable on the compute nodes. On Niagara, $HOME is read-only on the compute nodes, so in most cases, you will want to submit from your $SCRATCH directory.<br />
* Each teach cluster node has two CPUs with 8 cores each, a total of 16 cores per node (there is no hyperthreading). Make sure to adjust accordingly the flags --ntasks-per-node or --ntasks together with --nodes for the examples found at [[Slurm | Slurm page]]. <br />
* The current Slurm configuration of the Teach cluster allocates compute resources by core, as opposed to by node. That means your tasks might land on nodes that have other jobs running, i.e., they might share the node. If you want to avoid that, make sure to add the following directive to your submission script: #SBATCH --exclusive. This forces your job to use its compute nodes exclusively.<br />
* The maximum walltime is currently set to 4 hours.<br />
* There are 2 queues available: the compute queue and the debug queue. Their usage limits are listed in the table below.<br />
* 7 of the Teach compute nodes have more memory than the 64GB default: 5 of them have 128GB, and 2 of them 256GB. To run a big-memory job on these nodes, add the following directive to your submission script: #SBATCH --constraint=m128G. Replace m128G with m256G if you want your job to run exclusively on the 256GB nodes.<br />
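Putting these points together, a minimal Teach job script might look as follows. This is a sketch: <tt>./my_program</tt> is a placeholder for your own executable, and the --exclusive and --constraint lines are optional.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16   # Teach nodes have 16 cores; no hyperthreading
#SBATCH --time=01:00:00        # must not exceed the 4-hour maximum walltime
#SBATCH --exclusive            # optional: do not share nodes with other jobs
#SBATCH --constraint=m128G     # optional: run only on the 128GB nodes
#SBATCH --job-name=teach_example

# ./my_program is a placeholder for your own executable.
./my_program
```

Submit it from teach01 with <code>sbatch jobscript.sh</code>.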
<br />
== Limits ==<br />
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc, but if you do not specify one, your job will run in the compute partition, which is the most common case.<br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Interactive testing or troubleshooting || debug || 1 || 1 || 1 core || 4 nodes (64 cores)|| N/A || 4 hours<br />
|-<br />
|Compute jobs ||compute || 6 || 12 || 1 core || 8 nodes (128 cores)|| 15 minutes || 4 hours<br />
|}<br />
<br />
Within these limits, jobs may still have to wait in the queue. Although there are no allocations on the teach cluster, the waiting time still depends on many factors, such as the number of nodes and the walltime, how many other jobs are waiting in the queue, and whether a job can fill an otherwise unused spot in the schedule.<br />
<br />
== Jupyter Hub ==<br />
<br />
Some courses, like the Summer School, use Jupyter notebooks. In those cases, some (or all) of the large memory compute nodes are dedicated as jupyterhub nodes. <br />
<br />
To connect to these, you must first set up an ssh tunnel from your local computer to the jupyterhub node in the SciNet datacenter.<br />
On a local terminal on your computer (i.e., not logged into SciNet), use the following command:<br />
<br />
ssh -L8888:jupyterhub7:8000 teach.scinet.utoronto.ca -N<br />
<br />
Instead of jupyterhub7, you can also choose jupyterhub1, jupyterhub2, jupyterhub3, jupyterhub4, jupyterhub5, or jupyterhub6.<br />
<br />
Note: For many computers, in particular Macs, this ssh command should be the first ssh connection to teach.scinet.utoronto.ca, i.e., you cannot already have another ssh session to teach running on your computer.<br />
<br />
Also note that this command will seem to 'hang', but the tunnel will have been established.<br />
<br />
Next, open your browser, go to <tt>https://localhost:8888</tt>, and log in to the jupyterhub. <br />
<br />
Note: You will likely have to tell your browser to trust this site.<br />
<br />
== Software Modules ==<br />
<br />
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be [[Using_modules | found on the modules page]].<br />
<br />
Common module subcommands are:<br />
<br />
* <code>module load <module-name></code>: load the default version of a particular software.<br />
* <code>module load <module-name>/<module-version></code>: load a specific version of a particular software.<br />
* <code>module purge</code>: unload all currently loaded modules.<br />
* <code>module spider</code> (or <code>module spider <module-name></code>): list available software packages.<br />
* <code>module avail</code>: list loadable software packages.<br />
* <code>module list</code>: list loaded modules.<br />
<br />
Along with modifying common environment variables, such as PATH, and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.<br />
<br />
There are handy abbreviations for the module commands. <code>ml</code> is the same as <code>module list</code>, and <code>ml <module-name></code> is the same as <code>module load <module-name></code>.<br />
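For example, a typical compile session on teach01 might look like this. The module versions are taken from the table below (check <code>module avail</code> for what is currently installed), and <tt>hello.c</tt> is a placeholder source file:

```shell
# Start from a clean slate, then load a compiler and an MPI stack.
module purge
module load gcc/7.3.0 openmpi/3.1.1
module list                    # confirm what is loaded

# Following the SCINET_MODULENAME_ROOT pattern, the gcc module is
# expected to define SCINET_GCC_ROOT, pointing at its install prefix.
echo "$SCINET_GCC_ROOT"

# Compile an MPI program (hello.c is a placeholder).
mpicc -O2 -o hello hello.c
```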
<br />
{| class="wikitable sortable" style="width:100%" <br />
! style="width: 25%" align="center" | Module <br />
! style="width: 17%" align="center" | Versions (2018a) <br />
! align="center" | Description <br />
|- <br />
| align="left" | [https://www.continuum.io/anaconda-overview anaconda2] <br />
| align="left" | 5.1.0 <br />
| <div>Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies to adopt a modern open data science analytics architecture</div> <br />
|- <br />
| align="left" | [https://www.continuum.io/anaconda-overview anaconda3] <br />
| align="left" | 5.2.0 <br />
| <div>The Python 3 version of the Anaconda data analytics platform</div> <br />
|- <br />
| align="left" | [https://github.com/smirarab/ASTRAL astral] <br />
| align="left" | 4.7.12 <br />
| <div>ASTRAL is a tool for estimating an unrooted species tree given a set of unrooted gene trees</div> <br />
|- <br />
| align="left" | [https://www.htslib.org/ bcftools] <br />
| align="left" | 1.8 <br />
| <div>BCFtools is a set of utilities for variant calling and for manipulating files in the VCF and BCF formats</div> <br />
|- <br />
| align="left" | [https://github.com/arq5x/bedtools2 bedtools] <br />
| align="left" | 2.27.1 <br />
| <div>The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage</div> <br />
|- <br />
| align="left" | [https://blast.ncbi.nlm.nih.gov/ blast+] <br />
| align="left" | 2.7.1 <br />
| <div>Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences</div> <br />
|- <br />
| align="left" | [https://www.boost.org/ boost] <br />
| align="left" | 1.67.0&nbsp; 1.66.0 <br />
| <div>Boost provides free peer-reviewed portable C++ source libraries</div> <br />
|- <br />
| align="left" | [http://bowtie-bio.sourceforge.net/bowtie2/index.shtml bowtie2] <br />
| align="left" | 2.3.4.3 <br />
| <div>Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences</div> <br />
|- <br />
| align="left" | [http://bio-bwa.sourceforge.net/ bwa] <br />
| align="left" | 0.7.17 <br />
| <div>Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome</div> <br />
|- <br />
| align="left" | [https://github.com/brentp/bwa-meth bwameth] <br />
| align="left" | 0.4.0 <br />
| <div>Fast and accurate alignment of BS-Seq reads</div> <br />
|- <br />
| align="left" | [https://www.cmake.org cmake] <br />
| align="left" | 3.12.3 <br />
| <div>CMake, the cross-platform, open-source build system</div> <br />
|- <br />
| align="left" | [http://opensource.scilifelab.se/projects/cutadapt/ cutadapt] <br />
| align="left" | 2.1 <br />
| <div>Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads</div> <br />
|- <br />
| align="left" | [http://deeptools.readthedocs.org/ deeptools] <br />
| align="left" | 3.2.1-anaconda2 <br />
| <div>deepTools is a suite of python tools particularly developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq</div> <br />
|- <br />
| align="left" | [https://bioconductor.org/packages/release/bioc/html/DEXSeq.html dexseq] <br />
| align="left" | 1.24.4 <br />
| <div>Inference of differential exon usage in RNA sequencing</div> <br />
|- <br />
| align="left" | [http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ fastqc] <br />
| align="left" | 0.11.8 <br />
| <div>FastQC is a quality control application for high throughput sequence data</div> <br />
|- <br />
| align="left" | [http://www.fftw.org fftw] <br />
| align="left" | 3.3.7 <br />
| <div>FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data</div> <br />
|- <br />
| align="left" | [https://gcc.gnu.org gcc] <br />
| align="left" | 7.3.0 <br />
| <div>The GNU Compiler Collection for C, C++, and Fortran</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/gdb/gdb.html gdb] <br />
| align="left" | 8.1 <br />
| <div>The GNU Project Debugger</div> <br />
|- <br />
| align="left" | [https://git-annex.branchable.com/ git-annex] <br />
| align="left" | 2.8.1 <br />
| <div>git-annex allows managing files with git, without checking the file contents into git</div> <br />
|- <br />
| align="left" | [https://gmplib.org/ gmp] <br />
| align="left" | 6.1.2 <br />
| <div>GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/parallel gnu-parallel] <br />
| align="left" | 20180322 <br />
| <div>GNU parallel is a shell tool for executing (usually serial) jobs in parallel</div> <br />
|- <br />
| align="left" | [http://gnuplot.sourceforge.net/ gnuplot] <br />
| align="left" | 5.2.2 <br />
| <div>Portable interactive, function plotting utility</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/gsl/ gsl] <br />
| align="left" | 2.4 <br />
| <div>The GNU Scientific Library (GSL) is a numerical library for C and C++</div> <br />
|- <br />
| align="left" | [https://portal.hdfgroup.org/display/support hdf5] <br />
| align="left" | 1.8.20&nbsp; 1.10.4 <br />
| <div>HDF5 is a data model, library, and file format for storing and managing data</div> <br />
|- <br />
| align="left" | [https://ccb.jhu.edu/software/hisat2/index.shtml hisat2] <br />
| align="left" | 2.1.0 <br />
| <div>HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome)</div> <br />
|- <br />
| align="left" | [https://www-huber.embl.de/users/anders/HTSeq/ htseq] <br />
| align="left" | 0.11.1-anaconda2&nbsp; 0.11.1 <br />
| <div>A framework to process and analyze data from high-throughput sequencing (HTS) assays</div> <br />
|- <br />
| align="left" | [http://www.htslib.org/ htslib] <br />
| align="left" | 1.8 <br />
| <div>A C library for reading/writing high-throughput sequencing data</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/parallel-studio-xe intel] <br />
| align="left" | 2018.4 <br />
| <div>Intel compilers suite for C, C++, and Fortran, including the MKL, TBB, IPP, DAAL, and PSTL libraries</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/mpi-library intelmpi] <br />
| align="left" | 2018.4 <br />
| <div>Intel MPI library with compiler wrappers for C, C++, and Fortran</div> <br />
|- <br />
| align="left" | [https://java.com/ java] <br />
| align="left" | 1.8.0_201 <br />
| <div>Java Platform, Standard Edition (Java SE) lets you develop and deploy Java applications on desktops and servers</div> <br />
|- <br />
| align="left" | [https://github.com/LMDB/lmdb lmdb] <br />
| align="left" | 0.9.22 <br />
| <div>OpenLDAP's Lightning Memory-Mapped Database (LMDB) library</div> <br />
|- <br />
| align="left" | [https://github.com/dpryan79/MethylDackel methyldackel] <br />
| align="left" | 0.4.0 <br />
| <div>A (mostly) universal methylation extractor for BS-seq experiments</div> <br />
|- <br />
| align="left" | [https://www.bioinf.uni-leipzig.de/Software/metilene/ metilene] <br />
| align="left" | 0.2.7 <br />
| <div>Fast and sensitive detection of differential DNA methylation</div> <br />
|- <br />
| align="left" | [http://hollywood.mit.edu/burgelab/miso miso] <br />
| align="left" | 0.5.4 <br />
| <div>A probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data, and identifies differentially regulated isoforms or exons across samples</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/mkl mkl] <br />
| align="left" | 2018.4 <br />
| <div>Intel Math Kernel Library</div> <br />
|- <br />
| align="left" | [https://www.unidata.ucar.edu/software/netcdf/ netcdf] <br />
| align="left" | 4.6.1 <br />
| <div>NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data</div> <br />
|- <br />
| align="left" | [https://www.open-mpi.org/ openmpi] <br />
| align="left" | 3.1.1 <br />
| <div>The Open MPI Project is an open source MPI-2 implementation</div> <br />
|- <br />
| align="left" | [http://oprofile.sourceforge.net oprofile] <br />
| align="left" | 1.3.0 <br />
| <div>OProfile is a system-wide profiler for Linux systems, capable of profiling all running code at low overhead</div> <br />
|- <br />
| align="left" | [http://www.stevekellylab.com/software/orthofinder orthofinder] <br />
| align="left" | 2.2.7 <br />
| <div>Program for identifying orthologous protein sequence families</div> <br />
|- <br />
| align="left" | [http://www.astro.caltech.edu/~tjp/pgplot/ pgplot] <br />
| align="left" | 5.2.2-x <br />
| <div>Graphics subroutine library for C/C++ and Fortran</div> <br />
|- <br />
| align="left" | [http://prinseq.sourceforge.net prinseq] <br />
| align="left" | 0.20.4 <br />
| <div>A bioinformatics tool to PRe-process and show INformation of SEQuence data</div> <br />
|- <br />
| align="left" | [https://python.org/ python] <br />
| align="left" | 3.6.8 <br />
| <div>Python is a programming language that lets you work more quickly and integrate your systems more effectively</div> <br />
|- <br />
| align="left" | [https://www.r-project.org/ r] <br />
| align="left" | 3.5.1&nbsp; 3.5.0 <br />
| <div>R is a free software environment for statistical computing and graphics</div> <br />
|- <br />
| align="left" | [https://github.com/vanzonr/rarray rarray] <br />
| align="left" | 1.2 <br />
| <div>Library for runtime multi-dimensional arrays in C++</div> <br />
|- <br />
| align="left" | [https://github.com/stamatak/standard-RAxML raxml] <br />
| align="left" | 8.2.12 <br />
| <div>RAxML search algorithm for maximum likelihood based inference of phylogenetic trees</div> <br />
|- <br />
| align="left" | [https://www.htslib.org/ samtools] <br />
| align="left" | 1.8 <br />
| <div>SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format</div> <br />
|- <br />
| align="left" | [https://www.sylabs.io/docs/ singularity] <br />
| align="left" | 2.6.1 <br />
| <div>Singularity is a portable application stack packaging and runtime utility.</div> <br />
|- <br />
| align="left" | [https://www.hwaci.com/sw/sqlite/ sqlite] <br />
| align="left" | 3.23.0 <br />
| <div>SQLite: SQL Database Engine in a C Library</div> <br />
|- <br />
| align="left" | [https://ccb.jhu.edu/software/stringtie stringtie] <br />
| align="left" | 1.3.5 <br />
| <div>StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts</div> <br />
|- <br />
| align="left" | [https://01.org/tbb/ tbb] <br />
| align="left" | 2019.4 <br />
| <div>Intel(R) Threading Building Blocks (Intel(R) TBB) lets you easily write parallel C++ programs that take full advantage of multicore performance, that are portable, composable and have future-proof scalability</div> <br />
|- <br />
| align="left" | [https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ trimgalore] <br />
| align="left" | 0.6.0 <br />
| <div>A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisulfite-Seq) libraries</div> <br />
|- <br />
| align="left" | [https://cran.rstudio.com/web/packages/UpSetR upsetr] <br />
| align="left" | 1.3.3 <br />
| <div>R implementation of the UpSet set visualization technique published by Lex, Gehlenborg, et al</div> <br />
|- <br />
| align="left" | [http://valgrind.org valgrind] <br />
| align="left" | 3.14.0 <br />
| <div>Valgrind provides debugging and profiling tools</div> <br />
|}</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Teach&diff=2238Teach2019-07-02T20:06:52Z<p>Bmundim: /* Limits */</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] <br />
|name=Teach Cluster <br />
|installed=(orig Feb 2013), Oct 2018<br />
|operatingsystem= Linux (Centos 7.4)<br />
|loginnode= teach01 (from <tt>teach.scinet</tt>)<br />
|nnodes=42 <br />
|rampernode=64 Gb <br />
|corespernode=16 <br />
|interconnect=Infiniband (QDR)<br />
|vendorcompilers=icc/gcc<br />
|queuetype=slurm<br />
}}<br />
<br />
== Teaching Cluster ==<br />
<br />
SciNet has assembled some older compute hardware into a small cluster provided primarily for teaching purposes. It is configured similarly to the production [[Niagara_Quickstart | Niagara ]] system, but uses repurposed hardware. This system should not be used for production work; as such, the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time. Questions about its use or problems should be sent to '''support@scinet.utoronto.ca'''.<br />
<br />
== Specifications==<br />
<br />
The cluster consists of 42 repurposed x86_64 nodes, each with 16 cores (from two octa-core Intel Xeon (Sandy Bridge) E5-2650 CPUs) running at 2.0GHz, with 64GB of RAM per node. <br />
The nodes are interconnected with 2.6:1 blocking QDR Infiniband for MPI communications and disk I/O to the SciNet Niagara filesystems. In total this cluster contains 672 cores.<br />
<br />
== Login/Devel Node ==<br />
<br />
Log in via ssh with your SciNet account to '''<tt>teach.scinet.utoronto.ca</tt>''', which will bring you directly to '''<tt>teach01</tt>''', the gateway/devel node for this cluster. <br />
From '''<tt>teach01</tt>''' you can compile, do short tests, and submit your jobs to the queue.<br />
The first time you log in to the Teach cluster, please make sure to check that the login node's ssh key fingerprint<br />
matches. [[Teach_fingerprints | See here how]].<br />
<br />
== Interactive jobs ==<br />
<br />
The login node teach01 is shared between students of a <br />
number of different courses. Use this node to develop and compile <br />
code, to run short tests, and to submit computations to the scheduler. <br />
<br />
For an interactive session on a compute node of the Teach cluster, use the 'debugjob' command: <br />
teach01:~$ debugjob -n C<br />
where C is the number of cores. An interactive session defaults to four hours when using at most one node (C<=16), and becomes 60 minutes when using four nodes (i.e., 48<C<=64), which is the maximum number of nodes allowed for an interactive session by debugjob.<br />
<br />
For a short interactive session on one or more dedicated compute nodes of the Teach cluster, use the 'debugjob' command as follows: <br />
teach01:~$ debugjob N<br />
where N is the number of nodes. On the Teach cluster, this is equivalent to <tt>debugjob -n 16*N </tt>. The positive integer <tt>N</tt> can be at most 4.<br />
<br />
If no arguments are given to <tt>debugjob</tt>, it allocates a single core on a Teach compute node.<br />
<br />
== Submit a Job ==<br />
<br />
Teach uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
teach01:~scratch$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course.<br />
<br />
It is worth mentioning some differences between the Niagara and Teach clusters:<br />
* On the Teach cluster, $HOME is writable on the compute nodes. On Niagara, $HOME is read-only on the compute nodes, so in most cases, you will want to submit from your $SCRATCH directory.<br />
* Each teach cluster node has two CPUs with 8 cores each, a total of 16 cores per node (there is no hyperthreading). Make sure to adjust accordingly the flags --ntasks-per-node or --ntasks together with --nodes for the examples found at [[Slurm | Slurm page]]. <br />
* The current Slurm configuration of the Teach cluster allocates compute resources by core, as opposed to by node. That means your tasks might land on nodes that have other jobs running, i.e., they might share the node. If you want to avoid that, make sure to add the following directive to your submission script: #SBATCH --exclusive. This forces your job to use its compute nodes exclusively.<br />
* The maximum walltime is currently set to 4 hours.<br />
* There is only 1 queue available: the compute queue. Its usage limits are listed in the table below.<br />
* 7 of the Teach compute nodes have more memory than the 64GB default: 5 of them have 128GB, and 2 of them 256GB. To run a big-memory job on these nodes, add the following directive to your submission script: #SBATCH --constraint=m128G. Replace m128G with m256G if you want your job to run exclusively on the 256GB nodes.<br />
<br />
== Limits ==<br />
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc, but if you do not specify one, your job will run in the compute partition, which is the most common case.<br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Interactive testing or troubleshooting || debug || 1 || 1 || 1 core || 4 nodes (64 cores)|| N/A || 4 hours<br />
|-<br />
|Compute jobs ||compute || 6 || 12 || 1 core || 8 nodes (128 cores)|| 15 minutes || 4 hours<br />
|}<br />
<br />
Within these limits, jobs may still have to wait in the queue. Although there are no allocations on the teach cluster, the waiting time still depends on many factors, such as the number of nodes and the walltime, how many other jobs are waiting in the queue, and whether a job can fill an otherwise unused spot in the schedule.<br />
<br />
== Jupyter Hub ==<br />
<br />
Some courses, like the Summer School, use Jupyter notebooks. In those cases, some (or all) of the large memory compute nodes are dedicated as jupyterhub nodes. <br />
<br />
To connect to these, you must first set up an ssh tunnel from your local computer to the jupyterhub node in the SciNet datacenter.<br />
On a local terminal on your computer (i.e., not logged into SciNet), use the following command:<br />
<br />
ssh -L8888:jupyterhub7:8000 teach.scinet.utoronto.ca -N<br />
<br />
Instead of jupyterhub7, you can also choose jupyterhub1, jupyterhub2, jupyterhub3, jupyterhub4, jupyterhub5, or jupyterhub6.<br />
<br />
Note: For many computers, in particular Macs, this ssh command should be the first ssh connection to teach.scinet.utoronto.ca, i.e., you cannot already have another ssh session to teach running on your computer.<br />
<br />
Also note that this command will seem to 'hang', but the tunnel will have been established.<br />
<br />
Next, open your browser, go to <tt>https://localhost:8888</tt>, and log in to the jupyterhub. <br />
<br />
Note: You will likely have to tell your browser to trust this site.<br />
<br />
== Software Modules ==<br />
<br />
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be [[Using_modules | found on the modules page]].<br />
<br />
Common module subcommands are:<br />
<br />
* <code>module load <module-name></code>: load the default version of a particular software.<br />
* <code>module load <module-name>/<module-version></code>: load a specific version of a particular software.<br />
* <code>module purge</code>: unload all currently loaded modules.<br />
* <code>module spider</code> (or <code>module spider <module-name></code>): list available software packages.<br />
* <code>module avail</code>: list loadable software packages.<br />
* <code>module list</code>: list loaded modules.<br />
<br />
Along with modifying common environment variables, such as PATH, and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.<br />
<br />
There are handy abbreviations for the module commands. <code>ml</code> is the same as <code>module list</code>, and <code>ml <module-name></code> is the same as <code>module load <module-name></code>.<br />
<br />
{| class="wikitable sortable" style="width:100%" <br />
! style="width: 25%" align="center" | Module <br />
! style="width: 17%" align="center" | Versions (2018a) <br />
! align="center" | Description <br />
|- <br />
| align="left" | [https://www.continuum.io/anaconda-overview anaconda2] <br />
| align="left" | 5.1.0 <br />
| <div>Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies to adopt a modern open data science analytics architecture</div> <br />
|- <br />
| align="left" | [https://www.continuum.io/anaconda-overview anaconda3] <br />
| align="left" | 5.2.0 <br />
| <div>The Python 3 version of the Anaconda data analytics platform</div> <br />
|- <br />
| align="left" | [https://github.com/smirarab/ASTRAL astral] <br />
| align="left" | 4.7.12 <br />
| <div>ASTRAL is a tool for estimating an unrooted species tree given a set of unrooted gene trees</div> <br />
|- <br />
| align="left" | [https://www.htslib.org/ bcftools] <br />
| align="left" | 1.8 <br />
| <div>BCFtools is a set of utilities for variant calling and for manipulating files in the VCF and BCF formats</div> <br />
|- <br />
| align="left" | [https://github.com/arq5x/bedtools2 bedtools] <br />
| align="left" | 2.27.1 <br />
| <div>The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage</div> <br />
|- <br />
| align="left" | [https://blast.ncbi.nlm.nih.gov/ blast+] <br />
| align="left" | 2.7.1 <br />
| <div>Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences</div> <br />
|- <br />
| align="left" | [https://www.boost.org/ boost] <br />
| align="left" | 1.67.0&nbsp; 1.66.0 <br />
| <div>Boost provides free peer-reviewed portable C++ source libraries</div> <br />
|- <br />
| align="left" | [http://bowtie-bio.sourceforge.net/bowtie2/index.shtml bowtie2] <br />
| align="left" | 2.3.4.3 <br />
| <div>Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences</div> <br />
|- <br />
| align="left" | [http://bio-bwa.sourceforge.net/ bwa] <br />
| align="left" | 0.7.17 <br />
| <div>Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome</div> <br />
|- <br />
| align="left" | [https://github.com/brentp/bwa-meth bwameth] <br />
| align="left" | 0.4.0 <br />
| <div>Fast and accurate alignment of BS-Seq reads</div> <br />
|- <br />
| align="left" | [https://www.cmake.org cmake] <br />
| align="left" | 3.12.3 <br />
| <div>CMake, the cross-platform, open-source build system</div> <br />
|- <br />
| align="left" | [http://opensource.scilifelab.se/projects/cutadapt/ cutadapt] <br />
| align="left" | 2.1 <br />
| <div>Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads</div> <br />
|- <br />
| align="left" | [http://deeptools.readthedocs.org/ deeptools] <br />
| align="left" | 3.2.1-anaconda2 <br />
| <div>deepTools is a suite of python tools particularly developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq</div> <br />
|- <br />
| align="left" | [https://bioconductor.org/packages/release/bioc/html/DEXSeq.html dexseq] <br />
| align="left" | 1.24.4 <br />
| <div>Inference of differential exon usage in RNA sequencing</div> <br />
|- <br />
| align="left" | [http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ fastqc] <br />
| align="left" | 0.11.8 <br />
| <div>FastQC is a quality control application for high throughput sequence data</div> <br />
|- <br />
| align="left" | [http://www.fftw.org fftw] <br />
| align="left" | 3.3.7 <br />
| <div>FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data</div> <br />
|- <br />
| align="left" | [https://gcc.gnu.org gcc] <br />
| align="left" | 7.3.0 <br />
| <div>The GNU Compiler Collection for C, C++, and Fortran</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/gdb/gdb.html gdb] <br />
| align="left" | 8.1 <br />
| <div>The GNU Project Debugger</div> <br />
|- <br />
| align="left" | [https://git-annex.branchable.com/ git-annex] <br />
| align="left" | 2.8.1 <br />
| <div>git-annex allows managing files with git, without checking the file contents into git</div> <br />
|- <br />
| align="left" | [https://gmplib.org/ gmp] <br />
| align="left" | 6.1.2 <br />
| <div>GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/parallel gnu-parallel] <br />
| align="left" | 20180322 <br />
| <div>GNU parallel is a shell tool for executing (usually serial) jobs in parallel</div> <br />
|- <br />
| align="left" | [http://gnuplot.sourceforge.net/ gnuplot] <br />
| align="left" | 5.2.2 <br />
| <div>Portable interactive, function plotting utility</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/gsl/ gsl] <br />
| align="left" | 2.4 <br />
| <div>The GNU Scientific Library (GSL) is a numerical library for C and C++</div> <br />
|- <br />
| align="left" | [https://portal.hdfgroup.org/display/support hdf5] <br />
| align="left" | 1.8.20&nbsp; 1.10.4 <br />
| <div>HDF5 is a data model, library, and file format for storing and managing data</div> <br />
|- <br />
| align="left" | [https://ccb.jhu.edu/software/hisat2/index.shtml hisat2] <br />
| align="left" | 2.1.0 <br />
| <div>HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome)</div> <br />
|- <br />
| align="left" | [https://www-huber.embl.de/users/anders/HTSeq/ htseq] <br />
| align="left" | 0.11.1-anaconda2&nbsp; 0.11.1 <br />
| <div>A framework to process and analyze data from high-throughput sequencing (HTS) assays</div> <br />
|- <br />
| align="left" | [http://www.htslib.org/ htslib] <br />
| align="left" | 1.8 <br />
| <div>A C library for reading/writing high-throughput sequencing data</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/parallel-studio-xe intel] <br />
| align="left" | 2018.4 <br />
| <div>Intel compilers suite for C, C++, and Fortran, including the MKL, TBB, IPP, DAAL, and PSTL libraries</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/mpi-library intelmpi] <br />
| align="left" | 2018.4 <br />
| <div>Intel MPI library with compiler wrappers for C, C++, and Fortran</div> <br />
|- <br />
| align="left" | [https://java.com/ java] <br />
| align="left" | 1.8.0_201 <br />
| <div>Java Platform, Standard Edition (Java SE) lets you develop and deploy Java applications on desktops and servers</div> <br />
|- <br />
| align="left" | [https://github.com/LMDB/lmdb lmdb] <br />
| align="left" | 0.9.22 <br />
| <div>OpenLDAP's Lightning Memory-Mapped Database (LMDB) library</div> <br />
|- <br />
| align="left" | [https://github.com/dpryan79/MethylDackel methyldackel] <br />
| align="left" | 0.4.0 <br />
| <div>A (mostly) universal methylation extractor for BS-seq experiments</div> <br />
|- <br />
| align="left" | [https://www.bioinf.uni-leipzig.de/Software/metilene/ metilene] <br />
| align="left" | 0.2.7 <br />
| <div>Fast and sensitive detection of differential DNA methylation</div> <br />
|- <br />
| align="left" | [http://hollywood.mit.edu/burgelab/miso miso] <br />
| align="left" | 0.5.4 <br />
| <div>A probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data, and identifies differentially regulated isoforms or exons across samples</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/mkl mkl] <br />
| align="left" | 2018.4 <br />
| <div>Intel Math Kernel Library</div> <br />
|- <br />
| align="left" | [https://www.unidata.ucar.edu/software/netcdf/ netcdf] <br />
| align="left" | 4.6.1 <br />
| <div>NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data</div> <br />
|- <br />
| align="left" | [https://www.open-mpi.org/ openmpi] <br />
| align="left" | 3.1.1 <br />
| <div>The Open MPI Project is an open source MPI-2 implementation</div> <br />
|- <br />
| align="left" | [http://oprofile.sourceforge.net oprofile] <br />
| align="left" | 1.3.0 <br />
| <div>OProfile is a system-wide profiler for Linux systems, capable of profiling all running code at low overhead</div> <br />
|- <br />
| align="left" | [http://www.stevekellylab.com/software/orthofinder orthofinder] <br />
| align="left" | 2.2.7 <br />
| <div>Program for identifying orthologous protein sequence families</div> <br />
|- <br />
| align="left" | [http://www.astro.caltech.edu/~tjp/pgplot/ pgplot] <br />
| align="left" | 5.2.2-x <br />
| <div>Graphics subroutine library for C/C++ and Fortran</div> <br />
|- <br />
| align="left" | [http://prinseq.sourceforge.net prinseq] <br />
| align="left" | 0.20.4 <br />
| <div>A bioinformatics tool to PRe-process and show INformation of SEQuence data</div> <br />
|- <br />
| align="left" | [https://python.org/ python] <br />
| align="left" | 3.6.8 <br />
| <div>Python is a programming language that lets you work more quickly and integrate your systems more effectively</div> <br />
|- <br />
| align="left" | [https://www.r-project.org/ r] <br />
| align="left" | 3.5.1&nbsp; 3.5.0 <br />
| <div>R is a free software environment for statistical computing and graphics</div> <br />
|- <br />
| align="left" | [https://github.com/vanzonr/rarray rarray] <br />
| align="left" | 1.2 <br />
| <div>Library for runtime multi-dimensional arrays in C++</div> <br />
|- <br />
| align="left" | [https://github.com/stamatak/standard-RAxML raxml] <br />
| align="left" | 8.2.12 <br />
| <div>RAxML search algorithm for maximum likelihood based inference of phylogenetic trees</div> <br />
|- <br />
| align="left" | [https://www.htslib.org/ samtools] <br />
| align="left" | 1.8 <br />
| <div>SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format</div> <br />
|- <br />
| align="left" | [https://www.sylabs.io/docs/ singularity] <br />
| align="left" | 2.6.1 <br />
| <div>Singularity is a portable application stack packaging and runtime utility.</div> <br />
|- <br />
| align="left" | [https://www.hwaci.com/sw/sqlite/ sqlite] <br />
| align="left" | 3.23.0 <br />
| <div>SQLite: SQL Database Engine in a C Library</div> <br />
|- <br />
| align="left" | [https://ccb.jhu.edu/software/stringtie stringtie] <br />
| align="left" | 1.3.5 <br />
| <div>StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts</div> <br />
|- <br />
| align="left" | [https://01.org/tbb/ tbb] <br />
| align="left" | 2019.4 <br />
| <div>Intel(R) Threading Building Blocks (Intel(R) TBB) lets you easily write parallel C++ programs that take full advantage of multicore performance, that are portable, composable and have future-proof scalability</div> <br />
|- <br />
| align="left" | [https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ trimgalore] <br />
| align="left" | 0.6.0 <br />
| <div>A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisulfite-Seq) libraries</div> <br />
|- <br />
| align="left" | [https://cran.rstudio.com/web/packages/UpSetR upsetr] <br />
| align="left" | 1.3.3 <br />
| <div>R implementation of the UpSet set visualization technique published by Lex, Gehlenborg, et al</div> <br />
|- <br />
| align="left" | [http://valgrind.org valgrind] <br />
| align="left" | 3.14.0 <br />
| <div>Valgrind provides debugging and profiling tools</div> <br />
|}</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Teach&diff=2237Teach2019-07-02T20:06:02Z<p>Bmundim: /* Limits */</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] <br />
|name=Teach Cluster <br />
|installed=(orig Feb 2013), Oct 2018<br />
|operatingsystem= Linux (CentOS 7.4)<br />
|loginnode= teach01 (from <tt>teach.scinet</tt>)<br />
|nnodes=42 <br />
|rampernode=64 GB <br />
|corespernode=16 <br />
|interconnect=Infiniband (QDR)<br />
|vendorcompilers=icc/gcc<br />
|queuetype=slurm<br />
}}<br />
<br />
== Teaching Cluster ==<br />
<br />
SciNet has assembled some older compute hardware into a small cluster provided primarily for teaching purposes. It is configured similarly to the production [[Niagara_Quickstart | Niagara ]] system, but uses repurposed hardware. This system should not be used for production work; as such, the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time. Questions about its use or problems should be sent to '''support@scinet.utoronto.ca'''.<br />
<br />
== Specifications==<br />
<br />
The cluster consists of 42 repurposed x86_64 nodes, each with 16 cores (from two octo-core Intel Xeon Sandy Bridge E5-2650 CPUs) running at 2.0GHz, with 64GB of RAM per node. <br />
The nodes are interconnected with 2.6:1 blocking QDR Infiniband for MPI communications and disk I/O to the SciNet Niagara filesystems. In total this cluster contains 672 cores.<br />
<br />
== Login/Devel Node ==<br />
<br />
Login via ssh with your SciNet account to '''<tt>teach.scinet.utoronto.ca</tt>''', which will bring you directly to '''<tt>teach01</tt>''', the gateway/devel node for this cluster. <br />
From '''<tt>teach01</tt>''' you can compile, do short tests, and submit your jobs to the queue.<br />
The first time you log in to the Teach cluster, please make sure to check that the login node ssh key fingerprint<br />
matches. [[Teach_fingerprints | See here how]].<br />
<br />
== Interactive jobs ==<br />
<br />
The login node teach01 is shared between students of a <br />
number of different courses. Use this node to develop and compile <br />
code, to run short tests, and to submit computations to the scheduler. <br />
<br />
For an interactive session on a compute node of the Teach cluster, use the 'debugjob' command: <br />
teach01:~$ debugjob -n C<br />
where C is the number of cores. An interactive session defaults to four hours when using at most one node (C<=16), and becomes 60 minutes when using four nodes (i.e., 48<C<=64), which is the maximum number of nodes allowed for an interactive session by debugjob.<br />
<br />
For a short interactive session on dedicated compute nodes of the Teach cluster, use the 'debugjob' command as follows: <br />
teach01:~$ debugjob N<br />
where N is the number of nodes. On the Teach cluster, this is equivalent to <tt>debugjob -n 16*N</tt>. The positive integer <tt>N</tt> can be at most 4.<br />
<br />
If no arguments are given to <tt>debugjob</tt>, it allocates a single core on a Teach compute node.<br />
<br />
== Submit a Job ==<br />
<br />
Teach uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
teach01:~scratch$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course.<br />
<br />
It is worth mentioning some differences between the Niagara and Teach clusters:<br />
* On the Teach cluster, $HOME is writable on the compute nodes. On Niagara, $HOME is read-only on the compute nodes, so in most cases, you will want to submit from your $SCRATCH directory.<br />
* Each Teach cluster node has two CPUs with 8 cores each, for a total of 16 cores per node (there is no hyperthreading). Make sure to adjust the flags --ntasks-per-node or --ntasks, together with --nodes, accordingly for the examples found on the [[Slurm | Slurm page]]. <br />
* The current Slurm configuration of the Teach cluster allocates compute resources by core, as opposed to by node. That means your tasks might land on nodes that have other jobs running, i.e. they might share the node. If you want to avoid that, add the following directive to your submission script: #SBATCH --exclusive. This forces your job to use the compute nodes exclusively.<br />
* The maximum walltime is currently set to 4 hours.<br />
* There is only 1 queue available: the compute queue. Its usage limit is listed on the table below.<br />
* Seven of the Teach compute nodes have more than the default 64GB of memory: five have 128GB and two have 256GB. To run a big-memory job on these nodes, add the following directive to your submission script: #SBATCH --constraint=m128G. Replace m128G with m256G if you want your job to run exclusively on the 256GB nodes.<br />
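Putting these points together, a minimal Teach job script might look as follows. This is only a sketch: the module names, the <tt>my_program</tt> executable, and the resource choices are placeholders to adapt to your own code.

```shell
#!/bin/bash
#SBATCH --nodes=1                 # Teach allocates by core; request what you need
#SBATCH --ntasks-per-node=16      # 16 cores per node, no hyperthreading
#SBATCH --time=01:00:00           # must be at most the 4-hour maximum walltime
#SBATCH --job-name=test_job
#SBATCH --output=%x_%j.out
##SBATCH --exclusive              # uncomment to avoid sharing nodes with other jobs
##SBATCH --constraint=m128G       # uncomment to target the 128GB nodes

# Load whatever modules your code was built with (placeholders here)
module purge
module load intel intelmpi

# Run the (placeholder) executable
mpirun ./my_program
```

Submit it from teach01 with <tt>sbatch jobscript.sh</tt>, and monitor it with <tt>squeue -u $USER</tt>.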
<br />
== Limits ==<br />
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc, but if you do not specify one, your job will run in the compute partition, which is the most common case.<br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 core || 4 nodes (64 cores)|| N/A || 4 hours<br />
|-<br />
|Compute jobs ||compute || 6 || 12 || 1 core || 8 nodes (128 cores)|| 15 minutes || 4 hours<br />
|}<br />
<br />
Within these limits, jobs may still have to wait in the queue. Although there are no allocations on the Teach cluster, the waiting time still depends on many factors, such as the requested number of nodes and walltime, how many other jobs are waiting in the queue, and whether a job can fill an otherwise unused spot in the schedule.<br />
<br />
== Jupyter Hub ==<br />
<br />
Some courses, like the Summer School, use Jupyter notebooks. In those cases, some (or all) of the large memory compute nodes are dedicated as jupyterhub nodes. <br />
<br />
To connect to these, you must first set up an ssh tunnel from your local computer to the jupyterhub node in the SciNet datacenter.<br />
On a local terminal on your computer (i.e., not logged into SciNet), use the following command:<br />
<br />
ssh -L8888:jupyterhub7:8000 teach.scinet.utoronto.ca -N<br />
<br />
Instead of jupyterhub7, you can also choose jupyterhub1, jupyterhub2, jupyterhub3, jupyterhub4, jupyterhub5, or jupyterhub6.<br />
<br />
Note: It turns out that for many computers, in particular for Macs, this ssh command should be the first ssh connection to teach.scinet.utoronto.ca, i.e., you cannot already have another ssh session to teach running on your computer.<br />
<br />
Also note that this command will seem to 'hang', but the tunnel will have been established.<br />
<br />
Next, open your browser and go to <tt>https://localhost:8888</tt> and you can login to the jupyterhub. <br />
<br />
Note: You will likely have to tell your browser to trust this site.<br />
<br />
== Software Modules ==<br />
<br />
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be [[Using_modules | found on the modules page]].<br />
<br />
Common module subcommands are:<br />
<br />
* <code>module load <module-name></code>: load the default version of a particular software.<br />
* <code>module load <module-name>/<module-version></code>: load a specific version of a particular software.<br />
* <code>module purge</code>: unload all currently loaded modules.<br />
* <code>module spider</code> (or <code>module spider <module-name></code>): list available software packages.<br />
* <code>module avail</code>: list loadable software packages.<br />
* <code>module list</code>: list loaded modules.<br />
<br />
Along with modifying common environment variables such as PATH and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as the include and lib subdirectories.<br />
<br />
There are handy abbreviations for the module commands. <code>ml</code> is the same as <code>module list</code>, and <code>ml <module-name></code> is the same as <code>module load <module-name></code>.<br />
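As a concrete example, the session below finds and loads the gsl module and then compiles against it. The variable name SCINET_GSL_ROOT is assumed to follow the SCINET_MODULENAME_ROOT pattern described above, and <tt>mycode.c</tt> is a placeholder source file.

```shell
# See which versions of gsl are installed, then load a specific one
module spider gsl
module load gsl/2.4

# The module defines SCINET_GSL_ROOT (per the naming pattern above),
# which points at the installation prefix; use it to compile and link
gcc mycode.c -I${SCINET_GSL_ROOT}/include \
             -L${SCINET_GSL_ROOT}/lib -lgsl -lgslcblas -lm \
             -o mycode
```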
<br />
{| class="wikitable sortable" style="width:100%" <br />
! style="width: 25%" align="center" | Module <br />
! style="width: 17%" align="center" | Versions (2018a) <br />
! align="center" | Description <br />
|- <br />
| align="left" | [https://www.continuum.io/anaconda-overview anaconda2] <br />
| align="left" | 5.1.0 <br />
| <div>Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies to adopt a modern open data science analytics architecture</div> <br />
|- <br />
| align="left" | [https://www.continuum.io/anaconda-overview anaconda3] <br />
| align="left" | 5.2.0 <br />
| <div>The Python 3 release of the Anaconda platform, an enterprise-ready data analytics distribution built around the open source Python community</div> <br />
|- <br />
| align="left" | [https://github.com/smirarab/ASTRAL astral] <br />
| align="left" | 4.7.12 <br />
| <div>ASTRAL is a tool for estimating an unrooted species tree given a set of unrooted gene trees</div> <br />
|- <br />
| align="left" | [https://www.htslib.org/ bcftools] <br />
| align="left" | 1.8 <br />
| <div>BCFtools is a set of utilities for variant calling and for manipulating files in the VCF and BCF formats</div> <br />
|- <br />
| align="left" | [https://github.com/arq5x/bedtools2 bedtools] <br />
| align="left" | 2.27.1 <br />
| <div>The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage</div> <br />
|- <br />
| align="left" | [https://blast.ncbi.nlm.nih.gov/ blast+] <br />
| align="left" | 2.7.1 <br />
| <div>Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences</div> <br />
|- <br />
| align="left" | [https://www.boost.org/ boost] <br />
| align="left" | 1.67.0&nbsp; 1.66.0 <br />
| <div>Boost provides free peer-reviewed portable C++ source libraries</div> <br />
|- <br />
| align="left" | [http://bowtie-bio.sourceforge.net/bowtie2/index.shtml bowtie2] <br />
| align="left" | 2.3.4.3 <br />
| <div>Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences</div> <br />
|- <br />
| align="left" | [http://bio-bwa.sourceforge.net/ bwa] <br />
| align="left" | 0.7.17 <br />
| <div>Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome</div> <br />
|- <br />
| align="left" | [https://github.com/brentp/bwa-meth bwameth] <br />
| align="left" | 0.4.0 <br />
| <div>Fast and accurate alignment of BS-Seq reads</div> <br />
|- <br />
| align="left" | [https://www.cmake.org cmake] <br />
| align="left" | 3.12.3 <br />
| <div>CMake, the cross-platform, open-source build system</div> <br />
|- <br />
| align="left" | [http://opensource.scilifelab.se/projects/cutadapt/ cutadapt] <br />
| align="left" | 2.1 <br />
| <div>Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads</div> <br />
|- <br />
| align="left" | [http://deeptools.readthedocs.org/ deeptools] <br />
| align="left" | 3.2.1-anaconda2 <br />
| <div>deepTools is a suite of python tools particularly developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq</div> <br />
|- <br />
| align="left" | [https://bioconductor.org/packages/release/bioc/html/DEXSeq.html dexseq] <br />
| align="left" | 1.24.4 <br />
| <div>Inference of differential exon usage in RNA sequencing</div> <br />
|- <br />
| align="left" | [http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ fastqc] <br />
| align="left" | 0.11.8 <br />
| <div>FastQC is a quality control application for high throughput sequence data</div> <br />
|- <br />
| align="left" | [http://www.fftw.org fftw] <br />
| align="left" | 3.3.7 <br />
| <div>FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data</div> <br />
|- <br />
| align="left" | [https://gcc.gnu.org gcc] <br />
| align="left" | 7.3.0 <br />
| <div>The GNU Compiler Collection for C, C++, and Fortran</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/gdb/gdb.html gdb] <br />
| align="left" | 8.1 <br />
| <div>The GNU Project Debugger</div> <br />
|- <br />
| align="left" | [https://git-annex.branchable.com/ git-annex] <br />
| align="left" | 2.8.1 <br />
| <div>git-annex allows managing files with git, without checking the file contents into git</div> <br />
|- <br />
| align="left" | [https://gmplib.org/ gmp] <br />
| align="left" | 6.1.2 <br />
| <div>GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/parallel gnu-parallel] <br />
| align="left" | 20180322 <br />
| <div>GNU parallel is a shell tool for executing (usually serial) jobs in parallel</div> <br />
|- <br />
| align="left" | [http://gnuplot.sourceforge.net/ gnuplot] <br />
| align="left" | 5.2.2 <br />
| <div>Portable interactive, function plotting utility</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/gsl/ gsl] <br />
| align="left" | 2.4 <br />
| <div>The GNU Scientific Library (GSL) is a numerical library for C and C++</div> <br />
|- <br />
| align="left" | [https://portal.hdfgroup.org/display/support hdf5] <br />
| align="left" | 1.8.20&nbsp; 1.10.4 <br />
| <div>HDF5 is a data model, library, and file format for storing and managing data</div> <br />
|- <br />
| align="left" | [https://ccb.jhu.edu/software/hisat2/index.shtml hisat2] <br />
| align="left" | 2.1.0 <br />
| <div>HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome)</div> <br />
|- <br />
| align="left" | [https://www-huber.embl.de/users/anders/HTSeq/ htseq] <br />
| align="left" | 0.11.1-anaconda2&nbsp; 0.11.1 <br />
| <div>A framework to process and analyze data from high-throughput sequencing (HTS) assays</div> <br />
|- <br />
| align="left" | [http://www.htslib.org/ htslib] <br />
| align="left" | 1.8 <br />
| <div>A C library for reading/writing high-throughput sequencing data</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/parallel-studio-xe intel] <br />
| align="left" | 2018.4 <br />
| <div>Intel compilers suite for C, C++, and Fortran, including the MKL, TBB, IPP, DAAL, and PSTL libraries</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/mpi-library intelmpi] <br />
| align="left" | 2018.4 <br />
| <div>Intel MPI library with compiler wrappers for C, C++, and Fortran</div> <br />
|- <br />
| align="left" | [https://java.com/ java] <br />
| align="left" | 1.8.0_201 <br />
| <div>Java Platform, Standard Edition (Java SE) lets you develop and deploy Java applications on desktops and servers</div> <br />
|- <br />
| align="left" | [https://github.com/LMDB/lmdb lmdb] <br />
| align="left" | 0.9.22 <br />
| <div>OpenLDAP's Lightning Memory-Mapped Database (LMDB) library</div> <br />
|- <br />
| align="left" | [https://github.com/dpryan79/MethylDackel methyldackel] <br />
| align="left" | 0.4.0 <br />
| <div>A (mostly) universal methylation extractor for BS-seq experiments</div> <br />
|- <br />
| align="left" | [https://www.bioinf.uni-leipzig.de/Software/metilene/ metilene] <br />
| align="left" | 0.2.7 <br />
| <div>Fast and sensitive detection of differential DNA methylation</div> <br />
|- <br />
| align="left" | [http://hollywood.mit.edu/burgelab/miso miso] <br />
| align="left" | 0.5.4 <br />
| <div>A probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data, and identifies differentially regulated isoforms or exons across samples</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/mkl mkl] <br />
| align="left" | 2018.4 <br />
| <div>Intel Math Kernel Library</div> <br />
|- <br />
| align="left" | [https://www.unidata.ucar.edu/software/netcdf/ netcdf] <br />
| align="left" | 4.6.1 <br />
| <div>NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data</div> <br />
|- <br />
| align="left" | [https://www.open-mpi.org/ openmpi] <br />
| align="left" | 3.1.1 <br />
| <div>The Open MPI Project is an open source MPI-2 implementation</div> <br />
|- <br />
| align="left" | [http://oprofile.sourceforge.net oprofile] <br />
| align="left" | 1.3.0 <br />
| <div>OProfile is a system-wide profiler for Linux systems, capable of profiling all running code at low overhead</div> <br />
|- <br />
| align="left" | [http://www.stevekellylab.com/software/orthofinder orthofinder] <br />
| align="left" | 2.2.7 <br />
| <div>Program for identifying orthologous protein sequence families</div> <br />
|- <br />
| align="left" | [http://www.astro.caltech.edu/~tjp/pgplot/ pgplot] <br />
| align="left" | 5.2.2-x <br />
| <div>Graphics subroutine library for C/C++ and Fortran</div> <br />
|- <br />
| align="left" | [http://prinseq.sourceforge.net prinseq] <br />
| align="left" | 0.20.4 <br />
| <div>A bioinformatics tool to PRe-process and show INformation of SEQuence data</div> <br />
|- <br />
| align="left" | [https://python.org/ python] <br />
| align="left" | 3.6.8 <br />
| <div>Python is a programming language that lets you work more quickly and integrate your systems more effectively</div> <br />
|- <br />
| align="left" | [https://www.r-project.org/ r] <br />
| align="left" | 3.5.1&nbsp; 3.5.0 <br />
| <div>R is a free software environment for statistical computing and graphics</div> <br />
|- <br />
| align="left" | [https://github.com/vanzonr/rarray rarray] <br />
| align="left" | 1.2 <br />
| <div>Library for runtime multi-dimensional arrays in C++</div> <br />
|- <br />
| align="left" | [https://github.com/stamatak/standard-RAxML raxml] <br />
| align="left" | 8.2.12 <br />
| <div>RAxML search algorithm for maximum likelihood based inference of phylogenetic trees</div> <br />
|- <br />
| align="left" | [https://www.htslib.org/ samtools] <br />
| align="left" | 1.8 <br />
| <div>SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format</div> <br />
|- <br />
| align="left" | [https://www.sylabs.io/docs/ singularity] <br />
| align="left" | 2.6.1 <br />
| <div>Singularity is a portable application stack packaging and runtime utility.</div> <br />
|- <br />
| align="left" | [https://www.hwaci.com/sw/sqlite/ sqlite] <br />
| align="left" | 3.23.0 <br />
| <div>SQLite: SQL Database Engine in a C Library</div> <br />
|- <br />
| align="left" | [https://ccb.jhu.edu/software/stringtie stringtie] <br />
| align="left" | 1.3.5 <br />
| <div>StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts</div> <br />
|- <br />
| align="left" | [https://01.org/tbb/ tbb] <br />
| align="left" | 2019.4 <br />
| <div>Intel(R) Threading Building Blocks (Intel(R) TBB) lets you easily write parallel C++ programs that take full advantage of multicore performance, that are portable, composable and have future-proof scalability</div> <br />
|- <br />
| align="left" | [https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ trimgalore] <br />
| align="left" | 0.6.0 <br />
| <div>A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisulfite-Seq) libraries</div> <br />
|- <br />
| align="left" | [https://cran.rstudio.com/web/packages/UpSetR upsetr] <br />
| align="left" | 1.3.3 <br />
| <div>R implementation of the UpSet set visualization technique published by Lex, Gehlenborg, et al</div> <br />
|- <br />
| align="left" | [http://valgrind.org valgrind] <br />
| align="left" | 3.14.0 <br />
| <div>Valgrind provides debugging and profiling tools</div> <br />
|}</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Teach&diff=2226Teach2019-06-24T12:58:03Z<p>Bmundim: /* Login/Devel Node */</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] <br />
|name=Teach Cluster <br />
|installed=(orig Feb 2013), Oct 2018<br />
|operatingsystem= Linux (CentOS 7.4)<br />
|loginnode= teach01 (from <tt>teach.scinet</tt>)<br />
|nnodes=42 <br />
|rampernode=64 GB <br />
|corespernode=16 <br />
|interconnect=Infiniband (QDR)<br />
|vendorcompilers=icc/gcc<br />
|queuetype=slurm<br />
}}<br />
<br />
== Teaching Cluster ==<br />
<br />
SciNet has assembled some older compute hardware into a small cluster provided primarily for teaching purposes. It is configured similarly to the production [[Niagara_Quickstart | Niagara ]] system, but uses repurposed hardware. This system should not be used for production work; as such, the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time. Questions about its use or problems should be sent to '''support@scinet.utoronto.ca'''.<br />
<br />
== Specifications==<br />
<br />
The cluster consists of 42 repurposed x86_64 nodes, each with 16 cores (from two octa-core Intel Xeon "Sandy Bridge" E5-2650 CPUs) running at 2.0GHz, with 64GB of RAM per node. <br />
The nodes are interconnected with 2.6:1 blocking QDR Infiniband for MPI communications and disk I/O to the SciNet Niagara filesystems. In total this cluster contains 672 cores.<br />
<br />
== Login/Devel Node ==<br />
<br />
Login via ssh with your SciNet account to '''<tt>teach.scinet.utoronto.ca</tt>''', which will bring you directly to '''<tt>teach01</tt>''', the gateway/devel node for this cluster. <br />
From '''<tt>teach01</tt>''' you can compile, do short tests, and submit your jobs to the queue.<br />
The first time you log in to the Teach cluster, please make sure to check that the login node ssh key fingerprint<br />
matches. [[Teach_fingerprints | See here how]].<br />
<br />
== Interactive jobs ==<br />
<br />
The login node teach01 is shared between students of a <br />
number of different courses. Use this node to develop and compile <br />
code, to run short tests, and to submit computations to the scheduler. <br />
<br />
For an interactive session on a compute node of the Teach cluster, use the 'debugjob' command: <br />
 teach01:~$ debugjob -n C<br />
where C is the number of cores. An interactive session defaults to four hours when using at most one node (C<=16), and becomes 60 minutes when using four nodes (i.e., 48<C<=64), which is the maximum number of nodes allowed for an interactive session by debugjob.<br />
<br />
For a short interactive session on a dedicated compute node of the Teach cluster, use the 'debugjob' command as follows: <br />
 teach01:~$ debugjob N<br />
where N is the number of nodes. On the Teach cluster, this is equivalent to <tt>debugjob -n 16*N</tt>. The positive integer <tt>N</tt> can be at most 4.<br />
<br />
If no arguments are given to <tt>debugjob</tt>, it allocates a single core on a Teach compute node.<br />
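<br />
To make the two forms above concrete, here are two example invocations (the prompts and the requested sizes are purely illustrative):<br />
 teach01:~$ debugjob -n 8    # 8 cores on one node, up to 4 hours<br />
 teach01:~$ debugjob 2       # 2 whole nodes, i.e. 32 cores<br />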
<br />
== Submit a Job ==<br />
<br />
Teach uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
teach01:~scratch$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course.<br />
<br />
It is worth mentioning some differences between the Niagara and Teach clusters:<br />
* On the Teach cluster, $HOME is writable on the compute nodes. On Niagara, $HOME is read-only on the compute nodes, so in most cases, you will want to submit from your $SCRATCH directory.<br />
* Each teach cluster node has two CPUs with 8 cores each, a total of 16 cores per node (there is no hyperthreading). Make sure to adjust accordingly the flags --ntasks-per-node or --ntasks together with --nodes for the examples found at [[Slurm | Slurm page]]. <br />
* The current slurm configuration of the Teach cluster allocates compute resources by core, as opposed to by node. That means your tasks might land on nodes that have other jobs running, i.e., they might share the node. If you want to avoid that, add the following directive to your submission script: #SBATCH --exclusive. This forces your job to use its compute nodes exclusively.<br />
* The maximum walltime is currently set to 4 hours.<br />
* There is only 1 queue available: the compute queue. Its usage limit is listed on the table below.<br />
* Seven of the Teach compute nodes have more memory than the 64GB default: five have 128GB and two have 256GB. To run a big-memory job on these nodes, add the following directive to your submission script: #SBATCH --constraint=m128G. Replace m128G with m256G if you want your job to run exclusively on the 256GB nodes.<br />
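<br />
Putting the points above together, a minimal Teach job script might look as follows (the module selection and the program name are placeholders; adjust them to your own code):<br />
 #!/bin/bash<br />
 #SBATCH --nodes=1<br />
 #SBATCH --ntasks-per-node=16<br />
 #SBATCH --time=01:00:00<br />
 #SBATCH --job-name=my_test_job<br />
 #SBATCH --exclusive<br />
 <br />
 module load intel intelmpi<br />
 <br />
 mpirun ./my_program<br />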
<br />
== Limits ==<br />
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc; if you do not specify one, your job will run in the compute partition, which is the only partition on the Teach cluster. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs ||compute || 6 || 12 || 1 core || 8 nodes (128 cores)|| 15 minutes || 4 hours<br />
|}<br />
<br />
Within these limits, jobs may still have to wait in the queue. Although there are no allocations on the teach cluster, the waiting time still depends on many factors, such as the number of nodes and the walltime, how many other jobs are waiting in the queue, and whether a job can fill an otherwise unused spot in the schedule.<br />
<br />
== Jupyter Hub ==<br />
<br />
Some courses, like the Summer School, use Jupyter notebooks. In those cases, some (or all) of the large memory compute nodes are dedicated as jupyterhub nodes. <br />
<br />
To connect to these, you must first set up an ssh tunnel with the following command:<br />
<br />
ssh -L8888:jupyterhub7:8000 teach.scinet.utoronto.ca -N<br />
<br />
Instead of jupyterhub7, you can also choose jupyterhub1, jupyterhub2, jupyterhub3, jupyterhub4, jupyterhub5, or jupyterhub6.<br />
<br />
This command will seem to 'hang' there, but the tunnel will have been established.<br />
<br />
Next, open your browser and go to <tt>https://localhost:8888</tt> and you can login to the jupyterhub. <br />
<br />
Note: You will likely have to tell your browser to trust this site.<br />
<br />
== Software Modules ==<br />
<br />
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be [[Using_modules | found on the modules page]].<br />
<br />
Common module subcommands are:<br />
<br />
* <code>module load <module-name></code>: load the default version of a particular software.<br />
* <code>module load <module-name>/<module-version></code>: load a specific version of a particular software.<br />
* <code>module purge</code>: unload all currently loaded modules.<br />
* <code>module spider</code> (or <code>module spider <module-name></code>): list available software packages.<br />
* <code>module avail</code>: list loadable software packages.<br />
* <code>module list</code>: list loaded modules.<br />
<br />
Along with modifying common environment variables, such as PATH and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.<br />
<br />
There are handy abbreviations for the module commands. <code>ml</code> is the same as <code>module list</code>, and <code>ml <module-name></code> is the same as <code>module load <module-name></code>.<br />
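<br />
For example, a typical sequence on <tt>teach01</tt> might be (the gsl module and source file name are used purely as an illustration):<br />
 teach01:~$ module load gcc/7.3.0 gsl<br />
 teach01:~$ ml                       # short for 'module list'<br />
 teach01:~$ gcc mycode.c -I$SCINET_GSL_ROOT/include -L$SCINET_GSL_ROOT/lib -lgsl -lgslcblas -o mycode<br />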
<br />
{| class="wikitable sortable" style="width:100%" <br />
! style="width: 25%" align="center" | Module <br />
! style="width: 17%" align="center" | Versions (2018a) <br />
! align="center" | Description <br />
|- <br />
| align="left" | [https://www.continuum.io/anaconda-overview anaconda2] <br />
| align="left" | 5.1.0 <br />
| <div>Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies to adopt a modern open data science analytics architecture</div> <br />
|- <br />
| align="left" | [https://www.continuum.io/anaconda-overview anaconda3] <br />
| align="left" | 5.2.0 <br />
| <div>The Python 3 version of the Anaconda data analytics platform (see anaconda2)</div> <br />
|- <br />
| align="left" | [https://github.com/smirarab/ASTRAL astral] <br />
| align="left" | 4.7.12 <br />
| <div>ASTRAL is a tool for estimating an unrooted species tree given a set of unrooted gene trees</div> <br />
|- <br />
| align="left" | [https://www.htslib.org/ bcftools] <br />
| align="left" | 1.8 <br />
| <div>BCFtools is a set of utilities for variant calling and for manipulating files in the Variant Call Format (VCF) and its binary counterpart BCF</div> <br />
|- <br />
| align="left" | [https://github.com/arq5x/bedtools2 bedtools] <br />
| align="left" | 2.27.1 <br />
| <div>The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage</div> <br />
|- <br />
| align="left" | [https://blast.ncbi.nlm.nih.gov/ blast+] <br />
| align="left" | 2.7.1 <br />
| <div>Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences</div> <br />
|- <br />
| align="left" | [https://www.boost.org/ boost] <br />
| align="left" | 1.67.0&nbsp; 1.66.0 <br />
| <div>Boost provides free peer-reviewed portable C++ source libraries</div> <br />
|- <br />
| align="left" | [http://bowtie-bio.sourceforge.net/bowtie2/index.shtml bowtie2] <br />
| align="left" | 2.3.4.3 <br />
| <div>Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences</div> <br />
|- <br />
| align="left" | [http://bio-bwa.sourceforge.net/ bwa] <br />
| align="left" | 0.7.17 <br />
| <div>Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome</div> <br />
|- <br />
| align="left" | [https://github.com/brentp/bwa-meth bwameth] <br />
| align="left" | 0.4.0 <br />
| <div>Fast and accurate alignment of BS-Seq reads</div> <br />
|- <br />
| align="left" | [https://www.cmake.org cmake] <br />
| align="left" | 3.12.3 <br />
| <div>CMake, the cross-platform, open-source build system</div> <br />
|- <br />
| align="left" | [http://opensource.scilifelab.se/projects/cutadapt/ cutadapt] <br />
| align="left" | 2.1 <br />
| <div>Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads</div> <br />
|- <br />
| align="left" | [http://deeptools.readthedocs.org/ deeptools] <br />
| align="left" | 3.2.1-anaconda2 <br />
| <div>deepTools is a suite of python tools particularly developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq</div> <br />
|- <br />
| align="left" | [https://bioconductor.org/packages/release/bioc/html/DEXSeq.html dexseq] <br />
| align="left" | 1.24.4 <br />
| <div>Inference of differential exon usage in RNA sequencing</div> <br />
|- <br />
| align="left" | [http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ fastqc] <br />
| align="left" | 0.11.8 <br />
| <div>FastQC is a quality control application for high throughput sequence data</div> <br />
|- <br />
| align="left" | [http://www.fftw.org fftw] <br />
| align="left" | 3.3.7 <br />
| <div>FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data</div> <br />
|- <br />
| align="left" | [https://gcc.gnu.org gcc] <br />
| align="left" | 7.3.0 <br />
| <div>The GNU Compiler Collection for C, C++, and Fortran</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/gdb/gdb.html gdb] <br />
| align="left" | 8.1 <br />
| <div>The GNU Project Debugger</div> <br />
|- <br />
| align="left" | [https://git-annex.branchable.com/ git-annex] <br />
| align="left" | 2.8.1 <br />
| <div>git-annex allows managing files with git, without checking the file contents into git</div> <br />
|- <br />
| align="left" | [https://gmplib.org/ gmp] <br />
| align="left" | 6.1.2 <br />
| <div>GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/parallel gnu-parallel] <br />
| align="left" | 20180322 <br />
| <div>GNU parallel is a shell tool for executing (usually serial) jobs in parallel</div> <br />
|- <br />
| align="left" | [http://gnuplot.sourceforge.net/ gnuplot] <br />
| align="left" | 5.2.2 <br />
| <div>Portable interactive, function plotting utility</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/gsl/ gsl] <br />
| align="left" | 2.4 <br />
| <div>The GNU Scientific Library (GSL) is a numerical library for C and C++</div> <br />
|- <br />
| align="left" | [https://portal.hdfgroup.org/display/support hdf5] <br />
| align="left" | 1.8.20&nbsp; 1.10.4 <br />
| <div>HDF5 is a data model, library, and file format for storing and managing data</div> <br />
|- <br />
| align="left" | [https://ccb.jhu.edu/software/hisat2/index.shtml hisat2] <br />
| align="left" | 2.1.0 <br />
| <div>HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome)</div> <br />
|- <br />
| align="left" | [https://www-huber.embl.de/users/anders/HTSeq/ htseq] <br />
| align="left" | 0.11.1-anaconda2&nbsp; 0.11.1 <br />
| <div>A framework to process and analyze data from high-throughput sequencing (HTS) assays</div> <br />
|- <br />
| align="left" | [http://www.htslib.org/ htslib] <br />
| align="left" | 1.8 <br />
| <div>A C library for reading/writing high-throughput sequencing data</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/parallel-studio-xe intel] <br />
| align="left" | 2018.4 <br />
| <div>Intel compilers suite for C, C++, and Fortran, including the MKL, TBB, IPP, DAAL, and PSTL libraries</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/mpi-library intelmpi] <br />
| align="left" | 2018.4 <br />
| <div>Intel MPI library with compiler wrappers for C, C++, and Fortran</div> <br />
|- <br />
| align="left" | [https://java.com/ java] <br />
| align="left" | 1.8.0_201 <br />
| <div>Java Platform, Standard Edition (Java SE) lets you develop and deploy Java applications on desktops and servers</div> <br />
|- <br />
| align="left" | [https://github.com/LMDB/lmdb lmdb] <br />
| align="left" | 0.9.22 <br />
| <div>OpenLDAP's Lightning Memory-Mapped Database (LMDB) library</div> <br />
|- <br />
| align="left" | [https://github.com/dpryan79/MethylDackel methyldackel] <br />
| align="left" | 0.4.0 <br />
| <div>A (mostly) universal methylation extractor for BS-seq experiments</div> <br />
|- <br />
| align="left" | [https://www.bioinf.uni-leipzig.de/Software/metilene/ metilene] <br />
| align="left" | 0.2.7 <br />
| <div>Fast and sensitive detection of differential DNA methylation</div> <br />
|- <br />
| align="left" | [http://hollywood.mit.edu/burgelab/miso miso] <br />
| align="left" | 0.5.4 <br />
| <div>A probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data, and identifies differentially regulated isoforms or exons across samples</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/mkl mkl] <br />
| align="left" | 2018.4 <br />
| <div>Intel Math Kernel Library</div> <br />
|- <br />
| align="left" | [https://www.unidata.ucar.edu/software/netcdf/ netcdf] <br />
| align="left" | 4.6.1 <br />
| <div>NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data</div> <br />
|- <br />
| align="left" | [https://www.open-mpi.org/ openmpi] <br />
| align="left" | 3.1.1 <br />
| <div>The Open MPI Project is an open source MPI-2 implementation</div> <br />
|- <br />
| align="left" | [http://oprofile.sourceforge.net oprofile] <br />
| align="left" | 1.3.0 <br />
| <div>OProfile is a system-wide profiler for Linux systems, capable of profiling all running code at low overhead</div> <br />
|- <br />
| align="left" | [http://www.stevekellylab.com/software/orthofinder orthofinder] <br />
| align="left" | 2.2.7 <br />
| <div>Program for identifying orthologous protein sequence families</div> <br />
|- <br />
| align="left" | [http://www.astro.caltech.edu/~tjp/pgplot/ pgplot] <br />
| align="left" | 5.2.2-x <br />
| <div>Graphics subroutine library for C/C++ and Fortran</div> <br />
|- <br />
| align="left" | [http://prinseq.sourceforge.net prinseq] <br />
| align="left" | 0.20.4 <br />
| <div>A bioinformatics tool to PRe-process and show INformation of SEQuence data</div> <br />
|- <br />
| align="left" | [https://python.org/ python] <br />
| align="left" | 3.6.8 <br />
| <div>Python is a programming language that lets you work more quickly and integrate your systems more effectively</div> <br />
|- <br />
| align="left" | [https://www.r-project.org/ r] <br />
| align="left" | 3.5.1&nbsp; 3.5.0 <br />
| <div>R is a free software environment for statistical computing and graphics</div> <br />
|- <br />
| align="left" | [https://github.com/vanzonr/rarray rarray] <br />
| align="left" | 1.2 <br />
| <div>Library for runtime multi-dimensional arrays in C++</div> <br />
|- <br />
| align="left" | [https://github.com/stamatak/standard-RAxML raxml] <br />
| align="left" | 8.2.12 <br />
| <div>RAxML search algorithm for maximum likelihood based inference of phylogenetic trees</div> <br />
|- <br />
| align="left" | [https://www.htslib.org/ samtools] <br />
| align="left" | 1.8 <br />
| <div>SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format</div> <br />
|- <br />
| align="left" | [https://www.sylabs.io/docs/ singularity] <br />
| align="left" | 2.6.1 <br />
| <div>Singularity is a portable application stack packaging and runtime utility.</div> <br />
|- <br />
| align="left" | [https://www.hwaci.com/sw/sqlite/ sqlite] <br />
| align="left" | 3.23.0 <br />
| <div>SQLite: SQL Database Engine in a C Library</div> <br />
|- <br />
| align="left" | [https://ccb.jhu.edu/software/stringtie stringtie] <br />
| align="left" | 1.3.5 <br />
| <div>StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts</div> <br />
|- <br />
| align="left" | [https://01.org/tbb/ tbb] <br />
| align="left" | 2019.4 <br />
| <div>Intel(R) Threading Building Blocks (Intel(R) TBB) lets you easily write parallel C++ programs that take full advantage of multicore performance, that are portable, composable and have future-proof scalability</div> <br />
|- <br />
| align="left" | [https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ trimgalore] <br />
| align="left" | 0.6.0 <br />
| <div>A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisulfite-Seq) libraries</div> <br />
|- <br />
| align="left" | [https://cran.rstudio.com/web/packages/UpSetR upsetr] <br />
| align="left" | 1.3.3 <br />
| <div>R implementation of the UpSet set visualization technique published by Lex, Gehlenborg, et al</div> <br />
|- <br />
| align="left" | [http://valgrind.org valgrind] <br />
| align="left" | 3.14.0 <br />
| <div>Valgrind provides debugging and profiling tools</div> <br />
|}</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Teach&diff=2225Teach2019-06-24T12:57:41Z<p>Bmundim: /* Login/Devel Node */</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] <br />
|name=Teach Cluster <br />
|installed=(orig Feb 2013), Oct 2018<br />
|operatingsystem= Linux (Centos 7.4)<br />
|loginnode= teach01 (from <tt>teach.scinet</tt>)<br />
|nnodes=42 <br />
|rampernode=64 GB <br />
|corespernode=16 <br />
|interconnect=Infiniband (QDR)<br />
|vendorcompilers=icc/gcc<br />
|queuetype=slurm<br />
}}<br />
<br />
== Teaching Cluster ==<br />
<br />
SciNet has assembled some older compute hardware into a small cluster provided primarily for teaching purposes. It is configured similarly to the production [[Niagara_Quickstart | Niagara ]] system, but uses repurposed hardware. This system should not be used for production work; as such, the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time. Questions about its use or problems should be sent to '''support@scinet.utoronto.ca'''.<br />
<br />
== Specifications==<br />
<br />
The cluster consists of 42 repurposed x86_64 nodes, each with 16 cores (from two octa-core Intel Xeon "Sandy Bridge" E5-2650 CPUs) running at 2.0GHz, with 64GB of RAM per node. <br />
The nodes are interconnected with 2.6:1 blocking QDR Infiniband for MPI communications and disk I/O to the SciNet Niagara filesystems. In total this cluster contains 672 cores.<br />
<br />
== Login/Devel Node ==<br />
<br />
Login via ssh with your SciNet account to '''<tt>teach.scinet.utoronto.ca</tt>''', which will bring you directly to '''<tt>teach01</tt>''', the gateway/devel node for this cluster. <br />
From '''<tt>teach01</tt>''' you can compile, do short tests, and submit your jobs to the queue.<br />
The first time you log in to the Teach cluster, please make sure to check that the login node ssh key fingerprint<br />
matches. [[Teach_fingerprints | See here how]].<br />
<br />
== Interactive jobs ==<br />
<br />
The login node teach01 is shared between students of a <br />
number of different courses. Use this node to develop and compile <br />
code, to run short tests, and to submit computations to the scheduler. <br />
<br />
For an interactive session on a compute node of the Teach cluster, use the 'debugjob' command: <br />
 teach01:~$ debugjob -n C<br />
where C is the number of cores. An interactive session defaults to four hours when using at most one node (C<=16), and becomes 60 minutes when using four nodes (i.e., 48<C<=64), which is the maximum number of nodes allowed for an interactive session by debugjob.<br />
<br />
For a short interactive session on a dedicated compute node of the Teach cluster, use the 'debugjob' command as follows: <br />
 teach01:~$ debugjob N<br />
where N is the number of nodes. On the Teach cluster, this is equivalent to <tt>debugjob -n 16*N</tt>. The positive integer <tt>N</tt> can be at most 4.<br />
<br />
If no arguments are given to <tt>debugjob</tt>, it allocates a single core on a Teach compute node.<br />
<br />
== Submit a Job ==<br />
<br />
Teach uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
teach01:~scratch$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course.<br />
<br />
It is worth mentioning some differences between the Niagara and Teach clusters:<br />
* On the Teach cluster, $HOME is writable on the compute nodes. On Niagara, $HOME is read-only on the compute nodes, so in most cases, you will want to submit from your $SCRATCH directory.<br />
* Each teach cluster node has two CPUs with 8 cores each, a total of 16 cores per node (there is no hyperthreading). Make sure to adjust accordingly the flags --ntasks-per-node or --ntasks together with --nodes for the examples found at [[Slurm | Slurm page]]. <br />
* The current slurm configuration of the Teach cluster allocates compute resources by core, as opposed to by node. That means your tasks might land on nodes that have other jobs running, i.e., they might share the node. If you want to avoid that, add the following directive to your submission script: #SBATCH --exclusive. This forces your job to use its compute nodes exclusively.<br />
* The maximum walltime is currently set to 4 hours.<br />
* There is only 1 queue available: the compute queue. Its usage limit is listed on the table below.<br />
* Seven of the Teach compute nodes have more memory than the 64GB default: five have 128GB and two have 256GB. To run a big-memory job on these nodes, add the following directive to your submission script: #SBATCH --constraint=m128G. Replace m128G with m256G if you want your job to run exclusively on the 256GB nodes.<br />
<br />
== Limits ==<br />
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc; if you do not specify one, your job will run in the compute partition, which is the only partition on the Teach cluster. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs ||compute || 6 || 12 || 1 core || 8 nodes (128 cores)|| 15 minutes || 4 hours<br />
|}<br />
<br />
Within these limits, jobs may still have to wait in the queue. Although there are no allocations on the teach cluster, the waiting time still depends on many factors, such as the number of nodes and the walltime, how many other jobs are waiting in the queue, and whether a job can fill an otherwise unused spot in the schedule.<br />
<br />
== Jupyter Hub ==<br />
<br />
Some courses, like the Summer School, use Jupyter notebooks. In those cases, some (or all) of the large memory compute nodes are dedicated as jupyterhub nodes. <br />
<br />
To connect to these, you must first set up an ssh tunnel with the following command:<br />
<br />
ssh -L8888:jupyterhub7:8000 teach.scinet.utoronto.ca -N<br />
<br />
Instead of jupyterhub7, you can also choose jupyterhub1, jupyterhub2, jupyterhub3, jupyterhub4, jupyterhub5, or jupyterhub6.<br />
<br />
This command will seem to 'hang' there, but the tunnel will have been established.<br />
<br />
Next, open your browser and go to <tt>https://localhost:8888</tt> and you can login to the jupyterhub. <br />
<br />
Note: You will likely have to tell your browser to trust this site.<br />
<br />
== Software Modules ==<br />
<br />
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be [[Using_modules | found on the modules page]].<br />
<br />
Common module subcommands are:<br />
<br />
* <code>module load <module-name></code>: load the default version of a particular software.<br />
* <code>module load <module-name>/<module-version></code>: load a specific version of a particular software.<br />
* <code>module purge</code>: unload all currently loaded modules.<br />
* <code>module spider</code> (or <code>module spider <module-name></code>): list available software packages.<br />
* <code>module avail</code>: list loadable software packages.<br />
* <code>module list</code>: list loaded modules.<br />
<br />
Along with modifying common environment variables, such as PATH and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.<br />
<br />
There are handy abbreviations for the module commands. <code>ml</code> is the same as <code>module list</code>, and <code>ml <module-name></code> is the same as <code>module load <module-name></code>.<br />
<br />
{| class="wikitable sortable" style="width:100%" <br />
! style="width: 25%" align="center" | Module <br />
! style="width: 17%" align="center" | Versions (2018a) <br />
! align="center" | Description <br />
|- <br />
| align="left" | [https://www.continuum.io/anaconda-overview anaconda2] <br />
| align="left" | 5.1.0 <br />
| <div>Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies to adopt a modern open data science analytics architecture</div> <br />
|- <br />
| align="left" | [https://www.continuum.io/anaconda-overview anaconda3] <br />
| align="left" | 5.2.0 <br />
| <div>The Python 3 version of the Anaconda data analytics platform (see anaconda2)</div> <br />
|- <br />
| align="left" | [https://github.com/smirarab/ASTRAL astral] <br />
| align="left" | 4.7.12 <br />
| <div>ASTRAL is a tool for estimating an unrooted species tree given a set of unrooted gene trees</div> <br />
|- <br />
| align="left" | [https://www.htslib.org/ bcftools] <br />
| align="left" | 1.8 <br />
| <div>BCFtools is a set of utilities for variant calling and for manipulating files in the Variant Call Format (VCF) and its binary counterpart BCF</div> <br />
|- <br />
| align="left" | [https://github.com/arq5x/bedtools2 bedtools] <br />
| align="left" | 2.27.1 <br />
| <div>The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage</div> <br />
|- <br />
| align="left" | [https://blast.ncbi.nlm.nih.gov/ blast+] <br />
| align="left" | 2.7.1 <br />
| <div>Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences</div> <br />
|- <br />
| align="left" | [https://www.boost.org/ boost] <br />
| align="left" | 1.67.0&nbsp; 1.66.0 <br />
| <div>Boost provides free peer-reviewed portable C++ source libraries</div> <br />
|- <br />
| align="left" | [http://bowtie-bio.sourceforge.net/bowtie2/index.shtml bowtie2] <br />
| align="left" | 2.3.4.3 <br />
| <div>Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences</div> <br />
|- <br />
| align="left" | [http://bio-bwa.sourceforge.net/ bwa] <br />
| align="left" | 0.7.17 <br />
| <div>Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome</div> <br />
|- <br />
| align="left" | [https://github.com/brentp/bwa-meth bwameth] <br />
| align="left" | 0.4.0 <br />
| <div>Fast and accurate alignment of BS-Seq reads</div> <br />
|- <br />
| align="left" | [https://www.cmake.org cmake] <br />
| align="left" | 3.12.3 <br />
| <div>CMake, the cross-platform, open-source build system</div> <br />
|- <br />
| align="left" | [http://opensource.scilifelab.se/projects/cutadapt/ cutadapt] <br />
| align="left" | 2.1 <br />
| <div>Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads</div> <br />
|- <br />
| align="left" | [http://deeptools.readthedocs.org/ deeptools] <br />
| align="left" | 3.2.1-anaconda2 <br />
| <div>deepTools is a suite of python tools particularly developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq</div> <br />
|- <br />
| align="left" | [https://bioconductor.org/packages/release/bioc/html/DEXSeq.html dexseq] <br />
| align="left" | 1.24.4 <br />
| <div>Inference of differential exon usage in RNA sequencing</div> <br />
|- <br />
| align="left" | [http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ fastqc] <br />
| align="left" | 0.11.8 <br />
| <div>FastQC is a quality control application for high throughput sequence data</div> <br />
|- <br />
| align="left" | [http://www.fftw.org fftw] <br />
| align="left" | 3.3.7 <br />
| <div>FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data</div> <br />
|- <br />
| align="left" | [https://gcc.gnu.org gcc] <br />
| align="left" | 7.3.0 <br />
| <div>The GNU Compiler Collection for C, C++, and Fortran</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/gdb/gdb.html gdb] <br />
| align="left" | 8.1 <br />
| <div>The GNU Project Debugger</div> <br />
|- <br />
| align="left" | [https://git-annex.branchable.com/ git-annex] <br />
| align="left" | 2.8.1 <br />
| <div>git-annex allows managing files with git, without checking the file contents into git</div> <br />
|- <br />
| align="left" | [https://gmplib.org/ gmp] <br />
| align="left" | 6.1.2 <br />
| <div>GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/parallel gnu-parallel] <br />
| align="left" | 20180322 <br />
| <div>GNU parallel is a shell tool for executing (usually serial) jobs in parallel</div> <br />
|- <br />
| align="left" | [http://gnuplot.sourceforge.net/ gnuplot] <br />
| align="left" | 5.2.2 <br />
| <div>Portable interactive, function plotting utility</div> <br />
|- <br />
| align="left" | [https://www.gnu.org/software/gsl/ gsl] <br />
| align="left" | 2.4 <br />
| <div>The GNU Scientific Library (GSL) is a numerical library for C and C++</div> <br />
|- <br />
| align="left" | [https://portal.hdfgroup.org/display/support hdf5] <br />
| align="left" | 1.8.20&nbsp; 1.10.4 <br />
| <div>HDF5 is a data model, library, and file format for storing and managing data</div> <br />
|- <br />
| align="left" | [https://ccb.jhu.edu/software/hisat2/index.shtml hisat2] <br />
| align="left" | 2.1.0 <br />
| <div>HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome)</div> <br />
|- <br />
| align="left" | [https://www-huber.embl.de/users/anders/HTSeq/ htseq] <br />
| align="left" | 0.11.1-anaconda2&nbsp; 0.11.1 <br />
| <div>A framework to process and analyze data from high-throughput sequencing (HTS) assays</div> <br />
|- <br />
| align="left" | [http://www.htslib.org/ htslib] <br />
| align="left" | 1.8 <br />
| <div>A C library for reading/writing high-throughput sequencing data</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/parallel-studio-xe intel] <br />
| align="left" | 2018.4 <br />
| <div>Intel compilers suite for C, C++, and Fortran, including the MKL, TBB, IPP, DAAL, and PSTL libraries</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/mpi-library intelmpi] <br />
| align="left" | 2018.4 <br />
| <div>Intel MPI library with compiler wrappers for C, C++, and Fortran</div> <br />
|- <br />
| align="left" | [https://java.com/ java] <br />
| align="left" | 1.8.0_201 <br />
| <div>Java Platform, Standard Edition (Java SE) lets you develop and deploy Java applications on desktops and servers</div> <br />
|- <br />
| align="left" | [https://github.com/LMDB/lmdb lmdb] <br />
| align="left" | 0.9.22 <br />
| <div>OpenLDAP's Lightning Memory-Mapped Database (LMDB) library</div> <br />
|- <br />
| align="left" | [https://github.com/dpryan79/MethylDackel methyldackel] <br />
| align="left" | 0.4.0 <br />
| <div>A (mostly) universal methylation extractor for BS-seq experiments</div> <br />
|- <br />
| align="left" | [https://www.bioinf.uni-leipzig.de/Software/metilene/ metilene] <br />
| align="left" | 0.2.7 <br />
| <div>Fast and sensitive detection of differential DNA methylation</div> <br />
|- <br />
| align="left" | [http://hollywood.mit.edu/burgelab/miso miso] <br />
| align="left" | 0.5.4 <br />
| <div>A probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data, and identifies differentially regulated isoforms or exons across samples</div> <br />
|- <br />
| align="left" | [https://software.intel.com/en-us/mkl mkl] <br />
| align="left" | 2018.4 <br />
| <div>Intel Math Kernel Library</div> <br />
|- <br />
| align="left" | [https://www.unidata.ucar.edu/software/netcdf/ netcdf] <br />
| align="left" | 4.6.1 <br />
| <div>NetCDF (network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data</div> <br />
|- <br />
| align="left" | [https://www.open-mpi.org/ openmpi] <br />
| align="left" | 3.1.1 <br />
| <div>The Open MPI Project is an open source MPI-2 implementation</div> <br />
|- <br />
| align="left" | [http://oprofile.sourceforge.net oprofile] <br />
| align="left" | 1.3.0 <br />
| <div>OProfile is a system-wide profiler for Linux systems, capable of profiling all running code at low overhead</div> <br />
|- <br />
| align="left" | [http://www.stevekellylab.com/software/orthofinder orthofinder] <br />
| align="left" | 2.2.7 <br />
| <div>Program for identifying orthologous protein sequence families</div> <br />
|- <br />
| align="left" | [http://www.astro.caltech.edu/~tjp/pgplot/ pgplot] <br />
| align="left" | 5.2.2-x <br />
| <div>Graphics subroutine library for C/C++ and Fortran</div> <br />
|- <br />
| align="left" | [http://prinseq.sourceforge.net prinseq] <br />
| align="left" | 0.20.4 <br />
| <div>A bioinformatics tool to PRe-process and show INformation of SEQuence data</div> <br />
|- <br />
| align="left" | [https://python.org/ python] <br />
| align="left" | 3.6.8 <br />
| <div>Python is a programming language that lets you work more quickly and integrate your systems more effectively</div> <br />
|- <br />
| align="left" | [https://www.r-project.org/ r] <br />
| align="left" | 3.5.1&nbsp; 3.5.0 <br />
| <div>R is a free software environment for statistical computing and graphics</div> <br />
|- <br />
| align="left" | [https://github.com/vanzonr/rarray rarray] <br />
| align="left" | 1.2 <br />
| <div>Library for runtime multi-dimensional arrays in C++</div> <br />
|- <br />
| align="left" | [https://github.com/stamatak/standard-RAxML raxml] <br />
| align="left" | 8.2.12 <br />
| <div>RAxML search algorithm for maximum likelihood based inference of phylogenetic trees</div> <br />
|- <br />
| align="left" | [https://www.htslib.org/ samtools] <br />
| align="left" | 1.8 <br />
| <div>SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format</div> <br />
|- <br />
| align="left" | [https://www.sylabs.io/docs/ singularity] <br />
| align="left" | 2.6.1 <br />
| <div>Singularity is a portable application stack packaging and runtime utility.</div> <br />
|- <br />
| align="left" | [https://www.hwaci.com/sw/sqlite/ sqlite] <br />
| align="left" | 3.23.0 <br />
| <div>SQLite: SQL Database Engine in a C Library</div> <br />
|- <br />
| align="left" | [https://ccb.jhu.edu/software/stringtie stringtie] <br />
| align="left" | 1.3.5 <br />
| <div>StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts</div> <br />
|- <br />
| align="left" | [https://01.org/tbb/ tbb] <br />
| align="left" | 2019.4 <br />
| <div>Intel(R) Threading Building Blocks (Intel(R) TBB) lets you easily write parallel C++ programs that take full advantage of multicore performance, that are portable, composable and have future-proof scalability</div> <br />
|- <br />
| align="left" | [https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ trimgalore] <br />
| align="left" | 0.6.0 <br />
| <div>A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisulfite-Seq) libraries</div> <br />
|- <br />
| align="left" | [https://cran.rstudio.com/web/packages/UpSetR upsetr] <br />
| align="left" | 1.3.3 <br />
| <div>R implementation of the UpSet set visualization technique published by Lex, Gehlenborg, et al</div> <br />
|- <br />
| align="left" | [http://valgrind.org valgrind] <br />
| align="left" | 3.14.0 <br />
| <div>Valgrind provides debugging and profiling tools</div> <br />
|}</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Teach_fingerprints&diff=2224Teach fingerprints2019-06-24T12:54:06Z<p>Bmundim: </p>
<hr />
<div>The first time you log in to the Teach cluster you'll be asked to confirm the ssh host key fingerprints:<br />
<br />
$ ssh teach.scinet.utoronto.ca<br />
The authenticity of host 'teach.scinet.utoronto.ca (142.150.188.92)' can't be established.<br />
ED25519 key fingerprint is b4:ae:76:a5:2b:37:8d:57:06:0e:9a:de:62:00:26:be.<br />
Are you sure you want to continue connecting (yes/no)?<br />
<br />
Make sure the fingerprints are correct! You should see one of the following ED25519 fingerprints:<br />
<br />
ED25519 key fingerprint is SHA256:SauX2nL+Yso9KBo2Ca6GH/V9cSFLFXwxOECGWXZ5pxc.<br />
ED25519 key fingerprint is MD5:b4:ae:76:a5:2b:37:8d:57:06:0e:9a:de:62:00:26:be.<br />
<br />
or one of the following RSA fingerprints:<br />
<br />
RSA key fingerprint is SHA256:k6YEhYsI73M+NJIpZ8yF+wqWeuXS9avNs2s5QS/0VhU.<br />
RSA key fingerprint is MD5:98:e7:7a:07:89:ef:3f:d8:68:3d:47:9c:6e:a6:71:5e.<br />
<br />
depending on how old your ssh client is and how it is configured.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Teach&diff=2201Teach2019-06-10T16:07:47Z<p>Bmundim: /* Limits */ Remove reference to debug partition</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] <br />
|name=Teach Cluster <br />
|installed=(orig Feb 2013), Oct 2018<br />
|operatingsystem= Linux (Centos 7.4)<br />
|loginnode= teach01 (from <tt>teach.scinet</tt>)<br />
|nnodes=42 <br />
|rampernode=64 Gb <br />
|corespernode=16 <br />
|interconnect=Infiniband (QDR)<br />
|vendorcompilers=icc/gcc<br />
|queuetype=slurm<br />
}}<br />
<br />
== Teaching Cluster ==<br />
<br />
SciNet has assembled some older compute hardware into a small cluster provided primarily for teaching purposes. It is configured similarly to the production [[Niagara_Quickstart | Niagara ]] system, but uses repurposed hardware. This system should not be used for production work; accordingly, the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time. Questions about its use or problems should be sent to '''support@scinet.utoronto.ca'''.<br />
<br />
== Specifications ==
<br />
The cluster consists of 42 repurposed x86_64 nodes, each with 16 cores (from two octa-core Intel Xeon Sandy Bridge E5-2650 CPUs) running at 2.0GHz with 64GB of RAM per node. <br />
The nodes are interconnected with 2.6:1 blocking QDR Infiniband for MPI communications and disk I/O to the SciNet Niagara filesystems. In total this cluster contains 672 cores.<br />
<br />
== Login/Devel Node ==<br />
<br />
Login via ssh with your SciNet account to '''<tt>teach.scinet.utoronto.ca</tt>''', which will bring you directly to '''<tt>teach01</tt>''', the gateway/devel node for this cluster. <br />
From '''<tt>teach01</tt>''' you can compile, do short tests, and submit your jobs to the queue.<br />
<br />
== Software Modules ==<br />
<pre> <br />
anaconda3/5.2.0<br />
blast+/2.7.1<br />
boost/1.66.0, boost/1.67.0<br />
cmake/3.12.3<br />
ddt/18.1.2, ddt/18.2<br />
fftw/3.3.7<br />
gcc/7.3.0<br />
gmp/6.1.2<br />
gnu-parallel/20180322<br />
gnuplot/5.2.2<br />
gsl/2.4<br />
hdf5/1.8.20<br />
hisat2/2.1.0<br />
htseq/0.11.1<br />
intel/2018.4<br />
intelmpi/2018.4<br />
lmdb/0.9.22<br />
mkl/2018.4<br />
netcdf/4.6.1<br />
openmpi/3.1.1<br />
oprofile/1.3.0<br />
orthofinder/2.2.7<br />
r/3.4.3-anaconda5.1.0, r/3.5.0, r/3.5.1<br />
samtools/1.8<br />
stringtie/1.3.5<br />
valgrind/3.14.0<br />
</pre><br />
<br />
== Interactive jobs ==<br />
<br />
The login node teach01 is shared between students of a <br />
number of different courses. Use this node to develop and compile <br />
code, to run short tests, and to submit computations to the scheduler. <br />
<br />
For a short interactive session on a dedicated compute node of the Teach cluster, use the 'debugjob' command. <br />
teach01:~$ debugjob N<br />
where N is the number of nodes. An interactive session defaults to three hours when N=1, and 45 minutes when N=4 (the maximum number of nodes allowed for an interactive session by debugjob).<br />
<br />
== Submit a Job ==<br />
<br />
Teach uses SLURM as its job scheduler. More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
teach01:~scratch$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course.<br />
<br />
In most cases, you will want to submit from your $SCRATCH directory, so that the output of your compute job can be written out (note that $HOME is read-only on the compute nodes).<br />
<br />
It is worth mentioning some differences between the Niagara and Teach clusters:<br />
* Each Teach cluster node has two CPUs with 8 cores each, for a total of 16 cores per node (there is no hyperthreading). Make sure to adjust the flags --ntasks-per-node or --ntasks together with --nodes accordingly for the examples found at the [[Slurm | Slurm page]]. <br />
* The current Slurm configuration of the Teach cluster allocates compute resources by core as opposed to by node. That means your tasks might land on nodes that have other jobs running, i.e. they might share the node. If you want to avoid that, make sure to add the following directive to your submission script: #SBATCH --exclusive. This forces your job to use the compute nodes exclusively.<br />
* The maximum walltime is currently set to 4 hours.<br />
* There is only 1 queue available: the compute queue. Its usage limit is listed on the table below.<br />
* Seven of the Teach compute nodes have more memory than the default 64GB: five have 128GB and two have 256GB. To run a big-memory job on these nodes, add the following directive to your submission script: #SBATCH --constraint=m128G. Replace m128G with m256G if you want your job to run exclusively on the 256GB nodes.<br />
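Putting these differences together, a minimal Teach submission script could look like the sketch below. The module versions are taken from the Software Modules list above; the program name ./mpi_example is a placeholder for your own executable.

```shell
# Write out a minimal Teach job script (a sketch, not a definitive template;
# ./mpi_example is a placeholder for your own MPI program).
cat > teach_job.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16   # Teach nodes have 16 cores, not 40
#SBATCH --exclusive            # do not share nodes with other jobs
#SBATCH --constraint=m128G     # restrict to the 128GB nodes (optional)
#SBATCH --time=1:00:00
#SBATCH --job-name teach_mpi_job

cd $SLURM_SUBMIT_DIR

module load intel/2018.4
module load intelmpi/2018.4

mpirun ./mpi_example
EOF
```

Submit it from your $SCRATCH directory on teach01 with <tt>sbatch teach_job.sh</tt>.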
<br />
== Limits ==<br />
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc; if you do not specify one, your job will run in the compute partition, which is the only partition on the Teach cluster. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs ||compute || 6 || 12 || 1 core || 8 nodes (128 cores)|| 15 minutes || 4 hours<br />
|}<br />
<br />
Within these limits, jobs will still have to wait in the queue. The waiting time depends on many factors such as the allocation amount, how much allocation was used in the recent past, the number of nodes and the walltime, and how many other jobs are waiting in the queue.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Slurm&diff=2076Slurm2019-04-12T18:52:50Z<p>Bmundim: /* Limits */</p>
<hr />
<div>The queueing system used at SciNet is based around the [https://slurm.schedmd.com Slurm Workload Manager]. This "scheduler", Slurm, determines which jobs will be run on which compute nodes, and when. This page outlines how to submit jobs, how to interact with the scheduler, and some of the most common Slurm commands.<br />
<br />
Some common questions about the queuing system can be found on the [[FAQ]] as well.<br />
<br />
= Submitting jobs =<br />
<br />
You submit jobs from a Niagara login node. This is done by passing a script to the sbatch command:<br />
<br />
nia-login07:~$ sbatch jobscript.sh<br />
<br />
This puts the job, described by the job script, into the queue. The scheduler will run the job on the compute nodes in due course. A typical submission script is as follows.<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
<br />
Some notes about this example:<br />
* The first line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>).<br />
* In this case, SLURM looks for 2 nodes with 40 cores on which to run 80 tasks, for 1 hour.<br />
* Note that the mpirun flag "--ppn" (processes per node) is ignored. Slurm takes care of this detail.<br />
* Once the scheduler finds a spot to run the job, it runs the script:<br />
** It changes to the submission directory;<br />
** Loads modules;<br />
** Runs the <code>mpi_example</code> application.<br />
* To use hyperthreading, just change --ntasks-per-node=40 to --ntasks-per-node=80, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).<br />
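As a concrete sketch of the hyperthreading change, the following writes out a variant of the example script above with those two edits applied (./mpi_example remains a placeholder for your own program):

```shell
# Hyperthreaded variant of the example job script above
# (./mpi_example is a placeholder for your own MPI program).
cat > mpi_ht_job.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=80   # 80 logical cpus per node with hyperthreading
#SBATCH --time=1:00:00
#SBATCH --job-name mpi_ht_job
#SBATCH --output=mpi_output_%j.txt

cd $SLURM_SUBMIT_DIR

module load intel/2018.2
module load openmpi/3.1.0

# --bind-to none is needed for OpenMPI (but not IntelMPI) when
# running one task per logical cpu
mpirun --bind-to none ./mpi_example
EOF
```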
<br />
To create a job script appropriate for your work, you must modify the commands above to instruct Slurm to run the commands you need run.<br />
<br />
== Things to remember ==<br />
<br />
There are some things to always bear in mind when crafting your submission script:<br />
* Scheduling is by node, so in multiples of 40 cores. You are expected to use all 40 cores! If you are running serial jobs, and need assistance bundling your work into multiples of 40, please see the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page.<br />
* Jobs must write to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access. Download data you need before submitting your job.<br />
* Jobs will run under your group's RRG allocation. If your group does not have an allocation, your job will run under your group's RAS allocation (previously called 'default' allocation). Note that groups with an allocation cannot run under a default allocation.<br />
* The maximum [[Wallclock_time | walltime]] for all users is 24 hours. The minimum and default walltime is 15 minutes.<br />
<br />
= Scheduling details =<br />
<br />
We now present the details of how to write a job script, and some extra commands which you might find useful.<br />
<br />
== SLURM nomenclature: jobs, nodes, tasks, cpus, cores, threads ==<br />
<br />
SLURM has a somewhat different way of referring to things like MPI processes and thread tasks, as compared to our previous scheduler, MOAB. The SLURM nomenclature is reflected in the names of scheduler options (i.e., resource requests). SLURM strictly enforces those requests, so it is important to get this right.<br />
<br />
{| class="wikitable"<br />
!term <br />
!meaning <br />
!SLURM term<br />
!related scheduler options <br />
|-<br />
|job<br />
|scheduled piece of work for which specific resources were requested.<br />
|job<br />
|<tt>sbatch, salloc</tt><br />
|-<br />
|node<br />
|basic computing component with several cores (40 for Niagara) that share memory <br />
|node<br />
|<tt>--nodes -N</tt><br />
|-<br />
|mpi process<br />
|one of a group of running programs using Message Passing Interface for parallel computing<br />
|task<br />
|<tt>--ntasks -n --ntasks-per-node</tt><br />
|-<br />
|core ''or'' physical cpu<br />
|A fully functional independent physical execution unit.<br />
| - <br />
| -<br />
|-<br />
|logical cpu<br />
|An execution unit that the operating system can assign work to. Operating systems can be configured to overload physical cores with multiple logical cpus using hyperthreading.<br />
|cpu<br />
|<tt>--cpus-per-task</tt><br />
|-<br />
|thread<br />
|one of possibly multiple simultaneous execution paths within a program, which can share memory.<br />
| -<br />
| <tt>--cpus-per-task</tt> '''and''' <tt>OMP_NUM_THREADS</tt><br />
|-<br />
|hyperthread<br />
|a thread run in a collection of threads that is larger than the number of physical cores.<br />
| -<br />
| -<br />
|}<br />
<br />
== Scheduling by Node ==<br />
<br />
* On many systems that use SLURM, the scheduler will deduce from the job script specifications (the number of tasks and the number of cpus-per-node) what resources should be allocated. On Niagara, this is a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours.<br />
** No other users are running anything on them.<br />
** You can ssh into them, while your job is running, to see how things are going.<br />
* Whatever you request of the scheduler, your request will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are ignored. Your job always gets N x 202GB of RAM, where N is the number of nodes; each node has about 202GB of RAM available.<br />
* You should try to use all the cores on the nodes allocated to your job. Since there are 40 cores per node, your job should use N x 40 cores. If this is not the case, we will contact you to help you optimize your workflow. Again, users who have serial jobs should consult the [[Running Serial Jobs on Niagara | serial jobs]] page.<br />
<br />
== Hyperthreading: Logical CPUs vs. cores ==<br />
<br />
Hyperthreading, a technology that leverages more of the physical hardware by pretending there are twice as many logical cores as there are real cores, is enabled on Niagara.<br />
The operating system and scheduler see 80 logical CPUs.<br />
<br />
Using 80 logical CPUs versus 40 real cores typically gives about a 5-10% speedup, depending on your application (your mileage may vary).<br />
<br />
Because Niagara is scheduled by node, hyperthreading is actually fairly easy to use:<br />
* Ask for a certain number of nodes, N, for your job.<br />
* You know that you get 40 x N cores, so you will use (at least) a total of 40 x N MPI processes or threads (mpirun, srun, and the OS will automatically spread these over the real cores).<br />
* But you should also test if running 80 x N MPI processes or threads gives you any speedup.<br />
* Regardless, your usage will be counted as 40 x N x (walltime in years).<br />
<br />
Many applications which are communication-heavy can benefit from the use of hyperthreading.<br />
<br />
= Submission script details =<br />
<br />
This section outlines some details of how to interact with the scheduler, and how it implements Niagara's scheduling policies.<br />
<br />
== Queues ==<br />
<br />
There are 3 queues available on SciNet systems. These queues have different limits; see the [[#Limits | Limits]] section for further details.<br />
<br />
=== Compute ===<br />
<br />
The compute queue is the default queue. Most jobs will run in this queue. If no flags are specified in the submission script this is the queue where your job will land.<br />
<br />
=== Debug ===<br />
<br />
The Debug queue is a high-priority queue, used for short-term testing of your code. Do NOT use the debug queue for production work. You can use the debug queue in one of two ways. To submit a standard job script to the debug queue, add the line<br />
#SBATCH -p debug<br />
to your submission script. This will put the job into the debug queue, and it should run in short order.<br />
<br />
To request an interactive debug session, where you retain control over the command line prompt, at a login node type the command<br />
nia-login07:~$ salloc -p debug --nodes 1 --time=1:00:00<br />
This will request 1 node for 1 hour. You can similarly request a debug session using the 'debugjob' command:<br />
nia-login07:~$ debugjob N<br />
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; if N=4 (the maximum), it gives you 30 minutes.<br />
<br />
=== Archive ===<br />
<br />
The archivelong and archiveshort queues are only used by the [[HPSS]] system. See that page for details on how to use these queues.<br />
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the jobs runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs ||compute || 50 || 1000 || 1 node (40&nbsp;cores) || default:&nbsp;20&nbsp;nodes&nbsp;(800&nbsp;cores) <br> with&nbsp;allocation:&nbsp;1000&nbsp;nodes&nbsp;(40000&nbsp;cores)|| 15 minutes || 24 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40 cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (max 5 total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.<br />
<br />
== Slurm Accounts ==<br />
<br />
To be able to prioritise jobs based on groups and allocations, the Slurm scheduler uses the concept of ''accounts''. Each group that has a Resource for Research Groups (RRG) or Research Platforms and Portals (RPP) allocation (awarded through an annual competition by Compute Canada) has an account that starts with <tt>rrg-</tt> or <tt>rpp-</tt>. Slurm assigns a 'fairshare' priority to these accounts based on the size of the award in core-years. Groups without an RRG or RPP can use Niagara using a so-called Rapid Access Service (RAS), and have an account that starts with <tt>def-</tt>.<br />
<br />
On Niagara, most users will only ever use one account, and those users do not need to specify the account to Slurm. However, users that are part of collaborations may be able to use multiple accounts, e.g., that of their sponsor and that of their collaborator, but this means that they need to select the right account when running jobs. <br />
<br />
To select the account, just add <br />
<br />
#SBATCH -A [account]<br />
<br />
to the job script, or pass <tt>-A [account]</tt> to <tt>salloc</tt> or <tt>debugjob</tt>. <br />
<br />
To see which accounts you have access to, or what their names are, use the command<br />
<br />
sshare -U<br />
<br />
It has been noted that, in some cases, using the '-A' flag does not result in the appropriate account being used. To get around this, specify the account when sbatch is invoked:<br />
sbatch -A account myjobscript.sh<br />
<br />
== Slurm environment variables ==<br />
<br />
There are many environment variables built into Slurm. These are some which you may find useful:<br />
* SLURM_SUBMIT_DIR: directory from which the job was submitted.<br />
* SLURM_SUBMIT_HOST: host from which the job was submitted.<br />
* SLURM_JOB_ID: the job's id.<br />
* SLURM_JOB_NUM_NODES: number of nodes in the job.<br />
* SLURM_JOB_NODELIST: list of nodes assigned to the job.<br />
* SLURM_JOB_ACCOUNT: account associated with the job.<br />
<br />
Any of these environment variables can be accessed from within your job script.<br />
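As an illustration, a job script can combine these variables, for example, to label its output. This is only a sketch: the log-file naming scheme is made up, and placeholder defaults are used so the snippet also runs outside of a Slurm job.

```shell
#!/bin/bash
# Fall back to placeholder values when not running under Slurm.
jobid="${SLURM_JOB_ID:-0}"
nnodes="${SLURM_JOB_NUM_NODES:-1}"

# Work from the directory the job was submitted from.
cd "${SLURM_SUBMIT_DIR:-$PWD}"

# Label the log file with the job id and node count (hypothetical scheme).
logfile="job${jobid}_${nnodes}nodes.log"
echo "job ${jobid} running on ${nnodes} node(s)" > "${logfile}"
```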
<br />
== Passing Variables to submission scripts ==<br />
It is possible to pass values through environment variables into your SLURM submission scripts.<br />
To do so with variables already defined in your shell, just add the following directive to the submission script,<br />
<br />
#SBATCH --export=ALL<br />
<br />
and you will have access to any predefined environment variable.<br />
<br />
A better way is to specify explicitly which variables you want to pass to the submission script,<br />
<br />
sbatch --export=i=15,j='test' jobscript.sbatch<br />
<br />
You can even set the job name and output files using environment variables, e.g.<br />
<br />
i="simulation"<br />
j=14<br />
sbatch --job-name=$i.$j.run --output=$i.$j.out --export=i=$i,j=$j jobscript.sbatch<br />
<br />
(The latter only works on the command line; you cannot use environment variables in <tt>#SBATCH</tt> lines in the job script.)<br />
<br />
'''Command line arguments:'''<br />
<br />
Command line arguments can also be used, in the same way as command line arguments for shell scripts. All command line arguments given to sbatch that follow the job script name will be passed to the job script. In fact, SLURM will not look at any of these arguments, so you must place all sbatch arguments before the script name, e.g.:<br />
<br />
sbatch -p debug jobscript.sbatch FirstArgument SecondArgument ...<br />
<br />
In this example, <tt>-p debug</tt> is interpreted by SLURM, while in your submission script you can access <tt>FirstArgument</tt>, <tt>SecondArgument</tt>, etc., by referring to <code>$1, $2, ...</code>.<br />
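Inside <tt>jobscript.sbatch</tt> this might look as follows. This is a sketch: the argument meanings are invented for illustration, and defaults are supplied so the script also runs when no arguments are given.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=0:15:00

# $1 and $2 are FirstArgument and SecondArgument from the sbatch command line.
input_file="${1:-input.dat}"
n_steps="${2:-100}"

summary="input=${input_file} steps=${n_steps}"
echo "${summary}"
```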
<br />
== Job arrays ==<br />
<br />
Sometimes you need to run the same job script many times, but just tweaking one value each time. One way of accomplishing this is using job arrays. Job arrays are invoked using the "-a" flag with sbatch:<br />
sbatch -a 1-100 myjobscript.sh<br />
This will submit 100 instances of myjobscript.sh. Within the job script you can distinguish which of those instances is running using the environment variable SLURM_ARRAY_TASK_ID.<br />
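A minimal array job script could look like this (the per-task input-file naming is a made-up example; the task id defaults to 1 so the script can be run outside of Slurm):

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=1:00:00

# With "sbatch -a 1-100", SLURM_ARRAY_TASK_ID takes the values 1..100.
task="${SLURM_ARRAY_TASK_ID:-1}"

# Each instance works on its own (hypothetical) input file.
input="input_${task}.dat"
msg="array task ${task} processes ${input}"
echo "${msg}"
```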
<br />
Note that Niagara [[#Limits | currently]] has a limit of 1000 submitted jobs for users within groups with allocations, and 200 submitted jobs without an allocation.<br />
<br />
== Job dependencies ==<br />
<br />
You can make one job dependent on the successful completion of another job using the following command:<br />
sbatch --dependency=afterok:JOBID myjobscript.sh<br />
This will make the current job submission not start until the parent job, with jobid JOBID, successfully completes. There are many job dependency options available. Visit the [https://slurm.schedmd.com/sbatch.html#OPT_dependency Slurm sbatch page ] for the full list. <br />
<br />
If the parent job fails (that is, ends with a non-zero exit code) the dependent job can never be scheduled and will be automatically cancelled.<br />
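Chaining jobs is easiest with sbatch's <tt>--parsable</tt> flag, which prints just the job id. Below is a sketch: the script names are hypothetical, and a stand-in sbatch function is defined so the snippet can be run outside of Niagara.

```shell
# Define a stand-in only where the real sbatch is unavailable.
if ! command -v sbatch >/dev/null 2>&1; then
    sbatch() { echo "123456"; }   # pretends to submit and prints a fake job id
fi

# Submit the first job and capture its id (--parsable prints the id alone).
jobid=$(sbatch --parsable first_step.sh)

# The second job starts only after the first completes successfully.
sbatch --dependency="afterok:${jobid}" second_step.sh
```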
<br />
== Email Notification ==<br />
Email notification works, but you need to add the email address and the type of notification you wish to receive to your submission script, e.g.<br />
<br />
#SBATCH --mail-user=YOUR.email.ADDRESS<br />
#SBATCH --mail-type=ALL<br />
<br />
The sbatch man page (type <tt>man sbatch</tt> on Niagara) explains all possible mail-types.<br />
<br />
= Monitoring jobs =<br />
<br />
There are many options available for monitoring your jobs, the most basic of which is the squeue command:<br />
<br />
nia-login07:~$ squeue -u USERNAME<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
292047 compute myjob4 username PD 0:00 4 (Priority)<br />
292048 compute myjob3 username PD 0:00 4 (Priority)<br />
266829 compute myjob2 username R 18:56:17 2 nia[1397-1398]<br />
266828 compute myjob1 username R 18:56:46 1 nia1298<br />
<br />
Here you can see that we have two running jobs ('R') and two pending jobs ('PD'). The nodes being used are listed.<br />
<br />
== Job status ==<br />
<br />
To get an estimate of when a job will start, use the command<br />
squeue --start -j JOBID<br />
Note that this is only an estimate, and tends not to be very accurate.<br />
<br />
Information about a specific job can be found using the command<br />
squeue -j JOBID<br />
or alternatively<br />
scontrol show job JOBID<br />
which is more verbose.<br />
<br />
== SSHing to a node ==<br />
<br />
Once your job has started, the node belongs to you. As such you may, from a login node, SSH into the node to check the performance of your job. The first step is to find out which nodes are being used (see above). Once you have your list of nodes, you can SSH into them directly. Once there, you can run the 'top' or 'free' commands to check both CPU and memory usage.<br />
<br />
== jobperf ==<br />
<br />
The jobperf script will give you feedback on the performance of your currently-running job:<br />
nia-login07:~$ jobperf 123456<br />
----------------------------------------------------------------------------------------------------<br />
RUNNING IDLE USER MEMORY(MB) PROCESS NAMES<br />
HOSTNAME # %CPU %MEM DISK SLEEP NAME RAMDISK USED AVAIL (excl:bash,sh,ssh,sshd)<br />
----------------------------------------------------------------------------------------------------<br />
nia1013 71 6999% 0.5% 0 22 ejspence 0 15060 178017 14*gmx_mpi mpiexec slurm_script<br />
nia1014 79 7677% 0.1% 0 18 ejspence 0 14803 178274 13*gmx_mpi<br />
nia1295 79 7517% 0.4% 0 18 ejspence 0 15199 177878 13*gmx_mpi<br />
----------------------------------------------------------------------------------------------------<br />
<br />
Here you can see both the CPU and memory usage of the job, for all nodes being used.<br />
<br />
== Other commands ==<br />
<br />
Some other commands that can be useful for dealing with your jobs:<br />
* <code>scancel -i JOBID</code> cancels a specific job.<br />
* <code>sacct</code> gives information about your recent jobs.<br />
* <code>sinfo -p compute</code> gives a list of available nodes.<br />
* <code>qsum</code> gives a summary of the queue by user.<br />
<br />
= Example submission scripts =<br />
<br />
Here we present some examples of how to create submission scripts for running parallel jobs. Serial job examples can be found on the [[Running_Serial_Jobs_on_Niagara | serial jobs page]].<br />
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=8<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li><p>The first line indicates that this is a bash script.</p></li><br />
<li><p>Lines starting with <code>#SBATCH</code> go to SLURM.</p></li><br />
<li><p>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>)</p></li><br />
<li><p>In this case, SLURM looks for 8 nodes with 40 cores on which to run 320 tasks, for 1 hour.</p></li><br />
<li><p>Note that the mpirun flag "--ppn" (processors per node) is ignored.</p></li><br />
<li><p>Once the scheduler finds such nodes, it runs the script:</p><br />
<ul><br />
<li>Changes to the submission directory;</li><br />
<li>Loads modules;</li><br />
<li>Runs the <code>mpi_example</code> application.</li><br />
</ul><br />
<li>To use hyperthreading, just change --ntasks-per-node=40 to --ntasks-per-node=80, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* First line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>openmp_job</code>).<br />
* In this case, SLURM looks for one node on which to run a single task with 40 CPUs, for 1 hour.<br />
* Once the scheduler finds such a node, it runs the script:<br />
** Changes to the submission directory;<br />
** Loads modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.</div>
<hr />
<div>The queueing system used at SciNet is based around the [https://slurm.schedmd.com Slurm Workload Manager]. This "scheduler", Slurm, determines which jobs will be run on which compute nodes, and when. This page outlines how to submit jobs, how to interact with the scheduler, and some of the most common Slurm commands.<br />
<br />
Some common questions about the queuing system can be found on the [[FAQ]] as well.<br />
<br />
= Submitting jobs =<br />
<br />
You submit jobs from a Niagara login node. This is done by passing a script to the sbatch command:<br />
<br />
nia-login07:~$ sbatch jobscript.sh<br />
<br />
This puts the job, described by the job script, into the queue. The scheduler will will run the job on the compute nodes in due course. A typical submission script is as follows.<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
<br />
Some notes about this example:<br />
* The first line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>).<br />
* In this case, SLURM looks for 2 nodes with 40 cores on which to run 80 tasks, for 1 hour.<br />
* Note that the mpifun flag "--ppn" (processors per node) is ignored. Slurm takes care of this detail.<br />
* Once the scheduler finds a spot to run the job, it runs the script:<br />
** It changes to the submission directory;<br />
** Loads modules;<br />
** Runs the <code>mpi_example</code> application.<br />
* To use hyperthreading, just change --ntasks-per-node=40 to --ntasks-per-node=80, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).<br />
<br />
To create a job script appropriate for your work, you must modify the commands above to instruct Slurm to run the commands you need run.<br />
<br />
== Things to remember ==<br />
<br />
There are some things to always bear in mind when crafting your submission script:<br />
* Scheduling is by node, so in multiples of 40 cores. You are expected to use all 40 cores! If you are running serial jobs, and need assistance bundling your work into multiples of 40, please see the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page.<br />
* Jobs must write to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access. Download data you need before submitting your job.<br />
* Jobs will run under your group's RRG allocation. If your group does not have an allocation, your job will run under your group's RAS allocation (previously called `default' allocation). Note that groups with an allocation cannot run under a default allocation.<br />
* The maximum [[Wallclock_time | walltime]] for all users is 24 hours. The minimum and default walltime is 15 minutes.<br />
<br />
= Scheduling details =<br />
<br />
We now present the details of how to write a job script, and some extra commands which you might find useful.<br />
<br />
== SLURM nomenclature: jobs, nodes, tasks, cpus, cores, threads ==<br />
<br />
SLURM has a somewhat different way of referring to things like MPI processes and thread tasks, as compared to our previous scheduler, MOAB. The SLURM nomenclature is reflected in the names of scheduler options (i.e., resource requests). SLURM strictly enforces those requests, so it is important to get this right.<br />
<br />
{| class="wikitable"<br />
!term <br />
!meaning <br />
!SLURM term<br />
!related scheduler options <br />
|-<br />
|job<br />
|scheduled piece of work for which specific resources were requested.<br />
|job<br />
|<tt>sbatch, salloc</tt><br />
|-<br />
|node<br />
|basic computing component with several cores (40 for Niagara) that share memory <br />
|node<br />
|<tt>--nodes -N</tt><br />
|-<br />
|mpi process<br />
|one of a group of running programs using Message Passing Interface for parallel computing<br />
|task<br />
|<tt>--ntasks -n --ntasks-per-node</tt><br />
|-<br />
|core ''or'' physical cpu<br />
|A fully functional independent physical execution unit.<br />
| - <br />
| -<br />
|-<br />
|logical cpu<br />
|An execution unit that the operating system can assign work to. Operating systems can be configured to overload physical cores with multiple logical cpus using hyperthreading.<br />
|cpu<br />
|<tt>--cpus-per-task</tt><br />
|-<br />
|thread<br />
|one of possibly multiple simultaneous execution paths within a program, which can share memory.<br />
| -<br />
| <tt>--cpus-per-task</tt> '''and''' <tt>OMP_NUM_THREADS</tt><br />
|-<br />
|hyperthread<br />
|a thread run in a collection of threads that is larger than the number of physical cores.<br />
| -<br />
| -<br />
|}<br />
<br />
== Scheduling by Node ==<br />
<br />
* On many systems that use SLURM, the scheduler will deduce from the job script specifications (the number of tasks and the number of cpus-per-node) what resources should be allocated. On Niagara, this is a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours.<br />
** No other users are running anything on them.<br />
** You can ssh into them, while your job is running, to see how things are going.<br />
* Whatever you request of the scheduler, your request will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes. Each node has about 202GB of RAM available.<br />
* You should try to use all the cores on the nodes allocated to your job. Since there are 40 cores per node, your job should use N x 40 cores. If this is not the case, we will be contacted you to help you optimize your workflow. Again, users which have serials jobs should consult the [[Running Serial Jobs on Niagara | serial jobs]] page.<br />
<br />
== Hyperthreading: Logical CPUs vs. cores ==<br />
<br />
Hyperthreading, a technology that leverages more of the physical hardware by pretending there are twice as many logical cores than real cores, is enabled on Niagara.<br />
The operating system and scheduler see 80 logical CPUs.<br />
<br />
Using 80 logical CPUs versus 40 real cores typically gives about a 5-10% speedup, depending on your application (your mileage may vary).<br />
<br />
Because Niagara is scheduled by node, hyperthreading is actually fairly easy to use:<br />
* Ask for a certain number of nodes, N, for your job.<br />
* You know that you get 40 x N cores, so you will use (at least) a total of 40 x N MPI processes or threads (mpirun, srun, and the OS will automaticallly spread these over the real cores).<br />
* But you should also test if running 80 x N MPI processes or threads gives you any speedup.<br />
* Regardless, your usage will be counted as 40 x N x (walltime in years).<br />
<br />
Many applications which are communication-heavy can benefit from the use of hyperthreading.<br />
<br />
= Submission script details =<br />
<br />
This section outlines some details of how to interact with the scheduler, and how it implements Niagara's scheduling policies.<br />
<br />
== Queues ==<br />
<br />
There are 3 queues available on SciNet systems. These queues have different limits; see the [[#Limits | Limits]] section for further details.<br />
<br />
=== Compute ===<br />
<br />
The compute queue is the default queue. Most jobs will run in this queue. If no flags are specified in the submission script this is the queue where your job will land.<br />
<br />
=== Debug ===<br />
<br />
The Debug queue is a high-priority queue, used for short-term testing of your code. Do NOT use the debug queue for production work. You can use the debug queue one of two ways. To submit a standard job script to the debug queue, add the line<br />
#SBATCH -p debug<br />
to your submission script. This will put the job into the debug queue, and it should run in short order.<br />
<br />
To request an interactive debug session, where you retain control over the command line prompt, at a login node type the command<br />
nia-login07:~$ salloc -p debug --nodes 1 --time=1:00:00<br />
This will request 1 node for 1 hour. You can similarly request a debug session using the 'debugjob' command:<br />
nia-login07:~$ debugjob N<br />
where N is the number of nodes, If N=1, this gives an interactive session one 1 hour, when N=4 (the maximum), it gives you 30 minutes.<br />
<br />
=== Archive ===<br />
<br />
The archivelong and archiveshort queues are only used by the [[HPSS]] system. See that page for details on how to use these queues.<br />
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the jobs runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs with an allocation||compute || 50 || 1000 || 1 node (40 cores) || 1000 nodes (40000 cores)|| 15 minutes || 24 hours<br />
|-<br />
|Compute jobs without allocation ("default")||compute || 50 || 200 || 1 node (40 cores) || 20 nodes (800 cores)|| 15 minutes || 24 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40 cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (max 5 total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.<br />
<br />
== Slurm Accounts ==<br />
<br />
To be able to prioritise jobs based on groups and allocations, the Slurm scheduler uses the concept of ''accounts''. Each group that has a Resource for Research Groups (RRG) or Research Platforms and Portals (RPP) allocation (awarded through an annual competition by Compute Canada) has an account that starts with <tt>rrg-</tt> or <tt>rpp-</tt>. Slurm assigns a 'fairshare' priority to these accounts based on the size of the award in core-years. Groups without an RRG or RPP can use Niagara using a so-called Rapid Access Service (RAS), and have an account that starts with <tt>def-</tt>.<br />
<br />
On Niagara, most users will only ever use one account, and those users do not need to specify the account to Slurm. However, users that are part of collaborations may be able to use multiple accounts, i.e., that of their sponsor and that of their collaborator, but this mean that they need to select the right account when running jobs. <br />
<br />
To select the account, just add <br />
<br />
#SBATCH -A [account]<br />
<br />
to the job scripts, or use the <tt>-A [account]</tt> to <tt>salloc</tt> or <tt>debugjob</tt>. <br />
<br />
To see which accounts you have access to, or what their names are, use the command<br />
<br />
sshare -U<br />
<br />
It has been noted that, in some cases, using the '-A' flag does not result in the appropriate account being used. To get around this, specify the account when sbatch is invoked:<br />
sbatch -A account myjobscript.sh<br />
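As a concrete sketch (the account name <tt>rrg-somepi</tt> below is a hypothetical placeholder; substitute one of the accounts reported by <tt>sshare -U</tt>), a job script that charges a specific account could look like this:<br />

```shell
# Sketch: create a job script that explicitly selects a Slurm account.
# "rrg-somepi" is a hypothetical placeholder account name.
cat > account_job.sh <<'EOF'
#!/bin/bash
#SBATCH -A rrg-somepi
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=1:00:00
echo "Charging account: ${SLURM_JOB_ACCOUNT:-unknown}"
EOF
# On a login node you would then submit it with:  sbatch account_job.sh
```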
<br />
== Slurm environment variables ==<br />
<br />
There are many environment variables built into Slurm. These are some which you may find useful:<br />
* SLURM_SUBMIT_DIR: directory from which the job was submitted.<br />
* SLURM_SUBMIT_HOST: host from which the job was submitted.<br />
* SLURM_JOB_ID: the job's id.<br />
* SLURM_JOB_NUM_NODES: number of nodes in the job.<br />
* SLURM_JOB_NODELIST: list of nodes assigned to the job.<br />
* SLURM_JOB_ACCOUNT: account associated with the job.<br />
<br />
Any of these environment variables can be accessed from within your job script.<br />
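For example, a job script can log its Slurm context at startup. This is a sketch; outside of a running job these variables are simply unset, hence the fallback values:<br />

```shell
# Sketch: report the Slurm context a job runs in.  The ${VAR:-fallback}
# forms only matter when the script is run outside the scheduler for testing.
report_slurm_context() {
    echo "Job ${SLURM_JOB_ID:-none} submitted from ${SLURM_SUBMIT_DIR:-$PWD}"
    echo "Nodes (${SLURM_JOB_NUM_NODES:-0}): ${SLURM_JOB_NODELIST:-none}"
}
report_slurm_context
```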
<br />
== Passing Variables to submission scripts ==<br />
It is possible to pass values through environment variables into your SLURM submission scripts.<br />
To do so with variables already defined in your shell, just add the following directive to the submission script,<br />
<br />
#SBATCH --export=ALL<br />
<br />
and you will have access to any predefined environment variable.<br />
<br />
A better way is to specify explicitly which variables you want to pass to the submission script,<br />
<br />
sbatch --export=i=15,j='test' jobscript.sbatch<br />
<br />
You can even set the job name and output files using environment variables, e.g.<br />
<br />
i="simulation"<br />
j=14<br />
sbatch --job-name=$i.$j.run --output=$i.$j.out --export=i=$i,j=$j jobscript.sbatch<br />
<br />
(The latter only works on the command line; you cannot use environment variables in <tt>#SBATCH</tt> lines in the job script.)<br />
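A sketch of what the <tt>jobscript.sbatch</tt> from the example above might do with the passed variables (the names <tt>i</tt> and <tt>j</tt> follow the example; the error messages are illustrative):<br />

```shell
# Sketch of a jobscript.sbatch that consumes variables passed via
# "sbatch --export=i=...,j=..." (the names i and j follow the text above).
cat > jobscript.sbatch <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=0:15:00
# i and j arrive through the environment; fail loudly if they were not passed.
: "${i:?pass i via --export}" "${j:?pass j via --export}"
echo "Parameters: i=$i j=$j"
EOF
# Submitted as, e.g.:  sbatch --export=i=15,j=test jobscript.sbatch
```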
<br />
'''Command line arguments:'''<br />
<br />
Command line arguments can also be used, in the same way as for shell scripts. All command line arguments given to sbatch that follow the job script name will be passed to the job script. SLURM will not look at any of these arguments, so you must place all sbatch arguments before the script name, e.g.:<br />
<br />
sbatch -p debug jobscript.sbatch FirstArgument SecondArgument ...<br />
<br />
In this example, <tt>-p debug</tt> is interpreted by SLURM, while in your submission script you can access <tt>FirstArgument</tt>, <tt>SecondArgument</tt>, etc., by referring to <code>$1, $2, ...</code>.<br />
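A sketch of a job script that uses these positional arguments (the input file name and step count are hypothetical):<br />

```shell
# Sketch: a job script reading its positional arguments, which sbatch passes
# through untouched.  "input.dat" and the step count are hypothetical.
cat > args_job.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=0:15:00
input_file=$1            # FirstArgument
n_steps=${2:-100}        # SecondArgument, with a default if omitted
echo "input=$input_file steps=$n_steps"
EOF
# Submitted as, e.g.:  sbatch -p debug args_job.sh input.dat 42
```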
<br />
== Job arrays ==<br />
<br />
Sometimes you need to run the same job script many times, but just tweaking one value each time. One way of accomplishing this is using job arrays. Job arrays are invoked using the "-a" flag with sbatch:<br />
sbatch -a 1-100 myjobscript.sh<br />
This will submit 100 instances of myjobscript.sh. Within the job script you can distinguish which of those instances is running using the environment variable SLURM_ARRAY_TASK_ID.<br />
<br />
Note that Niagara [[#Limits | currently]] has a limit of 1000 submitted jobs for users within groups with allocations, and 200 submitted jobs without an allocation.<br />
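One common pattern is to let each array instance pick its own line from a parameter file. A sketch (<tt>params.txt</tt> is a hypothetical file with one parameter set per line):<br />

```shell
# Sketch: array instance N processes line N of a (hypothetical) params.txt.
cat > array_job.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=1:00:00
params=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
echo "Task ${SLURM_ARRAY_TASK_ID} running with: $params"
EOF
# Submitted as, e.g.:  sbatch -a 1-100 array_job.sh
```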
<br />
== Job dependencies ==<br />
<br />
You can make one job dependent on the successful completion of another job using the following command:<br />
sbatch --dependency=afterok:JOBID myjobscript.sh<br />
This will make the current job submission not start until the parent job, with jobid JOBID, successfully completes. There are many job dependency options available. Visit the [https://slurm.schedmd.com/sbatch.html#OPT_dependency Slurm sbatch page ] for the full list. <br />
<br />
If the parent job fails (that is, ends with a non-zero exit code) the dependent job can never be scheduled and will be automatically cancelled.<br />
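To chain jobs from a script, the parent's job ID can be captured with sbatch's <tt>--parsable</tt> flag, which makes sbatch print only the job ID. A sketch (the helper and script names are illustrative):<br />

```shell
# Sketch: submit two job scripts so that the second starts only if the first
# completes successfully.  --parsable makes sbatch print just the job ID.
submit_chain() {
    local first_id
    first_id=$(sbatch --parsable "$1") || return 1
    sbatch --dependency=afterok:"$first_id" "$2"
}
# Usage (on a login node):  submit_chain preprocess.sh analyse.sh
```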
<br />
== Email Notification ==<br />
Email notification works, but you need to add the email address and the type of notification you want to receive to your submission script, e.g.<br />
<br />
#SBATCH --mail-user=YOUR.email.ADDRESS<br />
#SBATCH --mail-type=ALL<br />
<br />
The sbatch man page (type <tt>man sbatch</tt> on Niagara) explains all possible mail-types.<br />
<br />
= Monitoring jobs =<br />
<br />
There are many options available for monitoring your jobs. The most basic of which is the squeue command:<br />
<br />
nia-login07:~$ squeue -u USERNAME<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
292047 compute myjob4 username PD 0:00 4 (Priority)<br />
292048 compute myjob3 username PD 0:00 4 (Priority)<br />
266829 compute myjob2 username R 18:56:17 2 nia[1397-1398]<br />
266828 compute myjob1 username R 18:56:46 1 nia1298<br />
<br />
Here you can see that we have two running jobs ('R') and two pending jobs ('PD'). The nodes being used are listed.<br />
<br />
== Job status ==<br />
<br />
To get an estimate of when a job will start, use the command<br />
squeue --start -j JOBID<br />
Note that this is only an estimate, and it tends not to be very accurate.<br />
<br />
Information about a specific job can be found using the command<br />
squeue -j JOBID<br />
or alternatively<br />
scontrol show job JOBID<br />
which is more verbose.<br />
<br />
== SSHing to a node ==<br />
<br />
Once your job has started, the node belongs to you. As such you may, from a login node, SSH into the node to check the performance of your job. The first step is to find out which nodes are being used (see above). Once you have your list of nodes, you can SSH into them directly. Once there, you can run the 'top' or 'free' commands to check both CPU and memory usage.<br />
<br />
== jobperf ==<br />
<br />
The jobperf script will give you feedback on the performance of your currently-running job:<br />
nia-login07:~$ jobperf 123456<br />
----------------------------------------------------------------------------------------------------<br />
RUNNING IDLE USER MEMORY(MB) PROCESS NAMES<br />
HOSTNAME # %CPU %MEM DISK SLEEP NAME RAMDISK USED AVAIL (excl:bash,sh,ssh,sshd)<br />
----------------------------------------------------------------------------------------------------<br />
nia1013 71 6999% 0.5% 0 22 ejspence 0 15060 178017 14*gmx_mpi mpiexec slurm_script<br />
nia1014 79 7677% 0.1% 0 18 ejspence 0 14803 178274 13*gmx_mpi<br />
nia1295 79 7517% 0.4% 0 18 ejspence 0 15199 177878 13*gmx_mpi<br />
----------------------------------------------------------------------------------------------------<br />
<br />
Here you can see both the CPU and memory usage of the job, for all nodes being used.<br />
<br />
== Other commands ==<br />
<br />
Some other commands that can be useful for dealing with your jobs:<br />
* <code>scancel -i JOBID</code> cancels a specific job.<br />
* <code>sacct</code> gives information about your recent jobs.<br />
* <code>sinfo -p compute</code> gives a list of available nodes.<br />
* <code>qsum</code> gives a summary of the queue by user.<br />
<br />
= Example submission scripts =<br />
<br />
Here we present some examples of how to create submission scripts for running parallel jobs. Serial job examples can be found on the [[Running_Serial_Jobs_on_Niagara | serial jobs page]].<br />
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=8<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li><p>First line indicates that this is a bash script.</p></li><br />
<li><p>Lines starting with <code>#SBATCH</code> go to SLURM.</p></li><br />
<li><p>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>).</p></li><br />
<li><p>In this case, SLURM looks for 8 nodes with 40 cores on which to run 320 tasks, for 1 hour.</p></li><br />
<li><p>Note that the mpirun flag "--ppn" (processors per node) is ignored.</p></li><br />
<li><p>Once the scheduler has found such nodes, it runs the script:</p><br />
<ul><br />
<li>Changes to the submission directory;</li><br />
<li>Loads modules;</li><br />
<li>Runs the <code>mpi_example</code> application.</li><br />
</ul><br />
<li>To use hyperthreading, just change --ntasks-per-node=40 to --ntasks-per-node=80, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* First line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>openmp_job</code>).<br />
* In this case, SLURM looks for one node with 40 cores, all to be used by a single task, for 1 hour.<br />
* Once it finds such a node, it runs the script:<br />
** Changes to the submission directory;<br />
** Loads modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Slurm&diff=1944Slurm2019-02-20T15:43:57Z<p>Bmundim: /* Example submission scripts */</p>
<hr />
<div>The queueing system used at SciNet is based around the [https://slurm.schedmd.com Slurm Workload Manager]. This "scheduler", Slurm, determines which jobs will be run on which compute nodes, and when. This page outlines how to submit jobs, how to interact with the scheduler, and some of the most common Slurm commands.<br />
<br />
Some common questions about the queuing system can be found on the [[FAQ]] as well.<br />
<br />
= Submitting jobs =<br />
<br />
You submit jobs from a Niagara login node. This is done by passing a script to the sbatch command:<br />
<br />
nia-login07:~$ sbatch jobscript.sh<br />
<br />
This puts the job, described by the job script, into the queue. The scheduler will run the job on the compute nodes in due course. A typical submission script is as follows.<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
<br />
Some notes about this example:<br />
* The first line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>).<br />
* In this case, SLURM looks for 2 nodes with 40 cores on which to run 80 tasks, for 1 hour.<br />
* Note that the mpirun flag "--ppn" (processors per node) is ignored. Slurm takes care of this detail.<br />
* Once the scheduler finds a spot to run the job, it runs the script:<br />
** It changes to the submission directory;<br />
** Loads modules;<br />
** Runs the <code>mpi_example</code> application.<br />
* To use hyperthreading, just change --ntasks-per-node=40 to --ntasks-per-node=80, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).<br />
<br />
To create a job script appropriate for your work, you must modify the commands above to instruct Slurm to run the commands you need run.<br />
<br />
== Things to remember ==<br />
<br />
There are some things to always bear in mind when crafting your submission script:<br />
* Scheduling is by node, so in multiples of 40 cores. You are expected to use all 40 cores! If you are running serial jobs, and need assistance bundling your work into multiples of 40, please see the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page.<br />
* Jobs must write to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access. Download data you need before submitting your job.<br />
* Jobs will run under your group's RRG allocation. If your group does not have an allocation, your job will run under your group's RAS allocation (previously called 'default' allocation). Note that groups with an allocation cannot run under a default allocation.<br />
* The maximum [[Wallclock_time | walltime]] for all users is 24 hours. The minimum and default walltime is 15 minutes.<br />
<br />
= Scheduling details =<br />
<br />
We now present the details of how to write a job script, and some extra commands which you might find useful.<br />
<br />
== SLURM nomenclature: jobs, nodes, tasks, cpus, cores, threads ==<br />
<br />
SLURM has a somewhat different way of referring to things like MPI processes and thread tasks, as compared to our previous scheduler, MOAB. The SLURM nomenclature is reflected in the names of scheduler options (i.e., resource requests). SLURM strictly enforces those requests, so it is important to get this right.<br />
<br />
{| class="wikitable"<br />
!term <br />
!meaning <br />
!SLURM term<br />
!related scheduler options <br />
|-<br />
|job<br />
|scheduled piece of work for which specific resources were requested.<br />
|job<br />
|<tt>sbatch, salloc</tt><br />
|-<br />
|node<br />
|basic computing component with several cores (40 for Niagara) that share memory <br />
|node<br />
|<tt>--nodes -N</tt><br />
|-<br />
|mpi process<br />
|one of a group of running programs using Message Passing Interface for parallel computing<br />
|task<br />
|<tt>--ntasks -n --ntasks-per-node</tt><br />
|-<br />
|core ''or'' physical cpu<br />
|A fully functional independent physical execution unit.<br />
| - <br />
| -<br />
|-<br />
|logical cpu<br />
|An execution unit that the operating system can assign work to. Operating systems can be configured to overload physical cores with multiple logical cpus using hyperthreading.<br />
|cpu<br />
|<tt>--cpus-per-task</tt><br />
|-<br />
|thread<br />
|one of possibly multiple simultaneous execution paths within a program, which can share memory.<br />
| -<br />
| <tt>--cpus-per-task</tt> '''and''' <tt>OMP_NUM_THREADS</tt><br />
|-<br />
|hyperthread<br />
|a thread run in a collection of threads that is larger than the number of physical cores.<br />
| -<br />
| -<br />
|}<br />
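As a sketch of how these terms combine, a hybrid MPI/OpenMP job might request 2 nodes, 4 tasks (MPI processes) per node, and 10 cpus per task, so that 4 x 10 = 40 cpus fill each node (the application name is hypothetical):<br />

```shell
# Sketch: resource request for a hybrid MPI + OpenMP job, combining the
# Slurm terms from the table.  "./hybrid_example" is a hypothetical binary.
cat > hybrid_job.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=10
#SBATCH --time=1:00:00
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun ./hybrid_example
EOF
```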
<br />
== Scheduling by Node ==<br />
<br />
* On many systems that use SLURM, the scheduler will deduce from the job script specifications (the number of tasks and the number of cpus-per-node) what resources should be allocated. On Niagara, this is a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours.<br />
** No other users are running anything on them.<br />
** You can ssh into them, while your job is running, to see how things are going.<br />
* Whatever you request of the scheduler, your request will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes. Each node has about 202GB of RAM available.<br />
* You should try to use all the cores on the nodes allocated to your job. Since there are 40 cores per node, your job should use N x 40 cores. If this is not the case, we will contact you to help you optimize your workflow. Again, users who have serial jobs should consult the [[Running Serial Jobs on Niagara | serial jobs]] page.<br />
<br />
== Hyperthreading: Logical CPUs vs. cores ==<br />
<br />
Hyperthreading, a technology that leverages more of the physical hardware by presenting twice as many logical cores as there are physical cores, is enabled on Niagara.<br />
The operating system and scheduler see 80 logical CPUs.<br />
<br />
Using 80 logical CPUs versus 40 real cores typically gives about a 5-10% speedup, depending on your application (your mileage may vary).<br />
<br />
Because Niagara is scheduled by node, hyperthreading is actually fairly easy to use:<br />
* Ask for a certain number of nodes, N, for your job.<br />
* You know that you get 40 x N cores, so you will use (at least) a total of 40 x N MPI processes or threads (mpirun, srun, and the OS will automatically spread these over the real cores).<br />
* But you should also test if running 80 x N MPI processes or threads gives you any speedup.<br />
* Regardless, your usage will be counted as 40 x N x (walltime in years).<br />
<br />
Many communication-heavy applications can benefit from the use of hyperthreading.<br />
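The accounting rule above can be written out explicitly; whether a job runs 40 or 80 threads per node, the charge scales only with nodes and walltime (a sketch, computing core-hours for readability):<br />

```shell
# Sketch: usage charged for a job depends only on nodes and walltime,
# never on whether 40 or 80 hardware threads per node were used.
charged_core_hours() {
    local nodes=$1 hours=$2
    echo $(( 40 * nodes * hours ))
}
```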
<br />
= Submission script details =<br />
<br />
This section outlines some details of how to interact with the scheduler, and how it implements Niagara's scheduling policies.<br />
<br />
== Queues ==<br />
<br />
There are 3 queues available on SciNet systems. These queues have different limits; see the [[#Limits | Limits]] section for further details.<br />
<br />
=== Compute ===<br />
<br />
The compute queue is the default queue. Most jobs will run in this queue. If no flags are specified in the submission script this is the queue where your job will land.<br />
<br />
=== Debug ===<br />
<br />
The Debug queue is a high-priority queue, used for short-term testing of your code. Do NOT use the debug queue for production work. You can use the debug queue one of two ways. To submit a standard job script to the debug queue, add the line<br />
#SBATCH -p debug<br />
to your submission script. This will put the job into the debug queue, and it should run in short order.<br />
<br />
To request an interactive debug session, where you retain control over the command line prompt, at a login node type the command<br />
nia-login07:~$ salloc -p debug --nodes 1 --time=1:00:00<br />
This will request 1 node for 1 hour. You can similarly request a debug session using the 'debugjob' command:<br />
nia-login07:~$ debugjob N<br />
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; when N=4 (the maximum), it gives you 30 minutes.<br />
<br />
=== Archive ===<br />
<br />
The archivelong and archiveshort queues are only used by the [[HPSS]] system. See that page for details on how to use these queues.<br />
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the jobs runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs with an allocation||compute || 50 || 1000 || 1 node (40 cores) || 1000 nodes (40000 cores)|| 15 minutes || 24 hours<br />
|-<br />
|Compute jobs without allocation ("default")||compute || 50 || 200 || 1 node (40 cores) || 20 nodes (800 cores)|| 15 minutes || 24 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40 cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (max 5 total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.<br />
<br />
== Slurm Accounts ==<br />
<br />
To be able to prioritise jobs based on groups and allocations, the Slurm scheduler uses the concept of ''accounts''. Each group that has a Resource for Research Groups (RRG) or Research Platforms and Portals (RPP) allocation (awarded through an annual competition by Compute Canada) has an account that starts with <tt>rrg-</tt> or <tt>rpp-</tt>. Slurm assigns a 'fairshare' priority to these accounts based on the size of the award in core-years. Groups without an RRG or RPP can use Niagara using a so-called Rapid Access Service (RAS), and have an account that starts with <tt>def-</tt>.<br />
<br />
On Niagara, most users will only ever use one account, and those users do not need to specify the account to Slurm. However, users that are part of collaborations may be able to use multiple accounts, e.g., that of their sponsor and that of their collaborator, but this means that they need to select the right account when running jobs.<br />
<br />
To select the account, just add <br />
<br />
#SBATCH -A [account]<br />
<br />
to the job scripts, or pass <tt>-A [account]</tt> to <tt>salloc</tt> or <tt>debugjob</tt>.<br />
<br />
To see which accounts you have access to, or what their names are, use the command<br />
<br />
sshare -U<br />
<br />
It has been noted that, in some cases, using the '-A' flag does not result in the appropriate account being used. To get around this, specify the account when sbatch is invoked:<br />
sbatch -A account myjobscript.sh<br />
<br />
== Slurm environment variables ==<br />
<br />
There are many environment variables built into Slurm. These are some which you may find useful:<br />
* SLURM_SUBMIT_DIR: directory from which the job was submitted.<br />
* SLURM_SUBMIT_HOST: host from which the job was submitted.<br />
* SLURM_JOB_ID: the job's id.<br />
* SLURM_JOB_NUM_NODES: number of nodes in the job.<br />
* SLURM_JOB_NODELIST: list of nodes assigned to the job.<br />
* SLURM_JOB_ACCOUNT: account associated with the job.<br />
<br />
Any of these environment variables can be accessed from within your job script.<br />
<br />
== Passing Variables to submission scripts ==<br />
It is possible to pass values through environment variables into your SLURM submission scripts.<br />
To do so with variables already defined in your shell, just add the following directive to the submission script,<br />
<br />
#SBATCH --export=ALL<br />
<br />
and you will have access to any predefined environment variable.<br />
<br />
A better way is to specify explicitly which variables you want to pass to the submission script,<br />
<br />
sbatch --export=i=15,j='test' jobscript.sbatch<br />
<br />
You can even set the job name and output files using environment variables, e.g.<br />
<br />
i="simulation"<br />
j=14<br />
sbatch --job-name=$i.$j.run --output=$i.$j.out --export=i=$i,j=$j jobscript.sbatch<br />
<br />
(The latter only works on the command line; you cannot use environment variables in <tt>#SBATCH</tt> lines in the job script.)<br />
<br />
'''Command line arguments:'''<br />
<br />
Command line arguments can also be used, in the same way as for shell scripts. All command line arguments given to sbatch that follow the job script name will be passed to the job script. SLURM will not look at any of these arguments, so you must place all sbatch arguments before the script name, e.g.:<br />
<br />
sbatch -p debug jobscript.sbatch FirstArgument SecondArgument ...<br />
<br />
In this example, <tt>-p debug</tt> is interpreted by SLURM, while in your submission script you can access <tt>FirstArgument</tt>, <tt>SecondArgument</tt>, etc., by referring to <code>$1, $2, ...</code>.<br />
<br />
== Job arrays ==<br />
<br />
Sometimes you need to run the same job script many times, but just tweaking one value each time. One way of accomplishing this is using job arrays. Job arrays are invoked using the "-a" flag with sbatch:<br />
sbatch -a 1-100 myjobscript.sh<br />
This will submit 100 instances of myjobscript.sh. Within the job script you can distinguish which of those instances is running using the environment variable SLURM_ARRAY_TASK_ID.<br />
<br />
Note that Niagara [[#Limits | currently]] has a limit of 1000 submitted jobs for users within groups with allocations, and 200 submitted jobs without an allocation.<br />
<br />
== Job dependencies ==<br />
<br />
You can make one job dependent on the successful completion of another job using the following command:<br />
sbatch --dependency=afterok:JOBID myjobscript.sh<br />
This will make the current job submission not start until the parent job, with jobid JOBID, successfully completes. There are many job dependency options available. Visit the [https://slurm.schedmd.com/sbatch.html Slurm sbatch page ] for the full list. <br />
<br />
If the parent job fails (that is, ends with a non-zero exit code) the dependent job can never be scheduled and will be automatically cancelled.<br />
<br />
== Email Notification ==<br />
Email notification works, but you need to add the email address and the type of notification you want to receive to your submission script, e.g.<br />
<br />
#SBATCH --mail-user=YOUR.email.ADDRESS<br />
#SBATCH --mail-type=ALL<br />
<br />
The sbatch man page (type <tt>man sbatch</tt> on Niagara) explains all possible mail-types.<br />
<br />
= Monitoring jobs =<br />
<br />
There are many options available for monitoring your jobs. The most basic of which is the squeue command:<br />
<br />
nia-login07:~$ squeue -u USERNAME<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
292047 compute myjob4 username PD 0:00 4 (Priority)<br />
292048 compute myjob3 username PD 0:00 4 (Priority)<br />
266829 compute myjob2 username R 18:56:17 2 nia[1397-1398]<br />
266828 compute myjob1 username R 18:56:46 1 nia1298<br />
<br />
Here you can see that we have two running jobs ('R') and two pending jobs ('PD'). The nodes being used are listed.<br />
<br />
== Job status ==<br />
<br />
To get an estimate of when a job will start, use the command<br />
squeue --start -j JOBID<br />
Note that this is only an estimate, and it tends not to be very accurate.<br />
<br />
Information about a specific job can be found using the command<br />
squeue -j JOBID<br />
or alternatively<br />
scontrol show job JOBID<br />
which is more verbose.<br />
<br />
== SSHing to a node ==<br />
<br />
Once your job has started, the node belongs to you. As such you may, from a login node, SSH into the node to check the performance of your job. The first step is to find out which nodes are being used (see above). Once you have your list of nodes, you can SSH into them directly. Once there, you can run the 'top' or 'free' commands to check both CPU and memory usage.<br />
<br />
== jobperf ==<br />
<br />
The jobperf script will give you feedback on the performance of your currently-running job:<br />
nia-login07:~$ jobperf 123456<br />
----------------------------------------------------------------------------------------------------<br />
RUNNING IDLE USER MEMORY(MB) PROCESS NAMES<br />
HOSTNAME # %CPU %MEM DISK SLEEP NAME RAMDISK USED AVAIL (excl:bash,sh,ssh,sshd)<br />
----------------------------------------------------------------------------------------------------<br />
nia1013 71 6999% 0.5% 0 22 ejspence 0 15060 178017 14*gmx_mpi mpiexec slurm_script<br />
nia1014 79 7677% 0.1% 0 18 ejspence 0 14803 178274 13*gmx_mpi<br />
nia1295 79 7517% 0.4% 0 18 ejspence 0 15199 177878 13*gmx_mpi<br />
----------------------------------------------------------------------------------------------------<br />
<br />
Here you can see both the CPU and memory usage of the job, for all nodes being used.<br />
<br />
== Other commands ==<br />
<br />
Some other commands that can be useful for dealing with your jobs:<br />
* <code>scancel -i JOBID</code> cancels a specific job.<br />
* <code>sacct</code> gives information about your recent jobs.<br />
* <code>sinfo -p compute</code> gives a list of available nodes.<br />
* <code>qsum</code> gives a summary of the queue by user.<br />
<br />
= Example submission scripts =<br />
<br />
Here we present some examples of how to create submission scripts for running parallel jobs. Serial job examples can be found on the [[Running_Serial_Jobs_on_Niagara | serial jobs page]].<br />
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=8<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li><p>First line indicates that this is a bash script.</p></li><br />
<li><p>Lines starting with <code>#SBATCH</code> go to SLURM.</p></li><br />
<li><p>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>).</p></li><br />
<li><p>In this case, SLURM looks for 8 nodes with 40 cores on which to run 320 tasks, for 1 hour.</p></li><br />
<li><p>Note that the mpirun flag "--ppn" (processors per node) is ignored.</p></li><br />
<li><p>Once the scheduler has found such nodes, it runs the script:</p><br />
<ul><br />
<li>Changes to the submission directory;</li><br />
<li>Loads modules;</li><br />
<li>Runs the <code>mpi_example</code> application.</li><br />
</ul><br />
<li>To use hyperthreading, just change --ntasks-per-node=40 to --ntasks-per-node=80, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* First line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>openmp_job</code>).<br />
* In this case, SLURM looks for one node with 40 cores, all to be used by a single task, for 1 hour.<br />
* Once it finds such a node, it runs the script:<br />
** Changes to the submission directory;<br />
** Loads modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Slurm&diff=1943Slurm2019-02-20T15:40:37Z<p>Bmundim: /* Limits */</p>
<hr />
<div>The queueing system used at SciNet is based around the [https://slurm.schedmd.com Slurm Workload Manager]. This "scheduler", Slurm, determines which jobs will be run on which compute nodes, and when. This page outlines how to submit jobs, how to interact with the scheduler, and some of the most common Slurm commands.<br />
<br />
Some common questions about the queuing system can be found on the [[FAQ]] as well.<br />
<br />
= Submitting jobs =<br />
<br />
You submit jobs from a Niagara login node. This is done by passing a script to the sbatch command:<br />
<br />
nia-login07:~$ sbatch jobscript.sh<br />
<br />
This puts the job, described by the job script, into the queue. The scheduler will run the job on the compute nodes in due course. A typical submission script is as follows.<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
<br />
Some notes about this example:<br />
* The first line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>).<br />
* In this case, SLURM looks for 2 nodes with 40 cores on which to run 80 tasks, for 1 hour.<br />
* Note that the mpirun flag "--ppn" (processors per node) is ignored. Slurm takes care of this detail.<br />
* Once the scheduler finds a spot to run the job, it runs the script:<br />
** It changes to the submission directory;<br />
** Loads modules;<br />
** Runs the <code>mpi_example</code> application.<br />
* To use hyperthreading, just change --ntasks-per-node=40 to --ntasks-per-node=80, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).<br />
<br />
To create a job script appropriate for your work, you must modify the commands above to instruct Slurm to run the commands you need run.<br />
<br />
== Things to remember ==<br />
<br />
There are some things to always bear in mind when crafting your submission script:<br />
* Scheduling is by node, so in multiples of 40 cores. You are expected to use all 40 cores! If you are running serial jobs, and need assistance bundling your work into multiples of 40, please see the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page.<br />
* Jobs must write to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access. Download data you need before submitting your job.<br />
* Jobs will run under your group's RRG allocation. If your group does not have an allocation, your job will run under your group's RAS allocation (previously called 'default' allocation). Note that groups with an allocation cannot run under a default allocation.<br />
* The maximum [[Wallclock_time | walltime]] for all users is 24 hours. The minimum and default walltime is 15 minutes.<br />
<br />
= Scheduling details =<br />
<br />
We now present the details of how to write a job script, and some extra commands which you might find useful.<br />
<br />
== SLURM nomenclature: jobs, nodes, tasks, cpus, cores, threads ==<br />
<br />
SLURM has a somewhat different way of referring to things like MPI processes and thread tasks, as compared to our previous scheduler, MOAB. The SLURM nomenclature is reflected in the names of scheduler options (i.e., resource requests). SLURM strictly enforces those requests, so it is important to get this right.<br />
<br />
{| class="wikitable"<br />
!term <br />
!meaning <br />
!SLURM term<br />
!related scheduler options <br />
|-<br />
|job<br />
|scheduled piece of work for which specific resources were requested.<br />
|job<br />
|<tt>sbatch, salloc</tt><br />
|-<br />
|node<br />
|basic computing component with several cores (40 for Niagara) that share memory <br />
|node<br />
|<tt>--nodes -N</tt><br />
|-<br />
|mpi process<br />
|one of a group of running programs using Message Passing Interface for parallel computing<br />
|task<br />
|<tt>--ntasks -n --ntasks-per-node</tt><br />
|-<br />
|core ''or'' physical cpu<br />
|A fully functional independent physical execution unit.<br />
| - <br />
| -<br />
|-<br />
|logical cpu<br />
|An execution unit that the operating system can assign work to. Operating systems can be configured to overload physical cores with multiple logical cpus using hyperthreading.<br />
|cpu<br />
|<tt>--cpus-per-task</tt><br />
|-<br />
|thread<br />
|one of possibly multiple simultaneous execution paths within a program, which can share memory.<br />
| -<br />
| <tt>--cpus-per-task</tt> '''and''' <tt>OMP_NUM_THREADS</tt><br />
|-<br />
|hyperthread<br />
|a thread run in a collection of threads that is larger than the number of physical cores.<br />
| -<br />
| -<br />
|}<br />
<br />
== Scheduling by Node ==<br />
<br />
* On many systems that use SLURM, the scheduler will deduce from the job script specifications (the number of tasks and the number of cpus-per-node) what resources should be allocated. On Niagara, this is a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours.<br />
** No other users are running anything on them.<br />
** You can ssh into them, while your job is running, to see how things are going.<br />
* Whatever you request of the scheduler, your request will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes. Each node has about 202GB of RAM available.<br />
* You should try to use all the cores on the nodes allocated to your job. Since there are 40 cores per node, your job should use N x 40 cores. If this is not the case, we will contact you to help you optimize your workflow. Again, users who have serial jobs should consult the [[Running Serial Jobs on Niagara | serial jobs]] page.<br />
<br />
== Hyperthreading: Logical CPUs vs. cores ==<br />
<br />
Hyperthreading, a technology that leverages more of the physical hardware by pretending there are twice as many logical cores as real cores, is enabled on Niagara.<br />
The operating system and scheduler see 80 logical CPUs.<br />
<br />
Using 80 logical CPUs versus 40 real cores typically gives about a 5-10% speedup, depending on your application (your mileage may vary).<br />
<br />
Because Niagara is scheduled by node, hyperthreading is actually fairly easy to use:<br />
* Ask for a certain number of nodes, N, for your job.<br />
* You know that you get 40 x N cores, so you will use (at least) a total of 40 x N MPI processes or threads (mpirun, srun, and the OS will automatically spread these over the real cores).<br />
* But you should also test if running 80 x N MPI processes or threads gives you any speedup.<br />
* Regardless, your usage will be counted as 40 x N x (walltime in years).<br />
<br />
Many applications which are communication-heavy can benefit from the use of hyperthreading.<br />
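As an illustration of the accounting rule above, here is a quick sanity check in bash (the node count and walltime are made-up sample numbers):<br />

```bash
#!/bin/bash
# Usage is charged per physical core, not per hyperthread: a job on
# "nodes" nodes for "hours" hours costs 40 * nodes * hours core-hours,
# whether it ran 40 or 80 processes per node. Sample numbers only.
nodes=2
hours=24
core_hours=$((40 * nodes * hours))
echo "charged ${core_hours} core-hours"
```

Here a 2-node, 24-hour job is charged 1920 core-hours either way.<br />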
<br />
= Submission script details =<br />
<br />
This section outlines some details of how to interact with the scheduler, and how it implements Niagara's scheduling policies.<br />
<br />
== Queues ==<br />
<br />
There are 3 queues available on SciNet systems. These queues have different limits; see the [[#Limits | Limits]] section for further details.<br />
<br />
=== Compute ===<br />
<br />
The compute queue is the default queue. Most jobs will run in this queue. If no flags are specified in the submission script this is the queue where your job will land.<br />
<br />
=== Debug ===<br />
<br />
The Debug queue is a high-priority queue, used for short-term testing of your code. Do NOT use the debug queue for production work. You can use the debug queue in one of two ways. To submit a standard job script to the debug queue, add the line<br />
#SBATCH -p debug<br />
to your submission script. This will put the job into the debug queue, and it should run in short order.<br />
<br />
To request an interactive debug session, where you retain control over the command line prompt, at a login node type the command<br />
nia-login07:~$ salloc -p debug --nodes 1 --time=1:00:00<br />
This will request 1 node for 1 hour. You can similarly request a debug session using the 'debugjob' command:<br />
nia-login07:~$ debugjob N<br />
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; if N=4 (the maximum), it gives you 30 minutes.<br />
<br />
=== Archive ===<br />
<br />
The archivelong and archiveshort queues are only used by the [[HPSS]] system. See that page for details on how to use these queues.<br />
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs with an allocation||compute || 50 || 1000 || 1 node (40 cores) || 1000 nodes (40000 cores)|| 15 minutes || 24 hours<br />
|-<br />
|Compute jobs without allocation ("default")||compute || 50 || 200 || 1 node (40 cores) || 20 nodes (800 cores)|| 15 minutes || 24 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40 cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (max 5 total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.<br />
<br />
== Slurm Accounts ==<br />
<br />
To be able to prioritise jobs based on groups and allocations, the Slurm scheduler uses the concept of ''accounts''. Each group that has a Resource for Research Groups (RRG) or Research Platforms and Portals (RPP) allocation (awarded through an annual competition by Compute Canada) has an account that starts with <tt>rrg-</tt> or <tt>rpp-</tt>. Slurm assigns a 'fairshare' priority to these accounts based on the size of the award in core-years. Groups without an RRG or RPP can use Niagara using a so-called Rapid Access Service (RAS), and have an account that starts with <tt>def-</tt>.<br />
<br />
On Niagara, most users will only ever use one account, and those users do not need to specify the account to Slurm. However, users that are part of collaborations may be able to use multiple accounts, i.e., that of their sponsor and that of their collaborator, but this means that they need to select the right account when running jobs. <br />
<br />
To select the account, just add <br />
<br />
#SBATCH -A [account]<br />
<br />
to the job scripts, or pass <tt>-A [account]</tt> to <tt>salloc</tt> or <tt>debugjob</tt>. <br />
<br />
To see which accounts you have access to, or what their names are, use the command<br />
<br />
sshare -U<br />
<br />
It has been noted that, in some cases, using the '-A' flag does not result in the appropriate account being used. To get around this, specify the account when sbatch is invoked:<br />
sbatch -A account myjobscript.sh<br />
<br />
== Slurm environment variables ==<br />
<br />
There are many environment variables built into Slurm. These are some which you may find useful:<br />
* SLURM_SUBMIT_DIR: directory from which the job was submitted.<br />
* SLURM_SUBMIT_HOST: host from which the job was submitted.<br />
* SLURM_JOB_ID: the job's id.<br />
* SLURM_JOB_NUM_NODES: number of nodes in the job.<br />
* SLURM_JOB_NODELIST: list of nodes assigned to the job.<br />
* SLURM_JOB_ACCOUNT: account associated with the job.<br />
<br />
Any of these environment variables can be accessed from within your job script.<br />
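For instance, a job-script fragment like the following (a sketch; the echoed wording is arbitrary) can record where and how the job ran:<br />

```bash
#!/bin/bash
# Log basic job information using Slurm's environment variables.
# The ":-unknown"/":-0" defaults only take effect outside a Slurm job.
echo "Job ${SLURM_JOB_ID:-unknown} (account ${SLURM_JOB_ACCOUNT:-unknown})"
echo "Submitted from ${SLURM_SUBMIT_DIR:-unknown} on ${SLURM_SUBMIT_HOST:-unknown}"
echo "Running on ${SLURM_JOB_NUM_NODES:-0} node(s): ${SLURM_JOB_NODELIST:-none}"
```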
<br />
== Passing Variables to submission scripts ==<br />
It is possible to pass values through environment variables into your SLURM submission scripts.<br />
To do so with variables already defined in your shell, just add the following directive to the submission script,<br />
<br />
#SBATCH --export=ALL<br />
<br />
and you will have access to any predefined environment variable.<br />
<br />
A better way is to specify explicitly which variables you want to pass into the submission script,<br />
<br />
sbatch --export=i=15,j='test' jobscript.sbatch<br />
<br />
You can even set the job name and output files using environment variables, e.g.<br />
<br />
i="simulation"<br />
j=14<br />
sbatch --job-name=$i.$j.run --output=$i.$j.out --export=i=$i,j=$j jobscript.sbatch<br />
<br />
(The latter only works on the command line; you cannot use environment variables in <tt>#SBATCH</tt> lines in the job script.)<br />
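Inside the job script, the exported variables are then available as ordinary shell variables. A minimal sketch of such a script (the file name <tt>jobscript.sbatch</tt> and the output naming scheme are hypothetical):<br />

```bash
#!/bin/bash
#SBATCH --time=1:00:00
# Hypothetical jobscript.sbatch: i and j are expected to arrive via
# sbatch --export=i=...,j=...; the defaults below are for illustration.
i="${i:-0}"
j="${j:-unset}"
outfile="result.${i}.${j}.txt"
echo "running case i=${i}, label ${j}; output in ${outfile}"
```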
<br />
'''Command line arguments:'''<br />
<br />
Command line arguments can also be used, in the same way as command line arguments for shell scripts. All command line arguments given to sbatch that follow the job script name will be passed to the job script. In fact, SLURM will not look at any of these arguments, so you must place all sbatch arguments before the script name, e.g.:<br />
<br />
sbatch -p debug jobscript.sbatch FirstArgument SecondArgument ...<br />
<br />
In this example, <tt>-p debug</tt> is interpreted by SLURM, while in your submission script you can access <tt>FirstArgument</tt>, <tt>SecondArgument</tt>, etc., by referring to <code>$1, $2, ...</code>.<br />
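A job script that consumes such positional arguments might look like this (a sketch; the argument meanings are invented for illustration):<br />

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=1:00:00
# Positional arguments come from the sbatch line after the script name,
# e.g.  sbatch jobscript.sbatch input.dat 1000
input_file="${1:-input.dat}"
n_steps="${2:-100}"
echo "processing ${input_file} for ${n_steps} steps"
```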
<br />
== Job arrays ==<br />
<br />
Sometimes you need to run the same job script many times, but just tweaking one value each time. One way of accomplishing this is using job arrays. Job arrays are invoked using the "-a" flag with sbatch:<br />
sbatch -a 1-100 myjobscript.sh<br />
This will submit 100 instances of myjobscript.sh. Within the job script you can distinguish which of those instances is running using the environment variable SLURM_ARRAY_TASK_ID.<br />
<br />
Note that Niagara [[#Limits | currently]] has a limit of 1000 submitted jobs for users within groups with allocations, and 200 submitted jobs without an allocation.<br />
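A sketch of a job script meant to be submitted with <tt>sbatch -a 1-100</tt> (the per-instance directory naming is hypothetical):<br />

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=1:00:00
# Every array element runs this same script; SLURM_ARRAY_TASK_ID tells
# each instance which one it is (1..100 for "sbatch -a 1-100").
task="${SLURM_ARRAY_TASK_ID:-1}"
workdir="case_${task}"
echo "array task ${task} working in ${workdir}"
```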
<br />
== Job dependencies ==<br />
<br />
You can make one job dependent on the successful completion of another job using the following command:<br />
sbatch --dependency=afterok:JOBID myjobscript.sh<br />
This will make the current job submission not start until the parent job, with jobid JOBID, successfully completes. There are many job dependency options available. Visit the [https://slurm.schedmd.com/sbatch.html Slurm sbatch page ] for the full list. <br />
<br />
If the parent job fails (that is, ends with a non-zero exit code) the dependent job can never be scheduled and will be automatically cancelled.<br />
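When chaining jobs from the command line, the parent's job id can be captured and passed to the dependency flag. A sketch (the id is hard-coded here so the snippet stands alone; in practice it would come from <tt>sbatch --parsable parent_job.sh</tt>, which prints just the job id):<br />

```bash
#!/bin/bash
# In a real workflow: PARENT_ID=$(sbatch --parsable parent_job.sh)
PARENT_ID=123456
DEP="--dependency=afterok:${PARENT_ID}"
# ...then the child would be submitted with: sbatch $DEP child_job.sh
echo "would run: sbatch ${DEP} child_job.sh"
```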
<br />
== Email Notification ==<br />
Email notification works, but you need to add the email address and the type of notification you want to receive to your submission script, e.g.<br />
<br />
#SBATCH --mail-user=YOUR.email.ADDRESS<br />
#SBATCH --mail-type=ALL<br />
<br />
The sbatch man page (type <tt>man sbatch</tt> on Niagara) explains all possible mail-types.<br />
<br />
= Monitoring jobs =<br />
<br />
There are many options available for monitoring your jobs, the most basic of which is the squeue command:<br />
<br />
nia-login07:~$ squeue -u USERNAME<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
292047 compute myjob4 username PD 0:00 4 (Priority)<br />
292048 compute myjob3 username PD 0:00 4 (Priority)<br />
266829 compute myjob2 username R 18:56:17 2 nia[1397-1398]<br />
266828 compute myjob1 username R 18:56:46 1 nia1298<br />
<br />
Here you can see that we have two running jobs ('R') and two pending jobs ('PD'). The nodes being used are listed.<br />
<br />
== Job status ==<br />
<br />
To get an estimate of when a job will start, use the command<br />
squeue --start -j JOBID<br />
Note that this is only an estimate, which tends not to be very accurate.<br />
<br />
Information about a specific job can be found using the command<br />
squeue -j JOBID<br />
or alternatively<br />
scontrol show job JOBID<br />
which is more verbose.<br />
<br />
== SSHing to a node ==<br />
<br />
Once your job has started, the node belongs to you. As such you may, from a login node, SSH into the node to check the performance of your job. The first step is to find out which nodes are being used (see above). Once you have your list of nodes, you can SSH into them directly. Once there, you can run the 'top' or 'free' commands to check both CPU and memory usage.<br />
<br />
== jobperf ==<br />
<br />
The jobperf script will give you feedback on the performance of your currently-running job:<br />
nia-login07:~$ jobperf 123456<br />
----------------------------------------------------------------------------------------------------<br />
RUNNING IDLE USER MEMORY(MB) PROCESS NAMES<br />
HOSTNAME # %CPU %MEM DISK SLEEP NAME RAMDISK USED AVAIL (excl:bash,sh,ssh,sshd)<br />
----------------------------------------------------------------------------------------------------<br />
nia1013 71 6999% 0.5% 0 22 ejspence 0 15060 178017 14*gmx_mpi mpiexec slurm_script<br />
nia1014 79 7677% 0.1% 0 18 ejspence 0 14803 178274 13*gmx_mpi<br />
nia1295 79 7517% 0.4% 0 18 ejspence 0 15199 177878 13*gmx_mpi<br />
----------------------------------------------------------------------------------------------------<br />
<br />
Here you can see both the CPU and memory usage of the job, for all nodes being used.<br />
<br />
== Other commands ==<br />
<br />
Some other commands that can be useful for dealing with your jobs:<br />
* <code>scancel -i JOBID</code> cancels a specific job.<br />
* <code>sacct</code> gives information about your recent jobs.<br />
* <code>sinfo -p compute</code> gives a list of available nodes.<br />
* <code>qsum</code> gives a summary of the queue by user.<br />
<br />
= Example submission scripts =<br />
<br />
Here we present some examples of how to create submission scripts for running parallel jobs. Serial job examples can be found on the [[Running_Serial_Jobs_on_Niagara | serial jobs page]].<br />
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=8<br />
#SBATCH --ntasks=320<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li><p>First line indicates that this is a bash script.</p></li><br />
<li><p>Lines starting with <code>#SBATCH</code> go to SLURM.</p></li><br />
<li><p>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>).</p></li><br />
<li><p>In this case, SLURM looks for 8 nodes with 40 cores on which to run 320 tasks, for 1 hour.</p></li><br />
<li><p>Note that the mpirun flag "--ppn" (processors per node) is ignored; Slurm takes care of this detail.</p></li><br />
<li><p>Once it finds such nodes, it runs the script:</p><br />
<ul><br />
<li>Changes to the submission directory;</li><br />
<li>Loads modules;</li><br />
<li>Runs the <code>mpi_example</code> application.</li><br />
</ul><br />
<li>To use hyperthreading, just change --ntasks=320 to --ntasks=640, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* The first line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>openmp_job</code>).<br />
* In this case, SLURM looks for one node with 40 cores on which to run one task, for 1 hour.<br />
* Once it finds such a node, it runs the script:<br />
** Changes to the submission directory;<br />
** Loads modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Slurm&diff=1942Slurm2019-02-20T15:37:11Z<p>Bmundim: /* Limits */</p>
<hr />
<div>The queueing system used at SciNet is based around the [https://slurm.schedmd.com Slurm Workload Manager]. This "scheduler", Slurm, determines which jobs will be run on which compute nodes, and when. This page outlines how to submit jobs, how to interact with the scheduler, and some of the most common Slurm commands.<br />
<br />
Some common questions about the queuing system can be found on the [[FAQ]] as well.<br />
<br />
= Submitting jobs =<br />
<br />
You submit jobs from a Niagara login node. This is done by passing a script to the sbatch command:<br />
<br />
nia-login07:~$ sbatch jobscript.sh<br />
<br />
This puts the job, described by the job script, into the queue. The scheduler will run the job on the compute nodes in due course. A typical submission script is as follows.<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
<br />
Some notes about this example:<br />
* The first line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>).<br />
* In this case, SLURM looks for 2 nodes with 40 cores on which to run 80 tasks, for 1 hour.<br />
* Note that the mpirun flag "--ppn" (processors per node) is ignored. Slurm takes care of this detail.<br />
* Once the scheduler finds a spot to run the job, it runs the script:<br />
** It changes to the submission directory;<br />
** Loads modules;<br />
** Runs the <code>mpi_example</code> application.<br />
* To use hyperthreading, just change --ntasks-per-node=40 to --ntasks-per-node=80, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).<br />
<br />
To create a job script appropriate for your work, you must modify the commands above to instruct Slurm to run the commands you need run.<br />
<br />
== Things to remember ==<br />
<br />
There are some things to always bear in mind when crafting your submission script:<br />
* Scheduling is by node, so in multiples of 40 cores. You are expected to use all 40 cores! If you are running serial jobs, and need assistance bundling your work into multiples of 40, please see the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page.<br />
* Jobs must write to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access. Download data you need before submitting your job.<br />
* Jobs will run under your group's RRG allocation. If your group does not have an allocation, your job will run under your group's RAS allocation (previously called 'default' allocation). Note that groups with an allocation cannot run under a default allocation.<br />
* The maximum [[Wallclock_time | walltime]] for all users is 24 hours. The minimum and default walltime is 15 minutes.<br />
<br />
= Scheduling details =<br />
<br />
We now present the details of how to write a job script, and some extra commands which you might find useful.<br />
<br />
== SLURM nomenclature: jobs, nodes, tasks, cpus, cores, threads ==<br />
<br />
SLURM has a somewhat different way of referring to things like MPI processes and thread tasks, as compared to our previous scheduler, MOAB. The SLURM nomenclature is reflected in the names of scheduler options (i.e., resource requests). SLURM strictly enforces those requests, so it is important to get this right.<br />
<br />
{| class="wikitable"<br />
!term <br />
!meaning <br />
!SLURM term<br />
!related scheduler options <br />
|-<br />
|job<br />
|scheduled piece of work for which specific resources were requested.<br />
|job<br />
|<tt>sbatch, salloc</tt><br />
|-<br />
|node<br />
|basic computing component with several cores (40 for Niagara) that share memory <br />
|node<br />
|<tt>--nodes -N</tt><br />
|-<br />
|mpi process<br />
|one of a group of running programs using Message Passing Interface for parallel computing<br />
|task<br />
|<tt>--ntasks -n --ntasks-per-node</tt><br />
|-<br />
|core ''or'' physical cpu<br />
|A fully functional independent physical execution unit.<br />
| - <br />
| -<br />
|-<br />
|logical cpu<br />
|An execution unit that the operating system can assign work to. Operating systems can be configured to overload physical cores with multiple logical cpus using hyperthreading.<br />
|cpu<br />
|<tt>--cpus-per-task</tt><br />
|-<br />
|thread<br />
|one of possibly multiple simultaneous execution paths within a program, which can share memory.<br />
| -<br />
| <tt>--cpus-per-task</tt> '''and''' <tt>OMP_NUM_THREADS</tt><br />
|-<br />
|hyperthread<br />
|a thread run in a collection of threads that is larger than the number of physical cores.<br />
| -<br />
| -<br />
|}<br />
<br />
== Scheduling by Node ==<br />
<br />
* On many systems that use SLURM, the scheduler will deduce from the job script specifications (the number of tasks and the number of cpus-per-node) what resources should be allocated. On Niagara, this is a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours.<br />
** No other users are running anything on them.<br />
** You can ssh into them, while your job is running, to see how things are going.<br />
* Whatever you request of the scheduler, your request will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes. Each node has about 202GB of RAM available.<br />
* You should try to use all the cores on the nodes allocated to your job. Since there are 40 cores per node, your job should use N x 40 cores. If this is not the case, we will contact you to help you optimize your workflow. Again, users who have serial jobs should consult the [[Running Serial Jobs on Niagara | serial jobs]] page.<br />
<br />
== Hyperthreading: Logical CPUs vs. cores ==<br />
<br />
Hyperthreading, a technology that leverages more of the physical hardware by pretending there are twice as many logical cores as real cores, is enabled on Niagara.<br />
The operating system and scheduler see 80 logical CPUs.<br />
<br />
Using 80 logical CPUs versus 40 real cores typically gives about a 5-10% speedup, depending on your application (your mileage may vary).<br />
<br />
Because Niagara is scheduled by node, hyperthreading is actually fairly easy to use:<br />
* Ask for a certain number of nodes, N, for your job.<br />
* You know that you get 40 x N cores, so you will use (at least) a total of 40 x N MPI processes or threads (mpirun, srun, and the OS will automatically spread these over the real cores).<br />
* But you should also test if running 80 x N MPI processes or threads gives you any speedup.<br />
* Regardless, your usage will be counted as 40 x N x (walltime in years).<br />
<br />
Many applications which are communication-heavy can benefit from the use of hyperthreading.<br />
<br />
= Submission script details =<br />
<br />
This section outlines some details of how to interact with the scheduler, and how it implements Niagara's scheduling policies.<br />
<br />
== Queues ==<br />
<br />
There are 3 queues available on SciNet systems. These queues have different limits; see the [[#Limits | Limits]] section for further details.<br />
<br />
=== Compute ===<br />
<br />
The compute queue is the default queue. Most jobs will run in this queue. If no flags are specified in the submission script this is the queue where your job will land.<br />
<br />
=== Debug ===<br />
<br />
The Debug queue is a high-priority queue, used for short-term testing of your code. Do NOT use the debug queue for production work. You can use the debug queue in one of two ways. To submit a standard job script to the debug queue, add the line<br />
#SBATCH -p debug<br />
to your submission script. This will put the job into the debug queue, and it should run in short order.<br />
<br />
To request an interactive debug session, where you retain control over the command line prompt, at a login node type the command<br />
nia-login07:~$ salloc -p debug --nodes 1 --time=1:00:00<br />
This will request 1 node for 1 hour. You can similarly request a debug session using the 'debugjob' command:<br />
nia-login07:~$ debugjob N<br />
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; if N=4 (the maximum), it gives you 30 minutes.<br />
<br />
=== Archive ===<br />
<br />
The archivelong and archiveshort queues are only used by the [[HPSS]] system. See that page for details on how to use these queues.<br />
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs with an allocation||compute || 50 || 1000 || 1 node (40 cores) || 1000 nodes (40000 cores)|| 15 minutes || 24 hours<br />
|-<br />
|Compute jobs without allocation ("default")||compute || 50 || 200 || 1 node (40 cores) || 20 nodes (800 cores)|| 15 minutes || 24 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40 cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (max 5 total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Within these limits, jobs will still have to wait in the queue. The waiting time depends on many factors such as the allocation amount, how much allocation was used in the recent past, the number of nodes and the walltime, and how many other jobs are waiting in the queue.<br />
<br />
== Slurm Accounts ==<br />
<br />
To be able to prioritise jobs based on groups and allocations, the Slurm scheduler uses the concept of ''accounts''. Each group that has a Resources for Research Groups (RRG) or Research Platforms and Portals (RPP) allocation (awarded through an annual competition by Compute Canada) has an account that starts with <tt>rrg-</tt> or <tt>rpp-</tt>. Slurm assigns a 'fairshare' priority to these accounts based on the size of the award in core-years. Groups without an RRG or RPP can use Niagara using a so-called Rapid Access Service (RAS), and have an account that starts with <tt>def-</tt>.<br />
<br />
On Niagara, most users will only ever use one account, and those users do not need to specify the account to Slurm. However, users that are part of collaborations may be able to use multiple accounts, i.e., that of their sponsor and that of their collaborator, but this means that they need to select the right account when running jobs. <br />
<br />
To select the account, just add <br />
<br />
#SBATCH -A [account]<br />
<br />
to the job scripts, or pass <tt>-A [account]</tt> to <tt>salloc</tt> or <tt>debugjob</tt>. <br />
<br />
To see which accounts you have access to, or what their names are, use the command<br />
<br />
sshare -U<br />
<br />
It has been noted that, in some cases, using the '-A' flag does not result in the appropriate account being used. To get around this, specify the account when sbatch is invoked:<br />
sbatch -A account myjobscript.sh<br />
<br />
== Slurm environment variables ==<br />
<br />
There are many environment variables built into Slurm. Here are some that you may find useful:<br />
* SLURM_SUBMIT_DIR: directory from which the job was submitted.<br />
* SLURM_SUBMIT_HOST: host from which the job was submitted.<br />
* SLURM_JOB_ID: the job's id.<br />
* SLURM_JOB_NUM_NODES: number of nodes in the job.<br />
* SLURM_JOB_NODELIST: list of nodes assigned to the job.<br />
* SLURM_JOB_ACCOUNT: account associated with the job.<br />
<br />
Any of these environment variables can be accessed from within your job script.<br />
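As a minimal sketch (not part of the official examples), a job script could log its context at startup using these variables; outside a running job they are simply empty:

```shell
#!/bin/bash
# Hypothetical helper for the top of a job script: report where and how
# the job is running. Slurm sets all SLURM_* variables automatically.
job_banner() {
    echo "Job ${SLURM_JOB_ID} (account ${SLURM_JOB_ACCOUNT}) uses ${SLURM_JOB_NUM_NODES} node(s): ${SLURM_JOB_NODELIST}"
    echo "Submitted from ${SLURM_SUBMIT_HOST}:${SLURM_SUBMIT_DIR}"
}
job_banner
```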
<br />
== Passing Variables to submission scripts ==<br />
It is possible to pass values through environment variables into your SLURM submission scripts.<br />
To do so with variables already defined in your shell, just add the following directive to the submission script,<br />
<br />
#SBATCH --export=ALL<br />
<br />
and you will have access to any predefined environment variable.<br />
<br />
A better way is to specify explicitly which variables you want to pass into the submission script,<br />
<br />
sbatch --export=i=15,j='test' jobscript.sbatch<br />
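Inside the job script, the exported variables are then available as ordinary shell variables. A hypothetical sketch of such a <tt>jobscript.sbatch</tt>:

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=0:15:00
# Hypothetical jobscript.sbatch: i and j are not set anywhere in this
# file; they arrive via "sbatch --export=i=15,j=test" and behave like
# ordinary environment variables inside the job.
echo "Case ${j}: running with i=${i}"
```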
<br />
You can even set the job name and output files using environment variables, e.g.<br />
<br />
i="simulation"<br />
j=14<br />
sbatch --job-name=$i.$j.run --output=$i.$j.out --export=i=$i,j=$j jobscript.sbatch<br />
<br />
(The latter only works on the command line; you cannot use environment variables in <tt>#SBATCH</tt> lines in the job script.)<br />
<br />
'''Command line arguments:'''<br />
<br />
Command line arguments can also be used, in the same way as command line arguments for shell scripts. All command line arguments given to sbatch that follow the job script name will be passed to the job script. SLURM will not look at any of these arguments, so you must place all sbatch arguments before the script name, e.g.:<br />
<br />
sbatch -p debug jobscript.sbatch FirstArgument SecondArgument ...<br />
<br />
In this example, <tt>-p debug</tt> is interpreted by SLURM, while in your submission script you can access <tt>FirstArgument</tt>, <tt>SecondArgument</tt>, etc., by referring to <code>$1, $2, ...</code>.<br />
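A sketch of a job script that picks up two such arguments (the script and application names are hypothetical):

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=1:00:00
# Hypothetical job script expecting an input file and a step count,
# submitted as, e.g.:  sbatch jobscript.sbatch data.in 1000
run_case() {
    local input="$1" nsteps="$2"
    echo "Processing ${input} for ${nsteps} steps"
    # ./my_program "${input}" "${nsteps}"   # hypothetical application
}
run_case "$@"
```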
<br />
== Job arrays ==<br />
<br />
Sometimes you need to run the same job script many times, but just tweaking one value each time. One way of accomplishing this is using job arrays. Job arrays are invoked using the "-a" flag with sbatch:<br />
sbatch -a 1-100 myjobscript.sh<br />
This will submit 100 instances of myjobscript.sh. Within the job script you can distinguish which of those instances is running using the environment variable SLURM_ARRAY_TASK_ID.<br />
<br />
Note that Niagara [[#Limits | currently]] has a limit of 1000 submitted jobs for users within groups with allocations, and 200 submitted jobs without an allocation.<br />
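As a sketch, the job script can use the task ID to select one parameter from a list (the parameter list and application are hypothetical):

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=1:00:00
# Hypothetical array job, submitted as:  sbatch -a 1-4 myjobscript.sh
# Each array instance picks a different temperature via SLURM_ARRAY_TASK_ID.
pick_temperature() {
    local temps=(0.5 1.0 1.5 2.0)
    # Task IDs here run 1-4; bash arrays are 0-indexed.
    # Outside an array job the ID is unset, so default to 1 for illustration.
    echo "${temps[$(( ${SLURM_ARRAY_TASK_ID:-1} - 1 ))]}"
}
echo "Array task ${SLURM_ARRAY_TASK_ID:-1}: temperature $(pick_temperature)"
# ./my_simulation --temperature "$(pick_temperature)"   # hypothetical
```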
<br />
== Job dependencies ==<br />
<br />
You can make one job dependent on the successful completion of another job using the following command:<br />
sbatch --dependency=afterok:JOBID myjobscript.sh<br />
This will make the current job submission not start until the parent job, with jobid JOBID, successfully completes. There are many job dependency options available. Visit the [https://slurm.schedmd.com/sbatch.html Slurm sbatch page ] for the full list. <br />
<br />
If the parent job fails (that is, ends with a non-zero exit code) the dependent job can never be scheduled and will be automatically cancelled.<br />
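For example, a two-step pipeline can be chained from the command line; sbatch's <tt>--parsable</tt> flag makes it print just the job ID so it can be captured (the script names below are hypothetical):

```shell
# Submit a preprocessing job, then an analysis job that starts only if
# the first one completes successfully.
first=$(sbatch --parsable preprocess.sh)
sbatch --dependency=afterok:"${first}" analyze.sh
```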
<br />
== Email Notification ==<br />
Email notification works, but you need to add the email address and the type of notification you want to receive to your submission script, e.g.<br />
<br />
#SBATCH --mail-user=YOUR.email.ADDRESS<br />
#SBATCH --mail-type=ALL<br />
<br />
The sbatch man page (type <tt>man sbatch</tt> on Niagara) explains all possible mail-types.<br />
<br />
= Monitoring jobs =<br />
<br />
There are many options available for monitoring your jobs. The most basic is the squeue command:<br />
<br />
nia-login07:~$ squeue -u USERNAME<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
292047 compute myjob4 username PD 0:00 4 (Priority)<br />
292048 compute myjob3 username PD 0:00 4 (Priority)<br />
266829 compute myjob2 username R 18:56:17 2 nia[1397-1398]<br />
266828 compute myjob1 username R 18:56:46 1 nia1298<br />
<br />
Here you can see that we have two running jobs ('R') and two pending jobs ('PD'). The nodes being used are listed.<br />
<br />
== Job status ==<br />
<br />
To get an estimate of when a job will start, use the command<br />
squeue --start -j JOBID<br />
Note that this is only an estimate, and it tends not to be very accurate.<br />
<br />
Information about a specific job can be found using the command<br />
squeue -j JOBID<br />
or alternatively<br />
scontrol show job JOBID<br />
which is more verbose.<br />
<br />
== SSHing to a node ==<br />
<br />
Once your job has started, the node belongs to you. As such you may, from a login node, SSH into the node to check the performance of your job. The first step is to find out which nodes are being used (see above). Once you have your list of nodes, you can SSH into them directly. Once there, you can run the 'top' or 'free' commands to check both CPU and memory usage.<br />
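For example, to locate and reach the nodes of a running job (the job ID below is hypothetical):

```shell
squeue -h -j 266829 -o "%N"                # compact nodelist, e.g. nia[1397-1398]
scontrol show hostnames 'nia[1397-1398]'   # expands it to one hostname per line
ssh nia1397                                # then run 'top' or 'free' on the node
```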
<br />
== jobperf ==<br />
<br />
The jobperf script will give you feedback on the performance of your currently-running job:<br />
nia-login07:~$ jobperf 123456<br />
----------------------------------------------------------------------------------------------------<br />
RUNNING IDLE USER MEMORY(MB) PROCESS NAMES<br />
HOSTNAME # %CPU %MEM DISK SLEEP NAME RAMDISK USED AVAIL (excl:bash,sh,ssh,sshd)<br />
----------------------------------------------------------------------------------------------------<br />
nia1013 71 6999% 0.5% 0 22 ejspence 0 15060 178017 14*gmx_mpi mpiexec slurm_script<br />
nia1014 79 7677% 0.1% 0 18 ejspence 0 14803 178274 13*gmx_mpi<br />
nia1295 79 7517% 0.4% 0 18 ejspence 0 15199 177878 13*gmx_mpi<br />
----------------------------------------------------------------------------------------------------<br />
<br />
Here you can see both the CPU and memory usage of the job, for all nodes being used.<br />
<br />
== Other commands ==<br />
<br />
Some other commands that can be useful for dealing with your jobs:<br />
* <code>scancel -i JOBID</code> cancels a specific job.<br />
* <code>sacct</code> gives information about your recent jobs.<br />
* <code>sinfo -p compute</code> gives a list of available nodes.<br />
* <code>qsum</code> gives a summary of the queue by user.<br />
<br />
= Example submission scripts =<br />
<br />
Here we present some examples of how to create submission scripts for running parallel jobs. Serial job examples can be found on the [[Running_Serial_Jobs_on_Niagara | serial jobs page]].<br />
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=8<br />
#SBATCH --ntasks=320<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li><p>First line indicates that this is a bash script.</p></li><br />
<li><p>Lines starting with <code>#SBATCH</code> go to SLURM.</p></li><br />
<li><p>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>)</p></li><br />
<li><p>In this case, SLURM looks for 8 nodes with 40 cores on which to run 320 tasks, for 1 hour.</p></li><br />
<li><p>Note that the mpirun flag "--ppn" (processors per node) is ignored.</p></li><br />
<li><p>Once it finds such nodes, it runs the script:</p><br />
<ul><br />
<li>Change to the submission directory;</li><br />
<li>Loads modules;</li><br />
<li>Runs the <code>mpi_example</code> application.</li><br />
</ul><br />
<li>To use hyperthreading, just change --ntasks=320 to --ntasks=640, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
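For instance, the hyperthreaded OpenMPI variant of the script above would differ only in these lines (a sketch of the changes, not a complete script):

```shell
#SBATCH --nodes=8
#SBATCH --ntasks=640           # 80 logical cpus per node with hyperthreading

mpirun --bind-to none ./mpi_example
```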
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* First line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>openmp_job</code>) .<br />
* In this case, SLURM looks for one node on which to run one task with 40 cores, for 1 hour.<br />
* Once it finds such a node, it runs the script:<br />
** Change to the submission directory;<br />
** Loads modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Slurm&diff=1941Slurm2019-02-20T15:09:57Z<p>Bmundim: /* Things to remember */</p>
<hr />
<div>The queueing system used at SciNet is based around the [https://slurm.schedmd.com Slurm Workload Manager]. This "scheduler", Slurm, determines which jobs will be run on which compute nodes, and when. This page outlines how to submit jobs, how to interact with the scheduler, and some of the most common Slurm commands.<br />
<br />
Some common questions about the queuing system can be found on the [[FAQ]] as well.<br />
<br />
= Submitting jobs =<br />
<br />
You submit jobs from a Niagara login node. This is done by passing a script to the sbatch command:<br />
<br />
nia-login07:~$ sbatch jobscript.sh<br />
<br />
This puts the job, described by the job script, into the queue. The scheduler will run the job on the compute nodes in due course. A typical submission script is as follows.<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
<br />
Some notes about this example:<br />
* The first line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>).<br />
* In this case, SLURM looks for 2 nodes with 40 cores on which to run 80 tasks, for 1 hour.<br />
* Note that the mpirun flag "--ppn" (processors per node) is ignored. Slurm takes care of this detail.<br />
* Once the scheduler finds a spot to run the job, it runs the script:<br />
** It changes to the submission directory;<br />
** Loads modules;<br />
** Runs the <code>mpi_example</code> application.<br />
* To use hyperthreading, just change --ntasks-per-node=40 to --ntasks-per-node=80, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).<br />
<br />
To create a job script appropriate for your work, you must modify the commands above to instruct Slurm to run the commands you need run.<br />
<br />
== Things to remember ==<br />
<br />
There are some things to always bear in mind when crafting your submission script:<br />
* Scheduling is by node, so in multiples of 40 cores. You are expected to use all 40 cores! If you are running serial jobs, and need assistance bundling your work into multiples of 40, please see the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page.<br />
* Jobs must write to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access. Download data you need before submitting your job.<br />
* Jobs will run under your group's RRG allocation. If your group does not have an allocation, your job will run under your group's RAS allocation (previously called `default' allocation). Note that groups with an allocation cannot run under a default allocation.<br />
* The maximum [[Wallclock_time | walltime]] for all users is 24 hours. The minimum and default walltime is 15 minutes.<br />
<br />
= Scheduling details =<br />
<br />
We now present the details of how to write a job script, and some extra commands which you might find useful.<br />
<br />
== SLURM nomenclature: jobs, nodes, tasks, cpus, cores, threads ==<br />
<br />
SLURM has a somewhat different way of referring to things like MPI processes and thread tasks, as compared to our previous scheduler, MOAB. The SLURM nomenclature is reflected in the names of scheduler options (i.e., resource requests). SLURM strictly enforces those requests, so it is important to get this right.<br />
<br />
{| class="wikitable"<br />
!term <br />
!meaning <br />
!SLURM term<br />
!related scheduler options <br />
|-<br />
|job<br />
|scheduled piece of work for which specific resources were requested.<br />
|job<br />
|<tt>sbatch, salloc</tt><br />
|-<br />
|node<br />
|basic computing component with several cores (40 for Niagara) that share memory <br />
|node<br />
|<tt>--nodes -N</tt><br />
|-<br />
|mpi process<br />
|one of a group of running programs using Message Passing Interface for parallel computing<br />
|task<br />
|<tt>--ntasks -n --ntasks-per-node</tt><br />
|-<br />
|core ''or'' physical cpu<br />
|A fully functional independent physical execution unit.<br />
| - <br />
| -<br />
|-<br />
|logical cpu<br />
|An execution unit that the operating system can assign work to. Operating systems can be configured to overload physical cores with multiple logical cpus using hyperthreading.<br />
|cpu<br />
|<tt>--cpus-per-task</tt><br />
|-<br />
|thread<br />
|one of possibly multiple simultaneous execution paths within a program, which can share memory.<br />
| -<br />
| <tt>--cpus-per-task</tt> '''and''' <tt>OMP_NUM_THREADS</tt><br />
|-<br />
|hyperthread<br />
|a thread run in a collection of threads that is larger than the number of physical cores.<br />
| -<br />
| -<br />
|}<br />
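The steps above can be combined in a hybrid MPI/OpenMP request; for example, 4 MPI processes (tasks) per node, each running 10 threads, fills the 40 cores of a Niagara node (a sketch; the application name is hypothetical):

```shell
#!/bin/bash
#SBATCH --nodes=2              # 2 nodes of 40 cores each
#SBATCH --ntasks-per-node=4    # 4 MPI processes (tasks) per node
#SBATCH --cpus-per-task=10     # 10 logical cpus per task: 4 x 10 = 40 per node
#SBATCH --time=1:00:00

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # one thread per allocated cpu

mpirun ./hybrid_example        # hypothetical MPI+OpenMP application
```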
<br />
== Scheduling by Node ==<br />
<br />
* On many systems that use SLURM, the scheduler will deduce from the job script specifications (the number of tasks and the number of cpus-per-node) what resources should be allocated. On Niagara, this is a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours.<br />
** No other users are running anything on them.<br />
** You can ssh into them, while your job is running, to see how things are going.<br />
* Whatever you request of the scheduler, your request will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes. Each node has about 202GB of RAM available.<br />
* You should try to use all the cores on the nodes allocated to your job. Since there are 40 cores per node, your job should use N x 40 cores. If this is not the case, we will contact you to help you optimize your workflow. Again, users who have serial jobs should consult the [[Running Serial Jobs on Niagara | serial jobs]] page.<br />
<br />
== Hyperthreading: Logical CPUs vs. cores ==<br />
<br />
Hyperthreading, a technology that leverages more of the physical hardware by pretending there are twice as many logical cores as real cores, is enabled on Niagara.<br />
The operating system and scheduler see 80 logical CPUs.<br />
<br />
Using 80 logical CPUs versus 40 real cores typically gives about a 5-10% speedup, depending on your application (your mileage may vary).<br />
<br />
Because Niagara is scheduled by node, hyperthreading is actually fairly easy to use:<br />
* Ask for a certain number of nodes, N, for your job.<br />
* You know that you get 40 x N cores, so you will use (at least) a total of 40 x N MPI processes or threads (mpirun, srun, and the OS will automatically spread these over the real cores).<br />
* But you should also test if running 80 x N MPI processes or threads gives you any speedup.<br />
* Regardless, your usage will be counted as 40 x N x (walltime in years).<br />
<br />
Many applications which are communication-heavy can benefit from the use of hyperthreading.<br />
<br />
= Submission script details =<br />
<br />
This section outlines some details of how to interact with the scheduler, and how it implements Niagara's scheduling policies.<br />
<br />
== Queues ==<br />
<br />
There are 3 queues available on SciNet systems. These queues have different limits; see the [[#Limits | Limits]] section for further details.<br />
<br />
=== Compute ===<br />
<br />
The compute queue is the default queue. Most jobs will run in this queue. If no flags are specified in the submission script this is the queue where your job will land.<br />
<br />
=== Debug ===<br />
<br />
The Debug queue is a high-priority queue, used for short-term testing of your code. Do NOT use the debug queue for production work. You can use the debug queue one of two ways. To submit a standard job script to the debug queue, add the line<br />
#SBATCH -p debug<br />
to your submission script. This will put the job into the debug queue, and it should run in short order.<br />
<br />
To request an interactive debug session, where you retain control over the command line prompt, at a login node type the command<br />
nia-login07:~$ salloc -p debug --nodes 1 --time=1:00:00<br />
This will request 1 node for 1 hour. You can similarly request a debug session using the 'debugjob' command:<br />
nia-login07:~$ debugjob N<br />
where N is the number of nodes, If N=1, this gives an interactive session one 1 hour, when N=4 (the maximum), it gives you 30 minutes.<br />
<br />
=== Archive ===<br />
<br />
The archivelong and archiveshort queues are only used by the [[HPSS]] system. See that page for details on how to use these queues.<br />
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the jobs runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs with an allocation||compute || 50 || 1000 || 1 node (40 cores) || 1000 nodes (40000 cores)|| 15 minutes || 24 hours<br />
|-<br />
|Compute jobs without allocation ("default")||compute || 50 || 200 || 1 node (40 cores) || 20 nodes (800 cores)|| 15 minutes || 12 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40 cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (max 5 total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Within these limits, jobs will still have to wait in the queue. The waiting time depends on many factors such as the allocation amount, how much allocation was used in the recent past, the number of nodes and the walltime, and how many other jobs are waiting in the queue.<br />
<br />
== Slurm Accounts ==<br />
<br />
To be able to prioritise jobs based on groups and allocations, the Slurm scheduler uses the concept of ''accounts''. Each group that has a Resource for Research Groups (RRG) or Research Platforms and Portals (RPP) allocation (awarded through an annual competition by Compute Canada) has an account that starts with <tt>rrg-</tt> or <tt>rpp-</tt>. Slurm assigns a 'fairshare' priority to these accounts based on the size of the award in core-years. Groups without an RRG or RPP can use Niagara using a so-called Rapid Access Service (RAS), and have an account that starts with <tt>def-</tt>.<br />
<br />
On Niagara, most users will only ever use one account, and those users do not need to specify the account to Slurm. However, users that are part of collaborations may be able to use multiple accounts, i.e., that of their sponsor and that of their collaborator, but this mean that they need to select the right account when running jobs. <br />
<br />
To select the account, just add <br />
<br />
#SBATCH -A [account]<br />
<br />
to the job scripts, or use the <tt>-A [account]</tt> to <tt>salloc</tt> or <tt>debugjob</tt>. <br />
<br />
To see which accounts you have access to, or what their names are, use the command<br />
<br />
sshare -U<br />
<br />
It has been noted that, in some cases, using the '-A' flag does not result in the appropriate account being used. To get around this, specify the account when sbatch is invoked:<br />
sbatch -A account myjobscript.sh<br />
<br />
== Slurm environment variables ==<br />
<br />
There are many environment variables built into Slurm. These are some which you may find useful:<br />
* SLURM_SUBMIT_DIR: directory from which the job was submitted.<br />
* SLURM_SUBMIT_HOST: host from which the job was submitted.<br />
* SLURM_JOB_ID: the job's id.<br />
* SLURM_JOB_NUM_NODES: number of nodes in the job.<br />
* SLURM_JOB_NODELIST: list of nodes assigned to the job.<br />
* SLURM_JOB_ACCOUNT: account associated with the job.<br />
<br />
Any of these environment variables can be accessed from within your job script.<br />
<br />
== Passing Variables to submission scripts ==<br />
It is possible to pass values through environment variables into your SLURM submission scripts.<br />
For doing so with already defined variables in your shell, just add the following directive in the submission script,<br />
<br />
#SBATCH --export=ALL<br />
<br />
and you will have access to any predefined environment variable.<br />
<br />
A better way is to specify explicitly which variables you want to pass into the submision script,<br />
<br />
sbatch --export=i=15,j='test' jobscript.sbatch<br />
<br />
You can even set the job name and output files using environment variables, eg.<br />
<br />
i="simulation"<br />
j=14<br />
sbatch --job-name=$i.$j.run --output=$i.$j.out --export=i=$i,j=$j jobscript.sbatch<br />
<br />
(The latter only works on the command line; you cannot use environment variables in <tt>#SBATCH</tt> lines in the job script.)<br />
<br />
'''Command line arguments:'''<br />
<br />
Command line arguments can also be used in the same way as command line argument for shell scripts. All command line arguments given to sbatch that follow after the job script name, will be passed to the job script. In fact, SLURM will not look at any of these arguments, so you must place all sbatch arguments before the script name, e.g.:<br />
<br />
sbatch -p debug jobscript.sbatch FirstArgument SecondArgument ...<br />
<br />
In this example, <tt>-p debug</tt> is interpreted by SLURM, while in your submission script you can access <tt>FirstArgument</tt>, <tt>SecondArgument</tt>, etc., by referring to <code>$1, $2, ...</code>.<br />
<br />
== Job arrays ==<br />
<br />
Sometimes you need to run the same job script many times, but just tweaking one value each time. One way of accomplishing this is using job arrays. Job arrays are invoked using the "-a" flag with sbatch:<br />
sbatch -a 1-100 myjobscript.sh<br />
This will submit 100 instances of myjobscript.sh. Within the job script you can distinguish which of those instances is running using the environment variable SLURM_ARRAY_TASK_ID.<br />
<br />
Note that Niagara [[#Limits | currently]] has a limit of 1000 submitted jobs for users within groups with allocations, and 200 submitted jobs without an allocation.<br />
<br />
== Job dependencies ==<br />
<br />
You can make one job dependent on the successful completion of another job using the following command:<br />
sbatch --dependency=afterok:JOBID myjobscript.sh<br />
This will make the current job submission not start until the parent job, with jobid JOBID, successfully completes. There are many job dependency options available. Visit the [https://slurm.schedmd.com/sbatch.html Slurm sbatch page ] for the full list. <br />
<br />
If the parent job fails (that is, ends with a non-zero exit code) the dependent job can never be scheduled and will be automatically cancelled.<br />
<br />
== Email Notification ==<br />
Email notification works, but you need to add the email address and type of notification you may want to receive in your submission script, eg.<br />
<br />
#SBATCH --mail-user=YOUR.email.ADDRESS<br />
#SBATCH --mail-type=ALL<br />
<br />
The sbatch man page (type <tt>man sbatch</tt> on Niagara) explains all possible mail-types.<br />
<br />
= Monitoring jobs =<br />
<br />
There are many options available for monitoring your jobs. The most basic of which is the squeue command:<br />
<br />
nia-login07:~$ squeue -u USERNAME<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
292047 compute myjob4 username PD 0:00 4 (Priority)<br />
292048 compute myjob3 username PD 0:00 4 (Priority)<br />
266829 compute myjob2 username R 18:56:17 2 nia[1397-1398]<br />
266828 compute myjob1 username R 18:56:46 1 nia1298<br />
<br />
Here you can see that we have two running jobs ('R') and two pending jobs ('PD'). The nodes being used are listed.<br />
<br />
== Job status ==<br />
<br />
To get an estimate of when a job will start, use the command<br />
squeue --start -j JOBID<br />
Note that this is only an estimate, and tend not to be very accurate.<br />
<br />
Information about a specific job can be found using the <br />
squeue -j JOBID<br />
or alternatively<br />
scontrol show job JOBID<br />
which is more verbose.<br />
<br />
== SSHing to a node ==<br />
<br />
Once your job has started, the node belongs to you. As such you may, from a login node, SSH into the node to check the performance of your job. The first step is to find out which nodes are being used (see above). Once you have your list of nodes, you can SSH into them directly. Once there, you can run the 'top' or 'free' commands to check both CPU and memory usage.<br />
<br />
== jobperf ==<br />
<br />
The jobperf script will give you feedback on the performance of your currently-running job:<br />
nia-login07:~$ jobperf 123456<br />
----------------------------------------------------------------------------------------------------<br />
RUNNING IDLE USER MEMORY(MB) PROCESS NAMES<br />
HOSTNAME # %CPU %MEM DISK SLEEP NAME RAMDISK USED AVAIL (excl:bash,sh,ssh,sshd)<br />
----------------------------------------------------------------------------------------------------<br />
nia1013 71 6999% 0.5% 0 22 ejspence 0 15060 178017 14*gmx_mpi mpiexec slurm_script<br />
nia1014 79 7677% 0.1% 0 18 ejspence 0 14803 178274 13*gmx_mpi<br />
nia1295 79 7517% 0.4% 0 18 ejspence 0 15199 177878 13*gmx_mpi<br />
----------------------------------------------------------------------------------------------------<br />
<br />
Here you can see both the CPU and memory usage of the job, for all nodes being used.<br />
<br />
== Other commands ==<br />
<br />
Some other commands had can be useful for dealing with your jobs:<br />
* <code>scancel -i JOBID</code> cancels a specific job.<br />
* <code>sacct</code> gives information about your recent jobs.<br />
* <code>sinfo -p compute</code> gives a list of available nodes.<br />
* <code>qsum</code> gives a summary of the queue by user.<br />
<br />
= Example submission scripts =<br />
<br />
Here we present some examples of how to create submission scripts for running parallel jobs. Serial job examples can be found on the [[Running_Serial_Jobs_on_Niagara | serial jobs page]].<br />
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=8<br />
#SBATCH --ntasks=320<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li><p>First line indicates that this is a bash script.</p></li><br />
<li><p>Lines starting with <code>#SBATCH</code> go to SLURM.</p></li><br />
<li><p>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>)</p></li><br />
<li><p>In this case, SLURM looks for 8 nodes with 40 cores on which to run 320 tasks, for 1 hour.</p></li><br />
<li><p>Note that the mpirun flag "--ppn" (processors per node) is ignored.</p></li><br />
<li><p>Once it finds such nodes, it runs the script, which:</p><br />
<ul><br />
<li>Changes to the submission directory;</li><br />
<li>Loads the modules;</li><br />
<li>Runs the <code>mpi_example</code> application.</li><br />
</ul></li><br />
<li>To use hyperthreading, just change --ntasks=320 to --ntasks=640, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* First line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>openmp_job</code>).<br />
* In this case, SLURM looks for one node on which to run one task using 40 cores, for 1 hour.<br />
* Once it finds such a node, it runs the script, which:<br />
** Changes to the submission directory;<br />
** Loads modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Slurm&diff=1940Slurm2019-02-20T15:09:01Z<p>Bmundim: /* Submitting jobs */</p>
<hr />
<div>The queueing system used at SciNet is based around the [https://slurm.schedmd.com Slurm Workload Manager]. This "scheduler", Slurm, determines which jobs will be run on which compute nodes, and when. This page outlines how to submit jobs, how to interact with the scheduler, and some of the most common Slurm commands.<br />
<br />
Some common questions about the queuing system can be found on the [[FAQ]] as well.<br />
<br />
= Submitting jobs =<br />
<br />
You submit jobs from a Niagara login node. This is done by passing a script to the sbatch command:<br />
<br />
nia-login07:~$ sbatch jobscript.sh<br />
<br />
This puts the job, described by the job script, into the queue. The scheduler will run the job on the compute nodes in due course. A typical submission script is as follows.<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
<br />
Some notes about this example:<br />
* The first line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>).<br />
* In this case, SLURM looks for 2 nodes with 40 cores on which to run 80 tasks, for 1 hour.<br />
* Note that the mpirun flag "--ppn" (processors per node) is ignored. Slurm takes care of this detail.<br />
* Once the scheduler finds a spot to run the job, it runs the script:<br />
** It changes to the submission directory;<br />
** Loads modules;<br />
** Runs the <code>mpi_example</code> application.<br />
* To use hyperthreading, just change --ntasks-per-node=40 to --ntasks-per-node=80, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).<br />
<br />
To create a job script appropriate for your work, you must modify the commands above to instruct Slurm to run the commands you need run.<br />
<br />
== Things to remember ==<br />
<br />
There are some things to always bear in mind when crafting your submission script:<br />
* Scheduling is by node, so in multiples of 40 cores. You are expected to use all 40 cores! If you are running serial jobs, and need assistance bundling your work into multiples of 40, please see the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page.<br />
* Jobs must write to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access. Download data you need before submitting your job.<br />
* Jobs will run under your group's RRG allocation. If your group does not have an allocation, your job will run under your group's RAS allocation (previously called the 'default' allocation). Note that groups with an allocation cannot run under a default allocation.<br />
* The maximum [[Wallclock_time | walltime]] for all users is 24 hours. The minimum and default [[Wallclock_time | walltime]] is 15 minutes.<br />
<br />
= Scheduling details =<br />
<br />
We now present the details of how to write a job script, and some extra commands which you might find useful.<br />
<br />
== SLURM nomenclature: jobs, nodes, tasks, cpus, cores, threads ==<br />
<br />
SLURM has a somewhat different way of referring to things like MPI processes and threads, as compared to our previous scheduler, MOAB. The SLURM nomenclature is reflected in the names of scheduler options (i.e., resource requests). SLURM strictly enforces those requests, so it is important to get this right.<br />
<br />
{| class="wikitable"<br />
!term <br />
!meaning <br />
!SLURM term<br />
!related scheduler options <br />
|-<br />
|job<br />
|scheduled piece of work for which specific resources were requested.<br />
|job<br />
|<tt>sbatch, salloc</tt><br />
|-<br />
|node<br />
|basic computing component with several cores (40 for Niagara) that share memory <br />
|node<br />
|<tt>--nodes -N</tt><br />
|-<br />
|mpi process<br />
|one of a group of running programs using Message Passing Interface for parallel computing<br />
|task<br />
|<tt>--ntasks -n --ntasks-per-node</tt><br />
|-<br />
|core ''or'' physical cpu<br />
|A fully functional independent physical execution unit.<br />
| - <br />
| -<br />
|-<br />
|logical cpu<br />
|An execution unit that the operating system can assign work to. Operating systems can be configured to overload physical cores with multiple logical cpus using hyperthreading.<br />
|cpu<br />
|<tt>--cpus-per-task</tt><br />
|-<br />
|thread<br />
|one of possibly multiple simultaneous execution paths within a program, which can share memory.<br />
| -<br />
| <tt>--cpus-per-task</tt> '''and''' <tt>OMP_NUM_THREADS</tt><br />
|-<br />
|hyperthread<br />
|a thread run in a collection of threads that is larger than the number of physical cores.<br />
| -<br />
| -<br />
|}<br />
<br />
== Scheduling by Node ==<br />
<br />
* On many systems that use SLURM, the scheduler will deduce from the job script specifications (the number of tasks and the number of cpus-per-node) what resources should be allocated. On Niagara, this is a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours.<br />
** No other users are running anything on them.<br />
** You can ssh into them, while your job is running, to see how things are going.<br />
* Whatever you request of the scheduler, your request will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are of no use. Your job always gets N x 202GB of RAM, where N is the number of nodes. Each node has about 202GB of RAM available.<br />
* You should try to use all the cores on the nodes allocated to your job. Since there are 40 cores per node, your job should use N x 40 cores. If this is not the case, we will contact you to help you optimize your workflow. Again, users who have serial jobs should consult the [[Running Serial Jobs on Niagara | serial jobs]] page.<br />
<br />
== Hyperthreading: Logical CPUs vs. cores ==<br />
<br />
Hyperthreading, a technology that leverages more of the physical hardware by pretending there are twice as many logical cores as real cores, is enabled on Niagara.<br />
The operating system and scheduler see 80 logical CPUs.<br />
<br />
Using 80 logical CPUs versus 40 real cores typically gives about a 5-10% speedup, depending on your application (your mileage may vary).<br />
<br />
Because Niagara is scheduled by node, hyperthreading is actually fairly easy to use:<br />
* Ask for a certain number of nodes, N, for your job.<br />
* You know that you get 40 x N cores, so you will use (at least) a total of 40 x N MPI processes or threads (mpirun, srun, and the OS will automatically spread these over the real cores).<br />
* But you should also test if running 80 x N MPI processes or threads gives you any speedup.<br />
* Regardless, your usage will be counted as 40 x N x (walltime in years).<br />
<br />
Many applications which are communication-heavy can benefit from the use of hyperthreading.<br />
<br />
= Submission script details =<br />
<br />
This section outlines some details of how to interact with the scheduler, and how it implements Niagara's scheduling policies.<br />
<br />
== Queues ==<br />
<br />
There are 3 queues available on SciNet systems. These queues have different limits; see the [[#Limits | Limits]] section for further details.<br />
<br />
=== Compute ===<br />
<br />
The compute queue is the default queue. Most jobs will run in this queue. If no flags are specified in the submission script this is the queue where your job will land.<br />
<br />
=== Debug ===<br />
<br />
The Debug queue is a high-priority queue, used for short-term testing of your code. Do NOT use the debug queue for production work. You can use the debug queue in one of two ways. To submit a standard job script to the debug queue, add the line<br />
#SBATCH -p debug<br />
to your submission script. This will put the job into the debug queue, and it should run in short order.<br />
<br />
To request an interactive debug session, where you retain control over the command line prompt, at a login node type the command<br />
nia-login07:~$ salloc -p debug --nodes 1 --time=1:00:00<br />
This will request 1 node for 1 hour. You can similarly request a debug session using the 'debugjob' command:<br />
nia-login07:~$ debugjob N<br />
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; if N=4 (the maximum), it gives you 30 minutes.<br />
<br />
=== Archive ===<br />
<br />
The archivelong and archiveshort queues are only used by the [[HPSS]] system. See that page for details on how to use these queues.<br />
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the jobs runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs with an allocation||compute || 50 || 1000 || 1 node (40 cores) || 1000 nodes (40000 cores)|| 15 minutes || 24 hours<br />
|-<br />
|Compute jobs without allocation ("default")||compute || 50 || 200 || 1 node (40 cores) || 20 nodes (800 cores)|| 15 minutes || 12 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40 cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (max 5 total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
<br />
Within these limits, jobs will still have to wait in the queue. The waiting time depends on many factors such as the allocation amount, how much allocation was used in the recent past, the number of nodes and the walltime, and how many other jobs are waiting in the queue.<br />
<br />
== Slurm Accounts ==<br />
<br />
To be able to prioritise jobs based on groups and allocations, the Slurm scheduler uses the concept of ''accounts''. Each group that has a Resource for Research Groups (RRG) or Research Platforms and Portals (RPP) allocation (awarded through an annual competition by Compute Canada) has an account that starts with <tt>rrg-</tt> or <tt>rpp-</tt>. Slurm assigns a 'fairshare' priority to these accounts based on the size of the award in core-years. Groups without an RRG or RPP can use Niagara using a so-called Rapid Access Service (RAS), and have an account that starts with <tt>def-</tt>.<br />
<br />
On Niagara, most users will only ever use one account, and those users do not need to specify the account to Slurm. However, users that are part of collaborations may be able to use multiple accounts, i.e., that of their sponsor and that of their collaborator, but this means that they need to select the right account when running jobs. <br />
<br />
To select the account, just add <br />
<br />
#SBATCH -A [account]<br />
<br />
to the job scripts, or use the <tt>-A [account]</tt> option with <tt>salloc</tt> or <tt>debugjob</tt>. <br />
<br />
To see which accounts you have access to, or what their names are, use the command<br />
<br />
sshare -U<br />
<br />
It has been noted that, in some cases, using the '-A' flag does not result in the appropriate account being used. To get around this, specify the account when sbatch is invoked:<br />
sbatch -A account myjobscript.sh<br />
<br />
== Slurm environment variables ==<br />
<br />
There are many environment variables built into Slurm. These are some which you may find useful:<br />
* SLURM_SUBMIT_DIR: directory from which the job was submitted.<br />
* SLURM_SUBMIT_HOST: host from which the job was submitted.<br />
* SLURM_JOB_ID: the job's id.<br />
* SLURM_JOB_NUM_NODES: number of nodes in the job.<br />
* SLURM_JOB_NODELIST: list of nodes assigned to the job.<br />
* SLURM_JOB_ACCOUNT: account associated with the job.<br />
<br />
Any of these environment variables can be accessed from within your job script.<br />
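As an illustration, a job script might use these variables as follows. This is a sketch only: the job name and per-job directory layout are hypothetical, and the <tt>:-</tt> defaults exist only so the script can be tested outside of Slurm.<br />

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=0:15:00
#SBATCH --job-name envvar_demo

# Record basic information about this run; the ':-' defaults only
# matter when running this sketch outside of Slurm.
echo "Submitted from: ${SLURM_SUBMIT_DIR:-$PWD} on ${SLURM_SUBMIT_HOST:-unknown}"
echo "Job ${SLURM_JOB_ID:-0} under account ${SLURM_JOB_ACCOUNT:-unset}"
echo "Using ${SLURM_JOB_NUM_NODES:-1} node(s): ${SLURM_JOB_NODELIST:-localhost}"

# Use a per-job working directory so concurrent jobs do not collide.
workdir="${SCRATCH:-/tmp}/job_${SLURM_JOB_ID:-0}"
mkdir -p "$workdir"
cd "$workdir"
```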
<br />
== Passing Variables to submission scripts ==<br />
It is possible to pass values through environment variables into your SLURM submission scripts.<br />
To do so with variables already defined in your shell, just add the following directive to the submission script,<br />
<br />
#SBATCH --export=ALL<br />
<br />
and you will have access to any predefined environment variable.<br />
<br />
A better way is to specify explicitly which variables you want to pass into the submission script,<br />
<br />
sbatch --export=i=15,j='test' jobscript.sbatch<br />
<br />
You can even set the job name and output files using environment variables, e.g.<br />
<br />
i="simulation"<br />
j=14<br />
sbatch --job-name=$i.$j.run --output=$i.$j.out --export=i=$i,j=$j jobscript.sbatch<br />
<br />
(The latter only works on the command line; you cannot use environment variables in <tt>#SBATCH</tt> lines in the job script.)<br />
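Inside the job script, variables passed with <tt>--export</tt> are simply read from the environment. A minimal sketch of the receiving side (the variable names match the example above, but the defaults and the output file name are illustrative; the <tt>:-</tt> defaults exist only so the sketch runs outside of Slurm):<br />

```shell
#!/bin/bash
#SBATCH --time=0:15:00

# i and j arrive via "sbatch --export=i=...,j=..."; the ':-' defaults
# are only so this sketch can run outside of Slurm.
i="${i:-0}"
j="${j:-test}"

echo "Running case i=${i} with label j=${j}"
echo "parameters: i=${i} j=${j}" > "result.${j}.${i}.txt"
```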
<br />
'''Command line arguments:'''<br />
<br />
Command line arguments can also be used, in the same way as command line arguments for shell scripts. All command line arguments given to sbatch that follow the job script name will be passed to the job script. In fact, SLURM will not look at any of these arguments, so you must place all sbatch arguments before the script name, e.g.:<br />
<br />
sbatch -p debug jobscript.sbatch FirstArgument SecondArgument ...<br />
<br />
In this example, <tt>-p debug</tt> is interpreted by SLURM, while in your submission script you can access <tt>FirstArgument</tt>, <tt>SecondArgument</tt>, etc., by referring to <code>$1, $2, ...</code>.<br />
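Inside the job script this might be used as follows (a sketch; the argument meanings and file names are illustrative, and the defaults exist only so the sketch runs without arguments):<br />

```shell
#!/bin/bash
#SBATCH --time=0:15:00

# Positional parameters come from the sbatch command line, after the job
# script name; the defaults below let the sketch run without arguments.
input_file="${1:-input.dat}"
num_steps="${2:-100}"

echo "Processing ${input_file} for ${num_steps} steps" > progress.log
```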
<br />
== Job arrays ==<br />
<br />
Sometimes you need to run the same job script many times, but just tweaking one value each time. One way of accomplishing this is using job arrays. Job arrays are invoked using the "-a" flag with sbatch:<br />
sbatch -a 1-100 myjobscript.sh<br />
This will submit 100 instances of myjobscript.sh. Within the job script you can distinguish which of those instances is running using the environment variable SLURM_ARRAY_TASK_ID.<br />
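A common pattern is to keep one parameter set per line in a text file and have each array instance pick out its own line. A sketch (the parameter file is created inline here only to keep the example self-contained, and the default task id of 1 exists only so the sketch runs outside of Slurm):<br />

```shell
#!/bin/bash
#SBATCH --time=0:15:00
#SBATCH --job-name array_demo

# Each instance submitted with "sbatch -a 1-3 ..." gets its own task id;
# the default of 1 below is only so this sketch runs outside of Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# params.txt holds one parameter set per line; it is created inline here
# only to keep the sketch self-contained.
printf 'temperature=0.5\ntemperature=1.0\ntemperature=1.5\n' > params.txt

# Pick out the line corresponding to this array instance.
params=$(sed -n "${task_id}p" params.txt)
echo "Task ${task_id} running with ${params}" > "task_${task_id}.log"
```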
<br />
Note that Niagara [[#Limits | currently]] has a limit of 1000 submitted jobs for users within groups with allocations, and 200 submitted jobs without an allocation.<br />
<br />
== Job dependencies ==<br />
<br />
You can make one job dependent on the successful completion of another job using the following command:<br />
sbatch --dependency=afterok:JOBID myjobscript.sh<br />
This will make the current job submission not start until the parent job, with jobid JOBID, successfully completes. There are many job dependency options available. Visit the [https://slurm.schedmd.com/sbatch.html Slurm sbatch page ] for the full list. <br />
<br />
If the parent job fails (that is, ends with a non-zero exit code) the dependent job can never be scheduled and will be automatically cancelled.<br />
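Chains of jobs can be scripted by capturing each job id as it is submitted; the <tt>--parsable</tt> option makes sbatch print just the job id. The sketch below only constructs the submission command (with a placeholder id and hypothetical script names), since it does not actually submit anything:<br />

```shell
#!/bin/bash
# In real use, capture the parent's job id when submitting it:
#   parent_id=$(sbatch --parsable parent_job.sh)
parent_id=123456   # placeholder id, for illustration only

# Build (and record) the submission command for the dependent job.
child_cmd="sbatch --dependency=afterok:${parent_id} child_job.sh"
echo "$child_cmd" > submit_child.txt
```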
<br />
== Email Notification ==<br />
Email notification works, but you need to add the email address and the type of notification you want to receive in your submission script, e.g.<br />
<br />
#SBATCH --mail-user=YOUR.email.ADDRESS<br />
#SBATCH --mail-type=ALL<br />
<br />
The sbatch man page (type <tt>man sbatch</tt> on Niagara) explains all possible mail-types.<br />
<br />
= Monitoring jobs =<br />
<br />
There are many options available for monitoring your jobs. The most basic is the squeue command:<br />
<br />
nia-login07:~$ squeue -u USERNAME<br />
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)<br />
292047 compute myjob4 username PD 0:00 4 (Priority)<br />
292048 compute myjob3 username PD 0:00 4 (Priority)<br />
266829 compute myjob2 username R 18:56:17 2 nia[1397-1398]<br />
266828 compute myjob1 username R 18:56:46 1 nia1298<br />
<br />
Here you can see that we have two running jobs ('R') and two pending jobs ('PD'). The nodes being used are listed.<br />
<br />
== Job status ==<br />
<br />
To get an estimate of when a job will start, use the command<br />
squeue --start -j JOBID<br />
Note that this is only an estimate, and it tends not to be very accurate.<br />
<br />
Information about a specific job can be found using the command<br />
squeue -j JOBID<br />
or alternatively<br />
scontrol show job JOBID<br />
which is more verbose.<br />
<br />
== SSHing to a node ==<br />
<br />
Once your job has started, the node belongs to you. As such you may, from a login node, SSH into the node to check the performance of your job. The first step is to find out which nodes are being used (see above). Once you have your list of nodes, you can SSH into them directly. Once there, you can run the 'top' or 'free' commands to check both CPU and memory usage.<br />
<br />
== jobperf ==<br />
<br />
The jobperf script will give you feedback on the performance of your currently-running job:<br />
nia-login07:~$ jobperf 123456<br />
----------------------------------------------------------------------------------------------------<br />
RUNNING IDLE USER MEMORY(MB) PROCESS NAMES<br />
HOSTNAME # %CPU %MEM DISK SLEEP NAME RAMDISK USED AVAIL (excl:bash,sh,ssh,sshd)<br />
----------------------------------------------------------------------------------------------------<br />
nia1013 71 6999% 0.5% 0 22 ejspence 0 15060 178017 14*gmx_mpi mpiexec slurm_script<br />
nia1014 79 7677% 0.1% 0 18 ejspence 0 14803 178274 13*gmx_mpi<br />
nia1295 79 7517% 0.4% 0 18 ejspence 0 15199 177878 13*gmx_mpi<br />
----------------------------------------------------------------------------------------------------<br />
<br />
Here you can see both the CPU and memory usage of the job, for all nodes being used.<br />
<br />
== Other commands ==<br />
<br />
Some other commands that can be useful for dealing with your jobs:<br />
* <code>scancel -i JOBID</code> cancels a specific job.<br />
* <code>sacct</code> gives information about your recent jobs.<br />
* <code>sinfo -p compute</code> gives a list of available nodes.<br />
* <code>qsum</code> gives a summary of the queue by user.<br />
<br />
= Example submission scripts =<br />
<br />
Here we present some examples of how to create submission scripts for running parallel jobs. Serial job examples can be found on the [[Running_Serial_Jobs_on_Niagara | serial jobs page]].<br />
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=8<br />
#SBATCH --ntasks=320<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li><p>First line indicates that this is a bash script.</p></li><br />
<li><p>Lines starting with <code>#SBATCH</code> go to SLURM.</p></li><br />
<li><p>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>)</p></li><br />
<li><p>In this case, SLURM looks for 8 nodes with 40 cores on which to run 320 tasks, for 1 hour.</p></li><br />
<li><p>Note that the mpirun flag "--ppn" (processors per node) is ignored.</p></li><br />
<li><p>Once it finds such nodes, it runs the script, which:</p><br />
<ul><br />
<li>Changes to the submission directory;</li><br />
<li>Loads the modules;</li><br />
<li>Runs the <code>mpi_example</code> application.</li><br />
</ul></li><br />
<li>To use hyperthreading, just change --ntasks=320 to --ntasks=640, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* First line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>openmp_job</code>).<br />
* In this case, SLURM looks for one node on which to run one task using 40 cores, for 1 hour.<br />
* Once it finds such a node, it runs the script, which:<br />
** Changes to the submission directory;<br />
** Loads modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Teach&diff=1818Teach2018-12-13T16:07:08Z<p>Bmundim: /* Limits */</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] <br />
|name=Teach Cluster <br />
|installed=(orig Feb 2013), Oct 2018<br />
|operatingsystem= Linux (CentOS 7.4)<br />
|loginnode= teach01 (from <tt>teach.scinet</tt>)<br />
|nnodes=42 <br />
|rampernode=64 GB <br />
|corespernode=16 <br />
|interconnect=Infiniband (QDR)<br />
|vendorcompilers=icc/gcc<br />
|queuetype=slurm<br />
}}<br />
<br />
== Teaching Cluster ==<br />
<br />
SciNet has assembled some older compute hardware into a small cluster, provided primarily for teaching purposes. It is configured similarly to the production [[Niagara_Quickstart | Niagara ]] system, but uses repurposed hardware. This system should not be used for production work; as such, the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time. Questions about its use or problems should be sent to '''support@scinet.utoronto.ca'''.<br />
<br />
== Specifications==<br />
<br />
The cluster consists of 42 repurposed x86_64 nodes each with two octal core Intel Xeon (Sandybridge) E5-2650 2.0GHz CPUs with 64GB of RAM per node. <br />
The nodes are interconnected with 2.6:1 blocking QDR Infiniband for MPI communications and disk I/O to the SciNet Niagara filesystems. In total this cluster contains 672 x86_64 cores.<br />
<br />
== Login/Devel Node ==<br />
<br />
Login via ssh with your scinet account to '''<tt>teach.scinet.utoronto.ca</tt>''', which will bring you directly to '''<tt>teach01</tt>''', the gateway/devel node for this cluster. <br />
From '''<tt>teach01</tt>''' you can compile, do short tests, and submit your jobs to the queue.<br />
<br />
== Software Modules ==<br />
<pre> <br />
module avail<br />
</pre><br />
<br />
== Submit a Job ==<br />
<br />
Teach uses SLURM as its job scheduler. More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
teach01:~scratch$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course.<br />
<br />
In most cases, you will want to submit from your $SCRATCH directory, so that the output of your compute job can be written out (as mentioned above, $HOME is read-only on the compute nodes).<br />
<br />
It is worth mentioning some differences between the Niagara and Teach clusters:<br />
* Each teach cluster node has two CPUs with 8 cores each, for a total of 16 cores per node (there is no hyperthreading). Make sure to adjust the flags --ntasks-per-node or --ntasks, together with --nodes, accordingly for the examples found on the [[Slurm | Slurm page]]. <br />
* The current Slurm configuration of the teach cluster allocates compute resources by core, as opposed to by node. That means your tasks might land on nodes that have other jobs running, i.e. they might share the node. If you want to avoid that, add the following directive to your submission script: #SBATCH --exclusive. This forces your job to use the compute nodes exclusively.<br />
* The maximum walltime is currently set to 4 hours.<br />
* There are 2 queues available: the compute queue and the debug queue. Their usage limits are listed in the table below.<br />
<br />
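Putting these differences together, an MPI submission script from the [[Slurm | Slurm page]], adapted for Teach, might look as follows (a sketch only; the module versions and the application name are illustrative):<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=16<br />
#SBATCH --exclusive<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name teach_mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
</source><br />
<br />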
== Limits ==<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It also matters in which 'partition' the jobs runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc, but if you do not specify one, your job will run in the compute partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs ||compute || 6 || 12 || 1 core || 8 nodes (128 cores)|| 15 minutes || 4 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (16 cores) || 4 nodes (64 cores)|| N/A || 1 hour<br />
|}<br />
<br />
Within these limits, jobs will still have to wait in the queue. The waiting time depends on many factors such as the allocation amount, how much allocation was used in the recent past, the number of nodes and the walltime, and how many other jobs are waiting in the queue.</div>Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Teach&diff=1719Teach2018-11-16T19:31:12Z<p>Bmundim: /* Limits */</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] <br />
|name=Teach Cluster <br />
|installed=(orig Feb 2013), Oct 2018<br />
|operatingsystem= Linux (CentOS 7.4)<br />
|loginnode= teach01 (from <tt>teach.scinet</tt>)<br />
|nnodes=42 <br />
|rampernode=64 GB <br />
|corespernode=16 <br />
|interconnect=Infiniband (QDR)<br />
|vendorcompilers=icc/gcc<br />
|queuetype=slurm<br />
}}<br />
<br />
== Teaching Cluster ==<br />
<br />
SciNet has assembled some older compute hardware into a small cluster, provided primarily for teaching purposes. It is configured similarly to the production [[Niagara_Quickstart | Niagara ]] system, but uses repurposed hardware. This system should not be used for production work; as such, the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time. Questions about its use or problems should be sent to '''support@scinet.utoronto.ca'''.<br />
<br />
== Specifications==<br />
<br />
The cluster consists of 42 repurposed x86_64 nodes each with two octal core Intel Xeon (Sandybridge) E5-2650 2.0GHz CPUs with 64GB of RAM per node. <br />
The nodes are interconnected with 2.6:1 blocking QDR Infiniband for MPI communications and disk I/O to the SciNet Niagara filesystems. In total this cluster contains 672 x86_64 cores.<br />
<br />
== Login/Devel Node ==<br />
<br />
Login via ssh with your scinet account to '''<tt>teach.scinet.utoronto.ca</tt>''', which will bring you directly to '''<tt>teach01</tt>''', the gateway/devel node for this cluster. <br />
From '''<tt>teach01</tt>''' you can compile, do short tests, and submit your jobs to the queue.<br />
<br />
== Software Modules ==<br />
<pre> <br />
module avail<br />
</pre><br />
<br />
== Submit a Job ==<br />
<br />
Teach uses SLURM as its job scheduler. More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
teach01:~scratch$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course.<br />
<br />
In most cases, you will want to submit from your $SCRATCH directory, so that the output of your compute job can be written out (as mentioned above, $HOME is read-only on the compute nodes).<br />
<br />
It is worth mentioning some differences between the Niagara and Teach clusters:<br />
* Each Teach node has two CPUs with 8 cores each, for a total of 16 cores per node (there is no hyperthreading). Make sure to adjust the --ntasks-per-node or --ntasks and --nodes flags accordingly in the examples found on the [[Slurm | Slurm page]]. <br />
* The current SLURM configuration of the Teach cluster allocates compute resources by core, as opposed to by node. This means your tasks might land on nodes on which other jobs are running, i.e., they might share the node. If you want to avoid that, add the directive #SBATCH --exclusive to your submission script; this forces your job to use its compute nodes exclusively.<br />
* The maximum walltime is currently set to 4 hours.<br />
* There are two queues (partitions) available, compute and debug; their usage limits are listed in the table below.<br />
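Putting these points together, a minimal job script sketch for the Teach cluster might look as follows; the module name, version, and program are placeholders, not specific recommendations:<br />
<source lang="bash"><br />
#!/bin/bash<br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks-per-node=16  # Teach nodes have 16 cores, no hyperthreading<br />
#SBATCH --time=01:00:00       # at most 4 hours on Teach<br />
#SBATCH --job-name=test_job<br />
#SBATCH --output=test_job_%j.out<br />
# Uncomment the next line if your job should not share nodes with other jobs:<br />
##SBATCH --exclusive<br />
<br />
# Load the modules your program needs (names are illustrative)<br />
module load intel/2018.2<br />
<br />
# Launch the (placeholder) program; drop mpirun for non-MPI workloads<br />
mpirun ./my_program<br />
</source><br />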
<br />
== Limits ==<br />
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc; if you do not specify one, your job will run in the compute partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Running jobs<br />
!Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs ||compute || 6 || 12 || 1 core || 5 nodes (80 cores)|| 15 minutes || 4 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (16 cores) || 4 nodes (64 cores)|| N/A || 1 hour<br />
|}<br />
<br />
Within these limits, jobs will still have to wait in the queue. The waiting time depends on many factors such as the allocation amount, how much allocation was used in the recent past, the number of nodes and the walltime, and how many other jobs are waiting in the queue.</div>
Bmundimhttps://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&diff=1569Niagara Quickstart2018-09-27T22:06:22Z<p>Bmundim: /* Limits */</p>
<hr />
<div>{{Infobox Computer<br />
|image=[[Image:Niagara.jpg|center|300px|thumb]]<br />
|name=Niagara<br />
|installed=Jan 2018<br />
|operatingsystem= CentOS 7.4 <br />
|loginnode= niagara.scinet.utoronto.ca<br />
|nnodes= 1500 nodes (60,000 cores)<br />
|rampernode=188 GiB / 202 GB <br />
|corespernode=40 (80 hyperthreads)<br />
|interconnect=Mellanox Dragonfly+<br />
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)<br />
|queuetype=Slurm<br />
}}<br />
<br />
=Specifications=<br />
<br />
The Niagara cluster is a large cluster of 1500 Lenovo SD350 servers each with 40 Intel "Skylake" cores at 2.4 GHz. <br />
The peak performance of the cluster is 3.02 PFlops delivered / 4.6 PFlops theoretical. It is the 53rd fastest supercomputer on the [https://www.top500.org/list/2018/06/?page=1 TOP500 list of June 2018]. <br />
<br />
Each node of the cluster has 188 GiB / 202 GB RAM per node (at least 4 GiB/core for user jobs). Being designed for large parallel workloads, it has a fast interconnect consisting of EDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a minimum of 15 minutes and a maximum of 12 or 24 hours (for default or RAC accounts, respectively) and favours large jobs.<br />
<br />
* See the [https://support.scinet.utoronto.ca/education/go.php/370/content.php/cid/1383/ "Intro to Niagara"] recording<br />
<br />
More detailed hardware characteristics of the Niagara supercomputer can be found [https://docs.computecanada.ca/wiki/Niagara on this page].<br />
<br />
= Getting started on Niagara =<br />
<br />
Those of you new to SciNet and belonging to a group whose primary PI does not have an allocation, as granted in the annual [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions Compute Canada RAC], must first follow the old route of [https://www.scinethpc.ca/getting-a-scinet-account/ requesting a SciNet Consortium Account on the CCDB site] to gain access to Niagara.<br />
<br />
Please read this document carefully. The [[FAQ]] is also a useful resource. If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].<br />
<br />
== Logging in ==<br />
<br />
Niagara runs CentOS 7, which is a type of Linux. You will need to be familiar with Linux systems to function on Niagara. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&search=scmp101&include=all&filter=Filter Introduction to Linux Shell] class.<br />
<br />
As with all SciNet and CC (Compute Canada) compute systems, access to Niagara is via [[SSH]] (secure shell) only. Open a terminal window (e.g. with [https://docs.computecanada.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.computecanada.ca/wiki/Connecting_with_MobaXTerm MobaXTerm] on Windows), then SSH into the Niagara login nodes with your CC credentials:<br />
<br />
$ ssh -Y MYCCUSERNAME@niagara.scinet.utoronto.ca<br />
<br />
or<br />
<br />
$ ssh -Y MYCCUSERNAME@niagara.computecanada.ca<br />
<br />
* The Niagara login nodes are where you develop, edit, compile, prepare and submit jobs.<br />
* These login nodes are not part of the Niagara compute cluster, but have the same architecture, operating system, and software stack.<br />
* The optional <code>-Y</code> is needed to open windows from the Niagara command-line onto your local X server.<br />
* To run on Niagara's compute nodes, you must [[#Submitting_jobs | submit a batch job]].<br />
<br />
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.<br />
<br />
== Your various directories ==<br />
<br />
By virtue of your access to Niagara you are granted storage space on the system. There are several directories available to you, each indicated by an associated environment variable.<br />
<br />
=== home and scratch ===<br />
<br />
You have a home and scratch directory on the system, whose locations are of the form<br />
<br />
$HOME=/home/g/groupname/myccusername<br />
$SCRATCH=/scratch/g/groupname/myccusername<br />
<br />
where groupname is the name of your PI's group, and myccusername is your CC username. For example:<br />
<br />
nia-login07:~$ pwd<br />
/home/s/scinet/rzon<br />
nia-login07:~$ cd $SCRATCH<br />
nia-login07:rzon$ pwd<br />
/scratch/s/scinet/rzon<br />
<br />
NOTE: home is read-only on compute nodes.<br />
<br />
=== project and archive ===<br />
<br />
Users from groups with [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions RAC storage allocation] will also have a project and/or archive directory.<br />
<br />
$PROJECT=/project/g/groupname/myccusername<br />
$ARCHIVE=/archive/g/groupname/myccusername<br />
<br />
NOTE: Currently archive space is available only via [[HPSS]].<br />
<br />
'''''IMPORTANT: Future-proof your scripts'''''<br />
<br />
When writing your scripts, use the environment variables (<tt>$HOME</tt>, <tt>$SCRATCH</tt>, <tt>$PROJECT</tt>, <tt>$ARCHIVE</tt>) instead of the actual paths! The paths may change in the future.<br />
<br />
=== Storage and quotas ===<br />
<br />
You should familiarize yourself with the [[Data_Management#Purpose_of_each_file_system | various file systems]], what purpose they serve, and how to properly use them. This table summarizes the various file systems. See the [[Data_Management | Data Management]] page for more details.<br />
<br />
{| class="wikitable"<br />
! location<br />
!colspan="2"| quota<br />
!align="right"| block size<br />
! expiration time<br />
! backed up<br />
! on login nodes<br />
! on compute nodes<br />
|-<br />
| $HOME<br />
|colspan="2"| 100 GB per user<br />
|align="right"| 1 MB<br />
| <br />
| yes<br />
| yes<br />
| read-only<br />
|-<br />
|rowspan="2"| $SCRATCH<br />
|colspan="2"| 25 TB per user<br />
|align="right" rowspan="2" | 16 MB<br />
|rowspan="2"| 2 months<br />
|rowspan="2"| no<br />
|rowspan="2"| yes<br />
|rowspan="2"| yes<br />
|-<br />
|align="right"|50-500TB per group<br />
|align="right"|[[Data_Management#Quotas_and_purging | depending on group size]]<br />
|-<br />
| $PROJECT<br />
|colspan="2"| by group allocation<br />
|align="right"| 16 MB<br />
| <br />
| yes<br />
| yes<br />
| yes<br />
|-<br />
| $ARCHIVE<br />
|colspan="2"| by group allocation<br />
|align="right"| <br />
|<br />
| dual-copy<br />
| no<br />
| no<br />
|-<br />
| $BBUFFER<br />
|colspan="2"| 10 TB per user<br />
|align="right"| 1 MB<br />
| very short<br />
| no<br />
| yes<br />
| yes<br />
|}<br />
<br />
=== Moving data to Niagara ===<br />
<br />
If you need to move data to Niagara for analysis, or when you need to move data off of Niagara, use the following guidelines:<br />
* If your data is less than 10GB, move the data using the login nodes.<br />
* If your data is greater than 10GB, move the data using the datamover nodes nia-datamover1.scinet.utoronto.ca and nia-datamover2.scinet.utoronto.ca .<br />
<br />
Details of how to use the datamover nodes can be found on the [[Data_Management#Moving_data | Data Management ]] page.<br />
<br />
= Loading software modules =<br />
<br />
You have two options for running code on Niagara: use existing software, or [[Niagara_Quickstart#Compiling_on_Niagara:_Example | compile your own]]. This section focuses on the former.<br />
<br />
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be [[Using_modules | found on the modules page]].<br />
<br />
Common module subcommands are:<br />
<br />
* <code>module load <module-name></code>: load the default version of a particular software.<br />
* <code>module load <module-name>/<module-version></code>: load a specific version of a particular software.<br />
* <code>module purge</code>: unload all currently loaded modules.<br />
* <code>module spider</code> (or <code>module spider <module-name></code>): list available software packages.<br />
* <code>module avail</code>: list loadable software packages.<br />
* <code>module list</code>: list loaded modules.<br />
<br />
Along with modifying common environment variables, such as PATH and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.<br />
<br />
There are handy abbreviations for the module commands. <code>ml</code> is the same as <code>module list</code>, and <code>ml <module-name></code> is the same as <code>module load <module-name></code>.<br />
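For example, the root variable can be used in compile and link commands; in this sketch the module name and the corresponding variable are purely illustrative:<br />
<source lang="bash"><br />
# Load a (hypothetical) library module<br />
nia-login07:~$ module load gsl<br />
<br />
# Use the module's root variable to locate its headers and libraries<br />
nia-login07:~$ icc -c mycode.c -I${SCINET_GSL_ROOT}/include<br />
nia-login07:~$ icc -o mycode mycode.o -L${SCINET_GSL_ROOT}/lib -lgsl<br />
</source><br />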
== Software stacks: NiaEnv and CCEnv ==<br />
<br />
On Niagara, there are two available software stacks:<br />
<br />
<ol style="list-style-type: decimal;"><br />
<li><p>A [https://docs.scinet.utoronto.ca/index.php/Modules_specific_to_Niagara Niagara software stack] tuned and compiled for this machine. This stack is available by default, but if not, can be reloaded with</p><br />
<code>module load NiaEnv</code></li><br />
<li><p>The same [https://docs.computecanada.ca/wiki/Modules software stack available on Compute Canada's General Purpose clusters] [https://docs.computecanada.ca/wiki/Graham Graham] and [https://docs.computecanada.ca/wiki/Cedar Cedar], compiled (for now) for a previous generation of CPUs:</p><br />
<code>module load CCEnv</code><br />
<p>Or, if you want the same default modules loaded as on Cedar and Graham, then do<br />
</p><p><br />
<code>module load CCEnv</code><br />
</p><p><br />
<code>module load StdEnv</code><br />
</p><br />
</li></ol><br />
<br />
== Tips for loading software ==<br />
<br />
* We advise '''''against''''' loading modules in your .bashrc. This can lead to very confusing behaviour under certain circumstances. Our guidelines for .bashrc files can be found [[bashrc guidelines|here]].<br />
* Instead, load modules by hand when needed, or by sourcing a separate script.<br />
* Load run-specific modules inside your job submission script.<br />
* Short names give default versions; e.g. <code>intel</code> → <code>intel/2018.2</code>. It is usually better to be explicit about the versions, for future reproducibility.<br />
* Modules often require other modules to be loaded first. Solve these dependencies by using [[Using_modules#Module_spider | <code>module spider</code>]].<br />
<br />
= Available compilers and interpreters =<br />
<br />
* For most compiled software, one should use the Intel compilers (<tt>icc</tt> for C, <tt>icpc</tt> for C++, and <tt>ifort</tt> for Fortran). Loading an <tt>intel</tt> module makes these available. <br />
* The GNU compiler suite (<tt>gcc, g++, gfortran</tt>) is also available, if you load one of the <tt>gcc</tt> modules.<br />
* Open source interpreted, interactive software is also available:<br />
** [[Python]]<br />
** [[R]]<br />
** Julia<br />
** Octave<br />
<br />
Please visit the [[Python]] or [[R]] page for details on using these tools. For information on running MATLAB applications on Niagara, visit [[MATLAB| this page]].<br />
<br />
= Using Commercial Software =<br />
<br />
May I use commercial software on Niagara?<br />
* Possibly, but you have to bring your own license for it. You can connect to an external license server using [[SSH_Tunneling | ssh tunneling]].<br />
* SciNet and Compute Canada have an extremely large and broad user base of thousands of users, so we cannot provide licenses for everyone's favorite software.<br />
* Thus, the only freely available commercial software installed on Niagara is software that can benefit everyone: Compilers, math libraries and debuggers.<br />
* That means no [[MATLAB]], Gaussian, or IDL. <br />
* Open source alternatives like Octave, [[Python]], and [[R]] are available.<br />
* We are happy to help you to install commercial software for which you have a license.<br />
* In some cases, if you have a license, you can use software in the Compute Canada stack.<br />
The list of commercial software which is installed on Niagara, for which you will need a license to use, can be found on the [[Commercial_software | commercial software page]].<br />
<br />
= Compiling on Niagara: Example =<br />
<br />
Suppose one wants to compile an application from two c source files, appl.c and module.c, which use the Math Kernel Library. This is an example of how this would be done:<br />
<source lang="bash"><br />
nia-login07:~$ module list<br />
Currently Loaded Modules:<br />
1) NiaEnv/2018a (S)<br />
Where:<br />
S: Module is Sticky, requires --force to unload or purge<br />
<br />
nia-login07:~$ module load intel/2018.2<br />
<br />
nia-login07:~$ ls<br />
appl.c module.c<br />
<br />
nia-login07:~$ icc -c -O3 -xHost -o appl.o appl.c<br />
nia-login07:~$ icc -c -O3 -xHost -o module.o module.c<br />
nia-login07:~$ icc -o appl module.o appl.o -mkl<br />
<br />
nia-login07:~$ ./appl<br />
</source><br />
Note:<br />
* The optimization flags -O3 -xHost allow the Intel compiler to use instructions specific to the CPU architecture that is present (instead of compiling for more generic x86_64 CPUs).<br />
* Linking with the Intel Math Kernel Library (MKL) is easy when using the Intel compilers; it just requires the -mkl flag.<br />
* If compiling with gcc, the optimization flags would be -O3 -march=native. To link with the MKL when using gcc, it is suggested to use the [https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor MKL link line advisor].<br />
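For comparison, a gcc version of the same build might look as follows; the link line given here is one common illustrative form only, and the MKL link line advisor should be consulted for the exact flags:<br />
<source lang="bash"><br />
nia-login07:~$ module load gcc<br />
<br />
nia-login07:~$ gcc -c -O3 -march=native -o appl.o appl.c<br />
nia-login07:~$ gcc -c -O3 -march=native -o module.o module.c<br />
<br />
# One common way to link MKL with gcc is via its single dynamic library;<br />
# verify the details with the MKL link line advisor<br />
nia-login07:~$ gcc -o appl module.o appl.o -L${MKLROOT}/lib/intel64 -lmkl_rt -lpthread -lm -ldl<br />
</source><br />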
<br />
= Testing =<br />
<br />
You really should test your code before you submit it to the cluster, both to check that it is correct and to determine what kind of resources you need.<br />
* Small test jobs can be run on the login nodes. Rule of thumb: tests should run for no more than a couple of minutes, use at most about 1-2 GB of memory, and use no more than a couple of cores.<br />
* You can run the ddt debugger on the login nodes after <code>module load ddt</code>.<br />
* For short tests that do not fit on a login node, or for which you need a dedicated node, request an interactive debug job with the debugjob command:<br />
 nia-login07:~$ debugjob N<br />
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; when N=4 (the maximum), it gives you 30 minutes. Finally, if your debugjob process takes more than 1 hour, you can request an interactive job from the regular queue using the salloc command. Note, however, that this may take some time to start, since it will be part of the regular queue, and will run when the scheduler decides.<br />
nia-login07:~$ salloc --nodes N --time=M:00:00<br />
where N is again the number of nodes, and M is the number of hours you wish the job to run.<br />
If you need to use graphics while testing your code through salloc, e.g. when using a debugger such as DDT or DDD, please visit the [[Testing_With_Graphics | Testing with graphics]] page for the available options.<br />
<br />
= Submitting jobs =<br />
<br />
<!-- == Progressive approach to run jobs on niagara == --><br />
<!-- We would like to emphasize the need for users to adopt a more progressive and explicit approach for testing, running and scaling up of jobs on niagara. [[Progressive_Approach | '''Here is a set of steps we suggest that you follow.''']] --><br />
<br />
Once you have compiled and tested your code or workflow on the Niagara login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. Your jobs will run on some of Niagara's 1500 compute nodes. When and where your job runs is determined by the scheduler.<br />
<br />
Niagara uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].<br />
<br />
You submit jobs from a login node by passing a script to the sbatch command:<br />
<br />
nia-login07:~$ sbatch jobscript.sh<br />
<br />
This puts the job in the queue. It will run on the compute nodes in due course.<br />
<br />
Jobs will run under your group's RRG allocation, or, if your group has none, under a RAS allocation (previously called a `default' allocation).<br />
<br />
Keep in mind:<br />
* Scheduling is by node, so in multiples of 40 cores.<br />
* If your group has an allocation, your job's maximum walltime is 24 hours. If your group is without an allocation, your job's maximum walltime is 12 hours.<br />
* Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).<br />
* Compute nodes have no internet access.<br />
* [[Data_Management#Moving_data | Move your data]] to Niagara before you submit your job.<br />
<br />
== Scheduling by Node ==<br />
<br />
On many systems that use SLURM, the scheduler will deduce from the specifications of the number of tasks and the number of cpus-per-task what resources should be allocated. On Niagara, things are a bit different.<br />
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.<br />
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.<br />
** No other users are running anything on them.<br />
** You can [[SSH]] into them to see how things are going.<br />
* Whatever your request to the scheduler, it will always be translated into a multiple of nodes allocated to your job.<br />
* Memory requests to the scheduler are ignored. Your job always gets N x 202GB of RAM, where N is the number of nodes and 202GB is the amount of memory on each node.<br />
* If you run serial jobs you must still use all 40 cores on the node. Visit the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page for examples of how to do this.<br />
* Since there are 40 cores per node, your job should use N x 40 cores. If it does not, we will contact you to help you optimize your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] for assistance.<br />
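As a quick sanity check for the node-based scheduling described above, you can compute how many nodes a given core count translates into; this is a sketch with hypothetical numbers (any core count is rounded up to whole nodes):<br />

```shell
#!/bin/bash
# Hypothetical example: a workflow that wants 100 MPI ranks.
ranks=100
cores_per_node=40    # each Niagara node has 40 cores

# Round up to a whole number of nodes (ceiling division).
nodes=$(( (ranks + cores_per_node - 1) / cores_per_node ))
echo "nodes requested: $nodes"                          # nodes requested: 3
echo "cores allocated: $(( nodes * cores_per_node ))"   # cores allocated: 120
```

Since the job is charged for 120 cores either way, it usually pays to find work for the 20 cores that 100 ranks would leave idle.<br />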
<br />
== Limits ==<br />
<br />
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the <tt>-p</tt> parameter to <tt>sbatch</tt> or <tt>salloc</tt>, but if you do not specify one, your job will run in the <tt>compute</tt> partition, which is the most common case. <br />
<br />
{| class="wikitable"<br />
!Usage<br />
!Partition<br />
!Limit on Running jobs<br />
!Limit on Submitted jobs (incl. running)<br />
!Min. size of jobs<br />
!Max. size of jobs<br />
!Min. walltime<br />
!Max. walltime <br />
|-<br />
|Compute jobs with an allocation||compute || 50 || 1000 || 1 node (40 cores) || 1000 nodes (40000 cores)|| 15 minutes || 24 hours<br />
|-<br />
|Compute jobs without allocation ("default")||compute || 50 || 200 || 1 node (40 cores) || 20 nodes (800 cores)|| 15 minutes || 12 hours<br />
|-<br />
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40 cores) || 4 nodes (160 cores)|| N/A || 1 hour<br />
|-<br />
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (max 5 total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours<br />
|-<br />
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour<br />
|}<br />
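As an illustration of the <tt>-p</tt> parameter, a small test run could be directed to the debug partition with a script like the following sketch (the job name and executable are hypothetical, and the limits in the comment are those from the table above):<br />

```shell
#!/bin/bash
#SBATCH -p debug              # debug partition: 1 running job, up to 4 nodes, max 1 hour
#SBATCH --nodes=1
#SBATCH --time=0:30:00
#SBATCH --job-name debug_test

cd $SLURM_SUBMIT_DIR
./my_test_program             # hypothetical executable
```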
<br />
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.<br />
<br />
== File Input/Output Tips ==<br />
<br />
It is important to understand the file systems, so as to perform your file I/O (Input/Output) responsibly. Refer to the [[Data_Management | Data Management]] page for details about the file systems.<br />
* Your files can be seen on all Niagara login and compute nodes.<br />
* $HOME, $SCRATCH, and $PROJECT all use the parallel file system called GPFS.<br />
* GPFS is a high-performance file system which provides rapid reads and writes to large data sets in parallel from many nodes.<br />
* Accessing data sets which consist of many, small files leads to poor performance on GPFS.<br />
* Avoid reading and writing lots of small amounts of data to disk. Many small files on the system waste space and are slower to access, read and write. If you must write many small files, use [[User_Ramdisk | ramdisk]].<br />
* Write data out in a binary format. This is faster and takes less space.<br />
* The [[Burst Buffer]] is another option for I/O-heavy jobs and for speeding up [[Checkpoints|checkpoints]].<br />
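For example, rather than leaving many small output files on GPFS, a job can bundle them into a single archive before the final write; this is a sketch with hypothetical file names:<br />

```shell
#!/bin/bash
# Generate 100 small result files (stand-ins for real job output).
mkdir -p results
for i in $(seq 1 100); do
    echo "data $i" > results/out_$i.txt
done

# Bundle them into one compressed archive: GPFS then stores a single
# large file instead of 100 small ones.
tar -czf results.tar.gz results
rm -r results

tar -tzf results.tar.gz | wc -l   # 101 entries: the directory plus 100 files
```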
<br />
== Example submission script (MPI) ==<br />
<br />
<source lang="bash">#!/bin/bash <br />
#SBATCH --nodes=2<br />
#SBATCH --ntasks=80<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name mpi_job<br />
#SBATCH --output=mpi_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
module load openmpi/3.1.0<br />
<br />
mpirun ./mpi_example<br />
# or "srun ./mpi_example"<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch mpi_job.sh<br />
<br />
<ul><br />
<li>First line indicates that this is a bash script.</li><br />
<li>Lines starting with <code>#SBATCH</code> go to SLURM.</li><br />
<li>sbatch reads these lines as a job request (which it gives the name <code>mpi_job</code>)</li><br />
<li>In this case, SLURM looks for 2 nodes (each of which will have 40 cores) on which to run a total of 80 tasks, for 1 hour.<br>(Instead of specifying <tt>--ntasks=80</tt>, you can also ask for <tt>--ntasks-per-node=40</tt>, which amounts to the same.)</li><br />
<li>Note that the mpirun flag "--ppn" (processors per node) is ignored.</li><br />
<li>Once it has found such nodes, it runs the script:<br />
<ul><br />
<li>Changes to the submission directory;</li><br />
<li>Loads modules;</li><br />
<li>Runs the <code>mpi_example</code> application (SLURM will inform mpirun or srun on how many processes to run).<br />
</li><br />
</ul><br />
<li>To use hyperthreading, just change --ntasks=80 to --ntasks=160, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).</li><br />
</ul><br />
<br />
== Example submission script (OpenMP) ==<br />
<br />
<source lang="bash">#!/bin/bash<br />
#SBATCH --nodes=1<br />
#SBATCH --cpus-per-task=40<br />
#SBATCH --time=1:00:00<br />
#SBATCH --job-name openmp_job<br />
#SBATCH --output=openmp_output_%j.txt<br />
#SBATCH --mail-type=FAIL<br />
<br />
cd $SLURM_SUBMIT_DIR<br />
<br />
module load intel/2018.2<br />
<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
<br />
./openmp_example<br />
# or "srun ./openmp_example".<br />
</source><br />
Submit this script with the command:<br />
<br />
nia-login07:~$ sbatch openmp_job.sh<br />
<br />
* First line indicates that this is a bash script.<br />
* Lines starting with <code>#SBATCH</code> go to SLURM.<br />
* sbatch reads these lines as a job request (which it gives the name <code>openmp_job</code>).<br />
* In this case, SLURM looks for one node on which to run one task with 40 cores, for 1 hour.<br />
* Once it has found such a node, it runs the script:<br />
** Changes to the submission directory;<br />
** Loads modules;<br />
** Sets an environment variable;<br />
** Runs the <code>openmp_example</code> application.<br />
* To use hyperthreading, just change <code>--cpus-per-task=40</code> to <code>--cpus-per-task=80</code>.<br />
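The script above relies on SLURM setting <code>SLURM_CPUS_PER_TASK</code>; a slightly more defensive variant (a sketch, not part of the official example) supplies a fallback so the same script also runs outside a job:<br />

```shell
#!/bin/bash
# Use the scheduler's value when inside a job; fall back to 40 threads
# (one full Niagara node) when SLURM_CPUS_PER_TASK is not set.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-40}
echo "running with $OMP_NUM_THREADS OpenMP threads"
```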
<br />
== Monitoring queued jobs ==<br />
<br />
Once the job is incorporated into the queue, there are some commands you can use to monitor its progress.<br />
<br />
<ul><br />
<li><p><code>squeue</code> or <code>sqc</code> (a caching version of squeue) to show the job queue (<code>squeue -u $USER</code> for just your jobs);</p></li><br />
<li><p><code>squeue -j JOBID</code> to get information on a specific job</p><br />
<p>(alternatively, <code>scontrol show job JOBID</code>, which is more verbose).</p></li><br />
<li><p><code>squeue --start -j JOBID</code> to get an estimate for when a job will run; these tend not to be very accurate predictions.</p></li><br />
<li><p><code>scancel -i JOBID</code> to cancel the job.</p></li><br />
<li><p><code>jobperf JOBID</code> to get an instantaneous view of the cpu and memory usage of the nodes of the job while it is running.</p></li><br />
<li><p><code>sacct</code> to get information on your recent jobs.</p></li><br />
</ul><br />
<br />
Further instructions for monitoring your jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]]. The [https://my.scinet.utoronto.ca my.SciNet] site is also a very useful tool for monitoring your current and past usage.<br />
<br />
= Visualization =<br />
Information about how to use visualization tools on Niagara is available on the [[Visualization]] page.<br />
<br />
= Support =<br />
<br />
* [mailto:support@scinet.utoronto.ca support@scinet.utoronto.ca]<br />
* [mailto:niagara@computecanada.ca niagara@computecanada.ca]</div>Bmundim