Teach
Teach Cluster | |
---|---|
Installed | (orig Feb 2013), Oct 2018 |
Operating System | Linux (CentOS 7.4) |
Number of Nodes | 42 |
Interconnect | InfiniBand (QDR) |
RAM/Node | 64 GB |
Cores/Node | 16 |
Login/Devel Node | teach01 (from teach.scinet) |
Vendor Compilers | icc/gcc |
Queue Submission | Slurm |
Teaching Cluster
SciNet has assembled some older compute hardware into a small cluster provided primarily for teaching purposes. It is configured similarly to the production Niagara system but uses repurposed hardware. This system should not be used for production work; accordingly, the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time. Questions about its use, or reports of problems, should be sent to support@scinet.utoronto.ca.
Specifications
The cluster consists of 42 repurposed x86_64 nodes, each with two eight-core Intel Xeon (Sandy Bridge) E5-2650 2.0 GHz CPUs and 64 GB of RAM. The nodes are interconnected with 2.6:1 blocking QDR InfiniBand for MPI communications and disk I/O to the SciNet Niagara filesystems. In total, this cluster contains 672 x86_64 cores.
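As a quick sanity check on the core count quoted above, the arithmetic can be sketched in shell. The variable names are illustrative only and do not correspond to anything defined on the cluster.

```shell
# Node layout taken from the specifications above.
nodes=42
cpus_per_node=2
cores_per_cpu=8

# Cores available on a single node, then across the whole cluster.
cores_per_node=$(( cpus_per_node * cores_per_cpu ))
total_cores=$(( nodes * cores_per_node ))

echo "cores per node: ${cores_per_node}"
echo "total cores:    ${total_cores}"
```

This reports 16 cores per node and 672 cores in total, matching the figures in the table and the text.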
Login/Devel Node
Log in via ssh with your SciNet account to teach.scinet.utoronto.ca, which will bring you directly to teach01, the gateway/devel node for this cluster. From teach01 you can compile, run short tests, and submit your jobs to the queue.
Software Modules
module avail
Submit a Job
Teach uses Slurm as its job scheduler. More advanced details of how to interact with the scheduler can be found on the Slurm page.
You submit jobs from a login node by passing a script to the sbatch command:
teach01:~scratch$ sbatch jobscript.sh
This puts the job in the queue. It will run on the compute nodes in due course.
In most cases, you will want to submit from your $SCRATCH directory, so that the output of your compute job can be written out ($HOME is read-only on the compute nodes).
It is worth mentioning some differences between the Niagara and Teach clusters:
- Each Teach cluster node has two CPUs with 8 cores each, for a total of 16 cores per node. Make sure to adjust the --ntasks-per-node flag, or --ntasks together with --nodes, accordingly in the examples found on the Slurm page.
- The current Slurm configuration of the Teach cluster allocates compute resources by core rather than by node. That means your tasks might land on nodes that have other jobs running, i.e. they might share the node. To avoid that, add the following directive to your submission script: #SBATCH --exclusive. This forces your job to use its compute nodes exclusively.
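Putting the two points above together, a job script for Teach might look like the following minimal sketch. The job name, time limit, and output file name are hypothetical placeholders, and `uname -n` stands in for your own program. The fallback branch exists only so the script can be dry-run outside Slurm.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16          # Teach nodes have 16 cores each
#SBATCH --exclusive                   # do not share the node with other jobs
#SBATCH --time=0:15:00                # hypothetical short time limit
#SBATCH --job-name=teach-example      # hypothetical job name
#SBATCH --output=teach-example_%j.out # %j expands to the job ID

# Run from the directory the job was submitted from (ideally under $SCRATCH).
# Outside Slurm, SLURM_SUBMIT_DIR is unset, so stay in the current directory.
cd "${SLURM_SUBMIT_DIR:-$PWD}"

# Slurm exports these variables inside a job; the defaults mirror the
# directives above so the arithmetic also works in a dry run.
NODES="${SLURM_JOB_NUM_NODES:-1}"
TASKS_PER_NODE="${SLURM_NTASKS_PER_NODE:-16}"
TOTAL_TASKS=$(( NODES * TASKS_PER_NODE ))
echo "Running ${TOTAL_TASKS} tasks on ${NODES} node(s)"

# Placeholder workload: report the node name. Replace with your program.
if command -v srun >/dev/null 2>&1; then
    srun uname -n
else
    uname -n   # dry-run fallback when not on a Slurm system
fi
```

Saved as jobscript.sh in your $SCRATCH directory, this would be submitted with sbatch jobscript.sh as shown earlier. For a two-node job, set --nodes=2 and keep --ntasks-per-node=16, which requests 32 tasks in total.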