<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://docs.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Rzon</id>
	<title>SciNet Users Documentation - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://docs.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Rzon"/>
	<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php/Special:Contributions/Rzon"/>
	<updated>2026-05-08T12:47:05Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.12</generator>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7715</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7715"/>
		<updated>2026-05-01T16:56:09Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 30, 2026, 3:00 pm:''' Systems have been updated to mitigate known security risks, and are back in service. Note that no actual security breaches were found.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 29, 2026, 5:25 pm:''' For security reasons, login access to all systems has been disabled, as have the OnDemand apps. Compute jobs are still allowed to run.&lt;br /&gt;
 &lt;br /&gt;
'''Thu Apr 23, 2026, 10:00 am:''' The Trillium file system has issues and may be slow on certain nodes. We are still investigating.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7712</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7712"/>
		<updated>2026-05-01T16:00:23Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 30, 2026, 3:00 pm:''' Most systems have been updated to mitigate known security risks, and are back in service. Note that no actual security breaches were found.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 29, 2026, 5:25 pm:''' For security reasons, login access to all systems has been disabled, as have the OnDemand apps. Compute jobs are still allowed to run.&lt;br /&gt;
 &lt;br /&gt;
'''Thu Apr 23, 2026, 10:00 am:''' The Trillium file system has issues and may be slow on certain nodes. We are still investigating.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Previous_messages&amp;diff=7709</id>
		<title>Previous messages</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Previous_messages&amp;diff=7709"/>
		<updated>2026-05-01T15:59:21Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
'''Tue Apr 14, 2026, 10:30 am:''' The Trillium globus endpoint is working again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Apr 11, 2026, 10:00 pm:''' The Trillium globus endpoint is not operational (it times out). We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 09, 2026, 10:30 am:''' tri-dm4.scinet.utoronto.ca and robot4.scinet.utoronto.ca are under maintenance. Use 1, 2, or 3 instead.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 6:30 pm:''' Software from the CVMFS 'restricted' area is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 1:40 pm:''' Software from the CVMFS 'restricted' area is not available on many of the Trillium and Open OnDemand nodes. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 10:00 pm:''' We will have to reschedule the HPSS update; this attempt didn't work as expected.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes).&lt;br /&gt;
&lt;br /&gt;
'''Wed Mar 25, 2026, 5:00 pm:''' Trillium is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Mar 25, 2026, 9:00 am:''' Teach is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 8:45 pm:''' Open OnDemand is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 1:00 pm:''' External connectivity is back. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 12:05 pm:''' External connectivity to the data centre was lost. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 7:00 am:''' Maintenance has started.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 1:30 pm:''' Recovering. Almost all systems are up again. Please resubmit any jobs that crashed.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 12:00 pm:''' A power glitch at the data centre caused compute nodes to go down.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 4:15 pm:''' Connections to Trillium are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly for Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:''' The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th. All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and jobs will be held in the queue until maintenance is complete. Starting at 7 am on Mar 23, users are encouraged to submit small and short jobs that may be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1], [https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 16, 2026, 11:00 pm:''' HPSS is back online, and accessible via alliancecan#hpss Globus endpoint. &lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 15, 2026, 10:00 pm:''' HPSS will undergo maintenance on Friday morning, Jan/16/2026, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
'''Tue Jan 6, 2026, 10:15 am:''' OnDemand has been fixed and is working again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 5, 2026, 9:00 pm:''' The authentication mechanism of OnDemand is not working.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 31, 2025, 12:40 pm:''' We believe the problem has now been resolved.  Please let us know if you still experience login problems or aborted jobs.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 30, 2025, 2:10 pm:''' We are experiencing problems with authentication, resulting in failed logins, OOD errors, and aborted jobs (with &amp;quot;prolog error&amp;quot;).  Please bear with us, as we are very short-staffed during the holiday break.  We will post updates here.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 3, 2025, 11:30 am:''' Open OnDemand is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Nov 29, 2025, 12:40 am:''' There has been a problem with the water chiller. Some systems are offline.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 12:55 pm:''' Balam is back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 10:00 am:''' Open OnDemand is back online.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 11:00 pm:''' Most of the work is done, data movers, Globus, and HPSS are back online. Remaining services will be worked on tomorrow.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 8:30 am:''' Scheduled network maintenance. Trillium cluster is *not* affected.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 5:30 pm:''' Balam maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 7:00 am:''' Balam maintenance day.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 15, 2025, 3:55 pm:''' Trillium inbound connections through trillium.alliancecan.ca or trillium.scinet.utoronto.ca are working again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 15, 2025, 3:05 pm:''' Trillium is experiencing external network issues with incoming traffic. Please try ssh USERNAME@tri-login01.scinet.utoronto.ca in the meantime.&lt;br /&gt;
 &lt;br /&gt;
'''Thu Oct 06, 2025, 8:00 pm:''' HPSS is fully functional. You may submit archive jobs from the Trillium login nodes, datamovers, and robots.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 03, 2025, 6:30 pm:''' HPSS is back online, and already accessible via the alliancecan#hpss Globus endpoint. The directory tree now follows that of the other Alliance clusters. We're still working on job submission via Slurm.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 01, 2025, 12:00 am:''' Niagara compute nodes are now unavailable for regular users. The login nodes will remain available for a while to allow a few last data transfers, although transfers from the Niagara file systems to Trillium are best done on nia-dm1.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 01, 2025, 9:30 am:''' HPSS is down for scheduled maintenance, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Thu Sep 18, 2025, 11:30 am:''' Open OnDemand is fully functional again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 17, 2025, 6:00 pm:''' Niagara is back up as well (including its Globus endpoint).  We are still working on the other systems.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 17, 2025, 1:40 pm:''' Trillium is back up (except for its Globus endpoint). We are still working on the other systems.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 16, 2025, 5:45 pm:''' Unfortunately, we cannot bring all systems up yet because we are waiting for a spare part for the cooling system, which will arrive tomorrow. In the meantime, we have managed to keep the Trillium login nodes up, but not the other systems.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 16, 2025, from 7:00 am to 5:00 pm (EDT):''' The SciNet datacentre will undergo maintenance of several critical parts of the centre.  This will require a full shutdown of all SciNet systems (Trillium, Niagara, Mist, HPSS, Rouge, Teach, as well as hosted equipment). This will also be the time that the Mist cluster gets decommissioned. &lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 12 22:03:17 EDT 2025:''' HPSS software and OS upgrades are finished.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep  9 17:05:38 EDT 2025:''' Starting tomorrow, Sep/10, and for the following 3 days, HPSS will be down for software and OS upgrades. We will strive to finish sooner, at which time we will make the system available to users again.&lt;br /&gt;
&lt;br /&gt;
===Mist/Niagara Decommissioning Schedule===&lt;br /&gt;
&lt;br /&gt;
'''September 4, 2025'''&lt;br /&gt;
* Niagara reduced to 863 compute nodes.&lt;br /&gt;
&lt;br /&gt;
'''September 9, 2025'''&lt;br /&gt;
* Niagara's Open OnDemand decommissioned.&lt;br /&gt;
* Brief data centre connection outage at 9 AM EDT.&lt;br /&gt;
* Niagara reduced to 647 compute nodes at end of day.&lt;br /&gt;
&lt;br /&gt;
'''September 11, 2025'''&lt;br /&gt;
* Trillium Open OnDemand goes live.&lt;br /&gt;
&lt;br /&gt;
'''September 16, 2025'''&lt;br /&gt;
* '''Full-day data centre maintenance'''&lt;br /&gt;
* Niagara reduced to 431 compute nodes.&lt;br /&gt;
* Mist decommissioned.&lt;br /&gt;
&lt;br /&gt;
'''September 24, 2025'''&lt;br /&gt;
* Niagara reduced to 215 compute nodes.&lt;br /&gt;
&lt;br /&gt;
'''September 30, 2025'''&lt;br /&gt;
* Niagara decommissioned.&lt;br /&gt;
&lt;br /&gt;
'''August 25, 2025, 9:50 EDT:''' Open OnDemand is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''August 22, 2025, 3:15 PM EDT:''' Open OnDemand has issues launching new interactive apps. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''August 20, 2025, 10:00 AM EDT:''' The GPU scheduler on Trillium is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''August 19, 2025, 5:00 PM EDT:''' The GPU scheduler on Trillium has trouble scheduling multi-GPU jobs.  We're investigating the issue.&lt;br /&gt;
&lt;br /&gt;
'''August 7, 2025:''' CVMFS issues are resolved.&lt;br /&gt;
&lt;br /&gt;
'''August 6, 2025:''' We are seeing intermittent issues with the software on CVMFS on Niagara. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''July 31, 2025, 4:00 PM EDT - 5:00 PM EDT:''' As announced, all systems connected to the Niagara file system (Mist, Niagara, HPSS, Balam, and Rouge) will be paused and inaccessible for one hour to start the transfer of files from the Niagara file system to the Trillium file system. &lt;br /&gt;
&lt;br /&gt;
'''January 6, 2025:''' As part of the installation of the new computing cluster Trillium, there is now a permanent reduction in computing capacity of Niagara to 50% and of Mist to 35%.&lt;br /&gt;
&lt;br /&gt;
'''July 9, 2025:''' The [[Teach]] cluster will be unavailable for the day for network maintenance.&lt;br /&gt;
&lt;br /&gt;
'''July 4, 2025:''' Open OnDemand is back up.&lt;br /&gt;
&lt;br /&gt;
'''July 4, 2025:''' Open OnDemand is down. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''June 25, 2025, 7:15 PM EDT:''' The [[Teach]] cluster's scheduler is up again.&lt;br /&gt;
&lt;br /&gt;
'''June 25, 2025, 4:30 PM EDT:''' The [[Teach]] cluster's scheduler is down. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''April 30, 2025, 9:30 AM EDT:''' The [[Teach]] cluster is available again.&lt;br /&gt;
&lt;br /&gt;
'''April 30, 2025:''' The [[Teach]] cluster will be unavailable from 8:00 am to about 12:00 noon for file system maintenance.&lt;br /&gt;
&lt;br /&gt;
'''April 1, 2025:''' The Jupyter Hub has been replaced by SciNet's [[Open OnDemand Quickstart|Open OnDemand service]].&lt;br /&gt;
&lt;br /&gt;
'''March 1, 2025:''' As of March 1st, scratch purging is suspended until after Trillium comes online.&lt;br /&gt;
&lt;br /&gt;
'''April 15, 2025 12:40 pm EDT: '''Balam login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''April 15, 2025 12:10 pm EDT: '''Balam login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''April 9, 2025 9PM:''' HPSS is back online.&lt;br /&gt;
&lt;br /&gt;
'''April 8, 2025 9PM:''' HPSS is being reserved for OS updates on April 9 (Wednesday).&lt;br /&gt;
&lt;br /&gt;
'''March 31, 2025 3:20 pm EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''March 31, 2025 2:45 pm EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''March 28, 2025 3:00 pm - 4:00 pm EDT:''' A short maintenance was needed for the Teach compute nodes; you might have experienced some job scheduling delays on that cluster. &lt;br /&gt;
&lt;br /&gt;
'''March 20, 2025 10:30 am EDT:''' Teach compute nodes are back. &lt;br /&gt;
&lt;br /&gt;
'''March 19, 2025 11:00 pm EDT:''' Teach compute nodes are down again. &lt;br /&gt;
&lt;br /&gt;
'''March 19, 2025 5:15 pm EDT:''' Maintenance of the cooling system was performed successfully. The cluster is back online.&lt;br /&gt;
&lt;br /&gt;
'''March 19, 2025 8:00 am - 5:00 pm EDT:''' Maintenance of the cooling system as well as preparations for the Trillium cluster will require a shutdown of the compute nodes of all SciNet systems (Niagara, Mist, Rouge, Balam, Teach, as well as hosted equipment). The login nodes, file systems and the HPSS system will remain available. The scheduler will hold jobs that are submitted until the maintenance has finished.&lt;br /&gt;
&lt;br /&gt;
'''March 18, 2025 10:00 am EDT:''' Teach compute nodes are back.&lt;br /&gt;
&lt;br /&gt;
'''March 17, 2025 10:00 pm EDT:''' Teach compute nodes are down. We are working on it. &lt;br /&gt;
&lt;br /&gt;
'''February 27, 2025 9:00 pm EST:''' Access to HPSS via Globus has been restored.&lt;br /&gt;
&lt;br /&gt;
'''February 25, 2025 2:30 pm EST:''' Access to HPSS via Globus is currently suspended (sorry, a trivial upgrade has gone wrong).&lt;br /&gt;
&lt;br /&gt;
'''February 25, 2025 12:30 pm EST:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''February 25, 2025 11:50 am EST:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''February 7, 2025 2:45 pm EST:''' Systems are back online.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb  7 01:04:33 EST 2025:''' There has been a problem with the water chiller, triggering an automatic thermal shutdown of the compute nodes.&lt;br /&gt;
&lt;br /&gt;
'''January 31, 2025 11:45 am EST:''' Power is back.&lt;br /&gt;
&lt;br /&gt;
'''January 31, 2025 6:00 am EST:''' Power outage in the data centre. Many compute jobs will have stopped. Until power is restored, parts of the systems are running on the generator. No ETA on full power restoration.&lt;br /&gt;
 &lt;br /&gt;
'''January 28, 2025 9:30 pm EST:''' The CCEnv stack has been restored.&lt;br /&gt;
&lt;br /&gt;
'''January 28, 2025 5:00 pm EST:''' The CCEnv stack from CVMFS has issues and may not work reliably.&lt;br /&gt;
&lt;br /&gt;
'''January 23, 2025 9:00 am - 1:00 pm EST:''' Balam, Rouge and Neptune compute nodes will be shut down from 9 AM to 1 PM EST for additional electrical work.&lt;br /&gt;
&lt;br /&gt;
'''January 22, 2025 12:55 pm EST:''' Compute nodes are back online.&lt;br /&gt;
&lt;br /&gt;
'''January 22, 2025 8:00 am - 5:00 pm EST:''' Preparations for the new system Trillium will require a shutdown of the compute nodes of all SciNet systems (Niagara, Mist, Rouge, Teach, as well as hosted equipment) from 8 AM to 5 PM EST. The login nodes, file systems and the HPSS system will remain available. The scheduler will hold jobs that are submitted until the maintenance has finished.&lt;br /&gt;
&lt;br /&gt;
'''January 9, 2025 11:00 am EST:''' Systems are back online.&lt;br /&gt;
&lt;br /&gt;
'''January 8, 2025 10:34 pm EST:''' We had some sort of thermal event at the data centre, and the clusters are down. We're still investigating.&lt;br /&gt;
&lt;br /&gt;
'''January 8, 2025 08:00 am EST:''' Balam, Rouge and Neptune are shut down for electrical upgrades.&lt;br /&gt;
&lt;br /&gt;
'''January 6, 2025:''' As part of the installation of the new computing cluster Trillium, there will be a (permanent) reduction in computing capacity of Niagara and Mist. Only 50% of Niagara and 35% of Mist will remain active after January 6th. The reduction will require Mist to be shut down for a few hours on January 6th. Balam, Rouge and Neptune will be shut down on Wednesday January 8th for the same reason.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''December 20, 2024 09:00 am EST:''' The Open OnDemand service will not be available on Dec 20 from 9 a.m. to 5 p.m. due to scheduled maintenance.&lt;br /&gt;
&lt;br /&gt;
'''December 16, 2024, 08:21 am EST:''' The Niagara scheduler has been restarted.&lt;br /&gt;
  &lt;br /&gt;
'''December 16, 2024, 00:04 am EST:''' The Niagara scheduler has an issue; we are investigating.&lt;br /&gt;
  &lt;br /&gt;
'''Fri Nov 8, 2024, 09:45 AM EST.''' Balam and Rouge schedulers are back online.&lt;br /&gt;
&lt;br /&gt;
'''Thu Nov 7, 2024, 10:30 PM EST.''' Most systems are up, except for the schedulers on Balam and Rouge (their login nodes are up), and a few 'neptune' Niagara nodes.&lt;br /&gt;
&lt;br /&gt;
'''Thu Nov 7, 2024, 5:30 PM EST:''' Systems are being brought up, but not yet available for users.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:''' On Thu Nov 7, 2024, all systems and storage located at the SciNet Datacenter (Niagara, Mist, HPSS, Rouge, Teach, JupyterHub, Balam) will be unavailable from 7 a.m. to 5 p.m. ET.&lt;br /&gt;
This outage is required to install new electrical equipment (UPS) for the upcoming systems refresh. The work is expected to be completed in one day.&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of the shutdown. Users are encouraged to submit small and short jobs that can take advantage of this, as the scheduler may be able to fit these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 24 15:05 EDT 2024''': Cooling pump motor has been replaced. All systems are back to normal.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 22 16:35 EDT 2024''': The motor is scheduled for replacement on Thursday, Oct 24.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 21 17:15 EDT 2024''': Compute nodes will remain down until we can replace the main cooling pump.  This may take several days.  Please see this page for updates.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 21 12:15 EDT 2024''': Compute nodes have been shut down due to a cooling system failure.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 18 21:40 EDT 2024''': Systems are back to normal.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 18 21:15 EDT 2024''': We are experiencing technical difficulties, apparently caused by a glitch in the file systems.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 1 10:45 EDT 2024''': The Jupyter Hub service will be rebooted today at around 11:00 am EDT for system upgrades. &lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 3 07:00 EDT 2024''': Intermittent file system issues may cause problems logging in. We are in the process of resolving the issue.&lt;br /&gt;
&lt;br /&gt;
'''Sun Sep 1 00:01 - 04:00 EDT 2024''': Network maintenance may cause connection issues to the datacentre.&lt;br /&gt;
&lt;br /&gt;
'''Thu Aug 22 13:30:00 EDT 2024''': Chiller issue caused about 25% of Niagara compute nodes to go down; users should resubmit any affected jobs.&lt;br /&gt;
&lt;br /&gt;
'''Wed Aug 21 16:35:00 EDT 2024''': Maintenance finished; compute nodes are now available for user jobs.&lt;br /&gt;
&lt;br /&gt;
'''Wed Aug 21 7:00:00 EDT 2024''': Maintenance started.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 19:15:00 EDT 2024''': Issues have been resolved.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 14:30:00 EDT 2024''': Power issues seem to have brought compute nodes down, and compounded the file system issues we had earlier.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 10:31:53 EDT 2024''': GPFS is back online, and seems to be holding.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 08:44:40 EDT 2024''': Sorry, problems with the GPFS file systems are recurring.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 07:59:02 EDT 2024''': GPFS file systems are back to normal. Many jobs have died and will need to be resubmitted.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 06:39:12 EDT 2024''': Support staff detected the problem and started working on a fix.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 00:53:52 EDT 2024''': GPFS file systems (home, scratch, project) started to show initial signs of problems.&lt;br /&gt;
&lt;br /&gt;
'''August 21, 2024''': The annual cooling tower maintenance for the SciNet data centre will take place on August 21, 2024 from 7 a.m. EDT until the end of day. This maintenance requires a shutdown of the compute nodes of all SciNet systems (Niagara, Mist, Rouge, Teach, as well as hosted equipment). The login nodes, file systems and the HPSS system will remain available.&lt;br /&gt;
&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of the shutdown. Users are encouraged to submit small and short jobs that can take advantage of this, as the scheduler may be able to fit these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
&lt;br /&gt;
'''Thursday, August 1, 10:00 PM EDT''' Filesystem problems resolved.&lt;br /&gt;
&lt;br /&gt;
'''Thursday, August 1, 9:30 PM EDT''' Filesystem problems are preventing logins to the systems. We are working on it.&lt;br /&gt;
&lt;br /&gt;
'''Monday, July 22, 11:50 AM EDT''' Systems are back to normal.&lt;br /&gt;
&lt;br /&gt;
'''Monday, July 22, 10:50 AM EDT''' The cooling problem has been fixed. Systems are coming up.&lt;br /&gt;
&lt;br /&gt;
'''Monday, July 22, 10:20 AM EDT''' Compute nodes have been shut down due to a cooling tower failure.&lt;br /&gt;
&lt;br /&gt;
'''Friday, July 19, 9:30 AM EDT''' CCEnv modules available on all login nodes again.&lt;br /&gt;
&lt;br /&gt;
'''Friday, July 19, 5:00 AM EDT''' Some login nodes do not have the CCEnv modules available.  We are working on a fix.&lt;br /&gt;
&lt;br /&gt;
'''Monday, Jun 3, 12:55 PM EDT''' All systems are recovered now.&lt;br /&gt;
&lt;br /&gt;
'''Monday, Jun 3, 10:50 AM EDT''' The file system issues affect all nodes, so all systems are inaccessible to users at the moment. No time estimate yet for when the systems may be back.&lt;br /&gt;
&lt;br /&gt;
'''Monday, Jun 3, 7:58 AM EDT''' Login issues for Niagara and Mist. There are file system issues as well. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sunday, Jun 2, 12:00 PM EDT''' CCEnv modules are missing; we are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Wednesday May 29, 5:50 PM EDT''' Niagara compute nodes are up.  &lt;br /&gt;
&lt;br /&gt;
'''Wednesday May 29, 4:40 PM EDT''' Niagara compute nodes are coming up.  &lt;br /&gt;
&lt;br /&gt;
'''Wednesday May 29, 4 PM EDT''' Niagara login nodes and JupyterHub are up; the file system is now accessible.&lt;br /&gt;
&lt;br /&gt;
'''Wednesday May 29, 2 PM EDT''' Electricians are checking and testing all junction boxes and connectors under the raised floor for safety.  Some systems are expected to be back up later today (storage, login nodes), and compute systems will be powered up as soon as it is deemed safe.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday May 28, 3 PM EDT''' Cleaning crews are at the datacentre, to pump the water and install dryers.  Once the floors are dry, we need to inspect all electrical boxes to ensure safety.  We do not expect to have a fully functional datacentre before Thursday, although we hope to be able to turn on the storage and login nodes sometime tomorrow, if circumstances permit.  Apologies, and thank you for your patience.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Tuesday May 28, 7 AM EDT''' A water mains break outside our datacentre has caused extensive flooding, and all systems have been shut down preventatively. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Friday May 17, 10 PM EDT - Saturday May 18, 2 AM EDT:''' The external network will be unavailable for maintenance. Running and queued jobs on the systems will not be affected.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday May 14, 6:45 PM EDT:''' All systems are recovered now.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday May 14, 5 PM EDT:''' Power loss at the datacentre resulted in loss of cooling.  Systems are being restored.&lt;br /&gt;
&lt;br /&gt;
'''Friday May 3, 10 PM EDT - Saturday May 4, 2 AM EDT:''' The external network will be unavailable for maintenance. Running and queued jobs on the systems will not be affected.&lt;br /&gt;
&lt;br /&gt;
'''Wednesday April 17, 2024: 11:00 ''' The restart of the Niagara login nodes has been completed successfully.&lt;br /&gt;
&lt;br /&gt;
'''Wednesday April 17, 2024: 09:40 ''' Niagara login nodes will be rebooted.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday April 16, 2024: 12:45 ''' mist-login01 is recovered now.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday April 16, 2024: 11:45 ''' mist-login01 will be unavailable due to maintenance from 12:15 to 12:45. Following the completion of maintenance, login access should be restored.&lt;br /&gt;
&lt;br /&gt;
'''Monday April 15, 2024: 13:02 ''' Balam-login01 will be unavailable due to maintenance from 13:00 to 13:30. Following the completion of maintenance, login access should be restored and available once more. &lt;br /&gt;
&lt;br /&gt;
'''Monday March 18, 2024: 14:45 ''' File system issue resolved.  Users are advised to check if their running jobs were affected, and if so, to resubmit.&lt;br /&gt;
&lt;br /&gt;
'''Monday March 18, 2024: 13:02 ''' File system issues.  This affects the ability to log in. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Monday March 11, 2024: 14:05 ''' All systems are recovered now&lt;br /&gt;
&lt;br /&gt;
'''Monday March 11, 2024:''' There will be a shutdown of the file system at SciNet for an emergency repair. As a consequence, the login nodes and compute nodes of all SciNet clusters using the file system (Niagara, Mist, Balam, Rouge, and Teach) will be down from 11 am EST until later in the afternoon. &lt;br /&gt;
&lt;br /&gt;
'''February 28, 2024, 16:30 EDT:''' All systems are recovered now.&lt;br /&gt;
&lt;br /&gt;
'''February 28, 2024, 1:00 PM EDT:''' A loop pump fault caused many compute nodes to overheat. If your jobs failed around this time, please resubmit. Once the root cause has been addressed, the cluster will be brought up completely. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''February 22, 2024, 5:45 PM EDT:''' Maintenance finished and system restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''February 21, 2024, 7:00 AM EDT:''' Maintenance starting.  Niagara login nodes and the file system are kept up as much as possible, but will be rebooted at some point.&lt;br /&gt;
&lt;br /&gt;
'''February 20, 2024, 3:45 PM EDT:''' Cooling tower has been restored, all systems are in production. &lt;br /&gt;
&lt;br /&gt;
'''February 20, 2024, 1:30 AM EDT:''' Cooling tower malfunction, all compute nodes are shutdown, the root cause will be addressed earliest in the morning.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;&amp;lt;b&amp;gt; February 21 and 22, 2024: SciNet Data Centre Maintenance:&amp;lt;/b&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
This annual winter maintenance involves a full data centre shutdown&lt;br /&gt;
starting at 7:00 am EST on Wednesday, February 21st.  None of the&lt;br /&gt;
SciNet systems (Niagara, Mist, Rouge, Teach, the file systems, as&lt;br /&gt;
well as hosted equipment) will be accessible.  All systems should be&lt;br /&gt;
fully available again in the late afternoon of the 22nd.&lt;br /&gt;
&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of&lt;br /&gt;
the shutdown. Users are encouraged to submit small and short jobs&lt;br /&gt;
that can take advantage of this, as the scheduler may be able to fit&lt;br /&gt;
these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
&lt;br /&gt;
'''Mon January 29, 08:20    (EST):''' Access to Niagara login nodes restored (it was an internal routing issue).&lt;br /&gt;
&lt;br /&gt;
'''Mon January 29, 07:35    (EST):''' No access to Niagara login nodes.  We are investigating.  Use the Mist login to get access to SciNet systems.&lt;br /&gt;
&lt;br /&gt;
'''Wed January 24, 15:20    (EST):''' maintenance on rouge-login01 &lt;br /&gt;
&lt;br /&gt;
'''Wed January 24, 14:55    (EST):''' Rebooting rouge-login01 &lt;br /&gt;
&lt;br /&gt;
'''Tue January 23, 10:25 am (EST):''' Mist-login01 maintenance done &lt;br /&gt;
&lt;br /&gt;
'''Tue January 23, 10:10 am (EST):''' Rebooting Mist-login01 to deploy new image&lt;br /&gt;
&lt;br /&gt;
'''Tue January 22, 21:00 (EST):''' HPSS performance for hsi &amp;amp; htar clients is back to normal.&lt;br /&gt;
&lt;br /&gt;
'''Tue January 20, 11:50 am (EST):''' HPSS hsi/htar/VFS jobs will remain in PD state in the queue over the weekend, so we may work on archive02/vfs02 on Monday and try to improve transfer performance. In the meantime you may use Globus (computecanada#hpss) if your workflow is suitable. &lt;br /&gt;
&lt;br /&gt;
'''Tue January 14, 13:20 (EST):''' The ongoing HPSS jobs from Friday finished earlier, so we restarted HPSS sooner and released the PD jobs in the queue. &lt;br /&gt;
&lt;br /&gt;
'''Tue January 12, 10:40 am (EST):''' We have applied some tweaks to the HPSS configuration to improve performance, but they won't take effect until we restart the services, which is scheduled for Monday morning. If over the weekend we notice that there are no HPSS jobs running in the queue, we may restart HPSS sooner. &lt;br /&gt;
&lt;br /&gt;
'''Tue January 09, 9:10 am (EST):''' Remaining cvmfs issues cleared.&lt;br /&gt;
&lt;br /&gt;
'''Tue January 09, 8:00 am (EST):''' We're investigating remaining issues with cvmfs access on login nodes.&lt;br /&gt;
&lt;br /&gt;
'''Mon January 08, 21:50 pm (EST):''' File systems are back to normal. Please resubmit your jobs.  &lt;br /&gt;
&lt;br /&gt;
'''Mon January 08, 9:10 pm (EST):''' We had a severe deadlock, and some disk volumes went down. The file systems are being recovered now. It could take another hour.&lt;br /&gt;
&lt;br /&gt;
'''Mon January 08, 7:20 pm (EST):''' We seem to have a problem with the file system, and are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue December 19, 2:45 pm (EST):''' Compute nodes are available again.  &lt;br /&gt;
&lt;br /&gt;
'''Tue December 19, 12:09 pm (EST):''' Maintenance was postponed by one hour. &lt;br /&gt;
&lt;br /&gt;
'''Tue December 19, 12 noon - 1 pm (EST):''' There will be a shutdown of the compute nodes of the Niagara, Mist and Rouge clusters to allow for an emergency repair to the cooling tower.  Login nodes will remain available but no jobs will run during that time.  Updates will be posted here.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec  11 11:17:00 EST 2023:''' File systems recovered; Niagara and Mist are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec  11 7:51:00 EST 2023:''' Niagara's login nodes are being overwhelmed.  We are investigating. Likely file-system related.&lt;br /&gt;
&lt;br /&gt;
'''Thu Dec  7 10:01:24 EST 2023:''' Niagara's scheduler is rebooting for security patches.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec  6 13:06:46 EST 2023:''' Endpoint computecanada#niagara transition from Globus GCSv4 to GCSv5 is completed. computecanada#niagara-GCSv4 has been deactivated&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec  4 16:35:07 EST 2023:''' Endpoint computecanada#niagara has now been upgraded to Globus GCSv5. The old endpoint is still available as computecanada#niagara-GCSv4 on nia-datamover2, only until Wednesday, at which time we'll disable it as well.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec  4 11:54:49 EST 2023:''' The nia-datamover1 node will be offline this Monday afternoon for the Globus GCSv5 upgrade. Endpoint computecanada#niagara-GCSv4 will still be available via nia-datamover2.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 28 16:29:14 EST 2023:''' The computecanada#hpss Globus endpoint is now running GCSv5. We'll find a window of opportunity next week to upgrade computecanada#niagara to GCSv5 as well.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 28 14:20:30 EST 2023:''' The computecanada#hpss Globus endpoint will be offline for the next few hours for the GCSv5 upgrade.&lt;br /&gt;
&lt;br /&gt;
'''Fri Nov 10, 2023, 18:00 EDT:''' The HPSS upgrade is finished. We didn't have time to update Globus to GCSv5, so we'll find a window of opportunity to do this next week. &lt;br /&gt;
&lt;br /&gt;
Please be advised that starting this &amp;lt;B&amp;gt;Friday morning, Nov 10, we'll be upgrading the HPSS system from version 8.3 to 9.3 and the HPSS Globus server from GCSv4 to GCSv5.&amp;lt;/B&amp;gt; If all goes well, we expect to be back online by the end of the day.  &lt;br /&gt;
&lt;br /&gt;
'''Fri Nov 3, 2023, 12:20 PM EDT:''' The &amp;quot;Niagara at Scale&amp;quot; event has finished. Niagara is available again for all users.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 31, 2023, 12 PM EDT:''' The &amp;quot;Niagara at Scale&amp;quot; event has started.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 31, 2023, 12:00 PM EDT - Fri Nov 3, 2023, 12:00 PM EDT:''' Three-day reservation for the &amp;quot;Niagara at Scale&amp;quot; event. Only &amp;quot;Niagara at Scale&amp;quot; projects will run on the compute nodes. Users are encouraged to submit small and short jobs that could run before this event.  Throughout the event, users can still log in, access their data, and submit jobs, but these jobs will not run until after the event. Note that the debugjob queue will remain available to everyone as well.&lt;br /&gt;
&lt;br /&gt;
''' Fri Oct 27 11:16 AM EDT:''' SSH keys are gradually being restored, estimated to complete by 1:15 PM.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 27, 2023, 8:00 EDT:''' SSH key login authentication with CCDB keys is currently not working on many Alliance systems.  It appears this started last night. The issue is being investigated.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 26, 2023, 12:35 EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 26, 2023, 12:05 EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 25 7:54 PM EDT:''' slurm-*.out now outputs job info for the last array job.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 24 12:00 PM EDT:''' Network appears to be up.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 24 11:32 AM EDT:''' Campus network issues.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 05, 2023, 12:05 PM EDT:''' Niagara scheduler is back online.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 05, 2023, 11:50 AM EDT:''' Niagara scheduler is temporarily under maintenance for security updates. &lt;br /&gt;
&lt;br /&gt;
''' Thu Sep 28, 2023 11:00 am''': Niagara scheduler is back online.&lt;br /&gt;
&lt;br /&gt;
''' Thu Sep 28, 2023 10:50 am''': Niagara scheduler is temporarily under maintenance for security updates.&lt;br /&gt;
&lt;br /&gt;
''' Wed Sep 27, 2023 11:35 am''': Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
''' Wed Sep 27, 2023 11:00 am''': Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
''' Wed Sep 6, 2023 11:30 am''': Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
''' Wed Sep 6, 2023 11:00 am''': Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
''' Fri Aug 25, 2023 00:19''': A power glitch brought some compute nodes down; users should resubmit any affected jobs. The Jupyterhub had to be restarted for the same reason.&lt;br /&gt;
&lt;br /&gt;
''' Mon Aug 14, 2023 12:10 pm''': Network problems with Teach cluster are now resolved and it is again available for users.&lt;br /&gt;
&lt;br /&gt;
''' Mon Aug 14, 2023 11:40 am''': Network problems with Teach cluster. We are investigating.&lt;br /&gt;
&lt;br /&gt;
''' Thu Aug 3, 2023 11:10 am''': Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
''' Thu Aug 3, 2023 10:40 am''': Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
''' Tue Aug 1, 2023 2:43 pm''': To recover from the power glitch, all servers on the SciNet jupyterhub have been stopped. Please restart your server if you need to.&lt;br /&gt;
&lt;br /&gt;
''' Tue Aug 1, 2023 11:46 am''': There was a power glitch at 11:46 Aug 1, 2023, causing a significant number of job losses. Please resubmit your jobs.&lt;br /&gt;
&lt;br /&gt;
'''Summer Maintenance Shutdown Finished''' -- Slurm upgraded to version 23.02.3.&lt;br /&gt;
Change to be aware of: SLURM_NTASKS is only set if the --ntasks option is set.&lt;br /&gt;
Details at: https://bugs.schedmd.com/show_bug.cgi?id=17108&lt;br /&gt;
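As an illustration of the SLURM_NTASKS change above (a hedged sketch, not an official SciNet script; the variable name NTASKS and the fallback value are assumptions for demonstration): job scripts that previously relied on SLURM_NTASKS being set implicitly should now fall back to a default when --ntasks was not requested.

```shell
#!/bin/bash
# Sketch: since Slurm 23.02, SLURM_NTASKS is only set when --ntasks is
# given explicitly (see the bug report above). Guard against it being unset.
NTASKS="${SLURM_NTASKS:-1}"   # fall back to 1 task if the variable is unset
echo "launching with ${NTASKS} task(s)"
```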
&lt;br /&gt;
'''July 17 and 18, 2023''':  &amp;lt;span style='color:red'&amp;gt;Announcement: Summer Maintenance Shutdown&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt; &lt;br /&gt;
&lt;br /&gt;
'''July 17th, 2023''' This maintenance involves a full data centre shutdown, which will start at 7:00 a.m. ET on Monday July 17th, 2023. None of the SciNet systems (Niagara, Mist, Rouge, Teach, the file systems, as well as hosted equipment) will be accessible.&lt;br /&gt;
&lt;br /&gt;
'''July 18th, 2023''' The shutdown will last until Tuesday July 18th, 2023. Systems are expected to be fully available in the evening of that day.&lt;br /&gt;
&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of the shutdown. Users are encouraged to submit small and short jobs that can take advantage of this, as the scheduler may be able to fit these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
&lt;br /&gt;
'''Wed Jun 21 16:03:45 EDT 2023:''' Niagara's scheduler maintenance is finished.&lt;br /&gt;
&lt;br /&gt;
'''Wed Jun 21 15:42:00 EDT 2023:''' Niagara's scheduler is rebooting in 10 minutes for a short maintenance down time.&lt;br /&gt;
&lt;br /&gt;
'''Wed Jun 21, 2023, 11:25 AM EDT:''' Maintenance is finished and Teach cluster is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Jun 20, 2023, 9:55 AM EDT:''' Teach cluster is powered off for maintenance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style='color:red'&amp;gt;'''Tue June 20, 2023:'''  Announcement:&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt; The Teach cluster at SciNet will undergo a maintenance shutdown starting on Tuesday June 20, 2023.  It will likely take a few days before it will be available again.  Check here for updates.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jun 5, 2023, 2:35 PM EDT:''' All systems are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jun 5, 2023, 11:55 AM EDT:''' There were issues with the cooling system.  The login nodes and file systems are now accessible again, but compute nodes are still off.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jun 5, 2023, 6:55 AM EDT:''' Issues at the data center, we are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sat May 27, 2023, 21:00 EDT:''' We have been able to mitigate the UPS issue for now, until new parts arrive sometime during the week. Systems will be accessible soon.&lt;br /&gt;
&lt;br /&gt;
'''Sat May 27, 2023, 16:00 EDT:''' We identified a UPS/power-related issue at the datacentre that is adversely affecting several components, in particular all file systems. Out of an abundance of caution we are shutting down the cluster until the UPS situation is resolved. Ongoing jobs will be canceled.&lt;br /&gt;
&lt;br /&gt;
'''Sat May 27, 2023, 11:18 AM EDT:''' Filesystem issues, investigating.&lt;br /&gt;
&lt;br /&gt;
'''Wed May 24, 2023, 11:40AM EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Wed May 24, 2023, 11:10 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 15, 2023, 10:08 AM EDT''' rebooting Mist-login node again &lt;br /&gt;
&lt;br /&gt;
'''Mon May 15, 2023, 09:15 AM EDT''' rebooting Mist-login node&lt;br /&gt;
&lt;br /&gt;
'''Mon May 01, 2023, 04:00 PM EDT''' done rebooting nia-login nodes&lt;br /&gt;
&lt;br /&gt;
'''Mon May 01, 2023, 12:00 PM EDT''' rebooting all nia-login nodes one at a time &lt;br /&gt;
&lt;br /&gt;
'''Mon May 01, 2023, 11:00 AM EDT''' nia-login07 is going to be rebooted.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 20, 2023, 12:05 PM EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 20, 2023, 11:30 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 20, 2023, 8:27 AM EDT:''' Intermittent file system issues. We are investigating.  For now (10:45 AM), the file systems appear operational.&lt;br /&gt;
&lt;br /&gt;
'''Fri 14 Apr 2023 10:25 AM EDT:''' Switch problem resolved.&lt;br /&gt;
&lt;br /&gt;
'''Fri 14 Apr 2023 10:10 AM EDT:''' A switch problem is affecting access to certain equipment at the SciNet data center, including the Teach cluster.  Niagara and Mist are accessible.&lt;br /&gt;
&lt;br /&gt;
'''Fri 14 Apr 2023 09:55 AM EDT:''' SciNet Jupyter Hub maintenance is finished and it is again available for users.&lt;br /&gt;
&lt;br /&gt;
'''Fri 14 Apr 2023:''' SciNet Jupyter Hub will be restarted for system updates this morning.  Remember to save your notebooks!&lt;br /&gt;
&lt;br /&gt;
'''Thu 06 Apr 2023 03:40 PM EDT:''' Rouge cluster is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Thu 06 Apr 2023 01:00 PM EDT:''' Rouge cluster is temporarily inaccessible to users due to the electrical work.&lt;br /&gt;
&lt;br /&gt;
'''Sun 02 Apr 2023 03:37 AM EDT:''' IO/read errors on the file system seem to have been fixed. Please resubmit your jobs, and report any further problems to support. Burst Buffer will remain offline for now.&lt;br /&gt;
&lt;br /&gt;
'''Sun 02 Apr 2023 00:18 EDT:''' File System is back up, but there seems to be some IO/read errors. All running jobs have been killed. Please hold off on submitting jobs until further notice.&lt;br /&gt;
&lt;br /&gt;
'''Sat 01 Apr 2023 10:17 PM EDT:''' We are having issues with the File System. Currently investigating the cause.&lt;br /&gt;
&lt;br /&gt;
'''Fri 31 Mar 2023 11:00 PM EDT:''' Burst Buffer may be the culprit. We are investigating but may have to take Burst Buffer offline. &lt;br /&gt;
&lt;br /&gt;
'''Fri 31 Mar 2023 01:30 PM EDT:''' File system issues causing trouble for some jobs on Niagara and Mist&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Tue 28 Mar 2023 11:05 AM EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Tue 28 Mar 2023 10:35 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Fri 17 Mar 2023 14:50 EDT:''' All systems online.&lt;br /&gt;
&lt;br /&gt;
'''Fri 17 Mar 2023 11:00 AM EDT:''' Problem identified and repaired. Starting to bring up systems, but not available to users yet.&lt;br /&gt;
&lt;br /&gt;
'''Fri 17 Mar 2023 09:15:39 EDT:''' Staff on site and ticket opened with cooling contractor; cause of failure unclear.&lt;br /&gt;
&lt;br /&gt;
'''Fri 17 Mar 2023 01:47:43 EDT:''' Cooling system malfunction, datacentre is shut down.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Tue Feb 28, 16:40 EST:&amp;lt;/b&amp;gt; All systems are back online.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Tue Feb 28, 15:30 EST:&amp;lt;/b&amp;gt; Maintenance is complete. Bringing up systems.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Tue Feb 28, 7:10 AM EST:&amp;lt;/b&amp;gt; Maintenance shutdown resuming.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Mon Feb 27, 3:55 PM EST:&amp;lt;/b&amp;gt; Maintenance paused as parts were delayed. The maintenance will resume tomorrow (Tue Feb 28) at 7AM EST for about 5 hours.  In the meantime, the login nodes of the systems will be brought online.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Mon Feb 27, 7:20 AM EST:&amp;lt;/b&amp;gt; Maintenance shutdown started.&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;&amp;lt;b&amp;gt; February 27 and 28, 2023: SciNet Data Centre Maintenance:&amp;lt;/b&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
This annual winter maintenance involves a full data centre shutdown&lt;br /&gt;
starting at 7:00 a.m. EST on Monday, February 27. None of the SciNet&lt;br /&gt;
systems (Niagara, Mist, Rouge, Teach, the file systems, as well as&lt;br /&gt;
hosted equipment) will be accessible.&lt;br /&gt;
&lt;br /&gt;
On the second day of the maintenance, Niagara, Mist, and their file&lt;br /&gt;
systems are expected to become partially available for users.  All&lt;br /&gt;
systems should be fully available in the evening of the 28th.&lt;br /&gt;
&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of&lt;br /&gt;
the shutdown. Users are encouraged to submit small and short jobs&lt;br /&gt;
that can take advantage of this, as the scheduler may be able to fit&lt;br /&gt;
these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Feb 17, 2023, 11:15 PM EST:&amp;lt;/b&amp;gt; File system issues on Teach fixed and Teach is accessible again. Note that the file system of Teach is not very good at handling many remote vscode connections.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Feb 17, 2023, 11:02 PM EST:&amp;lt;/b&amp;gt; File system issues on Teach.  We are working on a fix.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sun Feb 12, 2023, 3:05 PM EST&amp;lt;/b&amp;gt; All systems are back online.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sun Feb 12, 2023, 2:10 PM EST&amp;lt;/b&amp;gt; Power restored, clusters are being started.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sat Feb 11, 2023, 2:35 PM EST&amp;lt;/b&amp;gt; Power interruption started. All compute nodes will be down, likely until Sunday afternoon.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sat Feb 11, 2023, 1:20 PM EST&amp;lt;/b&amp;gt; There is to be an emergency power repair on the adjacent street. The datacentre will be switching over to generator power. All compute nodes will be down.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Feb 10, 2023, 10:55 AM EST&amp;lt;/b&amp;gt; All systems are back online.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Feb 10, 2023, 10:00 AM EST&amp;lt;/b&amp;gt; Cooling issue resolved, cluster is being started.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Wed Jan 25, 2023, 02:15 PM EST&amp;lt;/b&amp;gt; Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Wed Jan 25, 2023, 10:30 AM EST&amp;lt;/b&amp;gt; Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Mon Jan 23, 2023, around 7-8 AM EST&amp;lt;/b&amp;gt; Intermittent file system issues may have killed your jobs. Users are advised to resubmit.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sat Jan 21, 2023, 00:50 EST&amp;lt;/b&amp;gt; Niagara, Mist, Rouge and the filesystems are up&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Jan 20, 2023, 11:19 PM EST&amp;lt;/b&amp;gt; Systems are coming up. We have determined that there was a general power glitch in the area of our datacentre. The power has been fully restored.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Jan 20, 2023, 10:34 PM EST&amp;lt;/b&amp;gt; Cooling is back. Systems are slowly coming up.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Jan 20, 2023, 8:20 PM EST&amp;lt;/b&amp;gt; A cooling failure at the data center, possibly due to a power glitch. We are investigating.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Thu Jan 12, 2023, 9:30 AM EST&amp;lt;/b&amp;gt; File system is experiencing issues. Issues have stabilized, but jobs running around this time may have been affected.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 21, 2022, 12:00 PM: ''' Please note that SciNet is on vacation, together with the University of Toronto. Full service will resume on Jan 2, 2023. We will endeavour to keep systems running, and answer tickets, on a best-effort basis.  Happy Holidays!!!&lt;br /&gt;
&lt;br /&gt;
'''Fri Dec 16, 2022, 2:19 PM: ''' City power glitch caused all compute nodes to reboot. Please resubmit your jobs.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec 12, 2022, 9:30 AM - 11:30:''' File system issues caused login issues and may have affected running jobs.  System back to normal now, but users may want to check any jobs they had running. &lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 7, 2022, 11:40 AM EST:''' Systems are being brought back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 7, 2022, 09:00 AM EST:''' Maintenance is underway.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style='color:red'&amp;gt;&amp;lt;b&amp;gt;Announcement:&amp;lt;/b&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On '''Wednesday December 7th, 2022''', the file systems of SciNet's systems (Niagara, Mist, HPSS, and the Teach cluster) will undergo maintenance starting at 9:00 am EST.  During the maintenance, there will be no access to any of these systems, as it requires all file system operations to have stopped.  The maintenance should take about 1 hour, and all systems are expected to become available again later that morning.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 30, 2022, 14:45 EST:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 30, 2022, 14:15 EST:''' Mist login node is under maintenance and temporarily inaccessible to users. &lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 18:00 EDT:''' Systems are back online. &lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 09:40 AM EDT:''' About half of Niagara compute nodes are up. Note that only jobs that can finish by 5:00 PM will run.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 07:50 AM EDT:''' Jupyter Hub is available again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 07:35 AM EDT:''' Jupyter Hub is being updated and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 07:30 AM EDT:''' Maintenance is underway.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style='color:red'&amp;gt;&amp;lt;b&amp;gt;Announcement:&amp;lt;/b&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On '''Thursday October 20th, 2022''', the SciNet datacentre (which hosts Niagara and Mist) will undergo transformer maintenance from 7:30 am EDT to 5:00 pm EDT.  At both the start and end of this maintenance window, all systems will need to be briefly shutdown and will not be accessible.  Apart from that, during this window, login nodes will be accessible and part of Niagara will be available to run jobs. The Mist and Rouge clusters will be off for the entirety of this maintenance. &lt;br /&gt;
&lt;br /&gt;
Users are encouraged to submit Niagara jobs of about 1 to 2 hours in the days before the maintenance, as these could be run within the window of 8 AM to 5 PM EDT.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 5, 2022, 12:10 PM EDT:''' A grid power glitch caused all compute nodes to reboot. Please resubmit your jobs.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 3, 2022, 11:20 PM EDT:'''  Niagara login nodes are accessible from outside again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 3, 2022, 9:20 PM EDT:'''  Niagara login nodes are inaccessible from outside of the datacentre at the moment. As a work-around, ssh into mist.scinet.utoronto.ca and then ssh into e.g. nia-login01.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 28, 2022, 1:15 PM EDT:''' The JupyterHub maintenance is finished and it is now accessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 28, 2022, 1:00 PM EDT:''' The JupyterHub is to be rebooted for system upgrades. Running processes and notebooks will be closed. The service is expected to be back around 1:30 PM EDT.&lt;br /&gt;
 &lt;br /&gt;
'''Tue Sep 27, 2022, 11:50 AM EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 27, 2022, 11:25 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Mon Sep 26, 2022, 11:35 AM EDT:''' Rouge and Teach login nodes are accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Sep 26, 2022, 11:05 AM EDT:''' Rouge and Teach login nodes are under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 23, 2022, 00:46 EDT:''' The CCEnv software stack is back to normal.&lt;br /&gt;
&lt;br /&gt;
'''Thu Sep 22, 2022, 8:15 PM EDT:''' The CCEnv software stack is inaccessible due to an issue with CVMFS.&lt;br /&gt;
 &lt;br /&gt;
'''Tue Sep 20, 2022, 16:00 EDT:''' Rouge login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 20, 2022, 10:20 AM EDT:''' Rouge login node is under maintenance and temporarily inaccessible to users (hardware upgrade).&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 20, 2022, 9:41 AM EDT:''' Rouge login node is back up.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 20, 2022, 8:25 AM EDT:''' Rouge login node down, we are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sept 16, 2022, 9:30 AM EDT:''' Login nodes are accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sept 16, 2022, 9:00 AM EDT:''' Login nodes are not accessible.  We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 13, 2022, 11:00 AM EDT:''' Mist login node is available again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 13, 2022, 10:00 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 2, 2022, 11:25 AM EDT:''' Rouge login node is back up.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 2, 2022, 10:25 AM EDT:''' Issues with the Rouge login node; we are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Aug 23, 2022, 1:15 PM EDT:''' Jupyter Hub is available again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Aug 23, 2022, 1:00 PM EDT:''' Jupyter Hub is being updated and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Fri Aug 12, 2022, 6:30 PM EDT:''' File system issues are resolved.&lt;br /&gt;
&lt;br /&gt;
'''Fri Aug 12, 2022, 5:06 PM EDT:''' File system issues. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Thu Aug 11, 2022, 9:20 AM EDT:''' The login node issues have been resolved.&lt;br /&gt;
&lt;br /&gt;
'''Thu Aug 11, 2022, 7:50 AM EDT:''' We are having problems accessing the Niagara login nodes.  Until fixed, please login to Mist and then ssh to a Niagara login node to access Niagara (&amp;quot;ssh nia-login02&amp;quot;, for example).&lt;br /&gt;
&lt;br /&gt;
'''Fri July 15, 2022, 10:50 AM EDT:''' Jupyter Hub is available again.&lt;br /&gt;
&lt;br /&gt;
'''Fri July 15, 2022, 10:30 AM EDT:''' Jupyter Hub is being updated and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Thu June 16, 2022, 3:45 PM EDT:''' File system is stable now. We're gradually opening the systems up.&lt;br /&gt;
&lt;br /&gt;
'''Thu June 16, 2022, 10:15 AM EDT:''' Emergency maintenance shutdown of the filesystem. Running jobs will be affected.&lt;br /&gt;
&lt;br /&gt;
'''Wed June 15, 2022, 7:35 PM EDT:''' Maintenance shutdown finished. Most systems are available again.&lt;br /&gt;
&lt;br /&gt;
'''Wed June 15, 2022, 7:00 AM EDT:''' Maintenance shutdown of the SciNet datacentre. There will be no access to any of the SciNet systems during this time. We expect to be able to bring the systems back online in the evening of June 15th.&lt;br /&gt;
&lt;br /&gt;
'''Mon June 13, 2022, 7:00 AM EDT - Wed June 15, 2022, 7:00 AM EDT:''' Two-day reservation for the &amp;quot;Niagara at Scale&amp;quot; event. Only &amp;quot;Niagara at Scale&amp;quot; projects will run on the compute nodes (as well as SOSCIP projects, on a subset of nodes). Users are encouraged to submit small and short jobs that could run before this event.  Throughout the event, users can still login, access their data, and submit jobs, but these jobs will not run until after the subsequent maintenance (see below). Note that the debugjob queue will remain available to everyone as well.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 30th, 2022, 12:42:00 EDT:''' Mist login node is available again.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 30th, 2022, 10:22:00 EDT:''' Mist login node is being upgraded and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Wed May 25th, 2022, 13:30:00 EDT:''' Niagara operating at 100% again.&lt;br /&gt;
&lt;br /&gt;
'''Tue May 24th, 2022, 21:30:00 EDT:''' Jupyter Hub up.  Part of Niagara can run compute jobs.&lt;br /&gt;
&lt;br /&gt;
'''Tue May 24th, 2022, 19:00:00 EDT:''' Systems are up. Users can login, BUT cannot submit jobs yet.&lt;br /&gt;
&lt;br /&gt;
'''Tue May 24th, 2022, 10:00:00 EDT:''' We are still performing system checks.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 23rd, 2022, 16:44:30 EDT:''' Systems still down. Filesystems are working, but there are quite a number of drive failures - no data loss - so out of an abundance of caution we are keeping the systems down at least until tomorrow.  The long weekend has also been disruptive for service response, and we prefer to err on the safe side.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 23rd, 2022, 08:12:14 EDT:''' Systems still down. Filesystems being checked to ensure no heat damage.&lt;br /&gt;
&lt;br /&gt;
'''Sun May 22nd, 2022, 10:16 am EDT:''' Electrician dispatched to replace blown fuses.&lt;br /&gt;
&lt;br /&gt;
'''Sun May 22nd, 2022, 2:54 am EDT:''' Automatic shutdown due to a power/cooling issue.&lt;br /&gt;
&lt;br /&gt;
'''Fri May 6th, 2022, 11:35 am EDT:''' HPSS scheduler upgrade also finished.&lt;br /&gt;
&lt;br /&gt;
'''Thu May 5th, 2022, 7:45 pm EDT:''' Upgrade of the scheduler has finished, with the exception of HPSS.&lt;br /&gt;
&lt;br /&gt;
'''Thu May 5th, 2022, 7:00 am - 3:00 pm EDT (approx):''' Starting from 7:00 am EDT, an upgrade of the scheduler of the Niagara, Mist, and Rouge clusters will be applied.  This requires the scheduler to be down for about 5-6 hours, and all compute and login nodes to be rebooted.&lt;br /&gt;
Jobs cannot be submitted during this maintenance, but jobs submitted beforehand will remain in the queue.  For most of the time, the login nodes of the clusters will be available so that users may access their files on the home, scratch, and project file systems.&lt;br /&gt;
&lt;br /&gt;
'''Monday May 2nd, 2022, 9:30 - 11:00 am EDT:''' the Niagara login nodes, the jupyter hub, and nia-datamover2 will get rebooted for updates.  In the process, any login sessions will get disconnected, and servers on the jupyterhub will stop. Jobs in the Niagara queue will not be affected.&lt;br /&gt;
&lt;br /&gt;
'''Tue Apr 26, 11:20 AM EDT:''' A rolling update of the Mist cluster is taking a bit longer than expected, affecting logins to Mist. &lt;br /&gt;
 &lt;br /&gt;
'''Announcement:''' On Thursday April 14th, 2022, the connectivity to the SciNet datacentre will be disrupted at 11:00 AM EDT  for a few minutes, in order to deploy a new network core switch.  Any SSH connections or data transfers to SciNet systems (Niagara, Mist, etc.) may be terminated at that time.&lt;br /&gt;
&lt;br /&gt;
'''Thu March 24, 6:54 AM EST:''' HPSS is back online&lt;br /&gt;
&lt;br /&gt;
'''Thu March 24, 8:15 AM EST:''' HPSS has a hardware problem&lt;br /&gt;
&lt;br /&gt;
'''Wed March 2, 4:50 PM EST:''' The CCEnv software stack is available again on Niagara.&lt;br /&gt;
&lt;br /&gt;
'''Wed March 2, 7:50 AM EST:''' The CCEnv software stack on Niagara has issues; we are investigating.&lt;br /&gt;
 &lt;br /&gt;
'''Sat Feb 12 2022, 12:59 EST:''' Jupyterhub is back up, but may have a hardware issue.&lt;br /&gt;
&lt;br /&gt;
'''Sat Feb 12 2022, 10:36 EST:''' Issue with the Jupyterhub, since last night.  We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 1 2022 19:20 EST:''' Maintenance finished successfully. Systems are up. &lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 1 2022 13:00 EST:''' Maintenance downtime started.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 31 2022 13:15:00 EST:''' The SciNet datacentre's cooling system needs an '''emergency repair''' as soon as possible.  During this repair, all systems hosted at SciNet (Niagara, Mist, Rouge, HPSS, and Teach) will need to be switched off and will be unavailable to users. Repairs will start '''Tuesday February 1st, at 1:00 pm EST''', and could take until the end of the next day.  Please check here for updates.&lt;br /&gt;
&lt;br /&gt;
'''Sat Jan 29 2022 16:45:38 EST:''' Fibre repaired.&lt;br /&gt;
&lt;br /&gt;
'''Sat 29 Jan 2022 11:22:27 EST:''' Fibre repair is underway.  Expect to have connectivity restored later today.&lt;br /&gt;
&lt;br /&gt;
'''Fri 28 Jan 2022 07:35:01 EST:''' The fibre optics cable that connects the SciNet datacentre was severed by uncoordinated digging at York University.  We expect repairs to happen as soon as possible.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 27, 2022, 12:46 PM EST:''' Network issues to and from the datacentre. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sun Jan 23, 2022, 11:05 AM EST:''' Filesystem issues appear to have been resolved.&lt;br /&gt;
&lt;br /&gt;
'''Sun Jan 23, 2022, 10:30 AM EST:''' Filesystem issues -- investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sat Jan 8, 2022, 11:42 AM EST:''' The emergency maintenance is complete. Systems are up and available.&lt;br /&gt;
&lt;br /&gt;
'''Fri Jan 7, 2022, 2:34 PM EST:''' The SciNet shutdown is in progress. Systems are expected back on Saturday, Jan 8.&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;Emergency shutdown Friday January 7, 2022&amp;lt;/span&amp;gt;''': An emergency shutdown of all SciNet systems to replace a crucial file system component is planned to take place on Friday January 7, 2022, starting at 8am EST, and will require at least 12 hours of downtime.  Updates will be posted during the day.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 6, 2022, 8:20 AM EST:''' The SciNet filesystem is having issues.  We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Fri Dec 24, 2021, 1:31 PM EST:''' Please note the following scheduled network maintenance, which will result in loss of connectivity to the SciNet datacentre.  Start time: Dec 29, 00:30 EST; estimated duration: 4 hours and 30 minutes. &lt;br /&gt;
&lt;br /&gt;
'''Mon Dec 20, 2021, 4:29 PM EST:''' Filesystem is back to normal. &lt;br /&gt;
&lt;br /&gt;
'''Mon Dec 20, 2021, 2:53 PM EST:''' Filesystem problem - We are investigating. &lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 23 12:30 EDT 2021 ''' Cooling restored.  Systems should be available later this afternoon.  &lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 23 9:30 EDT 2021 ''' Technicians on site working on cooling system. &lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 23 3:30 EDT 2021 ''' Cooling system issues still unresolved. &lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 22 23:27:48 EDT 2021 ''' Shutdown of the datacenter due to a problem with the cooling system.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 22 09:30 EDT 2021 ''': File system issues, resolved.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 22 07:30 EDT 2021 ''': File system issues, investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sun Sep 19 10:00 EDT 2021''': Power glitch interrupted all compute jobs; please resubmit any jobs you had running.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 15 17:35 EDT 2021''': filesystem issues resolved&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 15 16:39 EDT 2021''': filesystem issues&lt;br /&gt;
&lt;br /&gt;
'''Mon Sep 13 13:15:07 EDT 2021''' HPSS is back online.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 10 17:57:23 EDT 2021''' HPSS is offline due to unscheduled maintenance.&lt;br /&gt;
&lt;br /&gt;
'''Wed Aug 18 16:13:42 EDT 2021''' The HPSS upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
'''HPSS Downtime August 17th and 18th, 2021 (Tuesday and Wednesday):''' We'll be upgrading the HPSS software to version 8.3, along with all the clients (htar/hsi, vfs and Globus/dsi).&lt;br /&gt;
&lt;br /&gt;
'''July 24, 2021, 6:00 PM EDT:''' There appear to be file system issues, which may affect users' ability to login.  We are investigating.&lt;br /&gt;
&lt;br /&gt;
''' July 23rd, 2021, 9:00 AM EDT:''' ''' Security update: ''' Due to a severe vulnerability in the Linux kernel (CVE-2021-33909), our team is currently patching and rebooting all login nodes and compute nodes, as well as the JupyterHub.  There should be no effect on running jobs; however, sessions on login and datamover nodes will be disrupted. &lt;br /&gt;
&lt;br /&gt;
''' July 20th, 2021, 7:00 PM EDT:''' ''' SLURM configuration''' - Changed the default behaviour to kill a job step if any task exits with a non-zero exit code. If your code is able to handle failures gracefully, please add srun's option --no-kill to recover the previous default behaviour.&lt;br /&gt;
&lt;br /&gt;
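As an illustration of the announcement above, a job script might opt back into the previous behaviour as follows; only the srun option --no-kill comes from the announcement, while the script name, application binary, and resource numbers are hypothetical:&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40
#SBATCH --time=01:00:00

# With the new SLURM default, a task exiting with a non-zero exit
# code kills the whole job step.  If the application handles task
# failures gracefully itself, --no-kill restores the old behaviour.
# "./my_application" is a placeholder, not an actual SciNet binary.
srun --no-kill ./my_application
```
&lt;br /&gt;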
''' July 20th, 2021, 7:00 PM EDT:''' Maintenance finished, systems are back online.   &lt;br /&gt;
&lt;br /&gt;
'''SciNet Downtime July 20th, 2021 (Tuesday):''' There will be a maintenance shutdown of the SciNet data center on Tuesday July 20th, starting at 7 am EDT. There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time.  We expect to be able to bring the systems back online in the evening of July 20th.  The status of the Niagara cluster can be checked on status.computecanada.ca. For up-to-date and more detailed information on the status of all the SciNet systems, you can always check back here.&lt;br /&gt;
&lt;br /&gt;
'''May 27, 2021:''' Datamovers addresses have changed to improve high bandwidth connectivity and cybersecurity. The new addresses are 142.1.174.227 for nia-datamover1.scinet.utoronto.ca, and 142.1.174.228 for nia-datamover2.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
If you have jobs that need to connect to a software license server using an ssh tunnel through nia-gw (which actually resolves to datamover1 or datamover2), you may need to ask the system administrators of that license server to allow incoming connections from the new addresses above.&lt;br /&gt;
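&lt;br /&gt;
For illustration, such a tunnel might look like the sketch below; the license-server hostname and port number are placeholders, not actual SciNet or license-server values:&lt;br /&gt;

```shell
# Forward local port 1717 to a license server, tunnelling through
# nia-gw (which resolves to datamover1 or datamover2, and hence to
# one of the new addresses above).  Hostname and port are examples.
ssh -N -L 1717:license.example.com:1717 nia-gw
```
&lt;br /&gt;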
'''June 29th, 2021, 2:00 PM:''' Thunderstorm-related power fluctuations are causing some Niagara compute nodes and their jobs to crash.  Please resubmit if your jobs seem to have crashed for no apparent reason.&lt;br /&gt;
&lt;br /&gt;
'''June 28th, 2021, 4:06 PM:''' Mist OS upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
'''June 28th, 2021, 9:00 AM:''' Mist is under maintenance. OS upgrading from RHEL 7 to 8.&lt;br /&gt;
&lt;br /&gt;
'''June 11th, 2021, 8:30 AM:''' Maintenance complete. Systems are up.&lt;br /&gt;
&lt;br /&gt;
'''June 9th to 10th, 2021:''' The SciNet datacentre will have a scheduled maintenance shutdown.  Niagara, Mist, Rouge, HPSS, login nodes, the file systems, and hosted systems will all be offline during the shutdown starting at 7AM EDT on Wednesday June 9th. We expect the systems to be back up in the morning of Friday June 11th.  Check here for updates.&lt;br /&gt;
&lt;br /&gt;
'''May 27th, 20:00.''' All systems are up and running &lt;br /&gt;
&lt;br /&gt;
'''May 27th, 19:30.''' Most systems are up&lt;br /&gt;
&lt;br /&gt;
'''May 27th, 19:00:''' Cooling is back. Powering up systems&lt;br /&gt;
&lt;br /&gt;
'''May 27th, 2021, 11:30am:'''  The cooling tower issue has been identified as a wiring issue and is being repaired.  We don't have an ETA on when cooling will be restored, however we are hopeful it will be by the end of the day.  &lt;br /&gt;
&lt;br /&gt;
'''May 27th, 2021, 12:30am:''' The cooling tower motor is not working properly and may need to be replaced.  It's the primary motor, and the cooling system cannot run without it, so at least until tomorrow all equipment at the datacenter will remain unavailable.  Updates about expected repair times will be posted when they are known.&lt;br /&gt;
&lt;br /&gt;
'''May 26th, 2021, 9:20pm:''' We are currently experiencing cooling issues at the SciNet data centre.  Updates will be posted as we determine the cause of the problem.&lt;br /&gt;
&lt;br /&gt;
'''From Tue Mar 30 at 12 noon EST to Thu Apr 1 at 12 noon EST,''' there will be a two-day reservation for the &amp;quot;Niagara at Scale&amp;quot; pilot event.  During these 48 hours, only &amp;quot;Niagara at Scale&amp;quot; projects will run on the compute nodes (as well as SOSCIP projects, on a subset of nodes).  All other users can still log in, access their data, and submit jobs throughout this event, but the jobs will not run until after the event.  The debugjob queue will remain available to everyone as well.&lt;br /&gt;
&lt;br /&gt;
The scheduler will not start batch jobs that cannot finish before the start of this event. Users who submit small and short jobs can take advantage of this, as the scheduler may be able to fit those jobs onto the otherwise idle nodes before the event starts.&lt;br /&gt;
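&lt;br /&gt;
A sketch of what such a short job could look like; the resource numbers and job name are illustrative, the only point being the short walltime that lets the scheduler backfill the job before the reservation:&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --time=02:00:00
#SBATCH --job-name=short_backfill

# A short walltime gives the scheduler room to fit this job onto
# otherwise idle nodes before the reservation begins.
# "./short_analysis" is a placeholder application.
./short_analysis
```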
&lt;br /&gt;
'''Tue 23 Mar 2021 12:19:07 PM EDT''' - Planned external network maintenance 12pm-1pm Tuesday, March 23rd. &lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 28 17:35:16 EST 2021:''' HPSS services are back online&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 28 12:36:21 EST 2021:''' HPSS services offline&lt;br /&gt;
&lt;br /&gt;
We need a short maintenance window as early as possible this afternoon to make a small configuration change. Ongoing jobs will be allowed to finish, but we are keeping new submissions on hold in the queue.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 25 13:16:33 EST 2021:''' HPSS services are back online&lt;br /&gt;
&lt;br /&gt;
'''Sat Jan 23 10:03:33 EST 2021:''' HPSS services offline&lt;br /&gt;
&lt;br /&gt;
We detected some type of hardware failure on our HPSS equipment overnight, so access has been disabled pending further investigation.&lt;br /&gt;
&lt;br /&gt;
'''Fri Jan 22 10:49:29 EST 2021:''' The Globus transition to oauth is finished&lt;br /&gt;
&lt;br /&gt;
Please deactivate any sessions to the niagara endpoint activated in the last 7 days, and activate/login again. &lt;br /&gt;
&lt;br /&gt;
For more details check https://docs.scinet.utoronto.ca/index.php/Globus#computecandada.23niagara&lt;br /&gt;
&lt;br /&gt;
'''Jan 21, 2021:''' Globus access disruption on Fri, Jan 22, 2021, at 10 AM: Please be advised that we will have a maintenance window starting tomorrow at 10 AM to roll out the transition of services to oauth-based authentication.&lt;br /&gt;
&lt;br /&gt;
'''Jan 15, 2021:''' Globus access update on Mon, Jan 18, 2021 and Tue, Jan 19, 2021:&lt;br /&gt;
Please be advised that we will start preparations on Monday to update Globus access on Tuesday. We'll be adopting oauth instead of myproxy from that point on. During this period, expect sporadic disruptions of service. On Monday we will already block access to nia-dm2, so please refrain from starting new login sessions or ssh tunnels via nia-dm2 from this weekend onward.&lt;br /&gt;
&lt;br /&gt;
''' December 11, 2020, 12:00 AM EST: ''' Cooling issue resolved. Systems back.&lt;br /&gt;
&lt;br /&gt;
''' December 11, 2020, 6:00 PM EST: ''' Cooling issue at datacenter. All systems down.&lt;br /&gt;
&lt;br /&gt;
''' December 7, 2020, 7:25 PM EST: '''All systems back; users can log in again.&lt;br /&gt;
&lt;br /&gt;
''' December 7, 2020, 6:46 PM EST: '''User connectivity to data center not yet ready, but queued jobs on Mist and Niagara have been started.&lt;br /&gt;
 &lt;br /&gt;
''' December 7, 2020, 7:00 AM EST: '''Maintenance shutdown in effect. This is a one-day maintenance shutdown.  There will be no access to Niagara, Mist, HPSS or teach, nor to their file systems during this time.  We expect to be able to bring the systems back online this evening.&lt;br /&gt;
&lt;br /&gt;
''' December 2, 2020, 9:10 PM EST: '''Power is back, systems are coming up. Please resubmit any jobs that failed because of this incident.&lt;br /&gt;
&lt;br /&gt;
''' December 2, 2020, 6:00 PM EST: '''Power glitch at the data center, caused about half of the compute nodes to go down.  Power issue not yet resolved.&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;span style=&amp;quot;color:#dd1111&amp;quot;&amp;gt;Announcing a Maintenance Shutdown on December 7th, 2020&amp;lt;/span&amp;gt;''' &amp;lt;br/&amp;gt;There will be a one-day maintenance shutdown on December 7th 2020, starting at 7 am EST.  There will be no access to Niagara, Mist, HPSS or teach, nor to their file systems during this time.  We expect to be able to bring the systems back online in the evening of the same day.&lt;br /&gt;
&lt;br /&gt;
''' November 6, 2020, 8:00 PM EST: ''' Systems are coming back online.&lt;br /&gt;
&lt;br /&gt;
''' November 6, 2020, 9:49 AM EST: ''' Repairs on the cooling system are underway.  No ETA, but the systems will likely be back some time today.&lt;br /&gt;
&lt;br /&gt;
''' November 6, 2020, 4:27 AM EST: '''Cooling system failure, datacentre is shut down.&lt;br /&gt;
&lt;br /&gt;
''' October 9, 2020, 12:57 PM: ''' A short power glitch caused many of the Niagara compute nodes to lose power; jobs running on them would have failed. Please check your jobs and resubmit.&lt;br /&gt;
&lt;br /&gt;
''' October 8, 2020, 9:50 PM: ''' Jupyterhub service is back up.&lt;br /&gt;
&lt;br /&gt;
''' October 8, 2020, 5:40 PM: ''' Jupyterhub service is down. We are investigating.&lt;br /&gt;
&lt;br /&gt;
''' September 28, 2020, 11:00 AM EST: ''' A short power glitch caused many of the Niagara compute nodes to lose power; jobs running on them would have failed. Please check your jobs and resubmit.&lt;br /&gt;
&lt;br /&gt;
''' September 1, 2020, 2:15 PM EST: ''' A short power glitch caused about half of the Niagara compute nodes to lose power; jobs running on them would have failed. Please check your jobs and resubmit.&lt;br /&gt;
&lt;br /&gt;
''' September 1, 2020, 9:27 AM EST: ''' The Niagara cluster has moved to a new default software stack, NiaEnv/2019b.  If your job scripts used the previous default software stack before (NiaEnv/2018a), please put the command &amp;quot;module load NiaEnv/2018a&amp;quot; before other module commands in those scripts, to ensure they will continue to work, or try the new stack (recommended).&lt;br /&gt;
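&lt;br /&gt;
For example, a job script that must keep using the old stack would pin it before any other module commands; the module command itself is from the announcement above, while the walltime and the example modules and binary are hypothetical:&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --time=01:00:00

# Pin the old software stack first, so that subsequent module loads
# resolve against NiaEnv/2018a rather than the new NiaEnv/2019b
# default.  The modules and binary below are placeholders.
module load NiaEnv/2018a
module load gcc openmpi

./my_simulation
```
&lt;br /&gt;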
''' August 24, 2020, 7:37 PM EST: ''' Connectivity is back to normal&lt;br /&gt;
&lt;br /&gt;
''' August 24, 2020, 6:35 PM EST: ''' We have partial connectivity back, but are still investigating.&lt;br /&gt;
&lt;br /&gt;
''' August 24, 2020, 3:15 PM EST: ''' There are issues connecting to the data centre. We're investigating.&lt;br /&gt;
&lt;br /&gt;
''' August 21, 2020, 6:00 PM EST: ''' The pump has been repaired, cooling is restored, systems are up.  &amp;lt;br/&amp;gt;Scratch purging is postponed until the evening of Friday Aug 28th, 2020.&lt;br /&gt;
&lt;br /&gt;
'''August 19, 2020, 4:40 PM EST:''' Update: The current estimate is to have the cooling restored on Friday and we hope to have the systems available for users on Saturday August 22, 2020.&lt;br /&gt;
&lt;br /&gt;
'''August 17, 2020, 4:00 PM EST:''' Unfortunately after taking the pump apart it was determined there was a more serious failure of the main drive shaft, not just the seal. As a new one will need to be sourced or fabricated we're estimating that it will take at least a few more days to get the part and repairs done to restore cooling. Sorry for the inconvenience. &lt;br /&gt;
&lt;br /&gt;
'''August 15, 2020, 1:00 PM EST:''' Due to parts availability for repairing the failed pump and cooling system, it is unlikely that systems can be restored until Monday afternoon at the earliest. &lt;br /&gt;
&lt;br /&gt;
'''August 15, 2020, 00:04 EST:'''  A primary pump seal in the cooling infrastructure has blown, and parts availability cannot be determined until tomorrow. All systems are shut down as there is no cooling.  If parts are available, systems may be back at the earliest late tomorrow. Check here for updates.  &lt;br /&gt;
&lt;br /&gt;
'''August 14, 2020, 21:04 EST:''' Tomorrow's /scratch purge has been postponed.&lt;br /&gt;
&lt;br /&gt;
'''August 14, 2020, 21:00 EST:''' Staff are at the datacenter. It looks like one of the pumps has a seal that is leaking badly.&lt;br /&gt;
&lt;br /&gt;
'''August 14, 2020, 20:37 EST:''' We seem to be undergoing a thermal shutdown at the datacenter.&lt;br /&gt;
&lt;br /&gt;
'''August 14, 2020, 20:20 EST:''' Network problems to niagara/mist. We are investigating.&lt;br /&gt;
 &lt;br /&gt;
'''August 13, 2020, 10:40 AM EST:''' Network is fixed, scheduler and other services are back.&lt;br /&gt;
&lt;br /&gt;
'''August 13, 2020, 8:20 AM EST:''' We had an IB switch failure, which is affecting a subset of nodes, including the scheduler nodes.&lt;br /&gt;
&lt;br /&gt;
'''August 10, 2020, 7:30 PM EST:''' Scheduler fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''August 10, 2020, 3:00 PM EST:''' Scheduler partially functional: jobs can be submitted and are running.&lt;br /&gt;
&lt;br /&gt;
'''August 10, 2020, 2:00 PM EST:''' Scheduler is temporarily not operational.&lt;br /&gt;
&lt;br /&gt;
'''August 7, 2020, 9:15 PM EST:''' Network is fixed, scheduler and other services are coming back.&lt;br /&gt;
&lt;br /&gt;
'''August 7, 2020, 8:20 PM EST:''' Disruption of part of the network in the data centre, causing issues with the scheduler, the Mist login node, and possibly other services. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''July 30, 2020, 9:00 AM''' Project backup in progress but incomplete: please be aware that after we deployed the new, larger storage appliance for scratch and project two months ago, we started a full backup of project (1.5PB). This backup is taking a while to complete, and there are still a few areas which have not been backed up fully. Please be careful to not delete things from project that you still need, in particular if they are recently added material.&lt;br /&gt;
&lt;br /&gt;
'''July 27, 2020, 5:00 PM:''' Scheduler issues resolved.&lt;br /&gt;
&lt;br /&gt;
'''July 27, 2020, 3:00 PM:''' Scheduler issues. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''July 13, 4:40 PM:''' Most systems are available again. Only Mist is still being brought up.&lt;br /&gt;
&lt;br /&gt;
'''July 13, 10:00 AM:''' '''SciNet/Niagara Downtime In Progress'''&lt;br /&gt;
&lt;br /&gt;
'''SciNet/Niagara Downtime Announcement, July 13, 2020'''&amp;lt;br/&amp;gt;&lt;br /&gt;
All resources at SciNet will undergo a maintenance shutdown on Monday July 13, 2020, starting at 10:00 am EDT, for file system and scheduler upgrades.  There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time.&lt;br /&gt;
We expect to be able to bring the systems back around 3 PM (EDT) on the same day.&lt;br /&gt;
&lt;br /&gt;
''' June 29, 6:21:00  PM:''' Systems are available again.  &lt;br /&gt;
&lt;br /&gt;
''' June 29, 12:30:00  PM:''' Power Outage caused thermal shutdown.&lt;br /&gt;
&lt;br /&gt;
'''June 20, 2020, 10:24 PM:''' File systems are back up.  Unfortunately, all running jobs would have died and users are asked to resubmit them.&lt;br /&gt;
&lt;br /&gt;
'''June 20, 2020, 9:48 PM:''' An issue with the file systems is causing trouble.  We are investigating the cause.&lt;br /&gt;
&lt;br /&gt;
'''June 15, 2020, 10:30 PM:''' A '''power glitch''' caused some compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''June 12, 2020, 6:15 PM:''' Two '''power glitches''' during the night caused some compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''June 6, 2020, 6:06 AM:''' A '''power glitch''' caused some compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''May 24, 2020, 8:20 AM:''' A '''power glitch''' this morning caused all compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''May 7, 2020, 6:05 PM:''' Maintenance shutdown is finished.  Most systems are back in production.&lt;br /&gt;
&lt;br /&gt;
'''May 6, 2020, 7:08 AM:''' Two-day datacentre maintenance shutdown has started.&lt;br /&gt;
&lt;br /&gt;
''' SciNet/Niagara Downtime Announcement, May 6-7, 2020'''&lt;br /&gt;
&lt;br /&gt;
All resources at SciNet will undergo a two-day maintenance shutdown on May 6th and 7th 2020, starting at 7 am EDT on Wednesday May 6th.  There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) or systems hosted at the SciNet data centre.  We expect to be able to bring the systems back online the evening of May 7th.&lt;br /&gt;
&lt;br /&gt;
'''May 4, 2020, 7:51 AM:''' A power glitch this morning caused compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''May 3, 2020, 8:20 AM:''' A power glitch this morning caused all compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''April 28, 2020, 7:20 AM:''' A power glitch this morning caused all compute nodes to be rebooted: jobs running at the time have failed; users are asked to resubmit these jobs.&lt;br /&gt;
 &lt;br /&gt;
'''April 20, 2020: Security Incident at Cedar; implications for Niagara users'''&lt;br /&gt;
&lt;br /&gt;
Last week, it became evident that the Cedar GP cluster had been&lt;br /&gt;
compromised for several weeks.  The passwords of at least two&lt;br /&gt;
Compute Canada users were known to the attackers. One of these was&lt;br /&gt;
used to escalate privileges on Cedar, as explained on&lt;br /&gt;
https://status.computecanada.ca/view_incident?incident=423.&lt;br /&gt;
&lt;br /&gt;
These accounts were used to login to Niagara as well, but Niagara&lt;br /&gt;
did not have the same security loophole as Cedar (which has been&lt;br /&gt;
fixed), and no further escalation was observed on Niagara.&lt;br /&gt;
&lt;br /&gt;
Reassuring as that may sound, it is not known how the passwords of&lt;br /&gt;
the two user accounts were obtained. Given this uncertainty, the&lt;br /&gt;
SciNet team *strongly* recommends that you change your password on&lt;br /&gt;
https://ccdb.computecanada.ca/security/change_password, and remove&lt;br /&gt;
any SSH keys and regenerate new ones (see&lt;br /&gt;
https://docs.scinet.utoronto.ca/index.php/SSH_keys).&lt;br /&gt;
&lt;br /&gt;
''' Tue 30 Mar 2020 14:55:14 EDT'''  Burst Buffer available again.&lt;br /&gt;
&lt;br /&gt;
''' Fri Mar 27 15:29:00 EDT 2020:''' SciNet systems are back up. Only the Burst Buffer remains offline, its maintenance is expected to be finished early next week.&lt;br /&gt;
&lt;br /&gt;
''' Thu Mar 26 23:05:00 EDT 2020:'''  Some aspects of the maintenance took longer than expected. The systems will not be back up until some time tomorrow, Friday March 27, 2020.  &lt;br /&gt;
&lt;br /&gt;
''' Wed Mar 25 7:00:00 EDT 2020:'''  SciNet/Niagara downtime started.&lt;br /&gt;
&lt;br /&gt;
''' Mon Mar 23 18:45:10 EDT 2020:'''  File system issues were resolved.&lt;br /&gt;
&lt;br /&gt;
''' Mon Mar 23 18:01:19 EDT 2020:''' There is currently an issue with the main Niagara filesystems. This affects all systems; all jobs have been killed. The issue is being investigated. &lt;br /&gt;
&lt;br /&gt;
''' Fri Mar 20 13:15:33 EDT 2020: ''' There was a power glitch at the datacentre at 8:50 AM, which resulted in jobs getting killed.  Please resubmit failed jobs. &lt;br /&gt;
&lt;br /&gt;
''' COVID-19 Impact on SciNet Operations, March 18, 2020'''&lt;br /&gt;
&lt;br /&gt;
Although the University of Toronto is closing some of its&lt;br /&gt;
research operations on Friday March 20 at 5 pm EDT, this does not&lt;br /&gt;
affect the SciNet systems (such as Niagara, Mist, and HPSS), which&lt;br /&gt;
will remain operational.&lt;br /&gt;
&lt;br /&gt;
''' SciNet/Niagara Downtime Announcement, March 25-26, 2020'''&lt;br /&gt;
&lt;br /&gt;
All resources at SciNet will undergo a two-day maintenance shutdown on March 25th and 26th 2020, starting at 7 am EDT on Wednesday March 25th.  There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time.&lt;br /&gt;
&lt;br /&gt;
This shutdown is necessary to finish the expansion of the Niagara cluster and its storage system.&lt;br /&gt;
&lt;br /&gt;
We expect to be able to bring the systems back online the evening of March 26th.&lt;br /&gt;
&lt;br /&gt;
''' March 9, 2020, 11:24 PM:''' HPSS services are temporarily suspended for emergency maintenance.&lt;br /&gt;
&lt;br /&gt;
''' March 7, 2020, 10:15 PM:''' File system issues have been cleared.&lt;br /&gt;
&lt;br /&gt;
''' March 6, 2020, 7:30 PM:''' File system issues; we are investigating&lt;br /&gt;
&lt;br /&gt;
''' March 2, 2020, 1:30 PM:''' For the extension of Niagara, the operating system on all Niagara nodes has been upgraded&lt;br /&gt;
from CentOS 7.4 to 7.6.  This required all&lt;br /&gt;
nodes to be rebooted. Running compute jobs were allowed to finish&lt;br /&gt;
before each compute node was rebooted. The login nodes have all been rebooted, as have the datamover nodes and the JupyterHub service.&lt;br /&gt;
&lt;br /&gt;
''' Feb 24, 2020, 1:30PM: ''' The [[Mist]] login node got rebooted.  It is back, but we are still monitoring the situation.&lt;br /&gt;
&lt;br /&gt;
''' Feb 12, 2020, 11:00AM: ''' The [[Mist]] GPU cluster now available to users.&lt;br /&gt;
&lt;br /&gt;
''' Feb 11, 2020, 2:00PM: ''' The Niagara compute nodes were accidentally rebooted, killing all running jobs.&lt;br /&gt;
&lt;br /&gt;
''' Feb 10, 2020, 7:00 PM: ''' HPSS is back to normal.&lt;br /&gt;
&lt;br /&gt;
''' Jan 30, 2020, 12:01PM: ''' We are having an issue with HPSS, in which the disk-cache is full. We put a reservation on the whole system (Globus, plus archive and vfs queues), until it has had a chance to clear some space on the cache.&lt;br /&gt;
&lt;br /&gt;
''' Jan 21, 2020, 4:05 PM: ''' There was a partial power outage that took down a large number of the compute nodes.  If your job died during this period, please resubmit.  &lt;br /&gt;
&lt;br /&gt;
'''Jan 13, 2020, 7:35 PM:''' Maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Jan 13, 2020, 8:20 AM:''' The announced maintenance downtime started (see below).&lt;br /&gt;
&lt;br /&gt;
'''Jan 9 2020, 11:30 AM:''' External ssh connectivity restored, issue related to the university network.&lt;br /&gt;
&lt;br /&gt;
'''Jan 9 2020, 9:24 AM:''' We received reports of users having trouble connecting into the SciNet data centre; we're investigating.  Systems are up and running and jobs are fine.&amp;lt;p&amp;gt;&lt;br /&gt;
As a workaround, in the meantime, it appears to be possible to log into graham, cedar or beluga, and then ssh to niagara.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Downtime announcement:'''&lt;br /&gt;
To prepare for the upcoming expansion of Niagara, there will be a&lt;br /&gt;
one-day maintenance shutdown on '''January 13th 2020, starting at 8 am&lt;br /&gt;
EST'''.  There will be no access to Niagara, Mist, HPSS or teach, nor&lt;br /&gt;
to their file systems during this time.&lt;br /&gt;
&lt;br /&gt;
'''2019'''&lt;br /&gt;
&lt;br /&gt;
'''December 13, 9:00 AM EST:''' Issues resolved.&lt;br /&gt;
&lt;br /&gt;
'''December 13, 8:20 AM EST:''' Overnight issue is now preventing logins to Niagara and other services. Possibly a file system issue, we are investigating.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt; '''Fri, Nov 15 2019, 11:00 PM (EST)'''  Niagara and most of the main systems are now available. &lt;br /&gt;
&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt; '''Fri, Nov 15 2019, 7:50 PM (EST)'''  SOSCIP GPU cluster is up and accessible.  Work on the other systems continues.&lt;br /&gt;
&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt; '''Fri, Nov 15 2019, 5:00 PM (EST)'''  Infrastructure maintenance done, upgrades still in process.&lt;br /&gt;
&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&lt;br /&gt;
'''Fri, Nov 15 2019, 7:00 AM (EST)'''  Maintenance shutdown of the SciNet data centre has started.  Note: scratch purging has been postponed until Nov 17.&amp;lt;br/&amp;gt; &lt;br /&gt;
&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&lt;br /&gt;
'''Announcement:''' &lt;br /&gt;
The SciNet datacentre will undergo a maintenance shutdown on&lt;br /&gt;
Friday November 15th 2019, from 7 am to 11 pm (EST), with no access&lt;br /&gt;
to any of the SciNet systems (Niagara, P8, SGC, HPSS, Teach cluster,&lt;br /&gt;
or the filesystems) during that time. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Sat, Nov 2 2019, 1:30 PM (update):'''  Chiller has been fixed, all systems are operational.    &lt;br /&gt;
&amp;lt;/p&amp;gt;&lt;br /&gt;
'''Fri, Nov 1 2019, 4:30 PM (update):'''  We are operating on free cooling, so we have brought up only about half of the Niagara compute nodes to reduce the cooling load.  Access, storage, and other systems should now be available.   &lt;br /&gt;
&lt;br /&gt;
'''Fri, Nov 1 2019, 12:05 PM (update):''' A power module in the chiller has failed and needs to be replaced.   We should be able to operate on free cooling if the temperature stays cold enough, but we may not be able to run all systems. No ETA yet on when users will be able to log back in. &lt;br /&gt;
&lt;br /&gt;
'''Fri, Nov 1 2019, 9:15 AM (update):''' There was an automated shutdown because of rising temperatures, causing all systems to go down. We are investigating; check here for updates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;'''Fri, Nov 1 2019, 8:16 AM:''' Unexpected data centre issue: Check here for updates.&lt;br /&gt;
&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
''' Thu 1 Aug 2019 5:00:00 PM ''' Systems are up and operational.   &lt;br /&gt;
&lt;br /&gt;
'''Thu 1 Aug 2019 7:00:00 AM: ''' Scheduled Downtime Maintenance of the SciNet Datacenter.  All systems will be down and unavailable starting 7am until the evening. &lt;br /&gt;
&lt;br /&gt;
'''Fri 26 Jul 2019, 16:02:26 EDT:''' There was an issue with the Burst Buffer at around 3 PM; it has since been resolved and the BB is OK again.&lt;br /&gt;
&lt;br /&gt;
''' Sun 30 Jun 2019 ''' The '''SOSCIP BGQ''' and '''P7''' systems were decommissioned on '''June 30th, 2019'''.  The BGQdev front end node and storage are still available.  &lt;br /&gt;
&lt;br /&gt;
'''Wed 19 Jun 2019, 1:20:00 PM:''' The BGQ is back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed 19 Jun 2019, 10:00:00 AM:''' The BGQ is still down; the SOSCIP GPU nodes should be back up. &lt;br /&gt;
&lt;br /&gt;
'''Wed 19 Jun 2019, 1:40:00 AM:''' There was an issue with the SOSCIP BGQ and GPU Cluster last night at about 1:42 am, probably a power fluctuation that took them down.  &lt;br /&gt;
&lt;br /&gt;
'''Wed 12 Jun 2019, 3:30 AM - 7:40 AM''' Intermittent system issues on Niagara's project and scratch file systems because the file number limit was reached. We have increased the total number of files allowed on the file system. &lt;br /&gt;
&lt;br /&gt;
'''Thu 30 May 2019, 11:00:00 PM:'''&lt;br /&gt;
The maintenance downtime of SciNet's data center has finished, and systems are being brought online now.  You can check the progress here. Some systems might not be available until Friday morning.&amp;lt;br/&amp;gt;&lt;br /&gt;
Some action on the part of users will be required when they first connect again to the Niagara login nodes or datamovers.  This is due to the security upgrade of the Niagara cluster, which is now in line with currently accepted best practices.&amp;lt;br/&amp;gt;&lt;br /&gt;
The details of the required actions can be found on the [[SSH Changes in May 2019]] wiki page.&lt;br /&gt;
&lt;br /&gt;
'''Wed 29-30 May 2019''' The SciNet datacentre will undergo a two-day maintenance shutdown, starting at 7 am EDT on Wednesday May 29th.  There will be no access to any of the SciNet systems (Niagara, P7, P8, BGQ, SGC, HPSS, Teach cluster, or the file systems) during this time.&lt;br /&gt;
&lt;br /&gt;
'''SCHEDULED SHUTDOWN''': &lt;br /&gt;
&lt;br /&gt;
Please be advised that on '''Wednesday May 29th through Thursday May 30th''', the SciNet datacentre will undergo a two-day maintenance shutdown, starting at 7 am EDT on Wednesday May 29th.  There will be no access to any of the SciNet systems (Niagara, P7, P8, BGQ, SGC, HPSS, Teach cluster, or the file systems) during this time.&lt;br /&gt;
&lt;br /&gt;
This is necessary to finish the installation of an emergency power generator, to perform the annual cooling tower maintenance, and to enhance login security.&lt;br /&gt;
&lt;br /&gt;
We expect to be able to bring the systems back online the evening of May 30th.  Due to the enhanced login security, users' ssh applications will need to update their known-hosts lists. More detailed information on this procedure will be sent shortly before the systems are back online.&lt;br /&gt;
&lt;br /&gt;
'''Fri 5 Apr 2019:''' Software updates on Niagara: The default CCEnv software stack now uses avx512 on Niagara, and there is now a NiaEnv/2019b stack (&amp;quot;epoch&amp;quot;). &lt;br /&gt;
&lt;br /&gt;
'''Thu 4 Apr 2019:''' The 2019 compute and storage allocations have taken effect on Niagara.&lt;br /&gt;
&lt;br /&gt;
'''NOTE''':  There is scheduled network maintenance for '''Friday April 26th 12am-8am''' on the SciNet datacentre external network connection.   This will not affect internal connections and running jobs; however, remote connections may see interruptions during this period.&lt;br /&gt;
&lt;br /&gt;
'''Wed 24 Apr 2019 14:14 EDT:''' HPSS is back on service. Library and robot arm maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Wed 24 Apr 2019 08:35 EDT:''' HPSS out of service this morning for library and robot arm maintenance.&lt;br /&gt;
&lt;br /&gt;
'''Fri 19 Apr 2019 17:40 EDT:''' HPSS robot arm has been released and is back to normal operations.&lt;br /&gt;
&lt;br /&gt;
'''Fri 19 Apr 2019 14:00 EDT:''' Problems with the HPSS library robot have been detected.&lt;br /&gt;
&lt;br /&gt;
'''Wed 17 Apr 2019 15:35 EDT:''' Network connection is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed 17 Apr 2019 15:12 EDT:''' Network connection down.  Investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue 9 Apr 2019 22:24:14 EDT:'''  Network connection restored.&lt;br /&gt;
&lt;br /&gt;
'''Tue 9 Apr 2019, 15:20:''' Network connection down.  Investigating.&lt;br /&gt;
&lt;br /&gt;
'''Fri 5 Apr 2019:''' Planned, short outage in connectivity to the SciNet datacentre from 7:30 am to 8:55 am EST for maintenance of the network.  This outage will not affect running or queued jobs. It may be necessary to reboot the login nodes at some point tomorrow, which could result in a short interruption of connectivity, but which will have no effect on running or queued jobs.&lt;br /&gt;
&lt;br /&gt;
'''April 4, 2019:'''  The 2019 compute and storage allocations will take effect on Niagara. Running jobs will not be affected by this change and will run their course.  Queued jobs' priorities will be updated to reflect the new fairshare values later in the day.  The queue should fully reflect the new fairshare values in about 24 hours.   &lt;br /&gt;
&lt;br /&gt;
It may be necessary to reboot the login nodes at some point tomorrow, which could result in a short interruption of connectivity, but which will have no effect on running or queued jobs.&lt;br /&gt;
&lt;br /&gt;
There will be updates to the software stack on this day as well.&lt;br /&gt;
&lt;br /&gt;
'''March 25, 3:05 PM EST:'''  Most systems back online, other services should be back shortly. &lt;br /&gt;
&lt;br /&gt;
'''March 25, 12:05 PM EST:''' Power is back at the datacentre, but it is not yet known when all systems will be back up.  Keep checking here for updates.&lt;br /&gt;
&lt;br /&gt;
'''March 25, 11:27 AM EST:''' A power outage in the datacentre occurred and caused all services to go down.  Check here for updates.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 21 10:37:28 EDT 2019:''' HPSS is back in service&lt;br /&gt;
&lt;br /&gt;
HPSS out of service on '''Tue, Mar/19 at 9AM''', for tape library expansion and relocation. It's possible the downtime will extend to Wed, Mar/20.&lt;br /&gt;
&lt;br /&gt;
'''January 21, 4:00 PM''': HPSS is back in service. Thank you for your patience.&lt;br /&gt;
&lt;br /&gt;
'''January 18, 5:00 PM''': We did practically all of the HPSS upgrades (software/hardware); however, the main client node - archive02 - is presenting an issue we have not yet been able to resolve. We will try to resume work over the weekend with cool heads, or on Monday. Sorry, but this is an unforeseen delay. Jobs in the queue will remain there, and we'll delay the scratch purging by 1 week.&lt;br /&gt;
&lt;br /&gt;
'''January 16, 11:00 PM''': HPSS is being upgraded, as announced.&lt;br /&gt;
&lt;br /&gt;
'''January 16, 8:00 PM''': Systems are coming back up and should be accessible for users now.&lt;br /&gt;
&lt;br /&gt;
'''January 15, 8:00 AM''': Data centre downtime in effect.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;font color=red&amp;gt;&amp;lt;b&amp;gt;Downtime Announcement for January 15 and 16, 2019&amp;lt;/b&amp;gt;&amp;lt;/font&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
The SciNet datacentre will need to undergo a two-day maintenance shutdown in order to perform electrical work, repairs and maintenance.  The electrical work is in preparation for the upcoming installation of an emergency power generator and a larger UPS, which will result in increased resilience to power glitches and outages.  The shutdown is scheduled to start on '''Tuesday January 15, 2019, at 7 am''' and will last until '''Wednesday January 16, 2019''', some time in the evening. There will be no access to any of the SciNet systems (Niagara, P7, P8, BGQ, SGC, HPSS, Teach cluster, or the filesystems) during this time.&lt;br /&gt;
Check back here for up-to-date information on the status of the systems.&lt;br /&gt;
&lt;br /&gt;
Note: this downtime was originally scheduled for Dec. 18, 2018, but has been postponed and combined with the annual maintenance downtime.&lt;br /&gt;
&lt;br /&gt;
'''December 24, 2018, 11:35 AM EST:''' Most systems are operational again. If you had compute jobs running yesterday at around 3:30PM, they likely crashed - please check them and resubmit if needed.&lt;br /&gt;
&lt;br /&gt;
'''December 24, 2018, 10:40 AM EST:''' Repairs have been made, and the file systems are starting to be mounted on the cluster. &lt;br /&gt;
&lt;br /&gt;
'''December 23, 2018, 3:38 PM EST:''' Issues with the file systems (home, scratch and project). We are investigating; it looks like a hardware issue that we are trying to work around. Note that the absence of /home means you cannot log in with ssh keys. All compute jobs crashed around 3:30 PM EST on Dec 23. Once the system is properly up again, please resubmit your jobs.  Unfortunately, at this time of year, it is not possible to give an estimate on when the system will be operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Nov 22 14:20:00 EDT 2018''': &amp;lt;font color=green&amp;gt;HPSS back in service&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Thu Nov 22 08:55:00 EDT 2018''': &amp;lt;font color=red&amp;gt;HPSS offline for scheduled maintenance&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 20 16:30:00 EDT 2018''':  HPSS offline on Thursday 9AM for installation of new LTO8 drives in the tape library.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct  9 12:16:00 EDT 2018''':  BGQ compute nodes are up.  &lt;br /&gt;
&lt;br /&gt;
'''Sun Oct  7 20:24:26 EDT 2018''':  SGC and the BGQ front end are available; BGQ compute nodes are down due to a cooling issue.  &lt;br /&gt;
&lt;br /&gt;
'''Sat Oct  6 23:16:44 EDT 2018''':  There were some problems bringing up SGC &amp;amp; BGQ; they will remain offline for now.&lt;br /&gt;
&lt;br /&gt;
'''Sat Oct  6 18:36:35 EDT 2018''':  Electrical work finished, power restored. Systems are coming online.&lt;br /&gt;
&lt;br /&gt;
'''July 18, 2018:''' login.scinet.utoronto.ca is now disabled, GPC $SCRATCH and $HOME are decommissioned.&lt;br /&gt;
&lt;br /&gt;
'''July 12, 2018:''' There was a short power interruption around 10:30 am which caused most of the systems (Niagara, SGC, BGQ) to reboot and any running jobs to fail. &lt;br /&gt;
&lt;br /&gt;
'''July 11, 2018:''' P7's moved to BGQ filesystem, P8's moved to Niagara filesystem.&lt;br /&gt;
&lt;br /&gt;
'''May 24, 2018, 9:25 PM EST:''' The data center is up, and all systems are operational again.&lt;br /&gt;
&lt;br /&gt;
'''May 24, 2018, 7:00 AM EST:''' The data centre is under annual maintenance. All systems are offline. Systems are expected to be back late afternoon today; check for updates on this page.&lt;br /&gt;
&lt;br /&gt;
'''May 18, 2018:''' Announcement: Annual scheduled maintenance downtime: Thursday May 24, starting 7:00 AM&lt;br /&gt;
&lt;br /&gt;
'''May 16, 2018:''' Cooling  restored, systems online&lt;br /&gt;
&lt;br /&gt;
'''May 16, 2018:''' Cooling issue at datacentre again, all systems down&lt;br /&gt;
&lt;br /&gt;
'''May 15, 2018:''' Cooling restored, systems coming online&lt;br /&gt;
&lt;br /&gt;
'''May 15, 2018''' Cooling issue at datacentre, all systems down&lt;br /&gt;
&lt;br /&gt;
'''May 4, 2018:''' [[HPSS]] is now operational on Niagara.&lt;br /&gt;
&lt;br /&gt;
'''May 3, 2018:''' [[Burst Buffer]] is available upon request.&lt;br /&gt;
&lt;br /&gt;
'''May 3, 2018:''' The [https://docs.computecanada.ca/wiki/Globus Globus] endpoint for Niagara is available: computecanada#niagara.&lt;br /&gt;
&lt;br /&gt;
'''May 1, 2018:''' System status moved here.&lt;br /&gt;
&lt;br /&gt;
'''Apr 23, 2018:''' GPC-compute is decommissioned, GPC-storage available until 30 May 2018.&lt;br /&gt;
&lt;br /&gt;
'''April 10, 2018:''' Niagara commissioned.&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7706</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7706"/>
		<updated>2026-05-01T15:57:56Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 30, 2026, 3:00 pm:''' Most systems have been updated to mitigate known security risks and are back in service. Note that no actual security breaches were found.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 29, 2026, 5:25 pm:''' For security reasons, login access to all systems has been disabled, as have OnDemand Apps.  Compute jobs are still allowed to run.&lt;br /&gt;
 &lt;br /&gt;
'''Thu Apr 23, 2026, 10:00 am:''' The Trillium file system has issues and may be slow on certain nodes. We are still investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Apr 14, 2026, 10:30 am:''' The Trillium globus endpoint is working again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Apr 11, 2026, 10:00 pm:''' The Trillium globus endpoint is not operational (it times out). We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 09, 2026, 10:30 am:''' tri-dm4.scinet.utoronto.ca and robot4.scinet.utoronto.ca are in maintenance. Use 1, 2, or 3 instead.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 6:30 pm:''' Software from the CVMFS 'restricted' area is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 1:40 pm:''' Software from the CVMFS 'restricted' area is not available on many of the Trillium and Open OnDemand nodes. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 10:00 pm:''' We will have to reschedule the HPSS update. This attempt didn't work as expected.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7703</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7703"/>
		<updated>2026-05-01T13:32:26Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | HPSS|HPSS}}&lt;br /&gt;
|{{up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 30, 2026, 3:00 pm:''' Most systems have been updated to mitigate known security risks and are back in service. Note that no actual security breaches were found.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 29, 2026, 5:25 pm:''' For security reasons, login access to all systems has been disabled, as have OnDemand Apps.  Compute jobs are still allowed to run.&lt;br /&gt;
 &lt;br /&gt;
'''Thu Apr 23, 2026, 10:00 am:''' The Trillium file system has issues and may be slow on certain nodes. We are still investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Apr 14, 2026, 10:30 am:''' The Trillium globus endpoint is working again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Apr 11, 2026, 10:00 pm:''' The Trillium globus endpoint is not operational (it times out). We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 09, 2026, 10:30 am:''' tri-dm4.scinet.utoronto.ca and robot4.scinet.utoronto.ca are in maintenance. Use 1, 2, or 3 instead.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 6:30 pm:''' Software from the CVMFS 'restricted' area is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 1:40 pm:''' Software from the CVMFS 'restricted' area is not available on many of the Trillium and Open OnDemand nodes. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 10:00 pm:''' We will have to reschedule the HPSS update. This attempt didn't work as expected.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7694</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7694"/>
		<updated>2026-04-30T19:08:58Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Down3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Down | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | HPSS|HPSS}}&lt;br /&gt;
|{{up | Balam|Balam}}&lt;br /&gt;
|{{Partial | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 30, 2026, 3:00 pm:''' Most systems are back.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 29, 2026, 5:25 pm:''' For security reasons, login access to all systems has been disabled, as have OnDemand Apps.  Compute jobs are still allowed to run.&lt;br /&gt;
 &lt;br /&gt;
'''Thu Apr 23, 2026, 10:00 am:''' The Trillium file system has issues and may be slow on certain nodes. We are still investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Apr 14, 2026, 10:30 am:''' The Trillium globus endpoint is working again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Apr 11, 2026, 10:00 pm:''' The Trillium globus endpoint is not operational (it times out). We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 09, 2026, 10:30 am:''' tri-dm4.scinet.utoronto.ca and robot4.scinet.utoronto.ca are in maintenance. Use 1, 2, or 3 instead.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 6:30 pm:''' Software from the CVMFS 'restricted' area is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 1:40 pm:''' Software from the CVMFS 'restricted' area is not available on many of the Trillium and Open OnDemand nodes. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 10:00 pm:''' We will have to reschedule the HPSS update. This attempt didn't work as expected.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7691</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7691"/>
		<updated>2026-04-30T18:56:15Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Down3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Down | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | HPSS|HPSS}}&lt;br /&gt;
|{{up | Balam|Balam}}&lt;br /&gt;
|{{Partial | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 29, 2026, 5:25 pm:''' For security reasons, login access to all systems has been disabled, as have OnDemand Apps.  Compute jobs are still allowed to run.&lt;br /&gt;
 &lt;br /&gt;
'''Thu Apr 23, 2026, 10:00 am:''' The Trillium file system has issues and may be slow on certain nodes. We are still investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Apr 14, 2026, 10:30 am:''' The Trillium globus endpoint is working again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Apr 11, 2026, 10:00 pm:''' The Trillium globus endpoint is not operational (it times out). We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 09, 2026, 10:30 am:''' tri-dm4.scinet.utoronto.ca and robot4.scinet.utoronto.ca are in maintenance. Use 1, 2, or 3 instead.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 6:30 pm:''' Software from the CVMFS 'restricted' area is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 1:40 pm:''' Software from the CVMFS 'restricted' area is not available on many of the Trillium and Open OnDemand nodes. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 10:00 pm:''' We will have to reschedule the HPSS update. This attempt didn't work as expected.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7676</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7676"/>
		<updated>2026-04-29T21:35:26Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Partial3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Partial3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Down | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | HPSS|HPSS}}&lt;br /&gt;
|{{Partial | Balam|Balam}}&lt;br /&gt;
|{{Partial | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 29, 2026, 5:25 pm:''' For security reasons, login access to all systems has been disabled, as have OnDemand Apps.  Compute jobs are still allowed to run.&lt;br /&gt;
 &lt;br /&gt;
'''Thu Apr 23, 2026, 10:00 am:''' The Trillium file system has issues and may be slow on certain nodes. We are still investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Apr 14, 2026, 10:30 am:''' The Trillium globus endpoint is working again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Apr 11, 2026, 10:00 pm:''' The Trillium globus endpoint is not operational (it times out). We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 09, 2026, 10:30 am:''' tri-dm4.scinet.utoronto.ca and robot4.scinet.utoronto.ca are in maintenance. Use 1, 2, or 3 instead.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 6:30 pm:''' Software from the CVMFS 'restricted' area is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 1:40 pm:''' Software from the CVMFS 'restricted' area is not available on many of the Trillium and Open OnDemand nodes. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 10:00 pm:''' We will have to reschedule the HPSS update. This attempt didn't work as expected.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7673</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7673"/>
		<updated>2026-04-29T21:30:10Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Partial3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Partial3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Down | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | HPSS|HPSS}}&lt;br /&gt;
|{{Partial | Balam|Balam}}&lt;br /&gt;
|{{Partial | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 29, 2026, 5:25 pm:''' For security reasons, login access to all systems has been disabled.  Compute jobs are still allowed to run.&lt;br /&gt;
 &lt;br /&gt;
'''Thu Apr 23, 2026, 10:00 am:''' The Trillium file system has issues and may be slow on certain nodes. We are still investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Apr 14, 2026, 10:30 am:''' The Trillium globus endpoint is working again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Apr 11, 2026, 10:00 pm:''' The Trillium globus endpoint is not operational (it times out). We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 09, 2026, 10:30 am:''' tri-dm4.scinet.utoronto.ca and robot4.scinet.utoronto.ca are in maintenance. Use 1, 2, or 3 instead.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 6:30 pm:''' Software from the CVMFS 'restricted' area is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 1:40 pm:''' Software from the CVMFS 'restricted' area is not available on many of the Trillium and Open OnDemand nodes. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 10:00 pm:''' We will have to reschedule the HPSS update. This attempt didn't work as expected.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7667</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7667"/>
		<updated>2026-04-28T14:36:04Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 23, 2026, 10:00 am:''' The Trillium file system has issues and may be slow on certain nodes. We are still investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Apr 14, 2026, 10:30 am:''' The Trillium globus endpoint is working again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Apr 11, 2026, 10:00 pm:''' The Trillium globus endpoint is not operational (it times out). We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 09, 2026, 10:30 am:''' tri-dm4.scinet.utoronto.ca and robot4.scinet.utoronto.ca are in maintenance. Use 1, 2, or 3 instead.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 6:30 pm:''' Software from the CVMFS 'restricted' area is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 1:40 pm:''' Software from the CVMFS 'restricted' area is not available on many of the Trillium and Open OnDemand nodes. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 10:00 pm:''' We will have to reschedule the HPSS update. This attempt didn't work as expected.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7664</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7664"/>
		<updated>2026-04-24T00:33:49Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Partial3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 23, 2026, 10:00 am:''' The Trillium file system has issues and may be slow on certain nodes. We are still investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Apr 14, 2026, 10:30 am:''' The Trillium globus endpoint is working again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Apr 11, 2026, 10:00 pm:''' The Trillium globus endpoint is not operational (it times out). We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 09, 2026, 10:30 am:''' tri-dm4.scinet.utoronto.ca and robot4.scinet.utoronto.ca are in maintenance. Use 1, 2, or 3 instead.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 6:30 pm:''' Software from the CVMFS 'restricted' area is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 1:40 pm:''' Software from the CVMFS 'restricted' area is not available on many of the Trillium and Open OnDemand nodes. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 10:00 pm:''' We will have to reschedule the HPSS update. This attempt didn't work as expected.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&amp;diff=7661</id>
		<title>Niagara Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Niagara_Quickstart&amp;diff=7661"/>
		<updated>2026-04-15T19:17:38Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Niagara.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Niagara&lt;br /&gt;
|installed=Jan 2018/March 2020&lt;br /&gt;
|operatingsystem= CentOS 7.9 &lt;br /&gt;
|loginnode= niagara.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  2,024 nodes (80,960 cores)&lt;br /&gt;
|rampernode=188 GiB / 202 GB  &lt;br /&gt;
|corespernode=40 (80 hyperthreads)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;font color=red&amp;gt;The Niagara cluster was &amp;lt;b&amp;gt;decommissioned&amp;lt;/b&amp;gt; on September 30, 2025. &amp;lt;br /&amp;gt;Its users should move to [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium]. &amp;lt;br /&amp;gt;None of the instructions below will work on Trillium.&amp;lt;/font&amp;gt;'''&lt;br /&gt;
&lt;br /&gt;
=Specifications=&lt;br /&gt;
&lt;br /&gt;
The Niagara cluster is a large cluster of 2,024 Lenovo SD530 servers each with 40 Intel &amp;quot;Skylake&amp;quot; cores at 2.4 GHz (1548 nodes) or 40 Intel &amp;quot;CascadeLake&amp;quot; cores at 2.5 GHz (476 nodes). &lt;br /&gt;
The peak performance of the cluster is about 3.6 PFlops (6.25 PFlops theoretical).  It was the 53rd fastest supercomputer on the [https://www.top500.org/list/2018/06/?page=1 TOP500 list of June 2018], and is at number 241 on the [https://www.top500.org/lists/top500/list/2024/06/?page=3 current list (June 2024)]. &lt;br /&gt;
&lt;br /&gt;
Each node of the cluster has 188 GiB / 202 GB of RAM (at least 4 GiB/core for user jobs, and roughly 170 GiB/node at most).  Being designed for large parallel workloads, the cluster has a fast interconnect consisting of EDR InfiniBand in a Dragonfly+ topology with Adaptive Routing.  The compute nodes are accessed through a queueing system that allows jobs with a minimum duration of 15 minutes and a maximum of 24 hours, and that favours large jobs.&lt;br /&gt;
&lt;br /&gt;
* See the [https://www.youtube.com/watch?v=l-E2CFGh0BE&amp;amp;feature=youtu.be  &amp;quot;Intro to Niagara&amp;quot;] recording&lt;br /&gt;
&lt;br /&gt;
More detailed hardware characteristics of the Niagara supercomputer can be found [https://docs.alliancecan.ca/wiki/Niagara on this page].&lt;br /&gt;
&lt;br /&gt;
Note: Documentation about the &amp;quot;GPU expansion to Niagara&amp;quot; called &amp;quot;Mist&amp;quot; can be found on [[Mist | its own page]].&lt;br /&gt;
&lt;br /&gt;
= Getting started on Niagara =&lt;br /&gt;
&lt;br /&gt;
Access to Niagara is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
 &lt;br /&gt;
If you have an active Alliance account but you do not have access to Niagara yet (e.g. because you are new to SciNet or belong to a group whose primary PI does not have an allocation as granted in the annual [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions {{Alliance}} RAC]), go to the [https://ccdb.alliancecan.ca/services/opt_in opt-in page on the CCDB site].  After clicking the &amp;quot;Join&amp;quot; button, it usually takes only one or two business days for access to be granted.  &lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
There are two ways to access Niagara:&lt;br /&gt;
&lt;br /&gt;
# Via your browser with Open OnDemand. This is recommended for users who are not familiar with Linux or the command line. Please see our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide] for more instructions on how to use Open OnDemand.&lt;br /&gt;
# Terminal access with ssh. Please read the following instructions.&lt;br /&gt;
        &lt;br /&gt;
Niagara runs CentOS 7, which is a Linux distribution.  You will need to be familiar with Linux systems to work on Niagara.  If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, access to Niagara is done via [[SSH]] (secure shell) only. As of January 22 2022, authentication is only allowed via SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use them securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (e.g. with [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm] on Windows), then SSH into the Niagara login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@niagara.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@niagara.computecanada.ca&lt;br /&gt;
&lt;br /&gt;
The first time you log in to Niagara, please make sure you are actually accessing Niagara by checking that the login node's SSH host key fingerprint matches [[SSH_Changes_in_May_2019 | (see here how)]]. This check prevents you from falling victim to [https://en.wikipedia.org/wiki/Man-in-the-middle_attack man-in-the-middle attacks].&lt;br /&gt;
&lt;br /&gt;
* The Niagara login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Niagara compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; is needed to open windows from the Niagara command-line onto your local X server.&lt;br /&gt;
* You can only connect 4 times in a 2-minute window to the login nodes. &lt;br /&gt;
* To run on Niagara's compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
== Your various directories ==&lt;br /&gt;
&lt;br /&gt;
By virtue of your access to Niagara, you are granted storage space on the system.  There are several directories available to you, each indicated by an associated environment variable.&lt;br /&gt;
&lt;br /&gt;
=== home and scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home and scratch directory on the system, the paths to which are stored in the environment variables $HOME and $SCRATCH. The locations are of the form&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/g/groupname/myallianceusername&lt;br /&gt;
 $SCRATCH=/scratch/g/groupname/myallianceusername&lt;br /&gt;
&lt;br /&gt;
where groupname is the name of your PI's group, and myallianceusername is your {{Alliance}} username.  For example:&lt;br /&gt;
&lt;br /&gt;
  nia-login07:~$ pwd&lt;br /&gt;
  /home/s/scinet/rzon&lt;br /&gt;
  nia-login07:~$ cd $SCRATCH&lt;br /&gt;
  nia-login07:rzon$ pwd&lt;br /&gt;
  /scratch/s/scinet/rzon&lt;br /&gt;
&lt;br /&gt;
NOTE: home is read-only on compute nodes.&lt;br /&gt;
&lt;br /&gt;
=== project and archive/nearline ===&lt;br /&gt;
&lt;br /&gt;
Users from groups with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions RAC storage allocation] will also have a project directory and possibly an archive (a.k.a. &amp;quot;nearline&amp;quot;) directory, the paths to which are stored in the environment variables $PROJECT and $ARCHIVE. They follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/g/groupname/myallianceusername&lt;br /&gt;
 $ARCHIVE=/archive/g/groupname/myallianceusername&lt;br /&gt;
&lt;br /&gt;
NOTE: Currently archive space is available only via [[HPSS]], and is not accessible on the Niagara login, compute, or datamover nodes.&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''&lt;br /&gt;
&lt;br /&gt;
When writing your scripts, use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) instead of the actual paths!  The paths may change in the future.&lt;br /&gt;
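As a minimal sketch of this advice, build output locations from the environment variables rather than hard-coding paths (the group and user names below are hypothetical placeholders, not real paths; on the system, $SCRATCH is already set for you):&lt;br /&gt;

```shell
# Sketch: derive paths from $SCRATCH instead of hard-coding /scratch/...
# The fallback value is a hypothetical placeholder used only for illustration;
# on Niagara, $SCRATCH is already set by the system.
SCRATCH="${SCRATCH:-/scratch/g/groupname/myuser}"
OUTDIR="$SCRATCH/runs/run001"
echo "$OUTDIR"
```

If the path layout ever changes, a script written this way keeps working without edits.&lt;br /&gt;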
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
You should familiarize yourself with the [[Data_Management#Purpose_of_each_file_system | various file systems]], what purpose they serve, and how to properly use them.  This table summarizes the various file systems.  See the [[Data_Management | Data Management]] page for more details.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB / 250,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB / 6,000,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50-500TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group (nearline) allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|-&lt;br /&gt;
| $BBUFFER&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 10 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| very short&lt;br /&gt;
| no&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Moving data to Niagara ==&lt;br /&gt;
&lt;br /&gt;
If you need to move data onto Niagara for analysis, or off of Niagara afterwards, use the following guidelines:&lt;br /&gt;
* If your data is less than 10GB, move the data using the login nodes.&lt;br /&gt;
* If your data is greater than 10GB, move the data using the datamover nodes nia-datamover1.scinet.utoronto.ca and nia-datamover2.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
Details of how to use the datamover nodes can be found on the [[Data_Management#Moving_data | Data Management ]] page.&lt;br /&gt;
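A large transfer routed through a datamover node might look like the command assembled below; the local path, username, and target directory are hypothetical placeholders, and this sketch only builds and prints the command rather than running it:&lt;br /&gt;

```shell
# Sketch: build an rsync invocation that goes through a datamover node.
# The source path, username, and destination directory are hypothetical.
SRC="/localdisk/mydata/"
DEST="myuser@nia-datamover1.scinet.utoronto.ca:/scratch/g/groupname/myuser/"
CMD="rsync -avP $SRC $DEST"
echo "$CMD"
```

Running the printed command from your own machine would push the data directly to your scratch space via the datamover node.&lt;br /&gt;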
&lt;br /&gt;
= Loading software modules =&lt;br /&gt;
&lt;br /&gt;
You have two options for running code on Niagara: use existing software, or [[Niagara_Quickstart#Compiling_on_Niagara:_Example | compile your own]].  This section focuses on the former.&lt;br /&gt;
&lt;br /&gt;
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available.  A detailed explanation of the module system can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Common module subcommands are:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;: load the default version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt;: load a specific version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt;: unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;): list available software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt;: list loadable software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;: list loaded modules.&lt;br /&gt;
&lt;br /&gt;
Along with modifying common environment variables, such as PATH and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.&lt;br /&gt;
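As an illustration, such a root variable can be used to build compiler flags without hard-coding installation paths; the package name and path below are hypothetical placeholders, since a real module would set the variable for you:&lt;br /&gt;

```shell
# Sketch: use a SCINET_MODULENAME_ROOT-style variable to locate the include/
# and lib/ directories of a package. The value below is a hypothetical
# placeholder; loading the corresponding module sets it automatically.
SCINET_GSL_ROOT="/scinet/niagara/software/2019b/opt/gsl"
FLAGS="-I$SCINET_GSL_ROOT/include -L$SCINET_GSL_ROOT/lib"
echo "$FLAGS"
```

This way a build script keeps working even if the software tree is relocated.&lt;br /&gt;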
&lt;br /&gt;
There are handy abbreviations for the module commands. &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
== Software stacks: NiaEnv and CCEnv ==&lt;br /&gt;
&lt;br /&gt;
On Niagara, there are two available software stacks:&lt;br /&gt;
&lt;br /&gt;
=== NiaEnv ===&lt;br /&gt;
&lt;br /&gt;
A [https://docs.scinet.utoronto.ca/index.php/Modules_specific_to_Niagara Niagara software stack] tuned and compiled for this machine. This stack is available by default, but if not, can be reloaded with&lt;br /&gt;
&amp;lt;pre&amp;gt;module load NiaEnv&amp;lt;/pre&amp;gt;&lt;br /&gt;
This loads the default set of modules, which is currently the 2019b epoch. Before September 1, the default was NiaEnv/2018a.  Since May 2023, a newer stack, NiaEnv/2022a, has been available as well.&lt;br /&gt;
&lt;br /&gt;
Existing users are expected to continue using the 2019b stack, but to make old job scripts or older software installations in your home directory work, you may need to use&lt;br /&gt;
&amp;lt;pre&amp;gt;module load NiaEnv/2018a&amp;lt;/pre&amp;gt;&lt;br /&gt;
For new projects, we advise users to start with the 2022a stack:&lt;br /&gt;
&amp;lt;pre&amp;gt;module load NiaEnv/2022a&amp;lt;/pre&amp;gt;&lt;br /&gt;
You can override the system default for the epoch version by creating a file called &amp;lt;b&amp;gt;&amp;lt;tt&amp;gt;.modulerc&amp;lt;/tt&amp;gt;&amp;lt;/b&amp;gt; in your home directory with the line &amp;lt;b&amp;gt;&amp;lt;tt&amp;gt;module-version NiaEnv/VERSION default&amp;lt;/tt&amp;gt;&amp;lt;/b&amp;gt;, e.g. like so:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;module-version NiaEnv/2022a default&amp;quot; &amp;gt; $HOME/.modulerc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
After this, subsequent logins and jobs will use the 2022a stack even when the system default is different.&lt;br /&gt;
Similarly, you can make an older epoch your personal default:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;module-version NiaEnv/2018a default&amp;quot; &amp;gt; $HOME/.modulerc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
No modules are loaded by default on Niagara except NiaEnv.&lt;br /&gt;
&lt;br /&gt;
=== CCEnv ===&lt;br /&gt;
&lt;br /&gt;
The [https://docs.alliancecan.ca/wiki/Modules software stack available on {{Alliance}}'s General Purpose clusters] is also available; load it with:&lt;br /&gt;
&amp;lt;pre&amp;gt;module load CCEnv&amp;lt;/pre&amp;gt;&lt;br /&gt;
Or, if you want the same default modules loaded as on Béluga, then do&lt;br /&gt;
&amp;lt;pre&amp;gt;module load CCEnv StdEnv&amp;lt;/pre&amp;gt;&lt;br /&gt;
or, if you want the same default modules loaded as on Cedar and Graham, do&lt;br /&gt;
&amp;lt;pre&amp;gt;module load CCEnv arch/avx2 StdEnv/2020&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Tips for loading software ==&lt;br /&gt;
&lt;br /&gt;
* We advise '''''against''''' loading modules in your .bashrc.  This can lead to very confusing behaviour under certain circumstances.  Our guidelines for .bashrc files can be found [[bashrc guidelines|here]].&lt;br /&gt;
* Instead, load modules by hand when needed, or by sourcing a separate script.&lt;br /&gt;
* Load run-specific modules inside your job submission script.&lt;br /&gt;
* Short names give default versions; e.g. &amp;lt;code&amp;gt;intel&amp;lt;/code&amp;gt; → &amp;lt;code&amp;gt;intel/2018.2&amp;lt;/code&amp;gt;. It is usually better to be explicit about the versions, for future reproducibility.&lt;br /&gt;
* Modules often require other modules to be loaded first.  Solve these dependencies by using [[Using_modules#Module_spider | &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
= Available compilers and interpreters =&lt;br /&gt;
&lt;br /&gt;
* For most compiled software, one should use the Intel compilers (&amp;lt;tt&amp;gt;icc&amp;lt;/tt&amp;gt; for C, &amp;lt;tt&amp;gt;icpc&amp;lt;/tt&amp;gt; for C++, and &amp;lt;tt&amp;gt;ifort&amp;lt;/tt&amp;gt; for Fortran). Loading an &amp;lt;tt&amp;gt;intel&amp;lt;/tt&amp;gt; module makes these available. &lt;br /&gt;
* The GNU compiler suite (&amp;lt;tt&amp;gt;gcc, g++, gfortran&amp;lt;/tt&amp;gt;) is also available, if you load one of the &amp;lt;tt&amp;gt;gcc&amp;lt;/tt&amp;gt; modules.&lt;br /&gt;
* To compile MPI code, you must additionally load an &amp;lt;tt&amp;gt;openmpi&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;intelmpi&amp;lt;/tt&amp;gt; module.&lt;br /&gt;
* Open source interpreted, interactive software is also available:&lt;br /&gt;
** [[Python]]&lt;br /&gt;
** [[R]]&lt;br /&gt;
** Julia&lt;br /&gt;
** [[Octave]]&lt;br /&gt;
  &lt;br /&gt;
Please visit the corresponding page for details on using these tools.  For information on running MATLAB applications on Niagara, visit [[MATLAB| this page]].&lt;br /&gt;
&lt;br /&gt;
= Using Commercial Software =&lt;br /&gt;
&lt;br /&gt;
May I use commercial software on Niagara?&lt;br /&gt;
* Possibly, but you have to bring your own license for it.  You can connect to an external license server using [[SSH_Tunneling | ssh tunneling]].&lt;br /&gt;
* SciNet and {{the Alliance}} have an extremely large and broad user base of thousands of users, so we cannot provide licenses for everyone's favorite software.&lt;br /&gt;
* Thus, the only freely available commercial software installed on Niagara is software that can benefit everyone: Compilers, math libraries and debuggers.&lt;br /&gt;
* That means no [[MATLAB]], Gaussian, IDL, etc.&lt;br /&gt;
* Open source alternatives like Octave, [[Python]], and [[R]] are available.&lt;br /&gt;
* We are happy to help you to install commercial software for which you have a license.&lt;br /&gt;
* In some cases, if you have a license, you can use software in the {{Alliance}} stack.&lt;br /&gt;
The list of commercial software which is installed on Niagara, for which you will need a license to use, can be found on the [[Commercial_software | commercial software page]].&lt;br /&gt;
&lt;br /&gt;
= Compiling on Niagara: Example =&lt;br /&gt;
&lt;br /&gt;
Suppose one wants to compile an application from two C source files, appl.c and module.c, which use the Math Kernel Library. Here is an example of how to do this:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
nia-login07:~$ module load NiaEnv/2019b&lt;br /&gt;
nia-login07:~$ module list&lt;br /&gt;
Currently Loaded Modules:&lt;br /&gt;
  1) NiaEnv/2019b (S)&lt;br /&gt;
  Where:&lt;br /&gt;
   S:  Module is Sticky, requires --force to unload or purge&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ module load intel/2019u4&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ ls&lt;br /&gt;
appl.c module.c&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ icc -c -O3 -xHost -o appl.o appl.c&lt;br /&gt;
nia-login07:~$ icc -c -O3 -xHost -o module.o module.c&lt;br /&gt;
nia-login07:~$ icc  -o appl module.o appl.o -mkl&lt;br /&gt;
&lt;br /&gt;
nia-login07:~$ ./appl&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Note:&lt;br /&gt;
* The optimization flags -O3 -xHost allow the Intel compiler to use instructions specific to the CPU architecture that is present (instead of compiling for more generic x86_64 CPUs).&lt;br /&gt;
* Linking with the Intel Math Kernel Library (MKL) is straightforward when using the Intel compiler; it just requires the -mkl flag.&lt;br /&gt;
* If compiling with gcc, the optimization flags would be -O3 -march=native. For the way to link with the MKL, it is suggested to use the [https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor MKL link line advisor].&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
You should test your code before submitting it to the cluster, both to check that it is correct and to find out what resources it needs.&lt;br /&gt;
* Small test jobs can be run on the login nodes.  Rule of thumb: tests should run no more than a couple of minutes, taking at most about 1-2GB of memory, and use no more than a couple of cores.&lt;br /&gt;
* You can run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after &amp;lt;code&amp;gt;module load ddt&amp;lt;/code&amp;gt;.&lt;br /&gt;
* For short tests that do not fit on a login node, or for which you need a dedicated node, request an interactive debug job with the debugjob command:&lt;br /&gt;
 nia-login07:~$ debugjob --clean N&lt;br /&gt;
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; if N=4 (the maximum), it gives you 22 minutes.  The &amp;lt;tt&amp;gt;--clean&amp;lt;/tt&amp;gt; argument is optional but recommended, as it starts the session without any modules loaded, thus mimicking more closely what happens when you submit a job script.&lt;br /&gt;
&lt;br /&gt;
Finally, if your test needs more than 1 hour, you can request an interactive job from the regular queue using the salloc command.  Note, however, that it may take some time to start, since it goes through the regular queue and runs when the scheduler decides.&lt;br /&gt;
 nia-login07:~$ salloc --nodes N --time=M:00:00 --x11&lt;br /&gt;
where N is again the number of nodes, and M is the number of hours you wish the job to run.&lt;br /&gt;
The &amp;lt;tt&amp;gt;--x11&amp;lt;/tt&amp;gt; is required if you need to use graphics while testing your code through salloc, e.g. when using a debugger such as [[Parallel Debugging with DDT|DDT]] or DDD. See the [[Testing_With_Graphics | Testing with graphics]] page for the options in that case.&lt;br /&gt;
&lt;br /&gt;
= Submitting jobs =&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- == Progressive approach to run jobs on niagara == --&amp;gt;&lt;br /&gt;
&amp;lt;!-- We would like to emphasize the need for users to adopt a more progressive and explicit approach for testing, running and scaling up of jobs on niagara. [[Progressive_Approach | '''Here is a set of steps we suggest that you follow.''']] --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Niagara login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster.  Your jobs will run on some of Niagara's 1548 compute nodes.  When and where your job runs is determined by the scheduler.&lt;br /&gt;
&lt;br /&gt;
Niagara uses SLURM as its job scheduler.  More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You submit jobs from a login node by passing a script to the sbatch command:&lt;br /&gt;
&lt;br /&gt;
 nia-login07:scratch$ sbatch jobscript.sh&lt;br /&gt;
&lt;br /&gt;
This puts the job in the queue. It will run on the compute nodes in due course.  Note that you must submit your job from a login node.  You cannot submit jobs from the datamover nodes.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should not submit from your $HOME directory, but rather, from your $SCRATCH directory, so that the output of your compute job can be written out (as mentioned above, $HOME is read-only on the compute nodes).&lt;br /&gt;
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if your group has none, under a RAS allocation (previously called a 'default' allocation).&lt;br /&gt;
&lt;br /&gt;
Some example job scripts can be found below.&lt;br /&gt;
&lt;br /&gt;
Keep in mind:&lt;br /&gt;
* Scheduling is by node, so in multiples of 40 cores.&lt;br /&gt;
* Your job's maximum walltime is 24 hours. &lt;br /&gt;
* Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script will not remember the modules you have loaded, so it needs to contain &amp;quot;module load&amp;quot; commands of all the required modules (see examples below). &lt;br /&gt;
* [[Data_Management#Moving_data | Move your data]] to Niagara before you submit your job.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler deduces what resources to allocate from the specified numbers of tasks and cpus-per-task.  On Niagara, things are a bit different.&lt;br /&gt;
* All job resource requests on Niagara are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.&lt;br /&gt;
** No other users are running anything on them.&lt;br /&gt;
** You can [[SSH]] into them to see how things are going.&lt;br /&gt;
* Whatever you request from the scheduler, the request will always be translated into a multiple of nodes allocated to your job.&lt;br /&gt;
* Memory requests to the scheduler have no effect: your job always gets N x 202GB of RAM, where N is the number of nodes and 202GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial jobs you must still use all 40 cores on the node.  Visit the [[Running_Serial_Jobs_on_Niagara | serial jobs]] page for examples of how to do this.&lt;br /&gt;
* Since there are 40 cores per node, your job should use N x 40 cores. If it does not, we will contact you to help you optimize your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
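One way to meet the all-cores requirement with serial work is to bundle independent serial tasks in one job script, one per core. A minimal sketch (the task function is a placeholder for your real serial executable, and 4 stands in for the 40 cores of a Niagara node):

```shell
# Placeholder standing in for a real serial program (an assumption).
task() { echo "task $1 done"; }

# Launch the tasks in the background, one per core, then wait for all
# of them to finish before the job script exits.
for i in $(seq 1 4); do     # on a full Niagara node this would be 40
  task "$i" > "out.$i" &
done
wait
```

See the serial jobs page linked above for more robust ways to do this.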
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run and the number of jobs you can have queued.  It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases.  You specify the partition with the &amp;lt;tt&amp;gt;-p&amp;lt;/tt&amp;gt; parameter to &amp;lt;tt&amp;gt;sbatch&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;salloc&amp;lt;/tt&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;tt&amp;gt;compute&amp;lt;/tt&amp;gt; partition, which is the most common case. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (40&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(800&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(40000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (40&amp;amp;nbsp;cores) || 4 nodes (160 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue.  The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
&lt;br /&gt;
== File Input/Output Tips ==&lt;br /&gt;
&lt;br /&gt;
It is important to understand the file systems, so as to perform your file I/O (Input/Output) responsibly.  Refer to the [[Data_Management | Data Management]] page for details about the file systems.&lt;br /&gt;
* Your files can be seen on all Niagara login and compute nodes.&lt;br /&gt;
* $HOME, $SCRATCH, and $PROJECT all use the parallel file system called GPFS.&lt;br /&gt;
* GPFS is a high-performance file system which provides rapid reads and writes to large data sets in parallel from many nodes.&lt;br /&gt;
* Accessing data sets which consist of many small files leads to poor performance on GPFS.&lt;br /&gt;
* Avoid reading and writing lots of small amounts of data to disk.  Many small files on the system waste space and are slower to access, read and write.  If you must write many small files, use [[User_Ramdisk | ramdisk]].&lt;br /&gt;
* Write data out in a binary format. This is faster and takes less space.&lt;br /&gt;
* The [[Burst Buffer]] is another option for I/O-heavy jobs and for speeding up [[Checkpoints|checkpoints]].&lt;br /&gt;
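A common mitigation for the many-small-files problem is to bundle outputs into a single archive before they land on GPFS, turning many small writes into one large one. A minimal sketch (the filenames are placeholders):

```shell
# Generate 100 small placeholder files (e.g. in ramdisk).
mkdir -p results
for i in $(seq 1 100); do
  echo "data $i" > "results/part.$i"
done

# Bundle them into one compressed archive: a single large file
# instead of 100 small ones.
tar -czf results.tar.gz results
```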
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;#!/bin/bash &lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=40&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load NiaEnv/2019b&lt;br /&gt;
module load intel/2019u4&lt;br /&gt;
module load openmpi/4.0.1&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Submit this script from your scratch directory with the command:&lt;br /&gt;
&lt;br /&gt;
    nia-login07:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; go to SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;sbatch reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;)&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes each running 40 tasks (for a total of 80 tasks), for 1 hour&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the mpirun flag &amp;quot;--ppn&amp;quot; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it has found such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform mpirun or srun on how many processes to run).&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, just change &amp;lt;code&amp;gt;--ntasks-per-node=40&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--ntasks-per-node=80&amp;lt;/code&amp;gt;, and add &amp;lt;code&amp;gt;--bind-to none&amp;lt;/code&amp;gt; to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=40&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load NiaEnv/2019b&lt;br /&gt;
module load intel/2019u4&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;.&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Submit this script from your scratch directory with the command:&lt;br /&gt;
&lt;br /&gt;
    nia-login07:~$ sbatch openmp_job.sh&lt;br /&gt;
&lt;br /&gt;
* First line indicates that this is a bash script.&lt;br /&gt;
* Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; go to SLURM.&lt;br /&gt;
* sbatch reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;) .&lt;br /&gt;
* In this case, SLURM looks for one node with 40 cores to run one task, for 1 hour.&lt;br /&gt;
* Once it has found such a node, it runs the script:&lt;br /&gt;
** Changes to the submission directory;&lt;br /&gt;
** Loads modules;&lt;br /&gt;
** Sets an environment variable;&lt;br /&gt;
** Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&lt;br /&gt;
* To use hyperthreading, just change &amp;lt;code&amp;gt;--cpus-per-task=40&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--cpus-per-task=80&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Monitoring queued jobs ==&lt;br /&gt;
&lt;br /&gt;
Once the job is incorporated into the queue, there are some commands you can use to monitor its progress.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; (a caching version of squeue) to show the job queue (&amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; for just your jobs);&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; to get information on a specific job&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;(alternatively, &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt;, which is more verbose).&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; to get an estimate for when a job will run; these tend not to be very accurate predictions.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel -i JOBID&amp;lt;/code&amp;gt; to cancel the job.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; to get an instantaneous view of the cpu and memory usage of the nodes of the job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; to get information on your recent jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Further instructions for monitoring your jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].  The [https://my.scinet.utoronto.ca my.SciNet] site is also a very useful tool for monitoring your current and past usage.&lt;br /&gt;
&lt;br /&gt;
= Visualization =&lt;br /&gt;
Information about how to use visualization tools on Niagara is available on [[Visualization]] page.&lt;br /&gt;
&lt;br /&gt;
= Support =&lt;br /&gt;
&lt;br /&gt;
* [mailto:support@scinet.utoronto.ca support@scinet.utoronto.ca]&lt;br /&gt;
* [mailto:niagara@tech.alliancecan.ca niagara@tech.alliancecan.ca]&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7658</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7658"/>
		<updated>2026-04-14T14:51:26Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Tue Apr 14, 2026, 10:30 am:''' The Trillium globus endpoint is working again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Apr 11, 2026, 10:00 pm:''' The Trillium globus endpoint is not operational (it times out). We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 09, 2026, 10:30 am:''' tri-dm4.scinet.utoronto.ca and robot4.scinet.utoronto.ca are in maintenance. Use 1,2, or 3 instead.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 6:30 pm:''' Software from the CVMFS 'restricted' area is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 1:40 pm:''' Software from the CVMFS 'restricted' area is not available on many of the Trillium and Open OnDemand nodes. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 10:00 pm:''' We will have to reschedule the HPSS update. This attempt didn't work as expected.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7655</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7655"/>
		<updated>2026-04-13T18:17:08Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Partial | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Sat Apr 11, 2026, 10:00 pm:''' The Trillium globus endpoint is not operational (it times out). We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 09, 2026, 10:30 am:''' tri-dm4.scinet.utoronto.ca and robot4.scinet.utoronto.ca are in maintenance. Use 1,2, or 3 instead.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 6:30 pm:''' Software from the CVMFS 'restricted' area is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 1:40 pm:''' Software from the CVMFS 'restricted' area is not available on many of the Trillium and Open OnDemand nodes. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 10:00 pm:''' We will have to reschedule the HPSS update. This attempt didn't work as expected.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7652</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7652"/>
		<updated>2026-04-09T20:30:31Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 09, 2026, 10:30 am:''' tri-dm4.scinet.utoronto.ca and robot4.scinet.utoronto.ca are in maintenance. Use 1,2, or 3 instead.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 6:30 pm:''' Software from the CVMFS 'restricted' area is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 1:40 pm:''' Software from the CVMFS 'restricted' area is not available on many of the Trillium and Open OnDemand nodes. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 10:00 pm:''' We will have to reschedule the HPSS update. This attempt didn't work as expected.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7649</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7649"/>
		<updated>2026-04-08T23:16:28Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 6:30 pm:''' Software from the CVMFS 'restricted' area is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 1:40 pm:''' Software from the CVMFS 'restricted' area is not available on many of the Trillium and Open OnDemand nodes. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 10:00 pm:''' We will have to reschedule the HPSS update, as this attempt didn't work as expected.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7646</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7646"/>
		<updated>2026-04-08T20:49:34Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Wed Apr 08, 2026, 1:40 pm:''' Software from the CVMFS 'restricted' area is not available on many of the Trillium and Open OnDemand nodes. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 10:00 pm:''' We will have to reschedule the HPSS update, as this attempt didn't work as expected.&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7634</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7634"/>
		<updated>2026-04-02T17:57:10Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Mon Apr 06, 2026, 8:00 pm:''' HPSS scheduled maintenance: update of HPSS to v11.3_u4 and hsi-htar to v11.3_u1 (bug fixes).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Previous_messages&amp;diff=7631</id>
		<title>Previous messages</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Previous_messages&amp;diff=7631"/>
		<updated>2026-04-02T17:56:57Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Wed Mar 25, 2026, 5:00 pm:''' Trillium is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Mar 25, 2026, 9:00 am:''' Teach is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 8:45 pm:''' Open OnDemand is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 1:00 pm:''' External connectivity is back. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 12:05 pm:''' External connectivity to the data centre was lost. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 7:00 am:''' Maintenance has started.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 1:30 pm:''' Recovering. Almost all systems are up again. Please resubmit any jobs that crashed.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 12:00 pm:''' A power glitch at the data centre caused compute nodes to go down.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 4:15 pm:''' Connections to Trillium are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly for Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and jobs will be held in the queue until maintenance is complete.  Starting 7 am on Mar 23, users are encouraged to submit small and short jobs that may be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1], [https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 16, 2026, 11:00 pm:''' HPSS is back online, and accessible via alliancecan#hpss Globus endpoint. &lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 15, 2026, 10:00 pm:''' HPSS will undergo maintenance on Friday morning, Jan 16, 2026, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
'''Tue Jan 6, 2026, 10:15 am:''' OnDemand has been fixed and is working again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 5, 2026, 9:00 pm:''' The authentication mechanism of OnDemand is not working.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 31, 2025, 12:40 pm:''' We believe the problem has now been resolved.  Please let us know if you still experience login problems or aborted jobs.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 30, 2025, 2:10 pm:''' We are experiencing problems with authentication, resulting in failed logins, OOD errors, and aborted jobs (with &amp;quot;prolog error&amp;quot;).  Please bear with us, as we are very short-staffed during the holiday break.  We will post updates here.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 3, 2025, 11:30 am:''' Open OnDemand is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Nov 29, 2025, 12:40 am:''' There has been a problem with the water chiller. Some systems are offline.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 12:55 pm:''' Balam is back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 10:00 am:''' Open OnDemand is back online.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 11:00 pm:''' Most of the work is done, data movers, Globus, and HPSS are back online. Remaining services will be worked on tomorrow.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 8:30 am:''' Scheduled network maintenance. Trillium cluster is *not* affected.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 5:30 pm:''' Balam maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 7:00 am:''' Balam maintenance day.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 15, 2025, 3:55 pm:''' Trillium inbound connections through trillium.alliancecan.ca or trillium.scinet.utoronto.ca are working again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 15, 2025, 3:05 pm:''' Trillium is experiencing external network issues affecting incoming traffic. Please try: ssh USERNAME@tri-login01.scinet.utoronto.ca in the meantime.&lt;br /&gt;
 &lt;br /&gt;
'''Mon Oct 06, 2025, 8:00 pm:''' HPSS is fully functional. You may submit archive jobs from Trillium login nodes, data movers, and robots.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 03, 2025, 6:30 pm:''' HPSS is back online, and already accessible via the alliancecan#hpss Globus endpoint. The directory tree now follows the other Alliance clusters. We're still working on job submission via Slurm.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 01, 2025, 9:30 am:''' HPSS is down for scheduled maintenance, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 01, 2025, 12:00 am:''' Niagara compute nodes are now unavailable for regular users. The login nodes will remain available for a while to allow a few last data transfers, although transfers from the Niagara file systems to Trillium are best done on nia-dm1.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Thu Sep 18, 2025, 11:30 am:''' Open OnDemand is fully functional again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 17, 2025, 6:00 pm:''' Niagara is back up as well (including its Globus endpoint).  We are still working on the other systems.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 17, 2025, 1:40 pm:''' Trillium is back up (except for its Globus endpoint).  We are working on the other systems still.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 16, 2025, 5:45 pm:''' Unfortunately, we cannot bring all systems up yet because we are waiting for a spare part for the cooling system that will be brought tomorrow.  In the meantime, we have managed to keep the Trillium login nodes up, but not other systems.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 16, 2025, from 7:00 am to 5:00 pm (EDT):''' The SciNet datacentre will undergo maintenance of several critical parts of the centre.  This will require a full shutdown of all SciNet systems (Trillium, Niagara, Mist, HPSS, Rouge, Teach, as well as hosted equipment). This will also be the time that the Mist cluster gets decommissioned. &lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 12 22:03:17 EDT 2025:''' HPSS software and OS upgrades are finished.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep  9 17:05:38 EDT 2025:''' Starting tomorrow, Sep/10, and for the following 3 days HPSS will be down for software and OS upgrades. We will strive to finish sooner, at which time we will make the system available to users again.&lt;br /&gt;
&lt;br /&gt;
===Mist/Niagara Decommissioning Schedule===&lt;br /&gt;
&lt;br /&gt;
'''September 4, 2025'''&lt;br /&gt;
* Niagara reduced to 863 compute nodes.&lt;br /&gt;
&lt;br /&gt;
'''September 9, 2025'''&lt;br /&gt;
* Niagara's Open OnDemand decommissioned.&lt;br /&gt;
* Brief data centre connection outage at 9 AM EDT&lt;br /&gt;
* Niagara reduced to 647 compute nodes at end of day.&lt;br /&gt;
&lt;br /&gt;
'''September 11, 2025'''&lt;br /&gt;
* Trillium Open OnDemand goes live.&lt;br /&gt;
&lt;br /&gt;
'''September 16, 2025'''&lt;br /&gt;
* '''Full-day data centre maintenance'''&lt;br /&gt;
* Niagara reduced to 431 compute nodes.&lt;br /&gt;
* Mist decommissioned.&lt;br /&gt;
&lt;br /&gt;
'''September 24, 2025'''&lt;br /&gt;
* Niagara reduced to 215 compute nodes.&lt;br /&gt;
&lt;br /&gt;
'''September 30, 2025'''&lt;br /&gt;
* Niagara decommissioned.&lt;br /&gt;
&lt;br /&gt;
'''August 25, 2025, 9:50 EDT:''' Open OnDemand is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''August 22, 2025, 3:15 PM EDT:''' Open OnDemand has issues launching new interactive apps. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''August 20, 2025, 10:00 AM EDT:''' The GPU scheduler on Trillium is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''August 19, 2025, 5:00 PM EDT:''' The GPU scheduler on Trillium has trouble scheduling multi-GPU jobs.  We're investigating the issue.&lt;br /&gt;
&lt;br /&gt;
'''August 7, 2025:''' CVMFS issues are resolved.&lt;br /&gt;
&lt;br /&gt;
'''August 6, 2025:''' We are seeing intermittent issues with the software on CVMFS on Niagara. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''July 31, 2025, 4:00 PM EDT - 5:00 PM EDT:''' As announced, all systems connected to the Niagara file system (Mist, Niagara, HPSS, Balam, and Rouge) will be paused and inaccessible for one hour to start the transfer of files from the Niagara file system to the Trillium file system. &lt;br /&gt;
&lt;br /&gt;
'''July 9, 2025:''' The [[Teach]] cluster will be unavailable for the day for network maintenance.&lt;br /&gt;
&lt;br /&gt;
'''July 4, 2025:''' Open OnDemand is back up.&lt;br /&gt;
&lt;br /&gt;
'''July 4, 2025:''' Open OnDemand is down. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''June 25, 2025, 7:15 PM EDT:''' The [[Teach]] cluster's scheduler is up again.&lt;br /&gt;
&lt;br /&gt;
'''June 25, 2025, 4:30 PM EDT:''' The [[Teach]] cluster's scheduler is down. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''April 30, 2025, 9:30 AM EDT:''' The [[Teach]] cluster is available again.&lt;br /&gt;
&lt;br /&gt;
'''April 30, 2025:''' The [[Teach]] cluster will be unavailable from 8:00 am to about 12:00 noon for file system maintenance.&lt;br /&gt;
&lt;br /&gt;
'''April 1, 2025:''' The Jupyter Hub has been replaced by SciNet's [[Open OnDemand Quickstart|Open OnDemand service]].&lt;br /&gt;
&lt;br /&gt;
'''March 1, 2025:''' As of March 1st, scratch purging is suspended until after Trillium comes online.&lt;br /&gt;
&lt;br /&gt;
'''April 15, 2025 12:40 pm EDT:''' Balam login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''April 15, 2025 12:10 pm EDT:''' Balam login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''April 9, 2025 9PM:''' HPSS is back online.&lt;br /&gt;
&lt;br /&gt;
'''April 8, 2025 9PM:''' HPSS is being reserved for OS updates on April 9 (Wednesday).&lt;br /&gt;
&lt;br /&gt;
'''March 31, 2025 3:20 pm EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''March 31, 2025 2:45 pm EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''March 28, 2025 3:00 pm - 4:00 pm EDT:''' A short maintenance was needed for the Teach compute nodes; you might have experienced some job scheduling delays on that cluster. &lt;br /&gt;
&lt;br /&gt;
'''March 20, 2025 10:30 am EDT:''' Teach compute nodes are back. &lt;br /&gt;
&lt;br /&gt;
'''March 19, 2025 11:00 pm EDT:''' Teach compute nodes are down again. &lt;br /&gt;
&lt;br /&gt;
'''March 19, 2025 5:15 pm EDT:''' Maintenance of the cooling system was performed successfully. The cluster is back online.&lt;br /&gt;
&lt;br /&gt;
'''March 19, 2025 8:00 am - 5:00 pm EDT:''' Maintenance of the cooling system as well as preparations for the Trillium cluster will require a shutdown of the compute nodes of all SciNet systems (Niagara, Mist, Rouge, Balam, Teach, as well as hosted equipment). The login nodes, file systems and the HPSS system will remain available. The scheduler will hold jobs that are submitted until the maintenance has finished.&lt;br /&gt;
&lt;br /&gt;
'''March 18, 2025 10:00 am EDT:''' Teach compute nodes are back.&lt;br /&gt;
&lt;br /&gt;
'''March 17, 2025 10:00 pm EDT:''' Teach compute nodes are down. We are working on it. &lt;br /&gt;
&lt;br /&gt;
'''February 27, 2025 9:00 pm EST:''' Access to HPSS via Globus has been restored.&lt;br /&gt;
&lt;br /&gt;
'''February 25, 2025 2:30 pm EST:''' Access to HPSS via Globus is currently suspended (sorry, trivial upgrade has gone wrong).&lt;br /&gt;
&lt;br /&gt;
'''February 25, 2025 12:30 pm EST:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''February 25, 2025 11:50 am EST:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''February 7, 2025 2:45 pm EST:''' Systems are back online.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb  7 01:04:33 EST 2025:''' There has been a problem with the water chiller. Automatic thermal shutdown of the compute nodes has been triggered.&lt;br /&gt;
&lt;br /&gt;
'''January 31, 2025 11:45 am EST:''' Power is back.&lt;br /&gt;
&lt;br /&gt;
'''January 31, 2025 6:00 am EST:''' Power outage in the data center. Many compute jobs will have stopped. Until power gets restored, parts of the systems are  running on the generator. No ETA on full power restoration.&lt;br /&gt;
 &lt;br /&gt;
'''January 28, 2025 9:30 pm EST:''' The CCEnv stack has been restored.&lt;br /&gt;
&lt;br /&gt;
'''January 28, 2025 5:00 pm EST:''' The CCEnv stack from CVMFS has issues and may not work reliably.&lt;br /&gt;
&lt;br /&gt;
'''January 23, 2025 9:00 am - 1:00 pm EST:''' Balam, Rouge and Neptune compute nodes will be shut down from 9 AM to 1 PM EST for additional electrical work.&lt;br /&gt;
&lt;br /&gt;
'''January 22, 2025 12:55 pm EST:''' Compute nodes are back online.&lt;br /&gt;
&lt;br /&gt;
'''January 22, 2025 8:00 am - 5:00 pm EST:''' Preparations for the new system Trillium will require a shutdown of the compute nodes of all SciNet systems (Niagara, Mist, Rouge, Teach, as well as hosted equipment) from 8 AM to 5 PM EST. The login nodes, file systems and the HPSS system will remain available. The scheduler will hold jobs that are submitted until the maintenance has finished.&lt;br /&gt;
&lt;br /&gt;
'''January 9, 2025 11:00 am EST:''' Systems are back online.&lt;br /&gt;
&lt;br /&gt;
'''January 8, 2025 10:34 pm EST:''' We had some sort of thermal event at the datacentre, and the clusters are down. We're still investigating.&lt;br /&gt;
&lt;br /&gt;
'''January 8, 2025 08:00 am EST:''' Balam, Rouge and Neptune are shut down for electrical upgrades.&lt;br /&gt;
&lt;br /&gt;
'''January 6, 2025:''' As part of the installation of the new computing cluster Trillium, there will be a (permanent) reduction in computing capacity of Niagara and Mist. Only 50% of Niagara and 35% of Mist will remain active after January 6th.  The reduction will require Mist to be shut down for a few hours on January 6th. Balam, Rouge and Neptune will be shut down on Wednesday January 8th for the same reason.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''December 20, 2024 09:00 am EST:''' The Open OnDemand service will not be available on Dec 20 from 9 a.m. to 5 p.m. due to scheduled maintenance.&lt;br /&gt;
&lt;br /&gt;
'''December 16, 2024, 08:21 am EST:''' The Niagara scheduler has been restarted.&lt;br /&gt;
  &lt;br /&gt;
'''December 16, 2024, 00:04 am EST:''' The Niagara scheduler has an issue; we are investigating.&lt;br /&gt;
  &lt;br /&gt;
'''Fri Nov 8, 2024, 09:45 AM EST.''' Balam and Rouge schedulers are back online.&lt;br /&gt;
&lt;br /&gt;
'''Thu Nov 7, 2024, 10:30 PM EST.''' Most systems are up, except for the schedulers on Balam and Rouge (but even their login nodes are up), and a few 'neptune' niagara nodes.&lt;br /&gt;
&lt;br /&gt;
'''Thu Nov 7, 2024, 5:30 PM EST:''' Systems are being brought up, but not yet available for users.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:''' On Thu Nov 7, 2024, all systems and storage located at the SciNet Datacenter (Niagara, Mist, HPSS, Rouge, Teach, JupyterHub, Balam) will be unavailable from 7 a.m. to 5 p.m. ET.&lt;br /&gt;
This outage is required to install new electrical equipment (UPS) for the upcoming systems refresh. The work is expected to be completed in one day.&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of the shutdown. Users are encouraged to submit small and short jobs that can take advantage of this, as the scheduler may be able to fit these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 24 15:05 EDT 2024''': Cooling pump motor has been replaced. All systems are back to normal.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 22 16:35 EDT 2024''': The motor is scheduled for replacement on Thursday, Oct 24.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 21 17:15 EDT 2024''': Compute nodes will remain down until we can replace the main cooling pump.  This may take several days.  Please see this page for updates.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 21 12:15 EDT 2024''': Compute nodes have been shut down due to a cooling system failure.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 18 21:40 EDT 2024''': Systems are back to normal.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 18 21:15 EDT 2024''': We are experiencing technical difficulties, apparently caused by a glitch in the file systems.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 1 10:45 EDT 2024''': The Jupyter Hub service will be rebooted today at around 11:00 am EDT for system upgrades. &lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 3 07:00 EDT 2024''': Intermittent file system issues which may cause issues logging in.  We are in the process of resolving the issue.&lt;br /&gt;
&lt;br /&gt;
'''Sun Sep 1 00:01 - 04:00 EDT 2024''': Network maintenance may cause connection issues to the datacentre.&lt;br /&gt;
&lt;br /&gt;
'''Thu Aug 22 13:30:00 EDT 2024''': Chiller issue caused about 25% of Niagara compute nodes to go down; users should resubmit any affected jobs.&lt;br /&gt;
&lt;br /&gt;
'''Wed Aug 21 16:35:00 EDT 2024''': Maintenance finished; compute nodes are now available for user jobs.&lt;br /&gt;
&lt;br /&gt;
'''Wed Aug 21 7:00:00 EDT 2024''': Maintenance started.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 19:15:00 EDT 2024''': Issues have been resolved.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 14:30:00 EDT 2024''': Power issues seem to have brought compute nodes down and compounded the file system issues we had earlier.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 10:31:53 EDT 2024''': GPFS is back online, and seems to be holding.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 08:44:40 EDT 2024''': Sorry, problems with the GPFS file systems are recurring.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 07:59:02 EDT 2024''': GPFS file systems are back to normal. Many jobs have died and will need to be resubmitted.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 06:39:12 EDT 2024''': Support staff detected the problem and started to work on a fix.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 00:53:52 EDT 2024''': GPFS file systems (home, scratch, project) started to show initial signs of problems.&lt;br /&gt;
&lt;br /&gt;
'''August 21, 2024''': The annual cooling tower maintenance for the SciNet data centre will take place on August 21, 2024 from 7 a.m. EDT until the end of day. This maintenance requires a shutdown of the compute nodes of all SciNet systems (Niagara, Mist, Rouge, Teach, as well as hosted equipment). The login nodes, file systems and the HPSS system will remain available.&lt;br /&gt;
&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of the shutdown. Users are encouraged to submit small and short jobs that can take advantage of this, as the scheduler may be able to fit these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
&lt;br /&gt;
'''Thursday, August 1, 10:00 PM EDT''' Filesystem problems resolved.&lt;br /&gt;
&lt;br /&gt;
'''Thursday, August 1, 9:30 PM EDT''' Filesystem problems preventing logins to the systems.  Working on it.&lt;br /&gt;
&lt;br /&gt;
'''Monday, July 22, 11:50 AM EDT''' Systems are back to normal&lt;br /&gt;
&lt;br /&gt;
'''Monday, July 22, 10:50 AM EDT''' Cooling problem has been fixed. Systems are coming up&lt;br /&gt;
&lt;br /&gt;
'''Monday, July 22, 10:20 AM EDT''' Compute nodes have been shutdown due to a cooling tower failure.&lt;br /&gt;
&lt;br /&gt;
'''Friday, July 19, 9:30 AM EDT''' CCEnv modules available on all login nodes again.&lt;br /&gt;
&lt;br /&gt;
'''Friday, July 19, 5:00 AM EDT''' Some login nodes do not have the CCEnv modules available.  We are working on a fix.&lt;br /&gt;
&lt;br /&gt;
'''Monday, Jun 3, 12:55 PM EDT''' All systems are recovered now.&lt;br /&gt;
&lt;br /&gt;
'''Monday, Jun 3, 10:50 AM EDT''' The file system issues affect all nodes, so all systems are inaccessible to users at the moment. No time estimate yet for when the systems may be back.&lt;br /&gt;
&lt;br /&gt;
'''Monday, Jun 3, 7:58 AM EDT''' Login issues for Niagara and Mist. There are file system issues as well. Investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sunday, Jun 2, 12:00 PM EDT''' CCEnv modules missing, investigating.&lt;br /&gt;
&lt;br /&gt;
'''Wednesday May 29, 5:50 PM EDT''' Niagara compute nodes are up.  &lt;br /&gt;
&lt;br /&gt;
'''Wednesday May 29, 4:40 PM EDT''' Niagara compute nodes are coming up.  &lt;br /&gt;
&lt;br /&gt;
'''Wednesday May 29, 4 PM EDT''' Niagara login nodes and jupyterhub are up; file system is now accessible.  &lt;br /&gt;
&lt;br /&gt;
'''Wednesday May 29, 2 PM EDT''' Electricians are checking and testing all junction boxes and connectors under the raised floor for safety.  Some systems are expected to be back up later today (storage, login nodes), and compute systems will be powered up as soon as it is deemed safe.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday May 28, 3 PM EDT''' Cleaning crews are at the datacentre, to pump the water and install dryers.  Once the floors are dry, we need to inspect all electrical boxes to ensure safety.  We do not expect to have a fully functional datacentre before Thursday, although we hope to be able to turn on the storage and login nodes sometime tomorrow, if circumstances permit.  Apologies, and thank you for your patience.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Tuesday May 28, 7 AM EDT''' A water mains break outside our datacentre has caused extensive flooding, and all systems have been shut down preventatively. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Friday May 17, 10 PM EDT - Saturday May 18, 2 AM EDT:''' The external network will be unavailable for maintenance. Running and queued jobs on the systems will not be affected.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday May 14, 6:45 PM EDT:''' All systems are recovered now.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday May 14, 5 PM EDT:''' Power loss at the datacentre resulted in loss of cooling.  Systems are being restored.&lt;br /&gt;
&lt;br /&gt;
'''Friday May 3, 10 PM EDT - Saturday May 4, 2 AM EDT:''' The external network will be unavailable for maintenance. Running and queued jobs on the systems will not be affected.&lt;br /&gt;
&lt;br /&gt;
'''Wednesday April 17, 2024: 11:00''' The restart of the Niagara login nodes has been completed successfully.&lt;br /&gt;
&lt;br /&gt;
'''Wednesday April 17, 2024: 09:40''' Niagara login nodes will be rebooted.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday April 16, 2024: 12:45''' mist-login01 has recovered.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday April 16, 2024: 11:45''' mist-login01 will be unavailable due to maintenance from 12:15 to 12:45. Following the completion of maintenance, login access should be restored.&lt;br /&gt;
&lt;br /&gt;
'''Monday April 15, 2024: 13:02 ''' Balam-login01 will be unavailable due to maintenance from 13:00 to 13:30. Following the completion of maintenance, login access should be restored and available once more. &lt;br /&gt;
&lt;br /&gt;
'''Monday March 18, 2024: 14:45 ''' File system issue resolved.  Users are advised to check if their running jobs were affected, and if so, to resubmit.&lt;br /&gt;
&lt;br /&gt;
'''Monday March 18, 2024: 13:02 ''' File system issues.  This affects the ability to log in. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Monday March 11, 2024: 14:05 ''' All systems are recovered now&lt;br /&gt;
&lt;br /&gt;
'''Monday March 11, 2024:''' There will be a shutdown of the file system at SciNet for an emergency repair. As a consequence, the login nodes and compute nodes of all SciNet clusters using the file system (Niagara, Mist, Balam, Rouge, and Teach) will be down from 11 am EST until later in the afternoon.&lt;br /&gt;
&lt;br /&gt;
'''February 28, 2024, 4:30 PM EST:''' All systems are recovered now.&lt;br /&gt;
&lt;br /&gt;
'''February 28, 2024, 1:00 PM EST:''' A loop pump fault caused many compute nodes to overheat. If your jobs failed around this time, please resubmit. Once the root cause has been addressed, the cluster will be brought up completely. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''February 22, 2024, 5:45 PM EST:''' Maintenance finished and system restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''February 21, 2024, 7:00 AM EST:''' Maintenance starting. Niagara login nodes and the file system are kept up as much as possible, but will be rebooted at some point.&lt;br /&gt;
&lt;br /&gt;
'''February 20, 2024, 3:45 PM EST:''' Cooling tower has been restored; all systems are in production.&lt;br /&gt;
&lt;br /&gt;
'''February 20, 2024, 1:30 AM EST:''' Cooling tower malfunction; all compute nodes are shut down. The root cause will be addressed in the morning at the earliest.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;&amp;lt;b&amp;gt; February 21 and 22, 2024: SciNet Data Centre Maintenance:&amp;lt;/b&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
This annual winter maintenance involves a full data centre shutdown&lt;br /&gt;
starting at 7:00 am EST on Wednesday, February 21st.  None of the&lt;br /&gt;
SciNet systems (Niagara, Mist, Rouge, Teach, the file systems, as&lt;br /&gt;
well as hosted equipment) will be accessible.  All systems should be&lt;br /&gt;
fully available again in the late afternoon of the 22nd.&lt;br /&gt;
&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of&lt;br /&gt;
the shutdown. Users are encouraged to submit small and short jobs&lt;br /&gt;
that can take advantage of this, as the scheduler may be able to fit&lt;br /&gt;
these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
&lt;br /&gt;
'''Mon January 29, 08:20    (EST):''' Access to Niagara login nodes restored (it was an internal routing issue).&lt;br /&gt;
&lt;br /&gt;
'''Mon January 29, 07:35    (EST):''' No access to Niagara login nodes.  We are investigating.  Use the Mist login to get access to SciNet systems.&lt;br /&gt;
&lt;br /&gt;
'''Wed January 24, 15:20    (EST):''' maintenance on rouge-login01 &lt;br /&gt;
&lt;br /&gt;
'''Wed January 24, 14:55    (EST):''' Rebooting rouge-login01 &lt;br /&gt;
&lt;br /&gt;
'''Tue January 23, 10:25 am (EST):''' Mist-login01 maintenance done &lt;br /&gt;
&lt;br /&gt;
'''Tue January 23, 10:10 am (EST):''' Rebooting Mist-login01 to deploy new image&lt;br /&gt;
&lt;br /&gt;
'''Mon January 22, 21:00 (EST):''' HPSS performance for hsi &amp;amp; htar clients is back to normal.&lt;br /&gt;
&lt;br /&gt;
'''Sat January 20, 11:50 am (EST):''' HPSS hsi/htar/VFS jobs will remain in the PD (pending) state on the queue over the weekend, so that we may work on archive02/vfs02 on Monday and try to improve transfer performance. In the meantime you may use Globus (computecanada#hpss) if your workflow is suitable.&lt;br /&gt;
&lt;br /&gt;
'''Sun January 14, 13:20 (EST):''' The ongoing HPSS jobs from Friday finished earlier than expected, so we restarted HPSS sooner and released the pending (PD) jobs on the queue.&lt;br /&gt;
&lt;br /&gt;
'''Fri January 12, 10:40 am (EST):''' We have applied some tweaks to the HPSS configuration to improve performance, but they won't take effect until we restart the services, which is scheduled for Monday morning. If over the weekend we notice that there are no HPSS jobs running on the queue, we may restart HPSS sooner.&lt;br /&gt;
&lt;br /&gt;
'''Tue January 09, 9:10 am (EST):''' Remaining cvmfs issues cleared.&lt;br /&gt;
&lt;br /&gt;
'''Tue January 09, 8:00 am (EST):''' We're investigating remaining issues with cvmfs access on login nodes.&lt;br /&gt;
&lt;br /&gt;
'''Mon January 08, 21:50 (EST):''' File systems are back to normal. Please resubmit your jobs.&lt;br /&gt;
&lt;br /&gt;
'''Mon January 08, 9:10 pm (EST):''' We had a severe deadlock, and some disk volumes went down. The file systems are being recovered now. It could take another hour.&lt;br /&gt;
&lt;br /&gt;
'''Mon January 08, 7:20 pm (EST):''' We seem to have a problem with the file system, and are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue December 19, 2:45 pm (EST):''' Compute nodes are available again.  &lt;br /&gt;
&lt;br /&gt;
'''Tue December 19, 12:09 pm (EST):''' Maintenance was postponed by one hour. &lt;br /&gt;
&lt;br /&gt;
'''Tue December 19, 12 noon - 1 pm (EST):''' There will be a shutdown of the compute nodes of the Niagara, Mist and Rouge clusters to allow for an emergency repair to the cooling tower. Login nodes will remain available but no jobs will run during that time. Updates will be posted here.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec  11 11:17:00 EST 2023:''' File systems recovered; Niagara and Mist are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec  11 7:51:00 EST 2023:''' Niagara's login nodes are being overwhelmed.  We are investigating. Likely file-system related.&lt;br /&gt;
&lt;br /&gt;
'''Thu Dec  7 10:01:24 EST 2023:''' Niagara's scheduler is rebooting for security patches.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec  6 13:06:46 EST 2023:''' The transition of endpoint computecanada#niagara from Globus GCSv4 to GCSv5 is complete. computecanada#niagara-GCSv4 has been deactivated.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec  4 16:35:07 EST 2023:''' Endpoint computecanada#niagara has now been upgraded to Globus GCSv5. The old endpoint is still available as computecanada#niagara-GCSv4 on nia-datamover2, only until Wednesday, at which time we'll disable it as well.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec  4 11:54:49 EST 2023:''' The nia-datamover1 node will be offline this Monday afternoon for the Globus GCSv5 upgrade. Endpoint computecanada#niagara-GCSv4 will still be available via nia-datamover2.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 28 16:29:14 EST 2023:''' The computecanada#hpss Globus endpoint is now running GCSv5. We'll find a window of opportunity next week to upgrade computecanada#niagara to GCSv5 as well.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 28 14:20:30 EST 2023:''' The computecanada#hpss Globus endpoint will be offline for the next few hours for the GCSv5 upgrade.&lt;br /&gt;
&lt;br /&gt;
'''Fri Nov 10, 2023, 6:00 PM EST:''' The HPSS upgrade is finished. We didn't have time to upgrade Globus to GCSv5, so we'll find a window of opportunity to do this next week.&lt;br /&gt;
&lt;br /&gt;
Please be advised that starting this &amp;lt;B&amp;gt;Friday morning, Nov/10, we'll be upgrading the HPSS system from version 8.3 to 9.3 and the HPSS Globus server from GCSv4 to GCSv5.&amp;lt;/B&amp;gt; If everything goes well, we expect to be back online by the end of the day.&lt;br /&gt;
&lt;br /&gt;
'''Fri Nov 3, 2023, 12:20 PM EDT:''' The &amp;quot;Niagara at Scale&amp;quot; event has finished. Niagara is available again for all users.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 31, 2023, 12 PM EDT:''' The &amp;quot;Niagara at Scale&amp;quot; event has started.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 31, 2023, 12:00 PM EDT - Fri Nov 3, 2023, 12:00 PM EDT:''' Three-day reservation for the &amp;quot;Niagara at Scale&amp;quot; event. Only &amp;quot;Niagara at Scale&amp;quot; projects will run on the compute nodes. Users are encouraged to submit small and short jobs that could run before this event. Throughout the event, users can still log in, access their data, and submit jobs, but these jobs will not run until after the event. Note that the debugjob queue will remain available to everyone as well.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 27 11:16 AM EDT:''' SSH keys are gradually being restored, estimated to complete by 1:15 PM.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 27, 2023, 8:00 EDT:''' SSH key login authentication with CCDB keys is currently not working on many Alliance systems. It appears this started last night. The issue is being investigated.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 26, 2023, 12:35 EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 26, 2023, 12:05 EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 25 7:54 PM EDT:''' slurm-*.out now outputs job info for last array job.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 24 12:00 PM EDT:''' Network appears to be up.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 24 11:32 AM EDT:''' Campus network issues.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 05, 2023, 12:05 PM EDT:''' Niagara scheduler is back online.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 05, 2023, 11:50 AM EDT:''' Niagara scheduler is temporarily under maintenance for security updates. &lt;br /&gt;
&lt;br /&gt;
''' Thu Sep 28, 2023 11:00 am''': Niagara scheduler is back online.&lt;br /&gt;
&lt;br /&gt;
''' Thu Sep 28, 2023 10:50 am''': Niagara scheduler is temporarily under maintenance for security updates.&lt;br /&gt;
&lt;br /&gt;
''' Wed Sep 27, 2023 11:35 am''': Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
''' Wed Sep 27, 2023 11:00 am''': Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
''' Wed Sep 6, 2023 11:30 am''': Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
''' Wed Sep 6, 2023 11:00 am''': Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
''' Fri Aug 25, 2023 12:19 am''': A power glitch brought some compute nodes down; users should resubmit any affected jobs. The JupyterHub had to be restarted for the same reason.&lt;br /&gt;
&lt;br /&gt;
''' Mon Aug 14, 2023 12:10 pm''': Network problems with Teach cluster are now resolved and it is again available for users.&lt;br /&gt;
&lt;br /&gt;
''' Mon Aug 14, 2023 11:40 am''': Network problems with Teach cluster. We are investigating.&lt;br /&gt;
&lt;br /&gt;
''' Thu Aug 3, 2023 11:10 am''': Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
''' Thu Aug 3, 2023 10:40 am''': Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
''' Tue Aug 1, 2023 2:43 pm''': To recover from the power glitch, all servers on the SciNet JupyterHub have been stopped. Please restart your server if you need to.&lt;br /&gt;
&lt;br /&gt;
''' Tue Aug 1, 2023 11:46 am''': There was a power glitch at 11:46 Aug 1, 2023, causing a significant number of job losses. Please resubmit your jobs.&lt;br /&gt;
&lt;br /&gt;
'''Summer Maintenance Shutdown Finished''' -- Slurm upgraded to version 23.02.3.&lt;br /&gt;
Change to be aware of: SLURM_NTASKS is only set if the --ntasks option is set.&lt;br /&gt;
Details at: https://bugs.schedmd.com/show_bug.cgi?id=17108&lt;br /&gt;
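A defensive pattern for job scripts affected by this change could look as follows (a minimal sketch; the fallback value of 1 is an assumption for illustration):&lt;br /&gt;

```shell
#!/bin/bash
# After the Slurm 23.02.3 upgrade, SLURM_NTASKS is only set when the job was
# submitted with --ntasks. Fall back to a default (here: 1) when it is unset,
# instead of relying on the variable always being defined.
ntasks="${SLURM_NTASKS:-1}"
echo "running with ${ntasks} task(s)"
```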
&lt;br /&gt;
'''July 17 and 18, 2023''': Announcement: Summer Maintenance Shutdown&lt;br /&gt;
&lt;br /&gt;
'''July 17th, 2023''' This maintenance involves a full data centre shutdown, starting at 7:00 a.m. ET on Monday July 17th, 2023. None of the SciNet systems (Niagara, Mist, Rouge, Teach, the file systems, as well as hosted equipment) will be accessible.&lt;br /&gt;
&lt;br /&gt;
'''July 18th, 2023''' The shutdown will last until Tuesday July 18th, 2023. Systems are expected to be fully available in the evening of that day.&lt;br /&gt;
&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of the shutdown. Users are encouraged to submit small and short jobs that can take advantage of this, as the scheduler may be able to fit these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
&lt;br /&gt;
'''Wed Jun 21 16:03:45 EDT 2023:''' Niagara's scheduler maintenance is finished.&lt;br /&gt;
&lt;br /&gt;
'''Wed Jun 21 15:42:00 EDT 2023:''' Niagara's scheduler is rebooting in 10 minutes for a short maintenance down time.&lt;br /&gt;
&lt;br /&gt;
'''Wed Jun 21, 2023, 11:25 AM EDT:''' Maintenance is finished and Teach cluster is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Jun 20, 2023, 9:55 AM EDT:''' Teach cluster is powered off for maintenance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style='color:red'&amp;gt;'''Tue June 20, 2023:'''  Announcement:&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt; The Teach cluster at SciNet will undergo a maintenance shutdown starting on Tuesday June 20, 2023.  It will likely take a few days before it will be available again.  Check here for updates.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jun 5, 2023, 2:35 PM EDT:''' All systems are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jun 5, 2023, 11:55 AM EDT:''' There were issues with the cooling system.  The login nodes and file systems are now accessible again, but compute nodes are still off.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jun 5, 2023, 6:55 AM EDT:''' Issues at the data center, we are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sat May 27, 2023, 9:00 PM EDT:''' We have been able to mitigate the UPS issue for now, until new parts arrive sometime during the week. The system will be accessible soon.&lt;br /&gt;
&lt;br /&gt;
'''Sat May 27, 2023, 4:00 PM EDT:''' We identified a UPS/power-related issue at the datacentre that is adversely affecting several components, in particular all file systems. Out of an abundance of caution we are shutting down the cluster until the UPS situation is resolved. Ongoing jobs will be canceled.&lt;br /&gt;
&lt;br /&gt;
'''Sat May 27, 2023, 11:18 AM EDT:''' Filesystem issues, investigating.&lt;br /&gt;
&lt;br /&gt;
'''Wed May 24, 2023, 11:40AM EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Wed May 24, 2023, 11:10 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 15, 2023, 10:08 AM EDT''' rebooting Mist-login node again &lt;br /&gt;
&lt;br /&gt;
'''Mon May 15, 2023, 09:15 AM EDT''' rebooting Mist-login node&lt;br /&gt;
&lt;br /&gt;
'''Mon May 01, 2023, 04:00 PM EDT''' done rebooting nia-login nodes&lt;br /&gt;
&lt;br /&gt;
'''Mon May 01, 2023, 12:00 PM EDT''' rebooting all nia-login nodes one at a time &lt;br /&gt;
&lt;br /&gt;
'''Mon May 01, 2023, 11:00 AM EDT''' nia-login07 is going to be rebooted.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 20, 2023, 12:05 PM EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 20, 2023, 11:30 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 20, 2023, 8:27 AM EDT:''' Intermittent file system issues. We are investigating.  For now (10:45 AM), the file systems appear operational.&lt;br /&gt;
&lt;br /&gt;
'''Fri 14 Apr 2023 10:25 AM EDT:''' Switch problem resolved.&lt;br /&gt;
&lt;br /&gt;
'''Fri 14 Apr 2023 10:10 AM EDT:''' A switch problem is affecting access to certain equipment at the SciNet data center, including the Teach cluster.  Niagara and Mist are accessible.&lt;br /&gt;
&lt;br /&gt;
'''Fri 14 Apr 2023 09:55 AM EDT:''' SciNet Jupyter Hub maintenance is finished and it is again available for users.&lt;br /&gt;
&lt;br /&gt;
'''Fri 14 Apr 2023:''' SciNet Jupyter Hub will be restarted for system updates this morning.  Keep in mind to save your notebooks!&lt;br /&gt;
&lt;br /&gt;
'''Thu 06 Apr 2023 03:40 PM EDT:''' Rouge cluster is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Thu 06 Apr 2023 01:00 PM EDT:''' Rouge cluster is temporarily inaccessible to users due to the electrical work.&lt;br /&gt;
&lt;br /&gt;
'''Sun 02 Apr 2023 03:37 AM EDT:''' IO/read errors on the file system seem to have been fixed. Please resubmit your jobs, and report any further problems to support. Burst Buffer will remain offline for now.&lt;br /&gt;
&lt;br /&gt;
'''Sun 02 Apr 2023 00:18 AM EDT:''' File System is back up, but there seems to be some IO/read errors. All running jobs have been killed. Please hold off on submitting jobs until further notice.&lt;br /&gt;
&lt;br /&gt;
'''Sat 01 Apr 2023 10:17 PM EDT:''' We are having issues with the File System. Currently investigating the cause.&lt;br /&gt;
&lt;br /&gt;
'''Fri 31 Mar 2023 11:00 PM EDT:''' Burst Buffer may be the culprit. We are investigating but may have to take Burst Buffer offline. &lt;br /&gt;
&lt;br /&gt;
'''Fri 31 Mar 2023 01:30 PM EDT:''' File system issues causing trouble for some jobs on Niagara and Mist&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Tue 28 Mar 2023 11:05 AM EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Tue 28 Mar 2023 10:35 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Fri 17 Mar 2023 2:50 PM EDT:''' All systems online.&lt;br /&gt;
&lt;br /&gt;
'''Fri 17 Mar 2023 11:00 AM EDT:''' Problem identified and repaired. Starting to bring up systems, but not available to users yet.&lt;br /&gt;
&lt;br /&gt;
'''Fri 17 Mar 2023 09:15:39 EDT:''' Staff on site and ticket opened with cooling contractor; cause of failure unclear.&lt;br /&gt;
&lt;br /&gt;
'''Fri 17 Mar 2023 01:47:43 EDT:''' Cooling system malfunction; datacentre is shut down.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Tue Feb 28, 16:40 EST:&amp;lt;/b&amp;gt; All systems are back online.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Tue Feb 28, 15:30 EST:&amp;lt;/b&amp;gt; Maintenance is complete. Bringing up systems.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Tue Feb 28, 7:10 AM EST:&amp;lt;/b&amp;gt; Maintenance shutdown resuming.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Mon Feb 27, 3:55 PM EST:&amp;lt;/b&amp;gt; Maintenance paused as parts were delayed. The maintenance will resume tomorrow (Tue Feb 28) at 7AM EST for about 5 hours.  In the meantime, the login nodes of the systems will be brought online.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Mon Feb 27, 7:20 AM EST:&amp;lt;/b&amp;gt; Maintenance shutdown started.&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;&amp;lt;b&amp;gt; February 27 and 28, 2023: SciNet Data Centre Maintenance:&amp;lt;/b&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
This annual winter maintenance involves a full data centre shutdown&lt;br /&gt;
starting at 7:00 a.m. EST on Monday, February 27. None of the SciNet&lt;br /&gt;
systems (Niagara, Mist, Rouge, Teach, the file systems, as well as&lt;br /&gt;
hosted equipment) will be accessible.&lt;br /&gt;
&lt;br /&gt;
On the second day of the maintenance, Niagara, Mist, and their file&lt;br /&gt;
systems are expected to become partially available for users.  All&lt;br /&gt;
systems should be fully available in the evening of the 28th.&lt;br /&gt;
&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of&lt;br /&gt;
the shutdown. Users are encouraged to submit small and short jobs&lt;br /&gt;
that can take advantage of this, as the scheduler may be able to fit&lt;br /&gt;
these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Feb 17, 2023, 11:15 PM EST:&amp;lt;/b&amp;gt; File system issues on Teach fixed and Teach is accessible again. Note that the file system of Teach is not very good at handling many remote vscode connections.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Feb 17, 2023, 11:02 PM EST:&amp;lt;/b&amp;gt; File system issues on Teach.  We are working on a fix.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sun Feb 12, 2023, 3:05 PM EST&amp;lt;/b&amp;gt; All systems are back online.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sun Feb 12, 2023, 2:10 PM EST&amp;lt;/b&amp;gt; Power restored; clusters are being started.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sat Feb 11, 2023, 2:35 PM EST&amp;lt;/b&amp;gt; Power interruption started. All compute nodes will be down, likely until Sunday afternoon.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sat Feb 11, 2023, 1:20 PM EST&amp;lt;/b&amp;gt; There is to be an emergency power repair on the adjacent street. The datacentre will be &lt;br /&gt;
switching over to generator. All compute nodes will be down.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Feb 10, 2023, 10:55 AM EST&amp;lt;/b&amp;gt; All systems are back online.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Feb 10, 2023, 10:00 AM EST&amp;lt;/b&amp;gt; Cooling issue resolved, cluster is being started.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Wed Jan 25, 2023, 02:15 PM EST&amp;lt;/b&amp;gt; Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Wed Jan 25, 2023, 10:30 AM EST&amp;lt;/b&amp;gt; Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Mon Jan 23, 2023, around 7-8 AM EST&amp;lt;/b&amp;gt; Intermittent file system issues may have killed your job. Users are advised to resubmit.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sat Jan 21, 2023, 00:50 EST&amp;lt;/b&amp;gt; Niagara, Mist, Rouge and the filesystems are up&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Jan 20, 2023, 11:19 PM EST&amp;lt;/b&amp;gt; Systems are coming up. We have determined that there was a general power glitch in the area of our datacentre. The power has been fully restored.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Jan 20, 2023, 10:34 PM EST&amp;lt;/b&amp;gt; Cooling is back. Systems are slowly coming up.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Jan 20, 2023, 8:20 PM EST&amp;lt;/b&amp;gt; A cooling failure at the data centre, possibly due to a power glitch. We are investigating.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Thu Jan 12, 2023, 9:30 AM EST&amp;lt;/b&amp;gt; File system is experiencing issues. Issues have stabilized, but jobs running around this time may have been affected.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 21, 2022, 12:00 PM: ''' Please note that SciNet is on vacation, together with the University of Toronto. Full service will resume on Jan 2, 2023. We will endeavour to keep systems running, and answer tickets, on a best-effort basis.  Happy Holidays!!!&lt;br /&gt;
&lt;br /&gt;
'''Fri Dec 16, 2022, 2:19 PM: ''' City power glitch caused all compute nodes to reboot. Please resubmit your jobs.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec 12, 2022, 9:30 AM - 11:30 AM EST:''' File system issues caused login problems and may have affected running jobs. The system is back to normal now, but users may want to check any jobs they had running.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 7, 2022, 11:40 AM EST:''' Systems are being brought back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 7, 2022, 09:00 AM EST:''' Maintenance is underway.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style='color:red'&amp;gt;&amp;lt;b&amp;gt;Announcement:&amp;lt;/b&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On '''Wednesday December 7th, 2022''', the file systems of SciNet's systems (Niagara, Mist, HPSS, and the Teach cluster) will undergo maintenance from 9:00 am EST. During the maintenance there will be no access to any of these systems, as it requires all file system operations to have stopped. The maintenance should take about 1 hour, and all systems are expected to become available again later that morning.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 30, 2022, 2:45 PM EST:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 30, 2022, 2:15 PM EST:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 6:00 PM EDT:''' Systems are back online.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 09:40 AM EDT:''' About half of Niagara compute nodes are up. Note that only jobs that can finish by 5:00 PM will run.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 07:50 AM EDT:''' Jupyter Hub is available again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 07:35 AM EDT:''' Jupyter Hub is being updated and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 07:30 AM EDT:''' Maintenance is underway.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style='color:red'&amp;gt;&amp;lt;b&amp;gt;Announcement:&amp;lt;/b&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On '''Thursday October 20th, 2022''', the SciNet datacentre (which hosts Niagara and Mist) will undergo transformer maintenance from 7:30 am EDT to 5:00 pm EDT.  At both the start and end of this maintenance window, all systems will need to be briefly shutdown and will not be accessible.  Apart from that, during this window, login nodes will be accessible and part of Niagara will be available to run jobs. The Mist and Rouge clusters will be off for the entirety of this maintenance. &lt;br /&gt;
&lt;br /&gt;
Users are encouraged to submit Niagara jobs of about 1 to 2 hours in the days before the maintenance, as these could be run within the window of 8 AM to 5 PM EDT.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 5, 2022, 12:10 PM EDT:''' A grid power glitch caused all compute nodes to reboot. Please resubmit your jobs.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 3, 2022, 11:20 PM EDT:'''  Niagara login nodes are accessible from outside again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 3, 2022, 9:20 PM EDT:'''  Niagara login nodes are inaccessible from outside of the datacentre at the moment. As a work-around, ssh into mist.scinet.utoronto.ca and then ssh into e.g. nia-login01.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 28, 2022, 1:15 PM EDT:''' The JupyterHub maintenance is finished and it is now accessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 28, 2022, 1:00 PM EDT:''' The JupyterHub is to be rebooted for system upgrades. Running processes and notebooks will be closed. The service is expected to be back around 1:30 PM EDT.&lt;br /&gt;
 &lt;br /&gt;
'''Tue Sep 27, 2022, 11:50 AM EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 27, 2022, 11:25 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Mon Sep 26, 2022, 11:35 AM EDT:''' Rouge and Teach login nodes are accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Sep 26, 2022, 11:05 AM EDT:''' Rouge and Teach login nodes are under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 23, 2022, 12:46 AM EDT:''' The CCEnv software stack is back to normal.&lt;br /&gt;
&lt;br /&gt;
'''Thu Sep 22, 2022, 8:15 PM EDT:''' The CCEnv software stack is inaccessible due to an issue with CVMFS.&lt;br /&gt;
 &lt;br /&gt;
'''Tue Sep 20, 2022, 4:00 PM EDT:''' Rouge login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 20, 2022, 10:20 AM EDT:''' Rouge login node is under maintenance and temporarily inaccessible to users (hardware upgrade).&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 20, 2022, 9:41 AM EDT:''' Rouge login node is back up.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 20, 2022, 8:25 AM EDT:''' Rouge login node down, we are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sept 16, 2022, 9:30 AM EDT:''' Login nodes are accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sept 16, 2022, 9:00 AM EDT:''' Login nodes are not accessible.  We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 13, 2022, 11:00 AM EDT:''' Mist login node is available again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 13, 2022, 10:00 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 2, 2022, 11:25 AM EDT:''' Rouge login node is back up.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 2, 2022, 10:25 AM EDT:''' Issues with the Rouge login node; we are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Aug 23, 2022, 1:15 PM EDT:''' Jupyter Hub is available again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Aug 23, 2022, 1:00 PM EDT:''' Jupyter Hub is being updated and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Fri Aug 12, 2022, 6:30 PM EDT:''' File system issues are resolved.&lt;br /&gt;
&lt;br /&gt;
'''Fri Aug 12, 2022, 5:06 PM EDT:''' File system issues. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Thu Aug 11, 2022, 9:20 AM EDT:''' The login node issues have been resolved.&lt;br /&gt;
&lt;br /&gt;
'''Thu Aug 11, 2022, 7:50 AM EDT:''' We are having problems accessing the Niagara login nodes.  Until fixed, please login to Mist and then ssh to a Niagara login node to access Niagara (&amp;quot;ssh nia-login02&amp;quot;, for example).&lt;br /&gt;
&lt;br /&gt;
'''Fri July 15, 2022, 10:50 AM EDT:''' Jupyter Hub is available again.&lt;br /&gt;
&lt;br /&gt;
'''Fri July 15, 2022, 10:30 AM EDT:''' Jupyter Hub is being updated and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Thu June 16, 2022, 3:45 PM EDT:''' File system is stable now. We're gradually opening the systems up.&lt;br /&gt;
&lt;br /&gt;
'''Thu June 16, 2022, 10:15 AM EDT:''' Emergency maintenance shutdown of the file system. Running jobs will be affected.&lt;br /&gt;
&lt;br /&gt;
'''Wed June 15, 2022, 7:35 PM EDT:''' Maintenance shutdown finished. Most systems are available again.&lt;br /&gt;
&lt;br /&gt;
'''Wed June 15, 2022, 7:00 AM EDT:''' Maintenance shutdown of the SciNet datacentre. There will be no access to any of the SciNet systems during this time. We expect to be able to bring the systems back online in the evening of June 15th.&lt;br /&gt;
&lt;br /&gt;
'''Mon June 13, 2022, 7:00 AM EDT - Wed June 15, 2022, 7:00 AM EDT:''' Two-day reservation for the &amp;quot;Niagara at Scale&amp;quot; event. Only &amp;quot;Niagara at Scale&amp;quot; projects will run on the compute nodes (as well as SOSCIP projects, on a subset of nodes). Users are encouraged to submit small and short jobs that could run before this event.  Throughout the event, users can still login, access their data, and submit jobs, but these jobs will not run until after the subsequent maintenance (see below). Note that the debugjob queue will remain available to everyone as well.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 30th, 2022, 12:42:00 EDT:''' Mist login node is available again.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 30th, 2022, 10:22:00 EDT:''' Mist login node is being upgraded and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Wed May 25th, 2022, 13:30:00 EDT:''' Niagara operating at 100% again.&lt;br /&gt;
&lt;br /&gt;
'''Tue May 24th, 2022, 21:30:00 EDT:''' Jupyter Hub up.  Part of Niagara can run compute jobs.&lt;br /&gt;
&lt;br /&gt;
'''Tue May 24th, 2022, 19:00:00 EDT:''' Systems are up. Users can login, BUT cannot submit jobs yet.&lt;br /&gt;
&lt;br /&gt;
'''Tue May 24th, 2022, 10:00:00 EDT:''' We are still performing system checks.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 23rd, 2022, 16:44:30 EDT:''' Systems still down. Filesystems are working, but there are quite a number of drive failures - no data loss - so out of an abundance of caution we are keeping the systems down at least until tomorrow.  The long weekend has also been disruptive for service response, and we prefer to err on the safe side.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 23rd, 2022, 08:12:14 EDT:''' Systems still down. Filesystems being checked to ensure no heat damage.&lt;br /&gt;
&lt;br /&gt;
'''Sun May 22nd, 2022, 10:16 am EDT:''' Electrician dispatched to replace blown fuses.&lt;br /&gt;
&lt;br /&gt;
'''Sun May 22nd, 2022, 2:54 am EDT:''' Automatic shutdown due to power/cooling issues.&lt;br /&gt;
&lt;br /&gt;
'''Fri May 6th, 2022, 11:35 am EDT:''' HPSS scheduler upgrade also finished.&lt;br /&gt;
&lt;br /&gt;
'''Thu May 5th, 2022, 7:45 pm EDT:''' Upgrade of the scheduler has finished, with the exception of HPSS.&lt;br /&gt;
&lt;br /&gt;
'''Thu May 5th, 2022, 7:00 am - 3:00 pm EDT (approx):''' Starting from 7:00 am EDT, an upgrade of the scheduler of the Niagara, Mist, and Rouge clusters will be applied.  This requires the scheduler to be down for about 5-6 hours, and all compute and login nodes to be rebooted.&lt;br /&gt;
Jobs cannot be submitted during this maintenance, but jobs submitted beforehand will remain in the queue.  For most of the time, the login nodes of the clusters will be available so that users may access their files on the home, scratch, and project file systems.&lt;br /&gt;
&lt;br /&gt;
'''Monday May 2nd, 2022, 9:30 - 11:00 am EDT:''' the Niagara login nodes, the jupyter hub, and nia-datamover2 will get rebooted for updates.  In the process, any login sessions will get disconnected, and servers on the jupyterhub will stop. Jobs in the Niagara queue will not be affected.&lt;br /&gt;
&lt;br /&gt;
'''Tue Apr 26, 11:20 AM EDT:''' A rolling update of the Mist cluster is taking a bit longer than expected, affecting logins to Mist. &lt;br /&gt;
 &lt;br /&gt;
'''Announcement:''' On Thursday April 14th, 2022, the connectivity to the SciNet datacentre will be disrupted at 11:00 AM EDT  for a few minutes, in order to deploy a new network core switch.  Any SSH connections or data transfers to SciNet systems (Niagara, Mist, etc.) may be terminated at that time.&lt;br /&gt;
&lt;br /&gt;
'''Thu March 24, 6:54 AM EDT:''' HPSS is back online.&lt;br /&gt;
&lt;br /&gt;
'''Thu March 24, 8:15 AM EDT:''' HPSS has a hardware problem.&lt;br /&gt;
&lt;br /&gt;
'''Wed March 2, 4:50 PM EST:''' The CCEnv software stack is available again on Niagara.&lt;br /&gt;
&lt;br /&gt;
'''Wed March 2, 7:50 AM EST:''' The CCEnv software stack on Niagara has issues; we are investigating.&lt;br /&gt;
 &lt;br /&gt;
'''Sat Feb 12 2022, 12:59 EST:''' Jupyterhub is back up, but may have hardware issue.&lt;br /&gt;
&lt;br /&gt;
'''Sat Feb 12 2022, 10:36 EST:''' Issue with the Jupyterhub, since last night.  We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 1 2022 19:20 EST:''' Maintenance finished successfully. Systems are up. &lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 1 2022 13:00 EST:''' Maintenance downtime started.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 31 2022 13:15:00 EST:''' The SciNet datacentre's cooling system needs an '''emergency repair''' as soon as possible.  During this repair, all systems hosted at SciNet (Niagara, Mist, Rouge, HPSS, and Teach) will need to be switched off and will be unavailable to users. Repairs will start '''Tuesday February 1st, at 1:00 pm EST''', and could take until the end of the next day.  Please check here for updates.&lt;br /&gt;
&lt;br /&gt;
'''Sat Jan 29 2022 16:45:38 EST:''' Fibre repaired.&lt;br /&gt;
&lt;br /&gt;
'''Sat 29 Jan 2022 11:22:27 EST:''' Fibre repair is underway.  Expect to have connectivity restored later today.&lt;br /&gt;
&lt;br /&gt;
'''Fri 28 Jan 2022 07:35:01 EST:''' The fibre optics cable that connects the SciNet datacentre was severed by uncoordinated digging at York University.  We expect repairs to happen as soon as possible.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 27 2022, 12:46 PM EST:''' Network issues to and from the datacentre. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sun Jan 23 2022, 11:05 AM EST:''' Filesystem issues appear to have been resolved.&lt;br /&gt;
&lt;br /&gt;
'''Sun Jan 23 2022, 10:30 AM EST:''' Filesystem issues; we are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sat Jan 8 2022, 11:42 AM EST:''' The emergency maintenance is complete. Systems are up and available.&lt;br /&gt;
&lt;br /&gt;
'''Fri Jan 7 2022, 2:34 PM EST:''' The SciNet shutdown is in progress. Systems are expected back on Saturday, Jan 8.&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;Emergency shutdown Friday January 7, 2022&amp;lt;/span&amp;gt;''': An emergency shutdown of all SciNet systems to replace a crucial file system component is planned to take place on Friday January 7, 2022, starting at 8am EST, and will require at least 12 hours of downtime.  Updates will be posted during the day.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 6 2022, 8:20 AM EST:''' The SciNet filesystem is having issues.  We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Fri Dec 24 2021, 1:31 PM EST:''' Please note the following scheduled network maintenance, which will result in loss of connectivity to the SciNet datacentre. Start time: Dec 29, 00:30 EST; estimated duration: 4 hours and 30 minutes. &lt;br /&gt;
&lt;br /&gt;
'''Mon Dec 20 2021, 4:29 PM EST:''' Filesystem is back to normal. &lt;br /&gt;
&lt;br /&gt;
'''Mon Dec 20 2021, 2:53 PM EST:''' Filesystem problem; we are investigating. &lt;br /&gt;
&lt;br /&gt;
'''Thu Sep 23 12:30 EDT 2021:''' Cooling restored.  Systems should be available later this afternoon.  &lt;br /&gt;
&lt;br /&gt;
'''Thu Sep 23 9:30 EDT 2021:''' Technicians on site working on the cooling system. &lt;br /&gt;
&lt;br /&gt;
'''Thu Sep 23 3:30 EDT 2021:''' Cooling system issues still unresolved. &lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 22 23:27:48 EDT 2021 ''' Shutdown of the datacenter due to a problem with the cooling system.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 22 09:30 EDT 2021 ''': File system issues, resolved.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 22 07:30 EDT 2021 ''': File system issues, investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sun Sep 19 10:00 EDT 2021''': Power glitch interrupted all compute jobs; please resubmit any jobs you had running.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 15 17:35 EDT 2021''': filesystem issues resolved&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 15 16:39 EDT 2021''': filesystem issues&lt;br /&gt;
&lt;br /&gt;
'''Mon Sep 13 13:15:07 EDT 2021''' HPSS is back online.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 10 17:57:23 EDT 2021''' HPSS is offline due to unscheduled maintenance.&lt;br /&gt;
&lt;br /&gt;
'''Wed Aug 18 16:13:42 EDT 2021''' The HPSS upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
'''HPSS Downtime August 17th and 18th, 2021 (Tuesday and Wednesday):''' We'll be upgrading the HPSS software to version 8.3, along with all the clients (htar/hsi, vfs and Globus/dsi).&lt;br /&gt;
&lt;br /&gt;
'''July 24, 2021, 6:00 PM EDT:''' There appear to be file system issues, which may affect users' ability to login.  We are investigating.&lt;br /&gt;
&lt;br /&gt;
''' July 23rd, 2021, 9:00 AM EDT:''' ''' Security update: ''' Due to a severe vulnerability in the Linux kernel (CVE-2021-33909), our team is currently patching and rebooting all login nodes and compute nodes, as well as the JupyterHub.  There should be no effect on running jobs; however, sessions on login and datamover nodes will be disrupted. &lt;br /&gt;
&lt;br /&gt;
''' July 20th, 2021, 7:00 PM EDT:''' ''' SLURM configuration''' - Changed the default behaviour to kill a job step if any task exits with a non-zero exit code. If your code is able to handle failures gracefully, please add srun's option --no-kill to recover the previous default behaviour.&lt;br /&gt;
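The change above can be sketched in a job script; a minimal illustration, in which the job name, resource request, and application are placeholders and only the srun --no-kill flag comes from the announcement:&lt;br /&gt;

```shell
#!/bin/bash
# Illustrative Niagara job script; node counts, job name, and the application
# binary are placeholders -- only srun's --no-kill flag is from the announcement.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40
#SBATCH --time=01:00:00
#SBATCH --job-name=resilient_run

# As of July 20, 2021, a task exiting with a non-zero code kills the whole
# job step by default.  Passing --no-kill restores the previous behaviour,
# letting the remaining tasks continue after one task fails.
srun --no-kill ./my_application
```

Use this only if the application itself handles failed tasks gracefully; otherwise the new default (terminating the step) is the safer choice.&lt;br /&gt;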
&lt;br /&gt;
''' July 20th, 2021, 7:00 PM EDT:''' Maintenance finished, systems are back online.   &lt;br /&gt;
&lt;br /&gt;
'''SciNet Downtime July 20th, 2021 (Tuesday):''' There will be a maintenance shutdown of the SciNet data center on Tuesday July 20th, starting at 7 am EDT. There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time.  We expect to be able to bring the systems back online in the evening of July 20th.  The status of the Niagara cluster can be checked on status.computecanada.ca. For up-to-date and more detailed information on the status of all the SciNet systems, you can always check back here.&lt;br /&gt;
&lt;br /&gt;
'''June 29th, 2021, 2:00 PM:''' Thunderstorm-related power fluctuations are causing some Niagara compute nodes and their jobs to crash.  Please resubmit if your jobs seem to have crashed for no apparent reason.&lt;br /&gt;
&lt;br /&gt;
'''June 28th, 2021, 4:06 PM:''' Mist OS upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
'''June 28th, 2021, 9:00 AM:''' Mist is under maintenance. The OS is being upgraded from RHEL 7 to 8.&lt;br /&gt;
&lt;br /&gt;
'''June 11th, 2021, 8:30 AM:''' Maintenance complete. Systems are up.&lt;br /&gt;
&lt;br /&gt;
'''June 9th to 10th, 2021:''' The SciNet datacentre will have a scheduled maintenance shutdown.  Niagara, Mist, Rouge, HPSS, login nodes, the file systems, and hosted systems will all be offline during the shutdown starting at 7AM EDT on Wednesday June 9th. We expect the systems to be back up in the morning of Friday June 11th.  Check here for updates.&lt;br /&gt;
&lt;br /&gt;
'''May 27, 2021:''' Datamover addresses have changed to improve high bandwidth connectivity and cybersecurity. The new addresses are 142.1.174.227 for nia-datamover1.scinet.utoronto.ca, and 142.1.174.228 for nia-datamover2.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
If you have jobs that need to connect to a software license server using an ssh tunnel through nia-gw (which actually resolves to datamover1 or datamover2), you may need to ask the system administrators of that license server to allow incoming connections from the new addresses above.&lt;br /&gt;
&lt;br /&gt;
'''May 27th, 20:00:''' All systems are up and running. &lt;br /&gt;
&lt;br /&gt;
'''May 27th, 19:30:''' Most systems are up.&lt;br /&gt;
&lt;br /&gt;
'''May 27th, 19:00:''' Cooling is back. Powering up systems.&lt;br /&gt;
&lt;br /&gt;
'''May 27th, 2021, 11:30am:'''  The cooling tower issue has been identified as a wiring issue and is being repaired.  We don't have an ETA on when cooling will be restored, however we are hopeful it will be by the end of the day.  &lt;br /&gt;
&lt;br /&gt;
'''May 27th, 2021, 12:30am:''' The cooling tower motor is not working properly and may need to be replaced.  It's the primary motor, and the cooling system cannot run without it, so at least until tomorrow all equipment at the datacenter will remain unavailable.  Updates about expected repair times will be posted when they are known.&lt;br /&gt;
&lt;br /&gt;
'''May 26th, 2021, 9:20pm:''' We are currently experiencing cooling issues at the SciNet data centre.  Updates will be posted as we determine the cause of the problem.&lt;br /&gt;
&lt;br /&gt;
'''From Tue Mar 30 at 12 noon EDT to Thu Apr 1 at 12 noon EDT,''' there will be a two-day reservation for the &amp;quot;Niagara at Scale&amp;quot; pilot event.  During these 48 hours, only &amp;quot;Niagara at Scale&amp;quot; projects will run on the compute nodes (as well as SOSCIP projects, on a subset of nodes).  All other users can still login, access their data, and submit jobs throughout this event, but the jobs will not run until after the event.  The debugjob queue will remain available to everyone as well.&lt;br /&gt;
&lt;br /&gt;
The scheduler will not start batch jobs that cannot finish before the start of this event. Users who submit small and short jobs can take advantage of this, as the scheduler may be able to fit these jobs onto the otherwise idle nodes before the event starts.&lt;br /&gt;
&lt;br /&gt;
'''Tue 23 Mar 2021 12:19:07 PM EDT''' - Planned external network maintenance 12pm-1pm Tuesday, March 23rd. &lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 28 17:35:16 EST 2021:''' HPSS services are back online.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 28 12:36:21 EST 2021:''' HPSS services are offline.&lt;br /&gt;
&lt;br /&gt;
We need a small maintenance window as early as possible this afternoon to perform a small configuration change. Ongoing jobs will be allowed to finish, but we are keeping new submissions on hold in the queue.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 25 13:16:33 EST 2021:''' HPSS services are back online.&lt;br /&gt;
&lt;br /&gt;
'''Sat Jan 23 10:03:33 EST 2021:''' HPSS services are offline.&lt;br /&gt;
&lt;br /&gt;
We detected some type of hardware failure on our HPSS equipment overnight, so access has been disabled pending further investigation.&lt;br /&gt;
&lt;br /&gt;
'''Fri Jan 22 10:49:29 EST 2021:''' The Globus transition to OAuth is finished.&lt;br /&gt;
&lt;br /&gt;
Please deactivate any sessions to the niagara endpoint from the last 7 days, and activate/log in again. &lt;br /&gt;
&lt;br /&gt;
For more details, check https://docs.scinet.utoronto.ca/index.php/Globus#computecandada.23niagara&lt;br /&gt;
&lt;br /&gt;
'''Jan 21, 2021:''' Globus access disruption on Fri, Jan/22/2021, 10AM: Please be advised that we will have a maintenance window starting tomorrow at 10AM to roll out the transition of services to OAuth-based authentication.&lt;br /&gt;
&lt;br /&gt;
'''Jan 15, 2021:''' Globus access update on Mon, Jan/18/2021 and Tue, Jan/19/2021:&lt;br /&gt;
Please be advised that we will start preparations on Monday to perform the update to Globus access on Tuesday. We'll be adopting OAuth instead of MyProxy from that point on. During this period, expect sporadic disruptions of service. On Monday we will block access to nia-dm2, so please refrain from starting new login sessions or ssh tunnels via nia-dm2 from this weekend onward.&lt;br /&gt;
&lt;br /&gt;
''' December 11, 2020, 12:00 AM EST: ''' Cooling issue resolved. Systems back.&lt;br /&gt;
&lt;br /&gt;
''' December 11, 2020, 6:00 PM EST: ''' Cooling issue at datacenter. All systems down.&lt;br /&gt;
&lt;br /&gt;
''' December 7, 2020, 7:25 PM EST: '''All systems back; users can log in again.&lt;br /&gt;
&lt;br /&gt;
''' December 7, 2020, 6:46 PM EST: '''User connectivity to data center not yet ready, but queued jobs on Mist and Niagara have been started.&lt;br /&gt;
 &lt;br /&gt;
''' December 7, 2020, 7:00 AM EST: '''Maintenance shutdown in effect. This is a one-day maintenance shutdown.  There will be no access to Niagara, Mist, HPSS or teach, nor to their file systems during this time.  We expect to be able to bring the systems back online this evening.&lt;br /&gt;
&lt;br /&gt;
''' December 2, 2020, 9:10 PM EST: '''Power is back, systems are coming up. Please resubmit any jobs that failed because of this incident.&lt;br /&gt;
&lt;br /&gt;
''' December 2, 2020, 6:00 PM EST: '''Power glitch at the data center, caused about half of the compute nodes to go down.  Power issue not yet resolved.&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;span style=&amp;quot;color:#dd1111&amp;quot;&amp;gt;Announcing a Maintenance Shutdown on December 7th, 2020&amp;lt;/span&amp;gt;''' &amp;lt;br/&amp;gt;There will be a one-day maintenance shutdown on December 7th 2020, starting at 7 am EST.  There will be no access to Niagara, Mist, HPSS or teach, nor to their file systems during this time.  We expect to be able to bring the systems back online in the evening of the same day.&lt;br /&gt;
&lt;br /&gt;
''' November 6, 2020, 8:00 PM EST: ''' Systems are coming back online.&lt;br /&gt;
&lt;br /&gt;
''' November 6, 2020, 9:49 AM EST: ''' Repairs on the cooling system are underway.  No ETA, but the systems will likely be back some time today.&lt;br /&gt;
&lt;br /&gt;
''' November 6, 2020, 4:27 AM EST: '''Cooling system failure, datacentre is shut down.&lt;br /&gt;
&lt;br /&gt;
''' October 9, 2020, 12:57 PM: ''' A short power glitch caused many of the Niagara compute nodes to lose power; jobs running on them would have failed. Please check your jobs and resubmit.&lt;br /&gt;
&lt;br /&gt;
''' October 8, 2020, 9:50 PM: ''' Jupyterhub service is back up.&lt;br /&gt;
&lt;br /&gt;
''' October 8, 2020, 5:40 PM: ''' Jupyterhub service is down. We are investigating.&lt;br /&gt;
&lt;br /&gt;
''' September 28, 2020, 11:00 AM EDT: ''' A short power glitch caused many of the Niagara compute nodes to lose power; jobs running on them would have failed. Please check your jobs and resubmit.&lt;br /&gt;
&lt;br /&gt;
''' September 1, 2020, 2:15 PM EDT: ''' A short power glitch caused about half of the Niagara compute nodes to lose power; jobs running on them would have failed. Please check your jobs and resubmit.&lt;br /&gt;
&lt;br /&gt;
''' September 1, 2020, 9:27 AM EDT: ''' The Niagara cluster has moved to a new default software stack, NiaEnv/2019b.  If your job scripts used the previous default software stack (NiaEnv/2018a), please put the command &amp;quot;module load NiaEnv/2018a&amp;quot; before other module commands in those scripts, to ensure they will continue to work, or try the new stack (recommended).&lt;br /&gt;
&lt;br /&gt;
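For scripts that relied on the old default, the fix amounts to one extra line at the top; a minimal sketch, in which everything except the &amp;quot;module load NiaEnv/2018a&amp;quot; line is a placeholder:&lt;br /&gt;

```shell
#!/bin/bash
# Illustrative job script pinning the previous software stack; the resource
# request, example module, and application binary are placeholders.
#SBATCH --nodes=1
#SBATCH --time=00:30:00

# Load the old default stack *before* any other module commands, so that
# subsequent loads resolve against NiaEnv/2018a as they did previously.
module load NiaEnv/2018a
module load intel   # example: whatever modules the script loaded before

./my_application
```

New scripts should instead try the NiaEnv/2019b default, per the recommendation above.&lt;br /&gt;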
''' August 24, 2020, 7:37 PM EDT: ''' Connectivity is back to normal.&lt;br /&gt;
&lt;br /&gt;
''' August 24, 2020, 6:35 PM EDT: ''' We have partial connectivity back, but are still investigating.&lt;br /&gt;
&lt;br /&gt;
''' August 24, 2020, 3:15 PM EDT: ''' There are issues connecting to the data centre. We're investigating.&lt;br /&gt;
&lt;br /&gt;
''' August 21, 2020, 6:00 PM EDT: ''' The pump has been repaired, cooling is restored, systems are up.  &amp;lt;br/&amp;gt;Scratch purging is postponed until the evening of Friday Aug 28th, 2020.&lt;br /&gt;
&lt;br /&gt;
'''August 19, 2020, 4:40 PM EDT:''' Update: The current estimate is to have the cooling restored on Friday, and we hope to have the systems available for users on Saturday August 22, 2020.&lt;br /&gt;
&lt;br /&gt;
'''August 17, 2020, 4:00 PM EDT:''' Unfortunately, after taking the pump apart it was determined there was a more serious failure of the main drive shaft, not just the seal. As a new one will need to be sourced or fabricated, we're estimating that it will take at least a few more days to get the part and complete the repairs to restore cooling. Sorry for the inconvenience. &lt;br /&gt;
&lt;br /&gt;
'''August 15, 2020, 1:00 PM EDT:''' Due to parts availability to repair the failed pump and cooling system, it is unlikely that systems will be restored until Monday afternoon at the earliest. &lt;br /&gt;
&lt;br /&gt;
'''August 15, 2020, 12:04 AM EDT:'''  A primary pump seal in the cooling infrastructure has blown, and parts availability cannot be determined until tomorrow. All systems are shut down as there is no cooling.  If parts are available, systems may be back late tomorrow at the earliest. Check here for updates.  &lt;br /&gt;
&lt;br /&gt;
'''August 14, 2020, 9:04 PM EDT:''' Tomorrow's /scratch purge has been postponed.&lt;br /&gt;
&lt;br /&gt;
'''August 14, 2020, 9:00 PM EDT:''' Staff are at the datacenter. It looks like one of the pumps has a seal that is leaking badly.&lt;br /&gt;
&lt;br /&gt;
'''August 14, 2020, 8:37 PM EDT:''' We seem to be undergoing a thermal shutdown at the datacenter.&lt;br /&gt;
&lt;br /&gt;
'''August 14, 2020, 8:20 PM EDT:''' Network problems to niagara/mist. We are investigating.&lt;br /&gt;
 &lt;br /&gt;
'''August 13, 2020, 10:40 AM EDT:''' Network is fixed; the scheduler and other services are back.&lt;br /&gt;
&lt;br /&gt;
'''August 13, 2020, 8:20 AM EDT:''' We had an IB switch failure, which is affecting a subset of nodes, including the scheduler nodes.&lt;br /&gt;
&lt;br /&gt;
'''August 10, 2020, 7:30 PM EDT:''' Scheduler fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''August 10, 2020, 3:00 PM EDT:''' Scheduler partially functional: jobs can be submitted and are running.&lt;br /&gt;
&lt;br /&gt;
'''August 10, 2020, 2:00 PM EDT:''' Scheduler is temporarily inoperative.&lt;br /&gt;
&lt;br /&gt;
'''August 7, 2020, 9:15 PM EDT:''' Network is fixed; the scheduler and other services are coming back.&lt;br /&gt;
&lt;br /&gt;
'''August 7, 2020, 8:20 PM EDT:''' Disruption of part of the network in the data centre.  This causes issues with the scheduler, the Mist login node, and possibly others. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''July 30, 2020, 9:00 AM''' Project backup in progress but incomplete: please be aware that after we deployed the new, larger storage appliance for scratch and project two months ago, we started a full backup of project (1.5PB). This backup is taking a while to complete, and there are still a few areas which have not been backed up fully. Please be careful to not delete things from project that you still need, in particular if they are recently added material.&lt;br /&gt;
&lt;br /&gt;
'''July 27, 2020, 5:00 PM:''' Scheduler issues resolved.&lt;br /&gt;
&lt;br /&gt;
'''July 27, 2020, 3:00 PM:''' Scheduler issues. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''July 13, 4:40 PM:''' Most systems are available again. Only Mist is still being brought up.&lt;br /&gt;
&lt;br /&gt;
'''July 13, 10:00 AM:''' '''SciNet/Niagara Downtime In Progress'''&lt;br /&gt;
&lt;br /&gt;
'''SciNet/Niagara Downtime Announcement, July 13, 2020'''&amp;lt;br/&amp;gt;&lt;br /&gt;
All resources at SciNet will undergo a maintenance shutdown on Monday July 13, 2020, starting at 10:00 am EDT, for file system and scheduler upgrades.  There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time.&lt;br /&gt;
We expect to be able to bring the systems back around 3 PM EDT on the same day.&lt;br /&gt;
&lt;br /&gt;
''' June 29, 6:21 PM:''' Systems are available again.  &lt;br /&gt;
&lt;br /&gt;
''' June 29, 12:30 PM:''' Power outage caused thermal shutdown.&lt;br /&gt;
&lt;br /&gt;
'''June 20, 2020, 10:24 PM:''' File systems are back up.  Unfortunately, all running jobs would have died and users are asked to resubmit them.&lt;br /&gt;
&lt;br /&gt;
'''June 20, 2020, 9:48 PM:''' An issue with the file systems is causing trouble.  We are investigating the cause.&lt;br /&gt;
&lt;br /&gt;
'''June 15, 2020, 10:30 PM:''' A '''power glitch''' caused some compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''June 12, 2020, 6:15 PM:''' Two '''power glitches''' during the night caused some compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''June 6, 2020, 6:06 AM:''' A '''power glitch''' caused some compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''May 24, 2020, 8:20 AM:''' A '''power glitch''' this morning caused all compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''May 7, 2020, 6:05 PM:''' Maintenance shutdown is finished.  Most systems are back in production.&lt;br /&gt;
&lt;br /&gt;
'''May 6, 2020, 7:08 AM:''' Two-day datacentre maintenance shutdown has started.&lt;br /&gt;
&lt;br /&gt;
''' SciNet/Niagara Downtime Announcement, May 6-7, 2020'''&lt;br /&gt;
&lt;br /&gt;
All resources at SciNet will undergo a two-day maintenance shutdown on May 6th and 7th 2020, starting at 7 am EDT on Wednesday May 6th.  There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) or systems hosted at the SciNet data centre.  We expect to be able to bring the systems back online the evening of May 7th.&lt;br /&gt;
&lt;br /&gt;
'''May 4, 2020, 7:51 AM:''' A power glitch this morning caused compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''May 3, 2020, 8:20 AM:''' A power glitch this morning caused all compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''April 28, 2020, 7:20 AM:''' A power glitch this morning caused all compute nodes to be rebooted: jobs running at the time have failed; users are asked to resubmit these jobs.&lt;br /&gt;
 &lt;br /&gt;
'''April 20, 2020: Security Incident at Cedar; implications for Niagara users'''&lt;br /&gt;
&lt;br /&gt;
Last week, it became evident that the Cedar GP cluster had been&lt;br /&gt;
compromised for several weeks.  The passwords of at least two&lt;br /&gt;
Compute Canada users were known to the attackers. One of these was&lt;br /&gt;
used to escalate privileges on Cedar, as explained on&lt;br /&gt;
https://status.computecanada.ca/view_incident?incident=423.&lt;br /&gt;
&lt;br /&gt;
These accounts were used to login to Niagara as well, but Niagara&lt;br /&gt;
did not have the same security loophole as Cedar (which has been&lt;br /&gt;
fixed), and no further escalation was observed on Niagara.&lt;br /&gt;
&lt;br /&gt;
Reassuring as that may sound, it is not known how the passwords of&lt;br /&gt;
the two user accounts were obtained. Given this uncertainty, the&lt;br /&gt;
SciNet team *strongly* recommends that you change your password on&lt;br /&gt;
https://ccdb.computecanada.ca/security/change_password, and remove&lt;br /&gt;
any SSH keys and regenerate new ones (see&lt;br /&gt;
https://docs.scinet.utoronto.ca/index.php/SSH_keys).&lt;br /&gt;
&lt;br /&gt;
''' Tue 30 Mar 2020 14:55:14 EDT'''  Burst Buffer available again.&lt;br /&gt;
&lt;br /&gt;
''' Fri Mar 27 15:29:00 EDT 2020:''' SciNet systems are back up. Only the Burst Buffer remains offline, its maintenance is expected to be finished early next week.&lt;br /&gt;
&lt;br /&gt;
''' Thu Mar 26 23:05:00 EDT 2020:'''  Some aspects of the maintenance took longer than expected. The systems will not be back up until some time tomorrow, Friday March 27, 2020.  &lt;br /&gt;
&lt;br /&gt;
''' Wed Mar 25 7:00:00 EDT 2020:'''  SciNet/Niagara downtime started.&lt;br /&gt;
&lt;br /&gt;
''' Mon Mar 23 18:45:10 EDT 2020:'''  File system issues were resolved.&lt;br /&gt;
&lt;br /&gt;
''' Mon Mar 23 18:01:19 EDT 2020:''' There is currently an issue with the main Niagara filesystems. This affects all systems; all jobs have been killed. The issue is being investigated. &lt;br /&gt;
&lt;br /&gt;
''' Fri Mar 20 13:15:33 EDT 2020: ''' There was a power glitch at the datacentre at 8:50 AM, which resulted in jobs getting killed.  Please resubmit failed jobs. &lt;br /&gt;
&lt;br /&gt;
''' COVID-19 Impact on SciNet Operations, March 18, 2020'''&lt;br /&gt;
&lt;br /&gt;
Although the University of Toronto is closing some of its&lt;br /&gt;
research operations on Friday March 20 at 5 pm EDT, this does not&lt;br /&gt;
affect the SciNet systems (such as Niagara, Mist, and HPSS), which&lt;br /&gt;
will remain operational.&lt;br /&gt;
&lt;br /&gt;
''' SciNet/Niagara Downtime Announcement, March 25-26, 2020'''&lt;br /&gt;
&lt;br /&gt;
All resources at SciNet will undergo a two-day maintenance shutdown on March 25th and 26th 2020, starting at 7 am EDT on Wednesday March 25th.  There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time.&lt;br /&gt;
&lt;br /&gt;
This shutdown is necessary to finish the expansion of the Niagara cluster and its storage system.&lt;br /&gt;
&lt;br /&gt;
We expect to be able to bring the systems back online the evening of March 26th.&lt;br /&gt;
&lt;br /&gt;
''' March 9, 2020, 11:24 PM:''' HPSS services are temporarily suspended for emergency maintenance.&lt;br /&gt;
&lt;br /&gt;
''' March 7, 2020, 10:15 PM:''' File system issues have been cleared.&lt;br /&gt;
&lt;br /&gt;
''' March 6, 2020, 7:30 PM:''' File system issues; we are investigating.&lt;br /&gt;
&lt;br /&gt;
''' March 2, 2020, 1:30 PM:''' For the extension of Niagara, the operating system on all Niagara nodes has been upgraded&lt;br /&gt;
from CentOS 7.4 to 7.6.  This required all&lt;br /&gt;
nodes to be rebooted. Running compute jobs are allowed to finish&lt;br /&gt;
before their compute node gets rebooted. The login nodes have all been rebooted, as have the datamover nodes and the JupyterHub service.&lt;br /&gt;
&lt;br /&gt;
''' Feb 24, 2020, 1:30PM: ''' The [[Mist]] login node got rebooted.  It is back, but we are still monitoring the situation.&lt;br /&gt;
&lt;br /&gt;
''' Feb 12, 2020, 11:00AM: ''' The [[Mist]] GPU cluster is now available to users.&lt;br /&gt;
&lt;br /&gt;
''' Feb 11, 2020, 2:00PM: ''' The Niagara compute nodes were accidentally rebooted, killing all running jobs.&lt;br /&gt;
&lt;br /&gt;
''' Feb 10, 2020, 7:00 PM: ''' HPSS is back to normal.&lt;br /&gt;
&lt;br /&gt;
''' Jan 30, 2020, 12:01PM: ''' We are having an issue with HPSS, in which the disk-cache is full. We put a reservation on the whole system (Globus, plus archive and vfs queues), until it has had a chance to clear some space on the cache.&lt;br /&gt;
&lt;br /&gt;
''' Jan 21, 2020, 4:05PM: '''   There was a partial power outage that took down a large number of the compute nodes.  If your job died during this period, please resubmit.  &lt;br /&gt;
&lt;br /&gt;
'''Jan 13, 2020, 7:35 PM:''' Maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Jan 13, 2020, 8:20 AM:''' The announced maintenance downtime started (see below).&lt;br /&gt;
&lt;br /&gt;
'''Jan 9 2020, 11:30 AM:''' External ssh connectivity restored; the issue was related to the university network.&lt;br /&gt;
&lt;br /&gt;
'''Jan 9 2020, 9:24 AM:''' We received reports of users having trouble connecting into the SciNet data centre; we're investigating.  Systems are up and running and jobs are fine.&amp;lt;p&amp;gt;&lt;br /&gt;
As a workaround, in the meantime, it appears to be possible to log into graham, cedar or beluga, and then ssh to niagara.&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Downtime announcement:'''&lt;br /&gt;
To prepare for the upcoming expansion of Niagara, there will be a&lt;br /&gt;
one-day maintenance shutdown on '''January 13th 2020, starting at 8 am&lt;br /&gt;
EST'''.  There will be no access to Niagara, Mist, HPSS or teach, nor&lt;br /&gt;
to their file systems during this time.&lt;br /&gt;
&lt;br /&gt;
2019&lt;br /&gt;
&lt;br /&gt;
'''December 13, 9:00 AM EST:''' Issues resolved.&lt;br /&gt;
&lt;br /&gt;
'''December 13, 8:20 AM EST:''' An overnight issue is now preventing logins to Niagara and other services. Possibly a file system issue; we are investigating.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt; '''Fri, Nov 15 2019, 11:00 PM (EST)'''  Niagara and most of the main systems are now available. &lt;br /&gt;
&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt; '''Fri, Nov 15 2019, 7:50 PM (EST)'''  SOSCIP GPU cluster is up and accessible.  Work on the other systems continues.&lt;br /&gt;
&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt; '''Fri, Nov 15 2019, 5:00 PM (EST)'''  Infrastructure maintenance done, upgrades still in process.&lt;br /&gt;
&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&lt;br /&gt;
'''Fri, Nov 15 2019, 7:00 AM (EST)'''  Maintenance shutdown of the SciNet data centre has started.  Note: scratch purging has been postponed until Nov 17.&amp;lt;br/&amp;gt; &lt;br /&gt;
&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&lt;br /&gt;
'''Announcement:''' &lt;br /&gt;
The SciNet datacentre will undergo a maintenance shutdown on&lt;br /&gt;
Friday November 15th 2019, from 7 am to 11 pm (EST), with no access&lt;br /&gt;
to any of the SciNet systems (Niagara, P8, SGC, HPSS, Teach cluster,&lt;br /&gt;
or the filesystems) during that time. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Sat, Nov 2 2019, 1:30 PM (update):'''  Chiller has been fixed, all systems are operational.    &lt;br /&gt;
&amp;lt;/p&amp;gt;&lt;br /&gt;
'''Fri, Nov 1 2019, 4:30 PM (update):'''  We are operating in free cooling, so we have brought up about half of the Niagara compute nodes to reduce the cooling load.  Access, storage, and other systems should now be available.   &lt;br /&gt;
&lt;br /&gt;
'''Fri, Nov 1 2019, 12:05 PM (update):''' A power module in the chiller has failed and needs to be replaced.   We should be able to operate in free cooling if the temperature stays cold enough, but we may not be able to run all systems. No eta yet on when users will be able to log back in. &lt;br /&gt;
&lt;br /&gt;
'''Fri, Nov 1 2019, 9:15 AM (update):''' There was an automated shutdown because of rising temperatures, causing all systems to go down. We are investigating; check here for updates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;'''Fri, Nov 1 2019, 8:16 AM:''' Unexpected data centre issue: Check here for updates.&lt;br /&gt;
&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
''' Thu 1 Aug 2019 5:00:00 PM ''' Systems are up and operational.   &lt;br /&gt;
&lt;br /&gt;
'''Thu 1 Aug 2019 7:00:00 AM: ''' Scheduled downtime maintenance of the SciNet datacentre.  All systems will be down and unavailable starting at 7 am until the evening. &lt;br /&gt;
&lt;br /&gt;
'''Fri 26 Jul 2019, 16:02:26 EDT:''' There was an issue with the Burst Buffer at around 3PM, and it was recently solved. BB is OK again.&lt;br /&gt;
&lt;br /&gt;
''' Sun 30 Jun 2019 ''' The '''SOSCIP BGQ''' and '''P7''' systems were decommissioned on '''June 30th, 2019'''.  The BGQdev front end node and storage are still available.  &lt;br /&gt;
&lt;br /&gt;
'''Wed 19 Jun 2019, 1:20:00 PM:''' The BGQ is back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed 19 Jun 2019, 10:00:00 AM:''' The BGQ is still down; the SOSCIP GPU nodes should be back up. &lt;br /&gt;
&lt;br /&gt;
'''Wed 19 Jun 2019, 1:40:00 AM:''' There was an issue with the SOSCIP BGQ and GPU Cluster last night at about 1:42 am, probably a power fluctuation that took it down.  &lt;br /&gt;
&lt;br /&gt;
'''Wed 12 Jun 2019, 3:30 AM - 7:40 AM''' Intermittent system issues on Niagara's project and scratch file systems as the file number limit was reached. We have increased the total number of files allowed on the file system. &lt;br /&gt;
&lt;br /&gt;
'''Thu 30 May 2019, 11:00:00 PM:'''&lt;br /&gt;
The maintenance downtime of SciNet's data center has finished, and systems are being brought online now.  You can check the progress here. Some systems might not be available until Friday morning.&amp;lt;br/&amp;gt;&lt;br /&gt;
Some action on the part of users will be required when they first connect again to the Niagara login nodes or datamovers.  This is due to the security upgrade of the Niagara cluster, which is now in line with currently accepted best practices.&amp;lt;br/&amp;gt;&lt;br /&gt;
The details of the required actions can be found on the [[SSH Changes in May 2019]] wiki page.&lt;br /&gt;
&lt;br /&gt;
'''Wed 29-30 May 2019''' The SciNet datacentre will undergo a two-day maintenance shutdown, starting at 7 am EDT on Wednesday May 29th.  There will be no access to any of the SciNet systems (Niagara, P7, P8, BGQ, SGC, HPSS, Teach cluster, or the file systems) during this time.&lt;br /&gt;
&lt;br /&gt;
'''SCHEDULED SHUTDOWN''': &lt;br /&gt;
&lt;br /&gt;
Please be advised that on '''Wednesday May 29th through Thursday May 30th''', the SciNet datacentre will undergo a two-day maintenance shutdown, starting at 7 am EDT on Wednesday May 29th.  There will be no access to any of the SciNet systems (Niagara, P7, P8, BGQ, SGC, HPSS, Teach cluster, or the file systems) during this time.&lt;br /&gt;
&lt;br /&gt;
This is necessary to finish the installation of an emergency power generator, to perform the annual cooling tower maintenance, and to enhance login security.&lt;br /&gt;
&lt;br /&gt;
We expect to be able to bring the systems back online the evening of May 30th.  Due to the enhanced login security, users' ssh clients will need to update their known-hosts lists. More detailed information on this procedure will be sent shortly before the systems are back online.&lt;br /&gt;
&lt;br /&gt;
'''Fri 5 Apr 2019:''' Software updates on Niagara: The default CCEnv software stack now uses avx512 on Niagara, and there is now a NiaEnv/2019b stack (&amp;quot;epoch&amp;quot;). &lt;br /&gt;
&lt;br /&gt;
'''Thu 4 Apr 2019:''' The 2019 compute and storage allocations have taken effect on Niagara.&lt;br /&gt;
&lt;br /&gt;
'''NOTE''':  There is scheduled network maintenance for '''Friday April 26th 12am-8am''' on the SciNet datacentre external network connection.   This will not affect internal connections or running jobs; however, remote connections may see interruptions during this period.&lt;br /&gt;
&lt;br /&gt;
'''Wed 24 Apr 2019 14:14 EDT:''' HPSS is back on service. Library and robot arm maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Wed 24 Apr 2019 08:35 EDT:''' HPSS out of service this morning for library and robot arm maintenance.&lt;br /&gt;
&lt;br /&gt;
'''Fri 19 Apr 2019 17:40 EDT:''' HPSS robot arm has been released and is back to normal operations.&lt;br /&gt;
&lt;br /&gt;
'''Fri 19 Apr 2019 14:00 EDT:''' Problems with the HPSS library robot have been detected.&lt;br /&gt;
&lt;br /&gt;
'''Wed 17 Apr 2019 15:35 EDT:''' Network connection is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed 17 Apr 2019 15:12 EDT:''' Network connection down.  Investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue 9 Apr 2019 22:24:14 EDT:'''  Network connection restored.&lt;br /&gt;
&lt;br /&gt;
'''Tue 9 Apr 2019, 15:20:''' Network connection down.  Investigating.&lt;br /&gt;
&lt;br /&gt;
'''Fri 5 Apr 2019:''' Planned, short outage in connectivity to the SciNet datacentre from 7:30 am to 8:55 am EST for maintenance of the network.  This outage will not affect running or queued jobs. It may be necessary to reboot the login nodes at some point tomorrow, which could result in a short interruption of connectivity, but which will have no effect on running or queued jobs.&lt;br /&gt;
&lt;br /&gt;
'''April 4, 2019:'''  The 2019 compute and storage allocations will take effect on Niagara. Running jobs will not be affected by this change and will run their course.  Queued jobs' priorities will be updated to reflect the new fairshare values later in the day.  The queue should fully reflect the new fairshare values in about 24 hours.   &lt;br /&gt;
&lt;br /&gt;
It may be necessary to reboot the login nodes at some point tomorrow, which could result in a short interruption of connectivity, but which will have no effect on running or queued jobs.&lt;br /&gt;
&lt;br /&gt;
There will be updates to the software stack on this day as well.&lt;br /&gt;
&lt;br /&gt;
'''March 25, 3:05 PM EST:'''  Most systems back online, other services should be back shortly. &lt;br /&gt;
&lt;br /&gt;
'''March 25, 12:05 PM EST:''' Power is back at the datacentre, but it is not yet known when all systems will be back up.  Keep checking here for updates.&lt;br /&gt;
&lt;br /&gt;
'''March 25, 11:27 AM EST:''' A power outage in the datacentre occurred and caused all services to go down.  Check here for updates.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 21 10:37:28 EDT 2019:''' HPSS is back in service&lt;br /&gt;
&lt;br /&gt;
HPSS out of service on '''Tue, Mar/19 at 9AM''', for tape library expansion and relocation. It's possible the downtime will extend to Wed, Mar/20.&lt;br /&gt;
&lt;br /&gt;
'''January 21, 4:00 PM''': HPSS is back in service. Thank you for your patience.&lt;br /&gt;
&lt;br /&gt;
'''January 18, 5:00 PM''': We completed practically all of the HPSS upgrades (software and hardware); however, the main client node, archive02, is presenting an issue we have not yet been able to resolve. We will try to resume work over the weekend with cool heads, or on Monday. Sorry, but this is an unforeseen delay. Jobs in the queue will remain there, and we'll delay the scratch purging by 1 week.&lt;br /&gt;
&lt;br /&gt;
'''January 16, 11:00 PM''': HPSS is being upgraded, as announced.&lt;br /&gt;
&lt;br /&gt;
'''January 16, 8:00 PM''': Systems are coming back up and should be accessible for users now.&lt;br /&gt;
&lt;br /&gt;
'''January 15, 8:00 AM''': Data centre downtime in effect.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;font color=red&amp;gt;&amp;lt;b&amp;gt;Downtime Announcement for January 15 and 16, 2019&amp;lt;/b&amp;gt;&amp;lt;/font&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
The SciNet datacentre will need to undergo a two-day maintenance shutdown in order to perform electrical work, repairs and maintenance.  The electrical work is in preparation for the upcoming installation of an emergency power generator and a larger UPS, which will result in increased resilience to power glitches and outages.  The shutdown is scheduled to start on '''Tuesday January 15, 2019, at 7 am''' and will last until '''Wednesday January 16, 2019''', some time in the evening. There will be no access to any of the SciNet systems (Niagara, P7, P8, BGQ, SGC, HPSS, Teach cluster, or the filesystems) during this time.&lt;br /&gt;
Check back here for up-to-date information on the status of the systems.&lt;br /&gt;
&lt;br /&gt;
Note: this downtime was originally scheduled for Dec. 18, 2018, but has been postponed and combined with the annual maintenance downtime.&lt;br /&gt;
&lt;br /&gt;
'''December 24, 2018, 11:35 AM EST:''' Most systems are operational again. If you had compute jobs running yesterday at around 3:30PM, they likely crashed - please check them and resubmit if needed.&lt;br /&gt;
&lt;br /&gt;
'''December 24, 2018, 10:40 AM EST:''' Repairs have been made, and the file systems are starting to be mounted on the cluster. &lt;br /&gt;
&lt;br /&gt;
'''December 23, 2018, 3:38 PM EST:''' Issues with the file systems (home, scratch and project). We are investigating, it looks like a hardware issue that we are trying to work around. Note that the absence of /home means you cannot log in with ssh keys. All compute jobs crashed around 3:30 PM EST on Dec 23. Once the system is properly up again, please resubmit your jobs.  Unfortunately, at this time of year, it is not possible to give an estimate on when the system will be operational again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 22 14:20:00 EDT 2018''': &amp;lt;font color=green&amp;gt;HPSS back in service&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 22 08:55:00 EDT 2018''': &amp;lt;font color=red&amp;gt;HPSS offline for scheduled maintenance&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 20 16:30:00 EDT 2018''':  HPSS offline on Thursday 9AM for installation of new LTO8 drives in the tape library.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct  9 12:16:00 EDT 2018''':  BGQ compute nodes are up.  &lt;br /&gt;
&lt;br /&gt;
'''Sun Oct  7 20:24:26 EDT 2018''':  SGC and BGQ front end are available; BGQ compute nodes are down due to a cooling issue.  &lt;br /&gt;
&lt;br /&gt;
'''Sat Oct  6 23:16:44 EDT 2018''':  There were some problems bringing up SGC &amp;amp; BGQ, they will remain offline for now.&lt;br /&gt;
&lt;br /&gt;
'''Sat Oct  6 18:36:35 EDT 2018''':  Electrical work finished, power restored. Systems are coming online.&lt;br /&gt;
&lt;br /&gt;
'''July 18, 2018:''' login.scinet.utoronto.ca is now disabled, GPC $SCRATCH and $HOME are decommissioned.&lt;br /&gt;
&lt;br /&gt;
'''July 12, 2018:''' There was a short power interruption around 10:30 am which caused most of the systems (Niagara, SGC, BGQ) to reboot and any running jobs to fail. &lt;br /&gt;
&lt;br /&gt;
'''July 11, 2018:''' P7's moved to BGQ filesystem, P8's moved to Niagara filesystem.&lt;br /&gt;
&lt;br /&gt;
'''May 24, 2018, 9:25 PM EST:''' The data center is up, and all systems are operational again.&lt;br /&gt;
&lt;br /&gt;
'''May 24, 2018, 7:00 AM EST:''' The data centre is under annual maintenance. All systems are offline. Systems are expected to be back late afternoon today; check for updates on this page.&lt;br /&gt;
&lt;br /&gt;
'''May 18, 2018:''' Announcement: Annual scheduled maintenance downtime: Thursday May 24, starting 7:00 AM&lt;br /&gt;
&lt;br /&gt;
'''May 16, 2018:''' Cooling  restored, systems online&lt;br /&gt;
&lt;br /&gt;
'''May 16, 2018:''' Cooling issue at datacentre again, all systems down&lt;br /&gt;
&lt;br /&gt;
'''May 15, 2018:''' Cooling restored, systems coming online&lt;br /&gt;
&lt;br /&gt;
'''May 15, 2018''' Cooling issue at datacentre, all systems down&lt;br /&gt;
&lt;br /&gt;
'''May 4, 2018:''' [[HPSS]] is now operational on Niagara.&lt;br /&gt;
&lt;br /&gt;
'''May 3, 2018:''' [[Burst Buffer]] is available upon request.&lt;br /&gt;
&lt;br /&gt;
'''May 3, 2018:''' The [https://docs.computecanada.ca/wiki/Globus Globus] endpoint for Niagara is available: computecanada#niagara.&lt;br /&gt;
&lt;br /&gt;
'''May 1, 2018:''' System status moved here.&lt;br /&gt;
&lt;br /&gt;
'''Apr 23, 2018:''' GPC-compute is decommissioned, GPC-storage available until 30 May 2018.&lt;br /&gt;
&lt;br /&gt;
'''April 10, 2018:''' Niagara commissioned.&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7622</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7622"/>
		<updated>2026-03-25T20:59:41Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | HPSS|HPSS}}&lt;br /&gt;
|{{Up| Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Wed Mar 25, 2026, 5:00 pm:''' Trillium is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Mar 25, 2026, 9:00 am:''' Teach is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 8:45 pm:''' Open OnDemand is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 1:00 pm:''' External connectivity is back. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 12:05 pm:''' External connectivity to the data centre was lost. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 7:00 am:''' Maintenance has started.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 1:30 pm:''' Recovering.  Almost all systems are up again. Please resubmit any jobs that crashed. &lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 12:00 pm:''' A power glitch at the data centre caused compute nodes to go down.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 4:15 pm:''' Connections to Trillium are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly for Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and&lt;br /&gt;
jobs will be held in the queue until maintenance is complete.  Starting at 7 am on Mar 23, users are encouraged to submit small, short jobs that may be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7619</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7619"/>
		<updated>2026-03-25T20:59:28Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | HPSS|HPSS}}&lt;br /&gt;
|{{Partial | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Wed Mar 25, 2026, 5:00 pm:''' Trillium is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Mar 25, 2026, 9:00 am:''' Teach is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 8:45 pm:''' Open OnDemand is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 1:00 pm:''' External connectivity is back. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 12:05 pm:''' External connectivity to the data centre was lost. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 7:00 am:''' Maintenance has started.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 1:30 pm:''' Recovering.  Almost all systems are up again. Please resubmit any jobs that crashed. &lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 12:00 pm:''' A power glitch at the data centre caused compute nodes to go down.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 4:15 pm:''' Connections to Trillium are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly for Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and&lt;br /&gt;
jobs will be held in the queue until maintenance is complete.  Starting at 7 am on Mar 23, users are encouraged to submit small, short jobs that may be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7613</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7613"/>
		<updated>2026-03-25T13:38:46Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Partial3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | HPSS|HPSS}}&lt;br /&gt;
|{{Partial | Balam|Balam}}&lt;br /&gt;
|{{Down | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Wed Mar 25, 2026, 9:00 am:''' Teach is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 8:45 pm:''' Open OnDemand is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 1:00 pm:''' External connectivity is back. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 12:05 pm:''' External connectivity to the data centre was lost. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 7:00 am:''' Maintenance has started.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 1:30 pm:''' Recovering.  Almost all systems are up again. Please resubmit any jobs that crashed. &lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 12:00 pm:''' A power glitch at the data centre caused compute nodes to go down.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 4:15 pm:''' Connections to Trillium are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly for Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and&lt;br /&gt;
jobs will be held in the queue until maintenance is complete.  Starting at 7 am on Mar 23, users are encouraged to submit small, short jobs that may be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7610</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7610"/>
		<updated>2026-03-25T13:31:11Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Partial3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | HPSS|HPSS}}&lt;br /&gt;
|{{Down | Balam|Balam}}&lt;br /&gt;
|{{Down | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Wed Mar 25, 2026, 9:00 pm:''' Teach is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 8:45 pm:''' Open OnDemand is operational again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 1:00 pm:''' External connectivity is back. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 12:05 pm:''' External connectivity to the data centre was lost. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 7:00 am:''' Maintenance has started.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 1:30 pm:''' Recovering. Almost all systems are up again. Please resubmit any jobs that crashed.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 12:00 pm:''' A power glitch at the data centre caused compute nodes to go down.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 4:15 pm:''' Connections to Trillium are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly with Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and&lt;br /&gt;
jobs will be held in the queue until maintenance is complete.  Starting at 7:00 a.m. on March 23, users are encouraged to submit small, short jobs that may still be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7604</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7604"/>
		<updated>2026-03-24T17:20:04Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Partial3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Partial3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | HPSS|HPSS}}&lt;br /&gt;
|{{Down | Balam|Balam}}&lt;br /&gt;
|{{Down | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Partial | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 1:00 pm:''' External connectivity is back. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 12:05 pm:''' External connectivity to the data centre was lost. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 7:00 am:''' Maintenance has started.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 1:30 pm:''' Recovering. Almost all systems are up again. Please resubmit any jobs that crashed.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 12:00 pm:''' A power glitch at the data centre caused compute nodes to go down.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 4:15 pm:''' Connections to Trillium are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly with Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and&lt;br /&gt;
jobs will be held in the queue until maintenance is complete.  Starting at 7:00 a.m. on March 23, users are encouraged to submit small, short jobs that may still be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7601</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7601"/>
		<updated>2026-03-24T16:24:58Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Partial3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Partial3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | HPSS|HPSS}}&lt;br /&gt;
|{{Down | Balam|Balam}}&lt;br /&gt;
|{{Down | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Partial | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Down3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 12:05 pm:''' External connectivity to the data centre was lost. &lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 7:00 am:''' Maintenance has started.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 1:30 pm:''' Recovering. Almost all systems are up again. Please resubmit any jobs that crashed.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 12:00 pm:''' A power glitch at the data centre caused compute nodes to go down.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 4:15 pm:''' Connections to Trillium are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly with Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and&lt;br /&gt;
jobs will be held in the queue until maintenance is complete.  Starting at 7:00 a.m. on March 23, users are encouraged to submit small, short jobs that may still be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7595</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7595"/>
		<updated>2026-03-24T11:41:39Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Partial3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Partial3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | HPSS|HPSS}}&lt;br /&gt;
|{{Down | Balam|Balam}}&lt;br /&gt;
|{{Partial | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Partial | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Tue Mar 24, 2026, 7:00 am:''' Maintenance has started.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 1:30 pm:''' Recovering. Almost all systems are up again. Please resubmit any jobs that crashed.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 12:00 pm:''' A power glitch at the data centre caused compute nodes to go down.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 4:15 pm:''' Connections to Trillium are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly with Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and&lt;br /&gt;
jobs will be held in the queue until maintenance is complete.  Starting at 7:00 a.m. on March 23, users are encouraged to submit small, short jobs that may still be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7592</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7592"/>
		<updated>2026-03-17T14:24:36Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 1:30 pm:''' Recovering. Almost all systems are up again. Please resubmit any jobs that crashed.&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 12:00 pm:''' A power glitch at the data centre caused compute nodes to go down.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 4:15 pm:''' Connections to Trillium are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly with Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and&lt;br /&gt;
jobs will be held in the queue until maintenance is complete.  Starting at 7:00 a.m. on March 23, users are encouraged to submit small, short jobs that may still be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7589</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7589"/>
		<updated>2026-03-16T18:07:06Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Partial3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 1:30 pm:''' Recovered. Almost all systems are up again. Please resubmit any jobs that crashed. &lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 12:00 pm:''' A power glitch at the data centre caused compute nodes to go down.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 4:15 pm:''' Connections to Trillium are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly for Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and&lt;br /&gt;
jobs will be held in the queue until maintenance is complete.  Starting at 7 a.m. on Mar 23, users are encouraged to submit small, short jobs that may be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7586</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7586"/>
		<updated>2026-03-16T18:01:22Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Partial3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 1:30 pm:''' Recovering slowly. Most Trillium compute nodes are up again. Please resubmit any jobs that crashed. &lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 12:00 pm:''' A power glitch at the data centre caused compute nodes to go down.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 4:15 pm:''' Connections to Trillium are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly for Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and&lt;br /&gt;
jobs will be held in the queue until maintenance is complete.  Starting at 7 a.m. on Mar 23, users are encouraged to submit small, short jobs that may be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7583</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7583"/>
		<updated>2026-03-16T16:57:54Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Down3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Down3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Down | Balam|Balam}}&lt;br /&gt;
|{{Down | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Mon Mar 16, 2026, 12:00 pm:''' A power glitch at the data centre caused compute nodes to go down.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 4:15 pm:''' Connections to Trillium are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly for Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and&lt;br /&gt;
jobs will be held in the queue until maintenance is complete.  Starting at 7 a.m. on Mar 23, users are encouraged to submit small, short jobs that may be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7580</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7580"/>
		<updated>2026-03-12T20:40:27Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 4:15 pm:''' Connections to Trillium are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly for Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and&lt;br /&gt;
jobs will be held in the queue until maintenance is complete.  Starting at 7 a.m. on Mar 23, users are encouraged to submit small, short jobs that may be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7577</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7577"/>
		<updated>2026-03-12T19:21:34Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Partial3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 12, 2026, 1:00 pm:''' We've had some login issues, particularly for Trillium-GPU. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and&lt;br /&gt;
jobs will be held in the queue until maintenance is complete.  Starting at 7 a.m. on Mar 23, users are encouraged to submit small, short jobs that may be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7574</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7574"/>
		<updated>2026-03-06T20:49:02Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:'''  The winter cooling tower maintenance for the SciNet data centre will take place on March 24 and 25, 2026, starting at 7:00 a.m. on the 24th.  All SciNet systems (Trillium, OnDemand, Balam, S4H, Teach, as well as hosted equipment) will have their compute nodes shut down. Login nodes, file systems, and the HPSS system will remain available, and&lt;br /&gt;
jobs will be held in the queue until maintenance is complete.  Starting at 7 a.m. on Mar 23, users are encouraged to submit small, short jobs that may be scheduled before the maintenance begins.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7568</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7568"/>
		<updated>2026-03-06T15:29:14Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb 20, 2026, 11:35 pm:''' Power glitch, ~480 compute nodes rebooted. Regional power quality has been quite poor lately ([https://www.yorkregion.com/news/road-salt-blamed-for-power-outages/article_1a36d25d-5f97-56ee-a0c7-c49c7b732d38.html 1],&lt;br /&gt;
[https://www.yorkregion.com/news/power-company-executive-responds-to-york-region-outages/article_c4d072e7-2892-5c9c-8deb-ac5e1936779c.html 2]).&lt;br /&gt;
&lt;br /&gt;
'''Thu Feb 19, 2026, 3:00 pm:''' Systems restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[Niagara Quickstart]]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules specific to Niagara|Software Modules specific to Niagara]] &lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Previous_messages&amp;diff=7565</id>
		<title>Previous messages</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Previous_messages&amp;diff=7565"/>
		<updated>2026-03-06T15:29:04Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;'''Thu Jan 16, 2026, 11:00 pm:''' HPSS is back online, and accessible via alliancecan#hpss Globus endpoint. &lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 15, 2026, 10:00 pm:''' HPSS will undergo maintenance on Friday morning, Jan/16/2026, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
'''Tue Jan 6, 2026, 10:15 am:''' OnDemand has been fixed and is working again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 5, 2026, 9:00 pm:''' The authentication mechanism of OnDemand is not working.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 31, 2025, 12:40 pm:''' We believe the problem has now been resolved.  Please let us know if you still experience login problems or aborted jobs.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 30, 2025, 2:10 pm:''' We are experiencing problems with authentication, resulting in failed logins, OOD errors, and aborted jobs (with &amp;quot;prolog error&amp;quot;).  Please bear with us, as we are very short-staffed during the holiday break.  We will post updates here.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 3, 2025, 11:30 am:''' Open OnDemand is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Nov 29, 2025, 12:40 am:''' There has been a problem with the water chiller. Some systems are offline.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 12:55 pm:''' Balam is back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 10:00 am:''' Open OnDemand is back online.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 11:00 pm:''' Most of the work is done; data movers, Globus, and HPSS are back online. Remaining services will be worked on tomorrow.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 8:30 am:''' Scheduled network maintenance. Trillium cluster is *not* affected.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 5:30 pm:''' Balam maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 7:00 am:''' Balam maintenance day.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 15, 2025, 3:55 pm:''' Trillium inbound connections through trillium.alliancecan.ca or trillium.scinet.utoronto.ca are working again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 15, 2025, 3:05 pm:''' Trillium is experiencing external network issues for incoming traffic. Please try: ssh USERNAME@tri-login01.scinet.utoronto.ca in the meantime.&lt;br /&gt;
 &lt;br /&gt;
'''Thu Oct 06, 2025, 8:00 pm:''' HPSS is fully functional. You may submit archive jobs from the Trillium login nodes, datamovers and robots.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 03, 2025, 6:30 pm:''' HPSS is back online, and already accessible via the alliancecan#hpss Globus endpoint. The directory tree now follows the other Alliance clusters. We're still working on job submission via Slurm.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 01, 2025, 12:00 am:''' Niagara compute nodes are now unavailable for regular users. The login nodes will remain available for a while to allow a few last data transfers, although transfers from the Niagara file systems to Trillium are best done on nia-dm1.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 01, 2025, 9:30 am:''' HPSS is down for scheduled maintenance, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Thu Sep 18, 2025, 11:30 am:''' Open OnDemand is fully functional again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 17, 2025, 6:00 pm:''' Niagara is back up as well (including its Globus endpoint).  We are still working on the other systems.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 17, 2025, 1:40 pm:''' Trillium is back up (except for its Globus endpoint).  We are working on the other systems still.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 16, 2025, 5:45 pm:''' Unfortunately, we cannot bring all systems up yet because we are waiting for a spare part for the cooling system that will be brought tomorrow.  In the meantime, we have managed to keep the Trillium login nodes up, but not other systems.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 16, 2025, from 7:00 am to 5:00 pm (EDT):''' The SciNet datacentre will undergo maintenance of several critical parts of the centre.  This will require a full shutdown of all SciNet systems (Trillium, Niagara, Mist, HPSS, Rouge, Teach, as well as hosted equipment). This will also be the time that the Mist cluster gets decommissioned. &lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 12 22:03:17 EDT 2025:''' HPSS software and OS upgrades are finished.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep  9 17:05:38 EDT 2025:''' Starting tomorrow, Sep/10, and for the following 3 days, HPSS will be down for software and OS upgrades. We will strive to finish sooner, at which time we will make the system available to users again.&lt;br /&gt;
&lt;br /&gt;
===Mist/Niagara Decommissioning Schedule===&lt;br /&gt;
&lt;br /&gt;
'''September 4, 2025'''&lt;br /&gt;
* Niagara reduced to 863 compute nodes.&lt;br /&gt;
&lt;br /&gt;
'''September 9, 2025'''&lt;br /&gt;
* Niagara's Open OnDemand decommissioned.&lt;br /&gt;
* Brief data centre connection outage at 9 AM EDT&lt;br /&gt;
* Niagara reduced to 647 compute nodes at end of day.&lt;br /&gt;
&lt;br /&gt;
'''September 11, 2025'''&lt;br /&gt;
* Trillium Open OnDemand goes live.&lt;br /&gt;
&lt;br /&gt;
'''September 16, 2025'''&lt;br /&gt;
* '''Full-day data centre maintenance'''&lt;br /&gt;
* Niagara reduced to 431 compute nodes.&lt;br /&gt;
* Mist decommissioned.&lt;br /&gt;
&lt;br /&gt;
'''September 24, 2025'''&lt;br /&gt;
* Niagara reduced to 215 compute nodes.&lt;br /&gt;
&lt;br /&gt;
'''September 30, 2025'''&lt;br /&gt;
* Niagara decommissioned.&lt;br /&gt;
&lt;br /&gt;
'''August 25, 2025, 9:50 EDT:''' Open OnDemand is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''August 22, 2025, 3:15 PM EDT:''' Open OnDemand has issues launching new interactive apps. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''August 20, 2025, 10:00 AM EDT:''' The GPU scheduler on Trillium is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''August 19, 2025, 5:00 PM EDT:''' The GPU scheduler on Trillium has trouble scheduling multi-GPU jobs.  We're investigating the issue.&lt;br /&gt;
&lt;br /&gt;
'''August 7, 2025:''' CVMFS issues are resolved.&lt;br /&gt;
&lt;br /&gt;
'''August 6, 2025:''' We are seeing intermittent issues with the software on CVMFS on Niagara. We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''July 31, 2025, 4:00 PM EDT - 5:00 PM EDT:''' As announced, all systems connected to the Niagara file system (Mist, Niagara, HPSS, Balam, and Rouge) will be paused and inaccessible for one hour to start the transfer of files from the Niagara file system to the Trillium file system. &lt;br /&gt;
&lt;br /&gt;
'''January 6, 2025:''' As part of the installation of the new computing cluster Trillium, there is now a permanent reduction in computing capacity of Niagara to 50% and of Mist to 35%.&lt;br /&gt;
&lt;br /&gt;
'''July 9, 2025:''' The [[Teach]] cluster will be unavailable for the day for network maintenance.&lt;br /&gt;
&lt;br /&gt;
'''July 4, 2025:''' Open OnDemand is back up.&lt;br /&gt;
&lt;br /&gt;
'''July 4, 2025:''' Open OnDemand is down. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''June 25, 2025, 7:15 PM EDT:''' The [[Teach]] cluster's scheduler is up again.&lt;br /&gt;
&lt;br /&gt;
'''June 25, 2025, 4:30 PM EDT:''' The [[Teach]] cluster's scheduler is down. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''April 30, 2025, 9:30 AM EDT:''' The [[Teach]] cluster is available again.&lt;br /&gt;
&lt;br /&gt;
'''April 30, 2025:''' The [[Teach]] cluster will be unavailable from 8:00 am to about 12:00 noon for file system maintenance.&lt;br /&gt;
&lt;br /&gt;
'''April 1, 2025:''' The Jupyter Hub has been replaced by SciNet's [[Open OnDemand Quickstart|Open OnDemand service]].&lt;br /&gt;
&lt;br /&gt;
'''March 1, 2025:''' As of March 1st, scratch purging is suspended until after Trillium comes online.&lt;br /&gt;
&lt;br /&gt;
'''April 15, 2025 12:40 pm EDT: '''Balam login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''April 15, 2025 12:10 pm EDT: '''Balam login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''April 9, 2025 9PM:''' HPSS is back online.&lt;br /&gt;
&lt;br /&gt;
'''April 8, 2025 9PM:''' HPSS is being reserved for OS updates on April 9 (Wednesday).&lt;br /&gt;
&lt;br /&gt;
'''March 31, 2025 3:20 pm EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''March 31, 2025 2:45 pm EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''March 28, 2025 3:00 pm - 4:00 pm EDT:''' A short maintenance was needed for the Teach compute nodes; you might have experienced some job scheduling delays on that cluster. &lt;br /&gt;
&lt;br /&gt;
'''March 20, 2025 10:30 am EDT:''' Teach compute nodes are back. &lt;br /&gt;
&lt;br /&gt;
'''March 19, 2025 11:00 pm EDT:''' Teach compute nodes are down again. &lt;br /&gt;
&lt;br /&gt;
'''March 19, 2025 5:15 pm EDT:''' Maintenance of the cooling system was performed successfully. The cluster is back online.&lt;br /&gt;
&lt;br /&gt;
'''March 19, 2025 8:00 am - 5:00 pm EDT:''' Maintenance of the cooling system as well as preparations for the Trillium cluster will require a shutdown of the compute nodes of all SciNet systems (Niagara, Mist, Rouge, Balam, Teach, as well as hosted equipment). The login nodes, file systems and the HPSS system will remain available. The scheduler will hold jobs that are submitted until the maintenance has finished.&lt;br /&gt;
&lt;br /&gt;
'''March 18, 2025 10:00 am EDT:''' Teach compute nodes are back.&lt;br /&gt;
&lt;br /&gt;
'''March 17, 2025 10:00 pm EDT:''' Teach compute nodes are down. We are working on it. &lt;br /&gt;
&lt;br /&gt;
'''February 27, 2025 9:00 pm EST:''' Access to HPSS via Globus has been restored.&lt;br /&gt;
&lt;br /&gt;
'''February 25, 2025 2:30 pm EST:''' Access to HPSS via Globus is currently suspended (sorry, trivial upgrade has gone wrong).&lt;br /&gt;
&lt;br /&gt;
'''February 25, 2025 12:30 pm EST:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''February 25, 2025 11:50 am EST:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''February 7, 2025 2:45 pm EST:''' Systems are back online.&lt;br /&gt;
&lt;br /&gt;
'''Fri Feb  7 01:04:33 EST 2025:''' There has been a problem with the water chiller, triggering an automatic thermal shutdown of the compute nodes.&lt;br /&gt;
&lt;br /&gt;
'''January 31, 2025 11:45 am EST:''' Power is back.&lt;br /&gt;
&lt;br /&gt;
'''January 31, 2025 6:00 am EST:''' Power outage in the data center. Many compute jobs will have stopped. Until power gets restored, parts of the systems are  running on the generator. No ETA on full power restoration.&lt;br /&gt;
 &lt;br /&gt;
'''January 28, 2025 9:30 pm EST:''' The CCEnv stack has been restored.&lt;br /&gt;
&lt;br /&gt;
'''January 28, 2025 5:00 pm EST:''' The CCEnv stack from cvmfs has issues and may not work reliably.&lt;br /&gt;
&lt;br /&gt;
'''January 23, 2025 9:00 am - 1:00 pm EST:''' Balam, Rouge and Neptune compute nodes will be shut down from 9 AM to 1 PM EST for additional electrical work.&lt;br /&gt;
&lt;br /&gt;
'''January 22, 2025 12:55 pm EST:''' Compute nodes are back online.&lt;br /&gt;
&lt;br /&gt;
'''January 22, 2025 8:00 am - 5:00 pm EST:''' Preparations for the new system Trillium will require a shutdown of the compute nodes of all SciNet systems (Niagara, Mist, Rouge, Teach, as well as hosted equipment) from 8 AM to 5 PM EST. The login nodes, file systems and the HPSS system will remain available. The scheduler will hold jobs that are submitted until the maintenance has finished.&lt;br /&gt;
&lt;br /&gt;
'''January 9, 2025 11:00 am EST:''' Systems are back online.&lt;br /&gt;
&lt;br /&gt;
'''January 8, 2025 10:34 pm EST:''' We had some sort of thermal event at the datacenter, and the clusters are down. We're still investigating.&lt;br /&gt;
&lt;br /&gt;
'''January 8, 2025 08:00 am EST:''' Balam, Rouge and Neptune are shut down for electrical upgrades.&lt;br /&gt;
&lt;br /&gt;
'''January 6, 2025:''' As part of the installation of the new computing cluster Trillium, there will be a (permanent) reduction in computing capacity of Niagara and Mist. Only 50% of Niagara and 35% of Mist will remain active after January 6th.  The reduction will require Mist to be shut down for a few hours on January 6th. Balam, Rouge and Neptune will be shut down on Wednesday January 8th for the same reason.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''December 20, 2024 09:00 am EST:''' The Open OnDemand service will not be available on Dec 20 from 9 a.m. to 5 p.m. due to scheduled maintenance.&lt;br /&gt;
&lt;br /&gt;
'''December 16, 2024, 08:21 am EST:''' The Niagara scheduler has been restarted.&lt;br /&gt;
  &lt;br /&gt;
'''December 16, 2024, 00:04 am EST:''' The Niagara scheduler has an issue; we are investigating.&lt;br /&gt;
  &lt;br /&gt;
'''Fri Nov 8, 2024, 09:45 AM EST.''' Balam and Rouge schedulers are back online.&lt;br /&gt;
&lt;br /&gt;
'''Thu Nov 7, 2024, 10:30 PM EST.''' Most systems are up, except for the schedulers on Balam and Rouge (but even their login nodes are up), and a few 'neptune' niagara nodes.&lt;br /&gt;
&lt;br /&gt;
'''Thu Nov 7, 2024, 5:30 PM EST:''' Systems are being brought up, but not yet available for users.&lt;br /&gt;
&lt;br /&gt;
'''Downtime Announcement:''' On Thu Nov 7, 2024, all systems and storage located at the SciNet Datacenter (Niagara, Mist, HPSS, Rouge, Teach, JupyterHub, Balam) will be unavailable from 7 a.m. to 5 p.m. ET.&lt;br /&gt;
This outage is required to install new electrical equipment (UPS) for the upcoming systems refresh. The work is expected to be completed in one day.&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of the shutdown. Users are encouraged to submit small and short jobs that can take advantage of this, as the scheduler may be able to fit these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 24 15:05 EDT 2024''': Cooling pump motor has been replaced. All systems are back to normal.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 22 16:35 EDT 2024''': The motor is scheduled for replacement on Thursday, Oct 24.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 21 17:15 EDT 2024''': Compute nodes will remain down until we can replace the main cooling pump.  This may take several days.  Please see this page for updates.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 21 12:15 EDT 2024''': Compute nodes have been shut down due to a cooling system failure.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 18 21:40 EDT 2024''': Systems are back to normal&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 18 21:15 EDT 2024''': We are experiencing technical difficulties, apparently caused by a glitch in the file systems.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 1 10:45 EDT 2024''': The Jupyter Hub service will be rebooted today at around 11:00 am EDT for system upgrades. &lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 3 07:00 EDT 2024''': Intermittent file system issues, which may cause problems logging in.  We are in the process of resolving the issue.&lt;br /&gt;
&lt;br /&gt;
'''Sun Sep 1 00:01 - 04:00 EDT 2024''': Network maintenance may cause connection issues to the datacentre.&lt;br /&gt;
&lt;br /&gt;
'''Thu Aug 22 13:30:00 EDT 2024''': Chiller issue caused about 25% of Niagara compute nodes to go down; users should resubmit any affected jobs.&lt;br /&gt;
&lt;br /&gt;
'''Wed Aug 21 16:35:00 EDT 2024''': Maintenance finished; compute nodes are now available for user jobs.&lt;br /&gt;
&lt;br /&gt;
'''Wed Aug 21 7:00:00 EDT 2024''': Maintenance started.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 19:15:00 EDT 2024''': Issues have been resolved.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 14:30:00 EDT 2024''': Power issues seem to have brought compute nodes down, and compounded the file system issues we had earlier.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 10:31:53 EDT 2024''': GPFS is back online, and seems to be holding.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 08:44:40 EDT 2024''': Sorry, problems with GPFS file systems are reoccurring. &lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 07:59:02 EDT 2024''': GPFS file systems are back to normal. Many jobs have died and will need to be resubmitted.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 06:39:12 EDT 2024''': Support staff detected the problem and started working on a fix.&lt;br /&gt;
&lt;br /&gt;
'''Sun Aug 18 00:53:52 EDT 2024''': GPFS file systems (home, scratch, project) started to show the initial stages of problems.&lt;br /&gt;
&lt;br /&gt;
'''August 21, 2024''': The annual cooling tower maintenance for the SciNet data centre will take place on August 21, 2024 from 7 a.m. EDT until the end of day. This maintenance requires a shutdown of the compute nodes of all SciNet systems (Niagara, Mist, Rouge, Teach, as well as hosted equipment). The login nodes, file systems and the HPSS system will remain available.&lt;br /&gt;
&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of the shutdown. Users are encouraged to submit small and short jobs that can take advantage of this, as the scheduler may be able to fit these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
&lt;br /&gt;
'''Thursday, August 1, 10:00 PM EDT''' Filesystem problems resolved.&lt;br /&gt;
&lt;br /&gt;
'''Thursday, August 1, 9:30 PM EDT''' Filesystem problems preventing logins to the systems.  Working on it.&lt;br /&gt;
&lt;br /&gt;
'''Monday, July 22, 11:50 AM EDT''' Systems are back to normal.&lt;br /&gt;
&lt;br /&gt;
'''Monday, July 22, 10:50 AM EDT''' The cooling problem has been fixed. Systems are coming up.&lt;br /&gt;
&lt;br /&gt;
'''Monday, July 22, 10:20 AM EDT''' Compute nodes have been shut down due to a cooling tower failure.&lt;br /&gt;
&lt;br /&gt;
'''Friday, July 19, 9:30 AM EDT''' CCEnv modules available on all login nodes again.&lt;br /&gt;
&lt;br /&gt;
'''Friday, July 19, 5:00 AM EDT''' Some login nodes do not have the CCEnv modules available.  We are working on a fix.&lt;br /&gt;
&lt;br /&gt;
'''Monday, Jun 3, 12:55 PM EDT''' All systems are recovered now.&lt;br /&gt;
&lt;br /&gt;
'''Monday, Jun 3, 10:50 AM EDT''' The file system issues affect all nodes, so all systems are inaccessible to users at the moment. No time estimate yet for when the systems may be back.&lt;br /&gt;
&lt;br /&gt;
'''Monday, Jun 3, 7:58 AM EDT''' Login issues for Niagara and Mist. There are file system issues as well. Investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sunday, Jun 2, 12:00 PM EDT''' CCEnv modules missing, investigating.&lt;br /&gt;
&lt;br /&gt;
'''Wednesday May 29, 5:50 PM EDT''' Niagara compute nodes are up.  &lt;br /&gt;
&lt;br /&gt;
'''Wednesday May 29, 4:40 PM EDT''' Niagara compute nodes are coming up.  &lt;br /&gt;
&lt;br /&gt;
'''Wednesday May 29, 4 PM EDT''' Niagara login nodes and jupyterhub are up; file system is now accessible.  &lt;br /&gt;
&lt;br /&gt;
'''Wednesday May 29, 2 PM EDT''' Electricians are checking and testing all junction boxes and connectors under the raised floor for safety.  Some systems are expected to be back up later today (storage, login nodes), and compute systems will be powered up as soon as it is deemed safe.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday May 28, 3 PM EDT''' Cleaning crews are at the datacentre, to pump the water and install dryers.  Once the floors are dry, we need to inspect all electrical boxes to ensure safety.  We do not expect to have a fully functional datacentre before Thursday, although we hope to be able to turn on the storage and login nodes sometime tomorrow, if circumstances permit.  Apologies, and thank you for your patience.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Tuesday May 28, 7 AM EDT''' A water mains break outside our datacentre has caused extensive flooding, and all systems have been shut down preventatively. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Friday May 17, 10 PM EDT - Saturday May 18, 2 AM EDT:''' The external network will be unavailable for maintenance. Running and queued jobs on the systems will not be affected.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday May 14, 6:45 PM EDT:''' All systems are recovered now.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday May 14, 5 PM EDT:''' Power loss at the datacentre resulted in loss of cooling.  Systems are being restored.&lt;br /&gt;
&lt;br /&gt;
'''Friday May 3, 10 PM EDT - Saturday May 4, 2 AM EDT:''' The external network will be unavailable for maintenance. Running and queued jobs on the systems will not be affected.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday April 17, 2024: 11:00 ''' The restart of the Niagara login nodes has been completed successfully.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday April 17, 2024: 09:40 ''' Niagara login nodes will be rebooted &lt;br /&gt;
&lt;br /&gt;
'''Tuesday April 16, 2024: 12:45 ''' mist-login01 is recovered now.&lt;br /&gt;
&lt;br /&gt;
'''Tuesday April 16, 2024: 11:45 ''' mist-login01 will be unavailable due to maintenance from 12:15 to 12:45. Following the completion of maintenance, login access should be restored.&lt;br /&gt;
&lt;br /&gt;
'''Monday April 15, 2024: 13:02 ''' Balam-login01 will be unavailable due to maintenance from 13:00 to 13:30. Following the completion of maintenance, login access should be restored and available once more. &lt;br /&gt;
&lt;br /&gt;
'''Monday March 18, 2024: 14:45 ''' File system issue resolved.  Users are advised to check if their running jobs were affected, and if so, to resubmit.&lt;br /&gt;
&lt;br /&gt;
'''Monday March 18, 2024: 13:02 ''' File system issues.  This affects the ability to log in. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Monday March 11, 2024: 14:05 ''' All systems are recovered now.&lt;br /&gt;
&lt;br /&gt;
'''Monday March 11, 2024:''' There will be a shutdown of the file system at SciNet for an emergency repair. As a consequence, the login nodes and compute nodes of all SciNet clusters using the file system (Niagara, Mist, Balam, Rouge, and Teach) will be down from 11 am EST until later in the afternoon.&lt;br /&gt;
&lt;br /&gt;
'''February 28, 2024, 16:30 PM EDT:''' All systems are recovered now.&lt;br /&gt;
&lt;br /&gt;
'''February 28, 2024, 1:00 PM EDT:''' A loop pump fault caused many compute nodes to overheat. If your jobs failed around this time, please resubmit. Once the root cause has been addressed, the cluster will be brought up completely. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''February 22, 2024, 5:45 PM EDT:''' Maintenance finished and system restored. Please report issues to support@scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''February 21, 2024, 7:00 AM EDT:''' Maintenance starting.  Niagara login nodes and the file system are kept up as much as possible, but will be rebooted at some point.&lt;br /&gt;
&lt;br /&gt;
'''February 20, 2024, 3:45 PM EDT:''' Cooling tower has been restored, all systems are in production. &lt;br /&gt;
&lt;br /&gt;
'''February 20, 2024, 1:30 AM EDT:''' Cooling tower malfunction, all compute nodes are shutdown, the root cause will be addressed earliest in the morning.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;&amp;lt;b&amp;gt; February 21 and 22, 2024: SciNet Data Centre Maintenance:&amp;lt;/b&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
This annual winter maintenance involves a full data centre shutdown&lt;br /&gt;
starting at 7:00 am EST on Wednesday, February 21st.  None of the&lt;br /&gt;
SciNet systems (Niagara, Mist, Rouge, Teach, the file systems, as&lt;br /&gt;
well as hosted equipment) will be accessible.  All systems should be&lt;br /&gt;
fully available again by late afternoon on the 22nd.&lt;br /&gt;
&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of&lt;br /&gt;
the shutdown. Users are encouraged to submit small and short jobs&lt;br /&gt;
that can take advantage of this, as the scheduler may be able to fit&lt;br /&gt;
these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
&lt;br /&gt;
'''Mon January 29, 08:20    (EST):''' Access to Niagara login nodes restored (it was an internal routing issue).&lt;br /&gt;
&lt;br /&gt;
'''Mon January 29, 07:35    (EST):''' No access to Niagara login nodes.  We are investigating.  Use the Mist login to get access to SciNet systems.&lt;br /&gt;
&lt;br /&gt;
'''Wed January 24, 15:20    (EST):''' Maintenance on rouge-login01.&lt;br /&gt;
&lt;br /&gt;
'''Wed January 24, 14:55    (EST):''' Rebooting rouge-login01 &lt;br /&gt;
&lt;br /&gt;
'''Tue January 23, 10:25 am (EST):''' Mist-login01 maintenance done &lt;br /&gt;
&lt;br /&gt;
'''Tue January 23, 10:10 am (EST):''' Rebooting Mist-login01 to deploy new image&lt;br /&gt;
&lt;br /&gt;
'''Tue January 22, 9:00 pm (EST):''' HPSS performance for hsi &amp;amp; htar clients is back to normal.&lt;br /&gt;
&lt;br /&gt;
'''Tue January 20, 11:50 am (EST):''' HPSS hsi/htar/VFS jobs will remain in the PD state in the queue over the weekend, so we may work on archive02/vfs02 on Monday, and try to improve transfer performance. In the meantime you may use Globus (computecanada#hpss) if your workflow is suitable. &lt;br /&gt;
&lt;br /&gt;
'''Tue January 14, 13:20 (EST):''' The ongoing HPSS jobs from Friday finished earlier, so we restarted HPSS sooner and released the PD jobs in the queue. &lt;br /&gt;
&lt;br /&gt;
'''Tue January 12, 10:40 am (EST):''' We have applied some tweaks to the HPSS configuration to improve performance, but they won't take effect until we restart the services, which is scheduled for Monday morning. If over the weekend we notice that there are no HPSS jobs running in the queue, we may restart HPSS sooner. &lt;br /&gt;
&lt;br /&gt;
'''Tue January 09, 9:10 am (EST):''' Remaining cvmfs issues cleared.&lt;br /&gt;
&lt;br /&gt;
'''Tue January 09, 8:00 am (EST):''' We're investigating remaining issues with cvmfs access on login nodes.&lt;br /&gt;
&lt;br /&gt;
'''Mon January 08, 9:50 pm (EST):''' File systems are back to normal. Please resubmit your jobs.  &lt;br /&gt;
&lt;br /&gt;
'''Mon January 08, 9:10 pm (EST):''' We had a severe deadlock, and some disk volumes went down. The file systems are being recovered now. It could take another hour.&lt;br /&gt;
&lt;br /&gt;
'''Mon January 08, 7:20 pm (EST):''' We seem to have a problem with the file system, and are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue December 19, 2:45 pm (EST):''' Compute nodes are available again.  &lt;br /&gt;
&lt;br /&gt;
'''Tue December 19, 12:09 pm (EST):''' Maintenance was postponed by one hour. &lt;br /&gt;
&lt;br /&gt;
'''Tue December 19, 12 noon - 1 pm (EST):''' There will be a shutdown of the compute nodes of the Niagara, Mist and Rouge cluster to allow for an emergency repair to the cooling tower.  Login nodes will remain available but no jobs will run during that time.  Updates will be posted on here.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec  11 11:17:00 EST 2023:''' File systems recovered; Niagara and Mist are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec  11 7:51:00 EST 2023:''' Niagara's login nodes are being overwhelmed.  We are investigating. Likely file-system related.&lt;br /&gt;
&lt;br /&gt;
'''Thu Dec  6 10:01:24 EST 2023:''' Niagara's scheduler rebooting for security patches.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec  6 13:06:46 EST 2023:''' The transition of endpoint computecanada#niagara from Globus GCSv4 to GCSv5 is complete. computecanada#niagara-GCSv4 has been deactivated.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec  4 16:35:07 EST 2023:''' Endpoint computecanada#niagara has now been upgraded to Globus GCSv5. The old endpoint is still available as computecanada#niagara-GCSv4 on nia-datamover2, only until Wednesday, at which time we'll disable it as well.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec  4 11:54:49 EST 2023:''' The nia-datamover1 node will be offline this Monday afternoon for the Globus GCSv5 upgrade. Endpoint computecanada#niagara-GCSv4 will still be available via nia-datamover2.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 28 16:29:14 EST 2023:''' The computecanada#hpss Globus endpoint is now running GCSv5. We'll find a window of opportunity next week to upgrade computecanada#niagara to GCSv5 as well.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 28 14:20:30 EST 2023:''' The computecanada#hpss Globus endpoint will be offline for the next few hours for the GCSv5 upgrade.&lt;br /&gt;
&lt;br /&gt;
'''Fri Nov 10, 2023, 18:00 PM EDT:''' The HPSS upgrade is finished. We didn't have time to update Globus to GCSv5, so we'll find a window of opportunity to do this next week. &lt;br /&gt;
&lt;br /&gt;
Please be advised that starting this &amp;lt;B&amp;gt;Friday morning, Nov/10, we'll be upgrading the HPSS system from version 8.3 to 9.3 and the HPSS Globus server from GCSv4 to GCSv5.&amp;lt;/B&amp;gt; If everything goes well, we expect to be back online by the end of the day.  &lt;br /&gt;
&lt;br /&gt;
'''Fri Nov 3, 2023, 12:20 PM EDT:''' The &amp;quot;Niagara at Scale&amp;quot; event has finished. Niagara is available again for all users.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 31, 2023, 12 PM EDT:''' The &amp;quot;Niagara at Scale&amp;quot; event has started.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 31, 2023, 12:00 PM EDT - Fri Nov 3, 2023, 12:00 PM EDT:''' Three-day reservation for the &amp;quot;Niagara at Scale&amp;quot; event. Only &amp;quot;Niagara at Scale&amp;quot; projects will run on the compute nodes. Users are encouraged to submit small and short jobs that could run before this event.  Throughout the event, users can still log in, access their data, and submit jobs, but these jobs will not run until after the event. Note that the debugjob queue will remain available to everyone as well.&lt;br /&gt;
&lt;br /&gt;
''' Thu Oct 27 11:16 AM EDT:''' SSH keys are gradually being restored, estimated to complete by 1:15 PM.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 27, 2023, 8:00 EDT:''' SSH key login authentication with CCDB keys is currently not working on many Alliance systems.  It appears this started last night. The issue is being investigated.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 26, 2023, 12:35 EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 26, 2023, 12:05 EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 25 7:54 PM EDT:''' slurm-*.out now outputs job info for the last array job.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 24 12:00 PM EDT:''' The network appears to be up.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 24 11:32 AM EDT:''' Campus network issues.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 05, 2023, 12:05 PM EDT:''' Niagara scheduler is back online.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 05, 2023, 11:50 AM EDT:''' Niagara scheduler is temporarily under maintenance for security updates. &lt;br /&gt;
&lt;br /&gt;
''' Thu Sep 28, 2023 11:00 am''': Niagara scheduler is back online.&lt;br /&gt;
&lt;br /&gt;
''' Thu Sep 28, 2023 10:50 am''': Niagara scheduler is temporarily under maintenance for security updates.&lt;br /&gt;
&lt;br /&gt;
''' Wed Sep 27, 2023 11:35 am''': Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
''' Wed Sep 27, 2023 11:00 am''': Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
''' Wed Sep 6, 2023 11:30 am''': Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
''' Wed Sep 6, 2023 11:00 am''': Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
''' Fri Aug 25, 2023 12:19 am''': A power glitch brought some compute nodes down; users should resubmit any affected jobs. The JupyterHub had to be restarted for the same reason.&lt;br /&gt;
&lt;br /&gt;
''' Mon Aug 14, 2023 12:10 pm''': Network problems with Teach cluster are now resolved and it is again available for users.&lt;br /&gt;
&lt;br /&gt;
''' Mon Aug 14, 2023 11:40 am''': Network problems with Teach cluster. We are investigating.&lt;br /&gt;
&lt;br /&gt;
''' Thu Aug 3, 2023 11:10 am''': Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
''' Thu Aug 3, 2023 10:40 am''': Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
''' Tue Aug 1, 2023 2:43 pm''': To recover from the power glitch, all servers on the SciNet JupyterHub have been stopped. Please restart your server if you need to.&lt;br /&gt;
&lt;br /&gt;
''' Tue Aug 1, 2023 11:46 am''': A power glitch at 11:46 am caused a significant number of job losses. Please resubmit your jobs.&lt;br /&gt;
&lt;br /&gt;
'''Summer Maintenance Shutdown Finished''' -- Slurm upgraded to version 23.02.3.&lt;br /&gt;
Change to be aware of: SLURM_NTASKS is now only set if the --ntasks option is set.&lt;br /&gt;
Details at: https://bugs.schedmd.com/show_bug.cgi?id=17108&lt;br /&gt;
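For job scripts that relied on SLURM_NTASKS always being defined, a guard like the following (a hypothetical sketch, not part of the Slurm release notes) avoids surprises after the upgrade:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical job-script fragment: since Slurm 23.02, SLURM_NTASKS is only
# defined when the job was submitted with --ntasks, so fall back to a default
# instead of assuming the variable exists.
ntasks="${SLURM_NTASKS:-1}"   # default to 1 task when the variable is unset
echo "running with ${ntasks} task(s)"
```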
&lt;br /&gt;
'''July 17 and 18, 2023''':  Announcement: Summer Maintenance Shutdown&lt;br /&gt;
&lt;br /&gt;
'''July 17th, 2023''' This maintenance involves a full data centre shutdown, starting at 7:00 a.m. ET on Monday July 17th, 2023. None of the SciNet systems (Niagara, Mist, Rouge, Teach, the file systems, as well as hosted equipment) will be accessible.&lt;br /&gt;
&lt;br /&gt;
'''July 18th, 2023''' The shutdown will last until Tuesday July 18th, 2023. Systems are expected to be fully available in the evening of that day.&lt;br /&gt;
&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of the shutdown. Users are encouraged to submit small and short jobs that can take advantage of this, as the scheduler may be able to fit these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
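Short jobs can take advantage of the pre-shutdown window because the scheduler can backfill them onto otherwise idle nodes. A sketch of such a submission script (the resource numbers and application name are illustrative only):&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=40          # illustrative; one task per core on a Niagara node
#SBATCH --time=01:00:00      # short walltime so the job can finish before the shutdown
#SBATCH --job-name=short-backfill

# Your actual work goes here; a short requested walltime is what lets the
# scheduler fit this job in before the maintenance window starts.
srun ./my_app                # hypothetical application
```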
&lt;br /&gt;
'''Wed Jun 21 16:03:45 EDT 2023:''' Niagara's scheduler maintenance is finished.&lt;br /&gt;
&lt;br /&gt;
'''Wed Jun 21 15:42:00 EDT 2023:''' Niagara's scheduler is rebooting in 10 minutes for a short maintenance down time.&lt;br /&gt;
&lt;br /&gt;
'''Wed Jun 21, 2023, 11:25 AM EDT:''' Maintenance is finished and Teach cluster is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Jun 20, 2023, 9:55 AM EDT:''' Teach cluster is powered off for maintenance.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style='color:red'&amp;gt;'''Tue June 20, 2023:'''  Announcement:&amp;lt;/span&amp;gt;&amp;lt;br&amp;gt; The Teach cluster at SciNet will undergo a maintenance shutdown starting on Tuesday June 20, 2023.  It will likely take a few days before it will be available again.  Check here for updates.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jun 5, 2023, 2:35 PM EDT:''' All systems are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jun 5, 2023, 11:55 AM EDT:''' There were issues with the cooling system.  The login nodes and file systems are now accessible again, but compute nodes are still off.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jun 5, 2023, 6:55 AM EDT:''' Issues at the data center, we are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sat May 27, 2023, 9:00 PM EDT:''' We have been able to mitigate the UPS issue for now, until new parts arrive sometime during the week. The system will be accessible soon.&lt;br /&gt;
&lt;br /&gt;
'''Sat May 27, 2023, 4:00 PM EDT:''' We identified a UPS/power-related issue at the datacentre that is adversely affecting several components, in particular all file systems. Out of an abundance of caution, we are shutting down the cluster until the UPS situation is resolved. Ongoing jobs will be cancelled.&lt;br /&gt;
&lt;br /&gt;
'''Sat May 27, 2023, 11:18 AM EDT:''' Filesystem issues, investigating.&lt;br /&gt;
&lt;br /&gt;
'''Wed May 24, 2023, 11:40AM EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Wed May 24, 2023, 11:10 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 15, 2023, 10:08 AM EDT:''' Rebooting the Mist login node again.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 15, 2023, 09:15 AM EDT:''' Rebooting the Mist login node.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 01, 2023, 04:00 PM EDT:''' Done rebooting the nia-login nodes.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 01, 2023, 12:00 PM EDT:''' Rebooting all nia-login nodes, one at a time.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 01, 2023, 11:00 AM EDT:''' nia-login07 is going to be rebooted.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 20, 2023, 12:05 PM EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 20, 2023, 11:30 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Thu Apr 20, 2023, 8:27 AM EDT:''' Intermittent file system issues. We are investigating.  For now (10:45 AM), the file systems appear operational.&lt;br /&gt;
&lt;br /&gt;
'''Fri 14 Apr 2023 10:25 AM EDT:''' Switch problem resolved.&lt;br /&gt;
&lt;br /&gt;
'''Fri 14 Apr 2023 10:10 AM EDT:''' A switch problem is affecting access to certain equipment at the SciNet data center, including the Teach cluster.  Niagara and Mist are accessible.&lt;br /&gt;
&lt;br /&gt;
'''Fri 14 Apr 2023 09:55 AM EDT:''' SciNet Jupyter Hub maintenance is finished and it is again available for users.&lt;br /&gt;
&lt;br /&gt;
'''Fri 14 Apr 2023:''' SciNet Jupyter Hub will be restarted for system updates this morning.  Remember to save your notebooks!&lt;br /&gt;
&lt;br /&gt;
'''Thu 06 Apr 2023 03:40 PM EDT:''' Rouge cluster is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Thu 06 Apr 2023 01:00 PM EDT:''' Rouge cluster is temporarily inaccessible to users due to the electrical work.&lt;br /&gt;
&lt;br /&gt;
'''Sun 02 Apr 2023 03:37 AM EDT:''' IO/read errors on the file system seem to have been fixed. Please resubmit your jobs, and report any further problems to support. Burst Buffer will remain offline for now.&lt;br /&gt;
&lt;br /&gt;
'''Sun 02 Apr 2023 00:18 AM EDT:''' File System is back up, but there seems to be some IO/read errors. All running jobs have been killed. Please hold off on submitting jobs until further notice.&lt;br /&gt;
&lt;br /&gt;
'''Sat 01 Apr 2023 10:17 PM EDT:''' We are having issues with the File System. Currently investigating the cause.&lt;br /&gt;
&lt;br /&gt;
'''Fri 31 Mar 2023 11:00 PM EDT:''' Burst Buffer may be the culprit. We are investigating but may have to take Burst Buffer offline. &lt;br /&gt;
&lt;br /&gt;
'''Fri 31 Mar 2023 01:30 PM EDT:''' File system issues causing trouble for some jobs on Niagara and Mist&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Tue 28 Mar 2023 11:05 AM EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Tue 28 Mar 2023 10:35 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Fri 17 Mar 2023 2:50 PM EDT:''' All systems online.&lt;br /&gt;
&lt;br /&gt;
'''Fri 17 Mar 2023 11:00 AM EDT:''' Problem identified and repaired. Starting to bring up systems, but not available to users yet.&lt;br /&gt;
&lt;br /&gt;
'''Fri 17 Mar 2023 09:15:39 EDT:''' Staff are on site and a ticket has been opened with the cooling contractor; the cause of the failure is unclear.&lt;br /&gt;
&lt;br /&gt;
'''Fri 17 Mar 2023 01:47:43 EDT:''' Cooling system malfunction; the datacentre is shut down.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Tue Feb 28, 16:40 EST:&amp;lt;/b&amp;gt; All systems are back online.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Tue Feb 28, 15:30 EST:&amp;lt;/b&amp;gt; Maintenance is complete. Bringing up systems.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Tue Feb 28, 7:10 AM EST:&amp;lt;/b&amp;gt; Maintenance shutdown resuming.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Mon Feb 27, 3:55 PM EST:&amp;lt;/b&amp;gt; Maintenance paused as parts were delayed. The maintenance will resume tomorrow (Tue Feb 28) at 7AM EST for about 5 hours.  In the meantime, the login nodes of the systems will be brought online.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Mon Feb 27, 7:20 AM EST:&amp;lt;/b&amp;gt; Maintenance shutdown started.&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;&amp;lt;b&amp;gt; February 27 and 28, 2023: SciNet Data Centre Maintenance:&amp;lt;/b&amp;gt;&amp;lt;/span&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
This annual winter maintenance involves a full data centre shutdown&lt;br /&gt;
starting at 7:00 a.m. EST on Monday, February 27. None of the SciNet&lt;br /&gt;
systems (Niagara, Mist, Rouge, Teach, the file systems, as well as&lt;br /&gt;
hosted equipment) will be accessible.&lt;br /&gt;
&lt;br /&gt;
On the second day of the maintenance, Niagara, Mist, and their file&lt;br /&gt;
systems are expected to become partially available for users.  All&lt;br /&gt;
systems should be fully available in the evening of the 28th.&lt;br /&gt;
&lt;br /&gt;
The scheduler will hold jobs that cannot finish before the start of&lt;br /&gt;
the shutdown. Users are encouraged to submit small and short jobs&lt;br /&gt;
that can take advantage of this, as the scheduler may be able to fit&lt;br /&gt;
these jobs in before the maintenance on otherwise idle nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Feb 17, 2023, 11:15 PM EST:&amp;lt;/b&amp;gt; File system issues on Teach fixed and Teach is accessible again. Note that the file system of Teach does not handle many remote VSCode connections well.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Feb 17, 2023, 11:02 PM EST:&amp;lt;/b&amp;gt; File system issues on Teach.  We are working on a fix.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sun Feb 12, 2023, 3:05 PM EST&amp;lt;/b&amp;gt; All systems are back online.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sun Feb 12, 2023, 2:10 PM EST&amp;lt;/b&amp;gt; Power restored; clusters are being started.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sat Feb 11, 2023, 2:35 PM EST&amp;lt;/b&amp;gt; Power interruption started. All compute nodes will be down, likely until Sunday afternoon.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sat Feb 11, 2023, 1:20 PM EST&amp;lt;/b&amp;gt; There is to be an emergency power repair on the adjacent street. The datacentre will be switching over to generator power. All compute nodes will be down.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Feb 10, 2023, 10:55 AM EST&amp;lt;/b&amp;gt; All systems are back online.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Feb 10, 2023, 10:00 AM EST&amp;lt;/b&amp;gt; Cooling issue resolved, cluster is being started.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Wed Jan 25, 2023, 02:15 PM EST&amp;lt;/b&amp;gt; Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Wed Jan 25, 2023, 10:30 AM EST&amp;lt;/b&amp;gt; Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Mon Jan 23, 2023, around 7-8 AM EST&amp;lt;/b&amp;gt; Intermittent file system issues may have killed your job. Users are advised to resubmit.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Sat Jan 21, 2023, 00:50 EST&amp;lt;/b&amp;gt; Niagara, Mist, Rouge and the filesystems are up.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Jan 20, 2023, 11:19 PM EST&amp;lt;/b&amp;gt; Systems are coming up. We have determined that there was a general power glitch in the area of our datacentre. The power has been fully restored.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Jan 20, 2023, 10:34 PM EST&amp;lt;/b&amp;gt; Cooling is back. Systems are slowly coming up.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Fri Jan 20, 2023, 8:20 PM EST&amp;lt;/b&amp;gt; A cooling failure at the data center, possibly due to a power glitch. We are investigating.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Thu Jan 12, 2023, 9:30 AM EST&amp;lt;/b&amp;gt; File system is experiencing issues. Issues have stabilized, but jobs running around this time may have been affected.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 21, 2022, 12:00 PM: ''' Please note that SciNet is on vacation, together with the University of Toronto. Full service will resume on Jan 2, 2023. We will endeavour to keep systems running, and answer tickets, on a best-effort basis.  Happy Holidays!!!&lt;br /&gt;
&lt;br /&gt;
'''Fri Dec 16, 2022, 2:19 PM: ''' City power glitch caused all compute nodes to reboot. Please resubmit your jobs.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec 12, 2022, 9:30 AM - 11:30 AM EST:''' File system issues caused login issues and may have affected running jobs.  The system is back to normal now, but users may want to check any jobs they had running. &lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 7, 2022, 11:40 AM EST:''' Systems are being brought back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 7, 2022, 09:00 AM EST:''' Maintenance is underway.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style='color:red'&amp;gt;&amp;lt;b&amp;gt;Announcement:&amp;lt;/b&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On '''Wednesday December 7th, 2022''', the file systems of SciNet's systems (Niagara, Mist, HPSS, and the Teach cluster) will undergo maintenance from 9:00 am EST.  During the maintenance, there will be no access to any of these systems, as it requires all file system operations to have stopped.  The maintenance should take about 1 hour, and all systems are expected to become available again later that morning.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 30, 2022, 2:45 PM EST:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 30, 2022, 2:15 PM EST:''' Mist login node is under maintenance and temporarily inaccessible to users. &lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 6:00 PM EDT:''' Systems are back online. &lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 09:40 AM EDT:''' About half of Niagara compute nodes are up. Note that only jobs that can finish by 5:00 PM will run.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 07:50 AM EDT:''' Jupyter Hub is available again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 07:35 AM EDT:''' Jupyter Hub is being updated and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 20, 2022, 07:30 AM EDT:''' Maintenance is underway.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span style='color:red'&amp;gt;&amp;lt;b&amp;gt;Announcement:&amp;lt;/b&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On '''Thursday October 20th, 2022''', the SciNet datacentre (which hosts Niagara and Mist) will undergo transformer maintenance from 7:30 am EDT to 5:00 pm EDT.  At both the start and end of this maintenance window, all systems will need to be briefly shut down and will not be accessible.  Apart from that, during this window, login nodes will be accessible and part of Niagara will be available to run jobs. The Mist and Rouge clusters will be off for the entirety of this maintenance. &lt;br /&gt;
&lt;br /&gt;
Users are encouraged to submit Niagara jobs of about 1 to 2 hours in the days before the maintenance, as these could be run within the&lt;br /&gt;
window of 8 AM to 5 PM EDT.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 5, 2022, 12:10 PM EDT:''' A grid power glitch caused all compute nodes to reboot. Please resubmit your jobs.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 3, 2022, 11:20 PM EDT:'''  Niagara login nodes are accessible from outside again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 3, 2022, 9:20 PM EDT:'''  Niagara login nodes are inaccessible from outside of the datacentre at the moment. As a work-around, ssh into mist.scinet.utoronto.ca and then ssh into e.g. nia-login01.&lt;br /&gt;
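The hop described above can be done in a single command with OpenSSH's ProxyJump option (nia-login01 is just an example node, and USER stands for your own username):&lt;br /&gt;

```shell
# Jump through the Mist login node to reach a Niagara login node directly;
# -J (ProxyJump) chains the two SSH connections into one command.
ssh -J USER@mist.scinet.utoronto.ca USER@nia-login01
```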
&lt;br /&gt;
'''Wed Sep 28, 2022, 1:15 PM EDT:''' The JupyterHub maintenance is finished and it is now accessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 28, 2022, 1:00 PM EDT:''' The JupyterHub is to be rebooted for system upgrades. Running processes and notebooks will be closed. The service is expected to be back around 1:30 PM EDT.&lt;br /&gt;
 &lt;br /&gt;
'''Tue Sep 27, 2022, 11:50 AM EDT:''' Mist login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 27, 2022, 11:25 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Mon Sep 26, 2022, 11:35 AM EDT:''' Rouge and Teach login nodes are accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Sep 26, 2022, 11:05 AM EDT:''' Rouge and Teach login nodes are under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 23, 2022, 12:46 AM EDT:''' The CCEnv software stack is back to normal.&lt;br /&gt;
&lt;br /&gt;
'''Thu Sep 22, 2022, 8:15 PM EDT:''' The CCEnv software stack is inaccessible due to an issue with CVMFS.&lt;br /&gt;
 &lt;br /&gt;
'''Tue Sep 20, 2022, 4:00 PM EDT:''' Rouge login node is accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 20, 2022, 10:20 AM EDT:''' Rouge login node is under maintenance and temporarily inaccessible to users (hardware upgrade).&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 20, 2022, 9:41 AM EDT:''' Rouge login node is back up.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 20, 2022, 8:25 AM EDT:''' Rouge login node down, we are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sept 16, 2022, 9:30 AM EDT:''' Login nodes are accessible again.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sept 16, 2022, 9:00 AM EDT:''' Login nodes are not accessible.  We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 13, 2022, 11:00 AM EDT:''' Mist login node is available again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Sep 13, 2022, 10:00 AM EDT:''' Mist login node is under maintenance and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 2, 2022, 11:25 AM EDT:''' Rouge login node is back up.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 2, 2022, 10:25 AM EDT:''' Issues with the Rouge login node; we are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Aug 23, 2022, 1:15 PM EDT:''' Jupyter Hub is available again.&lt;br /&gt;
&lt;br /&gt;
'''Tue Aug 23, 2022, 1:00 PM EDT:''' Jupyter Hub is being updated and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Fri Aug 12, 2022, 6:30 PM EDT:''' File system issues are resolved.&lt;br /&gt;
&lt;br /&gt;
'''Fri Aug 12, 2022, 5:06 PM EDT:''' File system issues. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Thu Aug 11, 2022, 9:20 AM EDT:''' The login node issues have been resolved.&lt;br /&gt;
&lt;br /&gt;
'''Thu Aug 11, 2022, 7:50 AM EDT:''' We are having problems accessing the Niagara login nodes.  Until fixed, please login to Mist and then ssh to a Niagara login node to access Niagara (&amp;quot;ssh nia-login02&amp;quot;, for example).&lt;br /&gt;
&lt;br /&gt;
'''Fri July 15, 2022, 10:50 AM EDT:''' Jupyter Hub is available again.&lt;br /&gt;
&lt;br /&gt;
'''Fri July 15, 2022, 10:30 AM EDT:''' Jupyter Hub is being updated and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Wed June 16, 2022, 3:45 PM EDT:''' File system is stable now. We're gradually opening the systems up.&lt;br /&gt;
&lt;br /&gt;
'''Wed June 16, 2022, 10:15 AM EDT:''' Emergency maintenance shutdown of filesystem. Running jobs will be affected.&lt;br /&gt;
&lt;br /&gt;
'''Wed June 15, 2022, 7:35 PM EDT:''' Maintenance shutdown finished. Most systems are available again.&lt;br /&gt;
&lt;br /&gt;
'''Wed June 15, 2022, 7:00 AM EDT:''' Maintenance shutdown of the SciNet datacentre. There will be no access to any of the SciNet systems during this time. We expect to be able to bring the systems back online in the evening of June 15th.&lt;br /&gt;
&lt;br /&gt;
'''Mon June 13, 2022, 7:00 AM EDT - Wed June 15, 2022, 7:00 AM EDT:''' Two-day reservation for the &amp;quot;Niagara at Scale&amp;quot; event. Only &amp;quot;Niagara at Scale&amp;quot; projects will run on the compute nodes (as well as SOSCIP projects, on a subset of nodes). Users are encouraged to submit small and short jobs that could run before this event.  Throughout the event, users can still log in, access their data, and submit jobs, but these jobs will not run until after the subsequent maintenance (see below). Note that the debugjob queue will remain available to everyone as well.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 30th, 2022, 12:42:00 EDT:''' Mist login node is available again.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 30th, 2022, 10:22:00 EDT:''' Mist login node is being upgraded and temporarily inaccessible to users.&lt;br /&gt;
&lt;br /&gt;
'''Wed May 25th, 2022, 13:30:00 EDT:''' Niagara operating at 100% again.&lt;br /&gt;
&lt;br /&gt;
'''Tue May 24th, 2022, 21:30:00 EDT:''' Jupyter Hub up.  Part of Niagara can run compute jobs.&lt;br /&gt;
&lt;br /&gt;
'''Tue May 24th, 2022, 19:00:00 EDT:''' Systems are up. Users can log in, but cannot submit jobs yet.&lt;br /&gt;
&lt;br /&gt;
'''Tue May 24th, 2022, 10:00:00 EDT:''' We are still performing system checks.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 23rd, 2022, 16:44:30 EDT:''' Systems still down. Filesystems are working, but there are quite a number of drive failures - no data loss - so out of an abundance of caution we are keeping the systems down at least until tomorrow.  The long weekend has also been disruptive for service response, and we prefer to err on the safe side.&lt;br /&gt;
&lt;br /&gt;
'''Mon May 23rd, 2022, 08:12:14 EDT:''' Systems still down. Filesystems being checked to ensure no heat damage.&lt;br /&gt;
&lt;br /&gt;
'''Sun May 22nd, 2022, 10.16 am EDT:''' Electrician dispatched to replace blown fuses.&lt;br /&gt;
&lt;br /&gt;
'''Sun May 22nd, 2022, 2:54 am EDT:''' Automatic shutdown due to power/cooling issues.&lt;br /&gt;
&lt;br /&gt;
'''Fri May 6th, 2022, 11:35 am EDT:''' HPSS scheduler upgrade also finished.&lt;br /&gt;
&lt;br /&gt;
'''Thu May 5th, 2022, 7:45 pm EDT:''' Upgrade of the scheduler has finished, with the exception of HPSS.&lt;br /&gt;
&lt;br /&gt;
'''Thu May 5th, 2022, 7:00 am - 3:00 pm EDT (approx):''' Starting from 7:00 am EDT, an upgrade of the scheduler of the Niagara, Mist, and Rouge clusters will be applied.  This requires the scheduler to be down for about 5-6 hours, and all compute and login nodes to be rebooted.&lt;br /&gt;
Jobs cannot be submitted during this maintenance, but jobs submitted beforehand will remain in the queue.  For most of the time, the login nodes of the clusters will be available so that users may access their files on the home, scratch, and project file systems.&lt;br /&gt;
&lt;br /&gt;
'''Monday May 2nd, 2022, 9:30 - 11:00 am EDT:''' the Niagara login nodes, the jupyter hub, and nia-datamover2 will get rebooted for updates.  In the process, any login sessions will get disconnected, and servers on the jupyterhub will stop. Jobs in the Niagara queue will not be affected.&lt;br /&gt;
&lt;br /&gt;
'''Tue Apr 26, 11:20 AM EDT:''' A Rolling update of the Mist cluster is taking a bit longer than expected, affecting logins to Mist. &lt;br /&gt;
 &lt;br /&gt;
'''Announcement:''' On Thursday April 14th, 2022, the connectivity to the SciNet datacentre will be disrupted at 11:00 AM EDT  for a few minutes, in order to deploy a new network core switch.  Any SSH connections or data transfers to SciNet systems (Niagara, Mist, etc.) may be terminated at that time.&lt;br /&gt;
&lt;br /&gt;
'''Thu March 24, 6:54 AM EST:''' HPSS is back online&lt;br /&gt;
&lt;br /&gt;
'''Thu March 24, 8:15 AM EST:''' HPSS has a hardware problem&lt;br /&gt;
&lt;br /&gt;
'''Wed March 2, 4:50 PM EST:''' The CCEnv software stack is available again on Niagara.&lt;br /&gt;
&lt;br /&gt;
'''Wed March 2, 7:50 AM EST:''' The CCEnv software stack on Niagara has issues; we are investigating.&lt;br /&gt;
 &lt;br /&gt;
'''Sat Feb 12 2022, 12:59 EST:''' Jupyterhub is back up, but may have hardware issue.&lt;br /&gt;
&lt;br /&gt;
'''Sat Feb 12 2022, 10:36 EST:''' Issue with the Jupyterhub, since last night.  We're investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 1 2022 19:20 EST:''' Maintenance finished successfully. Systems are up. &lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 1 2022 13:00 EST:''' Maintenance downtime started.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 31 2022 13:15:00 EST:''' The SciNet datacentre's cooling system needs an '''emergency repair''' as soon as possible.  During this repair, all systems hosted at SciNet (Niagara, Mist, Rouge, HPSS, and Teach) will need to be switched off and will be unavailable to users. Repairs will start '''Tuesday February 1st, at 1:00 pm EST''', and could take until the end of the next day.  Please check here for updates.&lt;br /&gt;
&lt;br /&gt;
'''Sat Jan 29 2022 16:45:38 EST:''' Fibre repaired.&lt;br /&gt;
&lt;br /&gt;
'''Sat 29 Jan 2022 11:22:27 EST:''' Fibre repair is underway.  Expect to have connectivity restored later today.&lt;br /&gt;
&lt;br /&gt;
'''Fri 28 Jan 2022 07:35:01 EST:''' The fibre optics cable that connects the SciNet datacentre was severed by uncoordinated digging at York University.  We expect repairs to happen as soon as possible.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 27 12:46 PM EST 2022:''' Network issues to and from the datacentre. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sun Jan 23 11:05 AM EST 2022:''' Filesystem issues appear to have been resolved.&lt;br /&gt;
&lt;br /&gt;
'''Sun Jan 23 10:30 AM EST 2022:''' Filesystem issues -- investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sat Jan 8 11:42 AM EST 2022:''' The emergency maintenance is complete. Systems are up and available.&lt;br /&gt;
&lt;br /&gt;
'''Fri Jan 7 2:34 PM EST 2022:''' The SciNet shutdown is in progress. Systems are expected back on Saturday, Jan 8.&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;span style=&amp;quot;color:red&amp;quot;&amp;gt;Emergency shutdown Friday January 7, 2022&amp;lt;/span&amp;gt;''': An emergency shutdown of all SciNet systems to replace a crucial file system component is planned for Friday January 7, 2022, starting at 8am EST, and will require at least 12 hours of downtime.  Updates will be posted during the day.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 6 8:20 AM EST 2022:''' The SciNet filesystem is having issues.  We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Fri Dec 24 1:31 PM EST 2021:''' Please note the following scheduled network maintenance, which will result in loss of connectivity to the SciNet datacentre: start time Dec 29, 00:30 EST; estimated duration 4 hours and 30 minutes.&lt;br /&gt;
&lt;br /&gt;
'''Mon Dec 20 4:29 PM EST 2021:''' Filesystem is back to normal. &lt;br /&gt;
&lt;br /&gt;
'''Mon Dec 20 2:53 PM EST 2021:''' Filesystem problem - we are investigating. &lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 23 12:30 EDT 2021 ''' Cooling restored.  Systems should be available later this afternoon.  &lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 23 9:30 EDT 2021 ''' Technicians on site working on cooling system. &lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 23 3:30 EDT 2021 ''' Cooling system issues still unresolved. &lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 22 23:27:48 EDT 2021 ''' Shutdown of the datacenter due to a problem with the cooling system.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 22 09:30 EDT 2021 ''': File system issues, resolved.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 22 07:30 EDT 2021 ''': File system issues, investigating.&lt;br /&gt;
&lt;br /&gt;
'''Sun Sep 19 10:00 EDT 2021''': Power glitch interrupted all compute jobs; please resubmit any jobs you had running.&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 15 17:35 EDT 2021''': filesystem issues resolved&lt;br /&gt;
&lt;br /&gt;
'''Wed Sep 15 16:39 EDT 2021''': filesystem issues&lt;br /&gt;
&lt;br /&gt;
'''Mon Sep 13 13:15:07 EDT 2021''' HPSS is back online.&lt;br /&gt;
&lt;br /&gt;
'''Fri Sep 10 17:57:23 EDT 2021''' HPSS is offline due to unscheduled maintenance.&lt;br /&gt;
&lt;br /&gt;
'''Wed Aug 18 16:13:42 EDT 2021''' The HPSS upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
'''HPSS Downtime August 17th and 18th, 2021 (Tuesday and Wednesday):''' We'll be upgrading the HPSS software to version 8.3, along with all the clients (htar/hsi, vfs and Globus/dsi)&lt;br /&gt;
&lt;br /&gt;
'''July 24, 2021, 6:00 PM EDT:''' There appear to be file system issues, which may affect users' ability to login.  We are investigating.&lt;br /&gt;
&lt;br /&gt;
''' July 23rd, 2021, 9:00 AM EDT:''' ''' Security update: ''' Due to a severe vulnerability in the Linux kernel (CVE-2021-33909), our team is currently patching and rebooting all login nodes and compute nodes, as well as the JupyterHub.  There should be no effect on running jobs; however, sessions on login and datamover nodes will be disrupted. &lt;br /&gt;
&lt;br /&gt;
''' July 20th, 2021, 7:00 PM EDT:''' ''' SLURM configuration''' - The default behaviour has changed to kill a job step if any task exits with a non-zero exit code. If your code is able to handle failures gracefully, please add srun's --no-kill option to recover the previous default behaviour.&lt;br /&gt;
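A minimal illustration of the suggested workaround (the script, task count, and application name are hypothetical; --no-kill is the option named in the note above):&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --time=00:10:00

# Under the new default, if any one of the 4 tasks exits non-zero, the whole
# job step is killed.  Adding --no-kill, as suggested above, restores the
# previous behaviour of letting the remaining tasks continue.
srun --no-kill ./my_resilient_app   # hypothetical fault-tolerant application
```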
&lt;br /&gt;
''' July 20th, 2021, 7:00 PM EDT:''' Maintenance finished, systems are back online.   &lt;br /&gt;
&lt;br /&gt;
'''SciNet Downtime July 20th, 2021 (Tuesday):''' There will be a maintenance shutdown of the SciNet data center on Tuesday July 20th, starting at 7 am EDT. There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time.  We expect to be able to bring the systems back online in the evening of July 20th.  The status of the Niagara cluster can be checked on status.computecanada.ca. For up-to-date and more detailed information on the status of all the SciNet systems, you can always check back here.&lt;br /&gt;
&lt;br /&gt;
'''June 29th, 2021, 2:00 PM:''' Thunderstorm-related power fluctuations are causing some Niagara compute nodes and their jobs to crash.  Please resubmit if your jobs seem to have crashed for no apparent reason.&lt;br /&gt;
&lt;br /&gt;
'''June 28th, 2021, 4:06 PM:''' Mist OS upgrade is complete.&lt;br /&gt;
&lt;br /&gt;
'''June 28th, 2021, 9:00 AM:''' Mist is under maintenance. The OS is being upgraded from RHEL 7 to 8.&lt;br /&gt;
&lt;br /&gt;
'''June 11th, 2021, 8:30 AM:''' Maintenance complete. Systems are up.&lt;br /&gt;
&lt;br /&gt;
'''June 9th to 10th, 2021:''' The SciNet datacentre will have a scheduled maintenance shutdown.  Niagara, Mist, Rouge, HPSS, login nodes, the file systems, and hosted systems will all be offline during the shutdown, starting at 7 AM EDT on Wednesday June 9th. We expect the systems to be back up in the morning of Friday June 11th.  Check here for updates.&lt;br /&gt;
&lt;br /&gt;
'''May 27, 2021:''' The datamover addresses have changed to improve high-bandwidth connectivity and cybersecurity. The new addresses are 142.1.174.227 for nia-datamover1.scinet.utoronto.ca and 142.1.174.228 for nia-datamover2.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
If you have jobs that need to connect to a software license server using an ssh tunnel through nia-gw (which actually resolves to datamover1 or datamover2), you may need to ask the system administrators of that license server to allow incoming connections from the new addresses above.&lt;br /&gt;
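A tunnel of the kind described above might be set up as follows; the license-server hostname, port, and username are hypothetical, and only the nia-gw host and the new addresses come from this announcement:&lt;br /&gt;

```shell
# Forward local port 27000 through nia-gw to a (hypothetical) license
# server. After the address change, the license server's administrators
# must allow incoming connections from 142.1.174.227 and 142.1.174.228.
ssh -N -L 27000:licserver.example.com:27000 myuser@nia-gw.scinet.utoronto.ca
```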
&lt;br /&gt;
'''May 27th, 20:00.''' All systems are up and running &lt;br /&gt;
&lt;br /&gt;
'''May 27th, 19:30.''' Most systems are up&lt;br /&gt;
&lt;br /&gt;
'''May 27th, 19:00:''' Cooling is back. Powering up systems&lt;br /&gt;
&lt;br /&gt;
'''May 27th, 2021, 11:30am:'''  The cooling tower issue has been identified as a wiring issue and is being repaired.  We don't have an ETA on when cooling will be restored, however we are hopeful it will be by the end of the day.  &lt;br /&gt;
&lt;br /&gt;
'''May 27th, 2021, 12:30am:''' The cooling tower motor is not working properly and may need to be replaced.  It's the primary motor and the cooling system cannot run without it, so at least until tomorrow all equipment at the datacenter will remain unavailable.  Updates about expected repair times will be posted when they are known.&lt;br /&gt;
&lt;br /&gt;
'''May 26th, 2021, 9:20pm:''' We are currently experiencing cooling issues at the SciNet data centre.  Updates will be posted as we determine the cause of the problem.&lt;br /&gt;
&lt;br /&gt;
'''From Tue Mar 30 at 12 noon EST to Thu Apr 1 at 12 noon EST,''' there will be a two-day reservation for the &amp;quot;Niagara at Scale&amp;quot; pilot event.  During these 48 hours, only &amp;quot;Niagara at Scale&amp;quot; projects will run on the compute nodes (as well as SOSCIP projects, on a subset of nodes).  All other users can still log in, access their data, and submit jobs throughout this event, but the jobs will not run until after the event.  The debugjob queue will remain available to everyone as well.&lt;br /&gt;
&lt;br /&gt;
The scheduler will not start batch jobs that cannot finish before the start of this event. Users with small, short jobs can take advantage of this, as the scheduler may be able to fit them in on the otherwise idle nodes before the event starts.&lt;br /&gt;
&lt;br /&gt;
'''Tue 23 Mar 2021 12:19:07 PM EDT''' - Planned external network maintenance 12pm-1pm Tuesday, March 23rd. &lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 28 17:35:16 EST 2021:''' HPSS services are back online&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 28 12:36:21 EST 2021:''' HPSS services offline&lt;br /&gt;
&lt;br /&gt;
We need a short maintenance window as early as possible this afternoon to perform a small configuration change. Running jobs will be allowed to finish, but new submissions are being held in the queue.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 25 13:16:33 EST 2021:''' HPSS services are back online&lt;br /&gt;
&lt;br /&gt;
'''Sat Jan 23 10:03:33 EST 2021:''' HPSS services offline&lt;br /&gt;
&lt;br /&gt;
We detected some type of hardware failure on our HPSS equipment overnight, so access has been disabled pending further investigation.&lt;br /&gt;
&lt;br /&gt;
'''Fri Jan 22 10:49:29 EST 2021:''' The Globus transition to OAuth is finished.&lt;br /&gt;
&lt;br /&gt;
Please deactivate any previous sessions to the niagara endpoint (in the last 7 days), and activate/login again. &lt;br /&gt;
&lt;br /&gt;
For more details check https://docs.scinet.utoronto.ca/index.php/Globus#computecandada.23niagara&lt;br /&gt;
&lt;br /&gt;
'''Jan 21, 2021:''' Globus access disruption on Fri, Jan/22/2021 10AM: Please be advised that we will have a maintenance window starting tomorrow at 10AM to roll out the transition of services to OAuth-based authentication.&lt;br /&gt;
&lt;br /&gt;
'''Jan 15, 2021:''' Globus access update on Mon, Jan/18/2021 and Tue, Jan/19/2021:&lt;br /&gt;
Please be advised that preparations start on Monday to perform an update to Globus access on Tuesday. We'll be adopting OAuth instead of MyProxy from that point on. During this period, expect sporadic disruptions of service. Access to nia-dm2 will already be blocked on Monday, so please refrain from starting new login sessions or ssh tunnels via nia-dm2 from this weekend on.&lt;br /&gt;
&lt;br /&gt;
''' December 11,2020, 12:00 AM EST: ''' Cooling issue resolved. Systems back.&lt;br /&gt;
&lt;br /&gt;
''' December 11,2020, 6:00 PM EST: ''' Cooling issue at datacenter. All systems down.&lt;br /&gt;
&lt;br /&gt;
''' December 7, 2020, 7:25 PM EST: '''All systems back; users can log in again.&lt;br /&gt;
&lt;br /&gt;
''' December 7, 2020, 6:46 PM EST: '''User connectivity to data center not yet ready, but queued jobs on Mist and Niagara have been started.&lt;br /&gt;
 &lt;br /&gt;
''' December 7, 2020, 7:00 AM EST: '''Maintenance shutdown in effect. This is a one-day maintenance shutdown.  There will be no access to Niagara, Mist, HPSS or teach, nor to their file systems during this time.  We expect to be able to bring the systems back online this evening.&lt;br /&gt;
&lt;br /&gt;
''' December 2, 2020, 9:10 PM EST: '''Power is back, systems are coming up. Please resubmit any jobs that failed because of this incident.&lt;br /&gt;
&lt;br /&gt;
''' December 2, 2020, 6:00 PM EST: '''Power glitch at the data center, caused about half of the compute nodes to go down.  Power issue not yet resolved.&lt;br /&gt;
&lt;br /&gt;
'''&amp;lt;span style=&amp;quot;color:#dd1111&amp;quot;&amp;gt;Announcing a Maintenance Shutdown on December 7th, 2020&amp;lt;/span&amp;gt;''' &amp;lt;br/&amp;gt;There will be a one-day maintenance shutdown on December 7th 2020, starting at 7 am EST.  There will be no access to Niagara, Mist, HPSS or teach, nor to their file systems during this time.  We expect to be able to bring the systems back online in the evening of the same day.&lt;br /&gt;
&lt;br /&gt;
''' November 6, 2020, 8:00 PM EST: ''' Systems are coming back online.&lt;br /&gt;
&lt;br /&gt;
''' November 6, 2020, 9:49 AM EST: ''' Repairs on the cooling system are underway.  No ETA, but the systems will likely be back some time today.&lt;br /&gt;
&lt;br /&gt;
''' November 6, 2020, 4:27 AM EST: '''Cooling system failure, datacentre is shut down.&lt;br /&gt;
&lt;br /&gt;
''' October 9, 2020, 12:57 PM: ''' A short power glitch caused many of the Niagara compute nodes to lose power; jobs running on them would have failed. Please check your jobs and resubmit.&lt;br /&gt;
&lt;br /&gt;
''' October 8, 2020, 9:50 PM: ''' Jupyterhub service is back up.&lt;br /&gt;
&lt;br /&gt;
''' October 8, 2020, 5:40 PM: ''' Jupyterhub service is down. We are investigating.&lt;br /&gt;
&lt;br /&gt;
''' September 28, 2020, 11:00 AM EST: ''' A short power glitch caused many of the Niagara compute nodes to lose power; jobs running on them would have failed. Please check your jobs and resubmit.&lt;br /&gt;
&lt;br /&gt;
''' September 1, 2020, 2:15 PM EST: ''' A short power glitch caused about half of the Niagara compute nodes to lose power; jobs running on them would have failed. Please check your jobs and resubmit.&lt;br /&gt;
&lt;br /&gt;
''' September 1, 2020, 9:27 AM EST: ''' The Niagara cluster has moved to a new default software stack, NiaEnv/2019b.  If your job scripts used the previous default software stack (NiaEnv/2018a), either put the command &amp;quot;module load NiaEnv/2018a&amp;quot; before any other module commands in those scripts to ensure they continue to work, or try the new stack (recommended).&lt;br /&gt;
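Pinning the old stack can be done at the top of an existing job script; a minimal sketch (the modules listed after NiaEnv and the resource values are placeholders):&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=01:00:00

# Load the old default stack first, so that subsequent module
# commands resolve against NiaEnv/2018a rather than NiaEnv/2019b.
module load NiaEnv/2018a
module load intel openmpi   # placeholder modules; use your own list
```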
''' August 24, 2020, 7:37 PM EST: ''' Connectivity is back to normal&lt;br /&gt;
&lt;br /&gt;
''' August 24, 2020, 6:35 PM EST: ''' We have partial connectivity back, but are still investigating.&lt;br /&gt;
&lt;br /&gt;
''' August 24, 2020, 3:15 PM EST: ''' There are issues connecting to the data centre. We're investigating.&lt;br /&gt;
&lt;br /&gt;
''' August 21, 2020, 6:00 PM EST: ''' The pump has been repaired, cooling is restored, systems are up.  &amp;lt;br/&amp;gt;Scratch purging is postponed until the evening of Friday Aug 28th, 2020.&lt;br /&gt;
&lt;br /&gt;
'''August 19, 2020, 4:40 PM EST:''' Update: The current estimate is to have the cooling restored on Friday and we hope to have the systems available for users on Saturday August 22, 2020.&lt;br /&gt;
&lt;br /&gt;
'''August 17, 2020, 4:00 PM EST:''' Unfortunately after taking the pump apart it was determined there was a more serious failure of the main drive shaft, not just the seal. As a new one will need to be sourced or fabricated we're estimating that it will take at least a few more days to get the part and repairs done to restore cooling. Sorry for the inconvenience. &lt;br /&gt;
&lt;br /&gt;
'''August 15, 2020, 1:00 PM EST:''' Due to parts availability for repairing the failed pump and cooling system, it is unlikely that the systems can be restored until Monday afternoon at the earliest. &lt;br /&gt;
&lt;br /&gt;
'''August 15, 2020, 00:04 EST:'''  A primary pump seal in the cooling infrastructure has blown, and parts availability cannot be determined until tomorrow. All systems are shut down as there is no cooling.  If parts are available, systems may be back late tomorrow at the earliest. Check here for updates.  &lt;br /&gt;
&lt;br /&gt;
'''August 14, 2020, 21:04 EST:''' Tomorrow's /scratch purge has been postponed.&lt;br /&gt;
&lt;br /&gt;
'''August 14, 2020, 21:00 EST:''' Staff are at the datacenter. It looks like one of the pumps has a seal that is leaking badly.&lt;br /&gt;
&lt;br /&gt;
'''August 14, 2020, 20:37 EST:''' We seem to be undergoing a thermal shutdown at the datacenter.&lt;br /&gt;
&lt;br /&gt;
'''August 14, 2020, 20:20 EST:''' Network problems to Niagara/Mist. We are investigating.&lt;br /&gt;
 &lt;br /&gt;
'''August 13, 2020, 10:40 AM EST:''' Network is fixed, scheduler and other services are back.&lt;br /&gt;
&lt;br /&gt;
'''August 13, 2020, 8:20 AM EST:''' We had an IB switch failure, which is affecting a subset of nodes, including the scheduler nodes.&lt;br /&gt;
&lt;br /&gt;
'''August 10, 2020, 7:30 PM EST:''' Scheduler fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''August 10, 2020, 3:00 PM EST:''' Scheduler partially functional: jobs can be submitted and are running.&lt;br /&gt;
&lt;br /&gt;
'''August 10, 2020, 2:00 PM EST:''' The scheduler is temporarily out of service.&lt;br /&gt;
&lt;br /&gt;
'''August 7, 2020, 9:15 PM EST:''' Network is fixed, scheduler and other services are coming back.&lt;br /&gt;
&lt;br /&gt;
'''August 7, 2020, 8:20 PM EST:''' Disruption of part of the network in the data centre.  Causes issue with the scheduler, the mist login node, and possibly others. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''July 30, 2020, 9:00 AM''' Project backup in progress but incomplete: after we deployed the new, larger storage appliance for scratch and project two months ago, we started a full backup of project (1.5 PB). This backup is taking a while to complete, and a few areas have not yet been fully backed up. Please be careful not to delete things from project that you still need, particularly recently added material.&lt;br /&gt;
&lt;br /&gt;
'''July 27, 2020, 5:00 PM:''' Scheduler issues resolved.&lt;br /&gt;
&lt;br /&gt;
'''July 27, 2020, 3:00 PM:''' Scheduler issues. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''July 13, 4:40 PM:''' Most systems are available again. Only Mist is still being brought up.&lt;br /&gt;
&lt;br /&gt;
'''July 13, 10:00 AM:''' '''SciNet/Niagara Downtime In Progress'''&lt;br /&gt;
&lt;br /&gt;
'''SciNet/Niagara Downtime Announcement, July 13, 2020'''&amp;lt;br/&amp;gt;&lt;br /&gt;
All resources at SciNet will undergo a maintenance shutdown on Monday July 13, 2020, starting at 10:00 am EDT, for file system and scheduler upgrades.  There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time.&lt;br /&gt;
We expect to be able to bring the systems back around 3 PM EDT on the same day.&lt;br /&gt;
&lt;br /&gt;
''' June 29, 6:21:00  PM:''' Systems are available again.  &lt;br /&gt;
&lt;br /&gt;
''' June 29, 12:30:00  PM:''' Power Outage caused thermal shutdown.&lt;br /&gt;
&lt;br /&gt;
'''June 20, 2020, 10:24 PM:''' File systems are back up.  Unfortunately, all running jobs would have died and users are asked to resubmit them.&lt;br /&gt;
&lt;br /&gt;
'''June 20, 2020, 9:48 PM:''' An issue with the file systems is causing trouble.  We are investigating the cause.&lt;br /&gt;
&lt;br /&gt;
'''June 15, 2020, 10:30 PM:''' A '''power glitch''' caused some compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''June 12, 2020, 6:15 PM:''' Two '''power glitches''' during the night caused some compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''June 6, 2020, 6:06 AM:''' A '''power glitch''' caused some compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''May 24, 2020, 8:20 AM:''' A '''power glitch''' this morning caused all compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''May 7, 2020, 6:05 PM:''' Maintenance shutdown is finished.  Most systems are back in production.&lt;br /&gt;
&lt;br /&gt;
'''May 6, 2020, 7:08 AM:''' Two-day datacentre maintenance shutdown has started.&lt;br /&gt;
&lt;br /&gt;
''' SciNet/Niagara Downtime Announcement, May 6-7, 2020'''&lt;br /&gt;
&lt;br /&gt;
All resources at SciNet will undergo a two-day maintenance shutdown on May 6th and 7th 2020, starting at 7 am EDT on Wednesday May 6th.  There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) or systems hosted at the SciNet data centre.  We expect to be able to bring the systems back online the evening of May 7th.&lt;br /&gt;
&lt;br /&gt;
'''May 4, 2020, 7:51 AM:''' A power glitch this morning caused compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''May 3, 2020, 8:20 AM:''' A power glitch this morning caused all compute nodes to be rebooted: jobs running at the time may have failed; users are asked to resubmit these jobs.&lt;br /&gt;
&lt;br /&gt;
'''April 28, 2020, 7:20 AM:''' A power glitch this morning caused all compute nodes to be rebooted: jobs running at the time have failed; users are asked to resubmit these jobs.&lt;br /&gt;
 &lt;br /&gt;
'''April 20, 2020: Security Incident at Cedar; implications for Niagara users'''&lt;br /&gt;
&lt;br /&gt;
Last week, it became evident that the Cedar GP cluster had been&lt;br /&gt;
compromised for several weeks.  The passwords of at least two&lt;br /&gt;
Compute Canada users were known to the attackers. One of these was&lt;br /&gt;
used to escalate privileges on Cedar, as explained on&lt;br /&gt;
https://status.computecanada.ca/view_incident?incident=423.&lt;br /&gt;
&lt;br /&gt;
These accounts were used to login to Niagara as well, but Niagara&lt;br /&gt;
did not have the same security loophole as Cedar (which has been&lt;br /&gt;
fixed), and no further escalation was observed on Niagara.&lt;br /&gt;
&lt;br /&gt;
Reassuring as that may sound, it is not known how the passwords of&lt;br /&gt;
the two user accounts were obtained. Given this uncertainty, the&lt;br /&gt;
SciNet team *strongly* recommends that you change your password on&lt;br /&gt;
https://ccdb.computecanada.ca/security/change_password, and remove&lt;br /&gt;
any SSH keys and regenerate new ones (see&lt;br /&gt;
https://docs.scinet.utoronto.ca/index.php/SSH_keys).&lt;br /&gt;
&lt;br /&gt;
''' Tue 30 Mar 2020 14:55:14 EDT'''  Burst Buffer available again.&lt;br /&gt;
&lt;br /&gt;
''' Fri Mar 27 15:29:00 EDT 2020:''' SciNet systems are back up. Only the Burst Buffer remains offline, its maintenance is expected to be finished early next week.&lt;br /&gt;
&lt;br /&gt;
''' Thu Mar 26 23:05:00 EDT 2020:'''  Some aspects of the maintenance took longer than expected. The systems will not be back up until some time tomorrow, Friday March 27, 2020.  &lt;br /&gt;
&lt;br /&gt;
''' Wed Mar 25 7:00:00 EDT 2020:'''  SciNet/Niagara downtime started.&lt;br /&gt;
&lt;br /&gt;
''' Mon Mar 23 18:45:10 EDT 2020:'''  File system issues were resolved.&lt;br /&gt;
&lt;br /&gt;
''' Mon Mar 23 18:01:19 EDT 2020:''' There is currently an issue with the main Niagara filesystems. This affects all systems; all jobs have been killed. The issue is being investigated. &lt;br /&gt;
&lt;br /&gt;
''' Fri Mar 20 13:15:33 EDT 2020: ''' There was a power glitch at the datacentre at 8:50 AM, which resulted in jobs getting killed.  Please resubmit failed jobs. &lt;br /&gt;
&lt;br /&gt;
''' COVID-19 Impact on SciNet Operations, March 18, 2020'''&lt;br /&gt;
&lt;br /&gt;
Although the University of Toronto is closing some of its&lt;br /&gt;
research operations on Friday March 20 at 5 pm EDT, this does not&lt;br /&gt;
affect the SciNet systems (such as Niagara, Mist, and HPSS), which&lt;br /&gt;
will remain operational.&lt;br /&gt;
&lt;br /&gt;
''' SciNet/Niagara Downtime Announcement, March 25-26, 2020'''&lt;br /&gt;
&lt;br /&gt;
All resources at SciNet will undergo a two-day maintenance shutdown on March 25th and 26th 2020, starting at 7 am EDT on Wednesday March 25th.  There will be no access to any of the SciNet systems (Niagara, Mist, HPSS, Teach cluster, or the file systems) during this time.&lt;br /&gt;
&lt;br /&gt;
This shutdown is necessary to finish the expansion of the Niagara cluster and its storage system.&lt;br /&gt;
&lt;br /&gt;
We expect to be able to bring the systems back online the evening of March 26th.&lt;br /&gt;
&lt;br /&gt;
''' March 9, 2020, 11:24 PM:''' HPSS services are temporarily suspended for emergency maintenance.&lt;br /&gt;
&lt;br /&gt;
''' March 7, 2020, 10:15 PM:''' File system issues have been cleared.&lt;br /&gt;
&lt;br /&gt;
''' March 6, 2020, 7:30 PM:''' File system issues; we are investigating&lt;br /&gt;
&lt;br /&gt;
''' March 2, 2020, 1:30 PM:''' For the extension of Niagara, the operating system on all Niagara nodes has been upgraded&lt;br /&gt;
from CentOS 7.4 to 7.6.  This required all&lt;br /&gt;
nodes to be rebooted. Running compute jobs are allowed to finish&lt;br /&gt;
before the compute node gets rebooted. Login nodes have all been rebooted, as have the datamover nodes and the jupyterhub service.&lt;br /&gt;
&lt;br /&gt;
''' Feb 24, 2020, 1:30PM: ''' The [[Mist]] login node got rebooted.  It is back, but we are still monitoring the situation.&lt;br /&gt;
&lt;br /&gt;
''' Feb 12, 2020, 11:00AM: ''' The [[Mist]] GPU cluster now available to users.&lt;br /&gt;
&lt;br /&gt;
''' Feb 11, 2020, 2:00PM: ''' The Niagara compute nodes were accidentally rebooted, killing all running jobs.&lt;br /&gt;
&lt;br /&gt;
''' Feb 10, 2020, 7:00 PM: ''' HPSS is back to normal.&lt;br /&gt;
&lt;br /&gt;
''' Jan 30, 2020, 12:01PM: ''' We are having an issue with HPSS, in which the disk-cache is full. We put a reservation on the whole system (Globus, plus archive and vfs queues), until it has had a chance to clear some space on the cache.&lt;br /&gt;
&lt;br /&gt;
''' Jan 21, 2020, 4:05PM: '''   There was a partial power outage that took down a large number of the compute nodes.  If your job died during this period, please resubmit.  &lt;br /&gt;
&lt;br /&gt;
'''Jan 13, 2020, 7:35 PM:''' Maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Jan 13, 2020, 8:20 AM:''' The announced maintenance downtime started (see below).&lt;br /&gt;
&lt;br /&gt;
'''Jan 9 2020, 11:30 AM:''' External ssh connectivity restored, issue related to the university network.&lt;br /&gt;
&lt;br /&gt;
'''Jan 9 2020, 9:24 AM:''' We received reports of users having trouble connecting into the SciNet data centre; we're investigating.  Systems are up and running and jobs are fine.&amp;lt;p&amp;gt;&lt;br /&gt;
As a workaround, in the meantime, it appears to be possible to log into graham, cedar, or beluga, and then ssh to niagara.&amp;lt;/p&amp;gt;&lt;br /&gt;
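With a recent OpenSSH (7.3 or newer), the two-hop workaround can be collapsed into one command; the username below is a placeholder:&lt;br /&gt;

```shell
# Jump through graham (cedar or beluga work the same way) to reach
# niagara while the direct route is down. Replace myuser with your
# Compute Canada username. On older OpenSSH, ssh to graham first,
# then run "ssh niagara" from there.
ssh -J myuser@graham.computecanada.ca myuser@niagara.scinet.utoronto.ca
```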
&lt;br /&gt;
'''Downtime announcement:'''&lt;br /&gt;
To prepare for the upcoming expansion of Niagara, there will be a&lt;br /&gt;
one-day maintenance shutdown on '''January 13th 2020, starting at 8 am&lt;br /&gt;
EST'''.  There will be no access to Niagara, Mist, HPSS or teach, nor&lt;br /&gt;
to their file systems during this time.&lt;br /&gt;
&lt;br /&gt;
2019&lt;br /&gt;
&lt;br /&gt;
'''December 13, 9:00 AM EST:''' Issues resolved.&lt;br /&gt;
&lt;br /&gt;
'''December 13, 8:20 AM EST:''' Overnight issue is now preventing logins to Niagara and other services. Possibly a file system issue, we are investigating.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt; '''Fri, Nov 15 2019, 11:00 PM (EST)'''  Niagara and most of the main systems are now available. &lt;br /&gt;
&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt; '''Fri, Nov 15 2019, 7:50 PM (EST)'''  SOSCIP GPU cluster is up and accessible.  Work on the other systems continues.&lt;br /&gt;
&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt; '''Fri, Nov 15 2019, 5:00 PM (EST)'''  Infrastructure maintenance done, upgrades still in process.&lt;br /&gt;
&amp;lt;/p&amp;gt;&amp;lt;p&amp;gt;&lt;br /&gt;
'''Fri, Nov 15 2019, 7:00 AM (EST)'''  Maintenance shutdown of the SciNet data centre has started.  Note: scratch purging has been postponed until Nov 17.&amp;lt;br/&amp;gt; &lt;br /&gt;
&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;p&amp;gt;&lt;br /&gt;
'''Announcement:''' &lt;br /&gt;
The SciNet datacentre will undergo a maintenance shutdown on&lt;br /&gt;
Friday November 15th 2019, from 7 am to 11 pm (EST), with no access&lt;br /&gt;
to any of the SciNet systems (Niagara, P8, SGC, HPSS, Teach cluster,&lt;br /&gt;
or the filesystems) during that time. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Sat, Nov 2 2019, 1:30 PM (update):'''  Chiller has been fixed, all systems are operational.    &lt;br /&gt;
&amp;lt;/p&amp;gt;&lt;br /&gt;
'''Fri, Nov 1 2019, 4:30 PM (update):'''  We are operating in free cooling, so we have brought up about half of the Niagara compute nodes to reduce the cooling load.  Access, storage, and other systems should now be available.   &lt;br /&gt;
&lt;br /&gt;
'''Fri, Nov 1 2019, 12:05 PM (update):''' A power module in the chiller has failed and needs to be replaced.   We should be able to operate in free cooling if the temperature stays cold enough, but we may not be able to run all systems. No ETA yet on when users will be able to log back in. &lt;br /&gt;
&lt;br /&gt;
'''Fri, Nov 1 2019, 9:15 AM (update):''' There was an automated shutdown because of rising temperatures, causing all systems to go down. We are investigating; check here for updates.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p&amp;gt;'''Fri, Nov 1 2019, 8:16 AM:''' Unexpected data centre issue: Check here for updates.&lt;br /&gt;
&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
''' Thu 1 Aug 2019 5:00:00 PM ''' Systems are up and operational.   &lt;br /&gt;
&lt;br /&gt;
'''Thu 1 Aug 2019 7:00:00 AM: ''' Scheduled Downtime Maintenance of the SciNet Datacenter.  All systems will be down and unavailable starting 7am until the evening. &lt;br /&gt;
&lt;br /&gt;
'''Fri 26 Jul 2019, 16:02:26 EDT:''' There was an issue with the Burst Buffer at around 3 PM; it has since been resolved. The BB is OK again.&lt;br /&gt;
&lt;br /&gt;
''' Sun 30 Jun 2019 ''' The '''SOSCIP BGQ''' and '''P7''' systems were decommissioned on '''June 30th, 2019'''.  The BGQdev front end node and storage are still available.  &lt;br /&gt;
&lt;br /&gt;
'''Wed 19 Jun 2019, 1:20:00 PM:''' The BGQ is back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed 19 Jun 2019, 10:00:00 AM:''' The BGQ is still down; the SOSCIP GPU nodes should be back up. &lt;br /&gt;
&lt;br /&gt;
'''Wed 19 Jun 2019, 1:40:00 AM:''' There was an issue with the SOSCIP BGQ and GPU Cluster last night at about 1:42am, probably a power fluctuation that took them down.  &lt;br /&gt;
&lt;br /&gt;
'''Wed 12 Jun 2019, 3:30 AM - 7:40 AM''' Intermittent system issues on Niagara's project and scratch as the file number limit was reached. We increased the number of files allowed in total on the file system. &lt;br /&gt;
&lt;br /&gt;
'''Thu 30 May 2019, 11:00:00 PM:'''&lt;br /&gt;
The maintenance downtime of SciNet's data center has finished, and systems are being brought online now.  You can check the progress here. Some systems might not be available until Friday morning.&amp;lt;br/&amp;gt;&lt;br /&gt;
Some action on the part of users will be required when they first connect again to a Niagara login node or datamover.  This is due to the security upgrade of the Niagara cluster, which is now in line with currently accepted best practices.&amp;lt;br/&amp;gt;&lt;br /&gt;
The details of the required actions can be found on the [[SSH Changes in May 2019]] wiki page.&lt;br /&gt;
&lt;br /&gt;
'''Wed 29-30 May 2019''' The SciNet datacentre will undergo a two-day maintenance shutdown, starting at 7 am EDT on Wednesday May 29th.  There will be no access to any of the SciNet systems (Niagara, P7, P8, BGQ, SGC, HPSS, Teach cluster, or the file systems) during this time.&lt;br /&gt;
&lt;br /&gt;
'''SCHEDULED SHUTDOWN''': &lt;br /&gt;
&lt;br /&gt;
Please be advised that on '''Wednesday May 29th through Thursday May 30th''', the SciNet datacentre will undergo a two-day maintenance shutdown, starting at 7 am EDT on Wednesday May 29th.  There will be no access to any of the SciNet systems (Niagara, P7, P8, BGQ, SGC, HPSS, Teach cluster, or the file systems) during this time.&lt;br /&gt;
&lt;br /&gt;
This is necessary to finish the installation of an emergency power generator, to perform the annual cooling tower maintenance, and to enhance login security.&lt;br /&gt;
&lt;br /&gt;
We expect to be able to bring the systems back online the evening of May 30th.  Due to the enhanced login security, users' ssh applications will need to update their known-hosts lists. More detailed information on this procedure will be sent shortly before the systems are back online.&lt;br /&gt;
&lt;br /&gt;
'''Fri 5 Apr 2019:''' Software updates on Niagara: The default CCEnv software stack now uses avx512 on Niagara, and there is now a NiaEnv/2019b stack (&amp;quot;epoch&amp;quot;). &lt;br /&gt;
&lt;br /&gt;
'''Thu 4 Apr 2019:''' The 2019 compute and storage allocations have taken effect on Niagara.&lt;br /&gt;
&lt;br /&gt;
'''NOTE''':  There is scheduled network maintenance for '''Friday April 26th 12am-8am''' on the SciNet datacenter external network connection.   This will not affect internal connections or running jobs; however, remote connections may see interruptions during this period.&lt;br /&gt;
&lt;br /&gt;
'''Wed 24 Apr 2019 14:14 EDT:''' HPSS is back on service. Library and robot arm maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Wed 24 Apr 2019 08:35 EDT:''' HPSS out of service this morning for library and robot arm maintenance.&lt;br /&gt;
&lt;br /&gt;
'''Fri 19 Apr 2019 17:40 EDT:''' HPSS robot arm has been released and is back to normal operations.&lt;br /&gt;
&lt;br /&gt;
'''Fri 19 Apr 2019 14:00 EDT:''' Problems with the HPSS library robot have been detected.&lt;br /&gt;
&lt;br /&gt;
'''Wed 17 Apr 2019 15:35 EDT:''' Network connection is back.&lt;br /&gt;
&lt;br /&gt;
'''Wed 17 Apr 2019 15:12 EDT:''' Network connection down.  Investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue 9 Apr 2019 22:24:14 EDT:'''  Network connection restored.&lt;br /&gt;
&lt;br /&gt;
'''Tue 9 Apr 2019, 15:20:''' Network connection down.  Investigating.&lt;br /&gt;
&lt;br /&gt;
'''Fri 5 Apr 2019:''' Planned, short outage in connectivity to the SciNet datacentre from 7:30 am to 8:55 am EST for maintenance of the network.  This outage will not affect running or queued jobs. It may be necessary to reboot the login nodes at some point tomorrow, which could result in a short interruption of connectivity, but which will have no effect on running or queued jobs.&lt;br /&gt;
&lt;br /&gt;
'''April 4, 2019:'''  The 2019 compute and storage allocations will take effect on Niagara. Running jobs will not be affected by this change and will run their course.  Queued jobs' priorities will be updated to reflect the new fairshare values later in the day.  The queue should fully reflect the new fairshare values in about 24 hours.   &lt;br /&gt;
&lt;br /&gt;
It may be necessary to reboot the login nodes at some point tomorrow, which could result in a short interruption of connectivity, but which will have no effect on running or queued jobs.&lt;br /&gt;
&lt;br /&gt;
There will be updates to the software stack on this day as well.&lt;br /&gt;
&lt;br /&gt;
'''March 25, 3:05 PM EST:'''  Most systems back online, other services should be back shortly. &lt;br /&gt;
&lt;br /&gt;
'''March 25, 12:05 PM EST:''' Power is back at the datacentre, but it is not yet known when all systems will be back up.  Keep checking here for updates.&lt;br /&gt;
&lt;br /&gt;
'''March 25, 11:27 AM EST:''' A power outage occurred in the datacentre, causing all services to go down.  Check here for updates.&lt;br /&gt;
&lt;br /&gt;
'''Thu Mar 21 10:37:28 EDT 2019:''' HPSS is back in service&lt;br /&gt;
&lt;br /&gt;
HPSS out of service on '''Tue, Mar/19 at 9AM''', for tape library expansion and relocation. It's possible the downtime will extend to Wed, Mar/20.&lt;br /&gt;
&lt;br /&gt;
'''January 21, 4:00 PM''': HPSS is back in service. Thank you for your patience.&lt;br /&gt;
&lt;br /&gt;
'''January 18, 5:00 PM''': We did practically all of the HPSS upgrades (software/hardware); however, the main client node - archive02 - is presenting an issue we have not yet been able to resolve. We will try to resume work over the weekend with cool heads, or on Monday. Sorry, but this is an unforeseen delay. Jobs in the queue will remain there, and we'll delay the scratch purging by 1 week.&lt;br /&gt;
&lt;br /&gt;
'''January 16, 11:00 PM''': HPSS is being upgraded, as announced.&lt;br /&gt;
&lt;br /&gt;
'''January 16, 8:00 PM''': Systems are coming back up and should be accessible for users now.&lt;br /&gt;
&lt;br /&gt;
'''January 15, 8:00 AM''': Data centre downtime in effect.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;font color=red&amp;gt;&amp;lt;b&amp;gt;Downtime Announcement for January 15 and 16, 2019&amp;lt;/b&amp;gt;&amp;lt;/font&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
The SciNet datacentre will need to undergo a two-day maintenance shutdown in order to perform electrical work, repairs and maintenance.  The electrical work is in preparation for the upcoming installation of an emergency power generator and a larger UPS, which will result in increased resilience to power glitches and outages.  The shutdown is scheduled to start on '''Tuesday January 15, 2019, at 7 am''' and will last until '''Wednesday January 16, 2019''', some time in the evening. There will be no access to any of the SciNet systems (Niagara, P7, P8, BGQ, SGC, HPSS, Teach cluster, or the filesystems) during this time.&lt;br /&gt;
Check back here for up-to-date information on the status of the systems.&lt;br /&gt;
&lt;br /&gt;
Note: this downtime was originally scheduled for Dec. 18, 2018, but has been postponed and combined with the annual maintenance downtime.&lt;br /&gt;
&lt;br /&gt;
'''December 24, 2018, 11:35 AM EST:''' Most systems are operational again. If you had compute jobs running yesterday at around 3:30PM, they likely crashed - please check them and resubmit if needed.&lt;br /&gt;
&lt;br /&gt;
'''December 24, 2018, 10:40 AM EST:''' Repairs have been made, and the file systems are starting to be mounted on the cluster. &lt;br /&gt;
&lt;br /&gt;
'''December 23, 2018, 3:38 PM EST:''' Issues with the file systems (home, scratch and project). We are investigating; it looks like a hardware issue that we are trying to work around. Note that the absence of /home means you cannot log in with ssh keys. All compute jobs crashed around 3:30 PM EST on Dec 23. Once the system is properly up again, please resubmit your jobs.  Unfortunately, at this time of year, it is not possible to give an estimate on when the system will be operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Nov 22 14:20:00 EST 2018''': &amp;lt;font color=green&amp;gt;HPSS back in service&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Thu Nov 22 08:55:00 EST 2018''': &amp;lt;font color=red&amp;gt;HPSS offline for scheduled maintenance&amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 20 16:30:00 EST 2018''': HPSS will be offline on Thursday at 9 AM for installation of new LTO8 drives in the tape library.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct  9 12:16:00 EDT 2018''':  BGQ compute nodes are up.  &lt;br /&gt;
&lt;br /&gt;
'''Sun Oct  7 20:24:26 EDT 2018''':  SGC and BGQ front end are available,  BGQ compute nodes down related to a cooling issue.  &lt;br /&gt;
&lt;br /&gt;
'''Sat Oct  6 23:16:44 EDT 2018''':  There were some problems bringing up SGC &amp;amp; BGQ, they will remain offline for now.&lt;br /&gt;
&lt;br /&gt;
'''Sat Oct  6 18:36:35 EDT 2018''':  Electrical work finished, power restored. Systems are coming online.&lt;br /&gt;
&lt;br /&gt;
'''July 18, 2018:''' login.scinet.utoronto.ca is now disabled; GPC $SCRATCH and $HOME are decommissioned.&lt;br /&gt;
&lt;br /&gt;
'''July 12, 2018:''' There was a short power interruption around 10:30 am which caused most of the systems (Niagara, SGC, BGQ) to reboot and any running jobs to fail. &lt;br /&gt;
&lt;br /&gt;
'''July 11, 2018:''' P7's moved to BGQ filesystem, P8's moved to Niagara filesystem.&lt;br /&gt;
&lt;br /&gt;
'''May 24, 2018, 9:25 PM EST:''' The data center is up, and all systems are operational again.&lt;br /&gt;
&lt;br /&gt;
'''May 24, 2018, 7:00 AM EST:''' The data centre is under annual maintenance. All systems are offline. Systems are expected to be back late afternoon today; check for updates on this page.&lt;br /&gt;
&lt;br /&gt;
'''May 18, 2018:''' Announcement: Annual scheduled maintenance downtime: Thursday May 24, starting 7:00 AM&lt;br /&gt;
&lt;br /&gt;
'''May 16, 2018:''' Cooling  restored, systems online&lt;br /&gt;
&lt;br /&gt;
'''May 16, 2018:''' Cooling issue at datacentre again, all systems down&lt;br /&gt;
&lt;br /&gt;
'''May 15, 2018:''' Cooling restored, systems coming online&lt;br /&gt;
&lt;br /&gt;
'''May 15, 2018''' Cooling issue at datacentre, all systems down&lt;br /&gt;
&lt;br /&gt;
'''May 4, 2018:''' [[HPSS]] is now operational on Niagara.&lt;br /&gt;
&lt;br /&gt;
'''May 3, 2018:''' [[Burst Buffer]] is available upon request.&lt;br /&gt;
&lt;br /&gt;
'''May 3, 2018:''' The [https://docs.computecanada.ca/wiki/Globus Globus] endpoint for Niagara is available: computecanada#niagara.&lt;br /&gt;
&lt;br /&gt;
'''May 1, 2018:''' System status moved here.&lt;br /&gt;
&lt;br /&gt;
'''Apr 23, 2018:''' GPC-compute is decommissioned, GPC-storage available until 30 May 2018.&lt;br /&gt;
&lt;br /&gt;
'''April 10, 2018:''' Niagara commissioned.&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7544</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7544"/>
		<updated>2026-02-17T21:45:58Z</updated>

		<summary type="html">&lt;p&gt;Rzon: Undo revision 7541 by Rzon (talk)&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Down3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Partial | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
'''Fri Jan 16, 2026, 11:00 pm:''' HPSS is back online, and accessible via the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 15, 2026, 10:00 pm:''' HPSS will undergo maintenance on Friday morning, Jan 16, 2026, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
'''Tue Jan 6, 2026, 10:15 am:''' OnDemand has been fixed and is working again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 5, 2026, 9:00 pm:''' The authentication mechanism of OnDemand is not working.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 31, 2025, 12:40 pm:''' We believe the problem has now been resolved.  Please let us know if you still experience login problems or aborted jobs.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 30, 2025, 2:10 pm:''' We are experiencing problems with authentication, resulting in failed logins, OOD errors, and aborted jobs (with &amp;quot;prolog error&amp;quot;).  Please bear with us, as we are very short-staffed during the holiday break.  We will post updates here.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 3, 2025, 11:30 am:''' Open OnDemand is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Nov 29, 2025, 12:40 am:''' There has been a problem with the water chiller. Some systems are offline.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 12:55 pm:''' Balam is back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 10:00 am:''' Open OnDemand is back online.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 11:00 pm:''' Most of the work is done; data movers, Globus, and HPSS are back online. Remaining services will be worked on tomorrow.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 8:30 am:''' Scheduled network maintenance. The Trillium cluster is *not* affected.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 5:30 pm:''' Balam maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 7:00 am:''' Balam maintenance day.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 15, 2025, 3:55 pm:''' Trillium inbound connections through trillium.alliancecan.ca or trillium.scinet.utoronto.ca are working again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 15, 2025, 3:05 pm:''' Trillium is experiencing external network issues with incoming traffic. Please try: ssh USERNAME@tri-login01.scinet.utoronto.ca in the meantime.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 06, 2025, 8:00 pm:''' HPSS is fully functional. You may submit archive jobs from Trillium login nodes, datamovers, and robots.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 03, 2025, 6:30 pm:''' HPSS is back online, and already accessible via the alliancecan#hpss Globus endpoint. The directory tree now follows that of the other Alliance clusters. We are still working on job submission via Slurm.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 01, 2025, 0:00 am:''' Niagara compute nodes are now unavailable for regular users. The login nodes will remain available for a while to allow a few last data transfers, although transfers from the Niagara file systems to Trillium are best done on nia-dm1.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 01, 2025, 9:30 am:''' HPSS is down for scheduled maintenance, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[Niagara Quickstart]]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules specific to Niagara|Software Modules specific to Niagara]] &lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7541</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7541"/>
		<updated>2026-02-17T21:39:32Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Down3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Down3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Down | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | HPSS|HPSS}}&lt;br /&gt;
|{{Down | Balam|Balam}}&lt;br /&gt;
|{{Down | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Down | Teach|Teach}}&lt;br /&gt;
|{{Down3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Down3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 4:35 pm:''' SciNet data centre went down. We are investigating.&lt;br /&gt;
&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
'''Fri Jan 16, 2026, 11:00 pm:''' HPSS is back online, and accessible via the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 15, 2026, 10:00 pm:''' HPSS will undergo maintenance on Friday morning, Jan 16, 2026, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
'''Tue Jan 6, 2026, 10:15 am:''' OnDemand has been fixed and is working again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 5, 2026, 9:00 pm:''' The authentication mechanism of OnDemand is not working.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 31, 2025, 12:40 pm:''' We believe the problem has now been resolved.  Please let us know if you still experience login problems or aborted jobs.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 30, 2025, 2:10 pm:''' We are experiencing problems with authentication, resulting in failed logins, OOD errors, and aborted jobs (with &amp;quot;prolog error&amp;quot;).  Please bear with us, as we are very short-staffed during the holiday break.  We will post updates here.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 3, 2025, 11:30 am:''' Open OnDemand is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Nov 29, 2025, 12:40 am:''' There has been a problem with the water chiller. Some systems are offline.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 12:55 pm:''' Balam is back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 10:00 am:''' Open OnDemand is back online.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 11:00 pm:''' Most of the work is done; data movers, Globus, and HPSS are back online. Remaining services will be worked on tomorrow.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 8:30 am:''' Scheduled network maintenance. The Trillium cluster is *not* affected.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 5:30 pm:''' Balam maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 7:00 am:''' Balam maintenance day.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 15, 2025, 3:55 pm:''' Trillium inbound connections through trillium.alliancecan.ca or trillium.scinet.utoronto.ca are working again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 15, 2025, 3:05 pm:''' Trillium is experiencing external network issues with incoming traffic. Please try: ssh USERNAME@tri-login01.scinet.utoronto.ca in the meantime.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 06, 2025, 8:00 pm:''' HPSS is fully functional. You may submit archive jobs from Trillium login nodes, datamovers, and robots.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 03, 2025, 6:30 pm:''' HPSS is back online, and already accessible via the alliancecan#hpss Globus endpoint. The directory tree now follows that of the other Alliance clusters. We are still working on job submission via Slurm.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 01, 2025, 0:00 am:''' Niagara compute nodes are now unavailable for regular users. The login nodes will remain available for a while to allow a few last data transfers, although transfers from the Niagara file systems to Trillium are best done on nia-dm1.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 01, 2025, 9:30 am:''' HPSS is down for scheduled maintenance, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[Niagara Quickstart]]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules specific to Niagara|Software Modules specific to Niagara]] &lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7538</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7538"/>
		<updated>2026-02-17T21:04:09Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Down3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Partial | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday. Login nodes and file systems will remain accessible.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
'''Fri Jan 16, 2026, 11:00 pm:''' HPSS is back online, and accessible via the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 15, 2026, 10:00 pm:''' HPSS will undergo maintenance on Friday morning, Jan 16, 2026, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
'''Tue Jan 6, 2026, 10:15 am:''' OnDemand has been fixed and is working again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 5, 2026, 9:00 pm:''' The authentication mechanism of OnDemand is not working.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 31, 2025, 12:40 pm:''' We believe the problem has now been resolved.  Please let us know if you still experience login problems or aborted jobs.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 30, 2025, 2:10 pm:''' We are experiencing problems with authentication, resulting in failed logins, OOD errors, and aborted jobs (with &amp;quot;prolog error&amp;quot;).  Please bear with us, as we are very short-staffed during the holiday break.  We will post updates here.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 3, 2025, 11:30 am:''' Open OnDemand is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Nov 29, 2025, 12:40 am:''' There has been a problem with the water chiller. Some systems are offline.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 12:55 pm:''' Balam is back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 10:00 am:''' Open OnDemand is back online.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 11:00 pm:''' Most of the work is done; data movers, Globus, and HPSS are back online. Remaining services will be worked on tomorrow.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 8:30 am:''' Scheduled network maintenance. The Trillium cluster is *not* affected.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 5:30 pm:''' Balam maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 7:00 am:''' Balam maintenance day.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 15, 2025, 3:55 pm:''' Trillium inbound connections through trillium.alliancecan.ca or trillium.scinet.utoronto.ca are working again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 15, 2025, 3:05 pm:''' Trillium is experiencing external network issues with incoming traffic. Please try: ssh USERNAME@tri-login01.scinet.utoronto.ca in the meantime.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 06, 2025, 8:00 pm:''' HPSS is fully functional. You may submit archive jobs from Trillium login nodes, datamovers, and robots.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 03, 2025, 6:30 pm:''' HPSS is back online, and already accessible via the alliancecan#hpss Globus endpoint. The directory tree now follows that of the other Alliance clusters. We are still working on job submission via Slurm.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 01, 2025, 0:00 am:''' Niagara compute nodes are now unavailable for regular users. The login nodes will remain available for a while to allow a few last data transfers, although transfers from the Niagara file systems to Trillium are best done on nia-dm1.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 01, 2025, 9:30 am:''' HPSS is down for scheduled maintenance, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[Niagara Quickstart]]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules specific to Niagara|Software Modules specific to Niagara]] &lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7535</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7535"/>
		<updated>2026-02-17T21:03:44Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Down3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Partial | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
'''Tue Feb 17, 2026, 8:40 am:''' Power outage at the data centre.  Cooling issues have developed as a result.  Major systems (Trillium, S4H) are expected to be down until sometime Thursday.&lt;br /&gt;
&lt;br /&gt;
'''Mon Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
'''Fri Jan 16, 2026, 11:00 pm:''' HPSS is back online, and accessible via the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 15, 2026, 10:00 pm:''' HPSS will undergo maintenance on Friday morning, Jan 16, 2026, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
'''Tue Jan 6, 2026, 10:15 am:''' OnDemand has been fixed and is working again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 5, 2026, 9:00 pm:''' The authentication mechanism of OnDemand is not working.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 31, 2025, 12:40 pm:''' We believe the problem has now been resolved.  Please let us know if you still experience login problems or aborted jobs.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 30, 2025, 2:10 pm:''' We are experiencing problems with authentication, resulting in failed logins, OOD errors, and aborted jobs (with &amp;quot;prolog error&amp;quot;).  Please bear with us, as we are very short-staffed during the holiday break.  We will post updates here.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 3, 2025, 11:30 am:''' Open OnDemand is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Nov 29, 2025, 12:40 am:''' There has been a problem with the water chiller. Some systems are offline.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 12:55 pm:''' Balam is back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 10:00 am:''' Open OnDemand is back online.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 11:00 pm:''' Most of the work is done; data movers, Globus, and HPSS are back online. Remaining services will be worked on tomorrow.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 8:30 am:''' Scheduled network maintenance. The Trillium cluster is *not* affected.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 5:30 pm:''' Balam maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 7:00 am:''' Balam maintenance day.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 15, 2025, 3:55 pm:''' Trillium inbound connections through trillium.alliancecan.ca or trillium.scinet.utoronto.ca are working again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 15, 2025, 3:05 pm:''' Trillium is experiencing external network issues with incoming traffic. Please try: ssh USERNAME@tri-login01.scinet.utoronto.ca in the meantime.&lt;br /&gt;
&lt;br /&gt;
'''Mon Oct 06, 2025, 8:00 pm:''' HPSS is fully functional. You may submit archive jobs from Trillium login nodes, datamovers, and robots.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 03, 2025, 6:30 pm:''' HPSS is back online, and already accessible via the alliancecan#hpss Globus endpoint. The directory tree now follows that of the other Alliance clusters. We are still working on job submission via Slurm.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 01, 2025, 0:00 am:''' Niagara compute nodes are now unavailable for regular users. The login nodes will remain available for a while to allow a few last data transfers, although transfers from the Niagara file systems to Trillium are best done on nia-dm1.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 01, 2025, 9:30 am:''' HPSS is down for scheduled maintenance, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[Niagara Quickstart]]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules specific to Niagara|Software Modules specific to Niagara]] &lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7529</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7529"/>
		<updated>2026-02-17T03:49:23Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Partial3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Partial3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Partial | Balam|Balam}}&lt;br /&gt;
|{{Partial | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Partial | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
'''Thu Feb 16, 2026, 8:40 pm:''' Electricity is unstable in the data centre area due to severe snowfall.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 1:40 pm:''' All services are operational again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 12:00 pm:''' The Trillium and Open OnDemand compute nodes are operational again. We are still working on bringing Balam, Neptune and S4H nodes up again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 16, 2026, 11:00 pm:''' HPSS is back online, and accessible via alliancecan#hpss Globus endpoint. &lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 15, 2026, 10:00 pm:''' HPSS will undergo maintenance on Friday morning, Jan 16, 2026, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
'''Tue Jan 6, 2026, 10:15 am:''' OnDemand has been fixed and is working again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 5, 2026, 9:00 pm:''' The authentication mechanism of OnDemand is not working.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 31, 2025, 12:40 pm:''' We believe the problem has now been resolved.  Please let us know if you still experience login problems or aborted jobs.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 30, 2025, 2:10 pm:''' We are experiencing problems with authentication, resulting in failed logins, OOD errors, and aborted jobs (with &amp;quot;prolog error&amp;quot;).  Please bear with us, as we are very short-staffed during the holiday break.  We will post updates here.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 3, 2025, 11:30 am:''' Open OnDemand is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Nov 29, 2025, 00:40 am:''' There has been a problem with the water chiller. Some systems are offline.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 12:55 pm:''' Balam is back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 10:00 am:''' Open OnDemand is back online.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 11:00 pm:''' Most of the work is done, data movers, Globus, and HPSS are back online. Remaining services will be worked on tomorrow.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 8:30 am:''' Scheduled network maintenance. Trillium cluster is *not* affected.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 5:30 pm:''' Balam maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 7:00 am:''' Balam maintenance day.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 15, 2025, 3:55 pm:''' Trillium inbound connections through trillium.alliancecan.ca or trillium.scinet.utoronto.ca are working again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 15, 2025, 3:05 pm:''' Trillium is experiencing external network issues for incoming traffic. Please try ssh USERNAME@tri-login01.scinet.utoronto.ca in the meantime.&lt;br /&gt;
 &lt;br /&gt;
'''Thu Oct 06, 2025, 8:00 pm:''' HPSS is fully functional. You may submit archive jobs from trillium login nodes, datamovers and robots.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 03, 2025, 6:30 pm:''' HPSS is back online and already accessible via the alliancecan#hpss Globus endpoint. The directory tree now follows the other Alliance clusters. We're still working on job submission via Slurm.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 01, 2025, 0:00 am:''' Niagara compute nodes are now unavailable for regular users. The login nodes will remain available for a while to allow a few last data transfers, although transfers from the Niagara file systems to Trillium are best done on nia-dm1.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 01, 2025, 9:30 am:''' HPSS is down for scheduled maintenance, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[Niagara Quickstart]]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules specific to Niagara|Software Modules specific to Niagara]] &lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Teach&amp;diff=7520</id>
		<title>Teach</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Teach&amp;diff=7520"/>
		<updated>2026-02-13T17:25:35Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* Teaching Cluster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] &lt;br /&gt;
|name=Teach Cluster &lt;br /&gt;
|installed=(orig Mar 2020), Feb 2025&lt;br /&gt;
|operatingsystem= Linux (Rocky 9.6)&lt;br /&gt;
|loginnode=teach-login01&lt;br /&gt;
|nnodes=64 &lt;br /&gt;
|rampernode=188 GiB / 202 GB &lt;br /&gt;
|corespernode=40 &lt;br /&gt;
|interconnect=Infiniband (EDR)&lt;br /&gt;
|vendorcompilers=gcc,intel&lt;br /&gt;
|queuetype=slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Teaching Cluster ==&lt;br /&gt;
&lt;br /&gt;
The Teach cluster is a 2560-core cluster provided for teaching purposes.  It currently has no GPU capability.&lt;br /&gt;
It is configured similarly to [https://scinethpc.ca/trillium/ Trillium]; however, it uses hardware repurposed from its predecessor, [https://scinethpc.ca/niagara Niagara].  This system should not be used for production work, as the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time.  Questions about its use or problems should be sent to '''support@scinet.utoronto.ca'''.&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
This cluster currently consists of 64 repurposed x86_64 nodes, each with 40 cores (two 20-core Intel Cascade Lake CPUs) running at 2.5 GHz and 188 GiB of RAM per node.&lt;br /&gt;
The nodes are interconnected with 1:1 non-blocking EDR Infiniband for MPI communications, and disk I/O to a separate view of the VAST file system.  In total, this cluster contains 2560 cores.&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
Teach runs Rocky Linux 9.  You will need to be somewhat familiar with Linux systems to work on Teach.  If you are not, it will be worth your time to review our [https://education.scinet.utoronto.ca/tag/index.php?tag=SCMP101 Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} systems, access to Teach is done via [[SSH]] (secure shell) only.  Open a terminal window (e.g. using [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or  [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm] on Windows), and type&lt;br /&gt;
 ssh -Y USERNAME@teach.scinet.utoronto.ca&lt;br /&gt;
This will bring you directly to the command line of '''&amp;lt;tt&amp;gt;teach-login01&amp;lt;/tt&amp;gt;''' or '''&amp;lt;tt&amp;gt;teach-login02&amp;lt;/tt&amp;gt;''', which are the gateway/devel nodes for this cluster.&lt;br /&gt;
On these nodes, you can compile, do short tests, and submit your jobs to the queue.&lt;br /&gt;
&lt;br /&gt;
The first time you log in to the Teach cluster, please verify that the login node's SSH key fingerprint matches. [[Teach_fingerprints | See here how]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The login nodes are shared between students of a number of different courses. Use these nodes to develop and compile code, to run short tests, and to submit computations to the scheduler (see below).&lt;br /&gt;
&lt;br /&gt;
Note that access to the Teach cluster is restricted to temporary accounts that start with the prefix '''lcl_uot''' + the course code + '''s''', and a number.  Passwords for these accounts can be changed on the [https://portal.scinet.utoronto.ca/portaluserlogin SciNet user portal]. On the same site, users can upload a public ssh key if they want to connect using ssh keys.&lt;br /&gt;
&lt;br /&gt;
== Software Modules ==&lt;br /&gt;
&lt;br /&gt;
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available.  A detailed explanation of the module system can be [[Using_modules | found on the modules page]].  &lt;br /&gt;
&lt;br /&gt;
The Teach cluster makes the same [https://docs.alliancecan.ca/wiki/Available_software modules available] as on the [https://docs.alliancecan.ca General Purpose clusters of the Digital Research Alliance of Canada], with one caveat.  On Teach, by default, only the &amp;quot;gentoo&amp;quot; module is loaded, which provides basic OS-level functionality.  &lt;br /&gt;
&lt;br /&gt;
Common module subcommands are:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;: load the default version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt;: load a specific version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt;: unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;): list available software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt;: list loadable software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;: list loaded modules.&lt;br /&gt;
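As an illustration, the subcommands above can be combined in a typical session. This is a sketch that only runs on the cluster itself (the &lt;code&gt;module&lt;/code&gt; command is not available elsewhere, and the package name &lt;code&gt;gcc&lt;/code&gt; is an assumption about what is installed):

```shell
# Sketch of a typical module workflow on a Teach login node.
module purge          # start from a clean environment
module spider gcc     # see which gcc versions are available
module load gcc       # load the default gcc version
module list           # confirm which modules are now loaded
```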
&lt;br /&gt;
For example, to make the GNU compilers (gcc, g++ and gfortran) available, you should type&lt;br /&gt;
&lt;br /&gt;
 module load gcc&lt;br /&gt;
&lt;br /&gt;
while the Intel compilers (icc, icpc and ifort) can be loaded by&lt;br /&gt;
&lt;br /&gt;
 module load intel&lt;br /&gt;
&lt;br /&gt;
To get the default modules that are loaded on the General Purpose clusters, you can load the &amp;quot;StdEnv&amp;quot; module. &lt;br /&gt;
&lt;br /&gt;
Along with modifying common environment variables, such as the PATH, these modules also create an '''EBROOT&amp;lt;MODULENAME&amp;gt;''' environment variable, which can be used to access commonly needed software directories, such as /include and /lib.&lt;br /&gt;
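For example, the EBROOT variables can be used on compile lines. The following sketch assumes a &lt;code&gt;gsl&lt;/code&gt; module exists and that its installation uses the usual include/lib layout; it only works on the cluster after loading the modules:

```shell
# Use EBROOT variables to locate headers and libraries (illustrative).
module load gcc gsl            # 'gsl' is an assumed module name
echo $EBROOTGSL                # prefix of the loaded GSL installation
# Point the compiler at that prefix when building against GSL:
gcc -I$EBROOTGSL/include -L$EBROOTGSL/lib -o app app.c -lgsl -lgslcblas -lm
```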
&lt;br /&gt;
There are handy abbreviations for the module commands. &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A list of available software modules can be found on [https://docs.alliancecan.ca/wiki/Available_software this page].&lt;br /&gt;
&lt;br /&gt;
There are a few additional modules available as well, and more can be made available at the request of course instructors.  Currently, the only additional modules are:&lt;br /&gt;
&lt;br /&gt;
 catch2/3.3.1      - A C++ test framework for unit-tests, TDD and BDD using C++14 and later.&lt;br /&gt;
 misopy/0.5.2      - A probabilistic framework to analyze RNA-Seq data.&lt;br /&gt;
 palemoon/33.6.0.1 - A web browser&lt;br /&gt;
&lt;br /&gt;
== Interactive jobs ==&lt;br /&gt;
&lt;br /&gt;
For an interactive session on a compute node of the Teach cluster with access to dedicated (non-shared) resources, use the 'debugjob' command:&lt;br /&gt;
 teach01:~$ debugjob -n C&lt;br /&gt;
where C is the number of cores. An interactive session defaults to four hours when using at most one node (C&amp;lt;=40), and becomes 60 minutes when using four nodes (i.e., 120&amp;lt;C&amp;lt;=160), which is the maximum number of nodes allowed for an interactive session by debugjob.&lt;br /&gt;
                                                             &lt;br /&gt;
For a short interactive session on a dedicated compute node of the Teach cluster, use the 'debugjob' command as follows:&lt;br /&gt;
 teach01:~$ debugjob N&lt;br /&gt;
where N is the number of nodes.  On the Teach cluster, this is equivalent to &amp;lt;tt&amp;gt;debugjob -n 40*N &amp;lt;/tt&amp;gt;. The positive integer number &amp;lt;tt&amp;gt;N&amp;lt;/tt&amp;gt; can at most be 4.&lt;br /&gt;
&lt;br /&gt;
If no arguments are given to &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt;, it allocates a single core on a Teach compute node.&lt;br /&gt;
&lt;br /&gt;
There are limits on the resources you can get with a debugjob, and how long you can get them.  No debugjob can run longer than four hours or use more than 160 cores, and each user can only run one at a time.  For longer computations, jobs must be submitted to the scheduler.&lt;br /&gt;
&lt;br /&gt;
== Submit a Job ==&lt;br /&gt;
&lt;br /&gt;
Teach uses Slurm as its job scheduler.  More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You submit jobs from a login node by passing a script to the sbatch command:&lt;br /&gt;
&lt;br /&gt;
 teach-login01:~$ sbatch jobscript.sh&lt;br /&gt;
&lt;br /&gt;
This puts the job in the queue. It will run on the compute nodes in due course.&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
* Make sure to adjust the flags --ntasks-per-node or --ntasks, together with --nodes, accordingly for the examples found on the [[Slurm | Slurm page]].&lt;br /&gt;
* The current Slurm configuration of the Teach cluster allocates compute resources by core rather than by node. This means your tasks might land on nodes that have other jobs running, i.e., they might share the node. To avoid that, add the following directive to your submission script: #SBATCH --exclusive. This forces your job to use its compute nodes exclusively.&lt;br /&gt;
* The maximum wall time is currently set to 4 hours.&lt;br /&gt;
* There are 2 queues available: the compute queue and the debug queue. Their usage limits are listed in the table below.&lt;br /&gt;
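Putting these notes together, a minimal job script for the Teach cluster might look like the following sketch. The executable name is a placeholder, and the directive values are illustrative rather than recommended settings:

```shell
#!/bin/bash
# Minimal Slurm job script sketch for the Teach cluster (values illustrative).
#SBATCH --job-name=example        # a name for the job
#SBATCH --nodes=1                 # number of nodes
#SBATCH --ntasks-per-node=40      # use all 40 cores of the node
#SBATCH --time=01:00:00           # requested walltime; the maximum is 4 hours

module load gcc                   # load whatever modules the program needs
./my_program                      # 'my_program' is a placeholder executable
```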
&lt;br /&gt;
== Limits ==&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It also matters in which 'partition' the job runs; 'partitions' are Slurm-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc; if you do not specify one, your job will run in the compute partition, which is the most common case.&lt;br /&gt;
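For example, the partition can be selected on the command line (a sketch; jobscript.sh is a placeholder and these commands only work on the cluster):

```shell
# Submit to the default 'compute' partition:
sbatch jobscript.sh
# Submit a short test job to the 'debug' partition instead:
sbatch -p debug jobscript.sh
```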
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Running jobs&lt;br /&gt;
!Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Interactive testing or troubleshooting || debug || 1 || 1 || 1 core || 4 nodes (160 cores)|| N/A || 4 hours&lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 1 || 12 || 1 core || 4 nodes (160 cores)|| 15 minutes || 4 hours&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Within these limits, jobs may still have to wait in the queue.  Although there are no allocations on the teach cluster, the waiting time still depends on many factors, such as the number of nodes and the wall time, how many other jobs are waiting in the queue, and whether a job can fill an otherwise unused spot in the schedule.&lt;br /&gt;
&lt;br /&gt;
== Main changes from Teach's predecessor ==&lt;br /&gt;
&lt;br /&gt;
This section is intended for instructors who may have used SciNet's previous Teach cluster.&lt;br /&gt;
&lt;br /&gt;
This Teach cluster is set up differently from its predecessor.&lt;br /&gt;
&lt;br /&gt;
Although the cluster is once again called ''Teach'' and you connect to teach.scinet.utoronto.ca as before, the system is set up differently from the previous Teach cluster in the following ways:&lt;br /&gt;
&lt;br /&gt;
* There are now 2 dedicated login nodes, teach-login01 and teach-login02.  &lt;br /&gt;
* The ssh fingerprints for these login nodes can be found on [[Teach_fingerprints]].&lt;br /&gt;
* The compute nodes have 40 cores.  As before, you can request jobs by number of cores.&lt;br /&gt;
* Only temporary lcl_uot.... accounts can log in.&lt;br /&gt;
* Only the home directories of those account are mounted. &lt;br /&gt;
* In particular, the file systems from the other SciNet compute clusters (Niagara, Mist, Trillium,... ) are not and will not be mounted.  You'll need to copy over any files that you need to use on the Teach cluster.&lt;br /&gt;
* There is no $SCRATCH.  You can do all your work in $HOME, which is writable from the compute nodes.&lt;br /&gt;
* The software stack is the one supplied by the Alliance.  There is no need to load 'CCEnv' to get them.&lt;br /&gt;
* But as before, if you're missing a module, we can still install it for you.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Do this later&lt;br /&gt;
== Running Jupyter on a Teach Compute Node ==  &lt;br /&gt;
&lt;br /&gt;
1. To be able to run Jupyter on a compute node, you must first (a) install it inside a virtual environment, (b) enable a way for jupyter to seemingly write to a specific directory on $HOME, and (c) create a little helper script called &amp;lt;tt&amp;gt;notebook.sh&amp;lt;/tt&amp;gt; that will be used to start the jupyter server in step 2.  These are the commands that you should use for the installation (which you should do only once, on the Teach login node):&lt;br /&gt;
&lt;br /&gt;
(a) Create virtual env&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load python/3.9.10&lt;br /&gt;
$ virtualenv --system-site-packages $HOME/.virtualenvs/jupteach&lt;br /&gt;
$ source $HOME/.virtualenvs/jupteach/bin/activate&lt;br /&gt;
$ pip install jupyter jupyterlab&lt;br /&gt;
$ deactivate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
You can choose another directory than &amp;lt;tt&amp;gt;$HOME/.virtualenvs/jupteach&amp;lt;/tt&amp;gt; for where to create the virtual environment, but you need to be consistent and use the same directory everywhere below.&lt;br /&gt;
&lt;br /&gt;
(b) Make a writable 'runtime' directory for Jupyter. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mkdir -p $HOME/.local/share/jupyter/runtime &lt;br /&gt;
$ mv -f $HOME/.local/share/jupyter/runtime $SCRATCH/jupyter_runtime || mkdir $SCRATCH/jupyter_runtime&lt;br /&gt;
$ ln -sT $SCRATCH/jupyter_runtime $HOME/.local/share/jupyter/runtime&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(c) Create a launch script to use on the compute nodes:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ cat &amp;gt; $HOME/.virtualenvs/jupteach/bin/notebook.sh &amp;lt;&amp;lt;EOF&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
source \$HOME/.virtualenvs/jupteach/bin/activate&lt;br /&gt;
export XDG_DATA_HOME=\$SCRATCH/.share&lt;br /&gt;
export XDG_CACHE_HOME=\$SCRATCH/.cache&lt;br /&gt;
export XDG_CONFIG_HOME=\$SCRATCH/.config&lt;br /&gt;
export XDG_RUNTIME_DIR=\$SCRATCH/.runtime&lt;br /&gt;
export JUPYTER_CONFIG_DIR=\$SCRATCH/.config/.jupyter&lt;br /&gt;
jupyter \${1:-notebook} --ip \$(hostname -f) --no-browser --notebook-dir=\$PWD&lt;br /&gt;
EOF&lt;br /&gt;
$ chmod +x  $HOME/.virtualenvs/jupteach/bin/notebook.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. To run the jupyter server on a compute node, start an interactive session with the &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt; command and then launch the jupyter server:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ debugjob -n 16                    # request fewer if you need fewer cores.&lt;br /&gt;
$ cd $SCRATCH                       # $HOME is read-only, so move to $SCRATCH&lt;br /&gt;
$ $HOME/.virtualenvs/jupteach/bin/notebook.sh  # add the argument &amp;quot;lab&amp;quot; to start with the jupyter lab&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Make sure you note down (a) the name of the compute node you were allocated (names start with &amp;quot;&amp;lt;tt&amp;gt;teach&amp;lt;/tt&amp;gt;&amp;quot; followed by a two-digit number), (b) the port number after the colon following the node name (usually 8888, but it can be another, higher number); this is the PORT, and (c) the last URL that notebook.sh tells you to use to connect.&lt;br /&gt;
&lt;br /&gt;
3. To connect to this Jupyter server running on a Teach compute node, which is not accessible from the internet, open a different terminal on your own computer and reconnect to the Teach cluster with a port-forwarding&lt;br /&gt;
tunnel to the compute node on which Jupyter is running:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -LPORT:teachXX:PORT -o ControlMaster=no USERNAME@teach.scinet.utoronto.ca  -N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;tt&amp;gt;teachXX&amp;lt;/tt&amp;gt; is to be replaced by the name of the compute node (point (a) above), PORT is to be replaced by the port number that notebook.sh showed, and &amp;lt;tt&amp;gt;USERNAME&amp;lt;/tt&amp;gt; should be your Teach account username. This command will appear to &amp;quot;hang&amp;quot;; it only serves to forward port PORT (usually 8888) on your computer to port PORT on the compute node.&lt;br /&gt;
&lt;br /&gt;
Finally, point your browser to the URL that the &amp;lt;tt&amp;gt;notebook.sh&amp;lt;/tt&amp;gt; command printed out (point (c) above), i.e., the one with 127.0.0.1 in it.&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Teach&amp;diff=7517</id>
		<title>Teach</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Teach&amp;diff=7517"/>
		<updated>2026-02-13T17:25:13Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* Teaching Cluster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] &lt;br /&gt;
|name=Teach Cluster &lt;br /&gt;
|installed=(orig Mar 2020), Feb 2025&lt;br /&gt;
|operatingsystem= Linux (Rocky 9.6)&lt;br /&gt;
|loginnode=teach-login01&lt;br /&gt;
|nnodes=64 &lt;br /&gt;
|rampernode=188 GiB / 202 GB &lt;br /&gt;
|corespernode=40 &lt;br /&gt;
|interconnect=Infiniband (EDR)&lt;br /&gt;
|vendorcompilers=gcc,intel&lt;br /&gt;
|queuetype=slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Teaching Cluster ==&lt;br /&gt;
&lt;br /&gt;
The Teach cluster is a 2560-core cluster provided for teaching purposes.  It currently has no GPU capability.&lt;br /&gt;
It is configured similarly to [https://scinethpc.ca/trillium/ Trillium]; however, it uses hardware repurposed from its predecessor, [https://scinethpc.ca/niagara Niagara].  This system should not be used for production work, as the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time.  Questions about its use or problems should be sent to '''support@scinet.utoronto.ca'''.&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
This cluster currently consists of 64 repurposed x86_64 nodes, each with 40 cores (two 20-core Intel Cascade Lake CPUs) running at 2.5 GHz and 188 GiB of RAM per node.&lt;br /&gt;
The nodes are interconnected with 1:1 non-blocking EDR Infiniband for MPI communications, and disk I/O to a separate view of the VAST file system.  In total, this cluster contains 2560 cores.&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
Teach runs Rocky Linux 9.  You will need to be somewhat familiar with Linux systems to work on Teach.  If you are not, it will be worth your time to review our [https://education.scinet.utoronto.ca/tag/index.php?tag=SCMP101 Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} systems, access to Teach is done via [[SSH]] (secure shell) only.  Open a terminal window (e.g. using [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or  [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm] on Windows), and type&lt;br /&gt;
 ssh -Y USERNAME@teach.scinet.utoronto.ca&lt;br /&gt;
This will bring you directly to the command line of '''&amp;lt;tt&amp;gt;teach-login01&amp;lt;/tt&amp;gt;''' or '''&amp;lt;tt&amp;gt;teach-login02&amp;lt;/tt&amp;gt;''', which are the gateway/devel nodes for this cluster.&lt;br /&gt;
On these nodes, you can compile, do short tests, and submit your jobs to the queue.&lt;br /&gt;
&lt;br /&gt;
The first time you log in to the Teach cluster, please verify that the login node's SSH key fingerprint matches. [[Teach_fingerprints | See here how]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The login nodes are shared between students of a number of different courses. Use these nodes to develop and compile code, to run short tests, and to submit computations to the scheduler (see below).&lt;br /&gt;
&lt;br /&gt;
Note that access to the Teach cluster is restricted to temporary accounts that start with the prefix '''lcl_uot''' + the course code + '''s''', and a number.  Passwords for these accounts can be changed on the [https://portal.scinet.utoronto.ca/portaluserlogin SciNet user portal]. On the same site, users can upload a public ssh key if they want to connect using ssh keys.&lt;br /&gt;
&lt;br /&gt;
== Software Modules ==&lt;br /&gt;
&lt;br /&gt;
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available.  A detailed explanation of the module system can be [[Using_modules | found on the modules page]].  &lt;br /&gt;
&lt;br /&gt;
The Teach cluster makes the same [https://docs.alliancecan.ca/wiki/Available_software modules available] as on the [https://docs.alliancecan.ca General Purpose clusters of the Digital Research Alliance of Canada], with one caveat.  On Teach, by default, only the &amp;quot;gentoo&amp;quot; module is loaded, which provides basic OS-level functionality.  &lt;br /&gt;
&lt;br /&gt;
Common module subcommands are:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;: load the default version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt;: load a specific version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt;: unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;): list available software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt;: list loadable software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;: list loaded modules.&lt;br /&gt;
&lt;br /&gt;
For example, to make the GNU compilers (gcc, g++ and gfortran) available, you should type&lt;br /&gt;
&lt;br /&gt;
 module load gcc&lt;br /&gt;
&lt;br /&gt;
while the Intel compilers (icc, icpc and ifort) can be loaded by&lt;br /&gt;
&lt;br /&gt;
 module load intel&lt;br /&gt;
&lt;br /&gt;
To get the default modules that are loaded on the General Purpose clusters, you can load the &amp;quot;StdEnv&amp;quot; module. &lt;br /&gt;
&lt;br /&gt;
Along with modifying common environment variables, such as the PATH, these modules also create an '''EBROOT&amp;lt;MODULENAME&amp;gt;''' environment variable, which can be used to access commonly needed software directories, such as /include and /lib.&lt;br /&gt;
&lt;br /&gt;
There are handy abbreviations for the module commands. &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A list of available software modules can be found on [https://docs.alliancecan.ca/wiki/Available_software this page].&lt;br /&gt;
&lt;br /&gt;
There are a few additional modules available as well, and more can be made available at the request of course instructors.  Currently, the only additional modules are:&lt;br /&gt;
&lt;br /&gt;
 catch2/3.3.1      - A C++ test framework for unit-tests, TDD and BDD using C++14 and later.&lt;br /&gt;
 misopy/0.5.2      - A probabilistic framework to analyze RNA-Seq data.&lt;br /&gt;
 palemoon/33.6.0.1 - A web browser&lt;br /&gt;
&lt;br /&gt;
== Interactive jobs ==&lt;br /&gt;
&lt;br /&gt;
For an interactive session on a compute node of the Teach cluster with access to dedicated (non-shared) resources, use the 'debugjob' command:&lt;br /&gt;
 teach01:~$ debugjob -n C&lt;br /&gt;
where C is the number of cores. An interactive session defaults to four hours when using at most one node (C&amp;lt;=40), and becomes 60 minutes when using four nodes (i.e., 120&amp;lt;C&amp;lt;=160), which is the maximum number of nodes allowed for an interactive session by debugjob.&lt;br /&gt;
                                                             &lt;br /&gt;
For a short interactive session on dedicated compute nodes of the teach cluster, use the 'debugjob' command as follows: &lt;br /&gt;
 teach01:~$ debugjob N&lt;br /&gt;
where N is the number of nodes.  On the Teach cluster, this is equivalent to &amp;lt;tt&amp;gt;debugjob -n 40*N &amp;lt;/tt&amp;gt;. The positive integer number &amp;lt;tt&amp;gt;N&amp;lt;/tt&amp;gt; can at most be 4.&lt;br /&gt;
&lt;br /&gt;
If no arguments are given to &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt;, it allocates a single core on a Teach compute node.&lt;br /&gt;
&lt;br /&gt;
There are limits on the resources you can get with a debugjob, and how long you can get them.  No debugjob can run longer than four hours or use more than 160 cores, and each user can only run one at a time.  For longer computations, jobs must be submitted to the scheduler.&lt;br /&gt;
&lt;br /&gt;
== Submit a Job ==&lt;br /&gt;
&lt;br /&gt;
Teach uses SLURM as its job scheduler.  More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You submit jobs from a login node by passing a script to the sbatch command:&lt;br /&gt;
&lt;br /&gt;
 teach-login01:~$ sbatch jobscript.sh&lt;br /&gt;
&lt;br /&gt;
This puts the job in the queue. It will run on the compute nodes in due course.&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
* Make sure to adjust the flags --ntasks-per-node or --ntasks, together with --nodes, accordingly for the examples found on the [[Slurm | Slurm page]]. &lt;br /&gt;
* The current Slurm configuration of the teach cluster allocates compute resources by core rather than by node. That means your tasks might land on nodes that have other jobs running, i.e., they might share the node. If you want to avoid that, add the following directive to your submission script: #SBATCH --exclusive. This forces your job to use its compute nodes exclusively.&lt;br /&gt;
* The maximum wall time is currently set to 4 hours.&lt;br /&gt;
* There are two queues available: the compute queue and the debug queue. Their usage limits are listed in the table below.&lt;br /&gt;
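As a minimal sketch, a job script for this cluster could look like the following; the resource values and job name are illustrative only, and the echo line stands in for an actual computation:&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --nodes=1               # at most 4 nodes per job on Teach
#SBATCH --ntasks-per-node=40    # each Teach node has 40 cores
#SBATCH --time=01:00:00         # maximum walltime is 4 hours
#SBATCH --job-name=example      # illustrative job name

# Replace this with your actual computation; SLURM_NTASKS is set by
# the scheduler when the job runs.
msg="Running on $(hostname) with ${SLURM_NTASKS:-unknown} tasks"
echo "$msg"
```

Submit it with sbatch jobscript.sh as shown above; add #SBATCH --exclusive if the job should not share its nodes with other jobs.&lt;br /&gt;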
&lt;br /&gt;
== Limits ==&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc; if you do not specify one, your job will run in the compute partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Running jobs&lt;br /&gt;
!Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Interactive testing or troubleshooting || debug || 1 || 1 || 1 core || 4 nodes (160 cores)|| N/A || 4 hours&lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 1 || 12 || 1 core || 4 nodes (160 cores)|| 15 minutes || 4 hours&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Within these limits, jobs may still have to wait in the queue.  Although there are no allocations on the teach cluster, the waiting time still depends on many factors, such as the number of nodes and the wall time, how many other jobs are waiting in the queue, and whether a job can fill an otherwise unused spot in the schedule.&lt;br /&gt;
&lt;br /&gt;
== Main changes from Teach's predecessor ==&lt;br /&gt;
&lt;br /&gt;
This section is intended for instructors who may have used SciNet's previous Teach cluster.&lt;br /&gt;
&lt;br /&gt;
Although the cluster is once again called ''Teach'' and you connect to teach.scinet.utoronto.ca as before, the system is set up differently from the previous Teach cluster in the following ways:&lt;br /&gt;
&lt;br /&gt;
* There are now 2 dedicated login nodes, teach-login01 and teach-login02.  &lt;br /&gt;
* The ssh fingerprints for these login nodes can be found on [[Teach_fingerprints]].&lt;br /&gt;
* The compute nodes have 40 cores.  As before, you can request jobs by number of cores.&lt;br /&gt;
* Only temporary lcl_uot.... accounts can log in.&lt;br /&gt;
* Only the home directories of those accounts are mounted. &lt;br /&gt;
* In particular, the file systems from the other SciNet compute clusters (Niagara, Mist, Trillium,... ) are not and will not be mounted.  You'll need to copy over any files that you need to use on the Teach cluster.&lt;br /&gt;
* There is no $SCRATCH.  You can do all your work in $HOME, which is writable from the compute nodes.&lt;br /&gt;
* The software stack is the one supplied by the Alliance.  There is no need to load 'CCEnv' to get it.&lt;br /&gt;
* But as before, if you're missing a module, we can still install it for you.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Do this later&lt;br /&gt;
== Running Jupyter on a Teach Compute Node ==  &lt;br /&gt;
&lt;br /&gt;
1. To be able to run Jupyter on a compute node, you must first (a) install it inside a virtual environment, (b) enable a way for jupyter to seemingly write to a specific directory on $HOME, and (c) create a little helper script called &amp;lt;tt&amp;gt;notebook.sh&amp;lt;/tt&amp;gt; that will be used to start the jupyter server in step 2.  These are the commands that you should use for the installation (which you should do only once, on the Teach login node):&lt;br /&gt;
&lt;br /&gt;
(a) Create virtual env&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load python/3.9.10&lt;br /&gt;
$ virtualenv --system-site-packages $HOME/.virtualenvs/jupteach&lt;br /&gt;
$ source $HOME/.virtualenvs/jupteach/bin/activate&lt;br /&gt;
$ pip install jupyter jupyterlab&lt;br /&gt;
$ deactivate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
You can choose another directory than &amp;lt;tt&amp;gt;$HOME/.virtualenvs/jupteach&amp;lt;/tt&amp;gt; for where to create the virtual environment, but you need to be consistent and use the same directory everywhere below.&lt;br /&gt;
&lt;br /&gt;
(b) Make a writable 'runtime' directory for Jupyter. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mkdir -p $HOME/.local/share/jupyter/runtime &lt;br /&gt;
$ mv -f $HOME/.local/share/jupyter/runtime $SCRATCH/jupyter_runtime || mkdir $SCRATCH/jupyter_runtime&lt;br /&gt;
$ ln -sT $SCRATCH/jupyter_runtime $HOME/.local/share/jupyter/runtime&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(c) Create a launch script to use on the compute nodes:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ cat &amp;gt; $HOME/.virtualenvs/jupteach/bin/notebook.sh &amp;lt;&amp;lt;EOF&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
source \$HOME/.virtualenvs/jupteach/bin/activate&lt;br /&gt;
export XDG_DATA_HOME=\$SCRATCH/.share&lt;br /&gt;
export XDG_CACHE_HOME=\$SCRATCH/.cache&lt;br /&gt;
export XDG_CONFIG_HOME=\$SCRATCH/.config&lt;br /&gt;
export XDG_RUNTIME_DIR=\$SCRATCH/.runtime&lt;br /&gt;
export JUPYTER_CONFIG_DIR=\$SCRATCH/.config/.jupyter&lt;br /&gt;
jupyter \${1:-notebook} --ip \$(hostname -f) --no-browser --notebook-dir=\$PWD&lt;br /&gt;
EOF&lt;br /&gt;
$ chmod +x  $HOME/.virtualenvs/jupteach/bin/notebook.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. To run the jupyter server on a compute node, start an interactive session with the &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt; command and then launch the jupyter server:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ debugjob -n 16                    # request fewer if you need fewer cores.&lt;br /&gt;
$ cd $SCRATCH                       # $HOME is read-only, so move to $SCRATCH&lt;br /&gt;
$ $HOME/.virtualenvs/jupteach/bin/notebook.sh  # add the argument &amp;quot;lab&amp;quot; to start with the jupyter lab&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Make sure you note down (a) the name of the compute node you were allocated (names start with &amp;quot;&amp;lt;tt&amp;gt;teach&amp;lt;/tt&amp;gt;&amp;quot; followed by a two-digit number), (b) the port number printed after the colon following the node name (usually 8888, but it can be another, higher number), and (c) the last URL that notebook.sh tells you to use to connect.&lt;br /&gt;
&lt;br /&gt;
4. To connect to this jupyter server, which runs on a teach compute node that is not accessible from the internet, open a different terminal on your own computer and reconnect to the Teach cluster with a port-forwarding&lt;br /&gt;
tunnel to the compute node on which jupyter is running:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -LPORT:teachXX:PORT -o ControlMaster=no USERNAME@teach.scinet.utoronto.ca  -N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;tt&amp;gt;teachXX&amp;lt;/tt&amp;gt; is to be replaced by the name of the compute node (point (a) above), PORT by the port number that notebook.sh showed, and &amp;lt;tt&amp;gt;USERNAME&amp;lt;/tt&amp;gt; by your teach account username. This command will appear to &amp;quot;hang&amp;quot;; it only serves to forward the local port PORT (usually 8888) to port PORT on the compute node.  &lt;br /&gt;
&lt;br /&gt;
Finally, point your browser to the URL that the &amp;lt;tt&amp;gt;notebook.sh&amp;lt;/tt&amp;gt; command printed out (point (c) above), i.e., the one with 127.0.0.1 in it.&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Teach&amp;diff=7514</id>
		<title>Teach</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Teach&amp;diff=7514"/>
		<updated>2026-02-13T17:22:43Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] &lt;br /&gt;
|name=Teach Cluster &lt;br /&gt;
|installed=(orig Mar 2020), Feb 2025&lt;br /&gt;
|operatingsystem= Linux (Rocky 9.6)&lt;br /&gt;
|loginnode=teach-login01&lt;br /&gt;
|nnodes=64 &lt;br /&gt;
|rampernode=188 GiB / 202 GB &lt;br /&gt;
|corespernode=40 &lt;br /&gt;
|interconnect=Infiniband (EDR)&lt;br /&gt;
|vendorcompilers=gcc,intel&lt;br /&gt;
|queuetype=slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Teaching Cluster ==&lt;br /&gt;
&lt;br /&gt;
The Teach cluster is a 2560-core cluster provided for teaching purposes.  It currently has no GPU capability.&lt;br /&gt;
It is configured similarly to Trillium; however, it uses hardware repurposed from its predecessor, Niagara.  This system should not be used for production work, as the queuing policies are designed to provide fast job turnover and limit the amount of resources one person can use at a time.  Questions about its use or problems should be sent to '''support@scinet.utoronto.ca'''.&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
This cluster currently consists of 64 repurposed x86_64 nodes, each with 40 cores (from two 20-core Intel Cascade Lake CPUs) running at 2.5 GHz and 188 GiB of RAM per node. &lt;br /&gt;
The nodes are interconnected with 1:1 non-blocking EDR Infiniband for MPI communications, and disk I/O to a separate view of the VAST file system.  In total, this cluster contains 2560 cores.&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
Teach runs Rocky Linux 9.  You will need to be somewhat familiar with Linux systems to work on Teach.  If you are not, it will be worth your time to review our [https://education.scinet.utoronto.ca/tag/index.php?tag=SCMP101 Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} systems, access to Teach is done via [[SSH]] (secure shell) only.  Open a terminal window (e.g. using [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or  [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm] on Windows), and type&lt;br /&gt;
 ssh -Y USERNAME@teach.scinet.utoronto.ca&lt;br /&gt;
This will bring you directly to the command line of '''&amp;lt;tt&amp;gt;teach-login01&amp;lt;/tt&amp;gt;''' or '''&amp;lt;tt&amp;gt;teach-login02&amp;lt;/tt&amp;gt;''', which are the gateway/devel nodes for this cluster.  &lt;br /&gt;
On these nodes, you can compile, do short tests, and submit your jobs to the queue.&lt;br /&gt;
&lt;br /&gt;
The first time you log in to the Teach cluster, please verify that the login node's ssh key fingerprint&lt;br /&gt;
matches. [[Teach_fingerprints | See here how]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The login nodes are shared between students of a number of different courses. Use these nodes to develop and compile code, to run short tests, and to submit computations to the scheduler (see below).&lt;br /&gt;
&lt;br /&gt;
Note that access to the Teach cluster is restricted to temporary accounts that start with the prefix '''lcl_uot''' + the course code + '''s''', and a number.  Passwords for these accounts can be changed on the [https://portal.scinet.utoronto.ca/portaluserlogin SciNet user portal]. On the same site, users can upload a public ssh key if they want to connect using ssh keys.&lt;br /&gt;
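For example, a suitable key pair could be generated as follows (the file name teach_key and the key comment are just examples):&lt;br /&gt;

```shell
# Remove any leftover example keys so ssh-keygen does not prompt.
rm -f ./teach_key ./teach_key.pub

# Generate an ed25519 key pair; -N "" sets an empty passphrase here
# for brevity (use a real passphrase in practice).
ssh-keygen -t ed25519 -N "" -f ./teach_key -C "teach-example-key"

# The *public* half, teach_key.pub, is what gets uploaded to the portal.
cat ./teach_key.pub
```

The private key (teach_key) stays on your own computer; only the .pub file is ever uploaded.&lt;br /&gt;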
&lt;br /&gt;
== Software Modules ==&lt;br /&gt;
&lt;br /&gt;
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available.  A detailed explanation of the module system can be [[Using_modules | found on the modules page]].  &lt;br /&gt;
&lt;br /&gt;
The Teach cluster makes the same [https://docs.alliancecan.ca/wiki/Available_software modules available] as on the [https://docs.alliancecan.ca General Purpose clusters of the Digital Research Alliance of Canada], with one caveat.  On Teach, by default, only the &amp;quot;gentoo&amp;quot; module is loaded, which provides basic OS-level functionality.  &lt;br /&gt;
&lt;br /&gt;
Common module subcommands are:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;: load the default version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt;: load a specific version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt;: unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;): list available software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt;: list loadable software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;: list loaded modules.&lt;br /&gt;
&lt;br /&gt;
For example, to make the GNU compilers (gcc, g++ and gfortran) available, you should type&lt;br /&gt;
&lt;br /&gt;
 module load gcc&lt;br /&gt;
&lt;br /&gt;
while the Intel compilers (icc, icpc and ifort) can be loaded by&lt;br /&gt;
&lt;br /&gt;
 module load intel&lt;br /&gt;
&lt;br /&gt;
To get the default modules that are loaded on the General Purpose clusters, you can load the &amp;quot;StdEnv&amp;quot; module. &lt;br /&gt;
&lt;br /&gt;
Along with modifying common environment variables, such as the PATH, these modules also create an '''EBROOT&amp;lt;MODULENAME&amp;gt;''' environment variable, which can be used to access commonly needed software directories, such as /include and /lib.&lt;br /&gt;
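As a sketch of how such a variable can be used (the module name and the install path below are hypothetical, for illustration only):&lt;br /&gt;

```shell
# Hypothetical: after 'module load gsl', the module system would define
# EBROOTGSL, pointing at that module's installation prefix.
EBROOTGSL=/opt/software/gsl    # placeholder path, for illustration only

# The variable can then be used to locate headers and libraries:
compile_cmd="gcc -I${EBROOTGSL}/include -L${EBROOTGSL}/lib myprog.c -lgsl"
echo "$compile_cmd"
```

The same pattern applies to any module: the variable name is EBROOT followed by the module name in upper case.&lt;br /&gt;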
&lt;br /&gt;
There are handy abbreviations for the module commands. &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A list of available software modules can be found on [https://docs.alliancecan.ca/wiki/Available_software this page].&lt;br /&gt;
&lt;br /&gt;
There are a few additional modules available as well, and more can be made available at the request of course instructors.  Currently, the only additional modules are:&lt;br /&gt;
&lt;br /&gt;
 catch2/3.3.1      - A C++ test framework for unit-tests, TDD and BDD using C++14 and later.&lt;br /&gt;
 misopy/0.5.2      - A probabilistic framework to analyze RNA-Seq data.&lt;br /&gt;
 palemoon/33.6.0.1 - A web browser&lt;br /&gt;
&lt;br /&gt;
== Interactive jobs ==&lt;br /&gt;
&lt;br /&gt;
For an interactive session on a compute node of the teach cluster that gives access to non-shared resources, use the 'debugjob' command: &lt;br /&gt;
 teach01:~$ debugjob -n C&lt;br /&gt;
where C is the number of cores. An interactive session defaults to four hours when using at most one node (C&amp;lt;=40), and becomes 60 minutes when using four nodes (i.e., 120&amp;lt;C&amp;lt;=160), which is the maximum number of nodes allowed for an interactive session by debugjob.&lt;br /&gt;
                                                             &lt;br /&gt;
For a short interactive session on dedicated compute nodes of the teach cluster, use the 'debugjob' command as follows: &lt;br /&gt;
 teach01:~$ debugjob N&lt;br /&gt;
where N is the number of nodes.  On the Teach cluster, this is equivalent to &amp;lt;tt&amp;gt;debugjob -n 40*N &amp;lt;/tt&amp;gt;. The positive integer number &amp;lt;tt&amp;gt;N&amp;lt;/tt&amp;gt; can at most be 4.&lt;br /&gt;
&lt;br /&gt;
If no arguments are given to &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt;, it allocates a single core on a Teach compute node.&lt;br /&gt;
&lt;br /&gt;
There are limits on the resources you can get with a debugjob, and how long you can get them.  No debugjob can run longer than four hours or use more than 160 cores, and each user can only run one at a time.  For longer computations, jobs must be submitted to the scheduler.&lt;br /&gt;
&lt;br /&gt;
== Submit a Job ==&lt;br /&gt;
&lt;br /&gt;
Teach uses SLURM as its job scheduler.  More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You submit jobs from a login node by passing a script to the sbatch command:&lt;br /&gt;
&lt;br /&gt;
 teach-login01:~$ sbatch jobscript.sh&lt;br /&gt;
&lt;br /&gt;
This puts the job in the queue. It will run on the compute nodes in due course.&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
* Make sure to adjust the flags --ntasks-per-node or --ntasks, together with --nodes, accordingly for the examples found on the [[Slurm | Slurm page]]. &lt;br /&gt;
* The current Slurm configuration of the teach cluster allocates compute resources by core rather than by node. That means your tasks might land on nodes that have other jobs running, i.e., they might share the node. If you want to avoid that, add the following directive to your submission script: #SBATCH --exclusive. This forces your job to use its compute nodes exclusively.&lt;br /&gt;
* The maximum wall time is currently set to 4 hours.&lt;br /&gt;
* There are two queues available: the compute queue and the debug queue. Their usage limits are listed in the table below.&lt;br /&gt;
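As a minimal sketch, a job script for this cluster could look like the following; the resource values and job name are illustrative only, and the echo line stands in for an actual computation:&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --nodes=1               # at most 4 nodes per job on Teach
#SBATCH --ntasks-per-node=40    # each Teach node has 40 cores
#SBATCH --time=01:00:00         # maximum walltime is 4 hours
#SBATCH --job-name=example      # illustrative job name

# Replace this with your actual computation; SLURM_NTASKS is set by
# the scheduler when the job runs.
msg="Running on $(hostname) with ${SLURM_NTASKS:-unknown} tasks"
echo "$msg"
```

Submit it with sbatch jobscript.sh as shown above; add #SBATCH --exclusive if the job should not share its nodes with other jobs.&lt;br /&gt;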
&lt;br /&gt;
== Limits ==&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc; if you do not specify one, your job will run in the compute partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Running jobs&lt;br /&gt;
!Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Interactive testing or troubleshooting || debug || 1 || 1 || 1 core || 4 nodes (160 cores)|| N/A || 4 hours&lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 1 || 12 || 1 core || 4 nodes (160 cores)|| 15 minutes || 4 hours&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Within these limits, jobs may still have to wait in the queue.  Although there are no allocations on the teach cluster, the waiting time still depends on many factors, such as the number of nodes and the wall time, how many other jobs are waiting in the queue, and whether a job can fill an otherwise unused spot in the schedule.&lt;br /&gt;
&lt;br /&gt;
== Main changes from Teach's predecessor ==&lt;br /&gt;
&lt;br /&gt;
This section is intended for instructors who may have used SciNet's previous Teach cluster.&lt;br /&gt;
&lt;br /&gt;
Although the cluster is once again called ''Teach'' and you connect to teach.scinet.utoronto.ca as before, the system is set up differently from the previous Teach cluster in the following ways:&lt;br /&gt;
&lt;br /&gt;
* There are now 2 dedicated login nodes, teach-login01 and teach-login02.  &lt;br /&gt;
* The ssh fingerprints for these login nodes can be found on [[Teach_fingerprints]].&lt;br /&gt;
* The compute nodes have 40 cores.  As before, you can request jobs by number of cores.&lt;br /&gt;
* Only temporary lcl_uot.... accounts can log in.&lt;br /&gt;
* Only the home directories of those accounts are mounted. &lt;br /&gt;
* In particular, the file systems from the other SciNet compute clusters (Niagara, Mist, Trillium,... ) are not and will not be mounted.  You'll need to copy over any files that you need to use on the Teach cluster.&lt;br /&gt;
* There is no $SCRATCH.  You can do all your work in $HOME, which is writable from the compute nodes.&lt;br /&gt;
* The software stack is the one supplied by the Alliance.  There is no need to load 'CCEnv' to get it.&lt;br /&gt;
* But as before, if you're missing a module, we can still install it for you.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Do this later&lt;br /&gt;
== Running Jupyter on a Teach Compute Node ==  &lt;br /&gt;
&lt;br /&gt;
1. To be able to run Jupyter on a compute node, you must first (a) install it inside a virtual environment, (b) enable a way for jupyter to seemingly write to a specific directory on $HOME, and (c) create a little helper script called &amp;lt;tt&amp;gt;notebook.sh&amp;lt;/tt&amp;gt; that will be used to start the jupyter server in step 2.  These are the commands that you should use for the installation (which you should do only once, on the Teach login node):&lt;br /&gt;
&lt;br /&gt;
(a) Create virtual env&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load python/3.9.10&lt;br /&gt;
$ virtualenv --system-site-packages $HOME/.virtualenvs/jupteach&lt;br /&gt;
$ source $HOME/.virtualenvs/jupteach/bin/activate&lt;br /&gt;
$ pip install jupyter jupyterlab&lt;br /&gt;
$ deactivate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
You can choose another directory than &amp;lt;tt&amp;gt;$HOME/.virtualenvs/jupteach&amp;lt;/tt&amp;gt; for where to create the virtual environment, but you need to be consistent and use the same directory everywhere below.&lt;br /&gt;
&lt;br /&gt;
(b) Make a writable 'runtime' directory for Jupyter. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mkdir -p $HOME/.local/share/jupyter/runtime &lt;br /&gt;
$ mv -f $HOME/.local/share/jupyter/runtime $SCRATCH/jupyter_runtime || mkdir $SCRATCH/jupyter_runtime&lt;br /&gt;
$ ln -sT $SCRATCH/jupyter_runtime $HOME/.local/share/jupyter/runtime&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(c) Create a launch script to use on the compute nodes:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ cat &amp;gt; $HOME/.virtualenvs/jupteach/bin/notebook.sh &amp;lt;&amp;lt;EOF&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
source \$HOME/.virtualenvs/jupteach/bin/activate&lt;br /&gt;
export XDG_DATA_HOME=\$SCRATCH/.share&lt;br /&gt;
export XDG_CACHE_HOME=\$SCRATCH/.cache&lt;br /&gt;
export XDG_CONFIG_HOME=\$SCRATCH/.config&lt;br /&gt;
export XDG_RUNTIME_DIR=\$SCRATCH/.runtime&lt;br /&gt;
export JUPYTER_CONFIG_DIR=\$SCRATCH/.config/.jupyter&lt;br /&gt;
jupyter \${1:-notebook} --ip \$(hostname -f) --no-browser --notebook-dir=\$PWD&lt;br /&gt;
EOF&lt;br /&gt;
$ chmod +x  $HOME/.virtualenvs/jupteach/bin/notebook.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. To run the jupyter server on a compute node, start an interactive session with the &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt; command and then launch the jupyter server:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ debugjob -n 16                    # request fewer if you need fewer cores.&lt;br /&gt;
$ cd $SCRATCH                       # $HOME is read-only, so move to $SCRATCH&lt;br /&gt;
$ $HOME/.virtualenvs/jupteach/bin/notebook.sh  # add the argument &amp;quot;lab&amp;quot; to start with the jupyter lab&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Make sure you note down (a) the name of the compute node you were allocated (names start with &amp;quot;&amp;lt;tt&amp;gt;teach&amp;lt;/tt&amp;gt;&amp;quot; followed by a two-digit number), (b) the port number printed after the colon following the node name (usually 8888, but it can be another, higher number), and (c) the last URL that notebook.sh tells you to use to connect.&lt;br /&gt;
&lt;br /&gt;
4. To connect to this jupyter server, which runs on a teach compute node that is not accessible from the internet, open a different terminal on your own computer and reconnect to the Teach cluster with a port-forwarding&lt;br /&gt;
tunnel to the compute node on which jupyter is running:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -LPORT:teachXX:PORT -o ControlMaster=no USERNAME@teach.scinet.utoronto.ca  -N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;tt&amp;gt;teachXX&amp;lt;/tt&amp;gt; is to be replaced by the name of the compute node (point (a) above), PORT by the port number that notebook.sh showed, and &amp;lt;tt&amp;gt;USERNAME&amp;lt;/tt&amp;gt; by your teach account username. This command will appear to &amp;quot;hang&amp;quot;; it only serves to forward the local port PORT (usually 8888) to port PORT on the compute node.  &lt;br /&gt;
&lt;br /&gt;
Finally, point your browser to the URL that the &amp;lt;tt&amp;gt;notebook.sh&amp;lt;/tt&amp;gt; command printed out (point (c) above), i.e., the one with 127.0.0.1 in it.&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7499</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7499"/>
		<updated>2026-01-29T15:47:33Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Down3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Down3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 29, 2026, 10:00 am:''' There was a power glitch at the data centre overnight. The login nodes are accessible but the compute nodes are down.  &lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 16, 2026, 11:00 pm:''' HPSS is back online, and accessible via alliancecan#hpss Globus endpoint. &lt;br /&gt;
&lt;br /&gt;
'''Thu Jan 15, 2026, 10:00 pm:''' HPSS will undergo maintenance on Friday morning, Jan 16, 2026, including the alliancecan#hpss Globus endpoint. &lt;br /&gt;
&lt;br /&gt;
'''Tue Jan 6, 2026, 10:15 am:''' OnDemand has been fixed and is working again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 5, 2026, 9:00 pm:''' The authentication mechanism of OnDemand is not working.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 31, 2025, 12:40 pm:''' We believe the problem has now been resolved.  Please let us know if you still experience login problems or aborted jobs.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 30, 2025, 2:10 pm:''' We are experiencing problems with authentication, resulting in failed logins, OOD errors, and aborted jobs (with &amp;quot;prolog error&amp;quot;).  Please bear with us, as we are very short-staffed during the holiday break.  We will post updates here.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 3, 2025, 11:30 am:''' Open OnDemand is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Nov 29, 2025, 12:40 am:''' There has been a problem with the water chiller. Some systems are offline.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 12:55 pm:''' Balam is back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 10:00 am:''' Open OnDemand is back online.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 11:00 pm:''' Most of the work is done, data movers, Globus, and HPSS are back online. Remaining services will be worked on tomorrow.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 8:30 am:''' Scheduled network maintenance. Trillium cluster is *not* affected.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 5:30 pm:''' Balam maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 7:00 am:''' Balam maintenance day.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 15, 2025, 3:55 pm:''' Trillium inbound connections through trillium.alliancecan.ca or trillium.scinet.utoronto.ca are working again.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 15, 2025, 3:05 pm:''' Trillium is experiencing external network issues for incoming traffic. Please try: ssh USERNAME@tri-login01.scinet.utoronto.ca in the meantime.&lt;br /&gt;
 &lt;br /&gt;
'''Thu Oct 06, 2025, 8:00 pm:''' HPSS is fully functional. You may submit archive jobs from trillium login nodes, datamovers and robots.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 03, 2025, 6:30 pm:''' HPSS is back online, and already accessible via the alliancecan#hpss Globus endpoint. The directory tree now follows the other Alliance clusters. We're still working on job submission via Slurm.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 01, 2025, 0:00 am:''' Niagara compute nodes are now unavailable for regular users. The login nodes will remain available for a while to allow a few last data transfers, although transfers from the Niagara file systems to Trillium are best done on nia-dm1.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Thu Oct 01, 2025, 9:30 am:''' HPSS is down for scheduled maintenance, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[Niagara Quickstart]]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules specific to Niagara|Software Modules specific to Niagara]] &lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7478</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7478"/>
		<updated>2026-01-07T19:49:53Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Up3 | OnDemand|https://docs.alliancecan.ca/wiki/Trillium_Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Tue Jan 6, 2026, 10:15 am:''' OnDemand has been fixed and is working again.&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 5, 2026, 9:00 pm:''' The authentication mechanism of OnDemand is not working.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 31, 2025, 12:40 pm:''' We believe the problem has now been resolved.  Please let us know if you still experience login problems or aborted jobs.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 30, 2025, 2:10 pm:''' We are experiencing problems with authentication, resulting in failed logins, OOD errors, and aborted jobs (with &amp;quot;prolog error&amp;quot;).  Please bear with us, as we are very short-staffed during the holiday break.  We will post updates here.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 3, 2025, 11:30 am:''' Open OnDemand is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Nov 29, 2025, 12:40 am:''' There has been a problem with the water chiller. Some systems are offline.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 12:55 pm:''' Balam is back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 10:00 am:''' Open OnDemand is back online.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 11:00 pm:''' Most of the work is done; data movers, Globus, and HPSS are back online. Remaining services will be worked on tomorrow.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 8:30 am:''' Scheduled network maintenance. Trillium cluster is *not* affected.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 5:30 pm:''' Balam maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 7:00 am:''' Balam maintenance day.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 15, 2025, 3:55 pm:''' Trillium inbound connections through trillium.alliancecan.ca or trillium.scinet.utoronto.ca are working again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 15, 2025, 3:05 pm:''' Trillium is experiencing external network issues affecting incoming traffic. Please try ssh USERNAME@tri-login01.scinet.utoronto.ca in the meantime.&lt;br /&gt;
 &lt;br /&gt;
'''Mon Oct 06, 2025, 8:00 pm:''' HPSS is fully functional. You may submit archive jobs from the Trillium login nodes, datamovers, and robots.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 03, 2025, 6:30 pm:''' HPSS is back online and already accessible via the alliancecan#hpss Globus endpoint. The directory tree now follows that of the other Alliance clusters. We're still working on job submission via Slurm.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 01, 2025, 12:00 am:''' Niagara compute nodes are now unavailable for regular users. The login nodes will remain available for a while to allow a few last data transfers, although transfers from the Niagara file systems to Trillium are best done on nia-dm1.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 01, 2025, 9:30 am:''' HPSS is down for scheduled maintenance, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[Niagara Quickstart]]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules specific to Niagara|Software Modules specific to Niagara]] &lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7469</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Main_Page&amp;diff=7469"/>
		<updated>2026-01-06T02:08:16Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* System Status */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__&lt;br /&gt;
{| style=&amp;quot;border-spacing:10px; width: 95%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:1em; padding-top:.1em; border:2px solid #0645ad; background-color:#f6f6f6; border-radius:7px&amp;quot;|&lt;br /&gt;
&lt;br /&gt;
==System Status==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Use &amp;quot;Up&amp;quot;, &amp;quot;Partial&amp;quot; or &amp;quot;Down&amp;quot;; these are templates. --&amp;gt;&lt;br /&gt;
{|style=&amp;quot;width:100%&amp;quot; &lt;br /&gt;
|{{Up3 | Trillium|https://docs.alliancecan.ca/wiki/Trillium_Quickstart}}&lt;br /&gt;
|{{Down | OnDemand|Open_OnDemand_Quickstart}}&lt;br /&gt;
|{{Up | Globus |Globus}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | HPSS|HPSS}}&lt;br /&gt;
|{{Up | Balam|Balam}}&lt;br /&gt;
|{{Up | S4H | S4H}}&lt;br /&gt;
|-&lt;br /&gt;
|{{Up | Teach|Teach}}&lt;br /&gt;
|{{Up3 | File system|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Storage}}&lt;br /&gt;
|{{Up3 | External Network|https://docs.alliancecan.ca/wiki/Trillium_Quickstart#Logging_in}} &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Mon Jan 5, 2026, 9:00 pm:''' The authentication mechanism of OnDemand is not working.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 31, 2025, 12:40 pm:''' We believe the problem has now been resolved.  Please let us know if you still experience login problems or aborted jobs.&lt;br /&gt;
&lt;br /&gt;
'''Tue Dec 30, 2025, 2:10 pm:''' We are experiencing problems with authentication, resulting in failed logins, OOD errors, and aborted jobs (with &amp;quot;prolog error&amp;quot;).  Please bear with us, as we are very short-staffed during the holiday break.  We will post updates here.&lt;br /&gt;
&lt;br /&gt;
'''Wed Dec 3, 2025, 11:30 am:''' Open OnDemand is fully operational again.&lt;br /&gt;
&lt;br /&gt;
'''Sat Nov 29, 2025, 12:40 am:''' There has been a problem with the water chiller. Some systems are offline.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 12:55 pm:''' Balam is back online.&lt;br /&gt;
&lt;br /&gt;
'''Wed Nov 5, 2025, 10:00 am:''' Open OnDemand is back online.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 11:00 pm:''' Most of the work is done; data movers, Globus, and HPSS are back online. Remaining services will be worked on tomorrow.&lt;br /&gt;
&lt;br /&gt;
'''Tue Nov 4, 2025, 8:30 am:''' Scheduled network maintenance. Trillium cluster is *not* affected.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 5:30 pm:''' Balam maintenance finished.&lt;br /&gt;
&lt;br /&gt;
'''Tue Oct 21, 2025, 7:00 am:''' Balam maintenance day.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 15, 2025, 3:55 pm:''' Trillium inbound connections through trillium.alliancecan.ca or trillium.scinet.utoronto.ca are working again.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 15, 2025, 3:05 pm:''' Trillium is experiencing external network issues affecting incoming traffic. Please try ssh USERNAME@tri-login01.scinet.utoronto.ca in the meantime.&lt;br /&gt;
 &lt;br /&gt;
'''Mon Oct 06, 2025, 8:00 pm:''' HPSS is fully functional. You may submit archive jobs from the Trillium login nodes, datamovers, and robots.&lt;br /&gt;
&lt;br /&gt;
'''Fri Oct 03, 2025, 6:30 pm:''' HPSS is back online and already accessible via the alliancecan#hpss Globus endpoint. The directory tree now follows that of the other Alliance clusters. We're still working on job submission via Slurm.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 01, 2025, 12:00 am:''' Niagara compute nodes are now unavailable for regular users. The login nodes will remain available for a while to allow a few last data transfers, although transfers from the Niagara file systems to Trillium are best done on nia-dm1.scinet.utoronto.ca.&lt;br /&gt;
&lt;br /&gt;
'''Wed Oct 01, 2025, 9:30 am:''' HPSS is down for scheduled maintenance, including the alliancecan#hpss Globus endpoint.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--  When removing system status entries, please archive them to: --&amp;gt;&lt;br /&gt;
[[Previous messages]]&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;border-spacing: 10px;width: 100%&amp;quot;&lt;br /&gt;
|valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== QuickStart Guides ==&lt;br /&gt;
* [https://docs.alliancecan.ca/wiki/Trillium_Quickstart Trillium Quickstart]&lt;br /&gt;
* [[Niagara Quickstart]]&lt;br /&gt;
* [[HPSS | HPSS archival storage]]&lt;br /&gt;
* [[Teach|Teach cluster]]&lt;br /&gt;
* [[FAQ | FAQ (frequently asked questions)]]&lt;br /&gt;
* [[Acknowledging SciNet]]&lt;br /&gt;
| valign=&amp;quot;top&amp;quot; style=&amp;quot;margin: 1em; padding:1em; padding-top:.1em; border:2px solid #000; background-color:#fff; border-radius:7px; width: 49.5%&amp;quot; |&lt;br /&gt;
&lt;br /&gt;
== Tutorials, Manuals, etc. ==&lt;br /&gt;
* [https://education.scinet.utoronto.ca SciNet education material]&lt;br /&gt;
* [https://www.youtube.com/c/SciNetHPCattheUniversityofToronto SciNet's YouTube channel]&lt;br /&gt;
* [[Modules specific to Niagara|Software Modules specific to Niagara]] &lt;br /&gt;
* [[Modules for Mist]] &lt;br /&gt;
* [[Commercial software]]&lt;br /&gt;
* [[Burst Buffer]]&lt;br /&gt;
* [[SSH#SSH Keys|SSH keys]]&lt;br /&gt;
* [[SSH Tunneling]]&lt;br /&gt;
* [[Visualization]]&lt;br /&gt;
* [[Running Serial Jobs on Niagara]]&lt;br /&gt;
* [[Jupyter Hub]]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Teach&amp;diff=7466</id>
		<title>Teach</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Teach&amp;diff=7466"/>
		<updated>2026-01-05T14:31:48Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] &lt;br /&gt;
|name=Teach Cluster &lt;br /&gt;
|installed=(orig Mar 2020), Feb 2025&lt;br /&gt;
|operatingsystem= Linux (Rocky 9.6)&lt;br /&gt;
|loginnode=teach-login01&lt;br /&gt;
|nnodes=64 &lt;br /&gt;
|rampernode=188 GiB / 202 GB &lt;br /&gt;
|corespernode=40 &lt;br /&gt;
|interconnect=Infiniband (EDR)&lt;br /&gt;
|vendorcompilers=gcc,intel&lt;br /&gt;
|queuetype=slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Teaching Cluster ==&lt;br /&gt;
&lt;br /&gt;
The Teach cluster is a 2560-core cluster provided for teaching purposes.  It currently has no GPU capability.&lt;br /&gt;
It is configured similarly to the new SciNet production system Trillium; however, it uses hardware repurposed from its predecessor, [[Niagara_Quickstart|Niagara]].  This system should not be used for production work, as the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time.  Questions about its use or problems should be sent to '''support@scinet.utoronto.ca'''.&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
This cluster currently consists of 64 repurposed x86_64 nodes, each with 40 cores (from two 20-core Intel Cascade Lake CPUs) running at 2.5 GHz, with 188 GiB of RAM per node. &lt;br /&gt;
The nodes are interconnected with 1:1 non-blocking EDR Infiniband for MPI communications, while disk I/O goes to a separate view of the VAST file system.  In total, this cluster contains 2560 cores, but there are plans to expand it if demand warrants it.&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
Teach runs Rocky Linux 9.  You will need to be somewhat familiar with Linux systems to work on Teach.  If you are not, it will be worth your time to review our [https://education.scinet.utoronto.ca/tag/index.php?tag=SCMP101 Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} systems, access to Teach is done via [[SSH]] (secure shell) only.  Open a terminal window (e.g. using [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or  [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm] on Windows), and type&lt;br /&gt;
 ssh -Y USERNAME@teach.scinet.utoronto.ca&lt;br /&gt;
This will bring you directly to the command line of '''&amp;lt;tt&amp;gt;teach-login01&amp;lt;/tt&amp;gt;''' or '''&amp;lt;tt&amp;gt;teach-login02&amp;lt;/tt&amp;gt;''', which are the gateway/devel nodes for this cluster.  &lt;br /&gt;
On these nodes, you can compile, do short tests, and submit your jobs to the queue.&lt;br /&gt;
&lt;br /&gt;
The first time you log in to the Teach cluster, please make sure to check that the login node's ssh key fingerprint&lt;br /&gt;
matches. [[Teach_fingerprints | See here how]].&lt;br /&gt;
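As a minimal sketch of the fingerprint-checking workflow (assuming the OpenSSH client tools are installed on your own machine; the key file name below is just a throwaway demo, not a real host key), the 'ssh-keygen -lf' command is what turns a public key into the fingerprint you compare:&lt;br /&gt;

```shell
# Demo only: generate a throwaway key pair and print its fingerprint.
# On a real first login you would compare the fingerprint that ssh shows
# against the published Teach_fingerprints page, or inspect the recorded
# host key afterwards with:  ssh-keygen -lf ~/.ssh/known_hosts
ssh-keygen -t ed25519 -N "" -f demo_key -q
fingerprint=$(ssh-keygen -lf demo_key.pub)
echo "$fingerprint"
```

The printed SHA256 string is what should match the value published for the login node.&lt;br /&gt;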
&lt;br /&gt;
&lt;br /&gt;
The login nodes are shared between students of a number of different courses. Use these nodes to develop and compile code, to run short tests, and to submit computations to the scheduler (see below).&lt;br /&gt;
&lt;br /&gt;
Note that access to the teach cluster is restricted to temporary accounts that start with the prefix '''lcl_uot''' + the course code + '''s''', and a number.  Passwords for these accounts can be changed on the [https://portal.scinet.utoronto.ca/portaluserlogin SciNet user portal]. On the same site, users can upload a public ssh key if they want to connect using ssh keys.&lt;br /&gt;
&lt;br /&gt;
== Software Modules ==&lt;br /&gt;
&lt;br /&gt;
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available.  A detailed explanation of the module system can be [[Using_modules | found on the modules page]].  &lt;br /&gt;
&lt;br /&gt;
The Teach cluster makes the same [https://docs.alliancecan.ca/wiki/Available_software modules available] as on the [https://docs.alliancecan.ca General Purpose clusters of the Digital Research Alliance of Canada], with one caveat.  On Teach, by default, only the &amp;quot;gentoo&amp;quot; module is loaded, which provides basic OS-level functionality.  &lt;br /&gt;
&lt;br /&gt;
Common module subcommands are:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;: load the default version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt;: load a specific version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt;: unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;): list available software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt;: list loadable software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;: list loaded modules.&lt;br /&gt;
&lt;br /&gt;
For example, to make the GNU compilers (gcc, g++ and gfortran) available, you should type&lt;br /&gt;
&lt;br /&gt;
 module load gcc&lt;br /&gt;
&lt;br /&gt;
while the Intel compilers (icc, icpc and ifort) can be loaded by&lt;br /&gt;
&lt;br /&gt;
 module load intel&lt;br /&gt;
&lt;br /&gt;
To get the default modules that are loaded on the General Purpose clusters, you can load the &amp;quot;StdEnv&amp;quot; module. &lt;br /&gt;
&lt;br /&gt;
Along with modifying common environment variables, such as the PATH, these modules also create an '''EBROOT&amp;lt;MODULENAME&amp;gt;''' environment variable, which can be used to access commonly needed software directories, such as the package's include and lib subdirectories.&lt;br /&gt;
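For illustration, here is a minimal sketch of how such a variable is typically used; the 'gsl' module and the path below are hypothetical stand-ins, not the actual values on the cluster:&lt;br /&gt;

```shell
# Hypothetical example: simulate the variable that a 'module load gsl'
# command would set (the real path is cluster- and version-specific).
EBROOTGSL=/opt/software/gsl/2.7

# On the cluster you would instead run, e.g.:
#   module load gcc gsl
#   gcc mycode.c -I"$EBROOTGSL/include" -L"$EBROOTGSL/lib" -lgsl -o mycode
echo "headers in:   $EBROOTGSL/include"
echo "libraries in: $EBROOTGSL/lib"
```

The same pattern works for any loaded module; only the variable name changes with the module name.&lt;br /&gt;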
&lt;br /&gt;
There are handy abbreviations for the module commands. &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A list of available software modules can be found on [https://docs.alliancecan.ca/wiki/Available_software this page].&lt;br /&gt;
&lt;br /&gt;
There are a few additional modules available as well, and more can be made available at the request of course instructors.  Currently, the only additional modules are:&lt;br /&gt;
&lt;br /&gt;
 catch2/3.3.1      - A C++ test framework for unit-tests, TDD and BDD using C++14 and later.&lt;br /&gt;
 misopy/0.5.2      - A probabilistic framework to analyze RNA-Seq data.&lt;br /&gt;
 palemoon/33.6.0.1 - A web browser.&lt;br /&gt;
&lt;br /&gt;
== Interactive jobs ==&lt;br /&gt;
&lt;br /&gt;
For an interactive session on a compute node of the Teach cluster that gives access to non-shared resources, use the 'debugjob' command: &lt;br /&gt;
 teach01:~$ debugjob -n C&lt;br /&gt;
where C is the number of cores. An interactive session lasts up to four hours when using at most one node (C&amp;lt;=40), and up to 60 minutes when using four nodes (i.e., 120&amp;lt;C&amp;lt;=160), which is the maximum number of nodes allowed for an interactive session by debugjob.&lt;br /&gt;
                                                             &lt;br /&gt;
For a short interactive session on dedicated compute nodes of the Teach cluster, use the 'debugjob' command as follows: &lt;br /&gt;
 teach01:~$ debugjob N&lt;br /&gt;
where N is the number of nodes.  On the Teach cluster, this is equivalent to &amp;lt;tt&amp;gt;debugjob -n 40*N&amp;lt;/tt&amp;gt;. The positive integer &amp;lt;tt&amp;gt;N&amp;lt;/tt&amp;gt; can be at most 4.&lt;br /&gt;
&lt;br /&gt;
If no arguments are given to &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt;, it allocates a single core on a Teach compute node.&lt;br /&gt;
&lt;br /&gt;
There are limits on the resources you can get with a debugjob, and how long you can get them.  No debugjob can run longer than four hours or use more than 160 cores, and each user can only run one at a time.  For longer computations, jobs must be submitted to the scheduler.&lt;br /&gt;
&lt;br /&gt;
== Submit a Job ==&lt;br /&gt;
&lt;br /&gt;
Teach uses SLURM as its job scheduler.  More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You submit jobs from a login node by passing a script to the sbatch command:&lt;br /&gt;
&lt;br /&gt;
 teach-login01:~$ sbatch jobscript.sh&lt;br /&gt;
&lt;br /&gt;
This puts the job in the queue. It will run on the compute nodes in due course.&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
* Make sure to adjust the flags --ntasks-per-node or --ntasks, together with --nodes, accordingly for the examples found on the [[Slurm | Slurm page]]. &lt;br /&gt;
* The current Slurm configuration of the Teach cluster allocates compute resources by core rather than by node. That means your tasks might land on nodes that have other jobs running, i.e., they might share the node. If you want to avoid that, add the following directive to your submission script: #SBATCH --exclusive. This forces your job to use its compute nodes exclusively.&lt;br /&gt;
* The maximum wall time is currently set to 4 hours.&lt;br /&gt;
* There are two queues available: the compute queue and the debug queue. Their usage limits are listed in the table below.&lt;br /&gt;
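To make the notes above concrete, here is a minimal sketch of a job script; the job name, the loaded module, and the program to run are hypothetical placeholders, and you should adjust the directives to your needs:&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --job-name=demo_job      # hypothetical job name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40     # all 40 cores of one Teach node
#SBATCH --time=01:00:00          # wall time; at most 4 hours on Teach
# #SBATCH --exclusive            # uncomment to avoid sharing the node

# Load the toolchain you compiled with (example: the GNU compilers).
# The 'module' command only exists on the cluster, so guard it here.
if command -v module 1>/dev/null 2>/dev/null; then
    module load gcc
fi

cd "${SLURM_SUBMIT_DIR:-.}"      # Slurm sets this when the job runs
echo "Job running on $(hostname)"
# ./myprog                       # hypothetical application binary
```

Submit it from a login node with 'sbatch jobscript.sh', as described above.&lt;br /&gt;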
&lt;br /&gt;
== Limits ==&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc; if you do not specify one, your job will run in the compute partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Running jobs&lt;br /&gt;
!Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Interactive testing or troubleshooting || debug || 1 || 1 || 1 core || 4 nodes (160 cores)|| N/A || 4 hours&lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 1 || 12 || 1 core || 4 nodes (160 cores)|| 15 minutes || 4 hours&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Within these limits, jobs may still have to wait in the queue.  Although there are no allocations on the teach cluster, the waiting time still depends on many factors, such as the number of nodes and the wall time, how many other jobs are waiting in the queue, and whether a job can fill an otherwise unused spot in the schedule.&lt;br /&gt;
&lt;br /&gt;
== Main changes from Teach's predecessor ==&lt;br /&gt;
&lt;br /&gt;
This section is intended for instructors who may have used SciNet's previous Teach cluster.&lt;br /&gt;
&lt;br /&gt;
This Teach cluster is set up differently from its predecessor.  &lt;br /&gt;
&lt;br /&gt;
Although the cluster is once again called ''Teach'' and you connect to teach.scinet.utoronto.ca as before, the system differs from the previous Teach cluster in the following ways:&lt;br /&gt;
&lt;br /&gt;
* There are now 2 dedicated login nodes, teach-login01 and teach-login02.  &lt;br /&gt;
* The ssh fingerprints for these login nodes can be found on [[Teach_fingerprints]].&lt;br /&gt;
* The compute nodes have 40 cores.  As before, you can request jobs by number of cores.&lt;br /&gt;
* Only temporary lcl_uot.... accounts can log in.&lt;br /&gt;
* Only the home directories of those accounts are mounted. &lt;br /&gt;
* In particular, the file systems from the other SciNet compute clusters (Niagara, Mist, Trillium,... ) are not and will not be mounted.  You'll need to copy over any files that you need to use on the Teach cluster.&lt;br /&gt;
* There is no $SCRATCH.  You can do all your work in $HOME, which is writable from the compute nodes.&lt;br /&gt;
* The software stack is the one supplied by the Alliance.  There is no need to load 'CCEnv' to get it.&lt;br /&gt;
* As before, if you're missing a module, we can still install it for you.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Do this later&lt;br /&gt;
== Running Jupyter on a Teach Compute Node ==  &lt;br /&gt;
&lt;br /&gt;
1. To be able to run Jupyter on a compute node, you must first (a) install it inside a virtual environment, (b) redirect the directory under $HOME that Jupyter writes to at runtime onto $SCRATCH, and (c) create a little helper script called &amp;lt;tt&amp;gt;notebook.sh&amp;lt;/tt&amp;gt; that will be used to start the jupyter server in step 2.  These are the commands that you should use for the installation (which you should do only once, on the Teach login node):&lt;br /&gt;
&lt;br /&gt;
(a) Create virtual env&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load python/3.9.10&lt;br /&gt;
$ virtualenv --system-site-packages $HOME/.virtualenvs/jupteach&lt;br /&gt;
$ source $HOME/.virtualenvs/jupteach/bin/activate&lt;br /&gt;
$ pip install jupyter jupyterlab&lt;br /&gt;
$ deactivate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
You can choose another directory than &amp;lt;tt&amp;gt;$HOME/.virtualenvs/jupteach&amp;lt;/tt&amp;gt; for where to create the virtual environment, but you need to be consistent and use the same directory everywhere below.&lt;br /&gt;
&lt;br /&gt;
(b) Make a writable 'runtime' directory for Jupyter. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mkdir -p $HOME/.local/share/jupyter/runtime &lt;br /&gt;
$ mv -f $HOME/.local/share/jupyter/runtime $SCRATCH/jupyter_runtime || mkdir $SCRATCH/jupyter_runtime&lt;br /&gt;
$ ln -sT $SCRATCH/jupyter_runtime $HOME/.local/share/jupyter/runtime&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(c) Create a launch script to use on the compute nodes:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ cat &amp;gt; $HOME/.virtualenvs/jupteach/bin/notebook.sh &amp;lt;&amp;lt;EOF&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
source \$HOME/.virtualenvs/jupteach/bin/activate&lt;br /&gt;
export XDG_DATA_HOME=\$SCRATCH/.share&lt;br /&gt;
export XDG_CACHE_HOME=\$SCRATCH/.cache&lt;br /&gt;
export XDG_CONFIG_HOME=\$SCRATCH/.config&lt;br /&gt;
export XDG_RUNTIME_DIR=\$SCRATCH/.runtime&lt;br /&gt;
export JUPYTER_CONFIG_DIR=\$SCRATCH/.config/.jupyter&lt;br /&gt;
jupyter \${1:-notebook} --ip \$(hostname -f) --no-browser --notebook-dir=\$PWD&lt;br /&gt;
EOF&lt;br /&gt;
$ chmod +x  $HOME/.virtualenvs/jupteach/bin/notebook.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. To run the jupyter server on a compute node, start an interactive session with the &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt; command and then launch the jupyter server:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ debugjob -n 16                    # use less if you need less cores.&lt;br /&gt;
$ cd $SCRATCH                       # $HOME is read-only, so move to $SCRATCH&lt;br /&gt;
$ $HOME/.virtualenvs/jupteach/bin/notebook.sh  # add the argument &amp;quot;lab&amp;quot; to start with the jupyter lab&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Make sure you note down (a) the name of the compute node that you got allocated (it starts with &amp;quot;&amp;lt;tt&amp;gt;teach&amp;lt;/tt&amp;gt;&amp;quot; followed by a 2-digit number); (b) the number following the compute node's name after the colon (usually this is 8888, but it can be another, higher number); this is the PORT; and (c) the last URL that notebook.sh tells you to use to connect.&lt;br /&gt;
&lt;br /&gt;
3. The jupyter server runs on a Teach compute node, which is not accessible from the internet. To connect to it, open a different terminal on your own computer and reconnect to the Teach cluster with a port-forwarding tunnel to the compute node on which jupyter is running:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -LPORT:teachXX:PORT -o ControlMaster=no USERNAME@teach.scinet.utoronto.ca  -N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;tt&amp;gt;teachXX&amp;lt;/tt&amp;gt; is to be replaced by the name of the compute node (point (a) above), PORT is to be replaced by the port number that notebook.sh showed, and &amp;lt;tt&amp;gt;USERNAME&amp;lt;/tt&amp;gt; should be your teach account username. This command should just &amp;quot;hang&amp;quot; there; it only serves to forward port number PORT (usually 8888) on your computer to port PORT (usually 8888) on the compute node.  &lt;br /&gt;
&lt;br /&gt;
Finally, point your browser to the URL that the &amp;lt;tt&amp;gt;notebook.sh&amp;lt;/tt&amp;gt; command printed out (point (c) above), i.e., the one with 127.0.0.1 in it.&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Teach&amp;diff=7463</id>
		<title>Teach</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Teach&amp;diff=7463"/>
		<updated>2026-01-05T14:29:40Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* Main changes from Teach's predecessor */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page describes the usage of the new Teach cluster, installed in Feb 2025.  It is currently still in a beta phase.&lt;br /&gt;
&lt;br /&gt;
{{Infobox Computer&lt;br /&gt;
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] &lt;br /&gt;
|name=Teach Cluster &lt;br /&gt;
|installed=(orig Mar 2020), Feb 2025&lt;br /&gt;
|operatingsystem= Linux (Rocky 9.6)&lt;br /&gt;
|loginnode=teach-login01&lt;br /&gt;
|nnodes=64 &lt;br /&gt;
|rampernode=188 GiB / 202 GB &lt;br /&gt;
|corespernode=40 &lt;br /&gt;
|interconnect=Infiniband (EDR)&lt;br /&gt;
|vendorcompilers=gcc,intel&lt;br /&gt;
|queuetype=slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Teaching Cluster ==&lt;br /&gt;
&lt;br /&gt;
SciNet has assembled some older compute hardware into a smaller 2560-core cluster provided primarily for teaching purposes. It currently has no GPU capability.&lt;br /&gt;
&lt;br /&gt;
The Teach cluster is configured similarly to the SciNet production system Trillium; however, it uses hardware repurposed from Trillium's predecessor, [[Niagara_Quickstart|Niagara]].  This system should not be used for production work, as the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time.  Questions about its use, or problems with it, should be sent to '''support@scinet.utoronto.ca'''.&lt;br /&gt;
&lt;br /&gt;
This Teach cluster is set up differently from its predecessor.  See below for the main changes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
This cluster currently consists of 64 repurposed x86_64 nodes, each with 40 cores (from two 20-core Intel Cascade Lake CPUs) running at 2.5 GHz and with 188 GiB of RAM per node.&lt;br /&gt;
The nodes are interconnected with 1:1 non-blocking EDR Infiniband for MPI communications, and disk I/O goes to a separate view of the VAST file system.  In total, this cluster contains 2560 cores, but there are plans to expand it if demand warrants it.&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
Teach runs Rocky Linux 9.  You will need to be somewhat familiar with Linux systems to work on Teach.  If you are not, it will be worth your time to review our [https://education.scinet.utoronto.ca/tag/index.php?tag=SCMP101 Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} systems, access to Teach is done via [[SSH]] (secure shell) only.  Open a terminal window (e.g. using [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or  [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm] on Windows), and type&lt;br /&gt;
 ssh -Y USERNAME@teach.scinet.utoronto.ca&lt;br /&gt;
This will bring you directly to the command line of '''&amp;lt;tt&amp;gt;teach-login01&amp;lt;/tt&amp;gt;''' or '''&amp;lt;tt&amp;gt;teach-login02&amp;lt;/tt&amp;gt;''', which are the gateway/devel nodes for this cluster.&lt;br /&gt;
On these nodes, you can compile, do short tests, and submit your jobs to the queue.&lt;br /&gt;
&lt;br /&gt;
The first time you log in to the Teach cluster, please make sure to check that the login node's ssh key fingerprint&lt;br /&gt;
matches. [[Teach_fingerprints | See here how]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The login nodes are shared among students of a number of different courses. Use these nodes to develop and compile&lt;br /&gt;
code, to run short tests, and to submit computations to the scheduler (see below).&lt;br /&gt;
&lt;br /&gt;
Note that access to the teach cluster is restricted to temporary accounts that start with the prefix '''lcl_uot''' + the course code + '''s''', and a number.  Passwords for these accounts can be changed on the [https://portal.scinet.utoronto.ca/portaluserlogin SciNet user portal]. On the same site, users can upload a public ssh key if they want to connect using ssh keys.&lt;br /&gt;
&lt;br /&gt;
== Software Modules ==&lt;br /&gt;
&lt;br /&gt;
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available.  A detailed explanation of the module system can be [[Using_modules | found on the modules page]].  &lt;br /&gt;
&lt;br /&gt;
The Teach cluster makes the same [https://docs.alliancecan.ca/wiki/Available_software modules available] as on the [https://docs.alliancecan.ca General Purpose clusters of the Digital Research Alliance of Canada], with one caveat.  On Teach, by default, only the &amp;quot;gentoo&amp;quot; module is loaded, which provides basic OS-level functionality.  &lt;br /&gt;
&lt;br /&gt;
Common module subcommands are:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;: load the default version of a particular software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt;: load a specific version of a particular software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt;: unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;): search all available software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt;: list software packages that can be loaded given the currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;: list loaded modules.&lt;br /&gt;
&lt;br /&gt;
For example, to make the GNU compilers (gcc, g++ and gfortran) available, you should type&lt;br /&gt;
&lt;br /&gt;
 module load gcc&lt;br /&gt;
&lt;br /&gt;
while the Intel compilers (icc, icpc and ifort) can be loaded by&lt;br /&gt;
&lt;br /&gt;
 module load intel&lt;br /&gt;
&lt;br /&gt;
To get the default modules that are loaded on the General Purpose clusters, you can load the &amp;quot;StdEnv&amp;quot; module. &lt;br /&gt;
&lt;br /&gt;
Along with modifying common environment variables, such as the PATH, these modules also create an '''EBROOT&amp;lt;MODULENAME&amp;gt;''' environment variable, which can be used to access commonly needed directories of the software installation, such as its include and lib subdirectories.&lt;br /&gt;
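&lt;br /&gt;
As an illustration (assuming the &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt; module is present in the software stack; mycode.c is a placeholder for your own source file):&lt;br /&gt;
 module load gcc gsl&lt;br /&gt;
 echo $EBROOTGSL&lt;br /&gt;
 gcc mycode.c -I$EBROOTGSL/include -L$EBROOTGSL/lib -lgsl -lgslcblas -lm&lt;br /&gt;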
&lt;br /&gt;
There are handy abbreviations for the module commands. &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A list of available software modules can be found [https://docs.alliancecan.ca/wiki/Available_software on this page].&lt;br /&gt;
&lt;br /&gt;
There are a few additional modules available as well, and more can be made available at the request of course instructors.  Currently, the only additional modules are:&lt;br /&gt;
&lt;br /&gt;
 catch2/3.3.1      - A C++ test framework for unit-tests, TDD and BDD using C++14 and later.&lt;br /&gt;
 misopy/0.5.2      - A probabilistic framework to analyze RNA-Seq data.&lt;br /&gt;
 palemoon/33.6.0.1 - A web browser&lt;br /&gt;
&lt;br /&gt;
== Interactive jobs ==&lt;br /&gt;
&lt;br /&gt;
For an interactive session on a compute node of the Teach cluster with dedicated (non-shared) resources, use the 'debugjob' command:&lt;br /&gt;
 teach01:~$ debugjob -n C&lt;br /&gt;
where C is the number of cores. An interactive session defaults to four hours when using at most one node (C&amp;lt;=40), and is reduced to 60 minutes when using four nodes (i.e., 120&amp;lt;C&amp;lt;=160); four nodes is the maximum allowed for an interactive debugjob session.&lt;br /&gt;
                                                             &lt;br /&gt;
For a short interactive session on dedicated compute nodes of the Teach cluster, you can also use the 'debugjob' command as follows:&lt;br /&gt;
 teach01:~$ debugjob N&lt;br /&gt;
where N is the number of nodes.  On the Teach cluster, this is equivalent to &amp;lt;tt&amp;gt;debugjob -n 40*N&amp;lt;/tt&amp;gt;. The positive integer &amp;lt;tt&amp;gt;N&amp;lt;/tt&amp;gt; can be at most 4.&lt;br /&gt;
&lt;br /&gt;
If no arguments are given to &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt;, it allocates a single core on a Teach compute node.&lt;br /&gt;
&lt;br /&gt;
There are limits on the resources you can get with a debugjob, and how long you can get them.  No debugjob can run longer than four hours or use more than 160 cores, and each user can only run one at a time.  For longer computations, jobs must be submitted to the scheduler.&lt;br /&gt;
&lt;br /&gt;
== Submit a Job ==&lt;br /&gt;
&lt;br /&gt;
Teach uses SLURM as its job scheduler.  More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You submit jobs from a login node by passing a script to the sbatch command:&lt;br /&gt;
&lt;br /&gt;
 teach-login01:~$ sbatch jobscript.sh&lt;br /&gt;
&lt;br /&gt;
This puts the job in the queue. It will run on the compute nodes in due course.&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
* Make sure to adjust the flags --ntasks-per-node or --ntasks, together with --nodes, accordingly in the examples found on the [[Slurm | Slurm page]].&lt;br /&gt;
* The current Slurm configuration of the Teach cluster allocates compute resources by core rather than by node. That means your tasks might land on nodes that have other jobs running, i.e., they might share the node. If you want to avoid that, add the directive #SBATCH --exclusive to your submission script. This forces your job to use its compute nodes exclusively.&lt;br /&gt;
* The maximum wall time is currently set to 4 hours.&lt;br /&gt;
* There are two queues available: the compute queue and the debug queue. Their usage limits are listed in the table below.&lt;br /&gt;
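&lt;br /&gt;
As an illustration, a minimal job script could look as follows (the job name, output file, loaded module, and program are placeholders to adapt to your own work; the requested resources must stay within the usage limits):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=40&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=myjob&lt;br /&gt;
#SBATCH --output=myjob_%j.out&lt;br /&gt;
&lt;br /&gt;
module load gcc&lt;br /&gt;
./my_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;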
&lt;br /&gt;
== Limits ==&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc; if you do not specify one, your job will run in the compute partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Running jobs&lt;br /&gt;
!Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Interactive testing or troubleshooting || debug || 1 || 1 || 1 core || 4 nodes (160 cores)|| N/A || 4 hours&lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 1 || 12 || 1 core || 4 nodes (160 cores)|| 15 minutes || 4 hours&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Within these limits, jobs may still have to wait in the queue.  Although there are no allocations on the teach cluster, the waiting time still depends on many factors, such as the number of nodes and the wall time, how many other jobs are waiting in the queue, and whether a job can fill an otherwise unused spot in the schedule.&lt;br /&gt;
&lt;br /&gt;
== Main changes from Teach's predecessor ==&lt;br /&gt;
&lt;br /&gt;
This section is intended for instructors who may have used SciNet's previous Teach cluster.&lt;br /&gt;
&lt;br /&gt;
Although the cluster is once again called ''Teach'' and you connect to teach.scinet.utoronto.ca as before, the system is set up differently from the previous Teach cluster in the following ways:&lt;br /&gt;
&lt;br /&gt;
* There are now 2 dedicated login nodes, teach-login01 and teach-login02.  &lt;br /&gt;
* The ssh fingerprints for these login nodes can be found on [[Teach_fingerprints]].&lt;br /&gt;
* The compute nodes have 40 cores.  As before, you can request jobs by number of cores.&lt;br /&gt;
* Only temporary lcl_uot.... accounts can log in.&lt;br /&gt;
* Only the home directories of those accounts are mounted.&lt;br /&gt;
* In particular, the file systems from the other SciNet compute clusters (Niagara, Mist, Trillium,... ) are not and will not be mounted.  You'll need to copy over any files that you need to use on the Teach cluster.&lt;br /&gt;
* There is no $SCRATCH.  You can do all your work in $HOME, which is writable from the compute nodes.&lt;br /&gt;
* The software stack is the one supplied by the Alliance.  There is no need to load 'CCEnv' to get it.&lt;br /&gt;
* But as before, if you're missing a module, we can still install it for you.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Do this later&lt;br /&gt;
== Running Jupyter on a Teach Compute Node ==  &lt;br /&gt;
&lt;br /&gt;
1. To be able to run Jupyter on a compute node, you must first (a) install it inside a virtual environment, (b) redirect a directory under $HOME that jupyter needs to write to, and (c) create a little helper script called &amp;lt;tt&amp;gt;notebook.sh&amp;lt;/tt&amp;gt; that will be used to start the jupyter server in step 2.  These are the commands to use for the installation (which you should do only once, on the Teach login node):&lt;br /&gt;
&lt;br /&gt;
(a) Create virtual env&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load python/3.9.10&lt;br /&gt;
$ virtualenv --system-site-packages $HOME/.virtualenvs/jupteach&lt;br /&gt;
$ source $HOME/.virtualenvs/jupteach/bin/activate&lt;br /&gt;
$ pip install jupyter jupyterlab&lt;br /&gt;
$ deactivate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
You can choose another directory than &amp;lt;tt&amp;gt;$HOME/.virtualenvs/jupteach&amp;lt;/tt&amp;gt; for where to create the virtual environment, but you need to be consistent and use the same directory everywhere below.&lt;br /&gt;
&lt;br /&gt;
(b) Make a writable 'runtime' directory for Jupyter. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mkdir -p $HOME/.local/share/jupyter/runtime &lt;br /&gt;
$ mv -f $HOME/.local/share/jupyter/runtime $SCRATCH/jupyter_runtime || mkdir $SCRATCH/jupyter_runtime&lt;br /&gt;
$ ln -sT $SCRATCH/jupyter_runtime $HOME/.local/share/jupyter/runtime&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(c) Create a launch script to use on the compute nodes:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ cat &amp;gt; $HOME/.virtualenvs/jupteach/bin/notebook.sh &amp;lt;&amp;lt;EOF&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
source \$HOME/.virtualenvs/jupteach/bin/activate&lt;br /&gt;
export XDG_DATA_HOME=\$SCRATCH/.share&lt;br /&gt;
export XDG_CACHE_HOME=\$SCRATCH/.cache&lt;br /&gt;
export XDG_CONFIG_HOME=\$SCRATCH/.config&lt;br /&gt;
export XDG_RUNTIME_DIR=\$SCRATCH/.runtime&lt;br /&gt;
export JUPYTER_CONFIG_DIR=\$SCRATCH/.config/.jupyter&lt;br /&gt;
jupyter \${1:-notebook} --ip \$(hostname -f) --no-browser --notebook-dir=\$PWD&lt;br /&gt;
EOF&lt;br /&gt;
$ chmod +x  $HOME/.virtualenvs/jupteach/bin/notebook.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. To run the jupyter server on a compute node, start an interactive session with the &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt; command and then launch the jupyter server:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ debugjob -n 16                    # request fewer cores if you need fewer.&lt;br /&gt;
$ cd $SCRATCH                       # $HOME is read-only, so move to $SCRATCH&lt;br /&gt;
$ $HOME/.virtualenvs/jupteach/bin/notebook.sh  # add the argument &amp;quot;lab&amp;quot; to start with the jupyter lab&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Make sure you note down (a) the name of the compute node you were allocated (it starts with &amp;quot;&amp;lt;tt&amp;gt;teach&amp;lt;/tt&amp;gt;&amp;quot; followed by a two-digit number); (b) the number after the colon following the compute node's name (usually this is 8888, but it can be another, higher number); this is the PORT; and (c) the last URL that notebook.sh tells you to use to connect.&lt;br /&gt;
&lt;br /&gt;
3. The jupyter server runs on a teach compute node, which is not accessible from the internet. To connect to it, in a different terminal on your own computer, reconnect to the Teach cluster with a port-forwarding&lt;br /&gt;
tunnel to the compute node on which jupyter is running:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -LPORT:teachXX:PORT -o ControlMaster=no USERNAME@teach.scinet.utoronto.ca  -N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;tt&amp;gt;teachXX&amp;lt;/tt&amp;gt; is to be replaced by the name of the compute node (point (a) above), PORT by the port number that notebook.sh showed, and &amp;lt;tt&amp;gt;USERNAME&amp;lt;/tt&amp;gt; by your teach account username. This command will appear to &amp;quot;hang&amp;quot;; it only serves to forward port PORT (usually 8888) on your own computer to the same port on the compute node.&lt;br /&gt;
&lt;br /&gt;
Finally, point your browser to the URL that the &amp;lt;tt&amp;gt;notebook.sh&amp;lt;/tt&amp;gt; command printed out (point (c) above), i.e., the one with 127.0.0.1 in it.&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Teach&amp;diff=7403</id>
		<title>Teach</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Teach&amp;diff=7403"/>
		<updated>2025-12-11T19:58:15Z</updated>

		<summary type="html">&lt;p&gt;Rzon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page describes the usage of the new Teach cluster, installed in Feb 2025.  It is currently still somewhat in beta phase.&lt;br /&gt;
&lt;br /&gt;
{{Infobox Computer&lt;br /&gt;
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] &lt;br /&gt;
|name=Teach Cluster &lt;br /&gt;
|installed=(orig Mar 2020), Feb 2025&lt;br /&gt;
|operatingsystem= Linux (Rocky 9.6)&lt;br /&gt;
|loginnode=teach-login01&lt;br /&gt;
|nnodes=64 &lt;br /&gt;
|rampernode=188 GiB / 202 GB &lt;br /&gt;
|corespernode=40 &lt;br /&gt;
|interconnect=Infiniband (EDR)&lt;br /&gt;
|vendorcompilers=gcc,intel&lt;br /&gt;
|queuetype=slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Teaching Cluster ==&lt;br /&gt;
&lt;br /&gt;
SciNet has assembled some older compute hardware into a smaller 2560-core cluster provided primarily for teaching purposes. It currently has no GPU capability.&lt;br /&gt;
&lt;br /&gt;
The Teach cluster is configured similarly to the SciNet production system Trillium; however, it uses hardware repurposed from Trillium's predecessor, [[Niagara_Quickstart|Niagara]].  This system should not be used for production work, as the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time.  Questions about its use, or problems with it, should be sent to '''support@scinet.utoronto.ca'''.&lt;br /&gt;
&lt;br /&gt;
This Teach cluster is set up differently from its predecessor.  See below for the main changes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
This cluster currently consists of 64 repurposed x86_64 nodes, each with 40 cores (from two 20-core Intel Cascade Lake CPUs) running at 2.5 GHz and with 188 GiB of RAM per node.&lt;br /&gt;
The nodes are interconnected with 1:1 non-blocking EDR Infiniband for MPI communications, and disk I/O goes to a separate view of the VAST file system.  In total, this cluster contains 2560 cores, but there are plans to expand it if demand warrants it.&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
Teach runs Rocky Linux 9.  You will need to be somewhat familiar with Linux systems to work on Teach.  If you are not, it will be worth your time to review our [https://education.scinet.utoronto.ca/tag/index.php?tag=SCMP101 Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} systems, access to Teach is done via [[SSH]] (secure shell) only.  Open a terminal window (e.g. using [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or  [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm] on Windows), and type&lt;br /&gt;
 ssh -Y USERNAME@teach.scinet.utoronto.ca&lt;br /&gt;
This will bring you directly to the command line of '''&amp;lt;tt&amp;gt;teach-login01&amp;lt;/tt&amp;gt;''' or '''&amp;lt;tt&amp;gt;teach-login02&amp;lt;/tt&amp;gt;''', which are the gateway/devel nodes for this cluster.&lt;br /&gt;
On these nodes, you can compile, do short tests, and submit your jobs to the queue.&lt;br /&gt;
&lt;br /&gt;
The first time you log in to the Teach cluster, please make sure to check that the login node's ssh key fingerprint&lt;br /&gt;
matches. [[Teach_fingerprints | See here how]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The login nodes are shared among students of a number of different courses. Use these nodes to develop and compile&lt;br /&gt;
code, to run short tests, and to submit computations to the scheduler (see below).&lt;br /&gt;
&lt;br /&gt;
Note that access to the teach cluster is restricted to temporary accounts that start with the prefix '''lcl_uot''' + the course code + '''s''', and a number.  Passwords for these accounts can be changed on the [https://portal.scinet.utoronto.ca/portaluserlogin SciNet user portal]. On the same site, users can upload a public ssh key if they want to connect using ssh keys.&lt;br /&gt;
&lt;br /&gt;
== Software Modules ==&lt;br /&gt;
&lt;br /&gt;
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available.  A detailed explanation of the module system can be [[Using_modules | found on the modules page]].  &lt;br /&gt;
&lt;br /&gt;
The Teach cluster makes the same [https://docs.alliancecan.ca/wiki/Available_software modules available] as on the [https://docs.alliancecan.ca General Purpose clusters of the Digital Research Alliance of Canada], with one caveat.  On Teach, by default, only the &amp;quot;gentoo&amp;quot; module is loaded, which provides basic OS-level functionality.  &lt;br /&gt;
&lt;br /&gt;
Common module subcommands are:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;: load the default version of a particular software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt;: load a specific version of a particular software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt;: unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;): search all available software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt;: list software packages that can be loaded given the currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;: list loaded modules.&lt;br /&gt;
&lt;br /&gt;
For example, to make the GNU compilers (gcc, g++ and gfortran) available, you should type&lt;br /&gt;
&lt;br /&gt;
 module load gcc&lt;br /&gt;
&lt;br /&gt;
while the Intel compilers (icc, icpc and ifort) can be loaded by&lt;br /&gt;
&lt;br /&gt;
 module load intel&lt;br /&gt;
&lt;br /&gt;
To get the default modules that are loaded on the General Purpose clusters, you can load the &amp;quot;StdEnv&amp;quot; module. &lt;br /&gt;
&lt;br /&gt;
Along with modifying common environment variables, such as the PATH, these modules also create an '''EBROOT&amp;lt;MODULENAME&amp;gt;''' environment variable, which can be used to access commonly needed directories of the software installation, such as its include and lib subdirectories.&lt;br /&gt;
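&lt;br /&gt;
As an illustration (assuming the &amp;lt;tt&amp;gt;gsl&amp;lt;/tt&amp;gt; module is present in the software stack; mycode.c is a placeholder for your own source file):&lt;br /&gt;
 module load gcc gsl&lt;br /&gt;
 echo $EBROOTGSL&lt;br /&gt;
 gcc mycode.c -I$EBROOTGSL/include -L$EBROOTGSL/lib -lgsl -lgslcblas -lm&lt;br /&gt;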
&lt;br /&gt;
There are handy abbreviations for the module commands. &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A list of available software modules can be found [https://docs.alliancecan.ca/wiki/Available_software on this page].&lt;br /&gt;
&lt;br /&gt;
There are a few additional modules available as well, and more can be made available at the request of course instructors.  Currently, the only additional modules are:&lt;br /&gt;
&lt;br /&gt;
 catch2/3.3.1      - A C++ test framework for unit-tests, TDD and BDD using C++14 and later.&lt;br /&gt;
 misopy/0.5.2      - A probabilistic framework to analyze RNA-Seq data.&lt;br /&gt;
 palemoon/33.6.0.1 - A web browser&lt;br /&gt;
&lt;br /&gt;
== Interactive jobs ==&lt;br /&gt;
&lt;br /&gt;
For an interactive session on a compute node of the Teach cluster with dedicated (non-shared) resources, use the 'debugjob' command:&lt;br /&gt;
 teach01:~$ debugjob -n C&lt;br /&gt;
where C is the number of cores. An interactive session defaults to four hours when using at most one node (C&amp;lt;=40), and is reduced to 60 minutes when using four nodes (i.e., 120&amp;lt;C&amp;lt;=160); four nodes is the maximum allowed for an interactive debugjob session.&lt;br /&gt;
                                                             &lt;br /&gt;
For a short interactive session on dedicated compute nodes of the Teach cluster, you can also use the 'debugjob' command as follows:&lt;br /&gt;
 teach01:~$ debugjob N&lt;br /&gt;
where N is the number of nodes.  On the Teach cluster, this is equivalent to &amp;lt;tt&amp;gt;debugjob -n 40*N&amp;lt;/tt&amp;gt;. The positive integer &amp;lt;tt&amp;gt;N&amp;lt;/tt&amp;gt; can be at most 4.&lt;br /&gt;
&lt;br /&gt;
If no arguments are given to &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt;, it allocates a single core on a Teach compute node.&lt;br /&gt;
&lt;br /&gt;
There are limits on the resources you can get with a debugjob, and how long you can get them.  No debugjob can run longer than four hours or use more than 160 cores, and each user can only run one at a time.  For longer computations, jobs must be submitted to the scheduler.&lt;br /&gt;
&lt;br /&gt;
== Submit a Job ==&lt;br /&gt;
&lt;br /&gt;
Teach uses SLURM as its job scheduler.  More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You submit jobs from a login node by passing a script to the sbatch command:&lt;br /&gt;
&lt;br /&gt;
 teach-login01:~$ sbatch jobscript.sh&lt;br /&gt;
&lt;br /&gt;
This puts the job in the queue. It will run on the compute nodes in due course.&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
* Make sure to adjust the flags --ntasks-per-node or --ntasks, together with --nodes, accordingly in the examples found on the [[Slurm | Slurm page]].&lt;br /&gt;
* The current Slurm configuration of the Teach cluster allocates compute resources by core rather than by node. That means your tasks might land on nodes that have other jobs running, i.e., they might share the node. If you want to avoid that, add the directive #SBATCH --exclusive to your submission script. This forces your job to use its compute nodes exclusively.&lt;br /&gt;
* The maximum wall time is currently set to 4 hours.&lt;br /&gt;
* There are two queues available: the compute queue and the debug queue. Their usage limits are listed in the table below.&lt;br /&gt;
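&lt;br /&gt;
As an illustration, a minimal job script could look as follows (the job name, output file, loaded module, and program are placeholders to adapt to your own work; the requested resources must stay within the usage limits):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=40&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=myjob&lt;br /&gt;
#SBATCH --output=myjob_%j.out&lt;br /&gt;
&lt;br /&gt;
module load gcc&lt;br /&gt;
./my_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;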
&lt;br /&gt;
== Limits ==&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc; if you do not specify one, your job will run in the compute partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Running jobs&lt;br /&gt;
!Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Interactive testing or troubleshooting || debug || 1 || 1 || 1 core || 4 nodes (160 cores)|| N/A || 4 hours&lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 1 || 12 || 1 core || 4 nodes (160 cores)|| 15 minutes || 4 hours&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Within these limits, jobs may still have to wait in the queue.  Although there are no allocations on the teach cluster, the waiting time still depends on many factors, such as the number of nodes and the wall time, how many other jobs are waiting in the queue, and whether a job can fill an otherwise unused spot in the schedule.&lt;br /&gt;
&lt;br /&gt;
== Main changes from Teach's predecessor ==&lt;br /&gt;
&lt;br /&gt;
Although the cluster is once again called ''Teach'' and you connect to teach.scinet.utoronto.ca as before, the system is set up differently from the previous Teach cluster in the following ways:&lt;br /&gt;
&lt;br /&gt;
* There are now 2 dedicated login nodes, teach-login01 and teach-login02.  &lt;br /&gt;
* The ssh fingerprints for these login nodes can be found on [[Teach_fingerprints]].&lt;br /&gt;
* The compute nodes have 40 cores.  As before, you can request jobs by number of cores.&lt;br /&gt;
* Only temporary lcl_uot.... accounts can log in.&lt;br /&gt;
* Only the home directories of those accounts are mounted.&lt;br /&gt;
* In particular, the file systems from the other SciNet compute clusters (Niagara, Mist, Trillium,... ) are not and will not be mounted.  You'll need to copy over any files that you need to use on the Teach cluster.&lt;br /&gt;
* There is no $SCRATCH.  You can do all your work in $HOME, which is writable from the compute nodes.&lt;br /&gt;
* The software stack is the one supplied by the Alliance.  There is no need to load 'CCEnv' to get it.&lt;br /&gt;
* But as before, if you're missing a module, we can still install it for you.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Do this later&lt;br /&gt;
== Running Jupyter on a Teach Compute Node ==  &lt;br /&gt;
&lt;br /&gt;
1. To be able to run Jupyter on a compute node, you must first (a) install it inside a virtual environment, (b) redirect a directory under $HOME that jupyter needs to write to, and (c) create a little helper script called &amp;lt;tt&amp;gt;notebook.sh&amp;lt;/tt&amp;gt; that will be used to start the jupyter server in step 2.  These are the commands to use for the installation (which you should do only once, on the Teach login node):&lt;br /&gt;
&lt;br /&gt;
(a) Create virtual env&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load python/3.9.10&lt;br /&gt;
$ virtualenv --system-site-packages $HOME/.virtualenvs/jupteach&lt;br /&gt;
$ source $HOME/.virtualenvs/jupteach/bin/activate&lt;br /&gt;
$ pip install jupyter jupyterlab&lt;br /&gt;
$ deactivate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
You can choose another directory than &amp;lt;tt&amp;gt;$HOME/.virtualenvs/jupteach&amp;lt;/tt&amp;gt; for where to create the virtual environment, but you need to be consistent and use the same directory everywhere below.&lt;br /&gt;
&lt;br /&gt;
(b) Make a writable 'runtime' directory for Jupyter. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mkdir -p $HOME/.local/share/jupyter/runtime &lt;br /&gt;
$ mv -f $HOME/.local/share/jupyter/runtime $SCRATCH/jupyter_runtime || mkdir $SCRATCH/jupyter_runtime&lt;br /&gt;
$ ln -sT $SCRATCH/jupyter_runtime $HOME/.local/share/jupyter/runtime&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(c) Create a launch script to use on the compute nodes:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ cat &amp;gt; $HOME/.virtualenvs/jupteach/bin/notebook.sh &amp;lt;&amp;lt;EOF&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
source \$HOME/.virtualenvs/jupteach/bin/activate&lt;br /&gt;
export XDG_DATA_HOME=\$SCRATCH/.share&lt;br /&gt;
export XDG_CACHE_HOME=\$SCRATCH/.cache&lt;br /&gt;
export XDG_CONFIG_HOME=\$SCRATCH/.config&lt;br /&gt;
export XDG_RUNTIME_DIR=\$SCRATCH/.runtime&lt;br /&gt;
export JUPYTER_CONFIG_DIR=\$SCRATCH/.config/.jupyter&lt;br /&gt;
jupyter \${1:-notebook} --ip \$(hostname -f) --no-browser --notebook-dir=\$PWD&lt;br /&gt;
EOF&lt;br /&gt;
$ chmod +x  $HOME/.virtualenvs/jupteach/bin/notebook.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. To run the jupyter server on a compute node, start an interactive session with the &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt; command and then launch the jupyter server:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ debugjob -n 16                    # use fewer if you need fewer cores.&lt;br /&gt;
$ cd $SCRATCH                       # $HOME is read-only, so move to $SCRATCH&lt;br /&gt;
$ $HOME/.virtualenvs/jupteach/bin/notebook.sh  # add the argument &amp;quot;lab&amp;quot; to start jupyter lab instead&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Make sure you note down (a) the name of the compute node that you were allocated (it starts with &amp;quot;&amp;lt;tt&amp;gt;teach&amp;lt;/tt&amp;gt;&amp;quot; followed by a two-digit number), (b) the port number that follows the node name after the colon (usually this is 8888, but it can be another, higher number); this is the PORT, and (c) the last URL that notebook.sh tells you to use to connect.&lt;br /&gt;
&lt;br /&gt;
3. The jupyter server runs on a teach compute node, which is not accessible from the internet. To connect to it, in a different terminal on your own computer, reconnect to the Teach cluster with a port-forwarding&lt;br /&gt;
tunnel to the compute node on which jupyter is running:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -LPORT:teachXX:PORT -o ControlMaster=no USERNAME@teach.scinet.utoronto.ca  -N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;tt&amp;gt;teachXX&amp;lt;/tt&amp;gt; is to be replaced by the name of the compute node (point (a) above), PORT by the port number that notebook.sh showed, and &amp;lt;tt&amp;gt;USERNAME&amp;lt;/tt&amp;gt; by your Teach account username. This command will appear to just &amp;quot;hang&amp;quot;; it only serves to forward local port PORT (usually 8888) to the same port on the compute node.&lt;br /&gt;
&lt;br /&gt;
Finally, point your browser to the URL that the &amp;lt;tt&amp;gt;notebook.sh&amp;lt;/tt&amp;gt; command printed out (point (c) above), i.e., the one with 127.0.0.1 in it.&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Teach&amp;diff=7400</id>
		<title>Teach</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Teach&amp;diff=7400"/>
		<updated>2025-12-11T19:56:17Z</updated>

		<summary type="html">&lt;p&gt;Rzon: /* Specifications */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page describes the usage of the new Teach cluster, installed in Feb 2025.  It is currently still somewhat in beta phase.&lt;br /&gt;
&lt;br /&gt;
{{Infobox Computer&lt;br /&gt;
|image=[[Image:Ibm_idataplex_dx360_m4.jpg|center|300px|thumb]] &lt;br /&gt;
|name=Teach Cluster &lt;br /&gt;
|installed=(orig Mar 2020), Feb 2025&lt;br /&gt;
|operatingsystem= Linux (Rocky 9.5)&lt;br /&gt;
|loginnode=teach-login01&lt;br /&gt;
|nnodes=8 &lt;br /&gt;
|rampernode=188 GiB / 202 GB &lt;br /&gt;
|corespernode=40 &lt;br /&gt;
|interconnect=Infiniband (EDR)&lt;br /&gt;
|vendorcompilers=gcc&lt;br /&gt;
|queuetype=slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
== Teaching Cluster ==&lt;br /&gt;
&lt;br /&gt;
SciNet has assembled some older compute hardware into a small cluster provided primarily for teaching purposes. It is configured similarly to the new SciNet production system, Trillium; however, it uses hardware repurposed from its predecessor, [[Niagara_Quickstart|Niagara]].  This system should not be used for production work, as the queuing policies are designed to provide fast job turnover and to limit the amount of resources one person can use at a time.  Questions about its use or problems should be sent to '''support@scinet.utoronto.ca'''.&lt;br /&gt;
&lt;br /&gt;
This Teach cluster is set up differently from its predecessor.  See below for the main changes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Specifications==&lt;br /&gt;
&lt;br /&gt;
This cluster currently consists of 8 repurposed x86_64 nodes, each with 40 cores (from two 20-core Intel CascadeLake CPUs) running at 2.5 GHz and with 188 GiB of RAM per node.&lt;br /&gt;
The nodes are interconnected with 1:1 non-blocking EDR Infiniband for MPI communications, while disk I/O goes to a separate view of the VAST file system.  In total, this cluster contains 320 cores, but there are plans to expand it if demand warrants.&lt;br /&gt;
&lt;br /&gt;
== Login/Devel Node ==&lt;br /&gt;
&lt;br /&gt;
Teach runs Rocky Linux 9.  You will need to be somewhat familiar with Linux systems to work on Teach.  If you are not, it will be worth your time to review our [https://education.scinet.utoronto.ca/tag/index.php?tag=SCMP101 Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} systems, access to Teach is done via [[SSH]] (secure shell) only.  Open a terminal window (e.g. using [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or  [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm] on Windows), and type&lt;br /&gt;
 ssh -Y USERNAME@teach.scinet.utoronto.ca&lt;br /&gt;
This will bring you directly to the command line of '''&amp;lt;tt&amp;gt;teach-login01&amp;lt;/tt&amp;gt;''' or '''&amp;lt;tt&amp;gt;teach-login02&amp;lt;/tt&amp;gt;''', which are the gateway/devel nodes for this cluster.&lt;br /&gt;
On these nodes, you can compile, do short tests, and submit your jobs to the queue.&lt;br /&gt;
&lt;br /&gt;
The first time you log in to the Teach cluster, please make sure to check that the login node's ssh key fingerprint matches. [[Teach_fingerprints | See here how]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The login nodes are shared between students of a number of different courses. Use these nodes to develop and compile code, to run short tests, and to submit computations to the scheduler (see below).&lt;br /&gt;
&lt;br /&gt;
Note that access to the Teach cluster is restricted to temporary accounts that start with the prefix '''lcl_uot''', followed by the course code, an '''s''', and a number.  Passwords for these accounts can be changed on the [https://portal.scinet.utoronto.ca/portaluserlogin SciNet user portal]. On the same site, users can upload a public ssh key if they want to connect using ssh keys.&lt;br /&gt;
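&lt;br /&gt;
For example, to set up key-based access (the key file name &amp;lt;tt&amp;gt;teach_key&amp;lt;/tt&amp;gt; below is just an illustrative choice), you could generate a key pair on your own computer, upload the public part through the portal, and then log in with the private key:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh-keygen -t ed25519 -f ~/.ssh/teach_key    # creates teach_key and teach_key.pub&lt;br /&gt;
$ cat ~/.ssh/teach_key.pub                     # copy this public key into the portal&lt;br /&gt;
$ ssh -i ~/.ssh/teach_key USERNAME@teach.scinet.utoronto.ca&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;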
&lt;br /&gt;
== Software Modules ==&lt;br /&gt;
&lt;br /&gt;
Other than essentials, all installed software is made available [[Using_modules | using module commands]]. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available.  A detailed explanation of the module system can be [[Using_modules | found on the modules page]].  &lt;br /&gt;
&lt;br /&gt;
The Teach cluster makes the same [https://docs.alliancecan.ca/wiki/Available_software modules available] as on the [https://docs.alliancecan.ca General Purpose clusters of the Digital Research Alliance of Canada], with one caveat.  On Teach, by default, only the &amp;quot;gentoo&amp;quot; module is loaded, which provides basic OS-level functionality.  &lt;br /&gt;
&lt;br /&gt;
Common module subcommands are:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;: load the default version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt;: load a specific version of a particular software.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt;: unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;): search the full list of available software packages.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt;: list the software packages that can be loaded given the currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;: list loaded modules.&lt;br /&gt;
&lt;br /&gt;
For example, to make the GNU compilers (gcc, g++ and gfortran) available, you should type&lt;br /&gt;
&lt;br /&gt;
 module load gcc&lt;br /&gt;
&lt;br /&gt;
while the Intel compilers (icc, icpc and ifort) can be loaded by&lt;br /&gt;
&lt;br /&gt;
 module load intel&lt;br /&gt;
&lt;br /&gt;
To get the default modules that are loaded on the General Purpose clusters, you can load the &amp;quot;StdEnv&amp;quot; module. &lt;br /&gt;
&lt;br /&gt;
Along with modifying common environment variables, such as PATH, these modules also set an '''EBROOT&amp;lt;MODULENAME&amp;gt;''' environment variable, which points to the module's installation directory and can be used to access commonly needed subdirectories, such as include/ and lib/.&lt;br /&gt;
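&lt;br /&gt;
For instance, assuming the gsl module is available in this stack (as it is on the Alliance clusters; the source file mycode.c is a placeholder), the EBROOT variable can be used to pass include and library paths to the compiler:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load gcc gsl&lt;br /&gt;
$ echo $EBROOTGSL                 # root directory of the loaded gsl module&lt;br /&gt;
$ gcc mycode.c -I$EBROOTGSL/include -L$EBROOTGSL/lib -lgsl -lgslcblas -o mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;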
&lt;br /&gt;
There are handy abbreviations for the module commands. &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; is the same as &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
A list of available software modules can be found [https://docs.alliancecan.ca/wiki/Available_software on this page].&lt;br /&gt;
&lt;br /&gt;
There are a few additional modules available as well, and more can be made available upon request by course instructors.  Currently, the only additional modules are:&lt;br /&gt;
&lt;br /&gt;
 catch2/3.3.1      - A C++ test framework for unit-tests, TDD and BDD using C++14 and later.&lt;br /&gt;
 misopy/0.5.2      - A probabilistic framework to analyze RNA-Seq data.&lt;br /&gt;
 palemoon/33.6.0.1 - A web browser&lt;br /&gt;
&lt;br /&gt;
== Interactive jobs ==&lt;br /&gt;
&lt;br /&gt;
For an interactive session on a compute node of the Teach cluster that gives access to non-shared resources, use the 'debugjob' command:&lt;br /&gt;
 teach01:~$ debugjob -n C&lt;br /&gt;
where C is the number of cores. An interactive session defaults to four hours when using at most one node (C&amp;lt;=40), and becomes 60 minutes when using four nodes (i.e., 120&amp;lt;C&amp;lt;=160), which is the maximum number of nodes allowed for an interactive session by debugjob.&lt;br /&gt;
                                                             &lt;br /&gt;
For a short interactive session on dedicated compute nodes of the Teach cluster, use the 'debugjob' command as follows:&lt;br /&gt;
 teach01:~$ debugjob N&lt;br /&gt;
where N is the number of nodes.  On the Teach cluster, this is equivalent to &amp;lt;tt&amp;gt;debugjob -n 40*N &amp;lt;/tt&amp;gt;. The positive integer number &amp;lt;tt&amp;gt;N&amp;lt;/tt&amp;gt; can at most be 4.&lt;br /&gt;
&lt;br /&gt;
If no arguments are given to &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt;, it allocates a single core on a Teach compute node.&lt;br /&gt;
&lt;br /&gt;
There are limits on the resources you can get with a debugjob, and how long you can get them.  No debugjob can run longer than four hours or use more than 160 cores, and each user can only run one at a time.  For longer computations, jobs must be submitted to the scheduler.&lt;br /&gt;
&lt;br /&gt;
== Submit a Job ==&lt;br /&gt;
&lt;br /&gt;
Teach uses SLURM as its job scheduler.  More-advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You submit jobs from a login node by passing a script to the sbatch command:&lt;br /&gt;
&lt;br /&gt;
 teach-login01:~$ sbatch jobscript.sh&lt;br /&gt;
&lt;br /&gt;
This puts the job in the queue. It will run on the compute nodes in due course.&lt;br /&gt;
&lt;br /&gt;
Note:&lt;br /&gt;
* Make sure to adjust the flags --ntasks-per-node or --ntasks, together with --nodes, accordingly in the examples found on the [[Slurm | Slurm page]].&lt;br /&gt;
* The current slurm configuration of the teach cluster allocates compute resources by core, as opposed to by node. That means your tasks might land on nodes on which other jobs are running, i.e., they might share the node. If you want to avoid that, add the directive #SBATCH --exclusive to your submission script. This forces your job to use its compute nodes exclusively.&lt;br /&gt;
* The maximum wall time is currently set to 4 hours.&lt;br /&gt;
* There are two queues available: the compute queue and the debug queue. Their usage limits are listed in the table below.&lt;br /&gt;
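&lt;br /&gt;
As a minimal illustration (the module names and the program ./mycode are placeholders), a job script requesting all 40 cores of a single node for one hour might look as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=40&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
#SBATCH --job-name=myjob&lt;br /&gt;
module load gcc openmpi&lt;br /&gt;
mpirun ./mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;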
&lt;br /&gt;
== Limits ==&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It also matters in which 'partition' the job runs. 'Partitions' are SLURM-speak for use cases. You specify the partition with the -p parameter to sbatch or salloc; if you do not specify one, your job will run in the compute partition, which is the most common case.&lt;br /&gt;
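&lt;br /&gt;
For instance, to send a job explicitly to the debug partition:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p debug jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or, equivalently, add the directive #SBATCH -p debug to the job script itself.&lt;br /&gt;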
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Running jobs&lt;br /&gt;
!Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Interactive testing or troubleshooting || debug || 1 || 1 || 1 core || 4 nodes (160 cores)|| N/A || 4 hours&lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 1 || 12 || 1 core || 4 nodes (160 cores)|| 15 minutes || 4 hours&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Within these limits, jobs may still have to wait in the queue.  Although there are no allocations on the teach cluster, the waiting time still depends on many factors, such as the number of nodes and the wall time, how many other jobs are waiting in the queue, and whether a job can fill an otherwise unused spot in the schedule.&lt;br /&gt;
&lt;br /&gt;
== Main changes from Teach's predecessor ==&lt;br /&gt;
&lt;br /&gt;
Although the cluster is once again called ''Teach'' and you connect to teach.scinet.utoronto.ca as before, the system is set up differently from the previous Teach cluster in the following ways:&lt;br /&gt;
&lt;br /&gt;
* There are now 2 dedicated login nodes, teach-login01 and teach-login02.  &lt;br /&gt;
* The ssh fingerprints for these login nodes can be found on [[Teach_fingerprints]].&lt;br /&gt;
* The compute nodes have 40 cores.  As before, you can request jobs by number of cores.&lt;br /&gt;
* Only temporary lcl_uot.... accounts can log in.&lt;br /&gt;
* Only the home directories of those accounts are mounted.&lt;br /&gt;
* In particular, the file systems from the other SciNet compute clusters (Niagara, Mist, Trillium,... ) are not and will not be mounted.  You'll need to copy over any files that you need to use on the Teach cluster.&lt;br /&gt;
* There is no $SCRATCH.  You can do all your work in $HOME, which is writable from the compute nodes.&lt;br /&gt;
* The software stack is the one supplied by the Alliance.  There is no need to load 'CCEnv' to get it.&lt;br /&gt;
* But as before, if you're missing a module, we can still install it for you.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Do this later&lt;br /&gt;
== Running Jupyter on a Teach Compute Node ==  &lt;br /&gt;
&lt;br /&gt;
1. To be able to run Jupyter on a compute node, you must first (a) install it inside a virtual environment, (b) redirect a directory under $HOME that Jupyter needs to write to (the compute nodes cannot write to $HOME), and (c) create a small helper script called &amp;lt;tt&amp;gt;notebook.sh&amp;lt;/tt&amp;gt; that will be used to start the jupyter server in step 2.  These are the commands to use for the installation (which you need to do only once, on a Teach login node):&lt;br /&gt;
&lt;br /&gt;
(a) Create virtual env&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load python/3.9.10&lt;br /&gt;
$ virtualenv --system-site-packages $HOME/.virtualenvs/jupteach&lt;br /&gt;
$ source $HOME/.virtualenvs/jupteach/bin/activate&lt;br /&gt;
$ pip install jupyter jupyterlab&lt;br /&gt;
$ deactivate&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
You can choose another directory than &amp;lt;tt&amp;gt;$HOME/.virtualenvs/jupteach&amp;lt;/tt&amp;gt; for where to create the virtual environment, but you need to be consistent and use the same directory everywhere below.&lt;br /&gt;
&lt;br /&gt;
(b) Make a writable 'runtime' directory for Jupyter. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mkdir -p $HOME/.local/share/jupyter/runtime &lt;br /&gt;
$ mv -f $HOME/.local/share/jupyter/runtime $SCRATCH/jupyter_runtime || mkdir $SCRATCH/jupyter_runtime&lt;br /&gt;
$ ln -sT $SCRATCH/jupyter_runtime $HOME/.local/share/jupyter/runtime&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(c) Create a launch script to use on the compute nodes:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ cat &amp;gt; $HOME/.virtualenvs/jupteach/bin/notebook.sh &amp;lt;&amp;lt;EOF&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
source \$HOME/.virtualenvs/jupteach/bin/activate&lt;br /&gt;
export XDG_DATA_HOME=\$SCRATCH/.share&lt;br /&gt;
export XDG_CACHE_HOME=\$SCRATCH/.cache&lt;br /&gt;
export XDG_CONFIG_HOME=\$SCRATCH/.config&lt;br /&gt;
export XDG_RUNTIME_DIR=\$SCRATCH/.runtime&lt;br /&gt;
export JUPYTER_CONFIG_DIR=\$SCRATCH/.config/.jupyter&lt;br /&gt;
jupyter \${1:-notebook} --ip \$(hostname -f) --no-browser --notebook-dir=\$PWD&lt;br /&gt;
EOF&lt;br /&gt;
$ chmod +x  $HOME/.virtualenvs/jupteach/bin/notebook.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. To run the jupyter server on a compute node, start an interactive session with the &amp;lt;tt&amp;gt;debugjob&amp;lt;/tt&amp;gt; command and then launch the jupyter server:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ debugjob -n 16                    # use fewer if you need fewer cores.&lt;br /&gt;
$ cd $SCRATCH                       # $HOME is read-only, so move to $SCRATCH&lt;br /&gt;
$ $HOME/.virtualenvs/jupteach/bin/notebook.sh  # add the argument &amp;quot;lab&amp;quot; to start jupyter lab instead&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Make sure you note down (a) the name of the compute node that you were allocated (it starts with &amp;quot;&amp;lt;tt&amp;gt;teach&amp;lt;/tt&amp;gt;&amp;quot; followed by a two-digit number), (b) the port number that follows the node name after the colon (usually this is 8888, but it can be another, higher number); this is the PORT, and (c) the last URL that notebook.sh tells you to use to connect.&lt;br /&gt;
&lt;br /&gt;
3. The jupyter server runs on a teach compute node, which is not accessible from the internet. To connect to it, in a different terminal on your own computer, reconnect to the Teach cluster with a port-forwarding&lt;br /&gt;
tunnel to the compute node on which jupyter is running:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -LPORT:teachXX:PORT -o ControlMaster=no USERNAME@teach.scinet.utoronto.ca  -N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;tt&amp;gt;teachXX&amp;lt;/tt&amp;gt; is to be replaced by the name of the compute node (point (a) above), PORT by the port number that notebook.sh showed, and &amp;lt;tt&amp;gt;USERNAME&amp;lt;/tt&amp;gt; by your Teach account username. This command will appear to just &amp;quot;hang&amp;quot;; it only serves to forward local port PORT (usually 8888) to the same port on the compute node.&lt;br /&gt;
&lt;br /&gt;
Finally, point your browser to the URL that the &amp;lt;tt&amp;gt;notebook.sh&amp;lt;/tt&amp;gt; command printed out (point (c) above), i.e., the one with 127.0.0.1 in it.&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>Rzon</name></author>
	</entry>
</feed>