Trillium Neptune Nodes
| Trillium Neptune Nodes | |
|---|---|
| Installed | March 2023 |
| Operating System | Linux (Rocky 9.2) |
| Number of Nodes | 40 (3,200 cores) |
| Interconnect | HDR InfiniBand |
| RAM/Node | 484 GiB / 520 GB |
| Cores/Node | 80 (160 hyperthreads) |
| Login/Devel Node | trillium.scinet.utoronto.ca |
| Vendor Compilers | icc (C), icpc (C++), ifort (Fortran) |
| Queue Submission | Slurm |
Specifications
The Trillium Neptune Nodes are a special partition of the Trillium cluster for dedicated projects administered by SciNet.
Each node of the cluster has 484 GiB / 520 GB of RAM (about 6 GiB per core for jobs, or roughly 475 GiB per node). The nodes are connected by a fast HDR InfiniBand network that is part of Trillium's overall network topology. An interesting technical aspect is that these were the first nodes at SciNet to be entirely liquid-cooled. The Neptune nodes are all compute nodes that can only be accessed through a queueing system, which allows jobs with a minimum walltime of 15 minutes and a maximum of 24 hours, and which favours large jobs. Jobs should be submitted from the Trillium (CPU) login nodes.
Login, Storage, and Software
Access to these resources is not open to general users of Trillium or of other CC resources. For those who do have access, the integration with Trillium and its file system means that we can refer to the Trillium Quickstart for:
- Logging in
- The directory and file system structure
- Moving data to Trillium
- Loading software modules
- Available compilers and interpreters
Some groups of users of these nodes may be offered a different route to login and submit jobs instead of the Trillium login nodes.
The main differences with the regular Trillium nodes lie in the architecture of the CPUs (Intel vs. AMD), the number of cores per node (80 vs. 192 on Trillium), and hyperthreading (which is enabled on the Neptune nodes but not on the Trillium nodes). Furthermore, there are differences in how to test, debug, and submit jobs to this partition, which are explained below.
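If you have access and want to confirm these hardware characteristics yourself on a node (for example, inside an interactive debug job, described below), a quick sketch using standard Linux tools is:
lscpu | grep -E 'Model name|^CPU\(s\)|Thread\(s\) per core'   # CPU model, logical CPU count, threads per core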
Testing and Debugging
You should test your code before you submit it to the cluster, both to check that it is correct and to determine what resources it needs.
- Small test jobs can be run on the Trillium login nodes. Rule of thumb: tests should run no longer than a couple of minutes, take at most about 1-2 GB of memory, and use no more than a couple of cores.
- You can run the DDT debugger on the login nodes after module load ddt-cpu.
- For short tests that do not fit on a login node, or for which you need a dedicated node, request an interactive debug job with the debugjob_neptune command:
tri-login06:~$ debugjob_neptune N
where N is the number of nodes. If N=1, this gives an interactive session of 1 hour; if N=2 (the maximum), it gives you 45 minutes.
- Finally, if your debugging session needs more than 1 hour, you can request an interactive job from the regular queue using the salloc command. Note, however, that this may take some time to start, since it will be part of the regular queue and will run when the scheduler decides.
tri-login06:~$ salloc --nodes N --time=M:00:00 --x11 -p compute_neptune --qos neptune
where N is again the number of nodes, and M is the number of hours you wish the job to run. The --x11 flag is required if you need to use graphics while testing your code through salloc, e.g. when using a debugger such as DDT.
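For instance, a hedged example that requests one Neptune node for two hours and then starts the DDT debugger inside the allocation (the executable name mpi_example is a placeholder) might look like:
tri-login06:~$ salloc --nodes 1 --time=2:00:00 --x11 -p compute_neptune --qos neptune
# then, once the allocation starts, inside the interactive shell:
module load ddt-cpu
ddt ./mpi_example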
Submitting jobs
Once you have compiled and tested your code or workflow on a login node, and confirmed that it behaves correctly, you are ready to submit jobs to run on one or more of the 40 Neptune nodes of the Trillium cluster. When and where your job runs is determined by the scheduler.
Trillium uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the Slurm page.
You submit jobs from a login node by passing a script to the sbatch command:
tri-login06:scratch$ sbatch jobscript.sh
This puts the job in the queue. It will run on the compute nodes in due course. Note that you must submit your job from a login node. You cannot submit jobs from the datamover nodes.
In most cases, you should not submit from your $HOME directory, but rather from your $SCRATCH directory, so that the output of your compute job can be written out ($HOME is read-only on the compute nodes; see below).
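For example (the subdirectory name myproject is hypothetical):
tri-login06:~$ cd $SCRATCH/myproject
tri-login06:myproject$ sbatch jobscript.sh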
Some example job scripts can be found below.
Keep in mind:
- Scheduling is by node, so in multiples of 80 cores.
- Your job's maximum walltime is 24 hours.
- Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).
- Compute nodes have no internet access.
- Your job script will not remember the modules you have loaded, so it needs to contain "module load" commands for all the required modules (see examples below).
Scheduling by Node
On many systems that use SLURM, the scheduler will deduce from the specification of the number of tasks and the number of cpus-per-task what resources should be allocated. On Trillium, things are a bit different.
- All job resource requests on Trillium are scheduled as a multiple of nodes.
- The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.
- No other users are running anything on them.
- You can SSH into them to see how things are going.
- Whatever your requests to the scheduler, it will always be translated into a multiple of nodes allocated to your job.
- Memory requests to the scheduler are of no use. Your job always gets N x 520GB of RAM, where N is the number of nodes and 520GB is the amount of memory on the node.
- If you run serial jobs you should still aim to use all 80 cores on the node. Visit the serial jobs page for examples of how to do this on Trillium; a minimal sketch also follows this list.
- Since there are 80 cores per node, your job should use N x 80 cores. If your workflow cannot easily do this, you can contact us for assistance.
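As a minimal sketch of the bundling idea (not a substitute for the serial jobs page; the executable serial_example and its input/output file names are hypothetical), a job that runs 80 independent serial tasks on one Neptune node could look like this:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=80
#SBATCH --time=1:00:00
#SBATCH -p compute_neptune
#SBATCH --qos neptune
#SBATCH --job-name=serial_bundle
#SBATCH --output=serial_output_%j.txt

# Launch 80 independent serial tasks in the background, one per core,
# then wait until all of them have finished before the job ends.
for i in $(seq 1 80); do
  ./serial_example input.$i > output.$i 2>&1 &
done
wait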
Limits
There are safeguard limits to the size and duration of jobs, the number of jobs you can run and the number of jobs you can have queued.
| Usage | QOS | Partition | Limit on Running jobs | Limit on Submitted jobs (incl. running) | Min. size of jobs | Max. size of jobs | Min. walltime | Max. walltime |
|---|---|---|---|---|---|---|---|---|
| Compute jobs | neptune | compute_neptune | TBD | TBD | 1 node (80 cores) | 20 nodes (1600 cores) | 15 minutes | 24 hours |
| Testing or troubleshooting | neptune | debug_neptune | 1 | 1 | 1 node (80 cores) | 2 nodes (160 cores) | N/A | 1 hour |
Even if you respect these limits, your jobs may still have to wait in the queue if the nodes are busy.
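Since some of the limits above are still listed as TBD, you can also query the scheduler directly for the current settings; a hedged example using standard Slurm commands (with the partition and QOS names from the table) is:
tri-login06:~$ scontrol show partition compute_neptune   # current MaxTime, MaxNodes, etc. for the partition
tri-login06:~$ sacctmgr show qos neptune                 # limits attached to the neptune QOS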
Example submission script (MPI)
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=80
#SBATCH --time=1:00:00
#SBATCH -p compute_neptune
#SBATCH --qos neptune
#SBATCH --job-name=mpi_job
#SBATCH --output=mpi_output_%j.txt
#SBATCH --mail-type=FAIL

module load intel/2025.2.0
module load openmpi/5.0.8

source /scinet/vast/etc/vastpreload-openmpi.bash # important if doing MPI-IO

mpirun ./mpi_example # do not use "srun"
Submit this script from your scratch directory with the command:
tri-login06:scratch$ sbatch mpi_job.sh
- First line indicates that this is a bash script.
- Lines starting with #SBATCH go to SLURM.
- sbatch reads these lines as a job request (which it gives the name mpi_job).
- In this case, SLURM looks for 2 nodes, each running 80 tasks (for a total of 160 tasks), for 1 hour.
- These nodes must be in the compute_neptune partition and in the neptune QOS (quality of service). Note that both the -p and the --qos options must be specified.
- Once it has found such nodes, it runs the script:
  - Changes to the submission directory;
  - Loads modules;
  - Preloads the MPI-IO library;
  - Runs the mpi_example application (SLURM will inform mpirun how many processes to run).
- To use hyperthreading, just change --ntasks-per-node=80 to --ntasks-per-node=160, and add --bind-to none to the mpirun command (the latter is necessary for OpenMPI only, not when using IntelMPI).
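For example, the hyperthreaded (OpenMPI) version of the relevant lines would read:
#SBATCH --ntasks-per-node=160

mpirun --bind-to none ./mpi_example # do not use "srun"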
Example submission script (OpenMP)
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=80
#SBATCH --time=1:00:00
#SBATCH -p compute_neptune
#SBATCH --qos neptune
#SBATCH --job-name=openmp_job
#SBATCH --output=openmp_output_%j.txt
#SBATCH --mail-type=FAIL

module load intel/2025

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./openmp_example
Submit this script from your scratch directory with the command:
tri-login06:scratch$ sbatch openmp_job.sh
- First line indicates that this is a bash script.
- Lines starting with #SBATCH go to SLURM.
- sbatch reads these lines as a job request (which it gives the name openmp_job).
- In this case, SLURM looks for one node with 80 cores to be used by a single task, for 1 hour.
- Once it has found such a node, it runs the script:
  - Changes to the submission directory;
  - Loads the compiler module;
  - Sets an environment variable (OMP_NUM_THREADS);
  - Runs the openmp_example application.
- To use hyperthreading, just change --cpus-per-task=80 to --cpus-per-task=160.
Monitoring queued jobs
Once the job is incorporated into the queue, there are some commands you can use to monitor its progress.
- squeue to show the job queue (squeue -u $USER for just your jobs);
- squeue -j JOBID to get information on a specific job (alternatively, scontrol show job JOBID, which is more verbose);
- squeue --start -j JOBID to get an estimate for when a job will run; these tend not to be very accurate predictions;
- scancel -i JOBID to cancel the job;
- jobperf JOBID to get an instantaneous view of the cpu and memory usage of the nodes of the job while it is running;
- sacct to get information on your recent jobs.
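For example (1234567 is a hypothetical job ID):
tri-login06:~$ squeue -u $USER            # list your queued and running jobs
tri-login06:~$ scontrol show job 1234567  # detailed information on one specific job
tri-login06:~$ scancel -i 1234567         # cancel that job, with a confirmation prompt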
Further instructions for monitoring your jobs can be found on the Slurm page. The my.SciNet site is also a very useful tool for monitoring your current and past usage.