Trillium Quickstart

Trillium
  Installed: August 2025
  Operating System: Rocky Linux 9.6
  Number of Nodes: 1284 (240,768 cores)
  Interconnect: Mellanox Dragonfly+
  RAM/Node: 768 GB
  Cores/Node: 192 (CPU nodes), 96 (GPU nodes)
  Login/Devel Node: trillium.scinet.utoronto.ca
  Queue Submission: Slurm

System Overview

The Trillium system is a state-of-the-art high performance computing platform, consisting of three main components:

1. CPU Partition

  • ~240,000 cores across homogeneous CPU nodes
  • Non-blocking 400 Gb/s NDR InfiniBand interconnect
  • Ideal for large-scale parallel workloads

2. GPU Partition

  • 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs
  • 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand
  • Optimized for AI/ML and accelerated science workloads
  • Note: This partition is in high demand and not ideal for training extremely large models (multi-100B parameters)

3. Storage System

  • Unified 29 PB VAST NVMe storage for all workloads
  • No tiering — all flash-based for consistent performance
  • Accessible via POSIX or S3 under a unified namespace

Cooling and Energy Efficiency

Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:

  • PUE below 1.03 (high energy efficiency)
  • Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage
  • Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact

Specifications

The Trillium cluster is a large cluster composed of two types of nodes:

nodes   cores   available memory   CPU                                                     GPU
1224    192     768 GB DDR5        2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache    -
  60     96     768 GB DDR5        1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache    4 x NVIDIA H100 SXM (80 GB memory)

Each node of the cluster has 768 GB of RAM. Being designed for large parallel workloads, Trillium has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that accepts jobs with a minimum walltime of 15 minutes and a maximum of 24 hours.
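
As a rough illustration, a minimal Slurm job script for one CPU node might look like the sketch below. The account name, module names, and application are placeholders; check the scheduler documentation on this wiki for the exact options expected on Trillium.

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=192       # one task per core on a CPU node (assumes whole-node allocation)
#SBATCH --time=01:00:00             # between the 15-minute minimum and the 24-hour maximum
#SBATCH --account=def-yoursponsor   # placeholder account name
#SBATCH --job-name=example

module load gcc openmpi             # module names are illustrative

srun ./my_application               # placeholder application

The script would be submitted with sbatch (e.g. "sbatch job_script.sh") and monitored with squeue.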

Getting started on Trillium

Access to Trillium is not enabled automatically for everyone with a Digital Research Alliance of Canada (formerly Compute Canada) account, but anyone with an active Alliance account can have it enabled.

If you are new to SciNet, or if your supervisor/PI does not hold a current Alliance RAC allocation, you may need to request access on the opt-in page on the CCDB site. After clicking the "Join" button, it usually takes only one or two business days for access to be granted.

You can check if you already have Trillium access by attempting to log in. If you receive a "Permission denied" error (and your SSH key is correctly set up), you may need to opt in.

Please read this document carefully. The FAQ is also a useful resource. If at any time you require assistance, or if something is unclear, please do not hesitate to contact us.

Logging in

Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium; if you are not, it is worth your time to review our Introduction to Linux Shell class.

As with all SciNet and Alliance (formerly Compute Canada) compute systems, access to Trillium is via SSH (secure shell) only, and authentication is allowed only with SSH keys. Please refer to this page to generate your SSH key pair and make sure you use it securely.

Open a terminal window (e.g. Connecting with PuTTY on Windows or Connecting with MobaXTerm), then SSH into the Trillium login nodes with your Alliance (formerly Compute Canada) credentials:

$ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca
  • The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.
  • These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.
  • The optional -Y enables X11 forwarding, allowing graphical programs to open windows on your local computer.
  • To run on Trillium compute nodes, you must submit a batch job.
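
If you connect frequently, you can store these options in your local SSH configuration so that a short alias is enough. The following ~/.ssh/config entry is only a sketch; the host alias and key path are placeholders:

Host trillium
    HostName trillium.scinet.utoronto.ca
    User MYALLIANCEUSERNAME
    IdentityFile /path/to/ssh_private_key
    ForwardX11 yes
    ForwardX11Trusted yes    # together these two options correspond to the -Y flag

With this in place, the command "ssh trillium" opens a session on a login node.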

If you cannot log in, be sure to first check the System Status on this site's front page.

Software Environment

Trillium uses the environment modules system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., PATH, LD_LIBRARY_PATH) so you can access different versions of software without conflicts.

A detailed explanation can be found on the modules page.

Commonly used module commands:

  • module load <module-name> – Load the default version of a software package.
  • module load <module-name>/<module-version> – Load a specific version.
  • module purge – Unload all currently loaded modules.
  • module avail – List available modules that can be loaded.
  • module list – Show currently loaded modules.
  • module spider or module spider <module-name> – Search for available modules and their versions.

Handy abbreviations are available:

  • ml – Equivalent to module list.
  • ml <module-name> – Equivalent to module load <module-name>.
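
For example, a typical sequence of commands might look like this (the module names and versions are only illustrative; run module avail to see what is actually installed):

$ module avail                  # list the available modules
$ module load gcc               # load the default GNU compiler module (illustrative)
$ module load openmpi/4.1.5     # load a specific version (illustrative)
$ module list                   # show what is currently loaded
$ module purge                  # return to a clean environment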

Storage System

Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:

  • /home – For personal files and configurations.
  • /scratch – High-speed, temporary storage for job data.
  • /project – Shared storage for project teams and collaborations.

All three share a unified 29 PB NVMe-backed storage pool, with:

  • 29 PB effective capacity (deduplicated via VAST)
  • 16.7 PB raw flash capacity
  • 714 GB/s read bandwidth, 275 GB/s write bandwidth
  • 10 million read IOPS, 2 million write IOPS
  • POSIX and S3 access protocols under a unified namespace
  • 48 C-Boxes and 14 D-Boxes for data services

The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.
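
On SciNet systems these directories are usually reached through per-user environment variables; assuming the same convention holds on Trillium (worth verifying once you are logged in), you can locate your spaces as follows:

$ echo $HOME                    # your home directory
$ echo $SCRATCH                 # your scratch space (variable name assumes the usual SciNet convention)
$ echo $PROJECT                 # your project space (variable name assumes the usual SciNet convention)
$ df -h $SCRATCH                # show the size and usage of the filesystem backing scratch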

Backup and Archive Storage

An additional 114 PB HPSS tape-based archive is available for nearline storage:

  • Dual-copy archive across geographically separate libraries
  • Used for both backup and archival purposes
  • Backups are managed using Atempo backup software