<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://docs.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Afedosee</id>
	<title>SciNet Users Documentation - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://docs.scinet.utoronto.ca/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Afedosee"/>
	<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php/Special:Contributions/Afedosee"/>
	<updated>2026-06-18T11:39:09Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.35.12</generator>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6878</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6878"/>
		<updated>2025-08-13T19:21:26Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: Added quick reference table for commands&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&amp;quot;border: 2px solid #e6b800; background-color: #fff8e1; padding: 0.5em; font-weight: bold; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
This Quickstart Guide is a &amp;lt;u&amp;gt;work in progress&amp;lt;/u&amp;gt;. Details may change as Trillium documentation is updated.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca, trillium-gpu.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. [[#Submitting_Jobs_on_the_CPU_Subcluster|CPU Subcluster]]&lt;br /&gt;
* ~235,000 cores across 1,224 homogeneous CPU nodes (192 cores each)&lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. [[#Submitting_Jobs_on_the_GPU_Subcluster|GPU Subcluster]]&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and is not well suited to training extremely large models (hundreds of billions of parameters).&lt;br /&gt;
* To access it, SSH into &amp;lt;code&amp;gt;trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt; from outside Trillium, or into &amp;lt;code&amp;gt;trig-login01&amp;lt;/code&amp;gt; from other Trillium nodes.&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! Nodes !! Cores per node !! Memory per node !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory each)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Every node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system; regular compute jobs can request a walltime of at least 15 minutes and at most 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with a {{DigitalResearchAllianceOfCanada}} account, but anyone with an active Alliance account can have it enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet or your supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check whether you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to request access as described above.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium is accessed only via [[SSH]] (secure shell), and authentication requires SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to learn how to use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, see the guides for connecting with [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXterm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_Jobs_on_the_CPU_Subcluster | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
'''Login Node Usage Rules:'''&lt;br /&gt;
* Do not run large memory jobs (e.g., exceeding 2 GB).&lt;br /&gt;
* Do not run parallel training or multi-threaded processes.&lt;br /&gt;
* Do not run long-running computations (keep to under a few minutes).&lt;br /&gt;
* Do not run resource-intensive tasks like heavy I/O operations or simulations.&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Your storage locations ==&lt;br /&gt;
&lt;br /&gt;
On Trillium, every user has several types of storage space available. These locations each serve different purposes, and the exact paths depend on your username and group. For convenience and portability, each location is also available through a corresponding environment variable.&lt;br /&gt;
&lt;br /&gt;
=== Home and Scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home directory and a scratch directory. Their locations are stored in the environment variables &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
On Trillium, the paths follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/username&lt;br /&gt;
 $SCRATCH=/scratch/username&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
  tri-login01:~$ pwd&lt;br /&gt;
  /home/yourusername&lt;br /&gt;
  tri-login01:~$ cd $SCRATCH&lt;br /&gt;
  tri-login01:scratch$ pwd&lt;br /&gt;
  /scratch/yourusername&lt;br /&gt;
&lt;br /&gt;
'''NOTE: The home directory is read-only on compute nodes.'''&lt;br /&gt;
&lt;br /&gt;
=== Environment Variables Quicklist ===&lt;br /&gt;
&lt;br /&gt;
Here is a summary of common environment variables:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Variable !! Description&lt;br /&gt;
|-&lt;br /&gt;
| $HOME || Path to your home directory (e.g., /home/username)&lt;br /&gt;
|-&lt;br /&gt;
| $SCRATCH || Path to your scratch directory for temporary job data (e.g., /scratch/username)&lt;br /&gt;
|-&lt;br /&gt;
| $SLURM_SUBMIT_DIR || Directory from which the job was submitted&lt;br /&gt;
|-&lt;br /&gt;
| $SLURM_JOBID || Unique ID of the current job&lt;br /&gt;
|-&lt;br /&gt;
| $SLURM_NODELIST || List of nodes allocated to the job&lt;br /&gt;
|-&lt;br /&gt;
| $SLURM_NNODES || Number of nodes allocated to the job&lt;br /&gt;
|}&lt;br /&gt;
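&lt;br /&gt;
These variables can be combined in a job script so that nothing depends on hardcoded paths. The following is a minimal sketch rather than an official template: the job name and the &amp;lt;code&amp;gt;run_$SLURM_JOBID&amp;lt;/code&amp;gt; directory layout are illustrative assumptions. It creates a per-job working directory under &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; (the home directory is read-only on compute nodes) and records where the job ran.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=00:15:00&lt;br /&gt;
#SBATCH --job-name=env_demo&lt;br /&gt;
#SBATCH --output=env_demo_%j.txt&lt;br /&gt;
&lt;br /&gt;
# Hypothetical layout: one working directory per job under $SCRATCH&lt;br /&gt;
RUNDIR=$SCRATCH/run_$SLURM_JOBID&lt;br /&gt;
mkdir -p $RUNDIR&lt;br /&gt;
cd $RUNDIR&lt;br /&gt;
&lt;br /&gt;
# Report where the job was submitted from and which nodes it received&lt;br /&gt;
echo &amp;quot;Submitted from: $SLURM_SUBMIT_DIR&amp;quot;&lt;br /&gt;
echo &amp;quot;Job $SLURM_JOBID has $SLURM_NNODES node(s): $SLURM_NODELIST&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;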
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=== Project and Archive (Nearline) ===&lt;br /&gt;
&lt;br /&gt;
All users on Trillium have access to a project directory, with paths stored in the environment variable &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt;.  &lt;br /&gt;
Some groups may also have an archive (a.k.a. &amp;quot;nearline&amp;quot;) directory, stored in the environment variable &amp;lt;code&amp;gt;$ARCHIVE&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
These follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/groupname/username&lt;br /&gt;
 $ARCHIVE=/archive/groupname/username&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Archive storage is currently available only via [[HPSS]] and cannot be accessed from Trillium login, compute, or data mover nodes.&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''  &lt;br /&gt;
Always use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) in scripts instead of hardcoding the paths. The actual directory paths may change in the future.&lt;br /&gt;
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
Please review the [[Data_Management#Purpose_of_each_file_system | various file systems]], their intended uses, and their policies. The table below summarizes the key details. See the [[Data_Management | Data Management]] page for more information.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB / 250,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB / 6,000,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50–500 TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 1 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group (nearline) allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
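&lt;br /&gt;
As a sketch of these practices on a login node, the commands below look up the available versions of a package and then load explicit versions. The versions shown (&amp;lt;code&amp;gt;StdEnv/2023&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;openmpi/4.1.5&amp;lt;/code&amp;gt;) are simply those used in the MPI job script example on this page; run &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; yourself to confirm what is currently installed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# List the available versions of Open MPI&lt;br /&gt;
module spider openmpi&lt;br /&gt;
&lt;br /&gt;
# Show which modules must be loaded before this specific version&lt;br /&gt;
module spider openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
# Unload the currently loaded modules, then load explicit versions instead of defaults&lt;br /&gt;
module purge&lt;br /&gt;
module load StdEnv/2023 gcc/12.3 openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
# Confirm what is loaded&lt;br /&gt;
module list&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;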
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Some software requires your own license. Tools like [[MATLAB]], Gaussian, and IDL are not provided centrally; if you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to [mailto:support@scinet.utoronto.ca contact us]; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct-liquid-cooled with warm water (35–40 °C inlet temperature), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
Trillium uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job in the queue; it will start on available compute nodes once the scheduler allocates them. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Users in groups with a RAC allocation typically have both &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; accounts. &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; accounts are for resource allocation competition (RAC)-independent usage under a RAS allocation (previously called the &amp;quot;default&amp;quot; allocation), while &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; accounts are tied to specific research group allocations (RRG) granted through the Alliance's RAC process. Unless you explicitly specify the account using the &amp;lt;code&amp;gt;--account=ACCOUNT_NAME&amp;lt;/code&amp;gt; option in your job script or submission command, your job will most likely be charged to the &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account. If you want your job to use your group's RRG allocation, specify it explicitly (e.g., &amp;lt;code&amp;gt;--account=rrg-groupname&amp;lt;/code&amp;gt;).&lt;br /&gt;
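&lt;br /&gt;
For example, either of the following would charge a job to an RRG allocation. Here &amp;lt;code&amp;gt;rrg-groupname&amp;lt;/code&amp;gt; is a placeholder for your group's actual account name and &amp;lt;code&amp;gt;jobscript.sh&amp;lt;/code&amp;gt; stands for your own job script. If both forms are used, the command-line value takes precedence over the &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; directive.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Option 1: add this directive to the #SBATCH header of your job script&lt;br /&gt;
#SBATCH --account=rrg-groupname&lt;br /&gt;
&lt;br /&gt;
# Option 2: give the account on the command line when submitting&lt;br /&gt;
sbatch --account=rrg-groupname jobscript.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;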
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler works out which resources to allocate from the requested number of tasks and CPUs. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768GB &amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
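&lt;br /&gt;
As a minimal sketch of such bundling, the script below starts 192 copies of a serial program in the background, one per core, and then waits for all of them to finish. The program &amp;lt;code&amp;gt;serial_task&amp;lt;/code&amp;gt; and the &amp;lt;code&amp;gt;input_$i.dat&amp;lt;/code&amp;gt; / &amp;lt;code&amp;gt;output_$i.log&amp;lt;/code&amp;gt; file names are hypothetical. This simple approach works best when the tasks take roughly the same time; see the page linked above for other approaches.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=serial_bundle&lt;br /&gt;
#SBATCH --output=serial_bundle_%j.txt&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
&lt;br /&gt;
# Launch one background task per core (192 cores per node)&lt;br /&gt;
for i in $(seq 1 192); do&lt;br /&gt;
    ./serial_task input_$i.dat &amp;gt; output_$i.log &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Block until every background task has finished; otherwise the job would end early&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;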
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
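&lt;br /&gt;
As a quick sanity check, a job script can print how many MPI processes the allocation provides before calling &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt;. The sketch below assumes the script requests &amp;lt;code&amp;gt;--nodes&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--ntasks-per-node&amp;lt;/code&amp;gt;, so that SLURM exports &amp;lt;code&amp;gt;SLURM_NNODES&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;SLURM_NTASKS_PER_NODE&amp;lt;/code&amp;gt;; the function name &amp;lt;code&amp;gt;mpi_rank_count&amp;lt;/code&amp;gt; is hypothetical and not a SciNet tool.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: report the MPI process count implied by the
# SLURM allocation (nodes x tasks per node). Outside a job, where the
# SLURM variables are unset, it falls back to 1 x 1.
mpi_rank_count() {
    local nodes=${SLURM_NNODES:-1}
    local per_node=${SLURM_NTASKS_PER_NODE:-1}
    echo $(( nodes * per_node ))
}

echo "Launching $(mpi_rank_count) MPI processes"
```
&lt;br /&gt;
For a request of 2 nodes with 192 tasks per node, as described above, this would print 384.&lt;br /&gt;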
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
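&lt;br /&gt;
The &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; line in the script above assumes it runs inside a SLURM job. A slightly more defensive variant, sketched below, falls back to a single thread when &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt; is unset, for example when the same commands are tried briefly on a login node. The fallback value of 1 and the function name &amp;lt;code&amp;gt;set_omp_threads&amp;lt;/code&amp;gt; are illustrative assumptions.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Set OMP_NUM_THREADS from SLURM's per-task CPU count when running in a
# job; otherwise default to one thread (e.g. a quick login-node test).
set_omp_threads() {
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
}

set_omp_threads
echo "Running with OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```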
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;br /&gt;
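&lt;br /&gt;
For scripted workflows, &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; can report the state of a single job. The sketch below defines two hypothetical helpers (not SciNet tools) built on the standard &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; options &amp;lt;code&amp;gt;-h&amp;lt;/code&amp;gt; (no header), &amp;lt;code&amp;gt;-j&amp;lt;/code&amp;gt; (job ID) and &amp;lt;code&amp;gt;-o %T&amp;lt;/code&amp;gt; (job state only). A job that no longer appears in the queue is reported as &amp;lt;code&amp;gt;GONE&amp;lt;/code&amp;gt;; the 60-second default polling interval is an arbitrary choice meant to avoid loading the scheduler.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Print the queue state of a job (e.g. PENDING, RUNNING), or GONE once
# the job no longer appears in squeue output.
job_state() {
    local state
    state=$(squeue -h -j "$1" -o %T 2>/dev/null)
    echo "${state:-GONE}"
}

# Block while the job is still queued or running, polling every $2
# seconds (default 60). Any other state (COMPLETED, FAILED, GONE, ...)
# ends the wait.
wait_for_job() {
    local interval=${2:-60}
    while :; do
        case "$(job_state "$1")" in
            PENDING|CONFIGURING|RUNNING|COMPLETING|SUSPENDED) sleep "$interval" ;;
            *) return 0 ;;
        esac
    done
}
```
&lt;br /&gt;
For example, &amp;lt;code&amp;gt;wait_for_job 123456&amp;lt;/code&amp;gt; followed by &amp;lt;code&amp;gt;sacct -j 123456&amp;lt;/code&amp;gt; would wait for that (illustrative) job to finish and then show its accounting record.&lt;br /&gt;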
&lt;br /&gt;
= Submitting Jobs on the GPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
The Trillium GPU subcluster is designed for AI/ML and accelerated science workloads.  &lt;br /&gt;
It has specific rules and resource limits that differ from the CPU subcluster.&lt;br /&gt;
&lt;br /&gt;
Everything in the [[#Submitting_Jobs_on_the_CPU_Subcluster|Submitting Jobs on the CPU Subcluster]] section applies here, with the following GPU-specific rules:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Requirement !! Details&lt;br /&gt;
|-&lt;br /&gt;
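| '''Allowed GPU counts''' || Jobs must request exactly 1 GPU or a multiple of 4 GPUs. The only partial-node request allowed is a single GPU (&amp;lt;code&amp;gt;--gpus-per-node=1&amp;lt;/code&amp;gt;); for example, you cannot request &amp;lt;code&amp;gt;--gpus-per-node=2&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;--gpus-per-node=3&amp;lt;/code&amp;gt;.&lt;br /&gt;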
|-&lt;br /&gt;
| '''Single-GPU jobs''' || Use &amp;lt;code&amp;gt;--gpus-per-node=1&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
| '''Whole-node GPU jobs''' || Use &amp;lt;code&amp;gt;--gpus-per-node=4&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;-p compute_full_node&amp;lt;/code&amp;gt;).&lt;br /&gt;
|-&lt;br /&gt;
| '''Multi-node GPU jobs''' || Must request full nodes: &amp;lt;code&amp;gt;--gpus-per-node=4&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
| '''Memory limits''' || The &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; option is not allowed.  &lt;br /&gt;
Per GPU: 192 GB host memory.  &lt;br /&gt;
Whole-node jobs: 768 GB total.&lt;br /&gt;
|}&lt;br /&gt;
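&lt;br /&gt;
The GPU-count rule can be expressed as a simple check, for example in a personal wrapper script that builds &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; commands. The function below is a hypothetical helper, not part of Slurm or SciNet's tooling; it only encodes the total-count rule from the table (exactly 1, or a positive multiple of 4) and does not check the partition or the &amp;lt;code&amp;gt;--gpus-per-node&amp;lt;/code&amp;gt; value.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: succeed if a total GPU count is allowed on the
# Trillium GPU subcluster (exactly 1, or a positive multiple of 4).
gpu_request_ok() {
    local n=$1
    if [ "$n" -eq 1 ]; then
        return 0
    fi
    # Otherwise the count must cover whole 4-GPU nodes.
    [ "$n" -gt 0 ] || return 1
    [ $(( n % 4 )) -eq 0 ]
}

for n in 1 2 4 8; do
    if gpu_request_ok "$n"; then
        echo "$n GPU(s): allowed"
    else
        echo "$n GPU(s): not allowed"
    fi
done
```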
&lt;br /&gt;
== Accessing the GPU Subcluster ==&lt;br /&gt;
&lt;br /&gt;
* From outside: &amp;lt;code&amp;gt;ssh trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt;  &lt;br /&gt;
* From another Trillium node: &amp;lt;code&amp;gt;ssh trig-login01&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Single-GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=single_gpu_job         # Job name&lt;br /&gt;
#SBATCH --output=single_gpu_job_%j.out    # Output file (%j = job ID)&lt;br /&gt;
#SBATCH --nodes=1                         # Request 1 node&lt;br /&gt;
#SBATCH --gpus-per-node=1                 # Request 1 GPU&lt;br /&gt;
#SBATCH --time=00:30:00                   # Max runtime (30 minutes)&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Check GPU allocation&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Run your workload&lt;br /&gt;
srun python my_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Whole-Node (4 GPUs) Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=whole_node_gpu_job&lt;br /&gt;
#SBATCH --output=whole_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --partition=compute_full_node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --gpus-per-node=4&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
srun python my_distributed_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Multi-Node GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=multi_node_gpu_job&lt;br /&gt;
#SBATCH --output=multi_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --nodes=2                        # Request 2 full nodes&lt;br /&gt;
#SBATCH --gpus-per-node=4                # 4 GPUs per node (full node)&lt;br /&gt;
#SBATCH --partition=compute_full_node    # Required for full-node jobs&lt;br /&gt;
#SBATCH --time=04:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Check all GPUs allocated&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Example: run a distributed training job with 8 GPUs (2 nodes × 4 GPUs)&lt;br /&gt;
srun python train_distributed.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Best Practices for GPU Jobs ==&lt;br /&gt;
&lt;br /&gt;
* Do not use &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; — memory is fixed per GPU (192 GB) or per node (768 GB).&lt;br /&gt;
* Always specify GPU counts and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt; for whole-node or multi-node jobs.&lt;br /&gt;
* Load only the modules you need — see [[Using_modules]].&lt;br /&gt;
* Be explicit with software versions for reproducibility (e.g., &amp;lt;code&amp;gt;cuda/12.6&amp;lt;/code&amp;gt; rather than just &amp;lt;code&amp;gt;cuda&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Test on a single GPU before scaling to multiple GPUs or nodes.&lt;br /&gt;
* Monitor usage with &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt; to ensure GPUs are fully utilized.&lt;br /&gt;
&lt;br /&gt;
= Quick Reference Table for Commands =&lt;br /&gt;
&lt;br /&gt;
Here is a summary of common commands for quick reference:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Command !! Description&lt;br /&gt;
|-&lt;br /&gt;
| sbatch &amp;lt;script&amp;gt; || Submit a batch job script&lt;br /&gt;
|-&lt;br /&gt;
| squeue [-u $USER] || View queued jobs (optionally for current user)&lt;br /&gt;
|-&lt;br /&gt;
| scancel &amp;lt;JOBID&amp;gt; || Cancel a job&lt;br /&gt;
|-&lt;br /&gt;
| sacct || View accounting data for past jobs&lt;br /&gt;
|-&lt;br /&gt;
| module load &amp;lt;module&amp;gt; || Load a software module&lt;br /&gt;
|-&lt;br /&gt;
| module list || List loaded modules&lt;br /&gt;
|-&lt;br /&gt;
| module avail || List available modules&lt;br /&gt;
|-&lt;br /&gt;
| module spider &amp;lt;module&amp;gt; || Search for modules and dependencies&lt;br /&gt;
|-&lt;br /&gt;
| debugjob N || Request a short debug job on N nodes&lt;br /&gt;
|-&lt;br /&gt;
| diskusage_report || Check storage quotas&lt;br /&gt;
|-&lt;br /&gt;
| jobperf &amp;lt;JOBID&amp;gt; || Monitor CPU and memory usage of a running job&lt;br /&gt;
|-&lt;br /&gt;
| nvidia-smi || Check GPU status (on GPU nodes)&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6875</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6875"/>
		<updated>2025-08-13T19:16:11Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* Submitting Jobs on the CPU Subcluster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&amp;quot;border: 2px solid #e6b800; background-color: #fff8e1; padding: 0.5em; font-weight: bold; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
This Quickstart Guide is a &amp;lt;u&amp;gt;work in progress&amp;lt;/u&amp;gt;. Details may change as Trillium documentation is updated.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca, trillium-gpu.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. [[#Submitting_Jobs_on_the_CPU_Subcluster|CPU Subcluster]]&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. [[#Submitting_Jobs_on_the_GPU_Subcluster|GPU Subcluster]]&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and is not ideal for training extremely large models (hundreds of billions of parameters or more)&lt;br /&gt;
* To access, SSH into &amp;lt;code&amp;gt;trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt; from outside, or to &amp;lt;code&amp;gt;trig-login01&amp;lt;/code&amp;gt; from other Trillium nodes.&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! Nodes !! Cores !! Available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a minimum run time of 15 minutes and a maximum of 24 hours.&lt;br /&gt;
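&lt;br /&gt;
A requested walltime can be checked against these limits before submitting. The sketch below is a hypothetical helper that accepts only the &amp;lt;code&amp;gt;HH:MM:SS&amp;lt;/code&amp;gt; form used in the example scripts on this page (Slurm itself accepts other time formats, which this sketch does not parse) and tests it against the 15-minute minimum and 24-hour maximum.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: succeed if an HH:MM:SS walltime lies within
# Trillium's queue limits (at least 15 minutes, at most 24 hours).
walltime_ok() {
    local t=$1 h m s
    h=${t%%:*}; t=${t#*:}
    m=${t%%:*}; s=${t#*:}
    # Force base 10 so that values such as "08" are not read as octal.
    local secs=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
    [ "$secs" -ge $(( 15 * 60 )) ] || return 1
    [ "$secs" -le $(( 24 * 3600 )) ]
}

walltime_ok 01:00:00 && echo "01:00:00 is within the limits"
```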
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
 &lt;br /&gt;
If you are new to SciNet or your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, access to Trillium is done via [[SSH]] (secure shell) only and authentication is only allowed via SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use them securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, see [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_Jobs_on_the_CPU_Subcluster | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
'''Login Node Usage Rules:'''&lt;br /&gt;
* Do not run large memory jobs (e.g., exceeding 2 GB).&lt;br /&gt;
* Do not run parallel training or multi-threaded processes.&lt;br /&gt;
* Do not run long-running computations (keep to under a few minutes).&lt;br /&gt;
* Do not run resource-intensive tasks like heavy I/O operations or simulations.&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Your storage locations ==&lt;br /&gt;
&lt;br /&gt;
On Trillium, every user has several types of storage space available. These locations each serve different purposes, and the exact paths depend on your username and group. For convenience and portability, each location is also available through a corresponding environment variable.&lt;br /&gt;
&lt;br /&gt;
=== Home and Scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home directory and a scratch directory. Their locations are stored in the environment variables &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
On Trillium, the paths follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/username&lt;br /&gt;
 $SCRATCH=/scratch/username&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
  tri-login01:~$ pwd&lt;br /&gt;
  /home/yourusername&lt;br /&gt;
  tri-login01:~$ cd $SCRATCH&lt;br /&gt;
  tri-login01:scratch$ pwd&lt;br /&gt;
  /scratch/yourusername&lt;br /&gt;
&lt;br /&gt;
'''NOTE: The home directory is read-only on compute nodes.'''&lt;br /&gt;
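&lt;br /&gt;
Since &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; cannot be written from compute nodes, job output is typically written under &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. The sketch below builds a per-job working directory from environment variables instead of a hard-coded path; the function name &amp;lt;code&amp;gt;make_run_dir&amp;lt;/code&amp;gt; and the &amp;lt;code&amp;gt;run_JOBID&amp;lt;/code&amp;gt; naming scheme are hypothetical conventions, not SciNet requirements.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: create (if needed) and print a per-job directory
# under $SCRATCH, named after the SLURM job ID ("interactive" outside a job).
make_run_dir() {
    # Refuse to guess a location if $SCRATCH is not defined.
    [ -n "$SCRATCH" ] || return 1
    local dir="$SCRATCH/run_${SLURM_JOBID:-interactive}"
    mkdir -p "$dir" || return 1
    echo "$dir"
}

# Example use inside a job script:
# cd "$(make_run_dir)"
```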
&lt;br /&gt;
=== Environment Variables Quicklist ===&lt;br /&gt;
&lt;br /&gt;
Here is a summary of common environment variables:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Variable !! Description&lt;br /&gt;
|-&lt;br /&gt;
| $HOME || Path to your home directory (e.g., /home/username)&lt;br /&gt;
|-&lt;br /&gt;
| $SCRATCH || Path to your scratch directory for temporary job data (e.g., /scratch/username)&lt;br /&gt;
|-&lt;br /&gt;
| $SLURM_SUBMIT_DIR || Directory from which the job was submitted&lt;br /&gt;
|-&lt;br /&gt;
| $SLURM_JOBID || Unique ID of the current job&lt;br /&gt;
|-&lt;br /&gt;
| $SLURM_NODELIST || List of nodes allocated to the job&lt;br /&gt;
|-&lt;br /&gt;
| $SLURM_NNODES || Number of nodes allocated to the job&lt;br /&gt;
|}&lt;br /&gt;
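&lt;br /&gt;
These variables are handy for recording where and how a job ran at the top of its output file. A minimal sketch, assuming it is placed near the start of a batch script; the function name &amp;lt;code&amp;gt;job_summary&amp;lt;/code&amp;gt; is hypothetical, and outside a job (where the SLURM variables are unset) it prints &amp;lt;code&amp;gt;n/a&amp;lt;/code&amp;gt; placeholders.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Print a one-line summary of the current job from SLURM's environment
# variables; "n/a" is shown for any variable that is not set.
job_summary() {
    echo "Job ${SLURM_JOBID:-n/a} on ${SLURM_NNODES:-n/a} node(s) [${SLURM_NODELIST:-n/a}], submitted from ${SLURM_SUBMIT_DIR:-n/a}"
}

job_summary
```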
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=== Project and Archive (Nearline) ===&lt;br /&gt;
&lt;br /&gt;
All users on Trillium have access to a project directory, with paths stored in the environment variable &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt;.  &lt;br /&gt;
Some groups may also have an archive (a.k.a. &amp;quot;nearline&amp;quot;) directory, stored in the environment variable &amp;lt;code&amp;gt;$ARCHIVE&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
These follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/groupname/username&lt;br /&gt;
 $ARCHIVE=/archive/groupname/username&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Archive storage is currently available only via [[HPSS]] and cannot be accessed from Trillium login, compute, or data mover nodes.&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''  &lt;br /&gt;
Always use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) in scripts instead of hardcoding the paths. The actual directory paths may change in the future.&lt;br /&gt;
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
Please review the [[Data_Management#Purpose_of_each_file_system | various file systems]], their intended uses, and their policies. The table below summarizes the key details. See the [[Data_Management | Data Management]] page for more information.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB / 250,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB / 6,000,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50–500 TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 1 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group (nearline) allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools are available system-wide, such as compilers, math libraries, and debuggers.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details on how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job into the queue. It will begin execution on available compute nodes once it is scheduled. Note that jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Users in groups with a RAC allocation typically have both &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; accounts. A &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account draws on the Rapid Access Service (RAS) allocation (previously called the &amp;quot;default&amp;quot; allocation), which is independent of the Resource Allocation Competition (RAC). An &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; account is tied to a Resources for Research Groups (RRG) allocation granted to a research group through the Alliance's RAC process. Unless you explicitly specify the account using the &amp;lt;code&amp;gt;--account=ACCOUNT_NAME&amp;lt;/code&amp;gt; option in your job script or submission command, your job will most likely be charged to the &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account. If you want your job to use your group's RRG allocation, specify it explicitly (e.g., &amp;lt;code&amp;gt;--account=rrg-groupname&amp;lt;/code&amp;gt;).&lt;br /&gt;
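&lt;br /&gt;
For example, a job script that charges its usage to a group's RRG allocation could begin as follows. The account name &amp;lt;code&amp;gt;rrg-groupname&amp;lt;/code&amp;gt; is a placeholder for your group's actual account, and the other settings are only illustrative.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Placeholder account name: replace rrg-groupname with your group's RRG account.
#SBATCH --account=rrg-groupname
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --job-name=rrg_job

# The same option can be given at submission time instead; a command-line
# option takes precedence over the matching #SBATCH line:
#   sbatch --account=rrg-groupname jobscript.sh

# Slurm exposes the account the job is charged to in SLURM_JOB_ACCOUNT.
echo "Charged to account: ${SLURM_JOB_ACCOUNT:-unknown (not running under Slurm)}"
```
&lt;br /&gt;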
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
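&lt;br /&gt;
A job script can enforce the &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; rule before doing any work. The helper below is a hypothetical sketch (not a SciNet-provided command) that tests whether a directory, by default the submission directory, lies inside &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Succeed if DIR (default: the Slurm submission directory, else $PWD)
# is $SCRATCH itself or a path below it.
is_under_scratch() {
    local dir=${1:-${SLURM_SUBMIT_DIR:-$PWD}}
    # Refuse to guess if SCRATCH is not set.
    if [ -z "${SCRATCH:-}" ]; then
        return 1
    fi
    # Compare with a trailing slash so /scratch/alice2 does not match /scratch/alice.
    case "${dir%/}/" in
        "${SCRATCH%/}"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use near the top of a job script:
#   is_under_scratch || { echo "Submit this job from a directory under \$SCRATCH"; exit 1; }
```
&lt;br /&gt;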
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler works out which resources to allocate from the requested number of tasks and CPUs. On Trillium, things work differently:&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768 GB&amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768 GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks in one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
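&lt;br /&gt;
Bundling independent tasks can be done in several ways. One minimal sketch, assuming each task is a program that takes its task index as its only argument (&amp;lt;code&amp;gt;./my_serial_task&amp;lt;/code&amp;gt; is a hypothetical name), lets &amp;lt;code&amp;gt;xargs -P&amp;lt;/code&amp;gt; run up to 192 copies at once on the node:&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Run NTASKS independent instances of a command concurrently on one node.
# Each instance receives its index (1..NTASKS) as its final argument.
# xargs -P limits how many run at the same time; xargs returns only after
# every instance has finished, and returns non-zero if any instance failed.
run_bundled() {
    local ntasks=$1
    shift
    seq 1 "$ntasks" | xargs -P "$ntasks" -n 1 "$@"
}

# In a job script (hypothetical program name):
#   run_bundled 192 ./my_serial_task
```
&lt;br /&gt;
Have each task write to its own output file (for example, one named after its index) so that results from concurrent tasks do not interleave.&lt;br /&gt;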
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
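&lt;br /&gt;
To catch walltime requests that fall outside the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition limits in the table above (15 minutes to 24 hours) before submitting, a small check like the following can help. It is a hypothetical helper, not part of SLURM, and it only understands the &amp;lt;code&amp;gt;HH:MM:SS&amp;lt;/code&amp;gt; form; SLURM itself also accepts other formats such as &amp;lt;code&amp;gt;D-HH:MM:SS&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Convert an HH:MM:SS walltime to seconds; fail on any other format.
walltime_to_seconds() {
    local t=$1
    if ! [[ $t =~ ^[0-9]+:[0-9]{2}:[0-9]{2}$ ]]; then
        return 1
    fi
    local h=${t%%:*}      # hours: text before the first colon
    local rest=${t#*:}    # MM:SS
    local m=${rest%%:*}
    local s=${rest#*:}
    # 10# forces base 10 so values such as 08 are not read as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Succeed if the walltime is within the compute partition limits:
# at least 15 minutes (900 s) and at most 24 hours (86400 s).
compute_walltime_ok() {
    local secs
    secs=$(walltime_to_seconds "$1") || return 1
    [ "$secs" -ge 900 ] || return 1
    [ "$secs" -le 86400 ]
}

# Example: compute_walltime_ok 01:00:00 succeeds; 00:10:00 and 30:00:00 fail.
```
&lt;br /&gt;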
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks (384 MPI processes in total), for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processes per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
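&lt;br /&gt;
The number of ranks that &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; receives is the number of nodes times the tasks per node, here 2 × 192 = 384. If you want the job log to record that figure, a sketch like the following could be placed before the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; line. It assumes the job was submitted with &amp;lt;code&amp;gt;--ntasks-per-node&amp;lt;/code&amp;gt;, which is when SLURM sets &amp;lt;code&amp;gt;SLURM_NTASKS_PER_NODE&amp;lt;/code&amp;gt;; the function name is illustrative.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Print the total number of MPI ranks implied by the job's layout:
# nodes allocated multiplied by tasks per node. Both variables are set by
# Slurm inside a job submitted with --nodes and --ntasks-per-node.
total_ranks() {
    if [ -z "${SLURM_NNODES:-}" ] || [ -z "${SLURM_NTASKS_PER_NODE:-}" ]; then
        return 1   # not inside a Slurm job that set --ntasks-per-node
    fi
    echo $(( SLURM_NNODES * SLURM_NTASKS_PER_NODE ))
}

# In the MPI job script above, before mpirun:
#   echo "Launching $(total_ranks) MPI ranks on $SLURM_NNODES nodes"
```
&lt;br /&gt;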
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
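&lt;br /&gt;
Outside a job, &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt; is unset, so the &amp;lt;code&amp;gt;export&amp;lt;/code&amp;gt; line in the script above would leave &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; empty. A slightly more defensive variant, sketched below, falls back to one thread and also pins threads to cores using the standard OpenMP variables &amp;lt;code&amp;gt;OMP_PLACES&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;OMP_PROC_BIND&amp;lt;/code&amp;gt;. Whether pinning helps depends on your application, so treat these settings as a starting point to benchmark.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Configure OpenMP from the Slurm allocation.
# Falls back to a single thread when SLURM_CPUS_PER_TASK is unset
# (for example, when testing on a login node).
setup_openmp() {
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
    # Bind each thread to a core and place consecutive threads on nearby cores.
    export OMP_PLACES=cores
    export OMP_PROC_BIND=close
}

# In the OpenMP job script above, replace the export line with:
#   setup_openmp
#   ./openmp_example
```
&lt;br /&gt;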
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;br /&gt;
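&lt;br /&gt;
To get a quick count of your jobs by state, the output of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; can be post-processed. In the sketch below, &amp;lt;code&amp;gt;-h&amp;lt;/code&amp;gt; suppresses the header and &amp;lt;code&amp;gt;-o %T&amp;lt;/code&amp;gt; prints only each job's state in long form (for example &amp;lt;code&amp;gt;PENDING&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;RUNNING&amp;lt;/code&amp;gt;); the function name is illustrative.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Count job states read from standard input (one state per line)
# and print "STATE COUNT" pairs sorted by state name.
summarize_job_states() {
    awk '{ count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage on a login node:
#   squeue -u "$USER" -h -o %T | summarize_job_states
```
&lt;br /&gt;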
&lt;br /&gt;
= Submitting Jobs on the GPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
The Trillium GPU subcluster is designed for AI/ML and accelerated science workloads.  &lt;br /&gt;
It has specific rules and resource limits that differ from the CPU subcluster.&lt;br /&gt;
&lt;br /&gt;
Everything in the [[#Submitting_Jobs_on_the_CPU_Subcluster|Submitting Jobs on the CPU Subcluster]] section applies here, with the following GPU-specific rules:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Requirement !! Details&lt;br /&gt;
|-&lt;br /&gt;
| '''Allowed GPU counts''' || Jobs must request exactly 1 GPU or a multiple of 4 GPUs.&lt;br /&gt;
|-&lt;br /&gt;
| '''Single-GPU jobs''' || Use &amp;lt;code&amp;gt;--gpus-per-node=1&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
| '''Whole-node GPU jobs''' || Use &amp;lt;code&amp;gt;--gpus-per-node=4&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;-p compute_full_node&amp;lt;/code&amp;gt;).&lt;br /&gt;
|-&lt;br /&gt;
| '''Multi-node GPU jobs''' || Must request full nodes: &amp;lt;code&amp;gt;--gpus-per-node=4&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
| '''Memory limits''' || The &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; option is not allowed.&amp;lt;br&amp;gt;Per GPU: 192 GB host memory.&amp;lt;br&amp;gt;Whole-node jobs: 768 GB total.&lt;br /&gt;
|}&lt;br /&gt;
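&lt;br /&gt;
The GPU-count rules in this table can be captured in a small helper that turns a total GPU count into matching &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; options. This is a hypothetical sketch, not a SciNet tool; it assumes 4 GPUs per node, as on the Trillium GPU nodes.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Translate a total GPU count into sbatch options that follow the rules
# in the table above: exactly 1 GPU, or a multiple of 4 on full nodes.
# Prints the options on success; returns 1 for a count that is not allowed.
gpu_request_flags() {
    local gpus=$1
    local gpus_per_node=4   # each Trillium GPU node has 4 H100 GPUs
    if ! [[ $gpus =~ ^[0-9]+$ ]]; then
        return 1
    fi
    gpus=$(( 10#$gpus ))    # normalize, e.g. 08 becomes 8
    if [ "$gpus" -eq 0 ]; then
        return 1
    elif [ "$gpus" -eq 1 ]; then
        echo "--nodes=1 --gpus-per-node=1"
    elif [ $(( gpus % gpus_per_node )) -eq 0 ]; then
        echo "--nodes=$(( gpus / gpus_per_node )) --gpus-per-node=$gpus_per_node --partition=compute_full_node"
    else
        return 1
    fi
}

# Usage (my_gpu_job.sh is a hypothetical script name):
#   flags=$(gpu_request_flags 8) || exit 1
#   sbatch $flags my_gpu_job.sh   # $flags unquoted on purpose so it splits into options
```
&lt;br /&gt;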
&lt;br /&gt;
== Accessing the GPU Subcluster ==&lt;br /&gt;
&lt;br /&gt;
* From outside: &amp;lt;code&amp;gt;ssh trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt;  &lt;br /&gt;
* From another Trillium node: &amp;lt;code&amp;gt;ssh trig-login01&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Single-GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=single_gpu_job         # Job name&lt;br /&gt;
#SBATCH --output=single_gpu_job_%j.out    # Output file (%j = job ID)&lt;br /&gt;
#SBATCH --nodes=1                         # Request 1 node&lt;br /&gt;
#SBATCH --gpus-per-node=1                 # Request 1 GPU&lt;br /&gt;
#SBATCH --time=00:30:00                   # Max runtime (30 minutes)&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Check GPU allocation&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Run your workload&lt;br /&gt;
srun python my_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
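&lt;br /&gt;
Besides &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt;, a script can check how many GPUs it has been given through &amp;lt;code&amp;gt;CUDA_VISIBLE_DEVICES&amp;lt;/code&amp;gt;, which SLURM typically sets to a comma-separated list of device indices for GPU jobs. The helper below is a hypothetical sketch; it reports 0 when the variable is unset, meaning no specific GPU list was assigned (CUDA itself would then expose every GPU on the node).&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Count the GPUs listed in CUDA_VISIBLE_DEVICES (e.g. "0,1,2,3" gives 4).
# Prints 0 if the variable is unset or empty.
count_visible_gpus() {
    local devs=${CUDA_VISIBLE_DEVICES:-}
    if [ -z "$devs" ]; then
        echo 0
        return 0
    fi
    # Strip everything except the commas; entries = commas + 1.
    local commas=${devs//[^,]/}
    echo $(( ${#commas} + 1 ))
}

# In the single-GPU job script above, before running the workload:
#   if [ "$(count_visible_gpus)" -ne 1 ]; then echo "Expected 1 GPU"; exit 1; fi
```
&lt;br /&gt;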
&lt;br /&gt;
== Example: Whole-Node (4 GPUs) Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=whole_node_gpu_job&lt;br /&gt;
#SBATCH --output=whole_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --partition=compute_full_node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --gpus-per-node=4&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
srun python my_distributed_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Multi-Node GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=multi_node_gpu_job&lt;br /&gt;
#SBATCH --output=multi_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --nodes=2                        # Request 2 full nodes&lt;br /&gt;
#SBATCH --gpus-per-node=4                # 4 GPUs per node (full node)&lt;br /&gt;
#SBATCH --partition=compute_full_node    # Required for full-node jobs&lt;br /&gt;
#SBATCH --time=04:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Check all GPUs allocated&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Example: run a distributed training job with 8 GPUs (2 nodes × 4 GPUs)&lt;br /&gt;
srun python train_distributed.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
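&lt;br /&gt;
How &amp;lt;code&amp;gt;train_distributed.py&amp;lt;/code&amp;gt; finds its peers depends on the framework. As one example, PyTorch's &amp;lt;code&amp;gt;env://&amp;lt;/code&amp;gt; initialization reads &amp;lt;code&amp;gt;MASTER_ADDR&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;MASTER_PORT&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;WORLD_SIZE&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;RANK&amp;lt;/code&amp;gt;. The sketch below derives them from SLURM's per-task variables. It assumes one task per GPU (for instance, by adding &amp;lt;code&amp;gt;#SBATCH --ntasks-per-node=4&amp;lt;/code&amp;gt; to the script above); the file name &amp;lt;code&amp;gt;dist_env.sh&amp;lt;/code&amp;gt; and port 29500 are arbitrary choices.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Map Slurm's per-task variables onto the variables read by PyTorch's
# env:// initialization. Call it once per task (inside srun), because
# SLURM_PROCID and SLURM_LOCALID differ from task to task.
export_torch_dist_env() {
    # Use the first host in the allocation as the rendezvous address,
    # unless MASTER_ADDR is already set.
    export MASTER_ADDR=${MASTER_ADDR:-$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)}
    export MASTER_PORT=${MASTER_PORT:-29500}   # arbitrary port, assumed free
    export WORLD_SIZE=$SLURM_NTASKS            # total tasks (one per GPU)
    export RANK=$SLURM_PROCID                  # global rank of this task
    export LOCAL_RANK=$SLURM_LOCALID           # rank within the node
}

# Usage, with this function saved in a hypothetical file dist_env.sh:
#   srun bash -c 'source dist_env.sh; export_torch_dist_env; python train_distributed.py'
```
&lt;br /&gt;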
&lt;br /&gt;
== Best Practices for GPU Jobs ==&lt;br /&gt;
&lt;br /&gt;
* Do not use &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; — memory is fixed per GPU (192 GB) or per node (768 GB).&lt;br /&gt;
* Always specify GPU counts and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt; for whole-node or multi-node jobs.&lt;br /&gt;
* Load only the modules you need — see [[Using_modules]].&lt;br /&gt;
* Be explicit with software versions for reproducibility (e.g., &amp;lt;code&amp;gt;cuda/12.6&amp;lt;/code&amp;gt; rather than just &amp;lt;code&amp;gt;cuda&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Test on a single GPU before scaling to multiple GPUs or nodes.&lt;br /&gt;
* Monitor usage with &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt; to ensure GPUs are fully utilized.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6872</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6872"/>
		<updated>2025-08-13T19:06:37Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: Added common SLURM's env variables&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&amp;quot;border: 2px solid #e6b800; background-color: #fff8e1; padding: 0.5em; font-weight: bold; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
This Quickstart Guide is a &amp;lt;u&amp;gt;work in progress&amp;lt;/u&amp;gt;. Details may change as Trillium documentation is updated.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca, trillium-gpu.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. [[#Submitting_Jobs_on_the_CPU_Subcluster|CPU Subcluster]]&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. [[#Submitting_Jobs_on_the_GPU_Subcluster|GPU Subcluster]]&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and is not ideal for training extremely large models (hundreds of billions of parameters or more).&lt;br /&gt;
* To access it, SSH into &amp;lt;code&amp;gt;trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt; from outside, or into &amp;lt;code&amp;gt;trig-login01&amp;lt;/code&amp;gt; from other Trillium nodes.&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! Nodes !! Cores !! Available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache per CPU ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node of the cluster has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that accepts jobs with walltimes from 15 minutes to 24 hours.&lt;br /&gt;
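&lt;br /&gt;
The core counts in the table add up to the total in the infobox at the top of this page: 1224 CPU nodes with 192 cores each, plus 60 GPU nodes with 96 host cores each. As a quick check in Bash:&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Total cores = (CPU nodes * cores per CPU node) + (GPU nodes * cores per GPU node)
cpu_cores=$(( 1224 * 192 ))        # 235008 cores in the CPU subcluster
gpu_cores=$(( 60 * 96 ))           # 5760 host CPU cores on the GPU nodes
echo $(( cpu_cores + gpu_cores ))  # prints 240768
```
&lt;br /&gt;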
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or if your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check whether you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to request access as described above.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium is accessed only via [[SSH]] (secure shell), and authentication is allowed only with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use it securely.&lt;br /&gt;
&lt;br /&gt;
Open a terminal window (on Windows, see Connecting with [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or Connecting with [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_Jobs_on_the_CPU_Subcluster | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
'''Login Node Usage Rules:'''&lt;br /&gt;
* Do not run large memory jobs (e.g., exceeding 2 GB).&lt;br /&gt;
* Do not run parallel training or multi-threaded processes.&lt;br /&gt;
* Do not run long-running computations (keep to under a few minutes).&lt;br /&gt;
* Do not run resource-intensive tasks like heavy I/O operations or simulations.&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Your storage locations ==&lt;br /&gt;
&lt;br /&gt;
On Trillium, every user has several types of storage space available. These locations each serve different purposes, and the exact paths depend on your username and group. For convenience and portability, each location is also available through a corresponding environment variable.&lt;br /&gt;
&lt;br /&gt;
=== Home and Scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home directory and a scratch directory. Their locations are stored in the environment variables &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
On Trillium, the paths follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/username&lt;br /&gt;
 $SCRATCH=/scratch/username&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
  tri-login01:~$ pwd&lt;br /&gt;
  /home/yourusername&lt;br /&gt;
  tri-login01:~$ cd $SCRATCH&lt;br /&gt;
  tri-login01:scratch$ pwd&lt;br /&gt;
  /scratch/yourusername&lt;br /&gt;
&lt;br /&gt;
'''NOTE: The home directory is read-only on compute nodes.'''&lt;br /&gt;
&lt;br /&gt;
=== Environment Variables Quicklist ===&lt;br /&gt;
&lt;br /&gt;
Here is a summary of common environment variables:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Variable !! Description&lt;br /&gt;
|-&lt;br /&gt;
| $HOME || Path to your home directory (e.g., /home/username)&lt;br /&gt;
|-&lt;br /&gt;
| $SCRATCH || Path to your scratch directory for temporary job data (e.g., /scratch/username)&lt;br /&gt;
|-&lt;br /&gt;
| $SLURM_SUBMIT_DIR || Directory from which the job was submitted&lt;br /&gt;
|-&lt;br /&gt;
| $SLURM_JOBID || Unique ID of the current job&lt;br /&gt;
|-&lt;br /&gt;
| $SLURM_NODELIST || List of nodes allocated to the job&lt;br /&gt;
|-&lt;br /&gt;
| $SLURM_NNODES || Number of nodes allocated to the job&lt;br /&gt;
|}&lt;br /&gt;
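&lt;br /&gt;
These variables combine naturally in a job script. For example, the hypothetical helper below creates a separate output directory for each job under &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;, named after the job ID, so that repeated runs do not overwrite each other. The &amp;lt;code&amp;gt;jobs/&amp;lt;/code&amp;gt; subdirectory name is an arbitrary choice.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Create a per-job directory, $SCRATCH/jobs/JOBID, and print its path.
# Returns 1 (without creating anything) if SCRATCH or SLURM_JOBID is unset.
make_job_dir() {
    if [ -z "${SCRATCH:-}" ] || [ -z "${SLURM_JOBID:-}" ]; then
        return 1
    fi
    local dir="$SCRATCH/jobs/$SLURM_JOBID"
    mkdir -p "$dir" || return 1
    echo "$dir"
}

# In a job script:
#   outdir=$(make_job_dir) || exit 1
#   cd "$outdir" || exit 1
#   echo "Job $SLURM_JOBID on $SLURM_NNODES node(s): $SLURM_NODELIST"
```
&lt;br /&gt;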
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=== Project and Archive (Nearline) ===&lt;br /&gt;
&lt;br /&gt;
All users on Trillium have access to a project directory, with paths stored in the environment variable &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt;.  &lt;br /&gt;
Some groups may also have an archive (a.k.a. &amp;quot;nearline&amp;quot;) directory, stored in the environment variable &amp;lt;code&amp;gt;$ARCHIVE&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
These follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/groupname/username&lt;br /&gt;
 $ARCHIVE=/archive/groupname/username&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Archive storage is currently available only via [[HPSS]] and cannot be accessed from Trillium login, compute, or data mover nodes.&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''  &lt;br /&gt;
Always use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) in scripts instead of hardcoding the paths. The actual directory paths may change in the future.&lt;br /&gt;
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
Please review the [[Data_Management#Purpose_of_each_file_system | various file systems]], their intended uses, and their policies. The table below summarizes the key details. See the [[Data_Management | Data Management]] page for more information.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB / 250,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB / 6,000,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50–500 TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 1 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group (nearline) allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
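&lt;br /&gt;
A typical sequence of these commands might look like the following sketch. The module names and versions are the ones used in the example job scripts on this page. Use &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; to check what is currently installed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Start from a clean environment&lt;br /&gt;
module purge&lt;br /&gt;
&lt;br /&gt;
# Load a specific, versioned toolchain&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
# Confirm what is loaded (equivalent to: ml)&lt;br /&gt;
module list&lt;br /&gt;
&lt;br /&gt;
# List the available versions of a package, then show details for one version&lt;br /&gt;
module spider openmpi&lt;br /&gt;
module spider openmpi/4.1.5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;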
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
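&lt;br /&gt;
One way to follow these tips is to keep your module list in a small, separate file and source it both interactively and from job scripts. The file name &amp;lt;code&amp;gt;modules_myproject.sh&amp;lt;/code&amp;gt; is hypothetical, and the versions shown are those used elsewhere on this page.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Hypothetical file: $HOME/modules_myproject.sh&lt;br /&gt;
# Source this file (do not execute it) so the module changes apply to the current shell.&lt;br /&gt;
module purge&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A job script or an interactive shell can then set up the same environment with &amp;lt;code&amp;gt;source $HOME/modules_myproject.sh&amp;lt;/code&amp;gt;. Because the file is only read, this works on compute nodes even though &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; is read-only there.&lt;br /&gt;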
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct-liquid-cooled using warm water (35–40 °C inlet temperature), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
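&lt;br /&gt;
Inside a &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; session you are working on a compute node. It helps to follow the same steps a batch job would: load modules explicitly, work from &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;, and then launch the program. The sketch below is minimal and assumes the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; program and module versions from the example job scripts on this page.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# On a login node: request 1 debug node with no modules loaded (up to 1 hour)&lt;br /&gt;
debugjob --clean 1&lt;br /&gt;
&lt;br /&gt;
# Inside the session (now running on a compute node):&lt;br /&gt;
cd $SCRATCH&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
# SLURM tells mpirun how many processes to start, as in a batch job&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
&lt;br /&gt;
# Leave the session to release the node when you are done&lt;br /&gt;
exit&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;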
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job into the queue. It will begin execution on available compute nodes when scheduled. Note: jobs must be submitted from a login node; submitting from data mover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Users typically have two kinds of account. The &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account draws on the group's RAS allocation (previously called the &amp;quot;default&amp;quot; allocation). The &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; account draws on the group's RRG allocation, if the group has one. Unless you explicitly specify the account using the &amp;lt;code&amp;gt;--account=ACCOUNT_NAME&amp;lt;/code&amp;gt; option in your job script or submission command, your job will most likely be charged to the &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account.&lt;br /&gt;
&lt;br /&gt;
If you want your job to use the RRG allocation, be sure to specify it explicitly.&lt;br /&gt;
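&lt;br /&gt;
In the example below, &amp;lt;code&amp;gt;rrg-yourpi&amp;lt;/code&amp;gt; is a hypothetical RRG account name; replace it with your group's actual account. The account can be set either in the job script or on the command line. The &amp;lt;code&amp;gt;sacctmgr&amp;lt;/code&amp;gt; query at the end is a standard SLURM command for listing the accounts associated with your user name. Its output layout may vary.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Option 1: near the top of the job script&lt;br /&gt;
#SBATCH --account=rrg-yourpi&lt;br /&gt;
&lt;br /&gt;
# Option 2: on the command line (overrides an #SBATCH --account line)&lt;br /&gt;
sbatch --account=rrg-yourpi jobscript.sh&lt;br /&gt;
&lt;br /&gt;
# List the SLURM accounts associated with your user name&lt;br /&gt;
sacctmgr show associations user=$USER format=Account%30&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;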
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler decides which resources to allocate based on the number of tasks and CPUs you request. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768GB &amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768GB is the amount of memory on each node.&lt;br /&gt;
* If you are running serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
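&lt;br /&gt;
The following is a minimal sketch of bundling. It assumes a hypothetical serial program &amp;lt;code&amp;gt;./serial_task&amp;lt;/code&amp;gt; and hypothetical input files &amp;lt;code&amp;gt;input_1.dat&amp;lt;/code&amp;gt; through &amp;lt;code&amp;gt;input_192.dat&amp;lt;/code&amp;gt;. Each task runs in the background, one per core, and &amp;lt;code&amp;gt;wait&amp;lt;/code&amp;gt; keeps the job alive until all of them finish. This simple pattern works best when all tasks take roughly the same time.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=serial_bundle&lt;br /&gt;
#SBATCH --output=serial_bundle_%j.txt&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
# Launch 192 independent serial tasks in the background, one per core&lt;br /&gt;
for i in $(seq 1 192); do&lt;br /&gt;
    ./serial_task input_$i.dat &amp;gt; output_$i.log &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Wait for all background tasks to finish before the job ends&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;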
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
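&lt;br /&gt;
You can choose the partition with &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;--partition&amp;lt;/code&amp;gt;), either on the command line or in an &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; line. The sketch below submits a short batch test to the &amp;lt;code&amp;gt;debug&amp;lt;/code&amp;gt; partition while staying within the limits in the table (at most 4 nodes and 1 hour). Here &amp;lt;code&amp;gt;jobscript.sh&amp;lt;/code&amp;gt; is a placeholder name.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Short test in the debug partition (max 4 nodes, max 1 hour)&lt;br /&gt;
sbatch -p debug --nodes=1 --time=00:30:00 jobscript.sh&lt;br /&gt;
&lt;br /&gt;
# Production run in the default compute partition (no -p needed)&lt;br /&gt;
sbatch --nodes=4 --time=12:00:00 jobscript.sh&lt;br /&gt;
&lt;br /&gt;
# Show the current state of these partitions&lt;br /&gt;
sinfo -p compute,debug&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;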
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processes per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once the nodes are allocated, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
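&lt;br /&gt;
A typical monitoring session might look like the sketch below. Capturing the job ID with &amp;lt;code&amp;gt;sbatch --parsable&amp;lt;/code&amp;gt; means you do not have to copy it by hand. Here &amp;lt;code&amp;gt;jobscript.sh&amp;lt;/code&amp;gt; is a placeholder, and the &amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; field list is just one useful selection.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Submit and keep the job ID (--parsable prints the ID without extra text)&lt;br /&gt;
JOBID=$(sbatch --parsable jobscript.sh)&lt;br /&gt;
&lt;br /&gt;
# Check status and the (rough) estimated start time&lt;br /&gt;
squeue -u $USER&lt;br /&gt;
squeue --start -j $JOBID&lt;br /&gt;
&lt;br /&gt;
# Detailed information, including allocated nodes&lt;br /&gt;
scontrol show job $JOBID&lt;br /&gt;
&lt;br /&gt;
# Live CPU and memory usage while the job is running&lt;br /&gt;
jobperf $JOBID&lt;br /&gt;
&lt;br /&gt;
# After the job finishes: node count, elapsed time, final state, and exit code&lt;br /&gt;
sacct -j $JOBID --format=JobID,JobName,NNodes,Elapsed,State,ExitCode&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;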
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the GPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
The Trillium GPU subcluster is designed for AI/ML and accelerated science workloads.  &lt;br /&gt;
It has specific rules and resource limits that differ from the CPU subcluster.&lt;br /&gt;
&lt;br /&gt;
Everything in the [[#Submitting_Jobs_on_the_CPU_Subcluster|Submitting Jobs on the CPU Subcluster]] section applies here, with the following GPU-specific rules:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Requirement !! Details&lt;br /&gt;
|-&lt;br /&gt;
| '''Allowed GPU counts''' || Jobs must request exactly 1 GPU or a multiple of 4 GPUs.&lt;br /&gt;
|-&lt;br /&gt;
| '''Single-GPU jobs''' || Use &amp;lt;code&amp;gt;--gpus-per-node=1&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
| '''Whole-node GPU jobs''' || Use &amp;lt;code&amp;gt;--gpus-per-node=4&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;-p compute_full_node&amp;lt;/code&amp;gt;).&lt;br /&gt;
|-&lt;br /&gt;
| '''Multi-node GPU jobs''' || Must request full nodes: &amp;lt;code&amp;gt;--gpus-per-node=4&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
| '''Memory limits''' || The &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; option is not allowed. Host memory is fixed: 192 GB per GPU, or 768 GB total for whole-node jobs.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Accessing the GPU Subcluster ==&lt;br /&gt;
&lt;br /&gt;
* From outside: &amp;lt;code&amp;gt;ssh trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt;  &lt;br /&gt;
* From another Trillium node: &amp;lt;code&amp;gt;ssh trig-login01&amp;lt;/code&amp;gt;&lt;br /&gt;
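&lt;br /&gt;
Following the single-GPU rule in the table, you can request a short interactive test on one GPU from the GPU login node with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, just as on the CPU subcluster. This is a sketch, and the one-hour time limit is an arbitrary choice.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# From the GPU login node (trig-login01): request 1 GPU for 1 hour&lt;br /&gt;
salloc --nodes=1 --gpus-per-node=1 --time=01:00:00&lt;br /&gt;
&lt;br /&gt;
# Inside the allocation: confirm which GPU you received&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;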
&lt;br /&gt;
== Example: Single-GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=single_gpu_job         # Job name&lt;br /&gt;
#SBATCH --output=single_gpu_job_%j.out    # Output file (%j = job ID)&lt;br /&gt;
#SBATCH --nodes=1                         # Request 1 node&lt;br /&gt;
#SBATCH --gpus-per-node=1                 # Request 1 GPU&lt;br /&gt;
#SBATCH --time=00:30:00                   # Max runtime (30 minutes)&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Check GPU allocation&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Run your workload&lt;br /&gt;
srun python my_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Whole-Node (4 GPUs) Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=whole_node_gpu_job&lt;br /&gt;
#SBATCH --output=whole_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --partition=compute_full_node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --gpus-per-node=4&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
srun python my_distributed_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Multi-Node GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=multi_node_gpu_job&lt;br /&gt;
#SBATCH --output=multi_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --nodes=2                        # Request 2 full nodes&lt;br /&gt;
#SBATCH --gpus-per-node=4                # 4 GPUs per node (full node)&lt;br /&gt;
#SBATCH --partition=compute_full_node    # Required for full-node jobs&lt;br /&gt;
#SBATCH --time=04:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Check all GPUs allocated&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Example: run a distributed training job with 8 GPUs (2 nodes × 4 GPUs)&lt;br /&gt;
srun python train_distributed.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Best Practices for GPU Jobs ==&lt;br /&gt;
&lt;br /&gt;
* Do not use &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; — memory is fixed per GPU (192 GB) or per node (768 GB).&lt;br /&gt;
* Always specify GPU counts and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt; for whole-node or multi-node jobs.&lt;br /&gt;
* Load only the modules you need — see [[Using_modules]].&lt;br /&gt;
* Be explicit with software versions for reproducibility (e.g., &amp;lt;code&amp;gt;cuda/12.6&amp;lt;/code&amp;gt; rather than just &amp;lt;code&amp;gt;cuda&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Test on a single GPU before scaling to multiple GPUs or nodes.&lt;br /&gt;
* Monitor usage with &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt; to ensure GPUs are fully utilized.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6869</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6869"/>
		<updated>2025-08-13T18:43:17Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: Added Login Node Usage Rules&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&amp;quot;border: 2px solid #e6b800; background-color: #fff8e1; padding: 0.5em; font-weight: bold; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
This Quickstart Guide is a &amp;lt;u&amp;gt;work in progress&amp;lt;/u&amp;gt;. Details may change as Trillium documentation is updated.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca, trillium-gpu.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. [[#Submitting_Jobs_on_the_CPU_Subcluster|CPU Subcluster]]&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. [[#Submitting_Jobs_on_the_GPU_Subcluster|GPU Subcluster]]&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and is not suited to training extremely large models (hundreds of billions of parameters).&lt;br /&gt;
* To access it, SSH into &amp;lt;code&amp;gt;trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt; from outside Trillium, or into &amp;lt;code&amp;gt;trig-login01&amp;lt;/code&amp;gt; from other Trillium nodes.&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! Nodes !! Cores !! Available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with adaptive routing. The compute nodes are accessed through a queueing system that accepts jobs with walltimes between 15 minutes and 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet or your supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page of the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check whether you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error and your SSH key is correctly set up, you probably still need to request access as described above.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux to work on Trillium. If you are not, it is worth reviewing our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium is accessed via [[SSH]] (secure shell) only, and authentication requires SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys See this page] to generate an SSH key pair and to learn how to use it securely.&lt;br /&gt;
&lt;br /&gt;
Open a terminal window (on Windows, you can use [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXterm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; flag enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_Jobs_on_the_CPU_Subcluster | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
'''Login Node Usage Rules:'''&lt;br /&gt;
* Do not run large memory jobs (e.g., exceeding 2 GB).&lt;br /&gt;
* Do not run parallel training or multi-threaded processes.&lt;br /&gt;
* Do not run long-running computations (keep to under a few minutes).&lt;br /&gt;
* Do not run resource-intensive tasks like heavy I/O operations or simulations.&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Your storage locations ==&lt;br /&gt;
&lt;br /&gt;
On Trillium, every user has several types of storage space available. These locations each serve different purposes, and the exact paths depend on your username and group. For convenience and portability, each location is also available through a corresponding environment variable.&lt;br /&gt;
&lt;br /&gt;
=== Home and Scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home directory and a scratch directory. Their locations are stored in the environment variables &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
On Trillium, the paths follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/username&lt;br /&gt;
 $SCRATCH=/scratch/username&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
  tri-login01:~$ pwd&lt;br /&gt;
  /home/yourusername&lt;br /&gt;
  tri-login01:~$ cd $SCRATCH&lt;br /&gt;
  tri-login01:scratch$ pwd&lt;br /&gt;
  /scratch/yourusername&lt;br /&gt;
&lt;br /&gt;
'''NOTE: The home directory is read-only on compute nodes.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=== Project and Archive (Nearline) ===&lt;br /&gt;
&lt;br /&gt;
All users on Trillium have access to a project directory, with paths stored in the environment variable &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt;.  &lt;br /&gt;
Some groups may also have an archive (a.k.a. &amp;quot;nearline&amp;quot;) directory, stored in the environment variable &amp;lt;code&amp;gt;$ARCHIVE&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
These follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/groupname/username&lt;br /&gt;
 $ARCHIVE=/archive/groupname/username&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Archive storage is currently available only via [[HPSS]] and cannot be accessed from Trillium login, compute, or data mover nodes.&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''  &lt;br /&gt;
Always use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) in scripts instead of hardcoding the paths. The actual directory paths may change in the future.&lt;br /&gt;
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
Please review the [[Data_Management#Purpose_of_each_file_system | various file systems]], their intended uses, and their policies. The table below summarizes the key details. See the [[Data_Management | Data Management]] page for more information.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB / 250,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB / 6,000,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50–500 TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 1 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group (nearline) allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
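&lt;br /&gt;
As a minimal sketch of the version-pinning advice above, a job script can stop early if any module is requested without an explicit version. The helper name &amp;lt;code&amp;gt;all_pinned&amp;lt;/code&amp;gt; is hypothetical (it is not a SciNet or Lmod command), and the module versions are simply those used in the example job scripts on this page.&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: stop a job early if any module is requested without an explicit version.
# all_pinned is a hypothetical helper, not a SciNet or Lmod command.
all_pinned() {
  local spec
  for spec in "$@"; do
    case "$spec" in
      */?*) ;;  # name/version form, e.g. gcc/12.3
      *) echo "Module not pinned to a version: $spec"; return 1 ;;
    esac
  done
  return 0
}

# Example inside a job script (versions taken from the examples on this page):
#   MODULES="StdEnv/2023 gcc/12.3 openmpi/4.1.5"
#   all_pinned $MODULES || exit 1   # $MODULES is unquoted on purpose, to split the list
#   module load $MODULES
```
This only checks the &amp;lt;code&amp;gt;name/version&amp;lt;/code&amp;gt; form of each request; whether that version actually exists is still determined by &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;.&lt;br /&gt;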
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely used commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to [mailto:support@scinet.utoronto.ca contact us] and we will assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct-liquid-cooled using warm water (35–40 °C inlet temperature), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
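&lt;br /&gt;
One way to keep a login-node test within these bounds is to run it under explicit limits. This is a sketch under stated assumptions: a 2 GB cap applied with the shell's &amp;lt;code&amp;gt;ulimit -v&amp;lt;/code&amp;gt; (this limits virtual memory, so programs that reserve large address spaces may fail even when their real usage is small), a default 5-minute cap via the coreutils &amp;lt;code&amp;gt;timeout&amp;lt;/code&amp;gt; command, and a single OpenMP thread. The names &amp;lt;code&amp;gt;run_light_test&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;LIGHT_TEST_TIMEOUT&amp;lt;/code&amp;gt; are hypothetical.&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: run a quick login-node test under explicit memory, time, and thread caps.
# run_light_test and LIGHT_TEST_TIMEOUT are hypothetical names, not SciNet tools.
run_light_test() {
  (
    ulimit -v 2097152                 # 2 GB virtual-memory cap, expressed in KiB
    export OMP_NUM_THREADS=1          # keep OpenMP-threaded code on one core
    timeout "${LIGHT_TEST_TIMEOUT:-5m}" "$@"   # default 5-minute wall-clock cap
  )
}

# Example (my_program and input.dat are hypothetical):
#   run_light_test ./my_program input.dat
```
If the test is stopped at the time limit, &amp;lt;code&amp;gt;timeout&amp;lt;/code&amp;gt; exits with status 124, a sign that the test belongs in a &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; session instead.&lt;br /&gt;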
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job into the queue. It will start running on compute nodes once the scheduler allocates them. Note that jobs must be submitted from a login node: submitting from data mover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
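&lt;br /&gt;
Because job output must go to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;, a small wrapper can catch accidental submissions from &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;. This is a sketch only: &amp;lt;code&amp;gt;in_scratch&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;submit_from_scratch&amp;lt;/code&amp;gt; are hypothetical helper names, and the check relies only on the &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; variable described above.&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: refuse to submit unless the current directory is inside $SCRATCH.
# in_scratch and submit_from_scratch are hypothetical helper names.

in_scratch() {
  # Succeeds if path $1 is $SCRATCH itself or lies below it
  [ -n "${SCRATCH:-}" ] || return 1
  local base=${SCRATCH%/}             # ignore a trailing slash in $SCRATCH
  case "$1/" in
    "$base"/*) return 0 ;;
    *) return 1 ;;
  esac
}

submit_from_scratch() {
  if ! in_scratch "$PWD"; then
    echo "Not submitting from $PWD: cd to \$SCRATCH first (\$HOME is read-only on compute nodes)."
    return 1
  fi
  sbatch "$@"
}
```
These functions could be kept in a separate script that you source on a login node, then &amp;lt;code&amp;gt;submit_from_scratch jobscript.sh&amp;lt;/code&amp;gt; used in place of &amp;lt;code&amp;gt;sbatch jobscript.sh&amp;lt;/code&amp;gt;.&lt;br /&gt;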
&lt;br /&gt;
Users typically have both &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; accounts. Jobs can run under your group's RRG allocation or, if one is not available, under a RAS allocation (previously called the &amp;quot;default&amp;quot; allocation). However, unless you explicitly specify the account using the &amp;lt;code&amp;gt;--account=ACCOUNT_NAME&amp;lt;/code&amp;gt; option in your job script or submission command, your job will most likely be charged to the &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account.&lt;br /&gt;
&lt;br /&gt;
If you want your job to use the RRG allocation, be sure to specify it explicitly.&lt;br /&gt;
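&lt;br /&gt;
For example, the account can be set either as a directive in the job script or on the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command line. In this sketch, &amp;lt;code&amp;gt;rrg-yourpi&amp;lt;/code&amp;gt; is a placeholder, not a real account name; substitute the name of your group's account.&lt;br /&gt;
```shell
#!/bin/bash
# Sketch only: "rrg-yourpi" is a placeholder for your group's actual RRG account.

# Option 1: as a directive near the top of the job script, with the other #SBATCH lines
#SBATCH --account=rrg-yourpi

# Option 2: on the command line at submission time (command-line options override directives)
# sbatch --account=rrg-yourpi jobscript.sh
```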
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
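&lt;br /&gt;
Since allocations come in whole nodes, it can help to convert a core count into the nodes and memory a job will actually receive. A minimal sketch using the per-node figures above; the function names are hypothetical.&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: round a core count up to whole Trillium CPU nodes and report the memory received.
# nodes_for_cores and mem_for_nodes_gb are hypothetical helper names.
CORES_PER_NODE=192
MEM_PER_NODE_GB=768

nodes_for_cores() {
  # Ceiling division: cores / CORES_PER_NODE, rounded up to a whole node
  echo $(( ($1 + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

mem_for_nodes_gb() {
  # Every allocated node contributes its full memory
  echo $(( $1 * MEM_PER_NODE_GB ))
}

# Example: nodes=$(nodes_for_cores 300); mem_for_nodes_gb "$nodes"
```
For example, a 300-core job is rounded up to 2 nodes, which provide 384 cores and 1536 GB of RAM; the remaining 84 cores sit idle unless the job is adjusted to use them.&lt;br /&gt;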
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler deduces the resources to allocate from the requested number of tasks and number of CPUs per node. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768 GB&amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768 GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
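&lt;br /&gt;
As one possible approach to bundling, the sketch below runs independent serial tasks at most 192 at a time with &amp;lt;code&amp;gt;xargs -P&amp;lt;/code&amp;gt;. It is an illustration, not necessarily the method described on the linked page; &amp;lt;code&amp;gt;run_case.sh&amp;lt;/code&amp;gt; is a hypothetical script that processes the case number passed as its argument, and the total of 384 tasks is arbitrary.&lt;br /&gt;
```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=192
#SBATCH --time=01:00:00
#SBATCH --job-name=serial_bundle
#SBATCH --output=serial_bundle_%j.txt

run_bundle() {
  # $1 = number of tasks, $2 = max simultaneous tasks, remaining args = command
  local ntasks=$1 nparallel=$2
  shift 2
  # Feed task numbers 1..ntasks to the command, one per invocation,
  # keeping at most nparallel invocations running at once
  seq 1 "$ntasks" | xargs -P "$nparallel" -n 1 "$@"
}

# Only launch the real workload inside a SLURM job
if [ -n "${SLURM_JOB_ID:-}" ]; then
  cd "$SLURM_SUBMIT_DIR" || exit 1
  run_bundle 384 192 ./run_case.sh   # run_case.sh is hypothetical
fi
```
&amp;lt;code&amp;gt;xargs&amp;lt;/code&amp;gt; starts a new task as soon as a slot frees up, but the job ends only when the last task finishes, so tasks with very uneven run times leave cores idle near the end.&lt;br /&gt;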
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. These limits depend on whether you are part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation], and on the &amp;quot;partition&amp;quot; the job runs in. &amp;quot;Partitions&amp;quot; are SLURM's term for distinct use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;; if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
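&lt;br /&gt;
Before submitting, a request can be checked against the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition limits in the table above. This sketch accepts walltimes only in &amp;lt;code&amp;gt;HH:MM:SS&amp;lt;/code&amp;gt; form (SLURM itself accepts other formats too), uses hypothetical function names, and hard-codes limits copied from the table, which may change.&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: check a planned job against the compute-partition limits in the table above.
# to_seconds and check_compute_request are hypothetical helper names.

to_seconds() {
  # awk reads each HH:MM:SS field as a decimal number, so "08" counts as 8
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

check_compute_request() {
  # $1 = nodes, $2 = walltime (HH:MM:SS), $3 = "yes" if your group has an allocation
  local nodes=$1 secs max_nodes=20
  secs=$(to_seconds "$2")
  if [ "${3:-no}" = "yes" ]; then
    max_nodes=1000
  fi
  if [ "$nodes" -lt 1 ] || [ "$nodes" -gt "$max_nodes" ]; then
    echo "Requested $nodes nodes; the compute partition allows 1 to $max_nodes."
    return 1
  fi
  if [ "$secs" -lt 900 ] || [ "$secs" -gt 86400 ]; then
    echo "Requested walltime $2; the compute partition allows 00:15:00 to 24:00:00."
    return 1
  fi
  return 0
}

# Example: check_compute_request 2 01:00:00 || exit 1
```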
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; go to SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the mpirun flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
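&lt;br /&gt;
To confirm the layout SLURM passes to &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt;, the job script can log it before launching the application. A sketch, assuming the standard SLURM variables &amp;lt;code&amp;gt;SLURM_JOB_NUM_NODES&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;SLURM_NTASKS_PER_NODE&amp;lt;/code&amp;gt; (the latter is set when &amp;lt;code&amp;gt;--ntasks-per-node&amp;lt;/code&amp;gt; is given); &amp;lt;code&amp;gt;report_mpi_layout&amp;lt;/code&amp;gt; is a hypothetical name.&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: log the MPI layout that SLURM provides to mpirun/srun.
# The defaults of 1 apply only when the function runs outside a SLURM job.
report_mpi_layout() {
  local nodes=${SLURM_JOB_NUM_NODES:-1}
  local per_node=${SLURM_NTASKS_PER_NODE:-1}
  echo "nodes=$nodes tasks_per_node=$per_node total_ranks=$(( nodes * per_node ))"
}

# In the MPI job script, call report_mpi_layout just before "mpirun ./mpi_example"
```
Called from the MPI example above, it should report 2 nodes, 192 tasks per node, and 384 ranks in total.&lt;br /&gt;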
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
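&lt;br /&gt;
A small variation on the &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; line in the script above falls back to a single thread when &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt; is not set, for example when the same commands are tried on a login node, where tests should use only 1-2 cores. A sketch; &amp;lt;code&amp;gt;set_omp_threads&amp;lt;/code&amp;gt; is a hypothetical name.&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: use SLURM's CPU allocation for OpenMP, or a single thread outside a job.
set_omp_threads() {
  OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
  export OMP_NUM_THREADS
  echo "Running with $OMP_NUM_THREADS OpenMP thread(s)"
}

# In the OpenMP job script, call set_omp_threads in place of the export line
```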
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;br /&gt;
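&lt;br /&gt;
When many jobs are queued, the output of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; can be condensed into a count per job state. This is a minimal sketch. It assumes its input comes from &amp;lt;code&amp;gt;squeue -u $USER -h -o %T&amp;lt;/code&amp;gt;, which prints no header and one job state per line. The function name &amp;lt;code&amp;gt;summarize_states&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: count jobs per state, reading one state per line.
# Intended input on a Trillium login node:
#   squeue -u "$USER" -h -o %T | summarize_states
summarize_states() {
    # sort groups identical states, uniq -c counts them, awk prints "STATE COUNT"
    sort | uniq -c | awk '{ printf "%s %s\n", $2, $1 }'
}
```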
&lt;br /&gt;
= Submitting Jobs on the GPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
The Trillium GPU subcluster is designed for AI/ML and accelerated science workloads.  &lt;br /&gt;
It has specific rules and resource limits that differ from the CPU subcluster.&lt;br /&gt;
&lt;br /&gt;
The guidance in the [[#Submitting_Jobs_on_the_CPU_Subcluster|Submitting Jobs on the CPU Subcluster]] section also applies here, except where the following GPU-specific rules differ:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Requirement !! Details&lt;br /&gt;
|-&lt;br /&gt;
| '''Allowed GPU counts''' || Jobs must request exactly 1 GPU or a multiple of 4 GPUs.&lt;br /&gt;
|-&lt;br /&gt;
| '''Single-GPU jobs''' || Use &amp;lt;code&amp;gt;--gpus-per-node=1&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
| '''Whole-node GPU jobs''' || Use &amp;lt;code&amp;gt;--gpus-per-node=4&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;-p compute_full_node&amp;lt;/code&amp;gt;).&lt;br /&gt;
|-&lt;br /&gt;
| '''Multi-node GPU jobs''' || Must request full nodes: &amp;lt;code&amp;gt;--gpus-per-node=4&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
| '''Memory limits''' || The &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; option is not allowed.&amp;lt;br&amp;gt;Per GPU: 192 GB host memory.&amp;lt;br&amp;gt;Whole-node jobs: 768 GB total.&lt;br /&gt;
|}&lt;br /&gt;
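&lt;br /&gt;
A script can also check the GPU-count rules in the table above before you submit. The following is a minimal, hypothetical pre-submission check: &amp;lt;code&amp;gt;check_gpu_request&amp;lt;/code&amp;gt; is not a SciNet tool. It encodes only the rules listed in the table and assumes 4 GPUs per node. It does not replace the scheduler's own validation.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical pre-submission check for the Trillium GPU rules above.
# Arguments: GPUs per node, number of nodes, partition name.
# Assumes 4 GPUs per node; Slurm itself remains the final authority.
check_gpu_request() {
    local gpus_per_node=$1 nodes=$2 partition=$3
    if [[ $gpus_per_node -eq 1 ]]; then
        # One GPU in total is allowed; one GPU on each of several nodes is not.
        if [[ $nodes -ne 1 ]]; then
            echo "invalid: single-GPU jobs must use one node"
            return 1
        fi
    elif [[ $gpus_per_node -eq 4 ]]; then
        # Whole-node and multi-node GPU jobs need the full-node partition.
        if [[ $partition != compute_full_node ]]; then
            echo "invalid: use --partition=compute_full_node with 4 GPUs per node"
            return 1
        fi
    else
        echo "invalid: request 1 GPU or 4 GPUs per node"
        return 1
    fi
    echo "ok"
}
```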
&lt;br /&gt;
== Accessing the GPU Subcluster ==&lt;br /&gt;
&lt;br /&gt;
* From outside: &amp;lt;code&amp;gt;ssh trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt;  &lt;br /&gt;
* From another Trillium node: &amp;lt;code&amp;gt;ssh trig-login01&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Single-GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=single_gpu_job         # Job name&lt;br /&gt;
#SBATCH --output=single_gpu_job_%j.out    # Output file (%j = job ID)&lt;br /&gt;
#SBATCH --nodes=1                         # Request 1 node&lt;br /&gt;
#SBATCH --gpus-per-node=1                 # Request 1 GPU&lt;br /&gt;
#SBATCH --time=00:30:00                   # Max runtime (30 minutes)&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Check GPU allocation&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Run your workload&lt;br /&gt;
srun python my_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
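&lt;br /&gt;
In addition to running &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt;, a job script can check how many GPUs are visible before starting the workload. This is a minimal sketch. It assumes the allocated GPUs are exposed through &amp;lt;code&amp;gt;CUDA_VISIBLE_DEVICES&amp;lt;/code&amp;gt; as a comma-separated list, which is common on Slurm GPU clusters but worth verifying on Trillium. The function name &amp;lt;code&amp;gt;count_visible_gpus&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: count the GPUs listed in CUDA_VISIBLE_DEVICES.
# Assumes a comma-separated list such as "0" or "0,1,2,3".
count_visible_gpus() {
    local devs=${CUDA_VISIBLE_DEVICES:-}
    if [[ -z $devs ]]; then
        echo 0
        return
    fi
    local IFS=,
    local -a ids
    ids=($devs)   # word splitting on commas only (IFS is local to the function)
    echo "${#ids[@]}"
}

# Example use in a job script: stop early if no GPU is visible.
# [[ $(count_visible_gpus) -ge 1 ]] || exit 1
```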
&lt;br /&gt;
== Example: Whole-Node (4 GPUs) Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=whole_node_gpu_job&lt;br /&gt;
#SBATCH --output=whole_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --partition=compute_full_node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --gpus-per-node=4&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
srun python my_distributed_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Multi-Node GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=multi_node_gpu_job&lt;br /&gt;
#SBATCH --output=multi_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --nodes=2                        # Request 2 full nodes&lt;br /&gt;
#SBATCH --gpus-per-node=4                # 4 GPUs per node (full node)&lt;br /&gt;
#SBATCH --partition=compute_full_node    # Required for full-node jobs&lt;br /&gt;
#SBATCH --time=04:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Check that all allocated GPUs are visible&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Example: run a distributed training job with 8 GPUs (2 nodes × 4 GPUs)&lt;br /&gt;
srun python train_distributed.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
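&lt;br /&gt;
A distributed training script usually needs the total number of GPUs in the job (2 × 4 = 8 in the example above). This is a minimal sketch. It assumes Slurm sets &amp;lt;code&amp;gt;SLURM_JOB_NUM_NODES&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;SLURM_GPUS_PER_NODE&amp;lt;/code&amp;gt; for this job. The latter can carry a type prefix such as &amp;lt;code&amp;gt;h100:4&amp;lt;/code&amp;gt;, which the function strips. The function name &amp;lt;code&amp;gt;total_gpus&amp;lt;/code&amp;gt; is hypothetical. Exporting the result as &amp;lt;code&amp;gt;WORLD_SIZE&amp;lt;/code&amp;gt; only makes sense with one process per GPU and a framework that reads that variable, such as PyTorch's &amp;lt;code&amp;gt;env://&amp;lt;/code&amp;gt; initialization.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: total GPUs in the job = nodes x GPUs per node.
# SLURM_GPUS_PER_NODE may look like "4" or "h100:4"; keep only the count.
total_gpus() {
    local nodes=${SLURM_JOB_NUM_NODES:-1}
    local per_node=${SLURM_GPUS_PER_NODE:-0}
    per_node=${per_node##*:}   # drop an optional "type:" prefix
    echo $(( nodes * per_node ))
}

# With one process per GPU, a framework could use this as its world size:
# export WORLD_SIZE=$(total_gpus)
```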
&lt;br /&gt;
== Best Practices for GPU Jobs ==&lt;br /&gt;
&lt;br /&gt;
* Do not use &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt;; host memory is fixed at 192 GB per GPU, or 768 GB per whole node.&lt;br /&gt;
* Always specify the GPU count, and add &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt; for whole-node or multi-node jobs.&lt;br /&gt;
* Load only the modules you need — see [[Using_modules]].&lt;br /&gt;
* Be explicit with software versions for reproducibility (e.g., &amp;lt;code&amp;gt;cuda/12.6&amp;lt;/code&amp;gt; rather than just &amp;lt;code&amp;gt;cuda&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Test on a single GPU before scaling to multiple GPUs or nodes.&lt;br /&gt;
* Monitor usage with &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt; to ensure GPUs are fully utilized.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6866</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6866"/>
		<updated>2025-08-13T18:20:26Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: Removed duplicates&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&amp;quot;border: 2px solid #e6b800; background-color: #fff8e1; padding: 0.5em; font-weight: bold; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
This Quickstart Guide is a &amp;lt;u&amp;gt;work in progress&amp;lt;/u&amp;gt;. Details may change as Trillium documentation is updated.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca, trillium-gpu.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. [[#Submitting_Jobs_on_the_CPU_Subcluster|CPU Subcluster]]&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. [[#Submitting_Jobs_on_the_GPU_Subcluster|GPU Subcluster]]&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and is not well suited to training extremely large models (hundreds of billions of parameters).&lt;br /&gt;
* To access it, SSH into &amp;lt;code&amp;gt;trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt; from outside, or into &amp;lt;code&amp;gt;trig-login01&amp;lt;/code&amp;gt; from other Trillium nodes.&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster is a large cluster consisting of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! Nodes !! Cores !! Available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a walltime of at least 15 minutes and at most 24 hours.&lt;br /&gt;
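&lt;br /&gt;
You can check a requested walltime against these limits before submitting. This is a minimal sketch. It assumes the &amp;lt;code&amp;gt;HH:MM:SS&amp;lt;/code&amp;gt; form used in the job scripts on this page. It does not handle other Slurm time formats, such as &amp;lt;code&amp;gt;D-HH:MM:SS&amp;lt;/code&amp;gt;. The function names &amp;lt;code&amp;gt;walltime_seconds&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;check_walltime&amp;lt;/code&amp;gt; are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helpers: validate an HH:MM:SS walltime against the
# 15-minute minimum and 24-hour maximum described above.
walltime_seconds() {
    local t=$1 h m s
    h=${t%%:*}; t=${t#*:}
    m=${t%%:*}; s=${t#*:}
    # 10# forces base 10, so values like 08 are not parsed as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

check_walltime() {
    local secs
    secs=$(walltime_seconds "$1")
    if [[ $secs -lt 900 || $secs -gt 86400 ]]; then
        echo "invalid: $1 is outside 00:15:00 to 24:00:00"
        return 1
    fi
    echo "ok"
}
```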
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or if your supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check whether you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to request access as described above.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can only be accessed via [[SSH]] (secure shell), and authentication is only allowed with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use them securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, for example, with [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_Jobs_on_the_CPU_Subcluster | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Your storage locations ==&lt;br /&gt;
&lt;br /&gt;
On Trillium, every user has several types of storage space available. These locations each serve different purposes, and the exact paths depend on your username and group. For convenience and portability, each location is also available through a corresponding environment variable.&lt;br /&gt;
&lt;br /&gt;
=== Home and Scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home directory and a scratch directory. Their locations are stored in the environment variables &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
On Trillium, the paths follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/username&lt;br /&gt;
 $SCRATCH=/scratch/username&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
  tri-login01:~$ pwd&lt;br /&gt;
  /home/yourusername&lt;br /&gt;
  tri-login01:~$ cd $SCRATCH&lt;br /&gt;
  tri-login01:scratch$ pwd&lt;br /&gt;
  /scratch/yourusername&lt;br /&gt;
&lt;br /&gt;
'''NOTE: The home directory is read-only on compute nodes.'''&lt;br /&gt;
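&lt;br /&gt;
Because &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; is read-only on compute nodes, a job script should create its output directory under &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. It should refer to that location through the environment variable rather than a hard-coded path. This is a minimal sketch; the function name &amp;lt;code&amp;gt;make_output_dir&amp;lt;/code&amp;gt; and the per-job directory naming are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: create (if needed) and print an output directory
# under $SCRATCH, never under the read-only $HOME.
make_output_dir() {
    local name=$1
    if [[ -z ${SCRATCH:-} ]]; then
        echo "SCRATCH is not set" >/dev/stderr
        return 1
    fi
    local dir="$SCRATCH/$name"
    mkdir -p "$dir" || return 1
    echo "$dir"
}

# Example in a job script (SLURM_JOB_ID is set by Slurm):
# outdir=$(make_output_dir "run_${SLURM_JOB_ID}") || exit 1
```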
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=== Project and Archive (Nearline) ===&lt;br /&gt;
&lt;br /&gt;
All users on Trillium have access to a project directory, with paths stored in the environment variable &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt;.  &lt;br /&gt;
Some groups may also have an archive (a.k.a. &amp;quot;nearline&amp;quot;) directory, stored in the environment variable &amp;lt;code&amp;gt;$ARCHIVE&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
These follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/groupname/username&lt;br /&gt;
 $ARCHIVE=/archive/groupname/username&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Archive storage is currently available only via [[HPSS]] and cannot be accessed from Trillium login, compute, or data mover nodes.&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''  &lt;br /&gt;
Always use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) in scripts instead of hardcoding the paths. The actual directory paths may change in the future.&lt;br /&gt;
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
Please review the [[Data_Management#Purpose_of_each_file_system | various file systems]], their intended uses, and their policies. The table below summarizes the key details. See the [[Data_Management | Data Management]] page for more information.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB / 250,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB / 6,000,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50–500 TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 1 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group (nearline) allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
Trillium uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job into the queue. It will begin execution on available compute nodes when scheduled. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
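&lt;br /&gt;
When submitting from a script, &amp;lt;code&amp;gt;sbatch --parsable&amp;lt;/code&amp;gt; prints only the job ID, followed by a semicolon and the cluster name if one is reported. You can then pass that ID to &amp;lt;code&amp;gt;squeue -j&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;scancel&amp;lt;/code&amp;gt;. This is a minimal sketch; the function name &amp;lt;code&amp;gt;job_id_from_parsable&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from "sbatch --parsable" output,
# which is either "JOBID" or "JOBID;CLUSTER".
job_id_from_parsable() {
    local out=$1
    echo "${out%%;*}"   # keep everything before the first semicolon
}

# Example on a login node:
# jobid=$(job_id_from_parsable "$(sbatch --parsable jobscript.sh)")
# squeue -j "$jobid"
```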
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Users whose group has an RRG allocation typically have both a &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account and an &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; account. The &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account is used for the RAS allocation, previously called the &amp;quot;default&amp;quot; allocation. Unless you explicitly specify the account using the &amp;lt;code&amp;gt;--account=ACCOUNT_NAME&amp;lt;/code&amp;gt; option in your job script or submission command, your job will most likely be charged to the &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account.&lt;br /&gt;
&lt;br /&gt;
If you want your job to use the RRG allocation, be sure to specify it explicitly.&lt;br /&gt;
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler deduces which resources to allocate from the requested numbers of tasks and CPUs. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768 GB&amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768 GB is the memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks in one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
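&lt;br /&gt;
One way to bundle serial tasks, shown here as a minimal sketch rather than the method described on the linked page, is &amp;lt;code&amp;gt;xargs -P&amp;lt;/code&amp;gt;, which keeps up to a fixed number of commands running at once and exits only after all of them finish. The function &amp;lt;code&amp;gt;run_bundle&amp;lt;/code&amp;gt; and the program &amp;lt;code&amp;gt;./serial_task&amp;lt;/code&amp;gt; are hypothetical names.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Sketch: run NTASKS independent serial tasks on one node, at most
# NCORES at a time (192 = cores per Trillium CPU node). Each task gets
# its index 1..NTASKS as its last argument.
NCORES=${NCORES:-192}

run_bundle() {
    local ntasks=$1
    shift
    # xargs -P starts up to NCORES commands concurrently and only exits
    # after every one of them has finished.
    seq 1 "$ntasks" | xargs -P "$NCORES" -I{} "$@" {}
}

# In a one-node job script (./serial_task is a hypothetical program):
#   run_bundle 192 ./serial_task
```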
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits on the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. These limits depend on whether you are part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation], and on the &amp;quot;partition&amp;quot; in which the job runs. &amp;quot;Partitions&amp;quot; are SLURM's term for job queues serving particular use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;; if you do not specify one, your job runs in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
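&lt;br /&gt;
To catch requests that exceed these limits before they reach the queue, the numbers from the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; row can be encoded in a small check. This is a hypothetical sketch: the limits are copied from the table above and may change, the function does not query SLURM, and the scheduler remains the final authority.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Sketch: validate a compute-partition request against the table above.
# Limits are copied from the table and may change.
MIN_MINUTES=15
MAX_MINUTES=$(( 24 * 60 ))
DEFAULT_MAX_NODES=20
ALLOC_MAX_NODES=1000
CORES_PER_NODE=192

# Usage: check_compute_request NODES WALLTIME_MINUTES [yes|no]
# The optional third argument says whether your group has an allocation.
check_compute_request() {
    local nodes=$1 minutes=$2 has_alloc=${3:-no}
    local max_nodes=$DEFAULT_MAX_NODES
    if [ "$has_alloc" = yes ]; then
        max_nodes=$ALLOC_MAX_NODES
    fi
    if [ "$nodes" -lt 1 ] || [ "$nodes" -gt "$max_nodes" ]; then
        echo "invalid: use 1 to $max_nodes nodes"
        return 1
    fi
    if [ "$minutes" -lt "$MIN_MINUTES" ] || [ "$minutes" -gt "$MAX_MINUTES" ]; then
        echo "invalid: walltime must be $MIN_MINUTES to $MAX_MINUTES minutes"
        return 1
    fi
    echo "ok: $nodes nodes ($(( nodes * CORES_PER_NODE )) cores) for $minutes minutes"
}

# Example: check_compute_request 2 60
#   prints: ok: 2 nodes (384 cores) for 60 minutes
```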
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
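&lt;br /&gt;
To confirm that SLURM assigned the layout you asked for, a job script can print it before launching. This sketch reads the standard SLURM variables &amp;lt;code&amp;gt;SLURM_JOB_NUM_NODES&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;SLURM_NTASKS_PER_NODE&amp;lt;/code&amp;gt; (the latter is set only when &amp;lt;code&amp;gt;--ntasks-per-node&amp;lt;/code&amp;gt; is given); the function name &amp;lt;code&amp;gt;mpi_layout&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Sketch: report the MPI layout of the current job. Call it before
# mpirun/srun; the ${VAR:?} expansions stop the script with a message
# if the variables are missing (e.g. when run outside a job).
mpi_layout() {
    local nodes=${SLURM_JOB_NUM_NODES:?not running inside a SLURM job}
    local per_node=${SLURM_NTASKS_PER_NODE:?--ntasks-per-node was not set}
    echo "$(( nodes * per_node )) MPI ranks: $nodes nodes x $per_node tasks per node"
}

# In the MPI job script above, before running mpi_example:
#   mpi_layout    # prints: 384 MPI ranks: 2 nodes x 192 tasks per node
```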
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
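&lt;br /&gt;
The &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; line in the script above assumes it runs inside a job. A slightly more defensive variant, sketched below with a hypothetical &amp;lt;code&amp;gt;set_omp_threads&amp;lt;/code&amp;gt; function, falls back to one thread when &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt; is unset, for example during a quick test on a login node.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Sketch: take the OpenMP thread count from SLURM's CPU allocation,
# defaulting to 1 thread when not running inside a job.
set_omp_threads() {
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
    echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
}

# In the OpenMP job script, instead of the plain export:
#   set_omp_threads    # prints OMP_NUM_THREADS=192 for the script above
```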
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;br /&gt;
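&lt;br /&gt;
When a script on a login node needs to wait for a job before post-processing its output, it can poll &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt;. This is a minimal sketch: the function name and the &amp;lt;code&amp;gt;POLL_SECONDS&amp;lt;/code&amp;gt; variable are hypothetical, and it assumes &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; prints nothing for the job once it is no longer pending or running. Keep the polling interval long so the scheduler is not flooded with queries.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Sketch: block until a job has left the pending/running states.
# POLL_SECONDS is a hypothetical knob; keep it at minutes, not seconds.
POLL_SECONDS=${POLL_SECONDS:-300}

wait_for_job() {
    local jobid=$1 state
    while true; do
        # -h: no header; --states: report the job only while it is
        # pending or running; -o %T: print just the state name.
        # Note: an squeue error also yields empty output and ends the wait.
        state=$(squeue -h -j "$jobid" --states=PENDING,RUNNING -o %T 2>/dev/null)
        [ -n "$state" ] || break
        sleep "$POLL_SECONDS"
    done
    echo "job $jobid is no longer pending or running"
}

# Usage (JOBID as printed by sbatch):
#   wait_for_job JOBID
```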
&lt;br /&gt;
= Submitting Jobs on the GPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
The Trillium GPU subcluster is designed for AI/ML and accelerated science workloads.&lt;br /&gt;
It has specific rules and resource limits that differ from those of the CPU subcluster.&lt;br /&gt;
&lt;br /&gt;
Everything in the [[#Submitting_Jobs_on_the_CPU_Subcluster|Submitting Jobs on the CPU Subcluster]] section applies here, with the following GPU-specific rules:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Requirement !! Details&lt;br /&gt;
|-&lt;br /&gt;
| '''Allowed GPU counts''' || Jobs must request exactly 1 GPU or a multiple of 4 GPUs.&lt;br /&gt;
|-&lt;br /&gt;
| '''Single-GPU jobs''' || Use &amp;lt;code&amp;gt;--gpus-per-node=1&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
| '''Whole-node GPU jobs''' || Use &amp;lt;code&amp;gt;--gpus-per-node=4&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;-p compute_full_node&amp;lt;/code&amp;gt;).&lt;br /&gt;
|-&lt;br /&gt;
| '''Multi-node GPU jobs''' || Must request full nodes: &amp;lt;code&amp;gt;--gpus-per-node=4&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
| '''Memory limits''' || The &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; option is not allowed.  &lt;br /&gt;
Per GPU: 192 GB host memory.  &lt;br /&gt;
Whole-node jobs: 768 GB total.&lt;br /&gt;
|}&lt;br /&gt;
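&lt;br /&gt;
The allowed-count rule can be checked before submitting. The hypothetical &amp;lt;code&amp;gt;check_gpu_request&amp;lt;/code&amp;gt; function below encodes only the rules in this table (it does not query SLURM) and takes the total number of GPUs plus the partition name.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Sketch: check a GPU request against the subcluster rules: exactly
# 1 GPU, or a multiple of 4 (whole nodes) in compute_full_node.
# Usage: check_gpu_request TOTAL_GPUS [PARTITION]
check_gpu_request() {
    local gpus=$1 partition=${2:-}
    if [ "$gpus" -eq 1 ]; then
        echo "ok: single-GPU job (--gpus-per-node=1)"
        return 0
    fi
    if [ "$gpus" -lt 4 ] || [ $(( gpus % 4 )) -ne 0 ]; then
        echo "invalid: request 1 GPU or a multiple of 4"
        return 1
    fi
    if [ "$partition" != compute_full_node ]; then
        echo "invalid: $gpus GPUs need --partition=compute_full_node"
        return 1
    fi
    echo "ok: $(( gpus / 4 )) full node(s) with --gpus-per-node=4"
}
```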
&lt;br /&gt;
== Accessing the GPU Subcluster ==&lt;br /&gt;
&lt;br /&gt;
* From outside: &amp;lt;code&amp;gt;ssh trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt;  &lt;br /&gt;
* From another Trillium node: &amp;lt;code&amp;gt;ssh trig-login01&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Single-GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=single_gpu_job         # Job name&lt;br /&gt;
#SBATCH --output=single_gpu_job_%j.out    # Output file (%j = job ID)&lt;br /&gt;
#SBATCH --nodes=1                         # Request 1 node&lt;br /&gt;
#SBATCH --gpus-per-node=1                 # Request 1 GPU&lt;br /&gt;
#SBATCH --time=00:30:00                   # Max runtime (30 minutes)&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Check GPU allocation&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Run your workload&lt;br /&gt;
srun python my_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
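&lt;br /&gt;
Besides &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt;, a job can check how many GPUs it sees from its environment. This sketch assumes SLURM has set &amp;lt;code&amp;gt;CUDA_VISIBLE_DEVICES&amp;lt;/code&amp;gt; to a comma-separated list of device indices, as it normally does for GPU allocations; the function name is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Sketch: count the GPUs visible to this job, assuming SLURM has set
# CUDA_VISIBLE_DEVICES to a comma-separated list of device indices.
count_visible_gpus() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo 0
        return
    fi
    # One device index per line, then count the lines.
    echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l
}

# Example use in the job script, before starting the workload:
#   echo "Visible GPUs: $(count_visible_gpus)"
```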
&lt;br /&gt;
== Example: Whole-Node (4 GPUs) Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=whole_node_gpu_job&lt;br /&gt;
#SBATCH --output=whole_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --partition=compute_full_node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --gpus-per-node=4&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
srun python my_distributed_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Multi-Node GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=multi_node_gpu_job&lt;br /&gt;
#SBATCH --output=multi_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --nodes=2                        # Request 2 full nodes&lt;br /&gt;
#SBATCH --gpus-per-node=4                # 4 GPUs per node (full node)&lt;br /&gt;
#SBATCH --partition=compute_full_node    # Required for full-node jobs&lt;br /&gt;
#SBATCH --time=04:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Check that all GPUs were allocated&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Example: run a distributed training job with 8 GPUs (2 nodes × 4 GPUs)&lt;br /&gt;
srun python train_distributed.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Best Practices for GPU Jobs ==&lt;br /&gt;
&lt;br /&gt;
* Do not use &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; — memory is fixed per GPU (192 GB) or per node (768 GB).&lt;br /&gt;
* Always specify GPU counts and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt; for whole-node or multi-node jobs.&lt;br /&gt;
* Load only the modules you need — see [[Using_modules]].&lt;br /&gt;
* Be explicit with software versions for reproducibility (e.g., &amp;lt;code&amp;gt;cuda/12.6&amp;lt;/code&amp;gt; rather than just &amp;lt;code&amp;gt;cuda&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Test on a single GPU before scaling to multiple GPUs or nodes.&lt;br /&gt;
* Monitor usage with &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt; to ensure GPUs are fully utilized.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6863</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6863"/>
		<updated>2025-08-13T10:47:40Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: Added links to sections&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&amp;quot;border: 2px solid #e6b800; background-color: #fff8e1; padding: 0.5em; font-weight: bold; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
This Quickstart Guide is a &amp;lt;u&amp;gt;work in progress&amp;lt;/u&amp;gt;. Details may change as Trillium documentation is updated.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca, trillium-gpu.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. [[#Submitting_Jobs_on_the_CPU_Subcluster|CPU Subcluster]]&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. [[#Submitting_Jobs_on_the_GPU_Subcluster|GPU Subcluster]]&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and not ideal for training extremely large models (multi-100B parameters)&lt;br /&gt;
* To access it, SSH into &amp;lt;code&amp;gt;trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt; from outside, or into &amp;lt;code&amp;gt;trig-login01&amp;lt;/code&amp;gt; from other Trillium nodes.&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
Trillium consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because Trillium is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with walltimes from 15 minutes up to 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet or your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium is accessed via [[SSH]] (secure shell) only, and authentication is allowed only with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, see [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Your storage locations ==&lt;br /&gt;
&lt;br /&gt;
On Trillium, every user has several types of storage space available. These locations each serve different purposes, and the exact paths depend on your username and group. For convenience and portability, each location is also available through a corresponding environment variable.&lt;br /&gt;
&lt;br /&gt;
=== Home and Scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home directory and a scratch directory. Their locations are stored in the environment variables &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
On Trillium, the paths follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/username&lt;br /&gt;
 $SCRATCH=/scratch/username&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
  tri-login01:~$ pwd&lt;br /&gt;
  /home/yourusername&lt;br /&gt;
  tri-login01:~$ cd $SCRATCH&lt;br /&gt;
  tri-login01:scratch$ pwd&lt;br /&gt;
  /scratch/yourusername&lt;br /&gt;
&lt;br /&gt;
'''NOTE: The home directory is read-only on compute nodes.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=== Project and Archive (Nearline) ===&lt;br /&gt;
&lt;br /&gt;
All users on Trillium have access to a project directory, with paths stored in the environment variable &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt;.  &lt;br /&gt;
Some groups may also have an archive (a.k.a. &amp;quot;nearline&amp;quot;) directory, stored in the environment variable &amp;lt;code&amp;gt;$ARCHIVE&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
These follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/groupname/username&lt;br /&gt;
 $ARCHIVE=/archive/groupname/username&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Archive storage is currently available only via [[HPSS]] and cannot be accessed from Trillium login, compute, or data mover nodes.&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''  &lt;br /&gt;
Always use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) in scripts instead of hardcoding the paths. The actual directory paths may change in the future.&lt;br /&gt;
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
Please review the [[Data_Management#Purpose_of_each_file_system | various file systems]], their intended uses, and their policies. The table below summarizes the key details. See the [[Data_Management | Data Management]] page for more information.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB / 250,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB / 6,000,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50–500 TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 1 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group (nearline) allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
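&lt;br /&gt;
Following these tips, a job script can start from a clean environment and load a fully versioned module set in one place. The sketch below is illustrative only: the function name &amp;lt;code&amp;gt;load_toolchain&amp;lt;/code&amp;gt; is hypothetical, and the versions (&amp;lt;code&amp;gt;StdEnv/2023&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;openmpi/4.1.5&amp;lt;/code&amp;gt;) are examples, so confirm them with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; first.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Sketch: reset the environment, then load explicit module versions
# (example versions; check them with module spider first).
load_toolchain() {
    module purge
    module load StdEnv/2023 gcc/12.3 openmpi/4.1.5
}

# Call it near the top of a job script, before running your program:
#   load_toolchain
#   module list    # record the loaded modules in the job output
```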
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to [mailto:support@scinet.utoronto.ca contact us]; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct-liquid-cooled using warm water (35–40 °C inlet temperature), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
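&lt;br /&gt;
You can enforce the login-node rules of thumb for lightweight tests with standard tools. This is a sketch built on assumptions, not an official SciNet wrapper. The hypothetical &amp;lt;code&amp;gt;run_login_test&amp;lt;/code&amp;gt; function uses the shell's &amp;lt;code&amp;gt;ulimit -v&amp;lt;/code&amp;gt; to cap virtual memory at about 2 GiB and GNU &amp;lt;code&amp;gt;timeout&amp;lt;/code&amp;gt; to cap runtime. It also sets &amp;lt;code&amp;gt;OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;, which only limits programs that honour OpenMP settings. Some applications reserve large virtual address ranges and may fail under a virtual-memory cap, so treat this as a guard for simple tests only.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: run a quick test within login-node rules of thumb.
# Usage: run_login_test DURATION COMMAND [ARGS...]
# DURATION uses timeout's syntax, e.g. 90s or 5m.
run_login_test() {
    local limit="$1"
    shift
    (
        # Soft virtual-memory cap of about 2 GiB (ulimit -v counts KiB).
        ulimit -S -v 2097152
        # Keep OpenMP programs to one thread; other programs are unaffected.
        OMP_NUM_THREADS=1 timeout "$limit" "$@"
    )
}
```
&lt;br /&gt;
For example, &amp;lt;code&amp;gt;run_login_test 5m ./my_test&amp;lt;/code&amp;gt;, where &amp;lt;code&amp;gt;./my_test&amp;lt;/code&amp;gt; is a placeholder for your own program. An exit status of 124 means the time limit was reached.&lt;br /&gt;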
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs run on Trillium's compute nodes, and their execution is managed by the SLURM job scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details on how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job into the queue. It will begin execution on available compute nodes when scheduled. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Users in groups with a Resources for Research Groups (RRG) allocation typically have both a &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; and an &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; account. Jobs can run under your group's RRG allocation or, if there is none, under a RAS allocation (previously called the &amp;quot;default&amp;quot; allocation). However, unless you explicitly specify the account using the &amp;lt;code&amp;gt;--account=ACCOUNT_NAME&amp;lt;/code&amp;gt; option in your job script or submission command, your job will most likely be charged to the &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account.&lt;br /&gt;
&lt;br /&gt;
If you want your job to use the RRG allocation, be sure to specify it explicitly.&lt;br /&gt;
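&lt;br /&gt;
For example, to charge a job to an RRG allocation, add an &amp;lt;code&amp;gt;--account&amp;lt;/code&amp;gt; directive to the job script. This is a minimal sketch. &amp;lt;code&amp;gt;rrg-someprof&amp;lt;/code&amp;gt; is a hypothetical account name, so substitute the account your group actually holds. You can also pass the same option on the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command line instead.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# "rrg-someprof" is a hypothetical account name; use your group's own.
# Equivalent command-line form: sbatch --account=rrg-someprof jobscript.sh
#SBATCH --account=rrg-someprof
#SBATCH --nodes=1
#SBATCH --time=01:00:00

# SLURM_JOB_ACCOUNT is set by Slurm inside a job; it is unset elsewhere.
echo "Charged to account: ${SLURM_JOB_ACCOUNT:-not in a Slurm job}"
```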
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
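&lt;br /&gt;
Because &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes, a job that writes output there would otherwise fail only when it first tries to write. A minimal guard can stop the job up front. This sketch uses a hypothetical helper named &amp;lt;code&amp;gt;require_writable&amp;lt;/code&amp;gt;. The fallback to &amp;lt;code&amp;gt;/tmp&amp;lt;/code&amp;gt; is only there so the snippet also runs outside Trillium.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical guard for job scripts: stop early if the output directory
# is not writable (as $HOME and $PROJECT are on compute nodes).
require_writable() {
    if [ ! -w "$1" ]; then
        echo "not writable: $1"
        return 1
    fi
}

# On Trillium $SCRATCH is defined; /tmp is only a stand-in elsewhere.
require_writable "${SCRATCH:-/tmp}" || exit 1
```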
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler works out which resources to allocate from the number of tasks and CPUs you request. On Trillium, things work differently:&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768 GB&amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768 GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks in one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
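&lt;br /&gt;
One way to bundle independent serial tasks so that a single job keeps all 192 cores busy is &amp;lt;code&amp;gt;xargs -P&amp;lt;/code&amp;gt;, which runs up to a given number of commands at once. The linked page has its own examples, and this is only one alternative sketch. &amp;lt;code&amp;gt;run_one&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;bundle_tasks&amp;lt;/code&amp;gt; are hypothetical names, and the body of &amp;lt;code&amp;gt;run_one&amp;lt;/code&amp;gt; is a placeholder for your own program.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --job-name=serial_bundle

# Placeholder task: replace the echo with your serial program, for example
# ./serial_prog "input_$1.dat" (both names are hypothetical).
run_one() {
    echo "task $1 finished"
}
export -f run_one   # make the function visible to the bash processes xargs starts

# Run tasks 1..N with up to N running at the same time.
# -P sets the number of parallel processes; -n 1 passes one task number each.
bundle_tasks() {
    local n="$1"
    seq 1 "$n" | xargs -P "$n" -n 1 bash -c 'run_one "$1"' _
}

bundle_tasks 192   # one task per core on a 192-core CPU node
```
&lt;br /&gt;
If you have more tasks than cores, you can keep the parallelism at 192 and feed a longer list. &amp;lt;code&amp;gt;xargs&amp;lt;/code&amp;gt; starts a new command whenever a running one finishes.&lt;br /&gt;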
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
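&lt;br /&gt;
You can check a request against the compute-partition limits in the table above before submitting. The sketch below is a hypothetical helper, not a SciNet command. It hard-codes the values from the table: 15 minutes to 24 hours, and 1 to 20 nodes by default or up to 1000 with an allocation. It therefore goes stale if the limits change, and the scheduler remains the authority.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical pre-submission check against the compute-partition limits.
# Usage: check_compute_request NODES MINUTES [MAX_NODES]
# MAX_NODES defaults to 20 (no allocation); pass 1000 if your group has one.
check_compute_request() {
    local nodes="$1" minutes="$2" max_nodes="${3:-20}"
    if [ "$nodes" -lt 1 ] || [ "$nodes" -gt "$max_nodes" ]; then
        echo "nodes out of range: $nodes (allowed 1-$max_nodes)"
        return 1
    fi
    # 24 hours = 1440 minutes.
    if [ "$minutes" -lt 15 ] || [ "$minutes" -gt 1440 ]; then
        echo "walltime out of range: $minutes min (allowed 15-1440)"
        return 1
    fi
    echo "ok: $nodes node(s), $minutes min"
}

# The MPI example on this page: 2 nodes for 1 hour.
check_compute_request 2 60   # prints: ok: 2 node(s), 60 min
```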
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processes per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
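&lt;br /&gt;
You can check that Slurm set up the layout the script asked for (here 2 × 192 = 384 MPI ranks) by printing the relevant job environment variables before launching. This is a sketch with a hypothetical function name. &amp;lt;code&amp;gt;SLURM_JOB_NUM_NODES&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;SLURM_NTASKS_PER_NODE&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;SLURM_NTASKS&amp;lt;/code&amp;gt; are standard Slurm job variables. The last two may be unset depending on how the job was requested, in which case the function skips the comparison.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical sanity check: print the MPI layout Slurm has set up and
# flag a mismatch between nodes x tasks-per-node and the total task count.
report_mpi_layout() {
    local nodes="${SLURM_JOB_NUM_NODES:-}"
    local per_node="${SLURM_NTASKS_PER_NODE:-}"
    local total="${SLURM_NTASKS:-}"
    if [ -z "$nodes" ]; then
        echo "not running inside a Slurm job"
        return 1
    fi
    echo "nodes=$nodes tasks_per_node=${per_node:-unset} total_tasks=${total:-unset}"
    # Only compare when Slurm provided both values.
    if [ -n "$per_node" ] && [ -n "$total" ] && [ $(( nodes * per_node )) -ne "$total" ]; then
        echo "warning: nodes x tasks_per_node does not match total tasks"
        return 1
    fi
}
```
&lt;br /&gt;
In the MPI script above, you could call &amp;lt;code&amp;gt;report_mpi_layout || exit 1&amp;lt;/code&amp;gt; just before &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt;.&lt;br /&gt;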
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
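&lt;br /&gt;
Two optional refinements to the OpenMP script are shown here as a sketch rather than a recommended default. The first falls back to one thread when &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt; is unset, for example when trying the binary on a login node. The second pins threads to cores with the standard OpenMP variables &amp;lt;code&amp;gt;OMP_PLACES&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;OMP_PROC_BIND&amp;lt;/code&amp;gt;. Whether pinning helps depends on the application, so compare timings before adopting it.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Use the CPUs Slurm granted to the task, or 1 thread outside a Slurm job.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Optional thread pinning (OpenMP 4.0+ environment variables):
# one place per physical core, threads kept on neighbouring places.
export OMP_PLACES=cores
export OMP_PROC_BIND=close

echo "Running with OMP_NUM_THREADS=$OMP_NUM_THREADS"
```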
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;br /&gt;
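&lt;br /&gt;
For day-to-day monitoring, a small shell function can save typing. This sketch defines a hypothetical &amp;lt;code&amp;gt;myjobs&amp;lt;/code&amp;gt; helper that lists only your jobs with a fixed set of columns. The &amp;lt;code&amp;gt;%&amp;lt;/code&amp;gt; field codes are standard &amp;lt;code&amp;gt;squeue --format&amp;lt;/code&amp;gt; specifiers, and any extra arguments are passed through to &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: show only your jobs, one line each.
# Columns: job ID (%i), partition (%P), name (%j), state (%T),
# elapsed time (%M), node count (%D), and reason or node list (%R).
myjobs() {
    squeue -u "$USER" --format="%.10i %.12P %.20j %.8T %.10M %.6D %R" "$@"
}
```
&lt;br /&gt;
For example, &amp;lt;code&amp;gt;myjobs -t PENDING&amp;lt;/code&amp;gt; shows only your pending jobs.&lt;br /&gt;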
&lt;br /&gt;
= Submitting Jobs on the GPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
The Trillium GPU subcluster is designed for AI/ML and accelerated science workloads.  &lt;br /&gt;
It has specific rules and resource limits that differ from the CPU subcluster.&lt;br /&gt;
&lt;br /&gt;
Everything in the [[#Submitting_Jobs_on_the_CPU_Subcluster|Submitting Jobs on the CPU Subcluster]] section applies here, with the following GPU-specific rules:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Requirement !! Details&lt;br /&gt;
|-&lt;br /&gt;
| '''Allowed GPU counts''' || Jobs must request exactly 1 GPU or a multiple of 4 GPUs.&lt;br /&gt;
|-&lt;br /&gt;
| '''Single-GPU jobs''' || Use &amp;lt;code&amp;gt;--gpus-per-node=1&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
| '''Whole-node GPU jobs''' || Use &amp;lt;code&amp;gt;--gpus-per-node=4&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;-p compute_full_node&amp;lt;/code&amp;gt;).&lt;br /&gt;
|-&lt;br /&gt;
| '''Multi-node GPU jobs''' || Must request full nodes: &amp;lt;code&amp;gt;--gpus-per-node=4&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
| '''Memory limits''' || The &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; option is not allowed.&amp;lt;br&amp;gt; Per GPU: 192 GB host memory.&amp;lt;br&amp;gt; Whole-node jobs: 768 GB total.&lt;br /&gt;
|}&lt;br /&gt;
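&lt;br /&gt;
The GPU request rules in the table can be encoded in a small helper that produces matching &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; options. This is a hypothetical convenience function, not part of Slurm or SciNet's tooling. It covers only the single-GPU and full-node cases described above.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: print sbatch options that follow the GPU rules.
# Usage: gpu_sbatch_flags NODES GPUS_PER_NODE
gpu_sbatch_flags() {
    local nodes="$1" gpus_per_node="$2"
    if [ "$nodes" -eq 1 ] && [ "$gpus_per_node" -eq 1 ]; then
        # Single-GPU job: as in the single-GPU example, no partition is given.
        echo "--nodes=1 --gpus-per-node=1"
    elif [ "$nodes" -ge 1 ] && [ "$gpus_per_node" -eq 4 ]; then
        # Whole-node and multi-node jobs must use compute_full_node.
        echo "--nodes=$nodes --gpus-per-node=4 --partition=compute_full_node"
    else
        echo "unsupported: $nodes node(s) x $gpus_per_node GPU(s) per node"
        return 1
    fi
}

gpu_sbatch_flags 2 4
```
&lt;br /&gt;
For example, &amp;lt;code&amp;gt;sbatch $(gpu_sbatch_flags 2 4) multi_node_gpu_job.sh&amp;lt;/code&amp;gt;. The command substitution is deliberately left unquoted so that the options split into separate words.&lt;br /&gt;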
&lt;br /&gt;
== Accessing the GPU Subcluster ==&lt;br /&gt;
&lt;br /&gt;
* From outside: &amp;lt;code&amp;gt;ssh trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt;  &lt;br /&gt;
* From another Trillium node: &amp;lt;code&amp;gt;ssh trig-login01&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Single-GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=single_gpu_job         # Job name&lt;br /&gt;
#SBATCH --output=single_gpu_job_%j.out    # Output file (%j = job ID)&lt;br /&gt;
#SBATCH --nodes=1                         # Request 1 node&lt;br /&gt;
#SBATCH --gpus-per-node=1                 # Request 1 GPU&lt;br /&gt;
#SBATCH --time=00:30:00                   # Max runtime (30 minutes)&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Check GPU allocation&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Run your workload&lt;br /&gt;
srun python my_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
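&lt;br /&gt;
Besides &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt;, a script can check how many GPUs it sees through &amp;lt;code&amp;gt;CUDA_VISIBLE_DEVICES&amp;lt;/code&amp;gt;, which Slurm normally sets for GPU allocations. This is a minimal sketch with a hypothetical function name. It assumes the variable holds a comma-separated list of device indices or UUIDs.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: count the GPUs listed in CUDA_VISIBLE_DEVICES
# (e.g. "0,1,2,3" gives 4; an unset or empty variable gives 0).
count_visible_gpus() {
    local devs="${CUDA_VISIBLE_DEVICES:-}"
    if [ -z "$devs" ]; then
        echo 0
        return
    fi
    local IFS=,
    set -- $devs   # split the list on commas into positional parameters
    echo "$#"
}

echo "Visible GPUs: $(count_visible_gpus)"
```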
&lt;br /&gt;
== Example: Whole-Node (4 GPUs) Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=whole_node_gpu_job&lt;br /&gt;
#SBATCH --output=whole_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --partition=compute_full_node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --gpus-per-node=4&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
srun python my_distributed_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Multi-Node GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=multi_node_gpu_job&lt;br /&gt;
#SBATCH --output=multi_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --nodes=2                        # Request 2 full nodes&lt;br /&gt;
#SBATCH --gpus-per-node=4                # 4 GPUs per node (full node)&lt;br /&gt;
#SBATCH --partition=compute_full_node    # Required for full-node jobs&lt;br /&gt;
#SBATCH --time=04:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Check all GPUs allocated&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Example: run a distributed training job with 8 GPUs (2 nodes × 4 GPUs)&lt;br /&gt;
srun python train_distributed.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Best Practices for GPU Jobs ==&lt;br /&gt;
&lt;br /&gt;
* Do not use &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; — memory is fixed per GPU (192 GB) or per node (768 GB).&lt;br /&gt;
* Always specify GPU counts and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt; for whole-node or multi-node jobs.&lt;br /&gt;
* Load only the modules you need — see [[Using_modules]].&lt;br /&gt;
* Be explicit with software versions for reproducibility (e.g., &amp;lt;code&amp;gt;cuda/12.6&amp;lt;/code&amp;gt; rather than just &amp;lt;code&amp;gt;cuda&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Test on a single GPU before scaling to multiple GPUs or nodes.&lt;br /&gt;
* Monitor usage with &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt; to ensure GPUs are fully utilized.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6860</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6860"/>
		<updated>2025-08-13T10:46:03Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: Added more examples. Added best practices for GPU jobs.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&amp;quot;border: 2px solid #e6b800; background-color: #fff8e1; padding: 0.5em; font-weight: bold; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
This Quickstart Guide is a &amp;lt;u&amp;gt;work in progress&amp;lt;/u&amp;gt;. Details may change as Trillium documentation is updated.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca, trillium-gpu.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and not ideal for training extremely large models (multi-100B parameters)&lt;br /&gt;
* To access, SSH into &amp;lt;code&amp;gt;trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt; from outside, or to &amp;lt;code&amp;gt;trig-login01&amp;lt;/code&amp;gt; from other Trillium nodes.&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
Trillium is a large cluster consisting of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that accepts jobs with walltimes from 15 minutes up to 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet or your supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check whether you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may still need to request access as described above.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, access to Trillium is done via [[SSH]] (secure shell) only and authentication is only allowed via SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use them securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, you can use [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; flag enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
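&lt;br /&gt;
The GPU subcluster has its own login node, and connecting to it works the same way. The commands below are a minimal sketch based on the host names given in the GPU subcluster description above; &amp;lt;code&amp;gt;/path/to/ssh_private_key&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;MYALLIANCEUSERNAME&amp;lt;/code&amp;gt; are placeholders, as in the command above.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# From your own computer: log in to the GPU login node&lt;br /&gt;
ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium-gpu.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
# From another Trillium node (e.g. a CPU login node): switch to the GPU login node&lt;br /&gt;
ssh trig-login01&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;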
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Your storage locations ==&lt;br /&gt;
&lt;br /&gt;
On Trillium, every user has several types of storage space available. These locations each serve different purposes, and the exact paths depend on your username and group. For convenience and portability, each location is also available through a corresponding environment variable.&lt;br /&gt;
&lt;br /&gt;
=== Home and Scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home directory and a scratch directory. Their locations are stored in the environment variables &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
On Trillium, the paths follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/username&lt;br /&gt;
 $SCRATCH=/scratch/username&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
  tri-login01:~$ pwd&lt;br /&gt;
  /home/yourusername&lt;br /&gt;
  tri-login01:~$ cd $SCRATCH&lt;br /&gt;
  tri-login01:scratch$ pwd&lt;br /&gt;
  /scratch/yourusername&lt;br /&gt;
&lt;br /&gt;
'''NOTE: The home directory is read-only on compute nodes.'''&lt;br /&gt;
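&lt;br /&gt;
Because the home directory cannot be written from compute nodes, jobs usually write their output to a directory under &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;, and it is convenient to keep their inputs there too. The commands below are a minimal sketch of preparing such a directory on a login node; the directory name &amp;lt;code&amp;gt;myrun&amp;lt;/code&amp;gt; and the file &amp;lt;code&amp;gt;input.dat&amp;lt;/code&amp;gt; are hypothetical placeholders. Using the environment variables instead of literal paths keeps the commands portable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Create a working directory on scratch (myrun is a placeholder name)&lt;br /&gt;
mkdir -p &amp;quot;$SCRATCH/myrun&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Copy a hypothetical input file from home while on a login node&lt;br /&gt;
cp &amp;quot;$HOME/input.dat&amp;quot; &amp;quot;$SCRATCH/myrun/&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Work and submit jobs from the scratch directory&lt;br /&gt;
cd &amp;quot;$SCRATCH/myrun&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;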
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=== Project and Archive (Nearline) ===&lt;br /&gt;
&lt;br /&gt;
All users on Trillium have access to a project directory, with paths stored in the environment variable &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt;.  &lt;br /&gt;
Some groups may also have an archive (a.k.a. &amp;quot;nearline&amp;quot;) directory, stored in the environment variable &amp;lt;code&amp;gt;$ARCHIVE&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
These follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/groupname/username&lt;br /&gt;
 $ARCHIVE=/archive/groupname/username&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Archive storage is currently available only via [[HPSS]] and cannot be accessed from Trillium login, compute, or data mover nodes.&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''  &lt;br /&gt;
Always use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) in scripts instead of hardcoding the paths. The actual directory paths may change in the future.&lt;br /&gt;
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
Please review the [[Data_Management#Purpose_of_each_file_system | various file systems]], their intended uses, and their policies. The table below summarizes the key details. See the [[Data_Management | Data Management]] page for more information.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB / 250,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB / 6,000,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50–500 TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 1 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group (nearline) allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
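&lt;br /&gt;
For example, an interactive session that starts from a clean environment, looks up the available versions of a package, and then loads specific versions might look like the sketch below. The module names and versions are the ones used in the example job scripts on this page; what is installed may differ, so check the output of &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; before relying on them.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module purge                    # unload all currently loaded modules&lt;br /&gt;
module spider openmpi           # list the available Open MPI versions&lt;br /&gt;
module spider openmpi/4.1.5     # show what must be loaded before this version&lt;br /&gt;
module load StdEnv/2023 gcc/12.3 openmpi/4.1.5&lt;br /&gt;
ml                              # same as: module list&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;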
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
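&lt;br /&gt;
One way to apply these practices is to keep a project's module commands, with pinned versions, in a small separate file and to source that file from each job script. In the sketch below, the file name &amp;lt;code&amp;gt;$HOME/myproject_modules.sh&amp;lt;/code&amp;gt; is hypothetical and the versions are those used in the example job scripts on this page. Reading the file from a job works even though &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; is read-only on compute nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Contents of $HOME/myproject_modules.sh (hypothetical file name)&lt;br /&gt;
# Start clean, then load explicit versions for reproducibility&lt;br /&gt;
module purge&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In a job script, the line &amp;lt;code&amp;gt;source $HOME/myproject_modules.sh&amp;lt;/code&amp;gt; placed after the &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; directives then sets up the same environment for every run.&lt;br /&gt;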
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools are available system-wide, such as compilers, math libraries, and debuggers.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to [mailto:support@scinet.utoronto.ca contact us] and we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct-liquid-cooled with warm water (35–40 °C input temperature), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details on how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job in the queue, and it will start on available compute nodes once the scheduler selects it. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Many users have both &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; accounts. Jobs can run under your group's RRG allocation or, if your group does not have one, under a RAS allocation (previously called the &amp;quot;default&amp;quot; allocation). However, unless you explicitly specify the account with the &amp;lt;code&amp;gt;--account=ACCOUNT_NAME&amp;lt;/code&amp;gt; option in your job script or on the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command line, your job will most likely be charged to the &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account.&lt;br /&gt;
&lt;br /&gt;
If you want your job to use the RRG allocation, be sure to specify it explicitly.&lt;br /&gt;
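&lt;br /&gt;
The account can be chosen either in the job script or on the command line. In the sketch below, &amp;lt;code&amp;gt;rrg-yourpi&amp;lt;/code&amp;gt; is a hypothetical account name to be replaced with one of your own, and &amp;lt;code&amp;gt;jobscript.sh&amp;lt;/code&amp;gt; is a placeholder. The &amp;lt;code&amp;gt;sacctmgr&amp;lt;/code&amp;gt; query is a standard SLURM way to list the accounts associated with your user, though its exact output on Trillium may differ.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# List the SLURM accounts associated with your user&lt;br /&gt;
sacctmgr show associations user=$USER format=Account%30&lt;br /&gt;
&lt;br /&gt;
# Charge a job to the (hypothetical) RRG account from the command line.&lt;br /&gt;
# Equivalently, add this line to the job script:&lt;br /&gt;
#   #SBATCH --account=rrg-yourpi&lt;br /&gt;
sbatch --account=rrg-yourpi jobscript.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;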
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
== Key Points to Remember ==&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler determines which resources to allocate from the requested number of tasks and CPUs. On Trillium, things work differently.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768 GB&amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768 GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
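&lt;br /&gt;
The bundling approach can be sketched in plain Bash: the job script starts one copy of a serial program per core in the background and then waits for all of them to finish. The program &amp;lt;code&amp;gt;serial_example&amp;lt;/code&amp;gt; and its numbered input and output files are hypothetical. The sketch assumes the 192 tasks take roughly the same time and that each fits in about 4 GB of memory (768 GB shared by 192 tasks); for many tasks of uneven length, a tool such as GNU Parallel, if available, balances the load better.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=serial_bundle&lt;br /&gt;
#SBATCH --output=serial_bundle_%j.txt&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
# Start 192 independent serial tasks in the background, one per core&lt;br /&gt;
for i in $(seq 1 192); do&lt;br /&gt;
    ./serial_example input_${i}.dat &amp;gt; output_${i}.txt &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Keep the job running until every background task has finished&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;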
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort, vfsshort || 2 per user || 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
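&lt;br /&gt;
For example, a short test can be sent to the &amp;lt;code&amp;gt;debug&amp;lt;/code&amp;gt; partition with &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; (long form &amp;lt;code&amp;gt;--partition&amp;lt;/code&amp;gt;), staying within the limits in the table above, and the standard SLURM option &amp;lt;code&amp;gt;squeue --start&amp;lt;/code&amp;gt; reports the scheduler's current estimate of when pending jobs will begin. This is a sketch that assumes the &amp;lt;code&amp;gt;debug&amp;lt;/code&amp;gt; partition accepts batch submissions as well as &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; sessions; &amp;lt;code&amp;gt;jobscript.sh&amp;lt;/code&amp;gt; is a placeholder, and start-time estimates can change as other jobs finish or are submitted.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Submit a one-node, 30-minute test job to the debug partition&lt;br /&gt;
sbatch -p debug --nodes=1 --time=00:30:00 jobscript.sh&lt;br /&gt;
&lt;br /&gt;
# Show the estimated start times of your pending jobs&lt;br /&gt;
squeue --start -u $USER&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;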
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
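&lt;br /&gt;
For example, checking on a job might look like the following session. The job ID &amp;lt;code&amp;gt;123456&amp;lt;/code&amp;gt; is a placeholder for the ID that &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; prints when you submit, and the &amp;lt;code&amp;gt;sacct --format&amp;lt;/code&amp;gt; fields shown are just one reasonable selection (see the &amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; man page for others):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ squeue -u $USER                 # list only your own jobs&lt;br /&gt;
tri-login01:scratch$ squeue --start -j 123456        # rough start-time estimate for a pending job&lt;br /&gt;
tri-login01:scratch$ scontrol show job 123456        # full details: nodes, resources, flags&lt;br /&gt;
tri-login01:scratch$ sacct -j 123456 --format=JobID,JobName,Elapsed,State,ExitCode   # accounting summary&lt;br /&gt;
tri-login01:scratch$ scancel 123456                  # cancel the job&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;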
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the GPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
The Trillium GPU subcluster is designed for AI/ML and accelerated science workloads.  &lt;br /&gt;
It has specific rules and resource limits that differ from the CPU subcluster.&lt;br /&gt;
&lt;br /&gt;
Everything in the [[#Submitting_Jobs_on_the_CPU_Subcluster|Submitting Jobs on the CPU Subcluster]] section applies here, with the following GPU-specific rules:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Requirement !! Details&lt;br /&gt;
|-&lt;br /&gt;
| '''Allowed GPU counts''' || Jobs must request exactly 1 GPU or a multiple of 4 GPUs.&lt;br /&gt;
|-&lt;br /&gt;
| '''Single-GPU jobs''' || Use &amp;lt;code&amp;gt;--gpus-per-node=1&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
| '''Whole-node GPU jobs''' || Use &amp;lt;code&amp;gt;--gpus-per-node=4&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;-p compute_full_node&amp;lt;/code&amp;gt;).&lt;br /&gt;
|-&lt;br /&gt;
| '''Multi-node GPU jobs''' || Must request full nodes: &amp;lt;code&amp;gt;--gpus-per-node=4&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt;.&lt;br /&gt;
|-&lt;br /&gt;
| '''Memory limits''' || The &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; option is not allowed; host memory is assigned automatically.&amp;lt;br/&amp;gt;Per GPU: 192 GB of host memory.&amp;lt;br/&amp;gt;Whole-node jobs: 768 GB in total.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Accessing the GPU Subcluster ==&lt;br /&gt;
&lt;br /&gt;
* From outside: &amp;lt;code&amp;gt;ssh trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt;  &lt;br /&gt;
* From another Trillium node: &amp;lt;code&amp;gt;ssh trig-login01&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Single-GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=single_gpu_job         # Job name&lt;br /&gt;
#SBATCH --output=single_gpu_job_%j.out    # Output file (%j = job ID)&lt;br /&gt;
#SBATCH --nodes=1                         # Request 1 node&lt;br /&gt;
#SBATCH --gpus-per-node=1                 # Request 1 GPU&lt;br /&gt;
#SBATCH --time=00:30:00                   # Max runtime (30 minutes)&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Check GPU allocation&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Run your workload&lt;br /&gt;
srun python my_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Whole-Node (4 GPUs) Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=whole_node_gpu_job&lt;br /&gt;
#SBATCH --output=whole_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --partition=compute_full_node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --gpus-per-node=4&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
srun python my_distributed_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example: Multi-Node GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=multi_node_gpu_job&lt;br /&gt;
#SBATCH --output=multi_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --nodes=2                        # Request 2 full nodes&lt;br /&gt;
#SBATCH --gpus-per-node=4                # 4 GPUs per node (full node)&lt;br /&gt;
#SBATCH --partition=compute_full_node    # Required for full-node jobs&lt;br /&gt;
#SBATCH --time=04:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Check all GPUs allocated&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
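# Example: distributed training on 8 GPUs (2 nodes × 4 GPUs). Without --ntasks-per-node,&lt;br /&gt;
# srun starts one task per node, so each task must use all 4 GPUs on its node.&lt;br /&gt;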
srun python train_distributed.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
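&lt;br /&gt;
If your training code expects one process per GPU (as PyTorch's DistributedDataParallel typically does, for example), one possible variant is to request four tasks per node so that &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; starts eight processes in total. This is a sketch under that assumption rather than an official recipe: &amp;lt;code&amp;gt;train_distributed.py&amp;lt;/code&amp;gt; is a placeholder, the &amp;lt;code&amp;gt;MASTER_ADDR&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MASTER_PORT&amp;lt;/code&amp;gt; variables only matter for frameworks that read them, and port 29500 is an arbitrary choice.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=multi_node_gpu_job&lt;br /&gt;
#SBATCH --output=multi_node_gpu_job_%j.out&lt;br /&gt;
#SBATCH --nodes=2                        # Request 2 full nodes&lt;br /&gt;
#SBATCH --gpus-per-node=4                # 4 GPUs per node (full node)&lt;br /&gt;
#SBATCH --ntasks-per-node=4              # Assumption: one task (process) per GPU, 8 tasks in total&lt;br /&gt;
#SBATCH --partition=compute_full_node    # Required for full-node jobs&lt;br /&gt;
#SBATCH --time=04:00:00&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Use the first node of the allocation as the rendezvous host (only for frameworks that read these)&lt;br /&gt;
export MASTER_ADDR=$(scontrol show hostnames &amp;quot;$SLURM_JOB_NODELIST&amp;quot; | head -n 1)&lt;br /&gt;
export MASTER_PORT=29500                 # Arbitrary unused port (hypothetical choice)&lt;br /&gt;
&lt;br /&gt;
# srun starts 8 copies of the script; each copy can read its global rank from&lt;br /&gt;
# SLURM_PROCID (0-7) and its rank within the node from SLURM_LOCALID (0-3)&lt;br /&gt;
srun python train_distributed.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;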
&lt;br /&gt;
== Best Practices for GPU Jobs ==&lt;br /&gt;
&lt;br /&gt;
* Do not use &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; — memory is fixed per GPU (192 GB) or per node (768 GB).&lt;br /&gt;
* Always specify GPU counts and &amp;lt;code&amp;gt;--partition=compute_full_node&amp;lt;/code&amp;gt; for whole-node or multi-node jobs.&lt;br /&gt;
* Load only the modules you need — see [[Using_modules]].&lt;br /&gt;
* Be explicit with software versions for reproducibility (e.g., &amp;lt;code&amp;gt;cuda/12.6&amp;lt;/code&amp;gt; rather than just &amp;lt;code&amp;gt;cuda&amp;lt;/code&amp;gt;).&lt;br /&gt;
* Test on a single GPU before scaling to multiple GPUs or nodes.&lt;br /&gt;
* Monitor usage with &amp;lt;code&amp;gt;nvidia-smi&amp;lt;/code&amp;gt; to ensure GPUs are fully utilized.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6857</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6857"/>
		<updated>2025-08-13T10:22:20Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: Added sample job script for a single GPU job&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&amp;quot;border: 2px solid #e6b800; background-color: #fff8e1; padding: 0.5em; font-weight: bold; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
This Quickstart Guide is a &amp;lt;u&amp;gt;work in progress&amp;lt;/u&amp;gt;. Details may change as Trillium documentation is updated.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca, trillium-gpu.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and is not well suited to training extremely large models (hundreds of billions of parameters)&lt;br /&gt;
* To access, SSH into &amp;lt;code&amp;gt;trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt; from outside, or to &amp;lt;code&amp;gt;trig-login01&amp;lt;/code&amp;gt; from other Trillium nodes.&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! Nodes !! Cores per node !! Memory per node !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect: NDR InfiniBand in a Dragonfly+ topology with adaptive routing. The compute nodes are accessed through a queueing system that accepts jobs with run times between 15 minutes and 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or if your supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can be accessed only via [[SSH]] (secure shell), and only SSH-key authentication is allowed. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to learn how to use it securely.&lt;br /&gt;
&lt;br /&gt;
Open a terminal window (on Windows, you can use [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime, you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Your storage locations ==&lt;br /&gt;
&lt;br /&gt;
On Trillium, every user has several types of storage space available. These locations each serve different purposes, and the exact paths depend on your username and group. For convenience and portability, each location is also available through a corresponding environment variable.&lt;br /&gt;
&lt;br /&gt;
=== Home and Scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home directory and a scratch directory. Their locations are stored in the environment variables &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
On Trillium, the paths follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/username&lt;br /&gt;
 $SCRATCH=/scratch/username&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
  tri-login01:~$ pwd&lt;br /&gt;
  /home/yourusername&lt;br /&gt;
  tri-login01:~$ cd $SCRATCH&lt;br /&gt;
  tri-login01:scratch$ pwd&lt;br /&gt;
  /scratch/yourusername&lt;br /&gt;
&lt;br /&gt;
'''NOTE: The home directory is read-only on compute nodes.'''&lt;br /&gt;
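&lt;br /&gt;
Because of this, a job should write its output under &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. A minimal sketch of the relevant lines in a job script is shown below; the directory name &amp;lt;code&amp;gt;run_$SLURM_JOB_ID&amp;lt;/code&amp;gt; and the program &amp;lt;code&amp;gt;$HOME/bin/my_program&amp;lt;/code&amp;gt; are hypothetical placeholders.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# $HOME is read-only on compute nodes, so create a per-job working directory on $SCRATCH&lt;br /&gt;
RUNDIR=&amp;quot;$SCRATCH/run_$SLURM_JOB_ID&amp;quot;&lt;br /&gt;
mkdir -p &amp;quot;$RUNDIR&amp;quot;&lt;br /&gt;
cd &amp;quot;$RUNDIR&amp;quot; || exit 1&lt;br /&gt;
&lt;br /&gt;
# Reading from $HOME is still fine; only the output file is written to $SCRATCH&lt;br /&gt;
&amp;quot;$HOME/bin/my_program&amp;quot; &amp;gt; output.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;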
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=== Project and Archive (Nearline) ===&lt;br /&gt;
&lt;br /&gt;
All users on Trillium have access to a project directory, with paths stored in the environment variable &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt;.  &lt;br /&gt;
Some groups may also have an archive (a.k.a. &amp;quot;nearline&amp;quot;) directory, stored in the environment variable &amp;lt;code&amp;gt;$ARCHIVE&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
These follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/groupname/username&lt;br /&gt;
 $ARCHIVE=/archive/groupname/username&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Archive storage is currently available only via [[HPSS]] and cannot be accessed from Trillium login, compute, or data mover nodes.&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''  &lt;br /&gt;
Always use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) in scripts instead of hardcoding the paths. The actual directory paths may change in the future.&lt;br /&gt;
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
Please review the [[Data_Management#Purpose_of_each_file_system | various file systems]], their intended uses, and their policies. The table below summarizes the key details. See the [[Data_Management | Data Management]] page for more information.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB / 250,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB / 6,000,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50–500 TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 1 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group (nearline) allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
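&lt;br /&gt;
For example, a short session that looks up the available GCC versions, loads explicit versions, and then confirms what is loaded might look like this. The version numbers are only examples; use &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; to see what is actually installed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ module spider gcc                  # list the available GCC versions&lt;br /&gt;
tri-login01:~$ module load StdEnv/2023 gcc/12.3   # load explicit versions&lt;br /&gt;
tri-login01:~$ ml                                 # same as module list&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;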
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Commercial tools available to all users. Some widely used commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct-liquid-cooled using warm water (35–40 °C inlet temperature), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a single, unified NVMe-backed storage pool with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
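&lt;br /&gt;
As a rough illustration, the two capacity figures above imply an average data-reduction ratio of about 1.7. This assumes the effective figure is simply the raw capacity multiplied by an expected reduction ratio; the real ratio depends on the data stored.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\frac{29\ \text{PB (effective)}}{16.7\ \text{PB (raw)}} \approx 1.74&amp;lt;/math&amp;gt;&lt;br /&gt;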
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details on how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job into the queue, and it will begin execution on compute nodes once the scheduler allocates them. Note that jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Users in groups with an RRG allocation typically have both a &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; and an &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; account. Jobs can run under your group's RRG allocation or, if one is not available, under a RAS allocation (previously called the &amp;quot;default&amp;quot; allocation, which is what the &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account refers to). Unless you explicitly specify the account with the &amp;lt;code&amp;gt;--account=ACCOUNT_NAME&amp;lt;/code&amp;gt; option in your job script or submission command, your job will most likely be charged to the &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account.&lt;br /&gt;
&lt;br /&gt;
If you want your job to use the RRG allocation, be sure to specify it explicitly.&lt;br /&gt;
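&lt;br /&gt;
A minimal sketch of selecting the account explicitly is shown below. The name &amp;lt;code&amp;gt;rrg-yourpi&amp;lt;/code&amp;gt; is a hypothetical placeholder; use your group's actual account name. The option can be given on the command line:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch --account=rrg-yourpi jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or as a directive near the top of &amp;lt;code&amp;gt;jobscript.sh&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#SBATCH --account=rrg-yourpi   # hypothetical account name; replace with your own&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;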
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler works out which resources to allocate from the requested number of tasks and CPUs. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768 GB&amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768 GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
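&lt;br /&gt;
The bundling approach can be sketched as follows. The sketch assumes a hypothetical serial program &amp;lt;code&amp;gt;./serial_task&amp;lt;/code&amp;gt; that takes a task index as its only argument. Each copy is started in the background with &amp;lt;code&amp;gt;&amp;amp;&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;wait&amp;lt;/code&amp;gt; keeps the job alive until all 192 copies have exited. This simple form works best if the tasks take roughly equal time and fit in memory together (768 GB / 192 = 4 GB per task). See the page linked above for more flexible variants.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=bundled_serial&lt;br /&gt;
#SBATCH --output=bundled_serial_%j.txt&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
&lt;br /&gt;
# Start 192 independent copies of the (hypothetical) serial program,&lt;br /&gt;
# one per core, each writing to its own output file.&lt;br /&gt;
for i in $(seq 1 192); do&lt;br /&gt;
    ./serial_task $i &amp;gt; task_$i.out &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Block until every background task has exited, so the job does not end early.&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;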
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits on the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. These limits depend on whether you are part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation], and on the &amp;quot;partition&amp;quot; in which the job runs. &amp;quot;Partitions&amp;quot; are SLURM's term for these different use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; option to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;. If you do not specify one, your job runs in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;br /&gt;
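&lt;br /&gt;
A typical monitoring sequence might look like the sketch below. Here &amp;lt;code&amp;gt;123456&amp;lt;/code&amp;gt; is a hypothetical job ID; use the one reported by &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt;. The &amp;lt;code&amp;gt;--format&amp;lt;/code&amp;gt; option of &amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; selects which fields are printed, and the field list shown is only one reasonable choice.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ squeue -u $USER&lt;br /&gt;
tri-login01:~$ squeue --start -j 123456&lt;br /&gt;
tri-login01:~$ jobperf 123456&lt;br /&gt;
tri-login01:~$ sacct -j 123456 --format=JobID,JobName,Elapsed,State,ExitCode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;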
&lt;br /&gt;
= Submitting Jobs on the GPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
The Trillium GPU subcluster is a high-performance computing resource optimized for AI/ML and accelerated science workloads. &lt;br /&gt;
&lt;br /&gt;
Everything in the [[#Submitting_Jobs_on_the_CPU_Subcluster|Submitting Jobs on the CPU Subcluster]] section also applies here, with the addition of GPUs.&lt;br /&gt;
&lt;br /&gt;
Access the subcluster via SSH:&lt;br /&gt;
* External: &amp;lt;code&amp;gt;ssh trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt;&lt;br /&gt;
* From other Trillium nodes: &amp;lt;code&amp;gt;ssh trig-login01&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example Single-GPU Job ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=single_gpu_job    # Job name&lt;br /&gt;
#SBATCH --output=single_gpu_job_%j.out  # Output file (%j is job ID)&lt;br /&gt;
#SBATCH --nodes=1                    # Request 1 node&lt;br /&gt;
#SBATCH --gpus-per-node=1            # Request 1 GPU&lt;br /&gt;
#SBATCH --time=00:30:00              # Max runtime (30 minutes)&lt;br /&gt;
&lt;br /&gt;
# Load modules&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load cuda/12.6&lt;br /&gt;
module load python/3.11.5&lt;br /&gt;
&lt;br /&gt;
# Activate Python environment (if applicable)&lt;br /&gt;
source ~/myenv/bin/activate&lt;br /&gt;
&lt;br /&gt;
# Verify GPU allocation&lt;br /&gt;
srun nvidia-smi&lt;br /&gt;
&lt;br /&gt;
# Run a Python script using 1 GPU&lt;br /&gt;
srun python my_script.py&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6854</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6854"/>
		<updated>2025-08-13T10:18:47Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&amp;quot;border: 2px solid #e6b800; background-color: #fff8e1; padding: 0.5em; font-weight: bold; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
This Quickstart Guide is a &amp;lt;u&amp;gt;work in progress&amp;lt;/u&amp;gt;. Details may change as Trillium documentation is updated.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca, trillium-gpu.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and not ideal for training extremely large models (multi-100B parameters)&lt;br /&gt;
* To access, SSH into &amp;lt;code&amp;gt;trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt; from outside, or to &amp;lt;code&amp;gt;trig-login01&amp;lt;/code&amp;gt; from other Trillium nodes.&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache each ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with walltimes between 15 minutes and 24 hours.&lt;br /&gt;
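&lt;br /&gt;
The total core count quoted in the infobox follows directly from the table above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;1224 \times 192 + 60 \times 96 = 235{,}008 + 5{,}760 = 240{,}768 \text{ cores}&amp;lt;/math&amp;gt;&lt;br /&gt;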
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for all {{DigitalResearchAllianceOfCanada}} account holders, but anyone with an active Alliance account can get access enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet or your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can only be accessed via [[SSH]] (secure shell), and authentication is only allowed with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, see for example [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
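&lt;br /&gt;
To avoid retyping the full command, you can optionally add an entry like the sketch below to &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; on your own computer. This applies to OpenSSH clients only; PuTTY and MobaXTerm store sessions differently. The host alias &amp;lt;code&amp;gt;trillium&amp;lt;/code&amp;gt; is an arbitrary name chosen for this example. Together, the &amp;lt;code&amp;gt;ForwardX11&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;ForwardX11Trusted&amp;lt;/code&amp;gt; lines reproduce the effect of the &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; flag.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Host trillium&lt;br /&gt;
    HostName trillium.scinet.utoronto.ca&lt;br /&gt;
    User MYALLIANCEUSERNAME&lt;br /&gt;
    IdentityFile /path/to/ssh_private_key&lt;br /&gt;
    ForwardX11 yes&lt;br /&gt;
    ForwardX11Trusted yes&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
With this entry in place, &amp;lt;code&amp;gt;ssh trillium&amp;lt;/code&amp;gt; is equivalent to the full command above.&lt;br /&gt;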
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Your storage locations ==&lt;br /&gt;
&lt;br /&gt;
On Trillium, every user has several types of storage space available. These locations each serve different purposes, and the exact paths depend on your username and group. For convenience and portability, each location is also available through a corresponding environment variable.&lt;br /&gt;
&lt;br /&gt;
=== Home and Scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home directory and a scratch directory. Their locations are stored in the environment variables &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
On Trillium, the paths follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/username&lt;br /&gt;
 $SCRATCH=/scratch/username&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
  tri-login01:~$ pwd&lt;br /&gt;
  /home/yourusername&lt;br /&gt;
  tri-login01:~$ cd $SCRATCH&lt;br /&gt;
  tri-login01:scratch$ pwd&lt;br /&gt;
  /scratch/yourusername&lt;br /&gt;
&lt;br /&gt;
'''NOTE: The home directory is read-only on compute nodes.'''&lt;br /&gt;
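&lt;br /&gt;
Because of this, one common pattern is to keep source code and small input files in &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt` and to stage each run in its own directory under &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;, submitting the job from there. Here is a minimal sketch, where &amp;lt;code&amp;gt;myrun&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;input.dat&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;jobscript.sh&amp;lt;/code&amp;gt; are hypothetical names:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mkdir -p $SCRATCH/myrun&lt;br /&gt;
$ cp $HOME/input.dat $HOME/jobscript.sh $SCRATCH/myrun/&lt;br /&gt;
$ cd $SCRATCH/myrun&lt;br /&gt;
$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;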
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=== Project and Archive (Nearline) ===&lt;br /&gt;
&lt;br /&gt;
All users on Trillium have access to a project directory, with paths stored in the environment variable &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt;.  &lt;br /&gt;
Some groups may also have an archive (a.k.a. &amp;quot;nearline&amp;quot;) directory, stored in the environment variable &amp;lt;code&amp;gt;$ARCHIVE&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
These follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/groupname/username&lt;br /&gt;
 $ARCHIVE=/archive/groupname/username&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Archive storage is currently available only via [[HPSS]] and cannot be accessed from Trillium login, compute, or data mover nodes.&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''  &lt;br /&gt;
Always use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) in scripts instead of hardcoding the paths. The actual directory paths may change in the future.&lt;br /&gt;
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
Please review the [[Data_Management#Purpose_of_each_file_system | various file systems]], their intended uses, and their policies. The table below summarizes the key details. See the [[Data_Management | Data Management]] page for more information.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB / 250,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB / 6,000,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50–500 TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 1 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group (nearline) allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
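&lt;br /&gt;
As a minimal sketch of the &amp;quot;separate script&amp;quot; approach, the file below wraps pinned &amp;lt;code&amp;gt;module&amp;lt;/code&amp;gt; commands in a shell function that you load only when you need it. The file name &amp;lt;code&amp;gt;trillium_env.sh&amp;lt;/code&amp;gt; and the function name &amp;lt;code&amp;gt;trillium_env&amp;lt;/code&amp;gt; are hypothetical. The module versions are simply the ones used in the job scripts on this page; check &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; for what is actually installed.&lt;br /&gt;
&lt;br /&gt;
```bash
# trillium_env.sh (hypothetical file name). Load it on demand with:
#   source ~/trillium_env.sh
# instead of loading modules from ~/.bashrc.

trillium_env() {
    module purge                                     # unload everything first
    module load StdEnv/2023 gcc/12.3 openmpi/4.1.5   # pinned versions, not defaults
    module list                                      # show what is now loaded
}

# Only call it where the module command exists (e.g. on Trillium).
if type module >/dev/null 2>/dev/null; then
    trillium_env
fi
```
&lt;br /&gt;
The function starts with &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt;, so it gives the same environment no matter what was loaded before. The same purge-then-load pattern also works at the top of a job script.&lt;br /&gt;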
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely used commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct-liquid-cooled with warm water (35–40 °C inlet temperature), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details on how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job in the queue; it will begin execution on available compute nodes once it is scheduled. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Users in a group with an RRG allocation typically have both a &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; and an &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; account. Jobs can run under your group's RRG allocation or, if there is none, under its RAS allocation (previously called the &amp;quot;default&amp;quot; allocation). Unless you explicitly specify the account with the &amp;lt;code&amp;gt;--account=ACCOUNT_NAME&amp;lt;/code&amp;gt; option in your job script or on the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command line, your job will most likely be charged to the &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account.&lt;br /&gt;
&lt;br /&gt;
If you want your job to use the RRG allocation, be sure to specify it explicitly.&lt;br /&gt;
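&lt;br /&gt;
The fragment below is a minimal sketch of selecting the RRG account inside a job script. &amp;lt;code&amp;gt;rrg-someprof&amp;lt;/code&amp;gt; is a hypothetical account name, so substitute the exact name of your group's account.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# rrg-someprof is a hypothetical account name; replace it with your group's.
#SBATCH --account=rrg-someprof
#SBATCH --nodes=1
#SBATCH --time=01:00:00
```
&lt;br /&gt;
You can also give the same option on the command line, e.g. &amp;lt;code&amp;gt;sbatch --account=rrg-someprof jobscript.sh&amp;lt;/code&amp;gt;. An option passed to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; this way takes precedence over the matching &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; line in the script.&lt;br /&gt;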
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler works out which resources to allocate from the requested number of tasks and number of CPUs per node. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768 GB&amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768 GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
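&lt;br /&gt;
One way to bundle independent serial tasks is sketched below. It assumes each task needs a single core and writes its own output file. The script uses &amp;lt;code&amp;gt;xargs -P&amp;lt;/code&amp;gt; to keep at most 192 tasks running at a time. &amp;lt;code&amp;gt;run_task&amp;lt;/code&amp;gt;, the task count of 384 and the &amp;lt;code&amp;gt;task_N.out&amp;lt;/code&amp;gt; file names are hypothetical placeholders. The page linked above covers other approaches.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --job-name=serial_bundle
#SBATCH --output=serial_bundle_%j.txt

# Sketch: run many independent serial tasks on one 192-core CPU node.
cd "${SLURM_SUBMIT_DIR:-.}"

# Hypothetical placeholder for your serial program; replace the echo with
# something like: ./my_serial_code "input_$1.dat"
run_task() {
    echo "task $1 done"
}
export -f run_task   # make the function visible to the shells xargs starts

NCORES=192           # cores on one Trillium CPU node
NTASKS=384           # hypothetical number of tasks; may exceed NCORES

# Keep at most NCORES tasks running; xargs returns once every task has finished.
seq 1 "$NTASKS" | xargs -P "$NCORES" -I{} bash -c 'run_task "$1" > "task_$1.out"' _ {}
```
&lt;br /&gt;
With 768 GB per node, 192 simultaneous tasks leave roughly 4 GB of memory each. If your tasks need more, lower &amp;lt;code&amp;gt;NCORES&amp;lt;/code&amp;gt; accordingly, at the cost of idle cores.&lt;br /&gt;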
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
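&lt;br /&gt;
To make the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition limits concrete, the hypothetical helper below checks a planned request against the table. It is not a SciNet-provided tool. It requires at least 1 node, allows at most 20 nodes by default or 1000 with an allocation, and accepts walltimes between 15 minutes and 24 hours. It only encodes the numbers above; the scheduler itself remains the authority.&lt;br /&gt;
&lt;br /&gt;
```bash
# Hypothetical helper (not a SciNet tool) encoding the compute-partition limits.
# Usage: check_compute_request NODES MINUTES [allocation]
check_compute_request() {
    local nodes=$1 minutes=$2 max_nodes=20       # default limit: 20 nodes
    if [ "${3:-}" = "allocation" ]; then
        max_nodes=1000                           # limit with an allocation
    fi
    if [ "$nodes" -lt 1 ] || [ "$nodes" -gt "$max_nodes" ]; then
        echo "nodes must be between 1 and $max_nodes"
        return 1
    fi
    if [ "$minutes" -lt 15 ] || [ "$minutes" -gt 1440 ]; then
        echo "walltime must be between 15 minutes and 24 hours (1440 minutes)"
        return 1
    fi
    echo "OK: $nodes nodes = $((nodes * 192)) cores for $minutes minutes"
}
```
&lt;br /&gt;
For example, &amp;lt;code&amp;gt;check_compute_request 2 60&amp;lt;/code&amp;gt; prints &amp;lt;code&amp;gt;OK: 2 nodes = 384 cores for 60 minutes&amp;lt;/code&amp;gt;. By contrast, &amp;lt;code&amp;gt;check_compute_request 21 60&amp;lt;/code&amp;gt; fails unless you also pass the third argument &amp;lt;code&amp;gt;allocation&amp;lt;/code&amp;gt;.&lt;br /&gt;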
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once SLURM has allocated these nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
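&lt;br /&gt;
You may also run the same OpenMP program outside a job, for example as a short test on a login node, where tests should use only 1–2 cores. In that case &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt; is unset, so the &amp;lt;code&amp;gt;export&amp;lt;/code&amp;gt; line above sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; to an empty value, and the OpenMP runtime will typically use every core it can see. A shell default avoids this. The fallback of 2 threads is an assumption chosen to match the login-node guidance.&lt;br /&gt;
&lt;br /&gt;
```bash
# Inside a job, use the CPUs Slurm allocated to the task (192 in the script
# above); outside a job, e.g. a quick login-node test, fall back to 2 threads.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-2}"
echo "Running with $OMP_NUM_THREADS OpenMP threads"
```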
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;br /&gt;
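&lt;br /&gt;
If you use some of these commands often, you can wrap them in shell functions, as in the sketch below. The function names &amp;lt;code&amp;gt;myjobs&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;jobsummary&amp;lt;/code&amp;gt; are made up. The &amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; fields shown are one reasonable selection rather than a SciNet recommendation.&lt;br /&gt;
&lt;br /&gt;
```bash
# Hypothetical shortcuts for the monitoring commands above.

# Your queued and running jobs; extra squeue options are passed through,
# e.g. "myjobs --start" for estimated start times of pending jobs.
myjobs() {
    squeue -u "$USER" "$@"
}

# Summary of a (finished) job: ID, name, elapsed time, final state, exit code.
jobsummary() {
    sacct -j "$1" --format=JobID,JobName,Elapsed,State,ExitCode
}
```
&lt;br /&gt;
For example, &amp;lt;code&amp;gt;myjobs --start&amp;lt;/code&amp;gt; lists estimated start times of your pending jobs, and &amp;lt;code&amp;gt;jobsummary JOBID&amp;lt;/code&amp;gt; reports how a finished job ended.&lt;br /&gt;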
&lt;br /&gt;
= Submitting Jobs on the GPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
The Trillium GPU subcluster is a high-performance computing resource optimized for AI/ML and accelerated science workloads. &lt;br /&gt;
&lt;br /&gt;
Everything in the [[#Submitting_Jobs_on_the_CPU_Subcluster|Submitting Jobs on the CPU Subcluster]] section also applies here, with the addition of GPUs.&lt;br /&gt;
&lt;br /&gt;
Access the subcluster via SSH:&lt;br /&gt;
* External: &amp;lt;code&amp;gt;ssh trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt;&lt;br /&gt;
* From other Trillium nodes: &amp;lt;code&amp;gt;ssh trig-login01&amp;lt;/code&amp;gt;&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6851</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6851"/>
		<updated>2025-08-08T02:02:39Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: Added WIP warning at the top&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div style=&amp;quot;border: 2px solid #e6b800; background-color: #fff8e1; padding: 0.5em; font-weight: bold; text-align: center;&amp;quot;&amp;gt;&lt;br /&gt;
This Quickstart Guide is a &amp;lt;u&amp;gt;work in progress&amp;lt;/u&amp;gt;. Details may change as Trillium documentation is updated.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;
&lt;br /&gt;
{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca, trillium-gpu.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and not ideal for training extremely large models (multi-100B parameters)&lt;br /&gt;
* To access, SSH into &amp;lt;code&amp;gt;trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt; from outside, or to &amp;lt;code&amp;gt;trig-login01&amp;lt;/code&amp;gt; from other Trillium nodes.&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
Trillium is a large cluster consisting of two types of nodes:
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||
|-
|  60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Designed for large parallel workloads, the cluster has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with walltimes from 15 minutes up to 24 hours.
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled. If you are new to SciNet or your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can only be accessed via [[SSH]] (secure shell), and authentication is only allowed with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair, and make sure you use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, you can use e.g. [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Your storage locations ==&lt;br /&gt;
&lt;br /&gt;
On Trillium, every user has several types of storage space available. These locations each serve different purposes, and the exact paths depend on your username and group. For convenience and portability, each location is also available through a corresponding environment variable.&lt;br /&gt;
&lt;br /&gt;
=== Home and Scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home directory and a scratch directory. Their locations are stored in the environment variables &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
On Trillium, the paths follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/username&lt;br /&gt;
 $SCRATCH=/scratch/username&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
  tri-login01:~$ pwd&lt;br /&gt;
  /home/yourusername&lt;br /&gt;
  tri-login01:~$ cd $SCRATCH&lt;br /&gt;
  tri-login01:scratch$ pwd&lt;br /&gt;
  /scratch/yourusername&lt;br /&gt;
&lt;br /&gt;
'''NOTE: The home directory is read-only on compute nodes.'''&lt;br /&gt;
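&lt;br /&gt;
Scripts stay portable if they build paths from these variables instead of hard-coding them. The following is a minimal sketch, not an official SciNet script: the &amp;lt;code&amp;gt;results&amp;lt;/code&amp;gt; subdirectory is a hypothetical example, and the fallbacks simply assume the naming convention shown above.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: derive storage locations from the environment variables
# instead of hard-coding /home/... or /scratch/... paths.
# The fallbacks assume the /home/username and /scratch/username convention.
home_dir=${HOME:-/home/$USER}
scratch_dir=${SCRATCH:-/scratch/$USER}

# Hypothetical per-job output directory. SLURM_JOB_ID is only set
# inside a Slurm job, so 'interactive' is used everywhere else.
out_dir=$scratch_dir/results/${SLURM_JOB_ID:-interactive}

echo "Home directory:   $home_dir"
echo "Output directory: $out_dir"
# Create it when needed with: mkdir -p "$out_dir"
```
&lt;br /&gt;
Output goes under &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; rather than &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; because the home directory cannot be written to from compute nodes.&lt;br /&gt;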
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=== Project and Archive (Nearline) ===&lt;br /&gt;
&lt;br /&gt;
All users on Trillium have access to a project directory, with paths stored in the environment variable &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt;.  &lt;br /&gt;
Some groups may also have an archive (a.k.a. &amp;quot;nearline&amp;quot;) directory, stored in the environment variable &amp;lt;code&amp;gt;$ARCHIVE&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
These follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/groupname/username&lt;br /&gt;
 $ARCHIVE=/archive/groupname/username&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Archive storage is currently available only via [[HPSS]] and cannot be accessed from Trillium login, compute, or data mover nodes.&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''  &lt;br /&gt;
Always use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) in scripts instead of hardcoding the paths. The actual directory paths may change in the future.&lt;br /&gt;
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
Please review the [[Data_Management#Purpose_of_each_file_system | various file systems]], their intended uses, and their policies. The table below summarizes the key details. See the [[Data_Management | Data Management]] page for more information.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB / 250,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB / 6,000,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50–500 TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 1 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group (nearline) allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
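&lt;br /&gt;
A typical session on a login node combines these commands. In the sketch below, the module versions are taken from the job script examples on this page purely as examples, and the check for the &amp;lt;code&amp;gt;module&amp;lt;/code&amp;gt; command means it only prints a message on machines without it.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch of a typical module session; the versions are examples only.
# 'module' is normally a shell function, so check for it before use.
if command -v module >/dev/null; then
    module purge                      # start from a clean environment
    module load StdEnv/2023 gcc/12.3  # load specific, pinned versions
    module list                       # confirm what is now loaded
    session=loaded
else
    echo 'module command not found: run this on a Trillium login node'
    session=skipped
fi
```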
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
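&lt;br /&gt;
One way to follow these tips is to keep a project's module list in a small file that you source by hand or from your job scripts, rather than from &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt;. This is a sketch under assumptions: the file name &amp;lt;code&amp;gt;myproject_env.sh&amp;lt;/code&amp;gt; and the function &amp;lt;code&amp;gt;load_myproject_env&amp;lt;/code&amp;gt; are hypothetical, and the pinned versions are the examples used elsewhere on this page.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical file, e.g. $HOME/myproject_env.sh. Load it with
#   source $HOME/myproject_env.sh; load_myproject_env
# interactively or near the top of a job script, not from .bashrc.

# Pinned versions keep runs reproducible when system defaults change.
MYPROJECT_MODULES='StdEnv/2023 gcc/12.3 openmpi/4.1.5'

load_myproject_env() {
    if ! command -v module >/dev/null; then
        echo 'module command not available in this shell'
        return 1
    fi
    module purge                       # avoid leftovers from earlier sessions
    module load $MYPROJECT_MODULES     # unquoted on purpose: one word per module
}
```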
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools are available system-wide, such as compilers, math libraries, and debuggers.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
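&lt;br /&gt;
For the lightweight login-node tests described above, you can let the shell enforce the rule-of-thumb limits. The sketch below is one possible approach, not a SciNet tool; the helper name &amp;lt;code&amp;gt;run_light_test&amp;lt;/code&amp;gt; and the exact limits (2 OpenMP threads, about 2 GB of virtual memory, 5 minutes) are assumptions based on the guidelines above.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: run a command under rule-of-thumb login-node limits.
# run_light_test is a hypothetical helper, not a SciNet command.
run_light_test() {
    (   # subshell, so the limits do not stick to your login shell
        ulimit -v 2097152          # about 2 GB of virtual memory (in KiB)
        export OMP_NUM_THREADS=2   # at most 2 OpenMP threads
        timeout 300 "$@"           # stop the test after 5 minutes
    )
}
# Example: run_light_test ./my_test_program input.dat
```
&lt;br /&gt;
Note that &amp;lt;code&amp;gt;ulimit -v&amp;lt;/code&amp;gt; limits virtual address space, which for some programs is much larger than the memory they actually use, so such programs may fail under this limit even when their real footprint is small.&lt;br /&gt;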
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
Trillium uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job in the queue. It will begin executing on compute nodes once the scheduler allocates them. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
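&lt;br /&gt;
To keep track of a submitted job from a script, you can use the &amp;lt;code&amp;gt;--parsable&amp;lt;/code&amp;gt; option of &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt;, which prints only the job ID. Here is a minimal sketch in which &amp;lt;code&amp;gt;jobscript.sh&amp;lt;/code&amp;gt; stands for your own job script:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: submit from $SCRATCH and keep the job ID for monitoring.
# jobscript.sh is a placeholder for your own job script.
cd "${SCRATCH:-.}" || exit 1
if command -v sbatch >/dev/null; then
    jobid=$(sbatch --parsable jobscript.sh)
    jobid=${jobid%%;*}      # drop a ';cluster' suffix, if one is printed
    echo "Submitted job $jobid"
    squeue -j "$jobid"
else
    jobid=none
    echo 'sbatch not found: submit from a Trillium login node'
fi
```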
&lt;br /&gt;
Users in groups with an RRG allocation typically have both &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; accounts. Jobs can run under your group's RRG allocation or, if one is not available, under a RAS allocation (previously called the &amp;quot;default&amp;quot; allocation). However, unless you explicitly specify the account using the &amp;lt;code&amp;gt;--account=ACCOUNT_NAME&amp;lt;/code&amp;gt; option in your job script or submission command, your job will most likely be charged to the &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account.&lt;br /&gt;
&lt;br /&gt;
If you want your job to use the RRG allocation, be sure to specify it explicitly.&lt;br /&gt;
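&lt;br /&gt;
The account can be given either on the command line (&amp;lt;code&amp;gt;sbatch --account=ACCOUNT_NAME jobscript.sh&amp;lt;/code&amp;gt;) or in the script itself. In the sketch below, &amp;lt;code&amp;gt;rrg-yourpi&amp;lt;/code&amp;gt; is a hypothetical account name; replace it with your group's actual RRG account.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
#SBATCH --account=rrg-yourpi    # hypothetical name: use your group's RRG account
#SBATCH --nodes=1
#SBATCH --time=00:30:00
#SBATCH --job-name=rrg_job

# Slurm sets SLURM_JOB_ACCOUNT inside the job; it is unset elsewhere.
account=${SLURM_JOB_ACCOUNT:-not running under Slurm}
echo "Charged to account: $account"
```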
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler deduces which resources to allocate from the requested numbers of tasks and CPUs. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768GB &amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
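&lt;br /&gt;
As a sketch of such bundling (one possible approach, assuming GNU &amp;lt;code&amp;gt;xargs&amp;lt;/code&amp;gt;; the linked page has more complete examples), the script below keeps up to one serial task per core running until all inputs are processed. The names &amp;lt;code&amp;gt;serial_task&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;inputs/*.dat&amp;lt;/code&amp;gt; are hypothetical placeholders.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --job-name=serial_bundle

# Run CMD once per argument, keeping at most NCORES copies running at a time.
run_bundle() {
    local ncores=$1 cmd=$2
    shift 2
    if [ $# -eq 0 ]; then return 0; fi           # nothing to do
    printf '%s\0' "$@" | xargs -0 -P "$ncores" -n 1 "$cmd"
}

# SLURM_CPUS_ON_NODE is set by Slurm; fall back to 192 cores per node.
ncores=${SLURM_CPUS_ON_NODE:-192}

# Only launch the real work inside a Slurm job.
if [ -n "$SLURM_JOB_ID" ]; then
    cd "$SLURM_SUBMIT_DIR" || exit 1
    run_bundle "$ncores" ./serial_task inputs/*.dat
fi
```
&lt;br /&gt;
The job ends only when the last task finishes, so this works best when the tasks take roughly similar amounts of time.&lt;br /&gt;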
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. These limits depend on whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation], and on the &amp;quot;partition&amp;quot; in which the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;; if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
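&lt;br /&gt;
For example, you can send a short test to the &amp;lt;code&amp;gt;debug&amp;lt;/code&amp;gt; partition by adding a partition line to the job script, which is equivalent to &amp;lt;code&amp;gt;sbatch -p debug&amp;lt;/code&amp;gt;. The sketch below stays within the debug limits in the table (1 to 4 nodes, at most 1 hour); the job name is arbitrary.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
#SBATCH --partition=debug     # same as 'sbatch -p debug'
#SBATCH --nodes=1             # debug jobs may use 1 to 4 nodes
#SBATCH --time=00:30:00       # must stay within the 1 hour debug limit
#SBATCH --job-name=debug_test

# Report where the job landed; these variables are unset outside Slurm.
partition=${SLURM_JOB_PARTITION:-none}
nnodes=${SLURM_JOB_NUM_NODES:-0}
echo "Partition: $partition, nodes: $nnodes"
```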
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; go to SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the mpirun flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processes per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
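&lt;br /&gt;
To check how SLURM placed the MPI tasks before running your real application, you can add a quick check inside the job or in an &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; session. This sketch uses only standard SLURM environment variables and &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt;. For the MPI script above, you should see two lines of output, each with a count of 192.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: show how many tasks Slurm placed on each node.
ntasks=${SLURM_NTASKS:-unset}
echo "Tasks: $ntasks, nodes: ${SLURM_JOB_NUM_NODES:-unset}"

if [ -n "$SLURM_JOB_ID" ]; then
    # One output line per node: the task count, then the host name.
    srun hostname | sort | uniq -c
fi
```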
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
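&lt;br /&gt;
A typical monitoring session might look like the following sketch, where &amp;lt;code&amp;gt;JOBID&amp;lt;/code&amp;gt; stands for the numeric job ID that &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; prints when you submit. &amp;lt;code&amp;gt;squeue --start&amp;lt;/code&amp;gt; is useful while the job is pending, &amp;lt;code&amp;gt;jobperf&amp;lt;/code&amp;gt; while it is running, and &amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; after it has finished. The fields passed to &amp;lt;code&amp;gt;sacct --format&amp;lt;/code&amp;gt; are just one possible selection; see &amp;lt;code&amp;gt;man sacct&amp;lt;/code&amp;gt; for the full list.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ squeue -u $USER&lt;br /&gt;
tri-login01:scratch$ squeue --start -j JOBID&lt;br /&gt;
tri-login01:scratch$ jobperf JOBID&lt;br /&gt;
tri-login01:scratch$ sacct -j JOBID --format=JobID,JobName,Elapsed,State,ExitCode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;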
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6848</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6848"/>
		<updated>2025-08-08T01:57:50Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca, trillium-gpu.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and not ideal for training extremely large models (multi-100B parameters)&lt;br /&gt;
* To access it, SSH into &amp;lt;code&amp;gt;trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt; from outside Trillium, or into &amp;lt;code&amp;gt;trig-login01&amp;lt;/code&amp;gt; from other Trillium nodes.&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because Trillium is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that accepts jobs with a walltime of at least 15 minutes and at most 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet or your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check whether you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you probably need to request access as described above.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium is accessed only via [[SSH]] (secure shell), and authentication is allowed only with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use them securely.&lt;br /&gt;
&lt;br /&gt;
Open a terminal window (on Windows, see [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Your storage locations ==&lt;br /&gt;
&lt;br /&gt;
On Trillium, every user has several types of storage space available. These locations each serve different purposes, and the exact paths depend on your username and group. For convenience and portability, each location is also available through a corresponding environment variable.&lt;br /&gt;
&lt;br /&gt;
=== Home and Scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home directory and a scratch directory. Their locations are stored in the environment variables &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
On Trillium, the paths follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/username&lt;br /&gt;
 $SCRATCH=/scratch/username&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
  tri-login01:~$ pwd&lt;br /&gt;
  /home/yourusername&lt;br /&gt;
  tri-login01:~$ cd $SCRATCH&lt;br /&gt;
  tri-login01:scratch$ pwd&lt;br /&gt;
  /scratch/yourusername&lt;br /&gt;
&lt;br /&gt;
'''NOTE: The home directory is read-only on compute nodes.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=== Project and Archive (Nearline) ===&lt;br /&gt;
&lt;br /&gt;
All users on Trillium have access to a project directory, with paths stored in the environment variable &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt;.  &lt;br /&gt;
Some groups may also have an archive (a.k.a. &amp;quot;nearline&amp;quot;) directory, stored in the environment variable &amp;lt;code&amp;gt;$ARCHIVE&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
These follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/groupname/username&lt;br /&gt;
 $ARCHIVE=/archive/groupname/username&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Archive storage is currently available only via [[HPSS]] and cannot be accessed from Trillium login, compute, or data mover nodes.&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''  &lt;br /&gt;
Always use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) in scripts instead of hardcoding the paths. The actual directory paths may change in the future.&lt;br /&gt;
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
Please review the [[Data_Management#Purpose_of_each_file_system | various file systems]], their intended uses, and their policies. The table below summarizes the key details. See the [[Data_Management | Data Management]] page for more information.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB / 250,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB / 6,000,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50–500 TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 1 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group (nearline) allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
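&lt;br /&gt;
For example, a session that looks up which GCC versions exist, loads one explicitly, and confirms what is loaded might look like the following. This is only an illustration: &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt; is used as an example version, and the versions reported by &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; on Trillium may differ.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ module spider gcc&lt;br /&gt;
tri-login01:~$ module load gcc/13.3&lt;br /&gt;
tri-login01:~$ ml&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The last command uses the &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; abbreviation, which is equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;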
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
Trillium uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job in the queue, and it will begin executing on compute nodes once it is scheduled. Note: jobs must be submitted from a login node; submitting from data mover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Users may have access to two kinds of accounts: a &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account, which uses the RAS allocation (previously called the &amp;quot;default&amp;quot; allocation), and, if their group holds a Resources for Research Groups (RRG) allocation, an &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; account. Unless you explicitly specify the account using the &amp;lt;code&amp;gt;--account=ACCOUNT_NAME&amp;lt;/code&amp;gt; option in your job script or submission command, your job will most likely be charged to the &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account.&lt;br /&gt;
&lt;br /&gt;
If you want your job to use the RRG allocation, be sure to specify it explicitly.&lt;br /&gt;
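&lt;br /&gt;
Both ways of selecting the account are sketched below; &amp;lt;code&amp;gt;ACCOUNT_NAME&amp;lt;/code&amp;gt; is a placeholder for your group's actual &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; (or &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt;) account name. Either add a directive to the job script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#SBATCH --account=ACCOUNT_NAME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or pass the option on the command line when you submit:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch --account=ACCOUNT_NAME jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If both are given, the command-line option takes precedence over the &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; directive.&lt;br /&gt;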
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler decides which resources to allocate from the number of tasks and the number of CPUs you request. On Trillium, things are different: resources are always allocated as whole nodes.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768 GB&amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768 GB is the amount of memory on each node.&lt;br /&gt;
* If you are running serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks in one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
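&lt;br /&gt;
A minimal sketch of such bundling, using plain Bash background processes, is shown below. It assumes a hypothetical serial program &amp;lt;code&amp;gt;serial_example&amp;lt;/code&amp;gt; and hypothetical input files &amp;lt;code&amp;gt;input_1.dat&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;input_192.dat&amp;lt;/code&amp;gt;; adapt the names, module loads, and walltime to your own workflow.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
#SBATCH --job-name=serial_bundle&lt;br /&gt;
#SBATCH --output=serial_bundle_%j.txt&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
# Start one copy of the (hypothetical) serial program per core, each in the background&lt;br /&gt;
for i in $(seq 1 192); do&lt;br /&gt;
    ./serial_example input_${i}.dat &amp;gt; output_${i}.log &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Block until every background task has finished; without this, the batch script&lt;br /&gt;
# would exit and Slurm would end the job while the tasks are still running&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;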
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
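&lt;br /&gt;
As an illustration of the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; option, the commands below request resources in the &amp;lt;code&amp;gt;debug&amp;lt;/code&amp;gt; partition, staying within its limits of 4 nodes and 1 hour. This is a sketch: the node count and time are example values, and &amp;lt;code&amp;gt;job.sh&amp;lt;/code&amp;gt; is a placeholder script name.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Interactive allocation of one debug node for 30 minutes&lt;br /&gt;
salloc -p debug --nodes=1 --time=00:30:00&lt;br /&gt;
&lt;br /&gt;
# Batch job in the debug partition; command-line options override matching #SBATCH lines&lt;br /&gt;
sbatch -p debug --time=00:30:00 job.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;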
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the mpirun flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processes per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
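&lt;br /&gt;
A typical sequence of these commands might look like the sketch below, where &amp;lt;code&amp;gt;123456&amp;lt;/code&amp;gt; is a placeholder for the job ID that &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; prints on submission. The &amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; start date and field list are example choices.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
squeue -u $USER              # all of your queued and running jobs&lt;br /&gt;
squeue --start -j 123456     # rough start-time estimate for a pending job&lt;br /&gt;
scontrol show job 123456     # allocated nodes, resources and job flags&lt;br /&gt;
jobperf 123456               # live CPU and memory usage while the job runs&lt;br /&gt;
sacct -S 2025-08-01 --format=JobID,JobName,Elapsed,State,ExitCode   # your jobs since a given date&lt;br /&gt;
scancel 123456               # cancel the job&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;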
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6845</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6845"/>
		<updated>2025-08-08T01:57:13Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* Logging in */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca, trillium-gpu.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and not ideal for training extremely large models (multi-100B parameters)&lt;br /&gt;
* To access it, SSH to &amp;lt;code&amp;gt;trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt; from outside, or to &amp;lt;code&amp;gt;trig-login01&amp;lt;/code&amp;gt; from other Trillium nodes.&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster is a large cluster consisting of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node of the cluster has 768 GB of RAM. Designed for large parallel workloads, the cluster has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that accepts jobs with a walltime of at least 15 minutes and at most 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get access. If you are new to SciNet or your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, access to Trillium is via [[SSH]] (secure shell) only, and authentication is allowed only with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, see [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
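&lt;br /&gt;
To avoid retyping these options, you can add a host entry to the &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; file on your local computer. The sketch below uses standard OpenSSH configuration keywords; the alias &amp;lt;code&amp;gt;trillium&amp;lt;/code&amp;gt; is an arbitrary choice, and the key path and username are placeholders to replace with your own.&lt;br /&gt;
&lt;br /&gt;
 # ~/.ssh/config on your local computer (alias, path and username are placeholders)&lt;br /&gt;
 Host trillium&lt;br /&gt;
     HostName trillium.scinet.utoronto.ca&lt;br /&gt;
     User MYALLIANCEUSERNAME&lt;br /&gt;
     IdentityFile /path/to/ssh_private_key&lt;br /&gt;
     # Equivalent of the -Y option (trusted X11 forwarding)&lt;br /&gt;
     ForwardX11 yes&lt;br /&gt;
     ForwardX11Trusted yes&lt;br /&gt;
&lt;br /&gt;
With this entry in place, &amp;lt;code&amp;gt;ssh trillium&amp;lt;/code&amp;gt; is equivalent to the full &amp;lt;code&amp;gt;ssh -i ... -Y&amp;lt;/code&amp;gt; command shown earlier in this section.&lt;br /&gt;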
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Your storage locations ==&lt;br /&gt;
&lt;br /&gt;
On Trillium, every user has several types of storage space available. These locations each serve different purposes, and the exact paths depend on your username and group. For convenience and portability, each location is also available through a corresponding environment variable.&lt;br /&gt;
&lt;br /&gt;
=== Home and Scratch ===&lt;br /&gt;
&lt;br /&gt;
You have a home directory and a scratch directory. Their locations are stored in the environment variables &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
On Trillium, the paths follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $HOME=/home/username&lt;br /&gt;
 $SCRATCH=/scratch/username&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
  tri-login01:~$ pwd&lt;br /&gt;
  /home/yourusername&lt;br /&gt;
  tri-login01:~$ cd $SCRATCH&lt;br /&gt;
  tri-login01:scratch$ pwd&lt;br /&gt;
  /scratch/yourusername&lt;br /&gt;
&lt;br /&gt;
'''NOTE: The home directory is read-only on compute nodes.'''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
=== Project and Archive (Nearline) ===&lt;br /&gt;
&lt;br /&gt;
All users on Trillium have access to a project directory, with paths stored in the environment variable &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt;.  &lt;br /&gt;
Some groups may also have an archive (a.k.a. &amp;quot;nearline&amp;quot;) directory, stored in the environment variable &amp;lt;code&amp;gt;$ARCHIVE&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
These follow the naming convention:&lt;br /&gt;
&lt;br /&gt;
 $PROJECT=/project/groupname/username&lt;br /&gt;
 $ARCHIVE=/archive/groupname/username&lt;br /&gt;
&lt;br /&gt;
'''NOTE:''' Archive storage is currently available only via [[HPSS]] and cannot be accessed from Trillium login, compute, or data mover nodes.&lt;br /&gt;
&lt;br /&gt;
'''''IMPORTANT: Future-proof your scripts'''''  &lt;br /&gt;
Always use the environment variables (&amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$PROJECT&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;$ARCHIVE&amp;lt;/tt&amp;gt;) in scripts instead of hardcoding the paths. The actual directory paths may change in the future.&lt;br /&gt;
&lt;br /&gt;
=== Storage and quotas ===&lt;br /&gt;
&lt;br /&gt;
Please review the [[Data_Management#Purpose_of_each_file_system | various file systems]], their intended uses, and their policies. The table below summarizes the key details. See the [[Data_Management | Data Management]] page for more information.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB / 250,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB / 6,000,000 files per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;2&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;2&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50–500 TB per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|[[Data_Management#Quotas_and_purging | depending on group size]]&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 1 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group (nearline) allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
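&lt;br /&gt;
For example, a short session that loads a specific compiler and MPI library, checks what is loaded, and searches for other MPI versions might look like this. The version numbers are illustrative and may differ from what is currently installed; use &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; to check.&lt;br /&gt;
&lt;br /&gt;
 tri-login01:~$ module load StdEnv/2023 gcc/12.3 openmpi/4.1.5&lt;br /&gt;
 tri-login01:~$ module list            # confirm which modules are now loaded&lt;br /&gt;
 tri-login01:~$ module spider openmpi  # list the available openmpi versions&lt;br /&gt;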
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
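&lt;br /&gt;
The version-pinning advice above can also be enforced in a job script. The following is a minimal sketch, not a SciNet-provided tool. &amp;lt;code&amp;gt;load_pinned&amp;lt;/code&amp;gt; is a hypothetical helper that refuses any module name without an explicit version (no &amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt;) and otherwise passes the whole list to &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;. The versions in the usage comment are the ones used in the example job scripts on this page.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper (not part of the Trillium software stack):
# load modules only if every name includes an explicit version,
# so a job never silently picks up a changed system default.
load_pinned() {
  local m
  for m in "$@"; do
    case "$m" in
      */*) ;;  # name/version form, e.g. gcc/12.3: accepted
      *)
        echo "load_pinned: '$m' has no version; use name/version" 1>/dev/stderr
        return 1
        ;;
    esac
  done
  module load "$@"
}

# Usage inside a job script:
# load_pinned StdEnv/2023 gcc/12.3 openmpi/4.1.5
```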
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
For more advanced details on interacting with the scheduler, see the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job in the queue, and it will start on compute nodes once the scheduler allocates them. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Users may have both a &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; and an &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; account. Jobs can run under your group's RRG allocation or, if one is not available, under a RAS allocation (previously called the &amp;quot;default&amp;quot; allocation). Unless you explicitly specify the account using the &amp;lt;code&amp;gt;--account=ACCOUNT_NAME&amp;lt;/code&amp;gt; option in your job script or submission command, your job will most likely be charged to the &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account.&lt;br /&gt;
&lt;br /&gt;
If you want your job to use the RRG allocation, be sure to specify it explicitly.&lt;br /&gt;
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
== Key Points to Remember ==&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
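&lt;br /&gt;
Because scheduling is by whole node, the cores and memory a job receives follow directly from the node count. Here is a small illustration using the per-node figures above (192 cores, 768 GB). The function name &amp;lt;code&amp;gt;node_resources&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Illustrative helper: resources granted to a CPU-subcluster job,
# using Trillium's per-node figures (192 cores, 768 GB RAM).
CORES_PER_NODE=192
MEM_GB_PER_NODE=768

node_resources() {
  local nodes=$1
  echo "$(( nodes * CORES_PER_NODE )) cores, $(( nodes * MEM_GB_PER_NODE )) GB"
}

node_resources 2   # prints: 384 cores, 1536 GB
```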
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler works out which resources to allocate from the requested number of tasks and CPUs per node. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768GB &amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
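&lt;br /&gt;
One possible way to do this bundling is with &amp;lt;code&amp;gt;xargs -P&amp;lt;/code&amp;gt;, which keeps up to a given number of commands running at once. The sketch below is only one option and is separate from the approaches on the linked page. &amp;lt;code&amp;gt;run_bundle&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;./serial_task&amp;lt;/code&amp;gt; are hypothetical names, and the sketch assumes each task takes its task number as its last argument. Inside a job, &amp;lt;code&amp;gt;SLURM_CPUS_ON_NODE&amp;lt;/code&amp;gt; gives the number of CPUs Slurm allocated on the node; 192 is used as a fallback.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: run many independent serial tasks concurrently on one node.
# SLURM_CPUS_ON_NODE is set by Slurm inside a job; 192 is the fallback
# (the core count of a Trillium CPU node).
NCORES=${SLURM_CPUS_ON_NODE:-192}

# run_bundle NTASKS COMMAND [ARGS...]
# Runs "COMMAND ARGS... i" for i = 1..NTASKS, at most NCORES at a time.
# xargs exits non-zero if any task failed.
run_bundle() {
  local ntasks=$1
  shift
  seq 1 "$ntasks" | xargs -P "$NCORES" -I{} "$@" {}
}

# Example job-script usage (./serial_task is a hypothetical program):
# run_bundle 384 ./serial_task
```

A new task starts as soon as a slot frees up, so this copes reasonably well with tasks of uneven length.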
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
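&lt;br /&gt;
Slurm enforces these limits itself, but a quick local check can catch a mistyped &amp;lt;code&amp;gt;--time&amp;lt;/code&amp;gt; value before submission. This is a minimal sketch. It assumes the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition limits from the table (15 minutes to 24 hours) and handles only the &amp;lt;code&amp;gt;HH:MM:SS&amp;lt;/code&amp;gt; format. &amp;lt;code&amp;gt;walltime_ok&amp;lt;/code&amp;gt; is a hypothetical name.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: check an HH:MM:SS walltime against the limits of
# the compute partition (at least 15 minutes, at most 24 hours).
# Returns 0 if the request is within limits, 1 otherwise.
MIN_SECS=$(( 15 * 60 ))
MAX_SECS=$(( 24 * 3600 ))

walltime_ok() {
  local t=$1 h m s secs
  # Only the HH:MM:SS form is handled here; Slurm accepts other formats too.
  [[ $t =~ ^[0-9]+:[0-9][0-9]:[0-9][0-9]$ ]] || return 1
  h=${t%%:*}
  t=${t#*:}
  m=${t%%:*}
  s=${t#*:}
  # 10# forces base 10, so fields such as "08" are not read as octal.
  secs=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
  (( secs >= MIN_SECS )) || return 1   # shorter than the minimum
  (( MAX_SECS >= secs ))               # status 1 if longer than the maximum
}

# Example: walltime_ok "$T" || echo "walltime outside compute partition limits"
```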
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; go to SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
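&lt;br /&gt;
To confirm that Slurm gave the MPI job the layout you asked for, a job script can compare nodes times tasks per node with the total task count before launching. The sketch below reads the standard Slurm job variables &amp;lt;code&amp;gt;SLURM_JOB_NUM_NODES&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;SLURM_NTASKS_PER_NODE&amp;lt;/code&amp;gt; (set only when &amp;lt;code&amp;gt;--ntasks-per-node&amp;lt;/code&amp;gt; is given) and &amp;lt;code&amp;gt;SLURM_NTASKS&amp;lt;/code&amp;gt;. The function name &amp;lt;code&amp;gt;check_layout&amp;lt;/code&amp;gt; is hypothetical. For the script above it should report 2 × 192 = 384 ranks.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical pre-launch check for an MPI job script.
# Compares nodes x tasks-per-node with the total task count from Slurm.
check_layout() {
  local nodes=${SLURM_JOB_NUM_NODES:-0}
  local per_node=${SLURM_NTASKS_PER_NODE:-0}
  local total=${SLURM_NTASKS:-0}
  local expected=$(( nodes * per_node ))
  if [ "$expected" -eq 0 ] || [ "$expected" -ne "$total" ]; then
    echo "check_layout: expected $expected tasks, Slurm reports $total" 1>/dev/stderr
    return 1
  fi
  echo "$total MPI ranks on $nodes nodes"
}

# In the job script, before mpirun/srun:
# check_layout || exit 1
```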
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
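&lt;br /&gt;
The script sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; directly from &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt;, which is unset outside a Slurm job. If you also run the same program for a quick test on a login node, a default keeps it within the 1–2 core guideline for login nodes. Here is a minimal sketch; &amp;lt;code&amp;gt;set_omp_threads&amp;lt;/code&amp;gt; is a hypothetical name.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: choose the OpenMP thread count for the current context.
# Inside a Slurm job, SLURM_CPUS_PER_TASK holds the --cpus-per-task value
# (192 in the script above); outside Slurm it is unset, so fall back to
# 2 threads, in line with the 1-2 core guideline for login-node tests.
set_omp_threads() {
  export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-2}
  echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
}

set_omp_threads
# ./openmp_example
```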
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
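&lt;br /&gt;
When scripting these commands, it helps to capture the job ID at submission time. &amp;lt;code&amp;gt;sbatch --parsable&amp;lt;/code&amp;gt; prints only the job ID (followed by &amp;lt;code&amp;gt;;&amp;lt;/code&amp;gt; and a cluster name if one is present), which can then be passed to &amp;lt;code&amp;gt;squeue -j&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;jobperf&amp;lt;/code&amp;gt;. Here is a minimal sketch; &amp;lt;code&amp;gt;submit_job&amp;lt;/code&amp;gt; is a hypothetical wrapper name.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical wrapper: submit a job script and print only its job ID.
# sbatch --parsable prints "jobid" or "jobid;cluster".
submit_job() {
  local out
  out=$(sbatch --parsable "$@") || return 1
  echo "${out%%;*}"
}

# Example (run from $SCRATCH on a login node):
# jobid=$(submit_job jobscript.sh)
# squeue -j "$jobid"
# squeue --start -j "$jobid"
```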
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6842</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6842"/>
		<updated>2025-08-07T21:36:16Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* Submitting Jobs on the CPU Subcluster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca, trillium-gpu.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and not ideal for training extremely large models (multi-100B parameters)&lt;br /&gt;
* To access it, SSH into &amp;lt;code&amp;gt;trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt; from outside, or into &amp;lt;code&amp;gt;trig-login01&amp;lt;/code&amp;gt; from other Trillium nodes.&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
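&lt;br /&gt;
As a consistency check, the totals in the infobox follow from this table. 1224 CPU nodes plus 60 GPU nodes give 1284 nodes, and 1224 × 192 + 60 × 96 = 240768 cores. The same arithmetic in shell form:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Node and core totals from the specifications table.
cpu_nodes=1224; cpu_cores_per_node=192
gpu_nodes=60;   gpu_cores_per_node=96

total_nodes=$(( cpu_nodes + gpu_nodes ))
total_cores=$(( cpu_nodes * cpu_cores_per_node + gpu_nodes * gpu_cores_per_node ))
echo "$total_nodes nodes ($total_cores cores)"   # prints: 1284 nodes (240768 cores)
```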
&lt;br /&gt;
Each node has 768 GB of RAM. Designed for large parallel workloads, the cluster has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a walltime of at least 15 minutes and at most 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet or your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, access to Trillium is done via [[SSH]] (secure shell) only and authentication is only allowed via SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use them securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, see [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is entirely direct-liquid-cooled using warm water (35–40 °C inlet temperature), which results in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
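&lt;br /&gt;
For the lightweight login-node tests described above, a small wrapper can help you stay within the rule of thumb. This is a minimal sketch, not a SciNet-provided tool: the function name &amp;lt;code&amp;gt;quick_test&amp;lt;/code&amp;gt; is hypothetical, and it only caps OpenMP threads at 2 and the run time at 5 minutes using the standard &amp;lt;code&amp;gt;timeout&amp;lt;/code&amp;gt; utility:&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# quick_test (hypothetical helper): run a short test on a login node with
# at most 2 OpenMP threads, killing it if it runs longer than 5 minutes.
quick_test() {
    # timeout exits with status 124 if the time limit is reached
    OMP_NUM_THREADS=2 timeout 5m "$@"
}
```
&lt;br /&gt;
For example, &amp;lt;code&amp;gt;quick_test ./openmp_example&amp;lt;/code&amp;gt;. The wrapper does not limit memory, and programs that do not use OpenMP ignore &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt;, so staying within the 1–2 GB and 1–2 core guidelines remains up to you.&lt;br /&gt;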
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details on interacting with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job into the queue. It will begin execution on available compute nodes when scheduled. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Users typically have both &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;rrg&amp;lt;/code&amp;gt; accounts. Jobs can run under your group's RRG allocation or, if one is not available, under a RAS allocation (previously called the &amp;quot;default&amp;quot; allocation). However, unless you explicitly specify the account using the &amp;lt;code&amp;gt;--account=ACCOUNT_NAME&amp;lt;/code&amp;gt; option in your job script or submission command, your job will most likely be charged to the &amp;lt;code&amp;gt;def&amp;lt;/code&amp;gt; account.&lt;br /&gt;
&lt;br /&gt;
If you want your job to use the RRG allocation, be sure to specify it explicitly.&lt;br /&gt;
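&lt;br /&gt;
A minimal sketch of a job script that selects the RRG allocation explicitly. The account name &amp;lt;code&amp;gt;rrg-yourpi&amp;lt;/code&amp;gt; is a hypothetical placeholder; use your group's actual account name. The last line records in the job output which account was charged, using the &amp;lt;code&amp;gt;SLURM_JOB_ACCOUNT&amp;lt;/code&amp;gt; variable that &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; sets inside the job:&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Hypothetical account name below: replace rrg-yourpi with your group's RRG account.
#SBATCH --account=rrg-yourpi
#SBATCH --nodes=1
#SBATCH --time=00:15:00
#SBATCH --job-name=rrg_job

# Slurm sets SLURM_JOB_ACCOUNT inside a job; outside one it is unset.
msg="Charged to account: ${SLURM_JOB_ACCOUNT:-not running under Slurm}"
echo "$msg"
```
&lt;br /&gt;
The same option can also be given on the command line instead, e.g. &amp;lt;code&amp;gt;sbatch --account=rrg-yourpi jobscript.sh&amp;lt;/code&amp;gt;.&lt;br /&gt;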
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler deduces which resources to allocate from the requested number of tasks and number of CPUs per node. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768GB &amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768GB is the amount of memory on each node.&lt;br /&gt;
* If you are running serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks in one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
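&lt;br /&gt;
A minimal sketch of bundling independent serial tasks into one job, using &amp;lt;code&amp;gt;xargs -P&amp;lt;/code&amp;gt; to keep up to one task per core running at a time (GNU Parallel is a common alternative). The function &amp;lt;code&amp;gt;run_one&amp;lt;/code&amp;gt; and the task count are hypothetical placeholders for your own workload:&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --job-name=serial_bundle

# Cores on the node: Slurm sets SLURM_CPUS_ON_NODE; 192 is the Trillium CPU node size.
NCORES=${SLURM_CPUS_ON_NODE:-192}
# Total number of independent tasks (hypothetical; set to suit your workload).
NTASKS=${NTASKS:-192}

# Placeholder for one serial task; replace the echo with your program,
# e.g. ./serial_app "input_$1.dat" (hypothetical names).
run_one() {
    echo "task $1 done"
}
export -f run_one   # make the function visible to the bash shells xargs starts

# Run tasks 1..$1 with at most $2 running at once; a new task starts as
# soon as a running one finishes.
run_bundle() {
    seq 1 "$1" | xargs -P "$2" -I{} bash -c 'run_one "$1"' _ {}
}

run_bundle "$NTASKS" "$NCORES"
```
&lt;br /&gt;
Bundling works best when the tasks take roughly the same time; otherwise cores sit idle while the last few tasks finish.&lt;br /&gt;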
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
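&lt;br /&gt;
The walltime bounds in the table can be checked before submitting. This is a minimal sketch with a hypothetical function name, assuming the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition limits above (15 minutes to 24 hours) and only the &amp;lt;code&amp;gt;HH:MM:SS&amp;lt;/code&amp;gt; form of &amp;lt;code&amp;gt;--time&amp;lt;/code&amp;gt;. Slurm itself accepts other time formats, which this helper does not handle:&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# check_compute_walltime (hypothetical helper): accept an HH:MM:SS walltime
# only if it lies within the compute partition limits of 15 min to 24 h.
check_compute_walltime() {
    local t=$1 h m s secs
    h=${t%%:*}          # text before the first colon
    t=${t#*:}
    m=${t%%:*}          # text between the two colons
    s=${t#*:}           # text after the second colon
    # 10# forces base 10, so values such as "08" are not read as octal
    secs=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
    if [ "$secs" -lt 900 ]; then
        echo "too short: minimum walltime is 15 minutes"
        return 1
    fi
    if [ "$secs" -gt 86400 ]; then
        echo "too long: maximum walltime is 24 hours"
        return 1
    fi
    echo "ok: $secs seconds"
}
```
&lt;br /&gt;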
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it has allocated such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
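&lt;br /&gt;
To confirm the layout described above, you could add a few lines like the following to the MPI script before the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; line. This is a sketch: &amp;lt;code&amp;gt;SLURM_JOB_NUM_NODES&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;SLURM_NTASKS_PER_NODE&amp;lt;/code&amp;gt; are set by Slurm inside the job, and the fallback values (2 and 192) are assumptions matching the example, so the snippet also runs outside a job:&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Report the MPI layout Slurm granted; the defaults mirror the example
# request (--nodes=2, --ntasks-per-node=192) for use outside a job.
nodes=${SLURM_JOB_NUM_NODES:-2}
per_node=${SLURM_NTASKS_PER_NODE:-192}
total=$(( nodes * per_node ))
echo "Nodes: $nodes, MPI ranks per node: $per_node, total MPI ranks: $total"
```
&lt;br /&gt;
With the example request, this reports 2 × 192 = 384 MPI ranks.&lt;br /&gt;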
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
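&lt;br /&gt;
If you also run this script outside a batch job (for example, as a quick test on a login node), &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt; is not set and &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; is exported as an empty string, leaving the thread count up to the OpenMP runtime. A minimal sketch of a safer version of the &amp;lt;code&amp;gt;export&amp;lt;/code&amp;gt; line, assuming a fallback of 1 thread is what you want outside a job:&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Use Slurm's CPU count inside a job; fall back to 1 thread elsewhere,
# in line with the 1-2 core guideline for login-node tests.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "OpenMP threads: $OMP_NUM_THREADS"
```
&lt;br /&gt;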
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
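&lt;br /&gt;
These commands can also be combined in scripts. A minimal sketch of a hypothetical helper, &amp;lt;code&amp;gt;wait_for_job&amp;lt;/code&amp;gt;, that polls &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; until a job has left the queue and then prints a short &amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; summary. The polling interval and the chosen &amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; fields are assumptions you can adjust; a generous interval (a minute or more) avoids loading the scheduler with repeated queries:&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# wait_for_job (hypothetical helper): poll squeue until the job leaves the
# queue, then show its accounting record.
# Usage: wait_for_job JOBID [INTERVAL_SECONDS]
wait_for_job() {
    local jobid=$1 interval=${2:-60} state
    while true; do
        # -h: no header; -o "%T": print only the job state (PENDING, RUNNING, ...)
        state=$(squeue -h -j "$jobid" -o "%T" 2>/dev/null)
        if [ -z "$state" ]; then
            break   # no longer listed: finished, cancelled, or unknown ID
        fi
        echo "Job $jobid is $state"
        sleep "$interval"
    done
    sacct -j "$jobid" --format=JobID,JobName,Elapsed,State,ExitCode
}
```
&lt;br /&gt;
For example, run &amp;lt;code&amp;gt;wait_for_job 123456 120&amp;lt;/code&amp;gt; on a login node, where 123456 is a placeholder job ID.&lt;br /&gt;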
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6833</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6833"/>
		<updated>2025-08-07T21:16:28Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and not ideal for training extremely large models (multi-100B parameters)&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node of the cluster has 768 GB of RAM. Because it is designed for large parallel workloads, the cluster has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a walltime of at least 15 minutes and at most 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled. If you are new to SciNet or your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can only be accessed via [[SSH]] (secure shell), and authentication is only allowed via SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use them securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (e.g., using [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXterm] on Windows), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is entirely direct-liquid-cooled using warm water (35–40 °C inlet temperature), which results in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details on how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job in the queue, and it will begin running on compute nodes once the scheduler allocates them. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if one is not available, under a RAS allocation (previously called the “default” allocation).&lt;br /&gt;
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
== Key Points to Remember ==&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler works out which resources to allocate from the requested number of tasks and number of CPUs per node. Trillium works differently:&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768GB &amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
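&lt;br /&gt;
As a minimal sketch of such bundling, assuming 192 independent single-core tasks of roughly equal length, a job script can start every task in the background and then wait for all of them to finish. The program &amp;lt;code&amp;gt;./serial_task&amp;lt;/code&amp;gt; and the input and output file names are hypothetical placeholders for your own workflow.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=serial_bundle&lt;br /&gt;
#SBATCH --output=serial_bundle_%j.txt&lt;br /&gt;
&lt;br /&gt;
# Start in the submission directory (falls back to the current directory outside Slurm).&lt;br /&gt;
cd &amp;quot;${SLURM_SUBMIT_DIR:-.}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# One task per core on a Trillium CPU node.&lt;br /&gt;
NTASKS=192&lt;br /&gt;
&lt;br /&gt;
for i in $(seq 1 &amp;quot;$NTASKS&amp;quot;); do&lt;br /&gt;
    # Hypothetical program and file names: replace with your own.&lt;br /&gt;
    # The trailing &amp;amp; runs each task in the background.&lt;br /&gt;
    ./serial_task &amp;quot;input_${i}.dat&amp;quot; &amp;gt; &amp;quot;output_${i}.txt&amp;quot; &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Wait for all background tasks; if the script exited now, Slurm would&lt;br /&gt;
# end the job and kill any tasks still running.&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Because the node stays allocated until the last task finishes, the walltime must cover the slowest task, and tasks of very different lengths leave cores idle; the page linked above describes further options for such cases.&lt;br /&gt;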
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
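&lt;br /&gt;
The partition can also be requested inside the job script with an &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; directive, where &amp;lt;code&amp;gt;--partition=debug&amp;lt;/code&amp;gt; is the long form of &amp;lt;code&amp;gt;-p debug&amp;lt;/code&amp;gt;. The sketch below assumes that batch submission to the &amp;lt;code&amp;gt;debug&amp;lt;/code&amp;gt; partition is permitted within the limits in the table; the job and output names are arbitrary.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=debug&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=00:30:00&lt;br /&gt;
#SBATCH --job-name=debug_test&lt;br /&gt;
#SBATCH --output=debug_test_%j.txt&lt;br /&gt;
&lt;br /&gt;
# Slurm sets SLURM_JOB_PARTITION inside a job; &amp;quot;none&amp;quot; is a fallback for runs outside Slurm.&lt;br /&gt;
PARTITION=&amp;quot;${SLURM_JOB_PARTITION:-none}&amp;quot;&lt;br /&gt;
echo &amp;quot;Running on $(hostname) in partition: ${PARTITION}&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;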
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks (384 MPI processes in total), for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processes per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
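&lt;br /&gt;
These commands are often used together. The function below is a hypothetical convenience helper, not a SciNet-provided tool: it submits a script with &amp;lt;code&amp;gt;sbatch --parsable&amp;lt;/code&amp;gt;, which prints only the job ID (followed by &amp;lt;code&amp;gt;;&amp;lt;/code&amp;gt; and the cluster name, if one is present), so the ID can be passed straight to the monitoring commands listed above.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Hypothetical helper (not a SciNet tool): submit a job script and print its job ID.&lt;br /&gt;
submit_and_track() {&lt;br /&gt;
    local out&lt;br /&gt;
    # --parsable prints &amp;quot;jobid&amp;quot; or &amp;quot;jobid;cluster&amp;quot; instead of a full sentence.&lt;br /&gt;
    out=$(sbatch --parsable &amp;quot;$1&amp;quot;) || return 1&lt;br /&gt;
    # Strip an optional &amp;quot;;cluster&amp;quot; suffix, keeping only the job ID.&lt;br /&gt;
    echo &amp;quot;${out%%;*}&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Example usage on a login node:&lt;br /&gt;
#   jobid=$(submit_and_track mpi_job.sh)&lt;br /&gt;
#   squeue -j &amp;quot;$jobid&amp;quot;&lt;br /&gt;
#   jobperf &amp;quot;$jobid&amp;quot;&lt;br /&gt;
#   sacct -j &amp;quot;$jobid&amp;quot; --format=JobID,JobName,Elapsed,State,ExitCode&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;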
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6827</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6827"/>
		<updated>2025-08-07T20:55:33Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~235,000 cores across 1224 homogeneous CPU nodes&lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and is not ideal for training extremely large models (hundreds of billions of parameters)&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
Trillium consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache per CPU ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because it is designed for large parallel workloads, the cluster has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that accepts jobs with walltimes from a minimum of 15 minutes to a maximum of 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
 &lt;br /&gt;
If you are new to SciNet or your supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium is accessed only via [[SSH]] (secure shell), and authentication is allowed only with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to make sure you use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, see [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_Jobs_on_the_CPU_Subcluster | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime, you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details on how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job in the queue, and it will start on compute nodes once the scheduler allocates them. Note that jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
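If you want a job script to warn you when it was not submitted from &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;, a small check like the following can go near its top. This is a sketch rather than a SciNet-provided tool: the function name &amp;lt;code&amp;gt;check_submit_dir&amp;lt;/code&amp;gt; is hypothetical, and it assumes that &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; is defined in the job environment and that SLURM sets &amp;lt;code&amp;gt;SLURM_SUBMIT_DIR&amp;lt;/code&amp;gt; to the directory where &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; was run.&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: status 0 if the directory is $SCRATCH itself or lies
# underneath it, status 1 if it does not, status 2 if $SCRATCH is not set.
check_submit_dir() {
    # Directory to check: the first argument, or SLURM's submit directory.
    local dir=${1:-$SLURM_SUBMIT_DIR}
    [ -n "$SCRATCH" ] || return 2
    # Strip a trailing slash so "/scratch/u/user/" and "/scratch/u/user" agree.
    local base=${SCRATCH%/}
    case "$dir" in
        "$base" | "$base"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative use near the top of a job script:
# check_submit_dir || echo "Warning: this job was not submitted from under \$SCRATCH"
```
The prefix is matched together with a following slash, so a sibling directory such as &amp;lt;code&amp;gt;/scratch/u/username&amp;lt;/code&amp;gt; is not mistaken for a subdirectory of &amp;lt;code&amp;gt;/scratch/u/user&amp;lt;/code&amp;gt;.&lt;br /&gt;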
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if one is not available, under a RAS allocation (previously called the “default” allocation).&lt;br /&gt;
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
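Because whole nodes are allocated, the resources a job receives follow directly from the node count. The illustrative function below (its name is hypothetical) only does that arithmetic with the per-node figures listed above:&lt;br /&gt;
```shell
#!/bin/bash
# Illustrative arithmetic only: resources a CPU job receives when it
# requests N whole nodes (192 cores and 768 GB of memory per node).
CORES_PER_NODE=192
MEM_GB_PER_NODE=768

node_resources() {
    local nodes=$1
    local cores=$(( nodes * CORES_PER_NODE ))
    local mem_gb=$(( nodes * MEM_GB_PER_NODE ))
    # Memory per core does not depend on N: 768 / 192 = 4 GB.
    local mem_per_core=$(( MEM_GB_PER_NODE / CORES_PER_NODE ))
    echo "nodes=$nodes cores=$cores mem_gb=$mem_gb mem_gb_per_core=$mem_per_core"
}

node_resources 2
# prints: nodes=2 cores=384 mem_gb=1536 mem_gb_per_core=4
```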
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler works out which resources to allocate from the requested number of tasks and CPUs. On Trillium, things work a bit differently.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768 GB&amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768 GB is the amount of memory on each node.&lt;br /&gt;
* If you are running serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
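As one possible way to keep all 192 cores busy with independent serial tasks, the sketch below starts them with &amp;lt;code&amp;gt;xargs -P&amp;lt;/code&amp;gt;, which runs a fixed number of commands concurrently. The linked page describes the recommended approaches in more detail; &amp;lt;code&amp;gt;run_bundle&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;./serial_task&amp;lt;/code&amp;gt; are hypothetical names used only for illustration.&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: run NTASKS independent invocations of a command, at
# most NPARALLEL at a time, appending the task index (1..NTASKS) as the last
# argument of each invocation. GNU xargs exits with status 123 if any
# invocation fails, so errors are not silently lost.
run_bundle() {
    local ntasks=$1 nparallel=$2
    shift 2
    seq 1 "$ntasks" | xargs -P "$nparallel" -I{} "$@" {}
}

# Inside a one-node job script, e.g. 1000 tasks with 192 running at once
# (./serial_task is a placeholder for your own program):
#   run_bundle 1000 192 ./serial_task
```
Setting the concurrency to 192 rather than to the total number of tasks means a new task starts as soon as one finishes, which helps when task run times vary.&lt;br /&gt;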
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
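The walltime bounds of the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition (15 minutes to 24 hours) can be checked before a script is submitted. The helpers below are a hypothetical sketch that only understands walltimes written as &amp;lt;code&amp;gt;HH:MM:SS&amp;lt;/code&amp;gt;; SLURM also accepts other forms, such as &amp;lt;code&amp;gt;D-HH:MM:SS&amp;lt;/code&amp;gt;, which this sketch does not handle.&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helpers; only the HH:MM:SS form of a walltime is handled.

# Convert HH:MM:SS to seconds. The 10# prefix forces base 10 so that
# fields with leading zeros, such as "08" or "09", are not read as octal.
walltime_to_seconds() {
    local IFS=:
    set -- $1
    echo $(( 10#$1 * 3600 + 10#$2 * 60 + 10#$3 ))
}

# Succeed if the walltime is within the compute partition's limits:
# at least 15 minutes (900 s) and at most 24 hours (86400 s).
check_compute_walltime() {
    local secs
    secs=$(walltime_to_seconds "$1")
    [ "$secs" -ge 900 ] || return 1
    [ "$secs" -le 86400 ]
}

# Example: check_compute_walltime 01:00:00 || echo "walltime outside compute limits"
```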
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once SLURM allocates such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--ntasks-per-node=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--ntasks-per-node=384&amp;lt;/code&amp;gt;, and add &amp;lt;code&amp;gt;--bind-to none&amp;lt;/code&amp;gt; to the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; command.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
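To confirm that the job launches the number of MPI processes you expect, the script can print it before calling &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt;. This sketch assumes the job was submitted with &amp;lt;code&amp;gt;--ntasks-per-node&amp;lt;/code&amp;gt;, in which case SLURM sets &amp;lt;code&amp;gt;SLURM_NTASKS_PER_NODE&amp;lt;/code&amp;gt; alongside &amp;lt;code&amp;gt;SLURM_JOB_NUM_NODES&amp;lt;/code&amp;gt;; the function name &amp;lt;code&amp;gt;expected_ranks&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: total number of MPI ranks implied by the allocation
# (nodes x tasks per node). Defaults of 1 apply outside a SLURM job.
expected_ranks() {
    echo $(( ${SLURM_JOB_NUM_NODES:-1} * ${SLURM_NTASKS_PER_NODE:-1} ))
}

echo "Launching $(expected_ranks) MPI ranks"
# With --nodes=2 and --ntasks-per-node=192 this reports 384 ranks;
# with --ntasks-per-node=384 (hyperthreading) it reports 768.
```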
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--cpus-per-task=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--cpus-per-task=384&amp;lt;/code&amp;gt;.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
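SLURM only sets &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt; when &amp;lt;code&amp;gt;--cpus-per-task&amp;lt;/code&amp;gt; is requested, so if the same commands are tried outside such a job (for example on a login node), the export above leaves &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; empty and the thread count is usually left to the OpenMP runtime's default. A slightly more defensive variant is sketched below; the single-thread fallback is an assumption chosen to stay within the login-node guidelines, and &amp;lt;code&amp;gt;set_omp_threads&amp;lt;/code&amp;gt; is a hypothetical name.&lt;br /&gt;
```shell
#!/bin/bash
# Use SLURM's per-task CPU count inside a job, but fall back to a single
# thread when SLURM_CPUS_PER_TASK is unset or empty (e.g., on a login node).
set_omp_threads() {
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
}

set_omp_threads
echo "Running with OMP_NUM_THREADS=$OMP_NUM_THREADS"
```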
&lt;br /&gt;
== Monitoring Queued and Running Jobs ==&lt;br /&gt;
&lt;br /&gt;
Once your job is submitted to the queue, you can monitor its status and performance using the following SLURM commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; shows all jobs in the queue. Use &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; to view only your jobs.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;!-- &amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sqc&amp;lt;/code&amp;gt; is a SciNet-specific, faster version of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; that shows a cached snapshot of the queue.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt; --&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue -j JOBID&amp;lt;/code&amp;gt; shows the current status of a specific job. Alternatively, use &amp;lt;code&amp;gt;scontrol show job JOBID&amp;lt;/code&amp;gt; for detailed information, including allocated nodes, resources, and job flags.&amp;lt;/p&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;squeue --start -j JOBID&amp;lt;/code&amp;gt; gives a rough estimate of when a pending job is expected to start. Note that this estimate is often inaccurate and can change depending on system load and priorities.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;scancel JOBID&amp;lt;/code&amp;gt; cancels a job you submitted.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;jobperf JOBID&amp;lt;/code&amp;gt; gives a live snapshot of the CPU and memory usage of your job while it is running.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;sacct&amp;lt;/code&amp;gt; shows information about your past jobs, including start time, run time, node usage, and exit status.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
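To see at a glance how many of your jobs are pending or running, for instance when comparing against the per-partition limits listed earlier, you can tally the state column of &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt;. The sketch below relies on the standard &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; options &amp;lt;code&amp;gt;-h&amp;lt;/code&amp;gt; (omit the header) and &amp;lt;code&amp;gt;-o &amp;quot;%T&amp;quot;&amp;lt;/code&amp;gt; (print the job state); &amp;lt;code&amp;gt;summarize_states&amp;lt;/code&amp;gt; is a hypothetical name.&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: read one job state per line on stdin and print
# "STATE COUNT" lines, sorted by state name. Blank lines are skipped.
summarize_states() {
    awk 'NF { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Typical use on a login node:
#   squeue -h -u "$USER" -o "%T" | summarize_states
```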
&lt;br /&gt;
More details on monitoring jobs can be found on the [[Slurm#Monitoring_jobs | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You can also view and manage your current and past jobs, resource usage, and allocation history through the [https://my.scinet.utoronto.ca my.SciNet] portal.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6824</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6824"/>
		<updated>2025-08-07T20:48:38Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: Undo revision 6821 by Afedosee (talk)&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and not ideal for training extremely large models (multi-100B parameters)&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
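As a quick consistency check, the node and core totals given in the infobox follow from the two rows of this table:&lt;br /&gt;
```latex
N_{\text{nodes}} = 1224 + 60 = 1284, \qquad
N_{\text{cores}} = 1224 \times 192 + 60 \times 96 = 235\,008 + 5\,760 = 240\,768
```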
&lt;br /&gt;
Each node of the cluster has 768 GB of RAM. Designed for large parallel workloads, the cluster has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a walltime of at least 15 minutes and at most 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
 &lt;br /&gt;
If you are new to SciNet or your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium is accessed via [[SSH]] (secure shell) only, and authentication is allowed only with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, see e.g. [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_Jobs_on_the_CPU_Subcluster | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
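To keep a quick test on a login node within the guidelines listed at the start of this section (a few minutes, 1–2 GB of memory, 1–2 cores), you can wrap it with a time limit and a memory cap. This is a hypothetical helper, not a SciNet tool: it assumes GNU coreutils &amp;lt;code&amp;gt;timeout&amp;lt;/code&amp;gt; is available, bash's &amp;lt;code&amp;gt;ulimit -v&amp;lt;/code&amp;gt; caps virtual rather than resident memory (so the 2 GB limit is only approximate), and limiting OpenMP threads does not restrict other kinds of multithreading.&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical wrapper: run a command with a wall-clock limit (passed to
# timeout, e.g. 5m), a ~2 GiB virtual-memory cap, and at most 2 OpenMP threads.
# The limits apply only inside the subshell, not to your login session.
login_test() {
    local limit=$1
    shift
    (
        ulimit -v 2097152              # 2 GiB, expressed in KiB
        export OMP_NUM_THREADS=2
        exec timeout "$limit" "$@"     # exit status 124 means the limit was hit
    )
}

# Example: login_test 5m ./my_program input.dat
```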
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details on interacting with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job in the queue, and it will start on compute nodes once the scheduler allocates them. Note that jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if one is not available, under a RAS allocation (previously called the “default” allocation).&lt;br /&gt;
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler works out which resources to allocate from the requested number of tasks and CPUs. On Trillium, things work a bit differently.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768GB&amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
&lt;br /&gt;
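A minimal sketch of such bundling follows. It uses the standard &amp;lt;code&amp;gt;xargs -P&amp;lt;/code&amp;gt; option to start one copy of a serial program per core on a single node, and the job ends once every copy has exited. The program &amp;lt;code&amp;gt;serial_task&amp;lt;/code&amp;gt;, its single integer argument, and the job name are hypothetical placeholders. The linked page describes the recommended approaches in more detail.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=serial_bundle&lt;br /&gt;
#SBATCH --output=serial_bundle_%j.txt&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
&lt;br /&gt;
# Run ./serial_task 1 through ./serial_task 192 (hypothetical program),&lt;br /&gt;
# keeping up to 192 copies running at once, i.e. one per core.&lt;br /&gt;
# xargs returns only after every copy has finished, and exits with&lt;br /&gt;
# status 123 if any copy failed.&lt;br /&gt;
seq 1 192 | xargs -P 192 -I{} ./serial_task {}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This approach works best when the tasks take roughly the same time. If their run times vary widely, cores sit idle while the slowest tasks finish.&lt;br /&gt;
&lt;br /&gt;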
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. These limits depend on whether you belong to a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Groups (RRG) allocation], and on the &amp;quot;partition&amp;quot; in which the job runs. (&amp;quot;Partitions&amp;quot; are SLURM's term for classes of jobs with different use cases.) You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; option to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;. If you do not specify one, your job runs in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs may still have to wait in the queue. The waiting time depends on many factors, such as your group's allocation amount, how much of that allocation has been used recently, the number of requested nodes and the walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
&lt;br /&gt;
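For example, you could run a short batch test in the &amp;lt;code&amp;gt;debug&amp;lt;/code&amp;gt; partition instead of the default &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition by passing &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; on the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command line. Options given on the command line override the matching &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; lines in the script. The script name and the 30-minute walltime are illustrative assumptions, chosen to stay within the debug limits in the table above.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch -p debug --nodes=1 --time=00:30:00 jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;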
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processes per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once the nodes are allocated, SLURM runs the script, which:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM tells &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to start).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--ntasks-per-node=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--ntasks-per-node=384&amp;lt;/code&amp;gt;, and add &amp;lt;code&amp;gt;--bind-to none&amp;lt;/code&amp;gt; to the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; command.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
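&lt;br /&gt;
Applying the hyperthreading change described above to this example gives a script along the following lines. This is a sketch rather than a tested recipe. Whether 384 ranks per node run faster than 192 depends on the application, so compare the run times of both versions before relying on this one for production work.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# 384 tasks per node: two MPI ranks per physical core (hyperthreading)&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=384&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_ht_job&lt;br /&gt;
#SBATCH --output=mpi_ht_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
# Do not bind ranks to cores, as required when running 384 ranks per node&lt;br /&gt;
mpirun --bind-to none ./mpi_example&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;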
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--cpus-per-task=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--cpus-per-task=384&amp;lt;/code&amp;gt;.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6821</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6821"/>
		<updated>2025-08-07T20:46:48Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* Example submission script (MPI) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and not ideal for training extremely large models (multi-100B parameters)&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a walltime of at least 15 minutes and at most 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or if your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can be accessed only via [[SSH]] (secure shell), and authentication is allowed only with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to make sure you use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, you can use [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
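For example, the following sequence starts from a clean environment, loads the toolchain used in the MPI example job script further down this page with explicit versions, and then lists what is loaded. The versions shown are simply those from that example. Use &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; to check which versions are available to you.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Start from a clean environment&lt;br /&gt;
module purge&lt;br /&gt;
# Load a standard environment, compiler and MPI library, with explicit versions&lt;br /&gt;
module load StdEnv/2023 gcc/12.3 openmpi/4.1.5&lt;br /&gt;
# Confirm which modules are now loaded&lt;br /&gt;
module list&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;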
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us, and we will assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details on interacting with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job in the queue, and it starts running once the scheduler allocates compute nodes to it. Note: jobs must be submitted from a login node; submitting from the datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if one is not available, under a RAS allocation (previously called the “default” allocation).&lt;br /&gt;
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum walltime of 24 hours.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler works out which resources to allocate from the requested number of tasks and CPUs. On Trillium, things are different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768GB&amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. These limits depend on whether you belong to a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Groups (RRG) allocation], and on the &amp;quot;partition&amp;quot; in which the job runs. (&amp;quot;Partitions&amp;quot; are SLURM's term for classes of jobs with different use cases.) You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; option to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;. If you do not specify one, your job runs in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs may still have to wait in the queue. The waiting time depends on many factors, such as your group's allocation amount, how much of that allocation has been used recently, the number of requested nodes and the walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
&lt;br /&gt;
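For example, you could run a short batch test in the &amp;lt;code&amp;gt;debug&amp;lt;/code&amp;gt; partition instead of the default &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition by passing &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; on the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command line. Options given on the command line override the matching &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; lines in the script. The script name and the 30-minute walltime are illustrative assumptions, chosen to stay within the debug limits in the table above.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch -p debug --nodes=1 --time=00:30:00 jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;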
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processes per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once the nodes are allocated, SLURM runs the script, which:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM tells &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to start).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--ntasks-per-node=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--ntasks-per-node=384&amp;lt;/code&amp;gt;, and add &amp;lt;code&amp;gt;--bind-to none&amp;lt;/code&amp;gt; to the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; command.&amp;lt;/li&amp;gt;&lt;br /&gt;
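&lt;br /&gt;
The number of MPI processes is the number of nodes times the tasks per node. The script above therefore starts 2 x 192 = 384 processes, or 768 with hyperthreading. The sketch below only illustrates that arithmetic. It reads &amp;lt;code&amp;gt;SLURM_NNODES&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;SLURM_NTASKS_PER_NODE&amp;lt;/code&amp;gt;, which SLURM sets inside a job. The fallback values used outside a job are illustrative assumptions.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: total number of MPI processes a job will start.
# SLURM sets SLURM_NNODES in every job and SLURM_NTASKS_PER_NODE when
# --ntasks-per-node is given; the fallbacks (1 node, 192 tasks) are
# illustrative assumptions for running this outside a job.
total_mpi_ranks() {
    local nodes=${SLURM_NNODES:-1}
    local per_node=${SLURM_NTASKS_PER_NODE:-192}
    echo $(( nodes * per_node ))
}

echo "MPI processes: $(total_mpi_ranks)"
```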
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--cpus-per-task=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--cpus-per-task=384&amp;lt;/code&amp;gt;.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6818</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6818"/>
		<updated>2025-08-07T20:44:44Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and not ideal for training extremely large models (multi-100B parameters)&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a minimum walltime of 15 minutes and a maximum of 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or if your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page of the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can be accessed only via [[SSH]] (secure shell), and only SSH-key authentication is allowed. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to make sure you use your keys securely.&lt;br /&gt;
&lt;br /&gt;
Open a terminal window (on Windows, you can use [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
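&lt;br /&gt;
These tips can be combined so that a job script pins its module versions in one place and starts from a clean environment. This is a minimal sketch. The versions are the ones used in the example job scripts below, and the name &amp;lt;code&amp;gt;load_pinned_modules&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Pinned module versions, kept in one place for reproducibility.
# (Versions taken from the example job scripts; adjust to your needs.)
PINNED_MODULES="StdEnv/2023 gcc/12.3 openmpi/4.1.5"

# Hypothetical helper: start from a clean environment, then load
# exactly the pinned versions instead of relying on system defaults.
load_pinned_modules() {
    module purge
    # Unquoted on purpose so each module name becomes a separate argument.
    module load $PINNED_MODULES
}

# In a job script, call it before running your program:
#   load_pinned_modules
```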
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools are available system-wide, such as compilers, math libraries, and debuggers.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
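&lt;br /&gt;
For reference, PUE (power usage effectiveness) is the ratio of the total power drawn by the facility to the power delivered to the computing equipment. Overhead for cooling and power distribution is therefore (PUE - 1) times the IT load. The worked example below assumes an IT load of 1 MW. Under that assumption, a PUE below 1.03 means less than 30 kW of overhead.&lt;br /&gt;
&lt;br /&gt;
```latex
\mathrm{PUE} = \frac{P_{\text{facility}}}{P_{\text{IT}}}, \qquad
P_{\text{facility}} - P_{\text{IT}} = (\mathrm{PUE} - 1)\,P_{\text{IT}}
  = (1.03 - 1) \times 1\,\mathrm{MW} = 30\,\mathrm{kW}
```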
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
Trillium uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job into the queue. It will begin execution on available compute nodes when scheduled. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
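&lt;br /&gt;
A job script can also guard against running from the wrong place by checking that it was submitted from under &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. The sketch below is a hypothetical helper; &amp;lt;code&amp;gt;in_scratch&amp;lt;/code&amp;gt; is not a SciNet tool. It uses &amp;lt;code&amp;gt;SLURM_SUBMIT_DIR&amp;lt;/code&amp;gt;, which SLURM sets in batch jobs, and falls back to the current directory otherwise.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical guard for the top of a job script: succeed only if the
# directory (default: the SLURM submission directory, else the current
# one) is $SCRATCH itself or lies below it.
in_scratch() {
    local dir=${1:-${SLURM_SUBMIT_DIR:-$PWD}}
    # With $SCRATCH unset, the "$SCRATCH"/* pattern would match every path.
    [ -n "$SCRATCH" ] || return 1
    case "$dir" in
        "$SCRATCH" | "$SCRATCH"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage in a job script:
#   if ! in_scratch; then echo "Please submit this job from \$SCRATCH"; exit 1; fi
```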
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if one is not available, under a RAS allocation (previously called the “default” allocation).&lt;br /&gt;
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
=== Key Points to Remember ===&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
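&lt;br /&gt;
Because resources are allocated in whole nodes, it helps to work out how many nodes a given number of tasks occupies and how much memory comes with them. The sketch below is a hypothetical Bash helper, not a SciNet tool. It assumes the 192 cores and 768 GB per CPU node listed above.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper (not a SciNet-provided tool): how many whole
# Trillium CPU nodes does a job with a given number of tasks occupy?
# Assumes 192 cores and 768 GB of memory per CPU node.
CORES_PER_NODE=192
MEM_GB_PER_NODE=768

nodes_for_tasks() {
    local ntasks=$1
    # Integer ceiling division: a partially used node is still a full node.
    echo $(( (ntasks + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

mem_gb_for_nodes() {
    # Memory requests are ignored; a job simply receives N x 768 GB.
    echo $(( $1 * MEM_GB_PER_NODE ))
}

n=$(nodes_for_tasks 500)
echo "nodes=$n memory_gb=$(mem_gb_for_nodes "$n")"
```
&lt;br /&gt;
For example, 500 tasks occupy 3 nodes (576 cores), so 76 cores would sit idle unless the work is rearranged.&lt;br /&gt;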
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler deduces which resources to allocate from the requested number of tasks and CPUs per node. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768GB &amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core jobs, you must still use all 192 cores on the node by bundling multiple independent tasks in one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
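&lt;br /&gt;
One common way to keep all cores of a node busy with independent serial work is to let &amp;lt;code&amp;gt;xargs -P&amp;lt;/code&amp;gt; run many copies of a program in parallel inside a single job. This is a minimal sketch under stated assumptions. The program &amp;lt;code&amp;gt;./serial_task&amp;lt;/code&amp;gt; is hypothetical and takes a task index as its argument. &amp;lt;code&amp;gt;run_bundle&amp;lt;/code&amp;gt; is an illustrative helper, not a SciNet tool.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Minimal sketch: bundle many independent serial tasks into one node-sized job.
# Each task is run as: COMMAND... INDEX
run_bundle() {
    local ntasks=$1 nparallel=$2
    shift 2
    # seq prints task indices 1..ntasks; xargs keeps up to nparallel tasks
    # running at once, appends the index to the command, and returns only
    # after every task has finished.
    seq 1 "$ntasks" | xargs -P "$nparallel" -I{} "$@" {}
}

# Inside a job script (hypothetical program name):
#   run_bundle 384 192 ./serial_task
```
&lt;br /&gt;
Because &amp;lt;code&amp;gt;xargs&amp;lt;/code&amp;gt; starts a new task as soon as one finishes, tasks of uneven length are spread across the 192 slots without further bookkeeping.&lt;br /&gt;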
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits on the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. These limits depend on whether you belong to a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation], and on the &amp;quot;partition&amp;quot; in which the job runs. &amp;quot;Partitions&amp;quot; are SLURM's term for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;; if you do not specify one, your job runs in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
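&lt;br /&gt;
Before submitting, you can sanity-check a request against the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition limits in the table above. The sketch below is a hypothetical Bash check, not a SciNet tool. It only understands &amp;lt;code&amp;gt;HH:MM:SS&amp;lt;/code&amp;gt; walltimes, although SLURM accepts other formats. It assumes the default 20-node maximum unless you pass a larger one. The scheduler itself remains the authority on what is accepted.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical pre-submission check against the compute partition limits:
# 1 to MAX_NODES nodes, and 15 minutes to 24 hours of walltime.
# Usage: check_compute_request NODES HH:MM:SS [MAX_NODES]
check_compute_request() {
    local nodes=$1 walltime=$2 max_nodes=${3:-20}
    local h=${walltime%%:*} rest=${walltime#*:}
    local m=${rest%%:*} s=${rest#*:}
    # 10# forces base 10 so that values such as "08" are not read as octal.
    local secs=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))

    if [ "$nodes" -lt 1 ] || [ "$nodes" -gt "$max_nodes" ]; then
        echo "nodes must be between 1 and $max_nodes"
        return 1
    fi
    if [ "$secs" -lt 900 ] || [ "$secs" -gt 86400 ]; then
        echo "walltime must be between 15 minutes and 24 hours"
        return 1
    fi
    echo "ok"
}

check_compute_request 2 01:00:00
```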
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processes per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such nodes are allocated, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--ntasks-per-node=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--ntasks-per-node=384&amp;lt;/code&amp;gt;, and add &amp;lt;code&amp;gt;--bind-to none&amp;lt;/code&amp;gt; to the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; command.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Example submission script (OpenMP) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=openmp_job&lt;br /&gt;
#SBATCH --output=openmp_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
&lt;br /&gt;
./openmp_example&lt;br /&gt;
# or &amp;quot;srun ./openmp_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch openmp_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a Bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;openmp_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for one node with 192 CPUs for a single task running up to 192 OpenMP threads, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once such a node is allocated, it runs the script:&lt;br /&gt;
  &amp;lt;ul&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Sets &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt; based on SLURM’s CPU allocation;&amp;lt;/li&amp;gt;&lt;br /&gt;
    &amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;openmp_example&amp;lt;/code&amp;gt; application.&amp;lt;/li&amp;gt;&lt;br /&gt;
  &amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--cpus-per-task=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--cpus-per-task=384&amp;lt;/code&amp;gt;.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6815</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6815"/>
		<updated>2025-08-07T20:39:38Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* Example submission script (MPI) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and is not well suited to training extremely large models (hundreds of billions of parameters)&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster is a large cluster consisting of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Designed for large parallel workloads, the cluster has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a walltime of at least 15 minutes and at most 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
 &lt;br /&gt;
If you are new to SciNet, or if your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check whether you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to request access as described above.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can only be accessed via [[SSH]] (secure shell), and authentication is only allowed with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and for guidance on using your keys securely.&lt;br /&gt;
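&lt;br /&gt;
If you do not yet have a key pair, the commands below sketch one way to create one on a Linux or macOS machine. The Ed25519 key type and the file name &amp;lt;code&amp;gt;~/.ssh/id_ed25519_trillium&amp;lt;/code&amp;gt; are illustrative assumptions only. The SSH keys page linked above is the authoritative guide, including how to register the public key with your {{Alliance}} account.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Generate an Ed25519 key pair; you will be prompted for a passphrase&lt;br /&gt;
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_trillium&lt;br /&gt;
&lt;br /&gt;
# Print the public key (the .pub file); only this half is ever shared&lt;br /&gt;
cat ~/.ssh/id_ed25519_trillium.pub&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The private key (the file without &amp;lt;code&amp;gt;.pub&amp;lt;/code&amp;gt;) is the one you pass to &amp;lt;code&amp;gt;ssh -i&amp;lt;/code&amp;gt; when logging in.&lt;br /&gt;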
 &lt;br /&gt;
Open a terminal window (on Windows, see e.g. [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_Jobs_on_the_CPU_Subcluster | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime, you can access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
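&lt;br /&gt;
As an illustration, the following session starts from a clean environment and loads the compiler and MPI library used in the MPI example script further down this page. These versions are taken from that script and may not be the current defaults; check &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; for what is installed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Unload all currently loaded modules&lt;br /&gt;
module purge&lt;br /&gt;
&lt;br /&gt;
# Load the standard environment, a compiler and an MPI library, with explicit versions&lt;br /&gt;
module load StdEnv/2023 gcc/12.3 openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
# Show what is now loaded (typing ml on its own does the same)&lt;br /&gt;
module list&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;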
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
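&lt;br /&gt;
For example, the following commands sketch how to look up the Open MPI module used in the MPI example script below. The exact output depends on the installed software stack.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# List all installed versions of Open MPI&lt;br /&gt;
module spider openmpi&lt;br /&gt;
&lt;br /&gt;
# Show which modules must be loaded first to make openmpi/4.1.5 available&lt;br /&gt;
module spider openmpi/4.1.5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;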
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Commercial tools available to everyone. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us and we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct-liquid-cooled with warm water (35–40 °C inlet temperature), which enables:&lt;br /&gt;
&lt;br /&gt;
* A power usage effectiveness (PUE) below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
Trillium uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job into the queue. It will begin execution on available compute nodes when scheduled. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if one is not available, under a RAS allocation (previously called the “default” allocation).&lt;br /&gt;
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
== Key Points to Remember ==&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler works out which resources to allocate from the requested number of tasks and CPUs per task. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768GB &amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still make use of all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
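&lt;br /&gt;
A minimal sketch of such a bundled job is shown below. It assumes a hypothetical serial program &amp;lt;code&amp;gt;./serial_task&amp;lt;/code&amp;gt; that processes one input file per run, with hypothetical input files named &amp;lt;code&amp;gt;input_1.dat&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;input_192.dat&amp;lt;/code&amp;gt;. Adapt the names and the task count to your own workflow. Each task is started in the background, and &amp;lt;code&amp;gt;wait&amp;lt;/code&amp;gt; keeps the job alive until all of them have finished.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=serial_bundle&lt;br /&gt;
#SBATCH --output=serial_bundle_%j.txt&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
# Start 192 independent serial tasks (one per core) in the background&lt;br /&gt;
# (serial_task and the input_N.dat files are hypothetical placeholders)&lt;br /&gt;
for i in $(seq 1 192); do&lt;br /&gt;
    ./serial_task input_${i}.dat &amp;gt; output_${i}.txt &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Block until every background task has exited, so the job does not end early&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This static approach works best when all tasks take roughly the same time. Otherwise, cores sit idle while the slowest tasks finish, and a tool that hands out tasks dynamically, such as GNU Parallel, may be a better fit.&lt;br /&gt;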
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. These limits depend on whether you are part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation], and on the &amp;quot;partition&amp;quot; in which the job runs. &amp;quot;Partitions&amp;quot; are SLURM's term for sets of nodes and limits dedicated to particular use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;. If you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processes per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--ntasks-per-node=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--ntasks-per-node=384&amp;lt;/code&amp;gt;, and add &amp;lt;code&amp;gt;--bind-to none&amp;lt;/code&amp;gt; to the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; command.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6812</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6812"/>
		<updated>2025-08-07T20:33:45Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* Scheduling by Node */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and is not well suited to training extremely large models (hundreds of billions of parameters)&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster is a large cluster consisting of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Designed for large parallel workloads, the cluster has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a walltime of at least 15 minutes and at most 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
 &lt;br /&gt;
If you are new to SciNet, or if your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check whether you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to request access as described above.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can only be accessed via [[SSH]] (secure shell), and authentication is only allowed with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and for guidance on using your keys securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, see e.g. [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_Jobs_on_the_CPU_Subcluster | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime, you can access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
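&lt;br /&gt;
For example, a session that looks up a package and then loads it might go as follows. This is a sketch: the versions shown are borrowed from the MPI job script example on this page and may not match what is installed when you read this, so check them with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; first.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Search for all available versions of a package (openmpi is just an example)&lt;br /&gt;
module spider openmpi&lt;br /&gt;
# Show how to load one specific version, including any prerequisite modules&lt;br /&gt;
module spider openmpi/4.1.5&lt;br /&gt;
# List the modules that can be loaded right now&lt;br /&gt;
module avail&lt;br /&gt;
# Load a module with the short form, then list the loaded modules&lt;br /&gt;
ml gcc/12.3&lt;br /&gt;
ml&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;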
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
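&lt;br /&gt;
A minimal sketch of this advice is to keep your pinned module versions in one small file that both your interactive sessions and your job scripts source. The file name &amp;lt;code&amp;gt;trillium_env.sh&amp;lt;/code&amp;gt; is hypothetical, and the versions are the ones used in the MPI example on this page. Confirm them with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; before relying on them.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# trillium_env.sh (hypothetical name): load a fully pinned software stack.&lt;br /&gt;
# Start from a clean slate so modules left over from earlier cannot leak in.&lt;br /&gt;
module purge&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A job script can then run &amp;lt;code&amp;gt;source $HOME/trillium_env.sh&amp;lt;/code&amp;gt; before starting your program. Reading the file from &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; works on compute nodes, where &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; is read-only but still readable.&lt;br /&gt;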
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools are available system-wide, such as compilers, math libraries, and debuggers.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
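&lt;br /&gt;
The sketch below shows one way to check that a test stays within these limits, and to measure how much time and memory it actually needs, using GNU &amp;lt;code&amp;gt;timeout&amp;lt;/code&amp;gt; and GNU &amp;lt;code&amp;gt;time&amp;lt;/code&amp;gt;. The program &amp;lt;code&amp;gt;./my_test&amp;lt;/code&amp;gt; and its input file are hypothetical. The example also assumes that GNU time is installed as &amp;lt;code&amp;gt;/usr/bin/time&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Stop the test automatically if it runs longer than 5 minutes&lt;br /&gt;
timeout 5m ./my_test input.dat&lt;br /&gt;
# For an OpenMP program, cap it at 2 threads to respect the core limit&lt;br /&gt;
OMP_NUM_THREADS=2 ./my_test input.dat&lt;br /&gt;
# Report resource usage; in the output, look at the lines&lt;br /&gt;
# 'Elapsed (wall clock) time' and 'Maximum resident set size (kbytes)'&lt;br /&gt;
/usr/bin/time -v ./my_test input.dat&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;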
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
Trillium uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job into the queue. It will begin execution on available compute nodes when scheduled. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
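&lt;br /&gt;
To avoid submitting from &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; by accident, you could define a small check like the one below in your shell. It is a hypothetical helper, not a SciNet-provided command. It assumes that the &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; variable points to your scratch directory, as it does elsewhere on this page.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Hypothetical helper: succeed only if a directory (default: the current one)&lt;br /&gt;
# lies inside $SCRATCH, where job output has to be written.&lt;br /&gt;
under_scratch() {&lt;br /&gt;
    local dir=${1:-$PWD}&lt;br /&gt;
    # Fail if $SCRATCH is not set at all&lt;br /&gt;
    [ -n &amp;quot;$SCRATCH&amp;quot; ] || return 1&lt;br /&gt;
    # Appending a slash lets $SCRATCH itself match, but not a sibling&lt;br /&gt;
    # directory whose name merely starts with the same characters&lt;br /&gt;
    case $dir/ in&lt;br /&gt;
        &amp;quot;$SCRATCH&amp;quot;/*) return 0 ;;&lt;br /&gt;
        *) return 1 ;;&lt;br /&gt;
    esac&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Example use:&lt;br /&gt;
#   if under_scratch; then sbatch jobscript.sh; else echo 'cd to $SCRATCH first'; fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;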
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if one is not available, under a RAS allocation (previously called the “default” allocation).&lt;br /&gt;
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
== Key Points to Remember ==&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
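&lt;br /&gt;
Because every job gets whole nodes, the totals it receives follow directly from the node count. The helper below is a hypothetical sketch that only multiplies out the per-node figures listed above (192 cores, 768 GB). It does not query SLURM.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Hypothetical helper: print the cores and memory that N whole Trillium&lt;br /&gt;
# CPU nodes provide, using the per-node figures from this page.&lt;br /&gt;
node_resources() {&lt;br /&gt;
    local nodes=$1&lt;br /&gt;
    local cores_per_node=192&lt;br /&gt;
    local mem_gb_per_node=768&lt;br /&gt;
    echo &amp;quot;nodes=$nodes cores=$(( nodes * cores_per_node )) mem_gb=$(( nodes * mem_gb_per_node ))&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
node_resources 2    # prints: nodes=2 cores=384 mem_gb=1536&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;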
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler decides what resources to allocate based on the requested number of tasks and number of CPUs per node. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768GB &amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|this page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
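&lt;br /&gt;
As a minimal sketch of such bundling, the job script below keeps all 192 cores of one node busy by running independent serial tasks side by side with &amp;lt;code&amp;gt;xargs -P&amp;lt;/code&amp;gt; from GNU findutils. Several details are hypothetical: the program &amp;lt;code&amp;gt;./serial_task&amp;lt;/code&amp;gt> (assumed here to take a single integer argument), the task count of 384, and the module versions. The linked page describes other ways to do this.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
#SBATCH --job-name=serial_bundle&lt;br /&gt;
#SBATCH --output=serial_bundle_%j.txt&lt;br /&gt;
&lt;br /&gt;
# Load whatever ./serial_task needs (example versions only)&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
&lt;br /&gt;
# Run 384 independent tasks (./serial_task 1 ... ./serial_task 384),&lt;br /&gt;
# at most 192 at a time, i.e. one per core on the node&lt;br /&gt;
seq 1 384 | xargs -P 192 -n 1 ./serial_task&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since &amp;lt;code&amp;gt;xargs&amp;lt;/code&amp;gt; starts a new task as soon as a running one finishes, the cores stay mostly busy even when task run times differ, as long as there are more tasks than cores.&lt;br /&gt;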
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort, vfsshort || 2 per user || 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
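&lt;br /&gt;
The limits in the table can be turned into a quick pre-submission check. The two functions below are a hypothetical sketch for the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition only. They accept walltimes written as &amp;lt;code&amp;gt;HH:MM:SS&amp;lt;/code&amp;gt; (SLURM itself also accepts other formats) and use the table's node limits: 20 nodes by default and 1000 nodes with an allocation. SLURM still enforces the real limits; this only catches obvious mistakes early.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Hypothetical pre-submission checks for the compute partition.&lt;br /&gt;
# Return status: 0 = within limits, 1 = outside limits, 2 = malformed input.&lt;br /&gt;
&lt;br /&gt;
walltime_ok() {&lt;br /&gt;
    # Accept only digits in H:MM:SS form, e.g. 01:30:00&lt;br /&gt;
    [[ $1 =~ ^([0-9]+):([0-9]+):([0-9]+)$ ]] || return 2&lt;br /&gt;
    # 10# forces base 10, so fields such as 08 or 09 are not read as octal&lt;br /&gt;
    local secs=$(( 10#${BASH_REMATCH[1]} * 3600 + 10#${BASH_REMATCH[2]} * 60 + 10#${BASH_REMATCH[3]} ))&lt;br /&gt;
    [ &amp;quot;$secs&amp;quot; -ge 900 ] || return 1     # minimum: 15 minutes&lt;br /&gt;
    [ &amp;quot;$secs&amp;quot; -le 86400 ] || return 1   # maximum: 24 hours&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
nodes_ok() {&lt;br /&gt;
    # $1 = number of nodes; $2 = yes if your group has an allocation&lt;br /&gt;
    [[ $1 =~ ^[0-9]+$ ]] || return 2&lt;br /&gt;
    local max=20&lt;br /&gt;
    if [ &amp;quot;$2&amp;quot; = yes ]; then&lt;br /&gt;
        max=1000&lt;br /&gt;
    fi&lt;br /&gt;
    [ &amp;quot;$1&amp;quot; -ge 1 ] || return 1&lt;br /&gt;
    [ &amp;quot;$1&amp;quot; -le &amp;quot;$max&amp;quot; ] || return 1&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Example: walltime_ok 12:00:00 and nodes_ok 4 no both succeed&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;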
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your scratch directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; go to SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the mpirun flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script, which:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--ntasks-per-node=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--ntasks-per-node=384&amp;lt;/code&amp;gt;, and add &amp;lt;code&amp;gt;--bind-to none&amp;lt;/code&amp;gt; to the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; command&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6809</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6809"/>
		<updated>2025-08-07T20:33:27Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* Scheduling by Node */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and is not well suited to training extremely large models (hundreds of billions of parameters).&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
Trillium is a large cluster consisting of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node of the cluster has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with adaptive routing. The compute nodes are accessed through a queueing system that allows jobs with walltimes from 15 minutes to 24 hours.&lt;br /&gt;
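&lt;br /&gt;
As a worked check, the node counts in the table above add up to the totals quoted in the infobox at the top of the page. This is plain shell arithmetic on the table's figures:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Node counts and cores per node, copied from the specifications table&lt;br /&gt;
cpu_nodes=1224; cpu_cores_per_node=192&lt;br /&gt;
gpu_nodes=60;   gpu_cores_per_node=96&lt;br /&gt;
&lt;br /&gt;
total_nodes=$(( cpu_nodes + gpu_nodes ))&lt;br /&gt;
total_cores=$(( cpu_nodes * cpu_cores_per_node + gpu_nodes * gpu_cores_per_node ))&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;$total_nodes nodes, $total_cores cores&amp;quot;    # prints: 1284 nodes, 240768 cores&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;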
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can have it enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or your supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, access to Trillium is only via [[SSH]] (secure shell), and authentication is only allowed with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to learn how to use it securely.&lt;br /&gt;
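&lt;br /&gt;
For example, on a Linux or macOS computer (or a Windows terminal with OpenSSH), creating and using a key pair could look like the sketch below. The key file name &amp;lt;code&amp;gt;id_ed25519_trillium&amp;lt;/code&amp;gt; is only an example. The login will only succeed after the public key has been added to your account as described on the linked Alliance page.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Create an Ed25519 key pair; choose a strong passphrase when prompted&lt;br /&gt;
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_trillium&lt;br /&gt;
# Display the public half, which is the part you register with the Alliance&lt;br /&gt;
cat ~/.ssh/id_ed25519_trillium.pub&lt;br /&gt;
# After the public key is registered, log in with the private half&lt;br /&gt;
ssh -i ~/.ssh/id_ed25519_trillium MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;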
 &lt;br /&gt;
Open a terminal window (on Windows, see the guides for connecting with [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXterm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; flag enables trusted X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_Jobs_on_the_CPU_Subcluster | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime, you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools are available system-wide, such as compilers, math libraries, and debuggers.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details on how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job in the queue, and it will begin executing on compute nodes once the scheduler assigns them. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if one is not available, under a RAS allocation (previously called the “default” allocation).&lt;br /&gt;
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
== Key Points to Remember ==&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler deduces which resources to allocate from the number of tasks and the number of CPUs you request. On Trillium, things work a bit differently.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them:&lt;br /&gt;
** no other user can run jobs on them;&lt;br /&gt;
** you can [[SSH]] into your nodes during execution to monitor progress.&lt;br /&gt;
* Even if your job does not use all 192 cores, you still get the '''full''' node. Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests are ignored. Your job receives &amp;lt;code&amp;gt;N × 768GB &amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial or low-core-count jobs, you must still use all 192 cores on the node by bundling multiple independent tasks into one job script. See [[Running_Serial_Jobs_on_Niagara|the serial jobs page]] for examples.&lt;br /&gt;
* If your job underutilizes the cores, our support team may reach out to assist you in optimizing your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
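&lt;br /&gt;
The job script below is a minimal sketch of such bundling. It starts 192 copies of a serial program on one node, each in the background with its own input and output file, and then waits for all of them to finish. The program name &amp;lt;code&amp;gt;serial_example&amp;lt;/code&amp;gt; and the file names &amp;lt;code&amp;gt;input_N.dat&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;output_N.txt&amp;lt;/code&amp;gt; are hypothetical placeholders. This simple approach assumes every run takes roughly the same time and needs no more than about 4 GB of memory (768 GB / 192 cores).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=serial_bundle&lt;br /&gt;
#SBATCH --output=serial_output_%j.txt&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
# Start one run of the (hypothetical) serial program per core, each in the background&lt;br /&gt;
for i in $(seq 1 192); do&lt;br /&gt;
    ./serial_example input_$i.dat &amp;gt; output_$i.txt &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Block until every background run has finished; without this the job would end immediately&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;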
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort, vfsshort || 2 per user || 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
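&lt;br /&gt;
For example, you can send a short test batch job to the &amp;lt;code&amp;gt;debug&amp;lt;/code&amp;gt; partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter. According to the table above, such a job may use at most 4 nodes and 1 hour, and only one debug job can be submitted at a time. Here &amp;lt;code&amp;gt;jobscript.sh&amp;lt;/code&amp;gt; is a placeholder for your own script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch -p debug --nodes=1 --time=00:30:00 jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;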
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your scratch directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request, which it names &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script, which:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads the required modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--ntasks-per-node=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--ntasks-per-node=384&amp;lt;/code&amp;gt;, and add &amp;lt;code&amp;gt;--bind-to none&amp;lt;/code&amp;gt; to the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; command&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6797</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6797"/>
		<updated>2025-08-07T20:15:20Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* Submitting Jobs on the CPU Subcluster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and not ideal for training extremely large models (multi-100B parameters)&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster is a large cluster consisting of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node of the cluster has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with walltimes from a minimum of 15 minutes to a maximum of 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can have it enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet or your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it will be worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, access to Trillium is via [[SSH]] (secure shell) only, and authentication is only allowed with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use it securely.&lt;br /&gt;
&lt;br /&gt;
Open a terminal window (on Windows, see [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_Jobs_on_the_CPU_Subcluster | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
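&lt;br /&gt;
For example, to reproduce the MPI software stack used in the example job script further down this page, you could first ask &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; which modules &amp;lt;code&amp;gt;openmpi/4.1.5&amp;lt;/code&amp;gt; depends on. You could then load the whole chain with explicit versions and check the result. The exact output and prerequisites depend on the installed software stack:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ module spider openmpi/4.1.5&lt;br /&gt;
tri-login01:~$ module load StdEnv/2023 gcc/12.3 openmpi/4.1.5&lt;br /&gt;
tri-login01:~$ module list&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;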
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details on how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job in the queue, and it will begin executing on compute nodes once the scheduler assigns them. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if one is not available, under a RAS allocation (previously called the “default” allocation).&lt;br /&gt;
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
== Key Points to Remember ==&lt;br /&gt;
&lt;br /&gt;
* Scheduling is by node, not by core or CPU.&lt;br /&gt;
* Each node has 192 cores and 768 GB of memory.&lt;br /&gt;
* Jobs are limited to a maximum of 24 hours walltime.&lt;br /&gt;
* Output must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;. &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;$PROJECT&amp;lt;/code&amp;gt; are read-only on compute nodes.&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script must load all necessary modules explicitly using &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt;.&lt;br /&gt;
* Ensure [[Data_Management#Moving_data|your input data is on Trillium]] before submitting jobs.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler deduces which resources to allocate from the number of tasks and the number of CPUs you request. On Trillium, things work a bit differently.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.&lt;br /&gt;
** No other users are running anything on them.&lt;br /&gt;
** You can [[SSH]] into them to see how things are going.&lt;br /&gt;
* Even if your resource request does not require a full node, the scheduler will still allocate one or more entire nodes to your job, as Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests to the scheduler have no effect. Your job always gets &amp;lt;code&amp;gt;N × 768GB &amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial jobs, you must still use all 192 cores on the node. Visit the [[Running_Serial_Jobs_on_Trillium | serial jobs]] page for examples of how to do this.&lt;br /&gt;
* Since there are 192 cores per node, your job should use &amp;lt;code&amp;gt;N × 192&amp;lt;/code&amp;gt; cores. If it does not, we will contact you to help you optimize your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
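&lt;br /&gt;
The job script below is a minimal sketch of bundling serial work on one node. It starts 192 copies of a serial program, each in the background with its own input and output file, and then waits for all of them to finish. The program name &amp;lt;code&amp;gt;serial_example&amp;lt;/code&amp;gt; and the file names &amp;lt;code&amp;gt;input_N.dat&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;output_N.txt&amp;lt;/code&amp;gt; are hypothetical placeholders. This simple approach assumes every run takes roughly the same time and needs no more than about 4 GB of memory (768 GB / 192 cores).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=serial_bundle&lt;br /&gt;
#SBATCH --output=serial_output_%j.txt&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
# Start one run of the (hypothetical) serial program per core, each in the background&lt;br /&gt;
for i in $(seq 1 192); do&lt;br /&gt;
    ./serial_example input_$i.dat &amp;gt; output_$i.txt &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Block until every background run has finished; without this the job would end immediately&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;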
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort, vfsshort || 2 per user || 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
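&lt;br /&gt;
For example, you can send a short test batch job to the &amp;lt;code&amp;gt;debug&amp;lt;/code&amp;gt; partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter. According to the table above, such a job may use at most 4 nodes and 1 hour, and only one debug job can be submitted at a time. Here &amp;lt;code&amp;gt;jobscript.sh&amp;lt;/code&amp;gt; is a placeholder for your own script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch -p debug --nodes=1 --time=00:30:00 jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;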
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your scratch directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--ntasks-per-node=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--ntasks-per-node=384&amp;lt;/code&amp;gt;, and add &amp;lt;code&amp;gt;--bind-to none&amp;lt;/code&amp;gt; to the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; command.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6794</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6794"/>
		<updated>2025-08-07T20:11:13Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* Submitting Jobs on the CPU Subcluster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and not ideal for training extremely large models (multi-100B parameters)&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
Trillium is a large cluster consisting of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Designed for large parallel workloads, the cluster has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with walltimes between 15 minutes and 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
 &lt;br /&gt;
If you are new to SciNet, or if your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can only be accessed via [[SSH]] (secure shell), and authentication is only allowed with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to make sure you use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, you can use [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_Jobs_on_the_CPU_Subcluster | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. These jobs will run on Trillium's compute nodes, and their execution is managed by the SLURM scheduler.&lt;br /&gt;
&lt;br /&gt;
More advanced details on how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
To submit a job, use the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command on a login node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This places your job in the queue, and it will start on available compute nodes once it is scheduled. Note: jobs must be submitted from a login node; submitting from datamover nodes is not allowed.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit jobs from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory, not &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt;, since the home directory is read-only on compute nodes. Output from your jobs must be written to &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if one is not available, under a RAS allocation (previously called the “default” allocation).&lt;br /&gt;
&lt;br /&gt;
Some example job scripts are shown below.&lt;br /&gt;
&lt;br /&gt;
Keep in mind:&lt;br /&gt;
* Scheduling is by node, so in multiples of 192 cores.&lt;br /&gt;
* Your job's maximum walltime is 24 hours.&lt;br /&gt;
* Jobs must write their output to your scratch (&amp;lt;code&amp;gt;home&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; are read-only on compute nodes).&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script will not remember the modules you have loaded, so it needs to contain &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt; commands for all required modules (see examples below).&lt;br /&gt;
* [[Data_Management#Moving_data | Move your data]] to Trillium before you submit your job.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler deduces which resources to allocate from the requested number of tasks and number of CPUs per node. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.&lt;br /&gt;
** No other users are running anything on them.&lt;br /&gt;
** You can [[SSH]] into them to see how things are going.&lt;br /&gt;
* Even if your resource request does not require a full node, the scheduler will still allocate one or more entire nodes to your job, as Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests to the scheduler have no effect. Your job always gets &amp;lt;code&amp;gt;N × 768 GB&amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768 GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial jobs, you must still use all 192 cores on the node. Visit the [[Running_Serial_Jobs_on_Trillium | serial jobs]] page for examples of how to do this.&lt;br /&gt;
* Since there are 192 cores per node, your job should use &amp;lt;code&amp;gt;N × 192&amp;lt;/code&amp;gt; cores. If you do not, we will contact you to help you optimize your workflow—or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
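&lt;br /&gt;
To make the whole-node rule concrete, the hypothetical helper &amp;lt;code&amp;gt;node_request&amp;lt;/code&amp;gt; below (not part of Trillium's software stack) rounds a task count up to full 192-core nodes. It then reports how many cores would sit idle and how much memory the job receives. It assumes one task per core and uses the CPU-node figures from the specifications table.&lt;br /&gt;
&lt;br /&gt;
```bash
#!/bin/bash
# Hypothetical helper (not a SciNet command): maps a task count onto
# Trillium's whole-node scheduling, assuming one task per core and
# CPU nodes with 192 cores and 768 GB of RAM each.
CORES_PER_NODE=192
MEM_GB_PER_NODE=768

node_request() {
  local tasks=$1
  # Round up: a partially used node is still allocated as a full node.
  local nodes=$(( (tasks + CORES_PER_NODE - 1) / CORES_PER_NODE ))
  local cores=$(( nodes * CORES_PER_NODE ))
  local idle=$(( cores - tasks ))
  local mem_gb=$(( nodes * MEM_GB_PER_NODE ))
  echo "tasks=${tasks} nodes=${nodes} cores=${cores} idle=${idle} memory_gb=${mem_gb}"
}

# Example: 200 MPI tasks
node_request 200
```
&lt;br /&gt;
Here, 200 tasks still occupy 2 full nodes (384 cores), which leaves 184 cores idle. Running 384 tasks instead would use both nodes completely.&lt;br /&gt;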
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your scratch directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;First line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; are directives for SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes, each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--ntasks-per-node=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--ntasks-per-node=384&amp;lt;/code&amp;gt;, and add &amp;lt;code&amp;gt;--bind-to none&amp;lt;/code&amp;gt; to the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; command.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6791</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6791"/>
		<updated>2025-08-07T19:58:41Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* Submitting jobs on CPU Subcluster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and not ideal for training extremely large models (multi-100B parameters)&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
Trillium is a large cluster consisting of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Designed for large parallel workloads, the cluster has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with walltimes between 15 minutes and 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
 &lt;br /&gt;
If you are new to SciNet, or if your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium is accessed via [[SSH]] (secure shell) only, and authentication is allowed only with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to learn how to use it securely.&lt;br /&gt;
&lt;br /&gt;
Open a terminal window (on Windows, see [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_Jobs_on_the_CPU_Subcluster | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime, you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
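&lt;br /&gt;
To put the advice on explicit versions into practice, the following is a minimal sketch of a hypothetical helper, &amp;lt;code&amp;gt;check_module_versions&amp;lt;/code&amp;gt;, that lists the modules a job script loads without a version. It is not a SciNet tool. It assumes versioned modules are written as &amp;lt;code&amp;gt;name/version&amp;lt;/code&amp;gt; (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;) and only recognizes &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; lines.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper (not a SciNet tool): print each module that a job
# script loads without an explicit version, one per line.
# Assumes versioned modules are written as name/version, e.g. gcc/12.3.
check_module_versions() {
    local script="$1"
    # 1. Keep only "module load ..." and "ml ..." lines.
    # 2. Drop trailing comments, then the command words themselves.
    # 3. Put each remaining argument on its own line.
    # 4. Drop versioned names (containing "/"), options, and blank lines.
    grep -E '^[[:space:]]*(module[[:space:]]+load|ml)[[:space:]]' "$script" |
        sed -E -e 's/#.*//' -e 's/^[[:space:]]*(module[[:space:]]+load|ml)[[:space:]]+//' |
        tr -s ' \t' '\n' |
        grep -v -e '/' -e '^-' -e '^$'
}
```
&lt;br /&gt;
For example, running &amp;lt;code&amp;gt;check_module_versions jobscript.sh&amp;lt;/code&amp;gt; on a script that contains &amp;lt;code&amp;gt;module load gcc openmpi/4.1.5&amp;lt;/code&amp;gt; prints &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt;. This flags &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; as a module whose default version may change.&lt;br /&gt;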
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct-liquid-cooled using warm water (35–40 °C inlet temperature), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. Your jobs will run on some of Trillium's compute nodes. When and where your job runs is determined by the scheduler.&lt;br /&gt;
&lt;br /&gt;
Trillium uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You submit jobs from a login node by passing a script to the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This puts the job in the queue. It will run on the compute nodes in due course. Note that you must submit your job from a login node. You cannot submit jobs from the datamover nodes.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory rather than your &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; directory. Your job must be able to write its output, and &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; is read-only on the compute nodes.&lt;br /&gt;
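&lt;br /&gt;
One way to guard against accidentally submitting from &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; is a small wrapper function that you define in your own shell session. The following minimal sketch shows such a hypothetical wrapper, &amp;lt;code&amp;gt;submit_job&amp;lt;/code&amp;gt;. It is not a SciNet-provided command: it only checks the current directory and then calls &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; with the same arguments.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical wrapper around sbatch that refuses to submit from inside
# $HOME, which is read-only on the compute nodes. All arguments are
# passed through to sbatch unchanged.
submit_job() {
    # Appending "/" lets the pattern match $HOME itself as well as
    # any directory below it.
    case "$PWD/" in
        "$HOME"/*)
            echo "Refusing to submit from $PWD: change to \$SCRATCH first."
            return 1
            ;;
    esac
    sbatch "$@"
}
```
&lt;br /&gt;
For example, running &amp;lt;code&amp;gt;submit_job jobscript.sh&amp;lt;/code&amp;gt; from your scratch directory behaves like &amp;lt;code&amp;gt;sbatch jobscript.sh&amp;lt;/code&amp;gt;. Running it from anywhere under &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; prints a warning and submits nothing.&lt;br /&gt;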
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if your group has none, under a RAS allocation (previously called &amp;quot;default&amp;quot; allocation).&lt;br /&gt;
&lt;br /&gt;
Some example job scripts can be found below.&lt;br /&gt;
&lt;br /&gt;
Keep in mind:&lt;br /&gt;
* Scheduling is by node, so in multiples of 192 cores.&lt;br /&gt;
* Your job's maximum walltime is 24 hours.&lt;br /&gt;
* Jobs must write their output to your scratch directory (&amp;lt;code&amp;gt;home&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; are read-only on compute nodes).&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script will not remember the modules you have loaded, so it needs to contain &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt; commands for all required modules (see examples below).&lt;br /&gt;
* [[Data_Management#Moving_data | Move your data]] to Trillium before you submit your job.&lt;br /&gt;
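&lt;br /&gt;
Because resources are allocated in whole nodes, it helps to work out the node count, and what that many nodes provide, before writing a job script. The following minimal sketch does this arithmetic with the per-node figures given on this page (192 cores and 768 GB of memory per CPU node). The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Per-node figures for Trillium CPU compute nodes, as stated in this guide.
CORES_PER_NODE=192
MEM_GB_PER_NODE=768

# Smallest number of whole nodes that provides at least $1 cores
# (integer ceiling division).
nodes_for_tasks() {
    local tasks="$1"
    echo $(( (tasks + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

# Cores and memory that a job on $1 nodes receives.
node_resources() {
    local nodes="$1"
    echo "nodes=$nodes cores=$(( nodes * CORES_PER_NODE )) memory=$(( nodes * MEM_GB_PER_NODE ))GB"
}
```
&lt;br /&gt;
For example, 200 single-core tasks give &amp;lt;code&amp;gt;nodes_for_tasks 200&amp;lt;/code&amp;gt; = 2, and &amp;lt;code&amp;gt;node_resources 2&amp;lt;/code&amp;gt; prints &amp;lt;code&amp;gt;nodes=2 cores=384 memory=1536GB&amp;lt;/code&amp;gt;. In that case 184 cores would sit idle, which suggests regrouping the work so that it fills both nodes.&lt;br /&gt;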
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler determines which resources to allocate from the number of tasks and the number of CPUs per node that you request. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.&lt;br /&gt;
** No other users are running anything on them.&lt;br /&gt;
** You can [[SSH]] into them to see how things are going.&lt;br /&gt;
* Even if your resource request does not require a full node, the scheduler will still allocate one or more entire nodes to your job, as Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests to the scheduler are of no use. Your job always gets &amp;lt;code&amp;gt;N × 768GB &amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial jobs you must still use all 192 cores on the node. Visit the [[Running_Serial_Jobs_on_Trillium | serial jobs]] page for examples of how to do this.&lt;br /&gt;
* Since there are 192 cores per node, your job should use &amp;lt;code&amp;gt;N × 192&amp;lt;/code&amp;gt; cores. If it does not, we will contact you to help you optimize your workflow; you can also [mailto:support@scinet.utoronto.ca contact us] for assistance.&lt;br /&gt;
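&lt;br /&gt;
Serial work therefore has to be bundled so that one job keeps all 192 cores of a node busy. The [[Running_Serial_Jobs_on_Trillium | serial jobs]] page describes the recommended methods. As one possible illustration, the following minimal sketch uses &amp;lt;code&amp;gt;xargs -P&amp;lt;/code&amp;gt; to run up to 192 copies of a program at once on a single node. The helper name &amp;lt;code&amp;gt;run_bundle&amp;lt;/code&amp;gt;, the program &amp;lt;code&amp;gt;./serial_task&amp;lt;/code&amp;gt; and its integer argument are hypothetical placeholders.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=192
#SBATCH --time=01:00:00
#SBATCH --job-name=serial_bundle

# Hypothetical helper: run the command "$1" once for each index 1..$2,
# appending the index as its last argument, with at most $3 copies
# running at once (default 192, one per core of a Trillium CPU node).
# xargs returns only after every copy has finished.
run_bundle() {
    local cmd="$1" count="$2" parallel="${3:-192}"
    seq 1 "$count" | xargs -P "$parallel" -I{} sh -c "$cmd {}"
}

# Launch the real work only when running inside a Slurm job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    cd "$SLURM_SUBMIT_DIR" || exit 1
    module load StdEnv/2023        # add the modules ./serial_task needs
    run_bundle ./serial_task 192   # ./serial_task is a placeholder
fi
```
&lt;br /&gt;
This sketch only fills a single node, and it works best when all tasks take roughly the same time. Spreading serial work over several nodes needs a different approach, such as those on the serial jobs page.&lt;br /&gt;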
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
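&lt;br /&gt;
It can be useful to check a requested walltime against these limits before submitting. The following is a minimal sketch of a hypothetical &amp;lt;code&amp;gt;walltime_ok&amp;lt;/code&amp;gt; function for the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition limits in the table above (15 minutes to 24 hours). It assumes the &amp;lt;code&amp;gt;HH:MM:SS&amp;lt;/code&amp;gt; form used in this page's examples; Slurm itself accepts other time formats that this sketch does not handle.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical pre-submission check against the compute-partition walltime
# limits listed on this page: at least 15 minutes, at most 24 hours.
# Returns 0 if the walltime is within limits, 1 if it is outside them,
# and 2 if the argument is not in HH:MM:SS form.
walltime_ok() {
    local t="$1"
    [[ "$t" =~ ^([0-9]+):([0-5][0-9]):([0-5][0-9])$ ]] || return 2
    # 10# forces base 10, so fields such as "08" are not read as octal.
    local total=$(( 10#${BASH_REMATCH[1]} * 3600 + 10#${BASH_REMATCH[2]} * 60 + 10#${BASH_REMATCH[3]} ))
    local min=$(( 15 * 60 )) max=$(( 24 * 3600 ))
    [ "$total" -ge "$min" ] || return 1
    [ "$total" -le "$max" ] || return 1
}
```
&lt;br /&gt;
For example, &amp;lt;code&amp;gt;walltime_ok 24:00:00&amp;lt;/code&amp;gt; succeeds, while &amp;lt;code&amp;gt;walltime_ok 25:00:00&amp;lt;/code&amp;gt; returns 1. The other partitions have different limits and would need their own values.&lt;br /&gt;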
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your scratch directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; go to SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processes per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--ntasks-per-node=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--ntasks-per-node=384&amp;lt;/code&amp;gt;, and add &amp;lt;code&amp;gt;--bind-to none&amp;lt;/code&amp;gt; to the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; command&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6788</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6788"/>
		<updated>2025-08-07T19:57:40Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and is not ideal for training extremely large models (hundreds of billions of parameters)&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering: all flash-based for consistent performance&lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache per CPU ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
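&lt;br /&gt;
As a quick consistency check, the node counts in this table reproduce the totals given in the infobox. The following minimal sketch does the arithmetic in bash; the variable and function names are illustrative only.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Node counts and cores per node, copied from the specifications table.
CPU_NODES=1224
CPU_CORES_PER_NODE=192
GPU_NODES=60
GPU_CORES_PER_NODE=96

# Total number of nodes of both types.
total_nodes() {
    echo $(( CPU_NODES + GPU_NODES ))
}

# Total CPU cores: cores on CPU nodes plus the host cores on GPU nodes.
total_cores() {
    echo $(( CPU_NODES * CPU_CORES_PER_NODE + GPU_NODES * GPU_CORES_PER_NODE ))
}
```
&lt;br /&gt;
Running &amp;lt;code&amp;gt;total_nodes&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;total_cores&amp;lt;/code&amp;gt; gives 1284 and 240768 (235008 cores on CPU nodes plus 5760 on GPU nodes). These match the node and core counts in the infobox.&lt;br /&gt;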
&lt;br /&gt;
Each node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that accepts jobs with a walltime of at least 15 minutes and at most 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for all {{DigitalResearchAllianceOfCanada}} account holders, but anyone with an active Alliance account can have it enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or if your Supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page of the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check whether you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error even though your SSH key is correctly set up, you may still need to request access as described above.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium is accessed via [[SSH]] (secure shell) only, and authentication is allowed only with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to learn how to use it securely.&lt;br /&gt;
&lt;br /&gt;
Open a terminal window (on Windows, see [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime, you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct-liquid-cooled using warm water (35–40 °C inlet temperature), resulting in:&lt;br /&gt;
&lt;br /&gt;
* Power usage effectiveness (PUE) below 1.03, indicating high energy efficiency&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;br /&gt;
&lt;br /&gt;
= Submitting jobs on CPU Subcluster =&lt;br /&gt;
&lt;br /&gt;
Once you have compiled and tested your code or workflow on the Trillium login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. Your jobs will run on some of Trillium's compute nodes. When and where your job runs is determined by the scheduler.&lt;br /&gt;
&lt;br /&gt;
Trillium uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the [[Slurm | Slurm page]].&lt;br /&gt;
&lt;br /&gt;
You submit jobs from a login node by passing a script to the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This puts the job in the queue. It will run on the compute nodes in due course. Note that you must submit your job from a login node. You cannot submit jobs from the datamover nodes.&lt;br /&gt;
&lt;br /&gt;
In most cases, you should submit from your &amp;lt;code&amp;gt;$SCRATCH&amp;lt;/code&amp;gt; directory rather than from your &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; directory, so that your compute job can write its output (&amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; is read-only on the compute nodes).&lt;br /&gt;
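&lt;br /&gt;
For example, assuming your job script is called &amp;lt;code&amp;gt;jobscript.sh&amp;lt;/code&amp;gt; and is stored in your scratch directory, a typical submission session might look like the following. &amp;lt;code&amp;gt;squeue -u $USER&amp;lt;/code&amp;gt; is the standard SLURM command for listing your own queued and running jobs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ cd $SCRATCH&lt;br /&gt;
tri-login01:scratch$ sbatch jobscript.sh&lt;br /&gt;
tri-login01:scratch$ squeue -u $USER&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;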
&lt;br /&gt;
Jobs will run under your group's RRG allocation, or, if your group has none, under a RAS allocation (previously called &amp;quot;default&amp;quot; allocation).&lt;br /&gt;
&lt;br /&gt;
Some example job scripts can be found below.&lt;br /&gt;
&lt;br /&gt;
Keep in mind:&lt;br /&gt;
* Scheduling is by node, so in multiples of 192 cores.&lt;br /&gt;
* Your job's maximum walltime is 24 hours.&lt;br /&gt;
* Jobs must write their output to your scratch directory (&amp;lt;code&amp;gt;home&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; are read-only on compute nodes).&lt;br /&gt;
* Compute nodes have no internet access.&lt;br /&gt;
* Your job script will not remember the modules you have loaded, so it needs to contain &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt; commands for all required modules (see examples below).&lt;br /&gt;
* [[Data_Management#Moving_data | Move your data]] to Trillium before you submit your job.&lt;br /&gt;
&lt;br /&gt;
== Scheduling by Node ==&lt;br /&gt;
&lt;br /&gt;
On many systems that use SLURM, the scheduler determines which resources to allocate from the number of tasks and the number of CPUs per node that you request. On Trillium, things are a bit different.&lt;br /&gt;
&lt;br /&gt;
* All job resource requests on Trillium are scheduled as a multiple of '''nodes'''.&lt;br /&gt;
* The nodes that your jobs run on are exclusively yours, for as long as the job is running on them.&lt;br /&gt;
** No other users are running anything on them.&lt;br /&gt;
** You can [[SSH]] into them to see how things are going.&lt;br /&gt;
* Even if your resource request does not require a full node, the scheduler will still allocate one or more entire nodes to your job, as Trillium does not share nodes between users.&lt;br /&gt;
* Memory requests to the scheduler have no effect. Your job always gets &amp;lt;code&amp;gt;N × 768 GB&amp;lt;/code&amp;gt; of RAM, where &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes and 768 GB is the amount of memory on each node.&lt;br /&gt;
* If you run serial jobs, you must still use all 192 cores on the node. Visit the [[Running_Serial_Jobs_on_Trillium | serial jobs]] page for examples of how to do this.&lt;br /&gt;
* Since there are 192 cores per node, your job should use &amp;lt;code&amp;gt;N × 192&amp;lt;/code&amp;gt; cores. If it does not, we will contact you to help you optimize your workflow, or you can [mailto:support@scinet.utoronto.ca contact us] to get assistance.&lt;br /&gt;
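&lt;br /&gt;
As a rough illustration of using a whole node for serial work, the hypothetical job script below starts 192 independent copies of a serial program in the background, one per core, and then waits for all of them to finish. The program &amp;lt;code&amp;gt;./serial_task&amp;lt;/code&amp;gt; and the &amp;lt;code&amp;gt;input_*.dat&amp;lt;/code&amp;gt; / &amp;lt;code&amp;gt;output_*.log&amp;lt;/code&amp;gt; file names are placeholders, not part of Trillium's software stack. This simple pattern works best when all tasks take about the same time; if their run times vary, cores sit idle as tasks finish, so consult the [[Running_Serial_Jobs_on_Trillium | serial jobs]] page for more robust approaches.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=serial_bundle&lt;br /&gt;
#SBATCH --output=serial_output_%j.txt&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
# Hypothetical example: replace ./serial_task with your own serial program.&lt;br /&gt;
# Start one copy per core in the background so that all 192 cores are busy.&lt;br /&gt;
for i in $(seq 1 192); do&lt;br /&gt;
    ./serial_task input_${i}.dat &amp;gt; output_${i}.log &amp;amp;&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Block until every background task has finished; if the script exited now, the job would end.&lt;br /&gt;
wait&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;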
&lt;br /&gt;
== Limits ==&lt;br /&gt;
&lt;br /&gt;
There are limits to the size and duration of your jobs, the number of jobs you can run, and the number of jobs you can have queued. It matters whether a user is part of a group with a [https://www.alliancecan.ca/research-portal/accessing-resources/resource-allocation-competitions/ Resources for Research Group allocation] or not. It also matters in which &amp;quot;partition&amp;quot; the job runs. &amp;quot;Partitions&amp;quot; are SLURM-speak for use cases. You specify the partition with the &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; parameter to &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;, but if you do not specify one, your job will run in the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition, which is the most common case.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Usage&lt;br /&gt;
!Partition&lt;br /&gt;
!Limit on Running jobs&lt;br /&gt;
!Limit on Submitted jobs (incl. running)&lt;br /&gt;
!Min. size of jobs&lt;br /&gt;
!Max. size of jobs&lt;br /&gt;
!Min. walltime&lt;br /&gt;
!Max. walltime &lt;br /&gt;
|-&lt;br /&gt;
|Compute jobs ||compute || 50 || 1000 || 1 node (192&amp;amp;nbsp;cores) || default:&amp;amp;nbsp;20&amp;amp;nbsp;nodes&amp;amp;nbsp;(3840&amp;amp;nbsp;cores) &amp;lt;br&amp;gt; with&amp;amp;nbsp;allocation:&amp;amp;nbsp;1000&amp;amp;nbsp;nodes&amp;amp;nbsp;(192000&amp;amp;nbsp;cores)|| 15 minutes || 24 hours&lt;br /&gt;
|-&lt;br /&gt;
|Testing or troubleshooting || debug || 1 || 1 || 1 node (192&amp;amp;nbsp;cores) || 4 nodes (768 cores)|| N/A || 1 hour&lt;br /&gt;
|-&lt;br /&gt;
|Archiving or retrieving data in [[HPSS]]|| archivelong || 2 per user (5 in total) || 10 per user || N/A || N/A|| 15 minutes || 72 hours&lt;br /&gt;
|-&lt;br /&gt;
|Inspecting archived data, small archival actions in [[HPSS]] || archiveshort, vfsshort || 2 per user|| 10 per user || N/A || N/A || 15 minutes || 1 hour&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Even if you respect these limits, your jobs will still have to wait in the queue. The waiting time depends on many factors such as your group's allocation amount, how much allocation has been used in the recent past, the number of requested nodes and walltime, and how many other jobs are waiting in the queue.&lt;br /&gt;
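&lt;br /&gt;
As a sketch of selecting a partition, the command below submits a short test to the &amp;lt;code&amp;gt;debug&amp;lt;/code&amp;gt; partition while staying within that partition's limits in the table above (one node, 30 minutes). The script name &amp;lt;code&amp;gt;jobscript.sh&amp;lt;/code&amp;gt; is a placeholder. Options given on the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command line override the matching &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; lines in the script; the same request could instead be written inside the script as &amp;lt;code&amp;gt;#SBATCH --partition=debug&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch -p debug --nodes=1 --time=00:30:00 jobscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;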
&lt;br /&gt;
== Example submission script (MPI) ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=192&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
#SBATCH --output=mpi_output_%j.txt&lt;br /&gt;
#SBATCH --mail-type=FAIL&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load StdEnv/2023&lt;br /&gt;
module load gcc/12.3&lt;br /&gt;
module load openmpi/4.1.5&lt;br /&gt;
&lt;br /&gt;
mpirun ./mpi_example&lt;br /&gt;
# or &amp;quot;srun ./mpi_example&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit this script from your scratch directory with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:scratch$ sbatch mpi_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;The first line indicates that this is a bash script.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Lines starting with &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; go to SLURM.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; reads these lines as a job request (which it gives the name &amp;lt;code&amp;gt;mpi_job&amp;lt;/code&amp;gt;).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;In this case, SLURM looks for 2 nodes each running 192 tasks, for 1 hour.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Note that the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; flag &amp;lt;code&amp;gt;--ppn&amp;lt;/code&amp;gt; (processors per node) is ignored.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Once it finds such nodes, it runs the script:&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Changes to the submission directory;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Loads modules;&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Runs the &amp;lt;code&amp;gt;mpi_example&amp;lt;/code&amp;gt; application (SLURM will inform &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt; how many processes to run).&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;/ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;To use hyperthreading, change &amp;lt;code&amp;gt;--ntasks-per-node=192&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;--ntasks-per-node=384&amp;lt;/code&amp;gt;, and add &amp;lt;code&amp;gt;--bind-to none&amp;lt;/code&amp;gt; to the &amp;lt;code&amp;gt;mpirun&amp;lt;/code&amp;gt; command&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6785</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6785"/>
		<updated>2025-08-07T19:25:38Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Trillium.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Subcluster&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Subcluster&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This subcluster is in high demand and is not ideal for training extremely large models (hundreds of billions of parameters or more)&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a minimum walltime of 15 minutes and a maximum of 24 hours.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can get their access enabled.&lt;br /&gt;
 &lt;br /&gt;
If you are new to SciNet, or your supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you will need to request access on the [https://ccdb.alliancecan.ca/me/access_systems Access Systems] page on the CCDB site. After you click the &amp;quot;I request access&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium is accessed via [[SSH]] (secure shell) only, and authentication is allowed only via SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and make sure you use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, see [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
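&lt;br /&gt;
If you use the OpenSSH command-line client and log in often, you may find it convenient to store these options in your local client configuration file, &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt;. The entry below is a sketch of one possible setup rather than an official SciNet configuration: the host alias &amp;lt;code&amp;gt;trillium&amp;lt;/code&amp;gt; is an arbitrary choice, and the username and key path are the same placeholders used in the command above. Setting both &amp;lt;code&amp;gt;ForwardX11&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;ForwardX11Trusted&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;yes&amp;lt;/code&amp;gt; corresponds to the &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; option.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Hypothetical ~/.ssh/config entry; replace the placeholders with your own values.&lt;br /&gt;
Host trillium&lt;br /&gt;
    HostName trillium.scinet.utoronto.ca&lt;br /&gt;
    User MYALLIANCEUSERNAME&lt;br /&gt;
    IdentityFile /path/to/ssh_private_key&lt;br /&gt;
    # Trusted X11 forwarding, like ssh -Y; omit these two lines if you do not need graphics.&lt;br /&gt;
    ForwardX11 yes&lt;br /&gt;
    ForwardX11Trusted yes&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With this entry in place, &amp;lt;code&amp;gt;ssh trillium&amp;lt;/code&amp;gt; should behave like the full command above.&lt;br /&gt;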
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
Note: We plan to add browser access to Trillium via Open OnDemand in the future. In the meantime you can still access our existing Open OnDemand deployment by following the instructions in our [https://docs.scinet.utoronto.ca/index.php/Open_OnDemand_Quickstart quickstart guide].&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the '''environment modules''' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
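&lt;br /&gt;
For example, a session that starts from a clean environment, looks up the available versions of a package, and then loads explicit versions might look like the following. The module names and versions shown (&amp;lt;code&amp;gt;StdEnv/2023&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;openmpi/4.1.5&amp;lt;/code&amp;gt;) are illustrative; check the output of &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; for what is actually installed and which modules must be loaded first.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ module purge&lt;br /&gt;
tri-login01:~$ module spider openmpi&lt;br /&gt;
tri-login01:~$ module load StdEnv/2023 gcc/12.3 openmpi/4.1.5&lt;br /&gt;
tri-login01:~$ module list&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;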
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Freely available commercial tools. Some widely useful commercial tools are available system-wide, such as compilers, math libraries, and debuggers.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, feel free to contact us; we'll assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;br /&gt;
&lt;br /&gt;
= Technical Details =&lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct-liquid-cooled using warm water (35–40 °C inlet temperature), resulting in:&lt;br /&gt;
&lt;br /&gt;
* Power usage effectiveness (PUE) below 1.03, indicating high energy efficiency&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
The VAST high-performance file system consists of a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Testing and Debugging =&lt;br /&gt;
&lt;br /&gt;
Before submitting your job to the cluster, it's important to test your code to ensure correctness and determine the resources it requires.&lt;br /&gt;
&lt;br /&gt;
* '''Lightweight tests''' can be run directly on the login nodes. As a rule of thumb, these should:&lt;br /&gt;
** Run in under a few minutes  &lt;br /&gt;
** Use no more than 1–2 GB of memory  &lt;br /&gt;
** Use only 1–2 CPU cores&lt;br /&gt;
&lt;br /&gt;
* You can also run the [[Parallel Debugging with DDT|DDT]] debugger on the login nodes after loading it with: &amp;lt;code&amp;gt;module load ddt-cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* For short tests that exceed login node limits or require dedicated resources, request an interactive debug job using the &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ debugjob --clean N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; with the number of nodes (1 to 4). If &amp;lt;code&amp;gt;N=1&amp;lt;/code&amp;gt;, you will get 1 hour of interactive time; with &amp;lt;code&amp;gt;N=4&amp;lt;/code&amp;gt; (the maximum), you will get 22 minutes.  &lt;br /&gt;
The &amp;lt;code&amp;gt;--clean&amp;lt;/code&amp;gt; flag is optional but recommended, as it starts the session with no modules loaded, better mimicking the clean environment of batch jobs.&lt;br /&gt;
&lt;br /&gt;
* If your test job requires more time than allowed by &amp;lt;code&amp;gt;debugjob&amp;lt;/code&amp;gt;, you can request an interactive session from the regular queue using &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
tri-login01:~$ salloc --nodes=N --time=M:00:00 --x11&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;N&amp;lt;/code&amp;gt; is the number of nodes  &lt;br /&gt;
* &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt; is the number of hours the job should run  &lt;br /&gt;
* &amp;lt;code&amp;gt;--x11&amp;lt;/code&amp;gt; is required for graphical applications (e.g., when using [[Parallel Debugging with DDT|DDT]] or DDD)&lt;br /&gt;
&lt;br /&gt;
'''Note:''' Jobs submitted with &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; may take longer to start, as they are scheduled like any other batch job. See the [[Testing_With_Graphics|Testing with graphics]] page for more information on graphical testing options.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6749</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6749"/>
		<updated>2025-08-05T15:00:10Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Niagara.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
= System Overview =&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Partition&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Partition&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This partition is in high demand and is not ideal for training extremely large models (hundreds of billions of parameters or more)&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
== Cooling and Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct-liquid-cooled using warm water (35–40 °C inlet temperature), resulting in:&lt;br /&gt;
&lt;br /&gt;
* Power usage effectiveness (PUE) below 1.03, indicating high energy efficiency&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
== Specifications ==&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
|  60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect: NDR InfiniBand in a Dragonfly+ topology with adaptive routing. Compute nodes are accessed through the Slurm queueing system, which accepts jobs requesting between 15 minutes and 24 hours of run time.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
All three share a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
== Backup and Archive Storage ==&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for every account holder of the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can have access enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or if your supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you may need to request access on the [https://ccdb.alliancecan.ca/services/opt_in opt-in page on the CCDB site]. After you click the &amp;quot;Join&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution.  You will need to be familiar with Linux systems to work on Trillium.  If you are not, it is worth taking the time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can be accessed only via [[SSH]] (secure shell), and authentication is allowed only with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate an SSH key pair and to learn how to use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, you can use [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXterm]), then SSH into the Trillium login nodes with your {{Alliance}} username and SSH key:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
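
If you log in often, you can store the same options in your OpenSSH client configuration. The shell function below is a minimal sketch, not a SciNet-provided tool. The alias name &amp;lt;code&amp;gt;trillium&amp;lt;/code&amp;gt; and the function name are arbitrary choices, and you supply the username and key path. It writes &amp;lt;code&amp;gt;ForwardX11 yes&amp;lt;/code&amp;gt; together with &amp;lt;code&amp;gt;ForwardX11Trusted yes&amp;lt;/code&amp;gt;, which is the configuration-file equivalent of &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt;.&lt;br /&gt;

```shell
 # Sketch (not an official SciNet tool): append an OpenSSH host alias for
 # Trillium to the config file given as $1. The username ($2) and key
 # path ($3) are placeholders supplied by the caller.
 add_trillium_host() {
   local config_file="$1"   # e.g. ~/.ssh/config
   local user="$2"          # your Alliance username
   local key="$3"           # path to your SSH private key
   mkdir -p "$(dirname "$config_file")"
   # "ForwardX11 yes" plus "ForwardX11Trusted yes" matches "ssh -Y".
   printf '%s\n' \
     "Host trillium" \
     "    HostName trillium.scinet.utoronto.ca" \
     "    User $user" \
     "    IdentityFile $key" \
     "    ForwardX11 yes" \
     "    ForwardX11Trusted yes" >> "$config_file"
 }
```

For example, after you run &amp;lt;code&amp;gt;add_trillium_host ~/.ssh/config MYALLIANCEUSERNAME /path/to/ssh_private_key&amp;lt;/code&amp;gt; once, &amp;lt;code&amp;gt;ssh trillium&amp;lt;/code&amp;gt; should behave like the full command above. A similar entry with &amp;lt;code&amp;gt;HostName trillium-gpu.scinet.utoronto.ca&amp;lt;/code&amp;gt; would cover the GPU login node listed in the infobox.&lt;br /&gt;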
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the ''environment modules'' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
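
In practice these commands usually appear together in a job script. The hypothetical helper below sketches that pattern. It writes a minimal Slurm script that runs &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; and then loads the modules you pass, ideally with explicit versions. The &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; values, the job name, and &amp;lt;code&amp;gt;./my_program&amp;lt;/code&amp;gt; are placeholders, not Trillium recommendations.&lt;br /&gt;

```shell
 # Sketch: write a minimal Slurm job script to the file given as $1 that
 # starts from a clean module environment and loads the modules given as
 # the remaining arguments. The SBATCH values and ./my_program are
 # placeholders, not Trillium recommendations.
 write_job_script() {
   local out="$1"
   shift
   {
     echo '#!/bin/bash'
     echo '#SBATCH --nodes=1'
     echo '#SBATCH --time=01:00:00'
     echo '#SBATCH --job-name=example'
     echo 'module purge'
     echo "module load $*"
     echo './my_program'
   } > "$out"
 }
```

For example, &amp;lt;code&amp;gt;write_job_script job.sh gcc/13.3&amp;lt;/code&amp;gt; writes &amp;lt;code&amp;gt;job.sh&amp;lt;/code&amp;gt;, which you could then submit with &amp;lt;code&amp;gt;sbatch job.sh&amp;lt;/code&amp;gt;.&lt;br /&gt;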
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
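
To apply the version advice above to an existing job script, you can scan its &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt; lines for bare module names. The function below is a minimal sketch with a hypothetical name, not a SciNet tool. It treats any name containing a slash as versioned and does not check &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; shortcuts or modules loaded from other scripts.&lt;br /&gt;

```shell
 # Sketch: print each module that the job script given as $1 loads without
 # an explicit version (e.g. "gcc" rather than "gcc/13.3").
 # Only lines of the form "module load NAME ..." are checked; "ml" lines
 # are not recognized (a deliberate simplification).
 find_unpinned_modules() {
   cat "$1" | while read -r cmd sub rest; do
     [ "$cmd" = "module" ] || continue
     [ "$sub" = "load" ] || continue
     for mod in $rest; do
       case "$mod" in
         -*)   ;;               # option flag, not a module
         */?*) ;;               # NAME/VERSION: version is explicit
         *)    echo "$mod" ;;   # bare NAME: resolves to the current default
       esac
     done
   done
 }
```

Running &amp;lt;code&amp;gt;find_unpinned_modules job.sh&amp;lt;/code&amp;gt; prints nothing when every &amp;lt;code&amp;gt;module load&amp;lt;/code&amp;gt; line names a version. Otherwise, it prints each bare module name on its own line.&lt;br /&gt;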
&lt;br /&gt;
== Using Commercial Software ==&lt;br /&gt;
&lt;br /&gt;
You may be able to use commercial software on Trillium, but there are a few important considerations:&lt;br /&gt;
&lt;br /&gt;
* Bring your own license. You can use commercial software on Trillium if you have a valid license. If the software requires a license server, you can connect to it securely using [[SSH_Tunneling | SSH tunneling]].&lt;br /&gt;
&lt;br /&gt;
* SciNet and the {{Alliance}} do not provide user-specific licenses. Due to the large and diverse user base, we cannot provide licenses for individual or specialized commercial packages.&lt;br /&gt;
&lt;br /&gt;
* Commercial tools available to all users. Some widely used commercial tools, such as compilers, math libraries, and debuggers, are available system-wide.&lt;br /&gt;
&lt;br /&gt;
* Software not available (unless you bring your own license): tools like [[MATLAB]], Gaussian, and IDL are not provided centrally. If you have your own license, you are welcome to install and use them.&lt;br /&gt;
&lt;br /&gt;
* Open-source alternatives are available. Consider using freely available tools such as [[Python]], [[R]], and Octave, which are well-supported and widely used on the system.&lt;br /&gt;
&lt;br /&gt;
* We're here to help. If you have a valid license and need help installing commercial software, contact us and we will assist where possible.&lt;br /&gt;
&lt;br /&gt;
A list of commercial software currently installed on Trillium (for which you must supply a license to use) is available on the [[Commercial_software | Commercial Software page]].&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6746</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6746"/>
		<updated>2025-08-05T14:49:54Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Niagara.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
== System Overview ==&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Partition&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Partition&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This partition is in high demand and is not suited to training extremely large models (several hundred billion parameters or more).&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
=== Cooling and Energy Efficiency ===&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct-liquid-cooled with warm water (35–40 °C inlet temperature). Key features:&lt;br /&gt;
&lt;br /&gt;
* Power usage effectiveness (PUE) below 1.03, i.e. less than 3% energy overhead beyond the computing equipment itself&lt;br /&gt;
* Closed-loop dry fluid coolers, which avoid evaporative cooling towers and the consumption of new water&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
=== Specifications ===&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! Nodes !! Cores per node !! Memory per node !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache per CPU || none&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory each)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect: NDR InfiniBand in a Dragonfly+ topology with adaptive routing. Compute nodes are accessed through the Slurm queueing system, which accepts jobs requesting between 15 minutes and 24 hours of run time.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for every account holder of the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can have access enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or if your supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you may need to request access on the [https://ccdb.alliancecan.ca/services/opt_in opt-in page on the CCDB site]. After you click the &amp;quot;Join&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution.  You will need to be familiar with Linux systems to work on Trillium.  If you are not, it is worth taking the time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can be accessed only via [[SSH]] (secure shell), and authentication is allowed only with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate an SSH key pair and to learn how to use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, you can use [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXterm]), then SSH into the Trillium login nodes with your {{Alliance}} username and SSH key:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the ''environment modules'' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Tips for Loading Software ==&lt;br /&gt;
&lt;br /&gt;
Properly managing your software environment is key to avoiding conflicts and ensuring reproducibility. Here are some best practices:&lt;br /&gt;
&lt;br /&gt;
* Avoid loading modules in your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; file. Doing so can cause unexpected behavior, particularly in non-interactive environments like batch jobs or remote shells. For more information, see our [[bashrc guidelines|.bashrc guidelines]].&lt;br /&gt;
&lt;br /&gt;
* Instead, load modules manually or from a separate script. This approach gives you more control and helps keep environments clean.&lt;br /&gt;
&lt;br /&gt;
* Load required modules inside your job submission script. This ensures that your job runs with the expected software environment, regardless of your interactive shell settings.&lt;br /&gt;
&lt;br /&gt;
* Be explicit about module versions. Short names like &amp;lt;code&amp;gt;gcc&amp;lt;/code&amp;gt; will load the system default (e.g., &amp;lt;code&amp;gt;gcc/12.3&amp;lt;/code&amp;gt;), which may change in the future. Specify full versions (e.g., &amp;lt;code&amp;gt;gcc/13.3&amp;lt;/code&amp;gt;) for long-term reproducibility.&lt;br /&gt;
&lt;br /&gt;
* Resolve dependencies with &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;. Some modules depend on others. Use &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; to discover which modules are required and how to load them in the correct order. For more, see [[Using_modules#Module_spider | Using &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt;]].&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
All three share a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
=== Backup and Archive Storage ===&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6743</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6743"/>
		<updated>2025-08-05T14:33:55Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Niagara.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
== System Overview ==&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high performance computing platform, consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Partition&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Partition&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This partition is in high demand and is not suited to training extremely large models (several hundred billion parameters or more).&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
=== Cooling and Energy Efficiency ===&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct-liquid-cooled with warm water (35–40 °C inlet temperature). Key features:&lt;br /&gt;
&lt;br /&gt;
* Power usage effectiveness (PUE) below 1.03, i.e. less than 3% energy overhead beyond the computing equipment itself&lt;br /&gt;
* Closed-loop dry fluid coolers, which avoid evaporative cooling towers and the consumption of new water&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
&lt;br /&gt;
=== Specifications ===&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster consists of two types of nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! Nodes !! Cores per node !! Memory per node !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache per CPU || none&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory each)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect: NDR InfiniBand in a Dragonfly+ topology with adaptive routing. Compute nodes are accessed through the Slurm queueing system, which accepts jobs requesting between 15 minutes and 24 hours of run time.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for every account holder of the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can have access enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or if your supervisor/PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you may need to request access on the [https://ccdb.alliancecan.ca/services/opt_in opt-in page on the CCDB site]. After you click the &amp;quot;Join&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution.  You will need to be familiar with Linux systems to work on Trillium.  If you are not, it is worth taking the time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can be accessed only via [[SSH]] (secure shell), and authentication is allowed only with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate an SSH key pair and to learn how to use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, you can use [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXterm]), then SSH into the Trillium login nodes with your {{Alliance}} username and SSH key:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the ''environment modules'' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
All three share a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
=== Backup and Archive Storage ===&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6740</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6740"/>
		<updated>2025-08-05T14:33:22Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Niagara.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
== System Overview ==&lt;br /&gt;
&lt;br /&gt;
The Trillium system is a state-of-the-art high-performance computing platform consisting of three main components:&lt;br /&gt;
&lt;br /&gt;
1. CPU Partition&lt;br /&gt;
* ~240,000 cores across homogeneous CPU nodes  &lt;br /&gt;
* Non-blocking 400 Gb/s NDR InfiniBand interconnect  &lt;br /&gt;
* Ideal for large-scale parallel workloads  &lt;br /&gt;
&lt;br /&gt;
2. GPU Partition&lt;br /&gt;
* 61 GPU nodes, each with 4 x NVIDIA H100 (SXM) GPUs  &lt;br /&gt;
* 800 Gb/s bandwidth per node (200 Gb/s per GPU) over InfiniBand  &lt;br /&gt;
* Optimized for AI/ML and accelerated science workloads  &lt;br /&gt;
* Note: This partition is in high demand and is not ideal for training extremely large models (hundreds of billions of parameters)&lt;br /&gt;
&lt;br /&gt;
3. Storage System&lt;br /&gt;
* Unified 29 PB VAST NVMe storage for all workloads  &lt;br /&gt;
* No tiering — all flash-based for consistent performance  &lt;br /&gt;
* Accessible via POSIX or S3 under a unified namespace  &lt;br /&gt;
&lt;br /&gt;
=== Cooling and Energy Efficiency ===&lt;br /&gt;
&lt;br /&gt;
Trillium is fully direct liquid cooled using warm water (35–40 °C input), resulting in:&lt;br /&gt;
&lt;br /&gt;
* PUE below 1.03 (high energy efficiency)&lt;br /&gt;
* Use of closed-loop dry fluid coolers, avoiding evaporative towers and new water usage&lt;br /&gt;
* Heat reuse: Trillium supplies excess heat to nearby facilities to minimize climate impact&lt;br /&gt;
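&lt;br /&gt;
Power usage effectiveness (PUE) is the ratio of the total power drawn by the facility to the power delivered to the computing equipment. A PUE below 1.03 therefore means that cooling and power distribution add less than 3 percent on top of the computing load. The worked example below assumes a purely illustrative IT load of 1 MW, which is not a published Trillium figure:&lt;br /&gt;

```latex
\begin{gathered}
\mathrm{PUE} = \frac{P_{\text{facility}}}{P_{\text{IT}}}, \qquad
P_{\text{facility}} - P_{\text{IT}} = (\mathrm{PUE} - 1)\, P_{\text{IT}} \\
1.03 > \mathrm{PUE},\ P_{\text{IT}} = 1\,\text{MW}
\;\Longrightarrow\;
30\,\text{kW} = 0.03 \times 1\,\text{MW} > P_{\text{facility}} - P_{\text{IT}}
\end{gathered}
```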
&lt;br /&gt;
=Specifications=&lt;br /&gt;
&lt;br /&gt;
Trillium consists of two types of compute nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache per CPU ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory each)&lt;br /&gt;
|}&lt;br /&gt;
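&lt;br /&gt;
The node and core totals in the infobox follow from this table. The shell sketch below recomputes them, along with the memory per core on each node type, using only the figures in the table:&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Recompute cluster totals from the node table (integer shell arithmetic).
cpu_nodes=1224; cpu_cores=192      # CPU nodes: 2 x 96-core EPYC 9655
gpu_nodes=60;   gpu_cores=96       # GPU nodes: 1 x 96-core EPYC 9654
mem_gb=768                         # RAM per node, same on both node types

total_nodes=$(( cpu_nodes + gpu_nodes ))
total_cores=$(( cpu_nodes * cpu_cores + gpu_nodes * gpu_cores ))

echo "nodes: $total_nodes"
echo "cores: $total_cores"
echo "GB per core (CPU node): $(( mem_gb / cpu_cores ))"
echo "GB per core (GPU node): $(( mem_gb / gpu_cores ))"
```

Running it prints 1284 nodes and 240768 cores, which matches the infobox. It also shows that CPU nodes provide 4 GB of RAM per core, while GPU nodes provide 8 GB per core.&lt;br /&gt;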
&lt;br /&gt;
Every node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect: NDR InfiniBand in a Dragonfly+ topology with adaptive routing. The compute nodes are accessed through a queueing system (Slurm) that accepts jobs with a minimum walltime of 15 minutes and a maximum of 24 hours.&lt;br /&gt;
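&lt;br /&gt;
Every job must request a walltime between 15 minutes and 24 hours, so a quick local check before submitting can prevent a rejected job. The sketch below is a hypothetical helper, not a SciNet tool. It converts a Slurm time string in HH:MM:SS or D-HH:MM:SS form to seconds and compares the result with those limits. Other Slurm time formats are not handled. The scheduler remains the final authority on what it accepts.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a walltime of the form HH:MM:SS or
# D-HH:MM:SS into whole seconds. Other Slurm time formats are rejected.
walltime_to_seconds() {
    local re='^(([0-9]+)-)?([0-9]+):([0-9]{2}):([0-9]{2})$'
    if [[ ! "$1" =~ $re ]]; then
        echo "unsupported walltime format: $1" >/dev/stderr
        return 1
    fi
    local d="${BASH_REMATCH[2]:-0}" h="${BASH_REMATCH[3]}"
    local m="${BASH_REMATCH[4]}" s="${BASH_REMATCH[5]}"
    # 10# forces base 10, so values such as "08" are not read as octal
    echo $(( ((10#$d * 24 + 10#$h) * 60 + 10#$m) * 60 + 10#$s ))
}

# Check a walltime against the documented limits (15 minutes to 24 hours).
check_walltime() {
    local secs
    secs=$(walltime_to_seconds "$1") || return 2
    if [ "$secs" -lt $(( 15 * 60 )) ] || [ "$secs" -gt $(( 24 * 3600 )) ]; then
        echo "out of range: $1"
        return 1
    fi
    echo "ok: $1"
}
```

For example, check_walltime 12:00:00 prints ok: 12:00:00, while check_walltime 00:05:00 reports the value as out of range. In a batch script, the same string would appear as #SBATCH --time=12:00:00.&lt;br /&gt;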
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can have it enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or if your supervisor or PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you may need to request access on the [https://ccdb.alliancecan.ca/services/opt_in opt-in page on the CCDB site]. After you click the &amp;quot;Join&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
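&lt;br /&gt;
You can also script this check from your own computer. The sketch below uses a hypothetical function name. It runs the trivial remote command true with ssh's BatchMode option, so ssh fails instead of prompting. ssh exits with the remote command's status, or with 255 if it could not connect or authenticate. If your key has a passphrase, it must already be loaded into ssh-agent for the check to succeed.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: test whether a Trillium login node accepts your SSH key.
# BatchMode=yes makes ssh fail rather than prompt for a password or passphrase.
check_access() {
    local target="$1"   # e.g. MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca
    local rc
    ssh -o BatchMode=yes -o ConnectTimeout=15 "$target" true
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "access OK"
    elif [ "$rc" -eq 255 ]; then
        # 255 = ssh-level error: key refused ("Permission denied"), DNS, timeout, ...
        echo "ssh failed: check your key, your opt-in status, or System Status"
    else
        echo "remote command exited with status $rc"
    fi
    return "$rc"
}
```

Run it as check_access MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca.&lt;br /&gt;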
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can only be accessed via [[SSH]] (secure shell), and authentication is only possible with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to learn how to use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, see [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
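&lt;br /&gt;
If you connect often, a Host entry in the ssh configuration file on your own computer saves you from retyping these options. The sketch below appends one using printf. Setting ForwardX11 together with ForwardX11Trusted is the configuration-file equivalent of -Y. The alias trillium, the function name, and the key path in the usage line are placeholders to adapt.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: add a "trillium" Host entry to an ssh config file (only once).
# Arguments: config file, Alliance user name, private key path (placeholders).
add_trillium_host() {
    local cfg="$1" user="$2" key="$3"
    if grep -qx 'Host trillium' "$cfg" 2>/dev/null; then
        echo "entry already present in $cfg"
        return 0
    fi
    mkdir -p -m 700 "$(dirname "$cfg")"
    printf '%s\n' \
        "Host trillium" \
        "    HostName trillium.scinet.utoronto.ca" \
        "    User $user" \
        "    IdentityFile $key" \
        "    ForwardX11 yes" \
        "    ForwardX11Trusted yes" >> "$cfg"
    chmod 600 "$cfg"   # ssh refuses config files that others can write to
}

# Usage: add_trillium_host ~/.ssh/config MYALLIANCEUSERNAME ~/.ssh/id_ed25519
```

After that, ssh trillium is enough to log in.&lt;br /&gt;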
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the ''environment modules'' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
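&lt;br /&gt;
In job scripts, it is good practice to start from a clean environment. Otherwise a job can depend on whatever happened to be loaded when it was submitted. The hypothetical helper below combines the commands listed above. It assumes that module returns a non-zero status when a load fails. The module names in the usage line are placeholders; use module avail or module spider to find the names and versions actually installed on Trillium.&lt;br /&gt;

```shell
#!/usr/bin/env bash
# Sketch: purge all modules, then load each requested module in order,
# stopping at the first failure. Relies on the `module` command provided
# by the modules system (assumed to return non-zero when a load fails).
load_clean() {
    module purge || return 1
    local spec
    for spec in "$@"; do
        if ! module load "$spec"; then
            echo "could not load module: $spec" >/dev/stderr
            return 1
        fi
    done
    module list
}

# Usage (placeholder names): load_clean gcc openmpi
```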
&lt;br /&gt;
== Storage System ==&lt;br /&gt;
&lt;br /&gt;
Trillium features a unified high-performance storage system based on the VAST platform, with no tiering. It serves the following directories:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;/home&amp;lt;/code&amp;gt; – For personal files and configurations.&lt;br /&gt;
* &amp;lt;code&amp;gt;/scratch&amp;lt;/code&amp;gt; – High-speed, temporary storage for job data.&lt;br /&gt;
* &amp;lt;code&amp;gt;/project&amp;lt;/code&amp;gt; – Shared storage for project teams and collaborations.&lt;br /&gt;
&lt;br /&gt;
All three share a unified 29 PB NVMe-backed storage pool, with:&lt;br /&gt;
&lt;br /&gt;
* 29 PB effective capacity (deduplicated via VAST)&lt;br /&gt;
* 16.7 PB raw flash capacity&lt;br /&gt;
* 714 GB/s read bandwidth, 275 GB/s write bandwidth&lt;br /&gt;
* 10 million read IOPS, 2 million write IOPS&lt;br /&gt;
* POSIX and S3 access protocols under a unified namespace&lt;br /&gt;
* 48 C-Boxes and 14 D-Boxes for data services&lt;br /&gt;
&lt;br /&gt;
The storage is accessible via the NDR InfiniBand fabric for maximum performance across all workloads.&lt;br /&gt;
&lt;br /&gt;
=== Backup and Archive Storage ===&lt;br /&gt;
&lt;br /&gt;
An additional 114 PB HPSS tape-based archive is available for nearline storage:&lt;br /&gt;
&lt;br /&gt;
* Dual-copy archive across geographically separate libraries&lt;br /&gt;
* Used for both backup and archival purposes&lt;br /&gt;
* Backups are managed using Atempo backup software&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6737</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6737"/>
		<updated>2025-08-05T14:24:17Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Niagara.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1284 nodes (240768 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=Specifications=&lt;br /&gt;
&lt;br /&gt;
Trillium consists of two types of compute nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache per CPU ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory each)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Every node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect: NDR InfiniBand in a Dragonfly+ topology with adaptive routing. The compute nodes are accessed through a queueing system (Slurm) that accepts jobs with a minimum walltime of 15 minutes and a maximum of 24 hours.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can have it enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or if your supervisor or PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you may need to request access on the [https://ccdb.alliancecan.ca/services/opt_in opt-in page on the CCDB site]. After you click the &amp;quot;Join&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can only be accessed via [[SSH]] (secure shell), and authentication is only possible with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to learn how to use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, see [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
== Your various directories ==&lt;br /&gt;
&lt;br /&gt;
Your access to Trillium comes with storage space on the system. Several directories are available to you, each indicated by an associated environment variable.&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the ''environment modules'' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6734</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6734"/>
		<updated>2025-08-05T14:20:09Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Niagara.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1,224 nodes (235,008 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode=96 (192 hyperthreads)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=Specifications=&lt;br /&gt;
&lt;br /&gt;
Trillium consists of two types of compute nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache per CPU ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory each)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Every node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect: NDR InfiniBand in a Dragonfly+ topology with adaptive routing. The compute nodes are accessed through a queueing system (Slurm) that accepts jobs with a minimum walltime of 15 minutes and a maximum of 24 hours.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can have it enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or if your supervisor or PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you may need to request access on the [https://ccdb.alliancecan.ca/services/opt_in opt-in page on the CCDB site]. After you click the &amp;quot;Join&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can only be accessed via [[SSH]] (secure shell), and authentication is only possible with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to learn how to use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, see [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;br /&gt;
&lt;br /&gt;
== Your various directories ==&lt;br /&gt;
&lt;br /&gt;
Your access to Trillium comes with storage space on the system. Several directories are available to you, each indicated by an associated environment variable.&lt;br /&gt;
&lt;br /&gt;
== Software Environment ==&lt;br /&gt;
&lt;br /&gt;
Trillium uses the ''environment modules'' system to manage compilers, libraries, and other software packages. Modules dynamically modify your environment (e.g., &amp;lt;code&amp;gt;PATH&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;LD_LIBRARY_PATH&amp;lt;/code&amp;gt;) so you can access different versions of software without conflicts.&lt;br /&gt;
&lt;br /&gt;
A detailed explanation can be [[Using_modules | found on the modules page]].&lt;br /&gt;
&lt;br /&gt;
Commonly used module commands:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Load the default version of a software package.&lt;br /&gt;
* &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;/&amp;lt;module-version&amp;gt;&amp;lt;/code&amp;gt; – Load a specific version.&lt;br /&gt;
* &amp;lt;code&amp;gt;module purge&amp;lt;/code&amp;gt; – Unload all currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt; – List available modules that can be loaded.&lt;br /&gt;
* &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt; – Show currently loaded modules.&lt;br /&gt;
* &amp;lt;code&amp;gt;module spider&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;module spider &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Search for available modules and their versions.&lt;br /&gt;
&lt;br /&gt;
Handy abbreviations are available:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;ml&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module list&amp;lt;/code&amp;gt;.&lt;br /&gt;
* &amp;lt;code&amp;gt;ml &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt; – Equivalent to &amp;lt;code&amp;gt;module load &amp;lt;module-name&amp;gt;&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6731</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6731"/>
		<updated>2025-08-05T14:09:31Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Niagara.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1,224 nodes (235,008 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode=96 (192 hyperthreads)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
__TOC__&lt;br /&gt;
&lt;br /&gt;
=Specifications=&lt;br /&gt;
&lt;br /&gt;
Trillium consists of two types of compute nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache per CPU ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory each)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Every node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect: NDR InfiniBand in a Dragonfly+ topology with adaptive routing. The compute nodes are accessed through a queueing system (Slurm) that accepts jobs with a minimum walltime of 15 minutes and a maximum of 24 hours.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can have it enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or if your supervisor or PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you may need to request access on the [https://ccdb.alliancecan.ca/services/opt_in opt-in page on the CCDB site]. After you click the &amp;quot;Join&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can only be accessed via [[SSH]] (secure shell), and authentication is only possible with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to learn how to use it securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, see [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY Connecting with PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm Connecting with MobaXTerm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6728</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6728"/>
		<updated>2025-08-05T14:07:37Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Niagara.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1,224 nodes (235,008 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode=96 (192 hyperthreads)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=Specifications=&lt;br /&gt;
&lt;br /&gt;
Trillium consists of two types of compute nodes:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores !! available memory !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache per CPU ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory each)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Every node has 768 GB of RAM. Because the cluster is designed for large parallel workloads, it has a fast interconnect: NDR InfiniBand in a Dragonfly+ topology with adaptive routing. The compute nodes are accessed through a queueing system (Slurm) that accepts jobs with a minimum walltime of 15 minutes and a maximum of 24 hours.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for everyone with an account with the {{DigitalResearchAllianceOfCanada}}, but anyone with an active Alliance account can have it enabled.&lt;br /&gt;
&lt;br /&gt;
If you are new to SciNet, or if your supervisor or PI does not hold a current {{Alliance}} [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions RAC] allocation, you may need to request access on the [https://ccdb.alliancecan.ca/services/opt_in opt-in page on the CCDB site]. After you click the &amp;quot;Join&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
You can check if you already have Trillium access by attempting to log in. If you receive a &amp;quot;Permission denied&amp;quot; error (and your SSH key is correctly set up), you may need to opt in.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
&lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can only be accessed via [[SSH]] (secure shell), and authentication is only allowed with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to learn how to use your keys securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, you can use, e.g., [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXterm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, allowing graphical programs to open windows on your local computer.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
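&lt;br /&gt;
If you log in often, an entry in your SSH client configuration can shorten the command above to &amp;lt;code&amp;gt;ssh trillium&amp;lt;/code&amp;gt;. The following is a minimal sketch. It assumes an OpenSSH client (e.g. on Linux, macOS or WSL) and a config file at &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt;. The alias &amp;lt;code&amp;gt;trillium&amp;lt;/code&amp;gt; and the function name &amp;lt;code&amp;gt;add_trillium_host&amp;lt;/code&amp;gt; are hypothetical. In a config file, &amp;lt;code&amp;gt;ForwardX11&amp;lt;/code&amp;gt; plus &amp;lt;code&amp;gt;ForwardX11Trusted&amp;lt;/code&amp;gt; is the equivalent of &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt;.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Sketch only: append a "Host trillium" block to an OpenSSH client config file.
# The alias "trillium" and this function name are hypothetical choices.
add_trillium_host() {
    local config_file="$1" user="$2" key="$3"
    # Skip if an entry for the alias already exists, so re-running is harmless.
    if grep -qx 'Host trillium' "$config_file" 2>/dev/null; then
        return 0
    fi
    {
        printf 'Host trillium\n'
        printf '    HostName trillium.scinet.utoronto.ca\n'
        printf '    User %s\n' "$user"
        printf '    IdentityFile %s\n' "$key"
        # Together these two options correspond to "ssh -Y" (trusted X11 forwarding).
        printf '    ForwardX11 yes\n'
        printf '    ForwardX11Trusted yes\n'
    } >> "$config_file"
}

# Example (replace the username and key path with your own):
#   add_trillium_host ~/.ssh/config MYALLIANCEUSERNAME /path/to/ssh_private_key
#   ssh trillium
```
Checking for an existing entry first keeps the function safe to run more than once.&lt;br /&gt;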
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6725</id>
		<title>Trillium Quickstart</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Trillium_Quickstart&amp;diff=6725"/>
		<updated>2025-08-05T13:54:24Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: Created page with &amp;quot;{{Infobox Computer |image=thumb |name=Trillium |installed=Aug 2025 |operatingsystem= Rocky Linux 9.6 |loginnode= trillium.scinet.utoronto.ca...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Infobox Computer&lt;br /&gt;
|image=[[Image:Niagara.jpg|center|300px|thumb]]&lt;br /&gt;
|name=Trillium&lt;br /&gt;
|installed=Aug 2025&lt;br /&gt;
|operatingsystem= Rocky Linux 9.6&lt;br /&gt;
|loginnode= trillium.scinet.utoronto.ca&lt;br /&gt;
|nnodes=  1,224 nodes (235,008 cores)&lt;br /&gt;
|rampernode= 768 GB&lt;br /&gt;
|corespernode= 192 (CPU nodes) and 96 (GPU nodes)&lt;br /&gt;
|interconnect=Mellanox Dragonfly+&lt;br /&gt;
|vendorcompilers= icc (C) ifort (fortran) icpc (C++)&lt;br /&gt;
|queuetype=Slurm&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
=Specifications=&lt;br /&gt;
&lt;br /&gt;
The Trillium cluster is a large cluster consisting of:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable sortable&amp;quot;&lt;br /&gt;
! nodes !! cores per node !! memory per node !! CPU !! GPU&lt;br /&gt;
|-&lt;br /&gt;
| 1224 || 192 || 768 GB DDR5 || 2 x AMD EPYC 9655 (Zen 5) @ 2.6 GHz, 384 MB L3 cache ||&lt;br /&gt;
|-&lt;br /&gt;
| 60 || 96 || 768 GB DDR5 || 1 x AMD EPYC 9654 (Zen 4) @ 2.4 GHz, 384 MB L3 cache || 4 x NVIDIA H100 SXM (80 GB memory each)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Each node of the cluster has 768 GB of RAM. Because it is designed for large parallel workloads, the cluster has a fast interconnect consisting of NDR InfiniBand in a Dragonfly+ topology with Adaptive Routing. The compute nodes are accessed through a queueing system that allows jobs with a minimum run time of 15 minutes and a maximum of 24 hours.&lt;br /&gt;
&lt;br /&gt;
= Getting started on Trillium =&lt;br /&gt;
&lt;br /&gt;
Access to Trillium is not enabled automatically for every {{DigitalResearchAllianceOfCanada}} account holder, but anyone with an active Alliance account can have it enabled.&lt;br /&gt;
 &lt;br /&gt;
If you have an active Alliance account but do not yet have access to Trillium (e.g., because you are new to SciNet, or because the primary PI of your group does not hold an allocation from the annual [https://alliancecan.ca/en/services/advanced-research-computing/research-portal/accessing-resources/resource-allocation-competitions {{Alliance}} RAC]), go to the [https://ccdb.alliancecan.ca/services/opt_in opt-in page on the CCDB site]. After you click the &amp;quot;Join&amp;quot; button, access is usually granted within one or two business days.&lt;br /&gt;
&lt;br /&gt;
Please read this document carefully.  The [https://docs.scinet.utoronto.ca/index.php/FAQ FAQ] is also a useful resource.  If at any time you require assistance, or if something is unclear, please do not hesitate to [mailto:support@scinet.utoronto.ca contact us].&lt;br /&gt;
&lt;br /&gt;
== Logging in ==&lt;br /&gt;
        &lt;br /&gt;
Trillium runs Rocky Linux 9.6, a Linux distribution. You will need to be familiar with Linux systems to work on Trillium. If you are not, it is worth your time to review our [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=scmp101&amp;amp;include=all&amp;amp;filter=Filter Introduction to Linux Shell] class.&lt;br /&gt;
&lt;br /&gt;
As with all SciNet and {{Alliance}} compute systems, Trillium can only be accessed via [[SSH]] (secure shell), and authentication is only allowed with SSH keys. [https://docs.alliancecan.ca/wiki/SSH_Keys Please refer to this page] to generate your SSH key pair and to learn how to use your keys securely.&lt;br /&gt;
 &lt;br /&gt;
Open a terminal window (on Windows, you can use, e.g., [https://docs.alliancecan.ca/wiki/Connecting_with_PuTTY PuTTY] or [https://docs.alliancecan.ca/wiki/Connecting_with_MobaXTerm MobaXterm]), then SSH into the Trillium login nodes with your {{Alliance}} credentials:&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
or&lt;br /&gt;
&lt;br /&gt;
 $ ssh -i /path/to/ssh_private_key -Y MYALLIANCEUSERNAME@trillium.computecanada.ca&lt;br /&gt;
&lt;br /&gt;
The first time you log in to Trillium, make sure you are actually connected to Trillium by checking that the login node's SSH host key fingerprint matches the published one ([[SSH_Changes_in_May_2019 | see here how]]). This check protects you from [https://en.wikipedia.org/wiki/Man-in-the-middle_attack man-in-the-middle attacks].&lt;br /&gt;
&lt;br /&gt;
* The Trillium login nodes are where you develop, edit, compile, prepare and submit jobs.&lt;br /&gt;
* These login nodes are not part of the Trillium compute cluster, but have the same architecture, operating system, and software stack.&lt;br /&gt;
* The optional &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; enables X11 forwarding, which is needed to open windows from the Trillium command line on your local X server.&lt;br /&gt;
* You can connect to the login nodes at most 4 times within any 2-minute window.&lt;br /&gt;
* To run on Trillium compute nodes, you must [[#Submitting_jobs | submit a batch job]].&lt;br /&gt;
&lt;br /&gt;
If you cannot log in, be sure to first check the [https://docs.scinet.utoronto.ca System Status] on this site's front page.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Custom_fMRIPrep_with_Apptainer_on_Niagara&amp;diff=5930</id>
		<title>Custom fMRIPrep with Apptainer on Niagara</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Custom_fMRIPrep_with_Apptainer_on_Niagara&amp;diff=5930"/>
		<updated>2024-10-26T04:46:00Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Overview =&lt;br /&gt;
This guide outlines the steps for setting up a custom fMRIPrep container using Apptainer on Niagara, configuring Templateflow for offline use, and preparing a submission script for running fMRIPrep in the Niagara cluster environment.&lt;br /&gt;
&lt;br /&gt;
== Acknowledgments ==&lt;br /&gt;
Special thanks to Maurice Pasternak for providing this information and guidance on setting up the fMRIPrep container on Niagara.&lt;br /&gt;
&lt;br /&gt;
= Setting up the fMRIPrep Container =&lt;br /&gt;
&lt;br /&gt;
== Verify Apptainer Installation ==&lt;br /&gt;
We will use Apptainer to run fMRIPrep. Apptainer should be installed by default on Niagara. To verify, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
which apptainer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
A path like &amp;lt;code&amp;gt;/usr/bin/apptainer&amp;lt;/code&amp;gt; should be returned.&lt;br /&gt;
&lt;br /&gt;
If no path is returned, load the Apptainer module:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load apptainer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
== Creating the Container ==&lt;br /&gt;
We will write a definition file that builds the container from the official fMRIPrep image. It also adds the PySocks Python package, so that fMRIPrep can reach the internet through a SOCKS proxy if it ever needs to.&lt;br /&gt;
&lt;br /&gt;
This is a one-time setup.&lt;br /&gt;
&lt;br /&gt;
Let's start by creating a definition file. Go to the directory where you want to create the container and run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; fmriprep.def&lt;br /&gt;
Bootstrap: docker&lt;br /&gt;
From: nipreps/fmriprep:latest&lt;br /&gt;
&lt;br /&gt;
%post&lt;br /&gt;
   apt-get update &amp;amp;&amp;amp; apt-get install -y python3-pip&lt;br /&gt;
   pip3 install pysocks&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will create a file called fmriprep.def. Use it to build the container with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
apptainer build fmriprep-latest.sif fmriprep.def&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This creates the container image &amp;lt;code&amp;gt;fmriprep-latest.sif&amp;lt;/code&amp;gt; in the current directory. This is the fMRIPrep container used in the rest of this guide. To verify it, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
apptainer run fmriprep-latest.sif --version&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This should print the version of fMRIPrep in the container.&lt;br /&gt;
&lt;br /&gt;
= Setting up Templateflow =&lt;br /&gt;
&lt;br /&gt;
By default, fMRIPrep tries to download templates from the internet. To avoid this, we will set up Templateflow locally and use an environment variable to tell fMRIPrep where to find it.&lt;br /&gt;
&lt;br /&gt;
This is a one-time setup.&lt;br /&gt;
&lt;br /&gt;
== Establishing the Templateflow Directory ==&lt;br /&gt;
Define the directory where Templateflow will be installed, for example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export TEMPLATEFLOW_HOME=$HOME/templateflow&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For convenience, you can add this command to your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; (or to &amp;lt;code&amp;gt;.zshrc&amp;lt;/code&amp;gt;, if you use zsh) in your home directory. For bash:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;export TEMPLATEFLOW_HOME=$HOME/templateflow&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Create the directory if it does not already exist:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir -p $TEMPLATEFLOW_HOME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
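&lt;br /&gt;
The &amp;lt;code&amp;gt;echo ... &amp;gt;&amp;gt; ~/.bashrc&amp;lt;/code&amp;gt; command above adds another copy of the line each time it is run. If you might repeat this setup, the following minimal bash sketch appends a line only once. The function name &amp;lt;code&amp;gt;append_once&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Sketch: append a line to a file only if that exact line is not already present.
# The function name append_once is hypothetical.
append_once() {
    local line="$1" file="$2"
    # -q: quiet, -x: match the whole line, -F: treat the line as a fixed string.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example: single quotes keep $HOME unexpanded in .bashrc; it is expanded at login.
#   append_once 'export TEMPLATEFLOW_HOME=$HOME/templateflow' ~/.bashrc
```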
&lt;br /&gt;
== Using Python to Install Templateflow ==&lt;br /&gt;
You can either activate a Python environment you have used previously or create a new one. For instructions, see [https://docs.scinet.utoronto.ca/index.php/Python#Using_Virtualenv_in_Regular_Python these steps].&lt;br /&gt;
&lt;br /&gt;
Now, let's install Templateflow.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install templateflow&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Make sure the &amp;lt;code&amp;gt;TEMPLATEFLOW_HOME&amp;lt;/code&amp;gt; environment variable is set correctly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo $TEMPLATEFLOW_HOME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This should point to the directory you created earlier. If this prints nothing, refer back to the section on establishing the Templateflow directory.&lt;br /&gt;
&lt;br /&gt;
Assuming the above checks out, let's start up a Python session and install the desired templates.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
python&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Within Python, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import templateflow.api as tf&lt;br /&gt;
tf.templates()&lt;br /&gt;
tf.get([&amp;quot;MNI152NLin2009cAsym&amp;quot;, &amp;quot;MNI152NLin6Asym&amp;quot;, &amp;quot;OASIS30ANTs&amp;quot;, &amp;quot;fsLR&amp;quot;, &amp;quot;fsaverage&amp;quot;])&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This will take a while. Once it completes, run the &amp;lt;code&amp;gt;tf.get&amp;lt;/code&amp;gt; command again. It should return immediately with a list of Path objects, without downloading anything, since the templates are already present.&lt;br /&gt;
&lt;br /&gt;
Let's exit the Python session by typing &amp;lt;code&amp;gt;exit()&amp;lt;/code&amp;gt;.&lt;br /&gt;
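&lt;br /&gt;
Before relying on the templates offline, you may want to confirm that they are present in &amp;lt;code&amp;gt;$TEMPLATEFLOW_HOME&amp;lt;/code&amp;gt;. The sketch below assumes that TemplateFlow stores each template in a subdirectory named &amp;lt;code&amp;gt;tpl-&amp;lt;name&amp;gt;&amp;lt;/code&amp;gt;. The function name &amp;lt;code&amp;gt;check_templates&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Sketch: report any requested template whose directory is missing.
# Assumes TemplateFlow's layout of one "tpl-NAME" directory per template.
check_templates() {
    local home="$1" missing=0 name
    shift
    for name in "$@"; do
        if [ ! -d "$home/tpl-$name" ]; then
            echo "missing template: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_templates "$TEMPLATEFLOW_HOME" MNI152NLin2009cAsym MNI152NLin6Asym \
#       OASIS30ANTs fsLR fsaverage
```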
&lt;br /&gt;
= Establishing an SSH Tunnel to the Niagara Login Node =&lt;br /&gt;
&lt;br /&gt;
'''This must be done every time''' you log in to Niagara if you intend to use the container.&lt;br /&gt;
&lt;br /&gt;
fMRIPrep may request internet access at some point, which would fail from a compute node. To allow such requests, we establish an SSH tunnel to a Niagara login node. The tunnel acts as a SOCKS proxy on local port 44223:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh -D 44223 nia-login02 -f -N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here, &amp;lt;code&amp;gt;nia-login02&amp;lt;/code&amp;gt; is only an example; replace it with the login node you are currently using. To find out which node that is, run &amp;lt;code&amp;gt;echo $HOSTNAME&amp;lt;/code&amp;gt; and look for the &amp;lt;code&amp;gt;nia-login##&amp;lt;/code&amp;gt; part. The &amp;lt;code&amp;gt;-f&amp;lt;/code&amp;gt; option sends SSH to the background, and &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; tells it not to run a remote command.&lt;br /&gt;
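&lt;br /&gt;
The tunnel only needs to be started once per login session. The minimal sketch below derives the short name of the current login node. It then starts the tunnel only if no matching &amp;lt;code&amp;gt;ssh ... -D 44223&amp;lt;/code&amp;gt; process is already running. It assumes &amp;lt;code&amp;gt;pgrep&amp;lt;/code&amp;gt; is available, and the function names are hypothetical.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Sketch: start the SOCKS tunnel described above only if it is not already up.
# Function names are hypothetical; port 44223 matches the example above.
short_hostname() {
    # Drop any domain suffix, e.g. "nia-login02.example" becomes "nia-login02".
    local host="${1:-$HOSTNAME}"
    printf '%s\n' "${host%%.*}"
}

start_socks_tunnel() {
    local port="${1:-44223}" node
    node="$(short_hostname)"
    # pgrep -f matches the regex against the full command line of running processes.
    if pgrep -f "ssh .*-D $port" 1>/dev/null; then
        echo "tunnel on port $port already running"
        return 0
    fi
    # -f: go to background, -N: no remote command, -D: dynamic (SOCKS) forwarding.
    ssh -f -N -D "$port" "$node"
}
```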
&lt;br /&gt;
= Preparing a Script to Run fMRIPrep =&lt;br /&gt;
&lt;br /&gt;
Before running fMRIPrep, make sure that you have:&lt;br /&gt;
&lt;br /&gt;
* followed the instructions above&lt;br /&gt;
* a BIDS dataset ready on scratch or the burst buffer&lt;br /&gt;
* a FreeSurfer license file placed in a known location&lt;br /&gt;
&lt;br /&gt;
Here is an example script that you can use to run fMRIPrep:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=fmriprep&lt;br /&gt;
#SBATCH --output=specify/where/to/save/fmriprep.log&lt;br /&gt;
#SBATCH --error=specify/where/to/save/fmriprep.err&lt;br /&gt;
#SBATCH --time=12:00:00&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
#SBATCH --cpus-per-task=40&lt;br /&gt;
#SBATCH --mail-type=END,FAIL&lt;br /&gt;
#SBATCH --mail-user=your.email@whatever.ca&lt;br /&gt;
&lt;br /&gt;
export APPTAINER_INSTANCE=/path/to/your/container/file&lt;br /&gt;
export BIDS_DIR=/path/to/your/bids/dataset&lt;br /&gt;
export FS_LICENSE=/path/to/your/freesurfer/license/file&lt;br /&gt;
&lt;br /&gt;
# Quote variables: if TEMPLATEFLOW_HOME were unset, [ ! -d $TEMPLATEFLOW_HOME ] would silently pass&lt;br /&gt;
if [ ! -d &amp;quot;$TEMPLATEFLOW_HOME&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;Templateflow directory does not exist: $TEMPLATEFLOW_HOME&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
if [ ! -d &amp;quot;$BIDS_DIR&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;BIDS directory does not exist: $BIDS_DIR&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
if [ ! -d &amp;quot;$BIDS_DIR/derivatives/fmriprep&amp;quot; ]; then&lt;br /&gt;
    mkdir -p &amp;quot;$BIDS_DIR/derivatives/fmriprep&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
if [ ! -f &amp;quot;$FS_LICENSE&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;Freesurfer license file does not exist: $FS_LICENSE&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export all_proxy=socks5://localhost:44223&lt;br /&gt;
export APPTAINERENV_TEMPLATEFLOW_HOME=/templateflow&lt;br /&gt;
&lt;br /&gt;
apptainer run \&lt;br /&gt;
    --cleanenv \&lt;br /&gt;
    -B $BIDS_DIR:/data \&lt;br /&gt;
    -B $BIDS_DIR/derivatives/fmriprep:/data/derivatives/fmriprep \&lt;br /&gt;
    -B $FS_LICENSE:/freesurfer_license.txt \&lt;br /&gt;
    -B $TEMPLATEFLOW_HOME:$APPTAINERENV_TEMPLATEFLOW_HOME \&lt;br /&gt;
    $APPTAINER_INSTANCE \&lt;br /&gt;
    /data \&lt;br /&gt;
    /data/derivatives/fmriprep \&lt;br /&gt;
    participant \&lt;br /&gt;
    --random-seed 42 \&lt;br /&gt;
    --omp-nthreads 40 \&lt;br /&gt;
    --fs-license-file /freesurfer_license.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
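&lt;br /&gt;
The script above hard-codes 40 threads to match &amp;lt;code&amp;gt;--cpus-per-task=40&amp;lt;/code&amp;gt;. If you change the CPU request, one option is to read the value Slurm exports in &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt;, which is set when &amp;lt;code&amp;gt;--cpus-per-task&amp;lt;/code&amp;gt; is requested. A minimal sketch follows; the helper name &amp;lt;code&amp;gt;omp_threads&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Sketch: pick the fMRIPrep thread count from the Slurm allocation.
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given;
# outside a job it is unset, so fall back to 40 as in the script above.
omp_threads() {
    printf '%s\n' "${SLURM_CPUS_PER_TASK:-40}"
}

# In the job script, the hard-coded value could then be replaced with:
#   --omp-nthreads "$(omp_threads)" \
```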
&lt;br /&gt;
Save the script as a &amp;lt;code&amp;gt;.sh&amp;lt;/code&amp;gt; file in a known location and make it executable:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chmod +x your_script.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
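&lt;br /&gt;
Before submitting, you can check the script for shell syntax errors with &amp;lt;code&amp;gt;bash -n&amp;lt;/code&amp;gt;, which parses a script without executing any of its commands. Below is a minimal sketch; the function name &amp;lt;code&amp;gt;check_syntax&amp;lt;/code&amp;gt; is hypothetical. It does not validate the &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; options, which are just comments to bash.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Sketch: parse a job script with "bash -n" (no commands are executed).
# #SBATCH lines are comments to bash, so Slurm options are not checked here.
check_syntax() {
    local script="$1"
    if bash -n "$script"; then
        echo "syntax OK: $script"
    else
        echo "syntax error in: $script"
        return 1
    fi
}

# Example:
#   check_syntax your_script.sh
```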
&lt;br /&gt;
= Running the Script =&lt;br /&gt;
Submit the script to the queue with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch your_script.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
You can check the status of the job with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Good luck!&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Custom_fMRIPrep_with_Apptainer_on_Niagara&amp;diff=5927</id>
		<title>Custom fMRIPrep with Apptainer on Niagara</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Custom_fMRIPrep_with_Apptainer_on_Niagara&amp;diff=5927"/>
		<updated>2024-10-26T04:40:56Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Overview =&lt;br /&gt;
This guide outlines the steps for setting up a custom fMRIPrep container using Apptainer on Niagara, configuring Templateflow for offline use, and preparing a submission script for running fMRIPrep in the Niagara cluster environment.&lt;br /&gt;
&lt;br /&gt;
= Setting up the fMRIPrep Container =&lt;br /&gt;
&lt;br /&gt;
== Verify Apptainer Installation ==&lt;br /&gt;
We will use Apptainer to run fMRIPrep. Apptainer should be installed by default on Niagara. To verify, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
which apptainer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
A path like &amp;lt;code&amp;gt;/usr/bin/apptainer&amp;lt;/code&amp;gt; should be returned.&lt;br /&gt;
&lt;br /&gt;
If no path is returned, load the Apptainer module:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load apptainer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
== Creating the Container ==&lt;br /&gt;
We will write a definition file that builds the container from the official fMRIPrep image. It also adds the PySocks Python package, so that fMRIPrep can reach the internet through a SOCKS proxy if it ever needs to.&lt;br /&gt;
&lt;br /&gt;
This is a one-time setup.&lt;br /&gt;
&lt;br /&gt;
Let's start by creating a definition file. Go to the directory where you want to create the container and run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; fmriprep.def&lt;br /&gt;
Bootstrap: docker&lt;br /&gt;
From: nipreps/fmriprep:latest&lt;br /&gt;
&lt;br /&gt;
%post&lt;br /&gt;
   apt-get update &amp;amp;&amp;amp; apt-get install -y python3-pip&lt;br /&gt;
   pip3 install pysocks&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will create a file called fmriprep.def. Use it to build the container with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
apptainer build fmriprep-latest.sif fmriprep.def&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This creates the container image &amp;lt;code&amp;gt;fmriprep-latest.sif&amp;lt;/code&amp;gt; in the current directory. This is the fMRIPrep container used in the rest of this guide. To verify it, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
apptainer run fmriprep-latest.sif --version&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This should print the version of fMRIPrep in the container.&lt;br /&gt;
&lt;br /&gt;
= Setting up Templateflow =&lt;br /&gt;
&lt;br /&gt;
By default, fMRIPrep tries to download templates from the internet. To avoid this, we will set up Templateflow locally and use an environment variable to tell fMRIPrep where to find it.&lt;br /&gt;
&lt;br /&gt;
This is a one-time setup.&lt;br /&gt;
&lt;br /&gt;
== Establishing the Templateflow Directory ==&lt;br /&gt;
Define the directory where Templateflow will be installed, for example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export TEMPLATEFLOW_HOME=$HOME/templateflow&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For convenience, you can add this command to your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; (or to &amp;lt;code&amp;gt;.zshrc&amp;lt;/code&amp;gt;, if you use zsh) in your home directory. For bash:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;export TEMPLATEFLOW_HOME=$HOME/templateflow&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Create the directory if it does not already exist:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir -p $TEMPLATEFLOW_HOME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Using Python to Install Templateflow ==&lt;br /&gt;
You can either activate a Python environment you have used previously or create a new one. For instructions, see [https://docs.scinet.utoronto.ca/index.php/Python#Using_Virtualenv_in_Regular_Python these steps].&lt;br /&gt;
&lt;br /&gt;
Now, let's install Templateflow.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install templateflow&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Make sure the &amp;lt;code&amp;gt;TEMPLATEFLOW_HOME&amp;lt;/code&amp;gt; environment variable is set correctly:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo $TEMPLATEFLOW_HOME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This should point to the directory you created earlier. If this prints nothing, refer back to the section on establishing the Templateflow directory.&lt;br /&gt;
&lt;br /&gt;
Assuming the above checks out, let's start up a Python session and install the desired templates.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
python&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Within Python, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import templateflow.api as tf&lt;br /&gt;
tf.templates()&lt;br /&gt;
tf.get([&amp;quot;MNI152NLin2009cAsym&amp;quot;, &amp;quot;MNI152NLin6Asym&amp;quot;, &amp;quot;OASIS30ANTs&amp;quot;, &amp;quot;fsLR&amp;quot;, &amp;quot;fsaverage&amp;quot;])&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This will take a while. Once it completes, run the &amp;lt;code&amp;gt;tf.get&amp;lt;/code&amp;gt; command again. It should return immediately with a list of Path objects, without downloading anything, since the templates are already present.&lt;br /&gt;
&lt;br /&gt;
Let's exit the Python session by typing &amp;lt;code&amp;gt;exit()&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Establishing an SSH Tunnel to the Niagara Login Node =&lt;br /&gt;
&lt;br /&gt;
'''This must be done every time''' you log in to Niagara if you intend to use the container.&lt;br /&gt;
&lt;br /&gt;
fMRIPrep may request internet access at some point, which would fail from a compute node. To allow such requests, we establish an SSH tunnel to a Niagara login node. The tunnel acts as a SOCKS proxy on local port 44223:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh -D 44223 nia-login02 -f -N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Here, &amp;lt;code&amp;gt;nia-login02&amp;lt;/code&amp;gt; is only an example; replace it with the login node you are currently using. To find out which node that is, run &amp;lt;code&amp;gt;echo $HOSTNAME&amp;lt;/code&amp;gt; and look for the &amp;lt;code&amp;gt;nia-login##&amp;lt;/code&amp;gt; part. The &amp;lt;code&amp;gt;-f&amp;lt;/code&amp;gt; option sends SSH to the background, and &amp;lt;code&amp;gt;-N&amp;lt;/code&amp;gt; tells it not to run a remote command.&lt;br /&gt;
&lt;br /&gt;
= Preparing a Script to Run fMRIPrep =&lt;br /&gt;
&lt;br /&gt;
Before running fMRIPrep, make sure that you have:&lt;br /&gt;
&lt;br /&gt;
* followed the instructions above&lt;br /&gt;
* a BIDS dataset ready on scratch or the burst buffer&lt;br /&gt;
* a FreeSurfer license file placed in a known location&lt;br /&gt;
&lt;br /&gt;
Here is an example script that you can use to run fMRIPrep:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=fmriprep&lt;br /&gt;
#SBATCH --output=specify/where/to/save/fmriprep.log&lt;br /&gt;
#SBATCH --error=specify/where/to/save/fmriprep.err&lt;br /&gt;
#SBATCH --time=12:00:00&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
#SBATCH --cpus-per-task=40&lt;br /&gt;
#SBATCH --mail-type=END,FAIL&lt;br /&gt;
#SBATCH --mail-user=your.email@whatever.ca&lt;br /&gt;
&lt;br /&gt;
export APPTAINER_INSTANCE=/path/to/your/container/file&lt;br /&gt;
export BIDS_DIR=/path/to/your/bids/dataset&lt;br /&gt;
export FS_LICENSE=/path/to/your/freesurfer/license/file&lt;br /&gt;
&lt;br /&gt;
# Quote variables: if TEMPLATEFLOW_HOME were unset, [ ! -d $TEMPLATEFLOW_HOME ] would silently pass&lt;br /&gt;
if [ ! -d &amp;quot;$TEMPLATEFLOW_HOME&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;Templateflow directory does not exist: $TEMPLATEFLOW_HOME&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
if [ ! -d &amp;quot;$BIDS_DIR&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;BIDS directory does not exist: $BIDS_DIR&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
if [ ! -d &amp;quot;$BIDS_DIR/derivatives/fmriprep&amp;quot; ]; then&lt;br /&gt;
    mkdir -p &amp;quot;$BIDS_DIR/derivatives/fmriprep&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
if [ ! -f &amp;quot;$FS_LICENSE&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;Freesurfer license file does not exist: $FS_LICENSE&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export all_proxy=socks5://localhost:44223&lt;br /&gt;
export APPTAINERENV_TEMPLATEFLOW_HOME=/templateflow&lt;br /&gt;
&lt;br /&gt;
apptainer run \&lt;br /&gt;
    --cleanenv \&lt;br /&gt;
    -B $BIDS_DIR:/data \&lt;br /&gt;
    -B $BIDS_DIR/derivatives/fmriprep:/data/derivatives/fmriprep \&lt;br /&gt;
    -B $FS_LICENSE:/freesurfer_license.txt \&lt;br /&gt;
    -B $TEMPLATEFLOW_HOME:$APPTAINERENV_TEMPLATEFLOW_HOME \&lt;br /&gt;
    $APPTAINER_INSTANCE \&lt;br /&gt;
    /data \&lt;br /&gt;
    /data/derivatives/fmriprep \&lt;br /&gt;
    participant \&lt;br /&gt;
    --random-seed 42 \&lt;br /&gt;
    --omp-nthreads 40 \&lt;br /&gt;
    --fs-license-file /freesurfer_license.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Save the script as a &amp;lt;code&amp;gt;.sh&amp;lt;/code&amp;gt; file in a known location and make it executable:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chmod +x your_script.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Running the Script =&lt;br /&gt;
Submit the script to the queue with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch your_script.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
You can check the status of the job with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Good luck!&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Custom_fMRIPrep_with_Apptainer_on_Niagara&amp;diff=5924</id>
		<title>Custom fMRIPrep with Apptainer on Niagara</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Custom_fMRIPrep_with_Apptainer_on_Niagara&amp;diff=5924"/>
		<updated>2024-10-26T04:13:06Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Overview =&lt;br /&gt;
This guide outlines the steps for setting up a custom fMRIPrep container using Apptainer on Niagara, configuring Templateflow for offline use, and preparing a submission script for running fMRIPrep in the Niagara cluster environment.&lt;br /&gt;
&lt;br /&gt;
= Setting up the fMRIPrep Container =&lt;br /&gt;
&lt;br /&gt;
== Verify Apptainer Installation ==&lt;br /&gt;
1. Ensure Apptainer is installed on Niagara. To verify, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
which apptainer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
A path like &amp;lt;code&amp;gt;/usr/bin/apptainer&amp;lt;/code&amp;gt; should be returned.&lt;br /&gt;
&lt;br /&gt;
2. If no path is returned, load the Apptainer module:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load apptainer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
== Creating the Container ==&lt;br /&gt;
1. Create a definition file to build the container. In your desired directory, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; fmriprep.def&lt;br /&gt;
Bootstrap: docker&lt;br /&gt;
From: nipreps/fmriprep:latest&lt;br /&gt;
&lt;br /&gt;
%post&lt;br /&gt;
   apt-get update &amp;amp;&amp;amp; apt-get install -y python3-pip&lt;br /&gt;
   pip3 install pysocks&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Build the container with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
apptainer build fmriprep-latest.sif fmriprep.def&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Verify the container by running:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
apptainer run fmriprep-latest.sif --version&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This should print the version of fMRIPrep in the container.&lt;br /&gt;
&lt;br /&gt;
= Setting up Templateflow =&lt;br /&gt;
&lt;br /&gt;
fMRIPrep will try to download templates from the internet. To avoid this, we set up Templateflow locally.&lt;br /&gt;
&lt;br /&gt;
== Establishing the Templateflow Directory ==&lt;br /&gt;
1. Define the Templateflow home directory, for example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export TEMPLATEFLOW_HOME=$HOME/templateflow&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. To make the setting permanent, add it to your &amp;lt;code&amp;gt;.bashrc&amp;lt;/code&amp;gt; (or to &amp;lt;code&amp;gt;.zshrc&amp;lt;/code&amp;gt;, if you use zsh). For bash:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;export TEMPLATEFLOW_HOME=$HOME/templateflow&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Create the directory if it does not exist:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir -p $TEMPLATEFLOW_HOME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Using Python to Install Templateflow ==&lt;br /&gt;
1. Activate your Python environment. For instructions, see [https://docs.scinet.utoronto.ca/index.php/Python#Using_Virtualenv_in_Regular_Python these instructions].&lt;br /&gt;
&lt;br /&gt;
2. Install Templateflow:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install templateflow&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Verify the &amp;lt;code&amp;gt;TEMPLATEFLOW_HOME&amp;lt;/code&amp;gt; environment variable:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo $TEMPLATEFLOW_HOME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
4. Start a Python session to install templates:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
python&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Within Python, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import templateflow.api as tf&lt;br /&gt;
tf.templates()&lt;br /&gt;
tf.get([&amp;quot;MNI152NLin2009cAsym&amp;quot;, &amp;quot;MNI152NLin6Asym&amp;quot;, &amp;quot;OASIS30ANTs&amp;quot;, &amp;quot;fsLR&amp;quot;, &amp;quot;fsaverage&amp;quot;])&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
5. Exit Python with &amp;lt;code&amp;gt;exit()&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Establishing an SSH Tunnel to the Niagara Login Node =&lt;br /&gt;
&lt;br /&gt;
Establish this tunnel each time you log in to Niagara:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh -D 44223 nia-login02 -f -N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;nia-login02&amp;lt;/code&amp;gt; with your current login node (check with &amp;lt;code&amp;gt;echo $HOSTNAME&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
= Preparing a Script to Run fMRIPrep =&lt;br /&gt;
&lt;br /&gt;
Assuming you have followed the steps above and have a BIDS dataset and a FreeSurfer license file ready, you can use the following example script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=fmriprep&lt;br /&gt;
#SBATCH --output=specify/where/to/save/fmriprep.log&lt;br /&gt;
#SBATCH --error=specify/where/to/save/fmriprep.err&lt;br /&gt;
#SBATCH --time=12:00:00&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
#SBATCH --cpus-per-task=40&lt;br /&gt;
#SBATCH --mail-type=END,FAIL&lt;br /&gt;
#SBATCH --mail-user=your.email@whatever.ca&lt;br /&gt;
&lt;br /&gt;
export APPTAINER_INSTANCE=/path/to/your/container/file&lt;br /&gt;
export BIDS_DIR=/path/to/your/bids/dataset&lt;br /&gt;
export FS_LICENSE=/path/to/your/freesurfer/license/file&lt;br /&gt;
&lt;br /&gt;
# Quote the variables so that an unset or empty path fails the check.&lt;br /&gt;
if [ ! -d &amp;quot;$TEMPLATEFLOW_HOME&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;Templateflow directory does not exist: $TEMPLATEFLOW_HOME&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
if [ ! -d &amp;quot;$BIDS_DIR&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;BIDS directory does not exist: $BIDS_DIR&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
mkdir -p &amp;quot;$BIDS_DIR/derivatives/fmriprep&amp;quot;&lt;br /&gt;
&lt;br /&gt;
if [ ! -f &amp;quot;$FS_LICENSE&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;Freesurfer license file does not exist: $FS_LICENSE&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export all_proxy=socks5://localhost:44223&lt;br /&gt;
export APPTAINERENV_TEMPLATEFLOW_HOME=/templateflow&lt;br /&gt;
&lt;br /&gt;
apptainer run \&lt;br /&gt;
    --cleanenv \&lt;br /&gt;
    -B $BIDS_DIR:/data \&lt;br /&gt;
    -B $BIDS_DIR/derivatives/fmriprep:/data/derivatives/fmriprep \&lt;br /&gt;
    -B $FS_LICENSE:/freesurfer_license.txt \&lt;br /&gt;
    -B $TEMPLATEFLOW_HOME:$APPTAINERENV_TEMPLATEFLOW_HOME \&lt;br /&gt;
    $APPTAINER_INSTANCE \&lt;br /&gt;
    /data \&lt;br /&gt;
    /data/derivatives/fmriprep \&lt;br /&gt;
    participant \&lt;br /&gt;
    --random-seed 42 \&lt;br /&gt;
    --omp-nthreads 40 \&lt;br /&gt;
    --fs-license-file /freesurfer_license.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
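&lt;br /&gt;
The example hard-codes 40 threads in both &amp;lt;code&amp;gt;--cpus-per-task&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--omp-nthreads&amp;lt;/code&amp;gt;. One way to keep them in sync is to read the value Slurm exports as &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt; when &amp;lt;code&amp;gt;--cpus-per-task&amp;lt;/code&amp;gt; is requested. A sketch, assuming bash; the helper name is hypothetical and it falls back to 1 outside a job:&lt;br /&gt;

```shell
# Sketch: thread count for fMRIPrep taken from the Slurm allocation,
# so --omp-nthreads always matches --cpus-per-task. Falls back to 1 outside Slurm.
job_threads() {
    printf '%s\n' "${SLURM_CPUS_PER_TASK:-1}"
}

# In the apptainer command above, one could then write:
#     --omp-nthreads "$(job_threads)" \
```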
&lt;br /&gt;
Make the script executable:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chmod +x your_script.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Running the Script =&lt;br /&gt;
Submit the script with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch your_script.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Check the job status with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
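&lt;br /&gt;
If you submit several jobs, it can help to record each job ID at submission time. &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; normally prints a line such as &amp;lt;code&amp;gt;Submitted batch job 123456&amp;lt;/code&amp;gt;; the sketch below (a hypothetical helper, assuming bash) keeps only the last word of that line:&lt;br /&gt;

```shell
# Sketch: extract the job ID from sbatch's "Submitted batch job NNN" message
# by keeping everything after the last space.
job_id_from() {
    local out="$1"
    printf '%s\n' "${out##* }"
}

# Example:
# jobid=$(job_id_from "$(sbatch your_script.sh)")
# squeue -j "$jobid"
```

Alternatively, &amp;lt;code&amp;gt;sbatch --parsable&amp;lt;/code&amp;gt; prints the job ID in a machine-readable form, which avoids parsing altogether.&lt;br /&gt;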
&lt;br /&gt;
Good luck!&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Custom_fMRIPrep_with_Apptainer_on_Niagara&amp;diff=5921</id>
		<title>Custom fMRIPrep with Apptainer on Niagara</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Custom_fMRIPrep_with_Apptainer_on_Niagara&amp;diff=5921"/>
		<updated>2024-10-26T04:07:40Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Custom fMRIPrep with Apptainer on Niagara =&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
This guide outlines the steps for setting up a custom fMRIPrep container using Apptainer on Niagara, configuring Templateflow for offline use, and preparing a submission script for running fMRIPrep in the Niagara cluster environment.&lt;br /&gt;
&lt;br /&gt;
== 1. Setting up the fMRIPrep Container ==&lt;br /&gt;
&lt;br /&gt;
=== 1.1 Verify Apptainer Installation ===&lt;br /&gt;
1. Ensure Apptainer is installed on Niagara. To verify, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
which apptainer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
A path like &amp;lt;code&amp;gt;/usr/bin/apptainer&amp;lt;/code&amp;gt; should be returned.&lt;br /&gt;
&lt;br /&gt;
2. If no path is returned, load the Apptainer module:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load apptainer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== 1.2 Creating the Container ===&lt;br /&gt;
1. Create a definition file to build the container. In your desired directory, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; fmriprep-latest.def&lt;br /&gt;
Bootstrap: docker&lt;br /&gt;
From: nipreps/fmriprep:latest&lt;br /&gt;
&lt;br /&gt;
%post&lt;br /&gt;
   apt-get update &amp;amp;&amp;amp; apt-get install -y python3-pip&lt;br /&gt;
   pip3 install pysocks&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Build the container with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
apptainer build fmriprep-latest.sif fmriprep-latest.def&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Verify the container by running:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
apptainer run fmriprep-latest.sif --version&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This should print the version of fMRIPrep in the container.&lt;br /&gt;
&lt;br /&gt;
== 2. Setting up Templateflow ==&lt;br /&gt;
&lt;br /&gt;
By default, fMRIPrep downloads the templates it needs from the internet at run time. To avoid this, download them in advance into a local Templateflow directory.&lt;br /&gt;
&lt;br /&gt;
=== 2.1 Establishing the Templateflow Directory ===&lt;br /&gt;
1. Define the Templateflow home directory, for example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export TEMPLATEFLOW_HOME=$HOME/templateflow&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. To make this setting persistent, append it to your &amp;lt;code&amp;gt;~/.bashrc&amp;lt;/code&amp;gt; (if you use zsh, append it to &amp;lt;code&amp;gt;~/.zshrc&amp;lt;/code&amp;gt; instead):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;export TEMPLATEFLOW_HOME=$HOME/templateflow&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Create the directory if it does not exist:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir -p $TEMPLATEFLOW_HOME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
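&lt;br /&gt;
Running the &amp;lt;code&amp;gt;echo ... &amp;gt;&amp;gt; ~/.bashrc&amp;lt;/code&amp;gt; command from step 2 more than once adds duplicate lines. A sketch of an idempotent alternative, assuming bash; &amp;lt;code&amp;gt;append_once&amp;lt;/code&amp;gt; is a hypothetical helper, not a SciNet tool:&lt;br /&gt;

```shell
# Sketch: append a line to a file only if that exact line is not already present.
append_once() {
    local line="$1" file="$2"
    touch "$file"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example, using the Templateflow location from step 1:
# append_once "export TEMPLATEFLOW_HOME=$HOME/templateflow" ~/.bashrc
```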
&lt;br /&gt;
=== 2.2 Using Python to Install Templateflow ===&lt;br /&gt;
1. Activate your Python environment (see [https://docs.scinet.utoronto.ca/index.php/Python#Using_Virtualenv_in_Regular_Python these instructions] for setting one up with virtualenv).&lt;br /&gt;
&lt;br /&gt;
2. Install Templateflow:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install templateflow&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Verify the &amp;lt;code&amp;gt;TEMPLATEFLOW_HOME&amp;lt;/code&amp;gt; environment variable:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo $TEMPLATEFLOW_HOME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
4. Start a Python session to install templates:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
python&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Within Python, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import templateflow.api as tf&lt;br /&gt;
tf.templates()&lt;br /&gt;
tf.get([&amp;quot;MNI152NLin2009cAsym&amp;quot;, &amp;quot;MNI152NLin6Asym&amp;quot;, &amp;quot;OASIS30ANTs&amp;quot;, &amp;quot;fsLR&amp;quot;, &amp;quot;fsaverage&amp;quot;])&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
5. Exit Python with &amp;lt;code&amp;gt;exit()&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== 3. Establishing an SSH Tunnel to the Niagara Login Node ==&lt;br /&gt;
&lt;br /&gt;
Establish this tunnel each time you log in to Niagara:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh -D 44223 nia-login02 -f -N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;nia-login02&amp;lt;/code&amp;gt; with your current login node (check with &amp;lt;code&amp;gt;echo $HOSTNAME&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
== 4. Preparing a Script to Run fMRIPrep ==&lt;br /&gt;
&lt;br /&gt;
Assuming you have completed the steps above and have a BIDS dataset and a FreeSurfer license file ready, you can use the following example script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=fmriprep&lt;br /&gt;
#SBATCH --output=specify/where/to/save/fmriprep.log&lt;br /&gt;
#SBATCH --error=specify/where/to/save/fmriprep.err&lt;br /&gt;
#SBATCH --time=12:00:00&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
#SBATCH --cpus-per-task=40&lt;br /&gt;
#SBATCH --mail-type=END,FAIL&lt;br /&gt;
#SBATCH --mail-user=your.email@whatever.ca&lt;br /&gt;
&lt;br /&gt;
export APPTAINER_INSTANCE=/path/to/your/container/file&lt;br /&gt;
export BIDS_DIR=/path/to/your/bids/dataset&lt;br /&gt;
export FS_LICENSE=/path/to/your/freesurfer/license/file&lt;br /&gt;
&lt;br /&gt;
# Quote the variables so that an unset or empty path fails the check.&lt;br /&gt;
if [ ! -d &amp;quot;$TEMPLATEFLOW_HOME&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;Templateflow directory does not exist: $TEMPLATEFLOW_HOME&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
if [ ! -d &amp;quot;$BIDS_DIR&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;BIDS directory does not exist: $BIDS_DIR&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
mkdir -p &amp;quot;$BIDS_DIR/derivatives/fmriprep&amp;quot;&lt;br /&gt;
&lt;br /&gt;
if [ ! -f &amp;quot;$FS_LICENSE&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;Freesurfer license file does not exist: $FS_LICENSE&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export all_proxy=socks5://localhost:44223&lt;br /&gt;
export APPTAINERENV_TEMPLATEFLOW_HOME=/templateflow&lt;br /&gt;
&lt;br /&gt;
apptainer run \&lt;br /&gt;
    --cleanenv \&lt;br /&gt;
    -B $BIDS_DIR:/data \&lt;br /&gt;
    -B $BIDS_DIR/derivatives/fmriprep:/data/derivatives/fmriprep \&lt;br /&gt;
    -B $FS_LICENSE:/freesurfer_license.txt \&lt;br /&gt;
    -B $TEMPLATEFLOW_HOME:$APPTAINERENV_TEMPLATEFLOW_HOME \&lt;br /&gt;
    $APPTAINER_INSTANCE \&lt;br /&gt;
    /data \&lt;br /&gt;
    /data/derivatives/fmriprep \&lt;br /&gt;
    participant \&lt;br /&gt;
    --random-seed 42 \&lt;br /&gt;
    --omp-nthreads 40 \&lt;br /&gt;
    --fs-license-file /freesurfer_license.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Make the script executable:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chmod +x your_script.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== 5. Running the Script ==&lt;br /&gt;
Submit the script with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch your_script.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Check the job status with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Good luck!&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Custom_fMRIPrep_with_Apptainer_on_Niagara&amp;diff=5918</id>
		<title>Custom fMRIPrep with Apptainer on Niagara</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Custom_fMRIPrep_with_Apptainer_on_Niagara&amp;diff=5918"/>
		<updated>2024-10-26T03:57:55Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: Created page with &amp;quot;= Custom fMRIPrep with Apptainer on Niagara =  == Overview == This guide outlines the steps for setting up a custom fMRIPrep container using Apptainer on Niagara, configuring...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Custom fMRIPrep with Apptainer on Niagara =&lt;br /&gt;
&lt;br /&gt;
== Overview ==&lt;br /&gt;
This guide outlines the steps for setting up a custom fMRIPrep container using Apptainer on Niagara, configuring Templateflow for offline use, and preparing a submission script for running fMRIPrep in the Niagara cluster environment.&lt;br /&gt;
&lt;br /&gt;
== 1. Setting up the fMRIPrep Container ==&lt;br /&gt;
&lt;br /&gt;
=== 1.1 Verify Apptainer Installation ===&lt;br /&gt;
1. Ensure Apptainer is installed on Niagara. To verify, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
which apptainer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
A path like &amp;lt;code&amp;gt;/usr/bin/apptainer&amp;lt;/code&amp;gt; should be returned.&lt;br /&gt;
&lt;br /&gt;
2. If no path is returned, load the Apptainer module:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load apptainer&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== 1.2 Creating the Container ===&lt;br /&gt;
1. Create a definition file to build the container. In your desired directory, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat &amp;lt;&amp;lt;EOF &amp;gt; fmriprep-latest.def&lt;br /&gt;
Bootstrap: docker&lt;br /&gt;
From: nipreps/fmriprep:latest&lt;br /&gt;
&lt;br /&gt;
%post&lt;br /&gt;
    apt-get update &amp;amp;&amp;amp; apt-get install -y python3-pip&lt;br /&gt;
    pip3 install pysocks&lt;br /&gt;
EOF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. Build the container with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
apptainer build fmriprep-latest.sif fmriprep-latest.def&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Verify the container by running:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
apptainer run fmriprep-latest.sif --version&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This should print the version of fMRIPrep in the container.&lt;br /&gt;
&lt;br /&gt;
== 2. Setting up Templateflow ==&lt;br /&gt;
&lt;br /&gt;
By default, fMRIPrep downloads the templates it needs from the internet at run time. To avoid this, download them in advance into a local Templateflow directory.&lt;br /&gt;
&lt;br /&gt;
=== 2.1 Establishing the Templateflow Directory ===&lt;br /&gt;
1. Define the Templateflow home directory, for example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export TEMPLATEFLOW_HOME=$HOME/templateflow&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
2. To make this setting persistent, append it to your &amp;lt;code&amp;gt;~/.bashrc&amp;lt;/code&amp;gt; (if you use zsh, append it to &amp;lt;code&amp;gt;~/.zshrc&amp;lt;/code&amp;gt; instead):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo &amp;quot;export TEMPLATEFLOW_HOME=$HOME/templateflow&amp;quot; &amp;gt;&amp;gt; ~/.bashrc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Create the directory if it does not exist:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mkdir -p $TEMPLATEFLOW_HOME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== 2.2 Using Python to Install Templateflow ===&lt;br /&gt;
1. Activate your Python environment (see [https://docs.scinet.utoronto.ca/index.php/Python#Using_Virtualenv_in_Regular_Python these instructions] for setting one up with virtualenv).&lt;br /&gt;
&lt;br /&gt;
2. Install Templateflow:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
pip install templateflow&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
3. Verify the &amp;lt;code&amp;gt;TEMPLATEFLOW_HOME&amp;lt;/code&amp;gt; environment variable:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
echo $TEMPLATEFLOW_HOME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
4. Start a Python session to install templates:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
python&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Within Python, run:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
import templateflow.api as tf&lt;br /&gt;
tf.templates()&lt;br /&gt;
tf.get([&amp;quot;MNI152NLin2009cAsym&amp;quot;, &amp;quot;MNI152NLin6Asym&amp;quot;, &amp;quot;OASIS30ANTs&amp;quot;, &amp;quot;fsLR&amp;quot;, &amp;quot;fsaverage&amp;quot;])&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
5. Exit Python with &amp;lt;code&amp;gt;exit()&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== 3. Establishing an SSH Tunnel to the Niagara Login Node ==&lt;br /&gt;
&lt;br /&gt;
Establish this tunnel each time you log in to Niagara:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh -D 44223 nia-login02 -f -N&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Replace &amp;lt;code&amp;gt;nia-login02&amp;lt;/code&amp;gt; with your current login node (check with &amp;lt;code&amp;gt;echo $HOSTNAME&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
== 4. Preparing a Script to Run fMRIPrep ==&lt;br /&gt;
&lt;br /&gt;
Assuming you have completed the steps above and have a BIDS dataset and a FreeSurfer license file ready, you can use the following example script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=fmriprep&lt;br /&gt;
#SBATCH --output=specify/where/to/save/fmriprep.log&lt;br /&gt;
#SBATCH --error=specify/where/to/save/fmriprep.err&lt;br /&gt;
#SBATCH --time=12:00:00&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
#SBATCH --cpus-per-task=40&lt;br /&gt;
#SBATCH --mail-type=END,FAIL&lt;br /&gt;
#SBATCH --mail-user=your.email@whatever.ca&lt;br /&gt;
&lt;br /&gt;
export APPTAINER_INSTANCE=/path/to/your/container/file&lt;br /&gt;
export BIDS_DIR=/path/to/your/bids/dataset&lt;br /&gt;
export FS_LICENSE=/path/to/your/freesurfer/license/file&lt;br /&gt;
&lt;br /&gt;
# Quote the variables so that an unset or empty path fails the check.&lt;br /&gt;
if [ ! -d &amp;quot;$TEMPLATEFLOW_HOME&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;Templateflow directory does not exist: $TEMPLATEFLOW_HOME&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
if [ ! -d &amp;quot;$BIDS_DIR&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;BIDS directory does not exist: $BIDS_DIR&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
mkdir -p &amp;quot;$BIDS_DIR/derivatives/fmriprep&amp;quot;&lt;br /&gt;
&lt;br /&gt;
if [ ! -f &amp;quot;$FS_LICENSE&amp;quot; ]; then&lt;br /&gt;
    echo &amp;quot;Freesurfer license file does not exist: $FS_LICENSE&amp;quot;&lt;br /&gt;
    exit 1&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
export all_proxy=socks5://localhost:44223&lt;br /&gt;
export APPTAINERENV_TEMPLATEFLOW_HOME=/templateflow&lt;br /&gt;
&lt;br /&gt;
apptainer run \&lt;br /&gt;
    --cleanenv \&lt;br /&gt;
    -B $BIDS_DIR:/data \&lt;br /&gt;
    -B $BIDS_DIR/derivatives/fmriprep:/data/derivatives/fmriprep \&lt;br /&gt;
    -B $FS_LICENSE:/freesurfer_license.txt \&lt;br /&gt;
    -B $TEMPLATEFLOW_HOME:$APPTAINERENV_TEMPLATEFLOW_HOME \&lt;br /&gt;
    $APPTAINER_INSTANCE \&lt;br /&gt;
    /data \&lt;br /&gt;
    /data/derivatives/fmriprep \&lt;br /&gt;
    participant \&lt;br /&gt;
    --random-seed 42 \&lt;br /&gt;
    --omp-nthreads 40 \&lt;br /&gt;
    --fs-license-file /freesurfer_license.txt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Make the script executable:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
chmod +x your_script.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== 5. Running the Script ==&lt;br /&gt;
Submit the script with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch your_script.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Check the job status with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sq&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Good luck!&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=NAMD&amp;diff=1535</id>
		<title>NAMD</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=NAMD&amp;diff=1535"/>
		<updated>2018-09-19T16:11:30Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* NAMD v2.12 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;====NAMD v2.12====&lt;br /&gt;
&lt;br /&gt;
This is the NAMD version 2.12 Scalable Molecular Dynamics package.  &lt;br /&gt;
&lt;br /&gt;
It was built directly on top of ibverbs, so it will run on InfiniBand nodes.  Here is a sample run script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=40&lt;br /&gt;
#SBATCH --time=00:15:00&lt;br /&gt;
#SBATCH --job-name namdtest&lt;br /&gt;
&lt;br /&gt;
# Note that the module will likely be taken out of experimental mode at some point.&lt;br /&gt;
module load namd/.experimental-2.12-ibverbs-smp&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $SLURM_SUBMIT_DIR is directory job was submitted from&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
# Generate NAMD nodelist&lt;br /&gt;
for n in $(scontrol show hostnames &amp;quot;$SLURM_NODELIST&amp;quot;); do&lt;br /&gt;
  echo &amp;quot;host $n&amp;quot; &amp;gt;&amp;gt; nodelist.$SLURM_JOBID&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
NODELIST=nodelist.$SLURM_JOBID&lt;br /&gt;
cat $NODELIST&lt;br /&gt;
&lt;br /&gt;
# P: total number of worker threads (+p); PPN: worker threads per process (++ppn)&lt;br /&gt;
PPN=4&lt;br /&gt;
P=$(($SLURM_NTASKS * 2))&lt;br /&gt;
&lt;br /&gt;
charmrun ++verbose +p $P ++ppn $PPN ++nodelist $NODELIST $SCINET_NAMD_ROOT/bin/namd2 input.namd&lt;br /&gt;
&lt;br /&gt;
# Clean up the nodelist file&lt;br /&gt;
rm $NODELIST&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
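&lt;br /&gt;
The process layout implied by the script can be checked with a little arithmetic. In an SMP build of Charm++, as we understand it, &amp;lt;code&amp;gt;+p&amp;lt;/code&amp;gt; is the total number of worker threads and &amp;lt;code&amp;gt;++ppn&amp;lt;/code&amp;gt; the number of worker threads per process, so charmrun starts P/PPN processes. A sketch using the values from the script above (2 nodes, 40 tasks per node, PPN=4); the helper names are hypothetical:&lt;br /&gt;

```shell
# Sketch: reproduce the P / PPN arithmetic of the run script for given inputs.
# Arguments: total Slurm tasks, number of nodes, worker threads per process (PPN).
namd_layout() {
    local ntasks="$1" nnodes="$2" ppn="$3"
    local p=$(( ntasks * 2 ))        # same formula as P=$(($SLURM_NTASKS * 2)) above
    local procs=$(( p / ppn ))       # processes charmrun launches (assumed SMP semantics)
    echo "P=$p processes=$procs per_node=$(( procs / nnodes ))"
}

# Build the charmrun nodelist from host names on stdin, one "host NAME" line each.
make_nodelist() {
    sed 's/^/host /'
}

# namd_layout 80 2 4    prints: P=160 processes=40 per_node=20
# scontrol show hostnames "$SLURM_NODELIST" | make_nodelist > "nodelist.$SLURM_JOBID"
```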
&lt;br /&gt;
Full documentation for NAMD is available on their website:  http://www.ks.uiuc.edu/Research/namd/&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Singularity&amp;diff=1447</id>
		<title>Singularity</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Singularity&amp;diff=1447"/>
		<updated>2018-08-21T17:55:12Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: Reverted edits by Edickie (talk) to last revision by Ejspence&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;A holding space for Erin.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=FAQ&amp;diff=1445</id>
		<title>FAQ</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=FAQ&amp;diff=1445"/>
		<updated>2018-08-21T17:01:38Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* Can't forward X:  &amp;quot;Warning: No xauth data; using fake authentication data&amp;quot;, or &amp;quot;X11 connection rejected because of wrong authentication.&amp;quot; */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
&lt;br /&gt;
==The Basics==&lt;br /&gt;
===Whom do I contact for support?===&lt;br /&gt;
&lt;br /&gt;
Whom do I contact if I have problems or questions about how to use the SciNet systems?&lt;br /&gt;
&lt;br /&gt;
E-mail [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]  &lt;br /&gt;
&lt;br /&gt;
In your email, please include the following information:&lt;br /&gt;
&lt;br /&gt;
* your username on SciNet&lt;br /&gt;
* the cluster that your question pertains to (NIA, BGQ, SGC, ...; SciNet is not a cluster!),&lt;br /&gt;
* any relevant error messages&lt;br /&gt;
* the commands you typed before the errors occurred&lt;br /&gt;
* the path to your code (if applicable)&lt;br /&gt;
* the location of the job scripts (if applicable)&lt;br /&gt;
* the directory from which it was submitted (if applicable)&lt;br /&gt;
* a description of what it is supposed to do (if applicable)&lt;br /&gt;
* if your problem is about connecting to SciNet, the type of computer you are connecting from.&lt;br /&gt;
&lt;br /&gt;
Note that your password should never, never, never be sent to us, even if your question is about your account.&lt;br /&gt;
&lt;br /&gt;
Avoid sending email only to specific individuals at SciNet. Your chances of a quick reply increase significantly if you email our team! (support@scinet.utoronto.ca)&lt;br /&gt;
&lt;br /&gt;
===I have a CCDB account, but I can't login to SciNet. How can I get a SciNet account?===&lt;br /&gt;
&lt;br /&gt;
You must extend your CCDB application process to also get a SciNet account:&lt;br /&gt;
&lt;br /&gt;
https://www.scinethpc.ca/getting-a-scinet-account/&lt;br /&gt;
&lt;br /&gt;
===How can I reset the password for my Compute Canada account?===&lt;br /&gt;
&lt;br /&gt;
You can reset your password for your Compute Canada account here:&lt;br /&gt;
&lt;br /&gt;
https://ccdb.computecanada.ca/security/forgot&lt;br /&gt;
&lt;br /&gt;
===How can I change or reset the password for my SciNet account?===&lt;br /&gt;
&lt;br /&gt;
To reset your password at SciNet please go to [https://portal.scinet.utoronto.ca/password_resets Password reset page].&lt;br /&gt;
&lt;br /&gt;
== Connecting to Niagara ==&lt;br /&gt;
===Do you have a recommended ssh program that will allow scinet access from Windows machines?===&lt;br /&gt;
&lt;br /&gt;
The [[SSH#SSH_for_Windows_Users | SSH for Windows users]] programs we recommend are:&lt;br /&gt;
&lt;br /&gt;
* [http://mobaxterm.mobatek.net/en/ MobaXterm] is a tabbed ssh client with some Cygwin tools, including ssh and X, all wrapped up into one executable.&lt;br /&gt;
* [https://git-scm.com/downloads Git Bash] is an implementation of git which comes with a terminal emulator.&lt;br /&gt;
* [http://www.chiark.greenend.org.uk/~sgtatham/putty/ PuTTY] - this is a terminal for Windows that connects via ssh.  It is a quick install and will get you up and running quickly.&amp;lt;br/&amp;gt; '''WARNING:''' Make sure you download PuTTY from the official website, because there are &amp;quot;trojanized&amp;quot; versions of PuTTY around that will send your login information to a site in Russia (as reported [http://blogs.cisco.com/security/trojanized-putty-software here]).&amp;lt;br&amp;gt;To set up your passphrase-protected ssh key with PuTTY, see [http://the.earth.li/~sgtatham/putty/0.61/htmldoc/Chapter8.html#pubkey here].&lt;br /&gt;
* [http://www.cygwin.com/ CygWin] - this is a whole Linux-like environment for Windows, which also includes an X window server so that you can display remote windows on your desktop.  Make sure you include the openssh and X window system in the installation for full functionality.  This is recommended if you will be doing a lot of work on Linux machines, as it makes a very similar environment available on your computer.&lt;br /&gt;
&lt;br /&gt;
To set up your ssh keys, follow the Linux instructions on the [[SSH keys]] page.&lt;br /&gt;
&lt;br /&gt;
===My ssh key does not work! WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! ===&lt;br /&gt;
&lt;br /&gt;
[[SSH_keys#Testing_Your_Key | Testing Your Key]]&lt;br /&gt;
&lt;br /&gt;
* If this doesn't work, you should be able to log in using your password and investigate the problem. For example, if during a login session you get a message similar to the one below, just follow the instructions and delete the offending key on line 3 (in vi, press ESC and type :3 followed by Enter to jump to that line). This only means that you may have logged in to SciNet from your home computer in the past, and that the stored key is obsolete.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@niagara.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @&lt;br /&gt;
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@&lt;br /&gt;
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!&lt;br /&gt;
Someone could be eavesdropping on you right now (man-in-the-middle&lt;br /&gt;
attack)!&lt;br /&gt;
It is also possible that the RSA host key has just been changed.&lt;br /&gt;
The fingerprint for the RSA key sent by the remote host is&lt;br /&gt;
53:f9:60:71:a8:0b:5d:74:83:52:fe:ea:1a:9e:cc:d3.&lt;br /&gt;
Please contact your system administrator.&lt;br /&gt;
Add correct host key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts to get rid of&lt;br /&gt;
this message.&lt;br /&gt;
Offending key in /home/&amp;lt;user&amp;gt;/.ssh/known_hosts:3&lt;br /&gt;
RSA host key for niagara.scinet.utoronto.ca has&lt;br /&gt;
changed and you have requested strict checking.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* If you get the message below, you may need to log out of your GNOME session and log back in, since the ssh-agent needs to be restarted with your new passphrase-protected ssh key.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh USERNAME@niagara.scinet.utoronto.ca&lt;br /&gt;
&lt;br /&gt;
Agent admitted failure to sign using the key.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
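&lt;br /&gt;
Instead of editing the file in vi, the line number can be taken from the &amp;lt;code&amp;gt;Offending key&amp;lt;/code&amp;gt; message and the line removed with sed. A sketch assuming bash and GNU sed (as on a Linux desktop; on macOS, &amp;lt;code&amp;gt;sed -i&amp;lt;/code&amp;gt; needs an empty argument); the helper names are hypothetical. The standard OpenSSH command &amp;lt;code&amp;gt;ssh-keygen -R niagara.scinet.utoronto.ca&amp;lt;/code&amp;gt; is an alternative that removes all stored keys for that host name.&lt;br /&gt;

```shell
# Sketch: read an ssh warning on stdin and print the known_hosts line number
# from its "Offending ... known_hosts:N" line.
offending_line() {
    sed -n 's/.*Offending .*known_hosts:\([0-9][0-9]*\).*/\1/p'
}

# Delete line N from a known_hosts file (default: your own).
remove_known_host_line() {
    local n="$1" file="${2:-$HOME/.ssh/known_hosts}"
    sed -i "${n}d" "$file"   # GNU sed syntax; on macOS use: sed -i '' "${n}d" "$file"
}
```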
&lt;br /&gt;
===Can't get graphics: &amp;quot;Can't open display/DISPLAY is not set&amp;quot;===&lt;br /&gt;
&lt;br /&gt;
To use graphics on SciNet machines and have them displayed on your machine, you need to have an X server running on your computer (an X server is the standard way graphics is done on Linux). Once an X server is running, you can log in with the &amp;quot;-Y&amp;quot; option to ssh (&amp;quot;-X&amp;quot; sometimes also works).&lt;br /&gt;
&lt;br /&gt;
How to get an X server running on your computer depends on the operating system.  On Linux machines with a graphical interface, X will already be running.  On Windows, the easiest solution is using MobaXterm, which comes with an X server (alternatives, such as Cygwin with the X11 server installed, or running PuTTY+Xming, can also work, but are a bit more work to set up).  For Macs, you will need to install XQuartz. &lt;br /&gt;
  &lt;br /&gt;
===Remote graphics stops working after a while: &amp;quot;Can't open display&amp;quot;===&lt;br /&gt;
&lt;br /&gt;
If you still cannot get graphics, or it works only for a while and then suddenly reports &amp;quot;can't open display localhost:....&amp;quot;, your X11 graphics connection may have timed out (Macs seem to be particularly prone to this).  You'll have to tell your own computer to allow the X11 graphics connection and not to time it out.&lt;br /&gt;
&lt;br /&gt;
The following should fix it. The ssh configuration settings are in a file called /etc/ssh/ssh_config (or /etc/ssh_config in older OS X versions, or $HOME/.ssh/config for specific users). In the config file, find (or create) the section &amp;quot;Host *&amp;quot; (meaning all hosts) and add the following lines:&lt;br /&gt;
&lt;br /&gt;
  Host *&lt;br /&gt;
   ServerAliveInterval 60&lt;br /&gt;
   ServerAliveCountMax 3&lt;br /&gt;
   ForwardX11 yes&lt;br /&gt;
   ForwardX11Trusted yes&lt;br /&gt;
   ForwardX11Timeout 596h&lt;br /&gt;
&lt;br /&gt;
(The &amp;lt;tt&amp;gt;Host *&amp;lt;/tt&amp;gt; is only needed if there was no Host section yet to append these settings to.)&lt;br /&gt;
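&lt;br /&gt;
The same settings can be added from the command line. A sketch, assuming bash; it appends a &amp;lt;code&amp;gt;Host *&amp;lt;/code&amp;gt; block to the per-user file &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; (or a file given as the first argument) only when no &amp;lt;code&amp;gt;ForwardX11Timeout&amp;lt;/code&amp;gt; line is present yet. The helper name is hypothetical:&lt;br /&gt;

```shell
# Sketch: append the keep-alive and X11 settings shown above to a per-user ssh config,
# skipping the append if a ForwardX11Timeout line is already present.
add_x11_settings() {
    local cfg="${1:-$HOME/.ssh/config}"
    if grep -qi '^[[:space:]]*ForwardX11Timeout' "$cfg" 2>/dev/null; then
        echo "already configured: $cfg"
        return 0
    fi
    mkdir -p "$(dirname "$cfg")"
    printf '%s\n' \
        'Host *' \
        '   ServerAliveInterval 60' \
        '   ServerAliveCountMax 3' \
        '   ForwardX11 yes' \
        '   ForwardX11Trusted yes' \
        '   ForwardX11Timeout 596h' >> "$cfg"
    chmod 600 "$cfg"   # ssh rejects config files that others can write to
}
```

Because ssh uses the first value it finds for each option, settings in an earlier matching &amp;lt;code&amp;gt;Host&amp;lt;/code&amp;gt; section take precedence over this appended block.&lt;br /&gt;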
&lt;br /&gt;
If this does not resolve it, try it again with &amp;quot;ssh -vvv -Y ....&amp;quot;.  The &amp;quot;-vvv&amp;quot; spews out a lot of diagnostic messages. Look for anything resembling a timeout, and let us know (support AT scinet DOT utoronto DOT ca).&lt;br /&gt;
&lt;br /&gt;
===Can't forward X:  &amp;quot;Warning: No xauth data; using fake authentication data&amp;quot;, or &amp;quot;X11 connection rejected because of wrong authentication.&amp;quot;===&lt;br /&gt;
&lt;br /&gt;
I used to be able to forward X11 windows from SciNet to my home machine, but now I'm getting these messages; what's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This very likely means that ssh/xauth can't update your ${HOME}/.Xauthority file. &lt;br /&gt;
&lt;br /&gt;
The simplest possible reason for this is that you've filled your 100GB /home quota and so can't write anything to your home directory.   Use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ diskUsage&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
to check how close you are to your disk quota on ${HOME}.&lt;br /&gt;
&lt;br /&gt;
Alternatively, this could mean your .Xauthority file has become broken or corrupted somehow, in which case you can delete that file; when you next log in you'll get a similar warning message about creating .Xauthority, but things should work.&lt;br /&gt;
&lt;br /&gt;
===Why am I getting the error &amp;quot;Permission denied (publickey,gssapi-with-mic,password)&amp;quot;?===&lt;br /&gt;
&lt;br /&gt;
In most cases, the &amp;quot;Permission denied&amp;quot; error is caused by incorrect permissions on the (hidden) .ssh directory. SSH is used for logging in as well as for copying the standard error and output files after a job. &lt;br /&gt;
&lt;br /&gt;
For security reasons, the .ssh directory should be readable and writable only by you; if yours has read permission for everybody, the login fails.  You can change this with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   chmod 700 ~/.ssh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
To be safe, also restrict the permissions of your private key and your authorized_keys file:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   chmod 600 ~/.ssh/id_rsa ~/.ssh/authorized_keys&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
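&lt;br /&gt;
If you want to check and fix these permissions in one step, the following sketch makes the same chmod calls and prints the resulting octal modes. The function name &amp;lt;tt&amp;gt;fix_ssh_perms&amp;lt;/tt&amp;gt; is hypothetical. The list of key files (&amp;lt;tt&amp;gt;id_rsa&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;id_ed25519&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;authorized_keys&amp;lt;/tt&amp;gt;) is an assumption, so adjust it to the keys you actually use. Files that do not exist are skipped.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: tighten permissions on an .ssh directory so that
# sshd will accept it (directory 700, key and authorized_keys files 600).
# Usage: fix_ssh_perms [ssh_dir]   (defaults to $HOME/.ssh)
fix_ssh_perms() {
    local ssh_dir="${1:-$HOME/.ssh}"
    if [ ! -d "$ssh_dir" ]; then
        echo "No directory $ssh_dir"
        return 1
    fi
    chmod 700 "$ssh_dir"
    # Print the resulting octal mode (GNU stat, as on Linux).
    stat -c '%a %n' "$ssh_dir"
    local f
    for f in "$ssh_dir"/id_rsa "$ssh_dir"/id_ed25519 "$ssh_dir"/authorized_keys; do
        # Only touch files that actually exist.
        if [ -e "$f" ]; then
            chmod 600 "$f"
            stat -c '%a %n' "$f"
        fi
    done
}
```
&lt;br /&gt;
Run &amp;lt;tt&amp;gt;fix_ssh_perms&amp;lt;/tt&amp;gt; with no argument to act on your own ~/.ssh directory.&lt;br /&gt;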
&lt;br /&gt;
== Environment ==&lt;br /&gt;
&lt;br /&gt;
===I changed my .bashrc/.bash_profile and now nothing works===&lt;br /&gt;
&lt;br /&gt;
The default startup scripts provided by SciNet, and guidelines for them, can be found [[Bashrc_guidelines|here]].  Certain things, like sourcing &amp;lt;tt&amp;gt;/etc/profile&amp;lt;/tt&amp;gt;&lt;br /&gt;
and &amp;lt;tt&amp;gt;/etc/bashrc&amp;lt;/tt&amp;gt;, are ''required'' for various SciNet routines to work!   &lt;br /&gt;
&lt;br /&gt;
If the situation is so bad that you cannot even log in, please send an email to [mailto:support@scinet.utoronto.ca support].&lt;br /&gt;
&lt;br /&gt;
===Could I have my login shell changed to (t)csh?===&lt;br /&gt;
&lt;br /&gt;
The login shell used on our systems is bash. While tcsh is available, we do not support it as the default login shell at present, so &amp;quot;chsh&amp;quot; will not work; you can, however, always run tcsh interactively. Also, csh scripts will be executed correctly provided that they have the correct &amp;quot;shebang&amp;quot; &amp;lt;tt&amp;gt;#!/bin/tcsh&amp;lt;/tt&amp;gt; at the top.&lt;br /&gt;
&lt;br /&gt;
===Can I work in a Jupyter Notebook?===&lt;br /&gt;
&lt;br /&gt;
Yes, a Niagara Jupyter Hub is available for use.  See [[Jupyter_Hub | this page]] for details.&lt;br /&gt;
&lt;br /&gt;
== Compiling your Code ==&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
===How can I get g77 to work?===&lt;br /&gt;
&lt;br /&gt;
The fortran 77 compilers on the GPC are ifort and gfortran. We have dropped support for g77.  This has been a conscious decision. g77 (and the associated library libg2c) were completely replaced six years ago (Apr 2005) by the gcc 4.x branch, and haven't undergone any updates at all, even bug fixes, for over five years.  &lt;br /&gt;
If we would install g77 and libg2c, we would have to deal with the inevitable confusion caused when users accidentally link against the old, broken, wrong versions of the gcc libraries instead of the correct current versions.   &lt;br /&gt;
&lt;br /&gt;
If your code for some reason specifically requires five-plus-year-old libraries,  availability, compatibility, and unfixed-known-bug problems are only going to get worse for you over time, and this might be as good an opportunity as any to address those issues. &lt;br /&gt;
&lt;br /&gt;
''A note on porting to gfortran or ifort:''&lt;br /&gt;
&lt;br /&gt;
While gfortran and ifort are rather compatible with g77, one &lt;br /&gt;
important difference is that by default, gfortran does not preserve &lt;br /&gt;
local variables between function calls, while g77 does.   Preserved &lt;br /&gt;
local variables are for instance often used in implementations of quasi-random number &lt;br /&gt;
generators.  Proper fortran requires to declare such variables as SAVE &lt;br /&gt;
but not all old code does this.&lt;br /&gt;
Luckily, you can change gfortran's default behavior with the flag &lt;br /&gt;
&amp;lt;tt&amp;gt;-fno-automatic&amp;lt;/tt&amp;gt;.   For ifort, the corresponding flag is &amp;lt;tt&amp;gt;-noautomatic&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Where is libg2c.so?===&lt;br /&gt;
&lt;br /&gt;
libg2c.so is part of the g77 compiler, for which we dropped support. See [[#How can I get g77 to work?]] for our reasons.&lt;br /&gt;
&lt;br /&gt;
===Autoparallelization does not work!===&lt;br /&gt;
&lt;br /&gt;
I compiled my code with the &amp;lt;tt&amp;gt;-qsmp=omp,auto&amp;lt;/tt&amp;gt; option, and then I specified that it should be run with 64 threads - with &lt;br /&gt;
 export OMP_NUM_THREADS=64&lt;br /&gt;
&lt;br /&gt;
However, when I check the load using &amp;lt;tt&amp;gt;llq1 -n&amp;lt;/tt&amp;gt;, it shows a load on the node of 1.37.  Why?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Using the autoparallelization will only get you so far.  In fact, it usually does not do too much.  What is helpful is to run the compiler with the &amp;lt;tt&amp;gt;-qreport&amp;lt;/tt&amp;gt; option, and then read the output listing carefully to see where the compiler thought it could parallelize, where it could not, and the reasons for this.  Then you can go back to your code and carefully try to address each of the issues brought up by the compiler.&lt;br /&gt;
We ''emphasize'' that this is just a rough first guide, and that the compilers are still not magical!   For more sophisticated approaches to parallelizing your code, email us at [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;]  to set up an appointment with one&lt;br /&gt;
of our technical analysts.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===How do I link against the Intel Math Kernel Library?===&lt;br /&gt;
&lt;br /&gt;
If you need to link to the Intel Math Kernel Library (MKL) with the Intel compilers, just add the &amp;lt;tt&amp;gt;-mkl&amp;lt;/tt&amp;gt; flag. There are in fact three flavours: &amp;lt;tt&amp;gt;-mkl=sequential&amp;lt;/tt&amp;gt;, &amp;lt;tt&amp;gt;-mkl=parallel&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;-mkl=cluster&amp;lt;/tt&amp;gt;, for the serial version, the threaded version and the MPI version, respectively. (Note: The cluster version is available only when using the intelmpi module and MPI compilation wrappers.)&lt;br /&gt;
&lt;br /&gt;
If you need to link the Intel Math Kernel Library (MKL) libraries into code compiled with gcc/gfortran/c++, you are well advised to use the [http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor Intel(R) Math Kernel Library Link Line Advisor] for help in devising the list of libraries to link with your code.&lt;br /&gt;
&lt;br /&gt;
'''''Note that this gives the link line for the command line. When using this in Makefiles, replace $MKLPATH by ${MKLROOT}.'''''&lt;br /&gt;
&lt;br /&gt;
'''''Note too that, unless the integer arguments you will be passing to the MKL libraries are actually 64-bit integers rather than the normal int or INTEGER types, you want to specify 32-bit integers (lp64).'''''&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
===&amp;quot;relocation truncated to fit: R_X86_64_PC32&amp;quot;: Huh?===&lt;br /&gt;
&lt;br /&gt;
What does this mean, and why can't I compile this code?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Welcome to the joys of the x86 architecture!  You're probably having trouble building arrays larger than 2GB, individually or together.   Generally, you have to try to use the medium or large x86 `memory model'.   For the intel compilers, this is specified with the compile options&lt;br /&gt;
&lt;br /&gt;
  -mcmodel=medium -shared-intel&lt;br /&gt;
&lt;br /&gt;
===&amp;quot;feupdateenv is not implemented and will always fail&amp;quot;===&lt;br /&gt;
&lt;br /&gt;
How do I get rid of this and what does it mean?&lt;br /&gt;
 &lt;br /&gt;
'''Answer:'''&lt;br /&gt;
First note that, as ominous as it sounds, this is really just a warning, and has to do with the intel math library. You can ignore it (unless you really are trying to manually change the exception handlers for floating point exceptions such as divide by zero), or take the safe road and get rid of it by linking with the intel math functions library:&amp;lt;pre&amp;gt;-limf&amp;lt;/pre&amp;gt;See also [[#How do I link against the Intel Math Kernel Library?]]&lt;br /&gt;
&lt;br /&gt;
===Cannot find rdmacm library when compiling on GPC===&lt;br /&gt;
&lt;br /&gt;
I get the following error building my code on GPC: &amp;quot;&amp;lt;tt&amp;gt;ld: cannot find -lrdmacm&amp;lt;/tt&amp;gt;&amp;quot;.  Where can I find this library?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
This library is part of the MPI libraries; if your compiler is having problems picking it up, it probably means you are mistakenly trying to compile on the login nodes (scinet01..scinet04).  The login nodes aren't part of the GPC; they are for logging into the data centre only.  From there you must go to the GPC or TCS development nodes to do any real work.&lt;br /&gt;
&lt;br /&gt;
=== Why do I get this error when I try to compile: &amp;quot;icpc: error #10001: could not find directory in which /usr/bin/g++41 resides&amp;quot; ?===&lt;br /&gt;
&lt;br /&gt;
You are trying to compile on the login nodes.   As described in the wiki ( https://support.scinet.utoronto.ca/wiki/index.php/GPC_Quickstart#Login ), or in the users guide you would have received with your account,   Scinet supports two main clusters, with very different architectures.  Compilation must be done on the development nodes of the appropriate cluster (in this case, gpc01-04).   Thus, log into gpc01, gpc02, gpc03, or gpc04, and compile from there.&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Testing your Code==&lt;br /&gt;
&lt;br /&gt;
===How can I run MATLAB / IDL / Gaussian / my favourite commercial software at SciNet?===&lt;br /&gt;
&lt;br /&gt;
Because SciNet serves such a disparate group of user communities, there is just no way we can buy licenses for everyone's commercial package.   The only commercial software we have purchased is that which in principle can benefit everyone -- fast compilers and math libraries.&lt;br /&gt;
&lt;br /&gt;
If your research group requires a commercial package that you already have or are willing to buy licenses for, contact us at [mailto:support@scinet.utoronto.ca support@scinet] and we can work together to find out if it is feasible to implement the package's licensing arrangement on the SciNet clusters, and if so, what is the best way to do it.  Several commercial packages have already been installed; you can see the list [[Commercial_software | here]].&lt;br /&gt;
&lt;br /&gt;
Note that it is important that you contact us before installing commercially licensed software on SciNet machines, even if you have a way to do it in your own directory without requiring sysadmin intervention.   It puts us in a very awkward position if someone is found to be running unlicensed or invalidly licensed software on our systems, so we need to be aware of what is being installed where.&lt;br /&gt;
&lt;br /&gt;
Also note that MATLAB is somewhat of a special case.  See the [[MATLAB]] page for more information.&lt;br /&gt;
&lt;br /&gt;
=== Can I run something for a short time on the login nodes? ===&lt;br /&gt;
&lt;br /&gt;
I am in the process of playing around with the MPI calls in my code to get it to work. I do a lot of tests and each of them takes only a couple of seconds.  Can I do this on the login nodes?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Yes, as long as it's very brief (a few minutes).   People use the login nodes for their work, and you don't want to bog them down.  Testing a real code can chew up a lot more resources than compiling, etc.&lt;br /&gt;
&lt;br /&gt;
Once you have run some short test jobs, you should run an [[ Slurm#Queues | interactive ]] job and run the tests either in the regular compute queue or using the debug queue that is reserved for this purpose.&lt;br /&gt;
&lt;br /&gt;
=== How do I run a longer (but still shorter than an hour) test job quickly? ===&lt;br /&gt;
&lt;br /&gt;
On Niagara there is a high-turnover short queue called [[ Slurm#Queues | debug ]] that is designed for this purpose.  You can use it by adding &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#SBATCH -p debug&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your submission script.  This is for testing your code only; do not use the debug queue for production runs.&lt;br /&gt;
&lt;br /&gt;
===What does ''code scaling'' mean?===&lt;br /&gt;
&lt;br /&gt;
Please see [[Introduction_To_Performance#Parallel_Speedup|A Performance Primer]].&lt;br /&gt;
&lt;br /&gt;
===What do you mean by ''throughput''?===&lt;br /&gt;
&lt;br /&gt;
Please see [[Introduction_To_Performance#Throughput|A Performance Primer]].&lt;br /&gt;
&lt;br /&gt;
Here is a simple example:&lt;br /&gt;
&lt;br /&gt;
Suppose you need to do 10 computations.  Say each of these runs for 1 day on 40 cores, but takes &amp;quot;only&amp;quot; 18 hours on 80 cores.  What is the fastest way to get all 10 computations done: as 40-core jobs or as 80-core jobs?  Let us assume you have 2 nodes (80 cores) at your disposal. After some simple arithmetic, the answer is that running your 10 jobs as 40-core jobs will take 5 days (two jobs at a time, in five rounds of one day each), whereas running them as 80-core jobs will take 7.5 days (one job at a time, in ten rounds of 18 hours each).  Draw your own conclusions...&lt;br /&gt;
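&lt;br /&gt;
The arithmetic above can be checked with a short shell sketch. It assumes, as in the example, 2 nodes of 40 cores each (80 cores) and jobs of one size running in back-to-back rounds. It also ignores any time spent waiting in the queue. The function names are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Worked throughput example: 10 jobs on 2 nodes of 40 cores each.
jobs=10
total_cores=80

# Total wall-clock hours to finish all jobs when each job uses $1 cores
# and takes $2 hours; jobs that fit side by side run concurrently.
makespan_hours() {
    local cores_per_job=$1 hours_per_job=$2
    local concurrent=$(( total_cores / cores_per_job ))
    local rounds=$(( (jobs + concurrent - 1) / concurrent ))   # ceiling division
    echo $(( rounds * hours_per_job ))
}

# Format hours as days with one decimal place using integer arithmetic.
as_days() {
    local tenths=$(( $1 * 10 / 24 ))
    echo "$(( tenths / 10 )).$(( tenths % 10 )) days"
}

h40=$(makespan_hours 40 24)   # 40-core jobs take 24 hours each
h80=$(makespan_hours 80 18)   # 80-core jobs take 18 hours each
echo "40-core jobs: $h40 hours ($(as_days "$h40"))"
echo "80-core jobs: $h80 hours ($(as_days "$h80"))"
```
&lt;br /&gt;
Running it prints 120 hours (5.0 days) for the 40-core jobs and 180 hours (7.5 days) for the 80-core jobs.&lt;br /&gt;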
&lt;br /&gt;
==Submitting your jobs==&lt;br /&gt;
&lt;br /&gt;
=== How do I charge jobs to my RAC allocation? ===&lt;br /&gt;
&lt;br /&gt;
Please see the [[Slurm#Slurm_Accounts | accounting section of Slurm page]].&lt;br /&gt;
&lt;br /&gt;
===How can I automatically resubmit a job?===&lt;br /&gt;
&lt;br /&gt;
Commonly you may have a job that you know will take longer to run than what is permissible in the queue.  As long as your program has [[Checkpoints|checkpoint]]/restart capability, you can have one job automatically submit the next. The following example assumes that the program finishes before the 24-hour limit; the job then resubmits itself by logging into one of the login nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=24:00:00&lt;br /&gt;
#SBATCH --job-name my_job&lt;br /&gt;
&lt;br /&gt;
# DIRECTORY TO RUN - $SLURM_SUBMIT_DIR is the directory the job was submitted from&lt;br /&gt;
cd &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# YOUR CODE HERE&lt;br /&gt;
./run_my_code&lt;br /&gt;
&lt;br /&gt;
# RESUBMIT 10 TIMES HERE (NUM is passed in with --export; default to 0 if unset)&lt;br /&gt;
num=${NUM:-0}&lt;br /&gt;
if [ &amp;quot;$num&amp;quot; -lt 10 ]; then&lt;br /&gt;
      num=$((num+1))&lt;br /&gt;
      # sbatch options such as --export must come before the script name&lt;br /&gt;
      ssh nia-login01 &amp;quot;cd $SLURM_SUBMIT_DIR; sbatch --export=ALL,NUM=$num script_name.sh&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit the first job with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch --export=ALL,NUM=0 script_name.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alternatively, you can use [[ Slurm#Job_dependencies | job dependencies ]] through the queueing system, which will not start one job until another job has completed.&lt;br /&gt;
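&lt;br /&gt;
As a minimal sketch of the job-dependency approach, the function below submits the same script several times using sbatch's &amp;lt;tt&amp;gt;--parsable&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;--dependency=afterok:&amp;lt;/tt&amp;gt; options. Each job waits for the previous one to finish successfully. The function name &amp;lt;tt&amp;gt;submit_chain&amp;lt;/tt&amp;gt; is hypothetical; run it on a login node, and use it instead of (not together with) a self-resubmitting script.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: submit the same job script COUNT times, where each
# job may only start after the previous one has finished successfully.
# Usage: submit_chain script_name.sh COUNT
submit_chain() {
    local script=$1 count=$2
    local jobid i
    # --parsable makes sbatch print just the job ID (possibly followed by ";cluster").
    jobid=$(sbatch --parsable "$script") || return 1
    echo "$jobid"
    for i in $(seq 2 "$count"); do
        # Strip any ";cluster" suffix before using the ID in a dependency.
        jobid=$(sbatch --parsable --dependency=afterok:"${jobid%%;*}" "$script") || return 1
        echo "$jobid"
    done
}
```
&lt;br /&gt;
For example, &amp;lt;tt&amp;gt;submit_chain script_name.sh 5&amp;lt;/tt&amp;gt; queues five jobs and prints their job IDs. A job may be killed at its time limit and so not end successfully. In that case &amp;lt;tt&amp;gt;afterany&amp;lt;/tt&amp;gt; may be a better choice than &amp;lt;tt&amp;gt;afterok&amp;lt;/tt&amp;gt;, provided your code restarts cleanly from its last checkpoint.&lt;br /&gt;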
&lt;br /&gt;
If your job can't be made to automatically stop before the 24 hour queue window, but it does write out checkpoints, you can use the timeout command to stop the program while you still have time to resubmit; for instance&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    timeout 1410m ./run_my_code argument1 argument2&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
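will run the program for at most 23.5 hours (1410 minutes) and then send it SIGTERM to stop it. This leaves the remaining 30 minutes of the 24-hour window for the rest of the job script, such as resubmitting the next job.&lt;br /&gt;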
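&lt;br /&gt;
GNU &amp;lt;tt&amp;gt;timeout&amp;lt;/tt&amp;gt; exits with status 124 when it had to stop the program, so a job script can tell whether the time limit was hit. The wrapper below is a minimal sketch of that check. Its name, &amp;lt;tt&amp;gt;run_with_limit&amp;lt;/tt&amp;gt;, is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical wrapper: run a command under a time limit and report whether
# the limit was hit. GNU timeout exits with status 124 when it had to stop
# the command because the time limit expired.
# Usage: run_with_limit LIMIT command [args...]   e.g. LIMIT=1410m
run_with_limit() {
    local limit=$1
    shift
    timeout "$limit" "$@"
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "Time limit reached; resubmit to continue from the last checkpoint."
    fi
    # Pass the exit status through so the caller can decide what to do.
    return "$status"
}
```
&lt;br /&gt;
In a job script you could call &amp;lt;tt&amp;gt;run_with_limit 1410m ./run_my_code argument1 argument2&amp;lt;/tt&amp;gt; and resubmit only when it returns 124. This assumes your code writes a checkpoint it can restart from.&lt;br /&gt;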
&lt;br /&gt;
===How can I pass in arguments to my submission script?===&lt;br /&gt;
&lt;br /&gt;
If you wish to make your scripts more generic, you can use Slurm's ability to pass environment variables into your script as arguments.  See [[Niagara_Quickstart#Passing_Variables_to_submission_scripts | this page]].&lt;br /&gt;
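&lt;br /&gt;
Here is a minimal sketch using the hypothetical variable names &amp;lt;tt&amp;gt;INPUT_FILE&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;NSTEPS&amp;lt;/tt&amp;gt;. A submission script can read values passed with &amp;lt;tt&amp;gt;sbatch --export&amp;lt;/tt&amp;gt; as ordinary environment variables and fall back to defaults when they are missing:&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=1:00:00
#SBATCH --job-name=param_job
# INPUT_FILE and NSTEPS are hypothetical variable names. Submit with, e.g.:
#   sbatch --export=ALL,INPUT_FILE=run1.in,NSTEPS=500 param_job.sh
# Inside the job they are ordinary environment variables; fall back to
# defaults if they were not passed.
input_file=${INPUT_FILE:-default.in}
nsteps=${NSTEPS:-100}
echo "Using input file $input_file for $nsteps steps"
# Replace the echo above with your (hypothetical) program, e.g.
#   ./my_program "$input_file" "$nsteps"
```
&lt;br /&gt;
Including &amp;lt;tt&amp;gt;ALL&amp;lt;/tt&amp;gt; in the &amp;lt;tt&amp;gt;--export&amp;lt;/tt&amp;gt; list keeps your usual environment and adds the two variables. Note that sbatch options must come before the script name.&lt;br /&gt;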
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
===I submit my GPC job, and I get an email saying it was rejected===&lt;br /&gt;
&lt;br /&gt;
This happens because the job you've submitted breaks one of the rules of the queues and is rejected. An email&lt;br /&gt;
is sent with the JOBID, JOBNAME, and the reason it was rejected.  The following is an example where a job&lt;br /&gt;
requests more than 48 hours and was rejected.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 3462493.gpc-sched&lt;br /&gt;
Job Name:   STDIN&lt;br /&gt;
job deleted&lt;br /&gt;
Job deleted at request of root@gpc-sched&lt;br /&gt;
MOAB_INFO:  job was rejected - job violates class configuration 'wclimit too high for class 'batch_ib' (345600 &amp;gt; 172800)'&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Jobs on the TCS or GPC may only run for 48 hours at a time; this restriction greatly increases responsiveness of the queue and queue throughput for all our users.  If your computation requires longer than that, as many do, you will have to [[ Checkpoints | checkpoint ]] your job and restart it after each 48-hour queue window.   You can manually re-submit jobs, or if you can have your job cleanly exit before the 48 hour window, there are ways to [[ FAQ#How_can_I_automatically_resubmit_a_job.3F | automatically resubmit jobs ]].&lt;br /&gt;
&lt;br /&gt;
Other rejections return a more cryptic error saying &amp;quot;job violates class configuration&amp;quot; such as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
PBS Job Id: 3462409.gpc-sched&lt;br /&gt;
Job Name:   STDIN&lt;br /&gt;
job deleted&lt;br /&gt;
Job deleted at request of root@gpc-sched&lt;br /&gt;
MOAB_INFO:  job was rejected - job violates class configuration 'user required by class 'batch''&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The most common problems that result in this error are:&lt;br /&gt;
&lt;br /&gt;
* '''Incorrect number of processors per node''': Jobs on the GPC are scheduled per-node not per-core and since each node has 8 processor cores (ppn=8) the smallest job allowed is one node with 8 cores (nodes=1:ppn=8).  For serial jobs users must bundle or batch them together in groups of 8. See [[ FAQ#How_do_I_run_serial_jobs_on_GPC.3F | How do I run serial jobs on GPC? ]]&lt;br /&gt;
* '''No number of nodes specified''': Jobs submitted to the main queue must request a specific number of nodes, either in the submission script (with a line like &amp;lt;tt&amp;gt;#PBS -l nodes=2:ppn=8&amp;lt;/tt&amp;gt;) or on the command line (eg, &amp;lt;tt&amp;gt;qsub -l nodes=2:ppn=8,walltime=5:00:00 script.pbs&amp;lt;/tt&amp;gt;).  Note that for the debug queue, you can get away without specifying a number of nodes and a default of one will be assigned; for both technical and policy reasons, we do not enforce such a default for the main (&amp;quot;batch&amp;quot;) queue.&lt;br /&gt;
* '''There is a 15 minute walltime minimum''' on all queues except debug and if you set your walltime less than this, it will be rejected.&lt;br /&gt;
&lt;br /&gt;
=== When submitting your job, fails saying: &amp;quot;script is written in DOS/Windows text format&amp;quot; ===&lt;br /&gt;
&lt;br /&gt;
Very likely you have written your script in a windows machine, so to fix this you just need to change the format of you submission script to unix from Windows/DOS.&lt;br /&gt;
Use the command below for all your script files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
dos2unix &amp;lt;pbs-script-file&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;pbs-script-file&amp;gt; has to be substituted by the name of your script file.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Scheduling and Priority ==&lt;br /&gt;
&lt;br /&gt;
=== Why did &amp;lt;code&amp;gt;squeue --start&amp;lt;/code&amp;gt; say it would take 3 hours for my job to start before, and now it says my job will start in 10 hours? ===&lt;br /&gt;
&lt;br /&gt;
Please look at the [[FAQ#How_do_priorities_work.2Fwhy_did_that_job_jump_ahead_of_mine_in_the_queue.3F | How do priorities work/why did that job jump ahead of mine in the queue? ]] section.&lt;br /&gt;
&lt;br /&gt;
===How do priorities work/why did that job jump ahead of mine in the queue?===&lt;br /&gt;
&lt;br /&gt;
The [[Slurm | queueing system]] used on SciNet machines is a [http://en.wikipedia.org/wiki/Priority_queue Priority Queue].  Jobs enter the queue at the back and slowly make their way to the front as those ahead of them are run; but a job that enters the queue with a higher priority can &amp;quot;cut in line&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
The main factor which determines priority is whether or not the user (or their PI) has a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions Compute Canada RAC allocation].  These are competitively allocated grants of computer time; there is a call for proposals in the fall of every calendar year.    Users with an allocation have higher priorities, to help make sure that they can use the amount of computer time the committees granted them.   Their priority decreases as they approach their allotted usage over the current window of time; by the time they have exhausted that allotted usage, their priority is the same as that of users with no allocation (&amp;quot;RAS&amp;quot;, or &amp;quot;default&amp;quot; users).    Default users have a fixed, low priority.&lt;br /&gt;
&lt;br /&gt;
This priority system is called &amp;quot;fairshare&amp;quot;; the scheduler attempts to make sure everyone has their fair share of the machines, where the share that's fair has been determined by the allocation committee.    The fairshare window is a rolling window of one week; that is, any time you have a job in the queue, the fairshare calculation of its priority is given by how much of your allocation of the machine has been used in the last 7 days.&lt;br /&gt;
&lt;br /&gt;
A particular allocation might have some fraction of Niagara, say 4% of the machine (if the PI had been allocated 2400 core-years on Niagara). The allocations have labels, called &amp;quot;Resource Allocation Proposal Identifiers&amp;quot; (RAPIs), that look something like&lt;br /&gt;
&lt;br /&gt;
  rrg-abc-ab&lt;br /&gt;
&lt;br /&gt;
where rrg (or rpp) indicates an allocation, abc is the group name, and the suffix specifies which of the allocations granted to the PI is to be used.  These can be specified on a job-by-job basis.  On Niagara, one adds the line&lt;br /&gt;
 #SBATCH -A RAPI&lt;br /&gt;
to your script. If the allocation to charge isn't specified, a default is used; each user has such a default, which can be changed at the [https://portal.scinet.utoronto.ca SciNet portal] where one changes one's password.&lt;br /&gt;
&lt;br /&gt;
A job's priority is determined primarily by the fairshare priority of the allocation it is being charged to; the previous 7 days' worth of use under that allocation is calculated and compared to the allocated fraction (here, 4%) of the machine over that window (here, 7 days).   The fairshare priority is a decreasing function of the allocation left. If there is no allocation left, the priority is the same as that of a user with no granted allocation. For example, this happens when jobs running under that allocation have already used 403,200 core-hours (2400 cores x 7 days x 24 hours) in the past 7 days.   (This last part has been the topic of some debate; as the machine gets more utilized, we will probably allow the priorities of RAC users who have greatly overused their quota to drop below that of unallocated users, to give the unallocated users some chance to run on our increasingly crowded system. This would have no undue effect on our allocated users, as they would still be able to use the amount of resources they had been allocated by the committees.)   Note that all jobs charging the same allocation get the same fairshare priority.&lt;br /&gt;
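&lt;br /&gt;
The 403,200 core-hour figure is simply the allocation spread over the 7-day window. The sketch below reproduces that arithmetic. It only illustrates the budget and how much of it has been used, not the scheduler's actual fairshare formula. The function names are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Weekly core-hour budget implied by an allocation of N core-years,
# i.e. N cores running continuously for the 7-day fairshare window.
weekly_budget() {
    local core_years=$1
    echo $(( core_years * 7 * 24 ))
}

# Whole-number percentage of that budget consumed by a given number of core-hours.
percent_used() {
    local used=$1 budget=$2
    echo $(( used * 100 / budget ))
}

budget=$(weekly_budget 2400)
echo "weekly budget: $budget core-hours"
echo "after 201600 core-hours: $(percent_used 201600 "$budget")% used"
```
&lt;br /&gt;
For the 2400 core-year example it prints a weekly budget of 403200 core-hours. It also reports 50% used after 201,600 core-hours.&lt;br /&gt;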
&lt;br /&gt;
There are other factors that go into calculating priority, but fairshare is the most significant.   Other factors include&lt;br /&gt;
* Length of time waiting in the queue (measured in units of the requested runtime). A waiting job gains priority as it sits in the queue, to avoid job starvation. &lt;br /&gt;
* User adjustment of priorities.&lt;br /&gt;
&lt;br /&gt;
The major effect of these subdominant terms is to shuffle the order of jobs running under the same allocation.&lt;br /&gt;
&lt;br /&gt;
===How do we manage job priorities within our research group?===&lt;br /&gt;
&lt;br /&gt;
Obviously, managing shared resources within a large group, whether it is conference funding or CPU time, takes some doing.   &lt;br /&gt;
&lt;br /&gt;
It's important to note that the fairshare periods are intentionally kept quite short: just one week long. So, for example, let us say that in your resource allocation you have about 10% of the machine.   Then for someone to use up the whole one-week amount of time in one day, they'd have to use 70% of the machine in that one day, which is unlikely to happen by accident.  If that does happen, those using the same allocation as the person who used 70% of the machine over the one day will suffer by having much lower priority for their jobs, but only for the next 6 days; and even then, if there are idle CPUs they'll still be able to compute.&lt;br /&gt;
&lt;br /&gt;
There are online tools, both the [https://ccdb.computecanada.ca CCDB] and [https://my.scinet.utoronto.ca my.SciNet], for seeing how the allocation is being used, and those people who are in charge in your group will be able to use that information to manage users, telling them to dial it down or up.   We know that managing a large research group is hard, and we want to make sure we provide you the information you need to do your job effectively.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- One way for users within a group to manage their priorities within the group is with [[Moab#Adjusting_Job_Priority | user-adjusted priorities]]; this is described in more detail on the [[Moab | Scheduling System]] page. --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Running your jobs==&lt;br /&gt;
&lt;br /&gt;
===My job can't write to /home===&lt;br /&gt;
&lt;br /&gt;
My code works fine when I test on the Niagara login nodes, but when I submit a job it fails.  What's wrong?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
As [[Niagara_Quickstart#Data_Management | discussed elsewhere]], &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; is mounted read-only on the compute nodes; you can only write to &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; from the login nodes.  In general, to run jobs you can read from &amp;lt;tt&amp;gt;/home&amp;lt;/tt&amp;gt; but you'll have to write to &amp;lt;tt&amp;gt;/scratch&amp;lt;/tt&amp;gt; (or, if you were allocated space through the RAC process, to &amp;lt;tt&amp;gt;/project&amp;lt;/tt&amp;gt;).  More information on SciNet filesystems can be found on our [[Data_Management | Data Management]] page.&lt;br /&gt;
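&lt;br /&gt;
A common pattern is for the job script to create its own output directory on &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt; and work from there. The helper below is a minimal, hypothetical sketch. Its directory naming scheme (a name plus the Slurm job ID) is an assumption, not a SciNet convention.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper for a job script: create a per-job output directory
# on $SCRATCH (writable from compute nodes, unlike /home) and print its path.
# SLURM_JOB_ID is set by Slurm inside a job; "interactive" is used otherwise.
# Usage: outdir=$(make_job_outdir myrun) || exit 1; cd "$outdir"
make_job_outdir() {
    local name=$1
    # Refuse to guess a location if $SCRATCH is not defined.
    if [ -z "${SCRATCH:-}" ]; then
        return 1
    fi
    local dir="$SCRATCH/${name}_${SLURM_JOB_ID:-interactive}"
    mkdir -p "$dir" || return 1
    echo "$dir"
}
```
&lt;br /&gt;
Input files can still be read directly from /home; only the output needs to go to $SCRATCH.&lt;br /&gt;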
&lt;br /&gt;
===Can I use hybrid codes consisting of MPI and OpenMP?===&lt;br /&gt;
&lt;br /&gt;
Yes.&lt;br /&gt;
&lt;br /&gt;
===How do I run serial jobs?===&lt;br /&gt;
&lt;br /&gt;
Niagara is a parallel computing resource, and SciNet's priority will always be parallel jobs.   Having said that, if you can make efficient use of the resources using serial jobs and get good science done, that's good too, and we're happy to help you.&lt;br /&gt;
&lt;br /&gt;
The Niagara nodes each have 40 processing cores, and making efficient use of these nodes means using all forty cores.  As a result, users must run serial jobs in groups of 40 (or multiples of 40) at a time.  &lt;br /&gt;
&lt;br /&gt;
The best strategy depends on the nature of your jobs. Several approaches are presented on the [[Running_Serial_Jobs_on_Niagara | serial page]].&lt;br /&gt;
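&lt;br /&gt;
Here is one minimal sketch of bundling. GNU &amp;lt;tt&amp;gt;xargs -P&amp;lt;/tt&amp;gt; can run many independent serial tasks concurrently within a single whole-node job. This is a swapped-in technique for illustration; the serial page describes the approaches SciNet recommends. The function name &amp;lt;tt&amp;gt;run_bundle&amp;lt;/tt&amp;gt; and the task command &amp;lt;tt&amp;gt;./my_serial_task&amp;lt;/tt&amp;gt; are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: run N independent serial tasks concurrently using
# xargs -P (GNU findutils). In the command, {} is replaced by the task
# number 1..N. xargs returns only after every task has finished.
# Usage: run_bundle 40 ./my_serial_task input_{}.dat
run_bundle() {
    local ntasks=$1
    shift
    seq 1 "$ntasks" | xargs -P "$ntasks" -I{} "$@"
}
```
&lt;br /&gt;
Inside a one-node job, &amp;lt;tt&amp;gt;run_bundle 40 ./my_serial_task input_{}.dat&amp;lt;/tt&amp;gt; would start 40 tasks at once (one per core) and return when all of them have finished. This only uses the node efficiently if the tasks take roughly the same time.&lt;br /&gt;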
&lt;br /&gt;
===Why can't I request only a single cpu for my job on Niagara?===&lt;br /&gt;
&lt;br /&gt;
On Niagara jobs are allocated by whole node - that is, in chunks of 40 processors.   If you want to run a job that requires only one processor, you need to bundle the jobs into groups of 40, so as to not be wasting the other 39 cores. See the [[Running_Serial_Jobs_on_Niagara | serial run page]] for more information on how this is accomplished.&lt;br /&gt;
&lt;br /&gt;
If you are unable to bundle your jobs into groups of 40, you should consider running on the [https://docs.computecanada.ca/wiki/Graham Graham] or [https://docs.computecanada.ca/wiki/Cedar Cedar] Compute Canada machines instead of Niagara.&lt;br /&gt;
&lt;br /&gt;
===How do I use the ramdisk on Niagara?===&lt;br /&gt;
&lt;br /&gt;
To use the ramdisk, create, read, and write files in /dev/shm/ just as you would in (e.g.) ${SCRATCH}. Only the amount of RAM needed to store the files will be taken up by the temporary file system. Thus if you have 40 serial jobs each requiring 1 GB of RAM, and 1 GB is taken up by various OS services, you would still have approximately 160 GB available to use as ramdisk on a ~202 GB node. However, if you were to write more than about 160 GB of data to the ramdisk, this would exceed the available memory and your job would likely crash.&lt;br /&gt;
&lt;br /&gt;
It is very important to delete your files from ramdisk at the end of your job. If you do not do this, the next user to use that node will have less RAM available than she might expect, and this might kill her job.&lt;br /&gt;
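&lt;br /&gt;
One way to make sure the cleanup happens even if your program fails is to wrap it, as in the following minimal sketch. The function name &amp;lt;tt&amp;gt;with_ramdisk&amp;lt;/tt&amp;gt; and the &amp;lt;tt&amp;gt;RAMDISK_BASE&amp;lt;/tt&amp;gt; override are hypothetical; by default it uses &amp;lt;tt&amp;gt;/dev/shm&amp;lt;/tt&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: run a command inside a private directory on the
# ramdisk and always delete that directory afterwards, even if the command
# fails. RAMDISK_BASE is a hypothetical override; it defaults to /dev/shm.
# Usage: with_ramdisk ./my_program   (./my_program is hypothetical)
with_ramdisk() {
    local base=${RAMDISK_BASE:-/dev/shm}
    local workdir status
    workdir=$(mktemp -d "$base/job.XXXXXX") || return 1
    # Run the command from inside the ramdisk directory, in a subshell.
    ( cd "$workdir" || exit 1; "$@" )
    status=$?
    # Free the RAM for the next user of the node.
    rm -rf "$workdir"
    return "$status"
}
```
&lt;br /&gt;
The wrapped command must copy anything you want to keep (e.g. to &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;) itself, because the directory is deleted as soon as the command returns.&lt;br /&gt;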
&lt;br /&gt;
More details on how to setup your script to use the ramdisk can be found on the [[User_Ramdisk|Ramdisk page]].&lt;br /&gt;
&lt;br /&gt;
=== How can I run a job longer than 24 hours? ===&lt;br /&gt;
&lt;br /&gt;
The Niagara queue has a time limit of 24 hours for groups with a [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions RAC allocation], and 12 hours for those groups which do not.   This is pretty typical for systems of its size; larger systems commonly have shorter run limits.   The limits are there to ensure that every user gets a fair share of the system (so that no one user ties up lots of nodes for a long time), and for safety (so that if one memory board in one node fails in the middle of a very long job, you haven't lost a month's worth of work).&lt;br /&gt;
&lt;br /&gt;
Since many of us have simulations that require more than that much time, most widely-used scientific applications have &amp;quot;checkpoint-restart&amp;quot; functionality, where every so often the complete state of the calculation is stored as a checkpoint file, and one can restart a simulation from one of these.   In fact, these restart files tend to be quite useful for a number of purposes.&lt;br /&gt;
&lt;br /&gt;
If your job will take longer, you will have to submit your job in multiple parts, restarting from a checkpoint each time.  In this way, one can run a simulation much longer than the queue limit.  In fact, one can even write job scripts which automatically re-submit themselves until a run is completed, using [[#How_can_I_automatically_resubmit_a_job.3F | automatic resubmission. ]]&lt;br /&gt;
&lt;br /&gt;
==Errors in running jobs==&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
=== I couldn't find the  .o output file in the .pbs_spool directory as I used to ===&lt;br /&gt;
&lt;br /&gt;
On Feb 24 2011, the temporary location of standard input and output files was moved from the shared file system ${SCRATCH}/.pbs_spool to the&lt;br /&gt;
node-local directory /var/spool/torque/spool (which resides in ram). The final location after a job has finished is unchanged,&lt;br /&gt;
but to check the output/error of running jobs, users will now have to ssh into the (first) node assigned to the job and look in&lt;br /&gt;
/var/spool/torque/spool.&lt;br /&gt;
&lt;br /&gt;
This alleviates access contention to the temporary directory, especially for those users that are running a lot of jobs, and  reduces the burden on the file system in general.&lt;br /&gt;
&lt;br /&gt;
Note that it is good practice to redirect output to a file rather than to count on the scheduler to do this for you.&lt;br /&gt;
&lt;br /&gt;
=== My GPC job died, telling me `Copy Stageout Files Failed' ===&lt;br /&gt;
&lt;br /&gt;
When a job runs on GPC, the script's standard output and error are redirected to &lt;br /&gt;
&amp;lt;tt&amp;gt;$PBS_JOBID.gpc-sched.OU&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;$PBS_JOBID.gpc-sched.ER&amp;lt;/tt&amp;gt; in&lt;br /&gt;
/var/spool/torque/spool on the (first) node on which your job is running.  At the end of the job, those .OU and .ER files are copied to where the batch script tells them to be copied, by default &amp;lt;tt&amp;gt;$PBS_JOBNAME.o$PBS_JOBID&amp;lt;/tt&amp;gt; and&amp;lt;tt&amp;gt;$PBS_JOBNAME.e$PBS_JOBID&amp;lt;/tt&amp;gt;.   (You can set those filenames to be something clearer with the -e and -o options in your PBS script.)&lt;br /&gt;
&lt;br /&gt;
When you get errors like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
An error has occurred processing your job, see below.&lt;br /&gt;
request to copy stageout files failed on node&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
it means that the copy-back process has failed in some way.  There could be a few reasons for this. The first thing to do is to '''make sure that your .bashrc does not produce any output''', as the output stageout is performed by bash and any extra output can cause it to fail.&lt;br /&gt;
It could also have been a random filesystem error, or your job may have failed spectacularly enough to short-circuit the normal job-termination process (e.g. it ran out of memory very quickly), so those files never got copied.&lt;br /&gt;
&lt;br /&gt;
Write to [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;] if your input/output files got lost, as we will probably be able to retrieve them for you (please supply at least the jobid, and any other information that may be relevant). &lt;br /&gt;
&lt;br /&gt;
Note that it is good practice to redirect output to a file rather than depending on the job scheduler to do this for you.&lt;br /&gt;
&lt;br /&gt;
===Another transport will be used instead===&lt;br /&gt;
&lt;br /&gt;
I get error messages like the following when running on the GPC at the start of the run, although the job seems to proceed OK.   Is this a problem?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[[45588,1],0]: A high-performance Open MPI point-to-point messaging module&lt;br /&gt;
was unable to find any relevant network interfaces:&lt;br /&gt;
&lt;br /&gt;
Module: OpenFabrics (openib)&lt;br /&gt;
  Host: gpc-f101n005&lt;br /&gt;
&lt;br /&gt;
Another transport will be used instead, although this may result in&lt;br /&gt;
lower performance.&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
Everything's fine.  The two MPI libraries SciNet provides work with both the InfiniBand and the Gigabit Ethernet interconnects, and always try to use the fastest interconnect available.  In this case, you ran on normal Gigabit GPC nodes with no InfiniBand, but the MPI libraries have no way of knowing this and try InfiniBand first anyway.  This is just a harmless `failover' message: the library tried to use InfiniBand, which doesn't exist on this node, then fell back on Gigabit Ethernet (`another transport').&lt;br /&gt;
&lt;br /&gt;
With OpenMPI, this message can be avoided by telling it not to look for InfiniBand, e.g. with the option&lt;br /&gt;
&lt;br /&gt;
--mca btl ^openib&lt;br /&gt;
&lt;br /&gt;
===IB Memory Errors, eg &amp;lt;tt&amp;gt; reg_mr Cannot allocate memory &amp;lt;/tt&amp;gt;===&lt;br /&gt;
&lt;br /&gt;
Infiniband requires more memory than ethernet; it can use RDMA (remote direct memory access) transport for which it sets aside registered memory to transfer data.&lt;br /&gt;
&lt;br /&gt;
In our current network configuration, it requires a ''lot'' more memory, particularly as you go to larger process counts; unfortunately, that means you can't get around the &amp;quot;I need more memory&amp;quot; problem the usual way, by running on more nodes.   Machines with different memory or &lt;br /&gt;
network configurations may exhibit this problem at higher or lower MPI &lt;br /&gt;
task counts.&lt;br /&gt;
&lt;br /&gt;
Right now, the best workaround is to reduce the number and size of OpenIB queues, using XRC: with OpenMPI, add the following options to your mpirun command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-mca btl_openib_receive_queues X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32 -mca btl_openib_max_send_size 12288&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With Intel MPI, you should be able to do&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load intelmpi/4.0.3.008&lt;br /&gt;
mpirun -genv I_MPI_FABRICS=shm:ofa  -genv I_MPI_OFA_USE_XRC=1 -genv I_MPI_OFA_DYNAMIC_QPS=1 -genv I_MPI_DEBUG=5 -np XX ./mycode&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
to the same end.  &lt;br /&gt;
&lt;br /&gt;
For more information see [[GPC MPI Versions]].&lt;br /&gt;
&lt;br /&gt;
===My compute job fails, saying &amp;lt;tt&amp;gt;libpng12.so.0: cannot open shared object file&amp;lt;/tt&amp;gt; or &amp;lt;tt&amp;gt;libjpeg.so.62: cannot open shared object file&amp;lt;/tt&amp;gt;===&lt;br /&gt;
&lt;br /&gt;
To maximize the amount of memory available for compute jobs, the compute nodes have a less complete system image than the development nodes.   In particular, since interactive graphics libraries like matplotlib and gnuplot are usually used interactively, the libraries for their use are included in the devel nodes' image but not the compute nodes.&lt;br /&gt;
&lt;br /&gt;
Many of these extra libraries are, however, available in the &amp;quot;extras&amp;quot; module.   So adding a &amp;quot;module load extras&amp;quot; to your job submission  script - or, for overkill, to your .bashrc - should enable these scripts to run on the compute nodes.&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Monitoring jobs in the queue==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
===Why hasn't my job started?===&lt;br /&gt;
&lt;br /&gt;
Use the moab command &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
checkjob -v jobid&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
and the last couple of lines should explain why a job hasn't started.  &lt;br /&gt;
&lt;br /&gt;
Please see [[Moab| Job Scheduling System (Moab) ]] for more detailed information&lt;br /&gt;
&lt;br /&gt;
===How do I figure out when my job will run?===&lt;br /&gt;
&lt;br /&gt;
Please see [[Moab#Available_Resources| Job Scheduling System (Moab) ]]&lt;br /&gt;
&lt;br /&gt;
===My GPC job is Held, and checkjob says &amp;quot;Batch:PolicyViolation&amp;quot; ===&lt;br /&gt;
&lt;br /&gt;
When this happens, you'll see your job stuck in a BatchHold state.  &lt;br /&gt;
This happens because the job you've submitted breaks one of the rules of the queues, and is being held until you modify it or kill it and re-submit a conforming job.  The most common problems are:&lt;br /&gt;
&lt;br /&gt;
===Running checkjob on my job gives me messages about JobFail and rejected===&lt;br /&gt;
&lt;br /&gt;
Running checkjob on my job gives me messages that suggest my job has failed, as below: what did I do wrong?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
AName: test&lt;br /&gt;
State: Idle &lt;br /&gt;
Creds:  user:xxxxxx  group:xxxxxxxx  account:xxxxxxxx  class:batch_ib  qos:ibqos&lt;br /&gt;
WallTime:   00:00:00 of 8:00:00&lt;br /&gt;
BecameEligible: Wed Jul 23 10:39:27&lt;br /&gt;
SubmitTime: Wed Jul 23 10:38:22&lt;br /&gt;
  (Time Queued  Total: 00:01:47  Eligible: 00:01:05)&lt;br /&gt;
&lt;br /&gt;
Total Requested Tasks: 8&lt;br /&gt;
&lt;br /&gt;
Req[0]  TaskCount: 8  Partition: ALL  &lt;br /&gt;
Opsys: centos6computeA  Arch: ---  Features: ---&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Notification Events: JobFail&lt;br /&gt;
&lt;br /&gt;
IWD:            /scratch/x/xxxxxxxx/xxxxxxx/xxxxxxx&lt;br /&gt;
Partition List: torque,DDR&lt;br /&gt;
Flags:          RESTARTABLE&lt;br /&gt;
Attr:           checkpoint&lt;br /&gt;
StartPriority:  76&lt;br /&gt;
rejected for Opsys        - (null)&lt;br /&gt;
rejected for State        - (null)&lt;br /&gt;
rejected for Reserved     - (null)&lt;br /&gt;
NOTE:  job req cannot run in partition torque (available procs do not meet requirements : 0 of 8 procs found)&lt;br /&gt;
idle procs: 793  feasible procs:   0&lt;br /&gt;
&lt;br /&gt;
Node Rejection Summary: [Opsys: 117][State: 2895][Reserved: 19]&lt;br /&gt;
&lt;br /&gt;
NOTE:  job violates constraints for partition SANDY (partition SANDY not in job partition mask)&lt;br /&gt;
&lt;br /&gt;
NOTE:  job violates constraints for partition GRAVITY (partition GRAVITY not in job partition mask)&lt;br /&gt;
&lt;br /&gt;
rejected for State        - (null)&lt;br /&gt;
NOTE:  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
The output from &amp;lt;tt&amp;gt;checkjob&amp;lt;/tt&amp;gt; is a little cryptic in places, and if you are wondering why your job hasn't started yet, you might think that &amp;quot;rejection&amp;quot; and &amp;quot;JobFail&amp;quot; suggest that there's something wrong.  But the above message is actually normal; you can use the &amp;lt;tt&amp;gt;showstart&amp;lt;/tt&amp;gt; command on your job to get a (preliminary, subject to change) estimate of when the job will start, and you'll find that it is in fact scheduled to start in the near future.&lt;br /&gt;
&lt;br /&gt;
In the above message:&lt;br /&gt;
&lt;br /&gt;
* `Notification Events: JobFail` just means that, if notifications are enabled, you'll get a message if the job fails;&lt;br /&gt;
* `job req cannot run in partition torque` just means that the job cannot run just yet (that's why it's queued);&lt;br /&gt;
* `job req cannot run in dynamic partition DDR now (insufficient procs available: 0 &amp;lt; 8)` says why: there aren't processors available; and&lt;br /&gt;
* `job violates constraints for partition SANDY/GRAVITY` just means that the job isn't eligible to run in those particular (small) sections of the cluster.&lt;br /&gt;
&lt;br /&gt;
That is, the above output is the normal and expected (if somewhat cryptic) explanation of why the job is waiting; there is nothing to worry about.&lt;br /&gt;
&lt;br /&gt;
===How can I monitor my running jobs on TCS?===&lt;br /&gt;
&lt;br /&gt;
How can I monitor the load of TCS jobs?&lt;br /&gt;
&lt;br /&gt;
'''Answer:'''&lt;br /&gt;
&lt;br /&gt;
You can get more information with the command &lt;br /&gt;
 /xcat/tools/tcs-scripts/LL/jobState.sh&lt;br /&gt;
which I alias as:&lt;br /&gt;
 alias llq1='/xcat/tools/tcs-scripts/LL/jobState.sh'&lt;br /&gt;
If you run &amp;quot;llq1 -n&amp;quot; you will see a listing of jobs together with a lot of information, including the load.&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===How can I check the memory usage from my jobs?===&lt;br /&gt;
&lt;br /&gt;
It is often useful to check how much memory your job is using while it is running.&lt;br /&gt;
There are a couple of ways to do so:&lt;br /&gt;
&lt;br /&gt;
1) Use the [[SciNet_Command_Line_Utilities | command line utilities]] we have developed; for example, the [[Slurm#jobperf | jobperf]] utility reports the performance of all the nodes used by your job.&lt;br /&gt;
&lt;br /&gt;
2) [[Slurm#SSHing_to_a_node | SSH into the nodes]] where your job is running and check memory usage and system statistics directly, for instance with the 'top' or 'free' commands.&lt;br /&gt;
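&lt;br /&gt;
For option 2, the shell function below is a minimal sketch that summarizes a node's memory by reading the standard Linux file /proc/meminfo. It is an alternative to running 'free'. The function name &amp;lt;tt&amp;gt;mem_summary&amp;lt;/tt&amp;gt; is hypothetical. It reports memory for the whole node, not only for your job, and it approximates &amp;quot;used&amp;quot; memory as total minus available. Run it on a compute node after SSHing in.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Minimal sketch (hypothetical helper): summarize memory on the current node.
# /proc/meminfo reports sizes in kB; "used" is approximated as
# MemTotal - MemAvailable. Run it on a compute node after SSHing in.
mem_summary() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END {
            gib = 1024 * 1024    # kB per GiB
            printf "total: %.1f GiB  used: %.1f GiB  available: %.1f GiB\n",
                total / gib, (total - avail) / gib, avail / gib
        }' /proc/meminfo
}

mem_summary
```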
&lt;br /&gt;
It is also a good idea, and strongly encouraged, to inspect the output generated by your jobs.&lt;br /&gt;
The output file is named ''JobName-JobIdNumber.out'', where ''JobName'' is the name you gave to the job (via the '--job-name' Slurm flag) and ''JobIdNumber'' is the ID number of the job.  If no job name is given, ''JobName'' will be &amp;quot;slurm&amp;quot;.&lt;br /&gt;
This file is saved in the working directory after the job is finished.&lt;br /&gt;
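&lt;br /&gt;
The naming rule above can be written as a small shell function, for example to locate a job's output file. This is a sketch. The function name &amp;lt;tt&amp;gt;job_output_name&amp;lt;/tt&amp;gt; is hypothetical, and the function only encodes the ''JobName-JobIdNumber.out'' convention described on this page. Jobs that set their own output file with Slurm's '--output' option will not follow it.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper encoding the naming rule above: JobName-JobIdNumber.out,
# where JobName defaults to "slurm" when no --job-name was given.
# Usage: job_output_name JOBID [JOBNAME]
job_output_name() {
    local job_id=$1
    local job_name=${2:-slurm}
    printf '%s-%s.out\n' "$job_name" "$job_id"
}

# Example: view the output of job 123456, submitted with --job-name=myjob
#   less "$(job_output_name 123456 myjob)"
```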
&lt;br /&gt;
Other related topics to memory usage:&lt;br /&gt;
* [[User_Ramdisk | Using Ram Disk]]&lt;br /&gt;
* [[FAQ#Monitoring_jobs_in_the_queue | Monitoring Jobs in the Queue]]&lt;br /&gt;
* [https://wiki.scinet.utoronto.ca/wiki/images/a/a0/TechTalkJobMonitoring.pdf Tech Talk on Monitoring Jobs]&lt;br /&gt;
&lt;br /&gt;
===Can I run cron jobs on login nodes to monitor my jobs?===&lt;br /&gt;
&lt;br /&gt;
No, we do not permit cron jobs to be run by users.  To monitor the status of your jobs using a cron job running on your own machine, use the command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh myusername@niagara.scinet.utoronto.ca &amp;quot;squeue -u myusername&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or some variation of this command.  Of course, you will need to have [[SSH_keys | SSH keys]] set up on the machine running the cron job, so that password entry won't be necessary.&lt;br /&gt;
&lt;br /&gt;
=== How does one check the amount of used CPU-hours in a project, and how does one get statistics for each user in the project? ===&lt;br /&gt;
&lt;br /&gt;
This information is available on the [https://portal.scinet.utoronto.ca SciNet portal]. See also [[SciNet Usage Reports]].&lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== How do I compute the core-years usage of my code? ===&lt;br /&gt;
&lt;br /&gt;
The &amp;quot;core-years&amp;quot; quantity is a way to account for the time your code runs. It is the total number of core-hours used (number of cores times hours of run time), divided by the number of hours in a year.&lt;br /&gt;
For instance, if your code runs for ''HH'' hours on ''NN'' nodes, where each node has ''CC'' cores, then its &amp;quot;core-years&amp;quot; usage can be computed as follows:&lt;br /&gt;
&lt;br /&gt;
''HH*(NN*CC)/(365*24)''&lt;br /&gt;
&lt;br /&gt;
If you have several independent instances (batches) running on different nodes, with ''BB'' batches each running for ''HH'' hours on ''NN'' nodes, then your core-years usage can be computed as&lt;br /&gt;
&lt;br /&gt;
''BB*HH*(NN*CC)/(365*24)''&lt;br /&gt;
&lt;br /&gt;
On the Niagara system, each node has 40 cores, so ''CC'' will always be 40.&lt;br /&gt;
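&lt;br /&gt;
The core-years formulas above can be checked with a short shell function. This is a minimal sketch. The function name &amp;lt;tt&amp;gt;core_years&amp;lt;/tt&amp;gt; is hypothetical, and awk does the floating-point arithmetic. The defaults of 40 cores per node and a single batch follow the Niagara rule above.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Minimal sketch: core-years = BB * HH * (NN * CC) / (365 * 24)
# Usage: core_years HOURS NODES [CORES_PER_NODE, default 40] [BATCHES, default 1]
core_years() {
    awk -v hh="$1" -v nn="$2" -v cc="${3:-40}" -v bb="${4:-1}" \
        'BEGIN { printf "%.4f\n", bb * hh * (nn * cc) / (365 * 24) }'
}

# One 24-hour run on 10 Niagara nodes
core_years 24 10          # prints 1.0959
# Five independent batches of that same run
core_years 24 10 40 5     # prints 5.4795
```

For example, one 24-hour run on 10 Niagara nodes uses 24*(10*40)/(365*24) = 400/365, or about 1.096 core-years.&lt;br /&gt;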
&lt;br /&gt;
=== How much have I been running? ===&lt;br /&gt;
&lt;br /&gt;
You can get information about your SciNet resource usage by visiting the [[SciNet_Usage_Reports| SciNet Usage Reports]] page.  The [https://ccdb.computecanada.ca CCDB] and [https://my.scinet.utoronto.ca my.SciNet] sites contain similar information.&lt;br /&gt;
&lt;br /&gt;
==Data on SciNet disks==&lt;br /&gt;
&lt;br /&gt;
===How do I find out my disk usage?===&lt;br /&gt;
&lt;br /&gt;
The standard Unix/Linux utilities for finding the amount of disk space used by a directory are very slow, and notoriously inefficient on the GPFS filesystems that we run on the SciNet systems.  There are utilities that very quickly report your disk usage:&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''diskUsage'''&amp;lt;/tt&amp;gt; command, available on the login nodes and datamovers, reports usage of the home, scratch, and project file systems in a number of ways. For instance, it can show how much disk space is being used by you and your group (with the -a option) or how much your usage has changed over a certain period (&amp;quot;delta information&amp;quot;), and it can generate plots of your usage over time.&lt;br /&gt;
This information is updated every 3 hours.&lt;br /&gt;
&lt;br /&gt;
More information about these filesystems is available at the [[Data_Management | Data Management page]].&lt;br /&gt;
&lt;br /&gt;
===How do I transfer data to/from SciNet?===&lt;br /&gt;
&lt;br /&gt;
All incoming connections to SciNet go through relatively low-speed connections to the &amp;lt;tt&amp;gt;niagara.scinet&amp;lt;/tt&amp;gt; gateways, so using scp to copy files the same way you ssh in is not an effective way to move lots of data.  Better tools are described in our page on [[Data_Management#Moving_data | Moving data]].&lt;br /&gt;
&lt;br /&gt;
===My group works with data files of size 1-2 GB.  Is this too large to  transfer by scp to niagara.scinet.utoronto.ca ?===&lt;br /&gt;
&lt;br /&gt;
Generally, occasional transfers of less than 10 GB of data through the login nodes are perfectly acceptable. See [[Data_Management#Moving_data | Moving data]].&lt;br /&gt;
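&lt;br /&gt;
To illustrate this guideline, the hypothetical helper below checks whether a file or directory is under 10 GB before you copy it through the login nodes with scp. It is a sketch with two assumptions. First, it uses GNU du, whose -b option reports sizes in bytes. Second, it takes 10 GB to mean 10,000,000,000 bytes. Larger transfers should use the tools described on the Moving data page.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: is a file or directory small enough (under ~10 GB)
# to copy through the login nodes with scp?
# Assumes GNU du: -s gives one total, -b reports the apparent size in bytes.
limit_bytes=$((10 * 1000 * 1000 * 1000))   # 10 GB taken as 10^10 bytes

fits_login_transfer() {
    local size
    [ -e "$1" ] || { echo "no such file: $1"; return 2; }
    size=$(du -sb -- "$1" | cut -f1)
    if [ "$size" -lt "$limit_bytes" ]; then
        echo "ok: $1 is $size bytes; scp through the login nodes is fine"
    else
        echo "large: $1 is $size bytes; see the Moving data page instead"
        return 1
    fi
}

# Example:
#   fits_login_transfer results/
```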
&lt;br /&gt;
===How can I check if I have files in /scratch that are scheduled for automatic deletion?===&lt;br /&gt;
&lt;br /&gt;
Please see [[Data_Management#Scratch_Disk_Purging_Policy | Scratch Disk Purging Policy]]&lt;br /&gt;
&lt;br /&gt;
===How to allow my supervisor to manage files for me using ACL-based commands?===&lt;br /&gt;
&lt;br /&gt;
Please see [[Data_Management#File.2FOwnership_Management_.28ACL.29 | File/Ownership Management]]&lt;br /&gt;
&lt;br /&gt;
===Can I transfer files between BGQ and HPSS?===&lt;br /&gt;
&lt;br /&gt;
Yes, but for now you'll need to do this in two steps:&lt;br /&gt;
* transfer from BGQ to Niagara SCRATCH&lt;br /&gt;
* then from Niagara SCRATCH to HPSS&lt;br /&gt;
&lt;br /&gt;
== Miscellaneous ==&lt;br /&gt;
&lt;br /&gt;
===How do I acknowledge SciNet?===&lt;br /&gt;
&lt;br /&gt;
Visit our [[Acknowledging_SciNet | Acknowledging SciNet]] page for directions on how to thank us.&lt;br /&gt;
&lt;br /&gt;
==Keep 'em Coming!==&lt;br /&gt;
&lt;br /&gt;
===Next question, please===&lt;br /&gt;
&lt;br /&gt;
Send your question to [mailto:support@scinet.utoronto.ca &amp;lt;support@scinet.utoronto.ca&amp;gt;];  we'll answer it asap!&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=SciNet_Command_Line_Utilities&amp;diff=1379</id>
		<title>SciNet Command Line Utilities</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=SciNet_Command_Line_Utilities&amp;diff=1379"/>
		<updated>2018-08-16T15:02:57Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* Jobs &amp;amp; Queues */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__TOC__&lt;br /&gt;
&lt;br /&gt;
Below is a list of handy command-line utilities written by SciNet to help you manage your data and computations.&lt;br /&gt;
&amp;lt;!-- Currently, on the GPC you need to have the '''extras''' module loaded for these to work.--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the tables, optional arguments are denoted with square brackets (i.e. ''''[]''''), while mandatory arguments are denoted with angle brackets (i.e. ''''&amp;lt;&amp;gt;'''').&lt;br /&gt;
&lt;br /&gt;
= Storage =&lt;br /&gt;
{| class = &amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Command&lt;br /&gt;
!Arguments&lt;br /&gt;
!Description&lt;br /&gt;
!Cluster&lt;br /&gt;
|-&lt;br /&gt;
|quota&lt;br /&gt;
|&lt;br /&gt;
|Short overview of a user's storage usage.&lt;br /&gt;
|Niagara&lt;br /&gt;
|-&lt;br /&gt;
|diskUsage&lt;br /&gt;
|''-h'' to see all command options&lt;br /&gt;
|Reports user and group file system usage.&lt;br /&gt;
|Niagara, BGQ&lt;br /&gt;
|-&lt;br /&gt;
|topUserDirOver1000list&lt;br /&gt;
|&lt;br /&gt;
|Lists your directories that have over 1,000 files&lt;br /&gt;
|Niagara&lt;br /&gt;
|-&lt;br /&gt;
|topUserDirOver1GBlist &lt;br /&gt;
|&lt;br /&gt;
|Lists your directories that have over 1 GB of data&lt;br /&gt;
|Niagara&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Jobs &amp;amp; Queues =&lt;br /&gt;
{| class = &amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Command&lt;br /&gt;
!Arguments&lt;br /&gt;
!Description&lt;br /&gt;
!Cluster&lt;br /&gt;
|-&lt;br /&gt;
|qsum&lt;br /&gt;
|most squeue arguments work&lt;br /&gt;
|Lists jobs running or in the queue, grouped by user.&lt;br /&gt;
|Niagara&lt;br /&gt;
|-&lt;br /&gt;
|llq2&lt;br /&gt;
|&lt;br /&gt;
|Detailed information on jobs that are actively running.&lt;br /&gt;
|BGQ&lt;br /&gt;
|-&lt;br /&gt;
|scinet niagara priority&lt;br /&gt;
|&lt;br /&gt;
|Compute usage in the last 7 days, and how this affects your priority in the queue.&lt;br /&gt;
|Niagara&lt;br /&gt;
|-&lt;br /&gt;
|nodeperf&lt;br /&gt;
|[''userName'']&lt;br /&gt;
|Shows who is doing what on the current node.&lt;br /&gt;
|Niagara&lt;br /&gt;
|-&lt;br /&gt;
|[[Slurm#jobperf | jobperf]]&lt;br /&gt;
| &amp;lt;''jobId''&amp;lt;nowiki&amp;gt; | &amp;lt;/nowiki&amp;gt;''jobName''&amp;gt;&lt;br /&gt;
|Reports the performance of all nodes of a given job.&lt;br /&gt;
|Niagara&lt;br /&gt;
|-&lt;br /&gt;
|[[Slurm#Debug | debugjob]]&lt;br /&gt;
|[''number of nodes'']&lt;br /&gt;
|Requests a time-limited interactive session of up to 4 dedicated nodes&lt;br /&gt;
|Niagara&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Modules =&lt;br /&gt;
{|class = &amp;quot;wikitable&amp;quot;&lt;br /&gt;
!|Command&lt;br /&gt;
!| Arguments&lt;br /&gt;
!| Description&lt;br /&gt;
!| Cluster&lt;br /&gt;
|-&lt;br /&gt;
|ml&lt;br /&gt;
|&lt;br /&gt;
|&amp;quot;module list&amp;quot;&lt;br /&gt;
|Niagara&lt;br /&gt;
|-&lt;br /&gt;
|ml&lt;br /&gt;
|&amp;lt;modulename&amp;gt;&lt;br /&gt;
|&amp;quot;module load &amp;lt;modulename&amp;gt;&amp;quot;&lt;br /&gt;
|Niagara&lt;br /&gt;
|-&lt;br /&gt;
|ml&lt;br /&gt;
|X&lt;br /&gt;
|&amp;quot;module X&amp;quot;&lt;br /&gt;
|Niagara&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
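&lt;br /&gt;
The three &amp;lt;tt&amp;gt;ml&amp;lt;/tt&amp;gt; rows above follow one rule:&lt;br /&gt;
* with no arguments, &amp;lt;tt&amp;gt;ml&amp;lt;/tt&amp;gt; lists modules;&lt;br /&gt;
* when the first argument is a module sub-command, it is passed on to &amp;lt;tt&amp;gt;module&amp;lt;/tt&amp;gt;;&lt;br /&gt;
* otherwise, the arguments are loaded.&lt;br /&gt;
The function below is a simplified, hypothetical model of that rule that only prints the equivalent command. The list of recognized sub-commands is an abbreviated assumption, not the complete list used by the real &amp;lt;tt&amp;gt;ml&amp;lt;/tt&amp;gt; command.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Simplified, hypothetical model of the 'ml' shorthand from the table above.
# It only prints the equivalent 'module' command; it does not run it.
# The sub-command list below is an abbreviated assumption.
ml_equivalent() {
    if [ "$#" -eq 0 ]; then
        echo "module list"            # plain 'ml'
        return
    fi
    case "$1" in
        avail|list|load|unload|purge|show|spider|swap|help|save|restore)
            echo "module $*" ;;       # 'ml X' -> 'module X'
        *)
            echo "module load $*" ;;  # 'ml name' -> 'module load name'
    esac
}
```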
&lt;br /&gt;
= Related Topics =&lt;br /&gt;
[[Slurm#Monitoring_jobs | Monitoring jobs]]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
[[Niagara_Quickstart | Niagara Quickstart]]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
[[Slurm | Slurm Scheduler]]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
[[FAQ#Monitoring_jobs_in_the_queue | Monitoring Jobs]]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
[https://wiki.scinet.utoronto.ca/wiki/images/a/a0/TechTalkJobMonitoring.pdf Tech Talk on Monitoring Jobs]&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Modules_specific_to_Niagara&amp;diff=1378</id>
		<title>Modules specific to Niagara</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Modules_specific_to_Niagara&amp;diff=1378"/>
		<updated>2018-08-16T15:01:14Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;width:85%&amp;quot;&lt;br /&gt;
! style=&amp;quot;width: 25%&amp;quot; align=&amp;quot;center&amp;quot; | Module&lt;br /&gt;
! style=&amp;quot;width: 15%&amp;quot; align=&amp;quot;center&amp;quot; | Versions&lt;br /&gt;
! style=&amp;quot;width: 15%&amp;quot; align=&amp;quot;center&amp;quot; | Documentation&lt;br /&gt;
! align=&amp;quot;center&amp;quot; | Description&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.continuum.io/anaconda-overview anaconda2]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform  that empowers companies to adopt a modern open data science analytics architecture. &amp;lt;br /&amp;gt;Homepage: https://www.continuum.io/anaconda-overview &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.continuum.io/anaconda-overview anaconda3]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform  that empowers companies to adopt a modern open data science analytics architecture. &amp;lt;br /&amp;gt;Homepage: https://www.continuum.io/anaconda-overview &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://arma.sourceforge.net/ armadillo]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 8.500.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Armadillo is an open-source C++ linear algebra library (matrix maths) aiming towards  a good balance between speed and ease of use. Integer, floating point and complex numbers are supported,  as well as a subset of trigonometric and statistics functions.&amp;lt;br /&amp;gt;Homepage: http://arma.sourceforge.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | arpack-ng&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.5.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | aspect&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.0.0 2.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://autotools.io autotools]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: This bundle collects the standard GNU build tools: Autoconf, Automake and libtool.&amp;lt;br /&amp;gt;Homepage: http://autotools.io &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://bazel.io/ bazel]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.11.1 0.15.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Bazel is a build tool that builds code quickly and reliably.  It is used to build the majority of Google's software.&amp;lt;br /&amp;gt;Homepage: http://bazel.io/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://blast.ncbi.nlm.nih.gov/ blast+]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.7.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Basic Local Alignment Search Tool, or BLAST, is an algorithm  for comparing primary biological sequence information, such as the amino-acid  sequences of different proteins or the nucleotides of DNA sequences.&amp;lt;br /&amp;gt;Homepage: http://blast.ncbi.nlm.nih.gov/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.boost.org/ boost]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.63.0 1.66.0 1.67.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Boost provides free peer-reviewed portable C++ source libraries.&amp;lt;br /&amp;gt;Homepage: http://www.boost.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.boost.org/ boost-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.67.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Boost provides free peer-reviewed portable C++ source libraries.&amp;lt;br /&amp;gt;Homepage: http://www.boost.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://code.zmaw.de/projects/cdo cdo]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CDO is a collection of command line Operators to manipulate and analyse Climate and NWP model Data.&amp;lt;br /&amp;gt;Homepage: https://code.zmaw.de/projects/cdo &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://heasarc.gsfc.nasa.gov/fitsio/ cfitsio]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.430&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CFITSIO is a library of C and Fortran subroutines for reading and writing data files in FITS (Flexible Image Transport System) data format.&amp;lt;br /&amp;gt;Homepage: http://heasarc.gsfc.nasa.gov/fitsio/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://cmake.org/ cmake]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.10.3 3.11.0 3.11.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description:   CMake, the cross-platform, open-source build system.  CMake is a family of  tools designed to build, test and package software. &amp;lt;br /&amp;gt;Homepage: https://cmake.org/&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.cp2k.org/ cp2k]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CP2K is a freely available (GPL) program, written in Fortran 95, to perform atomistic and molecular  simulations of solid state, liquid, molecular and biological systems. It provides a general framework for different  methods such as e.g. density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW), and  classical pair and many-body potentials. &amp;lt;br /&amp;gt;Homepage: http://www.cp2k.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | dakota-mpi&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.7.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.arm.com/products/development-tools/hpc-tools/cross-platform/forge ddt]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 18.1.2 18.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: ARM's HPC development tools: Distributed Debugging Tool and MAP Profiler&amp;lt;br /&amp;gt;Homepage: https://www.arm.com/products/development-tools/hpc-tools/cross-platform/forge &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://eigen.tuxfamily.org/index.php?title=Main_Page eigen]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.3.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Eigen is a C++ template library for linear algebra:  matrices, vectors, numerical solvers, and related algorithms.&amp;lt;br /&amp;gt;Homepage: http://eigen.tuxfamily.org/index.php?title=Main_Page &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://elpa.rzg.mpg.de elpa]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2016.05.003 2017.05.003 2018.05.001&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Eigenvalue SoLvers for Petaflop-Applications.&amp;lt;br /&amp;gt;Homepage: http://elpa.rzg.mpg.de &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://fenicsproject.org/ fenics]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.2.0 2018.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The FEniCS Project is a collection of free and open-source software components with the common goal of enabling automated solution of differential equations. The components provide scientific computing tools for working with computational meshes, finite-element variational formulations of ordinary and partial differential equations, and numerical linear algebra.&amp;lt;br /&amp;gt;Homepage: https://fenicsproject.org/&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.fetk.org/ fetk]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Finite Element ToolKit (FETK) is a collaboratively developed, evolving collection of adaptive finite element method (AFEM) software libraries and tools for solving coupled systems of nonlinear geometric partial differential equations (PDE).&amp;lt;br /&amp;gt;Homepage: http://www.fetk.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.ffmpeg.org/ ffmpeg]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.4.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: A complete, cross-platform solution to record, convert and stream audio and video.&amp;lt;br /&amp;gt;Homepage: https://www.ffmpeg.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.fftw.org fftw]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.3.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: FFTW is a C subroutine library for computing the discrete Fourier transform (DFT)  in one or more dimensions, of arbitrary input size, and of both real and complex data.&amp;lt;br /&amp;gt;Homepage: http://www.fftw.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.fftw.org fftw-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.3.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: FFTW is a C subroutine library for computing the discrete Fourier transform (DFT)  in one or more dimensions, of arbitrary input size, and of both real and complex data.&amp;lt;br /&amp;gt;Homepage: http://www.fftw.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://fmtlib.net/ fmt]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.2 4.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: fmt (formerly cppformat) is an open-source formatting library.&amp;lt;br /&amp;gt;Homepage: http://fmtlib.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | foam-extend&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://hboehm.info/gc/ gc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.6.6&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Boehm-Demers-Weiser conservative garbage collector can be used as a garbage-collecting replacement for C malloc or C++ new.&amp;lt;br /&amp;gt;Homepage: http://hboehm.info/gc/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://gcc.gnu.org/ gcc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.3.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, and Ada, as well as libraries for these languages (libstdc++, libgfortran, ...).&amp;lt;br /&amp;gt;Homepage: http://gcc.gnu.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/gdb/gdb.html gdb]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 8.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The GNU Project Debugger&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/gdb/gdb.html &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://git-scm.com/ git]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.16.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.&amp;lt;br /&amp;gt;Homepage: http://git-scm.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gtk.org/ glib]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.3 2.22.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GLib is one of the base libraries of the GTK+ project.&amp;lt;br /&amp;gt;Homepage: http://www.gtk.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://rashmikumari.github.io/g_mmpbsa/ g_mmpbsa]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: g_mmpbsa calculates the components of binding energy using the MM-PBSA method (excluding the entropic term), as well as the energetic contribution of each residue to binding using an energy decomposition scheme.&amp;lt;br /&amp;gt;Homepage: https://rashmikumari.github.io/g_mmpbsa/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://gmplib.org/ gmp]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.1.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GMP is a free library for arbitrary precision arithmetic,  operating on signed integers, rational numbers, and floating point numbers. &amp;lt;br /&amp;gt;Homepage: http://gmplib.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.gnu.org/software/parallel/ gnu-parallel]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 20180322&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Tools for running commands in parallel on one or more nodes.&amp;lt;br /&amp;gt;Homepage: https://www.gnu.org/software/parallel/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://gnuplot.sourceforge.net/ gnuplot]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.2.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: A portable, interactive function-plotting utility.&amp;lt;br /&amp;gt;Homepage: http://gnuplot.sourceforge.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/google/googletest googletest]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.8.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Google Test is Google's C++ test framework. Google Mock is an extension to Google Test for writing and using C++ mock classes. More detailed documentation for googletest, including build instructions, is in its googletest/README.md file.&amp;lt;br /&amp;gt;Homepage: https://github.com/google/googletest &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gromacs.org gromacs]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2016.5 2016.5-plumed-2.4.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GROMACS is a versatile package to perform molecular dynamics, i.e., to simulate the Newtonian equations of motion for systems with hundreds to millions of particles. This is a CPU-only build, containing both MPI and thread-MPI builds.&amp;lt;br /&amp;gt;Homepage: http://www.gromacs.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/gsl/ gsl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers.  The library provides a wide range of mathematical routines such as random number generators, special functions  and least-squares fitting.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/gsl/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/guile guile]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.2.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Guile is the GNU Ubiquitous Intelligent Language for Extensions,  the official extension language for the GNU operating system.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/guile &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/Harminv harminv]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.4.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Harminv is a free program (and accompanying library) to solve the problem of harmonic inversion: given a discrete-time, finite-length signal that consists of a sum of finitely many sinusoids (possibly exponentially decaying) in a given bandwidth, it determines the frequencies, decay constants, amplitudes, and phases of those sinusoids.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/Harminv &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://support.hdfgroup.org/HDF5/ hdf5]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.2 1.8.20&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: HDF5 is a data model, library, and file format for storing and managing data.  It supports an unlimited variety of datatypes, and is designed for flexible  and efficient I/O and for high volume and complex data.&amp;lt;br /&amp;gt;Homepage: https://support.hdfgroup.org/HDF5/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://support.hdfgroup.org/HDF5/ hdf5-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.2 1.8.20&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: HDF5 is a data model, library, and file format for storing and managing data.  It supports an unlimited variety of datatypes, and is designed for flexible  and efficient I/O and for high volume and complex data.&amp;lt;br /&amp;gt;Homepage: https://support.hdfgroup.org/HDF5/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://ccb.jhu.edu/software/hisat2/index.shtml hisat2]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads  (both DNA and RNA) against the general human population (as well as against a single reference genome).&amp;lt;br /&amp;gt;Homepage: https://ccb.jhu.edu/software/hisat2/index.shtml &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | hpn-ssh&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.7p1-14v15&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.x.org/ imake]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.0.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: imake is a Makefile-generator that is intended to make it easier to develop software  portably for multiple systems.&amp;lt;br /&amp;gt;Homepage: http://www.x.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://software.intel.com/en-us/intel-parallel-studio-xe intel]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.7 2018.2 2018.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Compiler toolchain including Intel compilers and Intel Math Kernel Library (MKL).&amp;lt;br /&amp;gt;Homepage: https://software.intel.com/en-us/intel-parallel-studio-xe &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://software.intel.com/en-us/intel-parallel-studio-xe intelmpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.7 2018.2 2018.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Intel's (MPICH-based) MPI implementation.&amp;lt;br /&amp;gt;Homepage: https://software.intel.com/en-us/intel-parallel-studio-xe &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | intelpython2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2018.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | intelpython3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2018.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://java.com/ java]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.8.0_162&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Java Platform, Standard Edition (Java SE) lets you develop and deploy  Java applications on desktops and servers.&amp;lt;br /&amp;gt;Homepage: http://java.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://lammps.sandia.gov/ lammps]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 11May2018 22Sep2017&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is a classical molecular dynamics code. LAMMPS has potentials for solid-state materials (metals, semiconductors), soft matter (biomolecules, polymers), and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale, and it can be coupled to other programs. The following packages are not included in this build: KIM, MSCG, KOKKOS, USER-QUIP, USER-INTEL, USER-VTK.&amp;lt;br /&amp;gt;Homepage: http://lammps.sandia.gov/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/ivmai/libatomic_ops libatomic_ops]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.6.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: This package provides semi-portable access to hardware-provided atomic memory update operations on a number of architectures.&amp;lt;br /&amp;gt;Homepage: https://github.com/ivmai/libatomic_ops &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.hyperrealm.com/libconfig/ libconfig]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.7.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Libconfig is a simple library for processing structured configuration files.&amp;lt;br /&amp;gt;Homepage: http://www.hyperrealm.com/libconfig/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/libctl libctl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.0.1 4.1.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: libctl is a free Guile-based library implementing flexible control files for scientific simulations.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/libctl &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://sourceware.org/libffi/ libffi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.2.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The libffi library provides a portable, high-level programming interface to various calling conventions. This allows a programmer to call any function specified by a call interface description at run time.&amp;lt;br /&amp;gt;Homepage: http://sourceware.org/libffi/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://sourceforge.net/p/libint/ libint]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.5 1.1.6&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Libint library is used to evaluate the traditional (electron repulsion) and certain novel two-body matrix elements (integrals) over Cartesian Gaussian functions used in modern atomic and molecular theory.&amp;lt;br /&amp;gt;Homepage: https://sourceforge.net/p/libint/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/libunistring/ libunistring]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.9.9&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: This library provides functions for manipulating Unicode strings and for manipulating C strings  according to the Unicode standard.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/libunistring/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.tddft.org/programs/octopus/wiki/index.php/Libxc libxc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.0 4.0.3 4.0.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Libxc is a library of exchange-correlation functionals for density-functional theory.  The aim is to provide a portable, well tested and reliable set of exchange and correlation functionals.&amp;lt;br /&amp;gt;Homepage: http://www.tddft.org/programs/octopus/wiki/index.php/Libxc &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://xmlsoft.org/ libxslt]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.32&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Libxslt is the XSLT C library developed for the GNOME project  (but usable outside of the Gnome platform).&amp;lt;br /&amp;gt;Homepage: http://xmlsoft.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | libyaml&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.1.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: LibYAML is a YAML parser and emitter written in C.&amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://symas.com/lmdb lmdb]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.9.22&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: LMDB is a fast, memory-efficient database. With memory-mapped files, it has the read performance  of a pure in-memory database while retaining the persistence of standard disk-based databases.&amp;lt;br /&amp;gt;Homepage: https://symas.com/lmdb &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.linuxfromscratch.org/blfs/view/svn/x/makedepend.html makedepend]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.0.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The makedepend package contains a C-preprocessor like utility to determine build-time dependencies.&amp;lt;br /&amp;gt;Homepage: http://www.linuxfromscratch.org/blfs/view/svn/x/makedepend.html &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.mathworks.com/products/compiler/mcr/ mcr]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | R2018a&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The MATLAB Runtime is a standalone set of shared libraries  that enables the execution of compiled MATLAB applications  or components on computers that do not have MATLAB installed.&amp;lt;br /&amp;gt;Homepage: http://www.mathworks.com/products/compiler/mcr/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/Meep meep]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.4.3 1.5.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Meep (or MEEP) is a free finite-difference time-domain (FDTD) simulation software package  developed at MIT to model electromagnetic systems.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/Meep &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/Meep meep-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.4.3 1.5.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Meep (or MEEP) is a free finite-difference time-domain (FDTD) simulation software package  developed at MIT to model electromagnetic systems.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/Meep &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://mercurial.selenic.com/ mercurial]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.6&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Mercurial is a free, distributed source control management tool. It efficiently handles projects of any size and offers an easy and intuitive interface. &amp;lt;br /&amp;gt;Homepage: http://mercurial.selenic.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://glaros.dtc.umn.edu/gkhome/metis/metis/overview metis]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: METIS is a set of serial programs for partitioning graphs, partitioning finite element meshes, and producing fill reducing orderings for sparse matrices. The algorithms implemented in METIS are based on the multilevel recursive-bisection, multilevel k-way, and multi-constraint partitioning schemes.&amp;lt;br /&amp;gt;Homepage: http://glaros.dtc.umn.edu/gkhome/metis/metis/overview &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://software.intel.com/en-us/intel-mkl/ mkl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.7 2018.2 2018.2.lua 2018.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Intel's Math Kernel Library, for use with gcc modules (the intel modules include mkl already).&amp;lt;br /&amp;gt;Homepage: http://software.intel.com/en-us/intel-mkl/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/MPB mpb]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.6.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: MPB is a free and open-source software package for computing electromagnetic band structures and modes.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/MPB&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/MPB mpb-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.6.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: MPB is a free and open-source software package for computing electromagnetic band structures and modes.&amp;lt;br /&amp;gt;Homepage:  http://ab-initio.mit.edu/wiki/index.php/MPB&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.mpfr.org mpfr]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The MPFR library is a C library for multiple-precision floating-point computations with correct rounding.&amp;lt;br /&amp;gt;Homepage: http://www.mpfr.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.ncl.ucar.edu ncl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.4.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: NCL is an interpreted language designed specifically for scientific data analysis and visualization.&amp;lt;br /&amp;gt;Homepage: http://www.ncl.ucar.edu &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/nco/pynco nco]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.7.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Python bindings for NCO&amp;lt;br /&amp;gt;Homepage: https://github.com/nco/pynco &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.unidata.ucar.edu/software/netcdf/ netcdf]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.0 4.6.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: NetCDF (Network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.&amp;lt;br /&amp;gt;Homepage: http://www.unidata.ucar.edu/software/netcdf/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.unidata.ucar.edu/software/netcdf/ netcdf-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.6.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: NetCDF (Network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.&amp;lt;br /&amp;gt;Homepage: http://www.unidata.ucar.edu/software/netcdf/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/octave/ octave]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.4.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GNU Octave is a high-level interpreted language, primarily intended for numerical computations.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/octave/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://xianyi.github.com/OpenBLAS/ openblas]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.2.20&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.&amp;lt;br /&amp;gt;Homepage: http://xianyi.github.com/OpenBLAS/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.openfoam.com/ openfoam]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 17.12 4.1 5.0 5.0-debug&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: OpenFOAM is a free, open source CFD software package. OpenFOAM has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics and electromagnetics.&amp;lt;br /&amp;gt;Homepage: http://www.openfoam.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.open-mpi.org/ openmpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.7 2.1.3 3.0.1 3.0.2 3.1.0 3.1.0rc3 3.1.0rc4 3.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Open MPI Project is an open source implementation of the MPI standard.&amp;lt;br /&amp;gt;Homepage: http://www.open-mpi.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.paraview.org paraview]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.5.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: ParaView is a scientific parallel visualizer.&amp;lt;br /&amp;gt;Homepage: http://www.paraview.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | perf-reports&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 18.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.perl.org/ perl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.20.3 5.26.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Larry Wall's Practical Extraction and Report Language&amp;lt;br /&amp;gt;Homepage: http://www.perl.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.mcs.anl.gov/petsc petsc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.8.4 3.8.4-debug 3.9.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: PETSc, pronounced PET-see (the S is silent), is a suite of data structures and routines for the  scalable (parallel) solution of scientific applications modeled by partial differential equations.&amp;lt;br /&amp;gt;Homepage: http://www.mcs.anl.gov/petsc &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.astro.caltech.edu/~tjp/pgplot/ pgplot]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.2.2 5.2.2-x&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The PGPLOT Graphics Subroutine Library is a Fortran- or C-callable, device-independent graphics package for making simple scientific graphs. It is intended for making graphical images of publication quality with minimum effort on the part of the user. For most applications, the program can be device-independent, and the output can be directed to the appropriate device at run time.&amp;lt;br /&amp;gt;Homepage: http://www.astro.caltech.edu/~tjp/pgplot/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.cog-genomics.org/plink/1.9/ plink]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.07&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: plink-1.9-x86_64: Whole-genome association analysis toolset&amp;lt;br /&amp;gt;Homepage: https://www.cog-genomics.org/plink/1.9/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.plumed-code.org plumed]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.4.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: PLUMED is an open source library for free energy calculations in molecular systems that works together with some of the most popular molecular dynamics engines. Free energy calculations can be performed as a function of many order parameters, with a particular focus on biological problems, using state-of-the-art methods such as metadynamics, umbrella sampling and Jarzynski-equation-based steered MD. The software, written in C++, can be easily interfaced with both Fortran and C/C++ codes.&amp;lt;br /&amp;gt;Homepage: http://www.plumed-code.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://trac.mcs.anl.gov/projects/parallel-netcdf pnetcdf]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Parallel netCDF: A Parallel I/O Library for NetCDF File Access&amp;lt;br /&amp;gt;Homepage: https://trac.mcs.anl.gov/projects/parallel-netcdf &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://python.org/ python]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.7.14 3.6.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Python is a programming language that lets you work more quickly and integrate your systems more effectively.&amp;lt;br /&amp;gt;Homepage: http://python.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.qhull.org/ qhull]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2015.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Qhull computes the convex hull, Delaunay triangulation, Voronoi diagram, halfspace intersection about a point, furthest-site Delaunay triangulation, and furthest-site Voronoi diagram. The source code runs in 2-d, 3-d, 4-d, and higher dimensions. Qhull implements the Quickhull algorithm for computing the convex hull.&amp;lt;br /&amp;gt;Homepage: http://www.qhull.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.quantum-espresso.org/ quantum-espresso]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.2.1 6.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Quantum ESPRESSO is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.&amp;lt;br /&amp;gt;Homepage: https://www.quantum-espresso.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.r-project.org/ r]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.1 3.5.0 4.0.1 R2018a&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: R is a free software environment for statistical computing and graphics.&amp;lt;br /&amp;gt;Homepage: http://www.r-project.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/vanzonr/rarray rarray]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: rarray is a C++ library for multidimensional arrays. It is a header-only implementation that uses templates, which allows most compilers to generate fast code.&amp;lt;br /&amp;gt;Homepage: https://github.com/vanzonr/rarray &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://rsync.samba.org/ rsync]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.1.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Utility that provides fast incremental file transfer; this module provides a newer version than is present in the operating system.&amp;lt;br /&amp;gt;Homepage: https://rsync.samba.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.rust-lang.org rust]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.26.1 1.28.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.&amp;lt;br /&amp;gt;Homepage: https://www.rust-lang.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.htslib.org/ samtools]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.8&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.&amp;lt;br /&amp;gt;Homepage: http://www.htslib.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.netlib.org/scalapack/ scalapack]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.0.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The ScaLAPACK (or Scalable LAPACK) library includes a subset of LAPACK routines redesigned for distributed memory MIMD parallel computers.&amp;lt;br /&amp;gt;Homepage: http://www.netlib.org/scalapack/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.scons.org/ scons]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SCons is a software construction tool.&amp;lt;br /&amp;gt;Homepage: http://www.scons.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.shengbte.org/ shengbte]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: ShengBTE is a software package for solving the Boltzmann Transport Equation for phonons. Also installed is the 'thirdorder' package of Python scripts.&amp;lt;br /&amp;gt;Homepage: http://www.shengbte.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://wci.llnl.gov/codes/silo/ silo]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.10.2-bsd&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Silo is a library for reading and writing a wide variety of scientific data to binary disk files.&amp;lt;br /&amp;gt;Homepage: https://wci.llnl.gov/codes/silo/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://singularity.lbl.gov singularity]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.5.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Singularity is a portable application stack packaging and runtime utility.&amp;lt;br /&amp;gt;Homepage: http://singularity.lbl.gov &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://spglib.sourceforge.net/ spglib]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Spglib is a C library for finding and handling crystal symmetries.&amp;lt;br /&amp;gt;Homepage: http://spglib.sourceforge.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.sqlite.org/ sqlite]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.23.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SQLite: SQL Database Engine in a C Library&amp;lt;br /&amp;gt;Homepage: https://www.sqlite.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://su2.stanford.edu su2]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.0.0 6.0.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: An open-source collection of software tools written in C++ for performing Partial Differential Equation (PDE) analysis and solving PDE-constrained optimization problems. The toolset is designed with computational fluid dynamics and aerodynamic shape optimization in mind.&amp;lt;br /&amp;gt;Homepage: http://su2.stanford.edu &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://subversion.apache.org/ subversion]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Subversion is an open source version control system.&amp;lt;br /&amp;gt;Homepage: http://subversion.apache.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://computation.llnl.gov/projects/sundials sundials]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.6.0 2.7.0 3.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SUNDIALS: SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers&amp;lt;br /&amp;gt;Homepage: http://computation.llnl.gov/projects/sundials &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://computation.llnl.gov/projects/sundials sundials-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SUNDIALS: SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers&amp;lt;br /&amp;gt;Homepage: http://computation.llnl.gov/projects/sundials &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.swig.org/ swig]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.12&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages.&amp;lt;br /&amp;gt;Homepage: http://www.swig.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | thirdorder&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://trilinos.sandia.gov/ trilinos]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 12.12.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. A unique design feature of Trilinos is its focus on packages.&amp;lt;br /&amp;gt;Homepage: http://trilinos.sandia.gov/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://valgrind.org valgrind]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.13.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Valgrind: Debugging and profiling tools&amp;lt;br /&amp;gt;Homepage: http://valgrind.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://wci.llnl.gov/simulation/computer-codes/visit visit]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.13.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: VisIt is an open source, interactive, scalable visualization, animation and analysis tool.&amp;lt;br /&amp;gt;Homepage: https://wci.llnl.gov/simulation/computer-codes/visit &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.ks.uiuc.edu/Research/vmd vmd]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.4a12&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.&amp;lt;br /&amp;gt;Homepage: http://www.ks.uiuc.edu/Research/vmd &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.vtk.org vtk]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 8.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization. VTK consists of a C++ class library and several interpreted interface layers including Tcl/Tk, Java, and Python. VTK supports a wide variety of visualization algorithms including scalar, vector, tensor, texture, and volumetric methods, and advanced modeling techniques such as implicit modeling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation.&amp;lt;br /&amp;gt;Homepage: http://www.vtk.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Modules_specific_to_Niagara&amp;diff=1377</id>
		<title>Modules specific to Niagara</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Modules_specific_to_Niagara&amp;diff=1377"/>
		<updated>2018-08-16T15:00:33Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;width:85%&amp;quot;&lt;br /&gt;
! style=&amp;quot;width: 25%&amp;quot; align=&amp;quot;center&amp;quot; | Module&lt;br /&gt;
! style=&amp;quot;width: 15%&amp;quot; align=&amp;quot;center&amp;quot; | Versions&lt;br /&gt;
! style=&amp;quot;width: 15%&amp;quot; align=&amp;quot;center&amp;quot; | Documentation&lt;br /&gt;
! align=&amp;quot;center&amp;quot; | Description&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.continuum.io/anaconda-overview anaconda2]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies to adopt a modern open data science analytics architecture.&amp;lt;br /&amp;gt;Homepage: https://www.continuum.io/anaconda-overview &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.continuum.io/anaconda-overview anaconda3]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies to adopt a modern open data science analytics architecture.&amp;lt;br /&amp;gt;Homepage: https://www.continuum.io/anaconda-overview &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://arma.sourceforge.net/ armadillo]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 8.500.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Armadillo is an open-source C++ linear algebra library (matrix maths) aiming towards a good balance between speed and ease of use. Integer, floating point and complex numbers are supported, as well as a subset of trigonometric and statistics functions.&amp;lt;br /&amp;gt;Homepage: http://arma.sourceforge.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | arpack-ng&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.5.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | aspect&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.0.0 2.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://autotools.io autotools]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: This bundle collects the standard GNU build tools: Autoconf, Automake and Libtool.&amp;lt;br /&amp;gt;Homepage: http://autotools.io &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://bazel.io/ bazel]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.11.1 0.15.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Bazel is a build tool that builds code quickly and reliably. It is used to build the majority of Google's software.&amp;lt;br /&amp;gt;Homepage: http://bazel.io/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://blast.ncbi.nlm.nih.gov/ blast+]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.7.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.&amp;lt;br /&amp;gt;Homepage: http://blast.ncbi.nlm.nih.gov/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.boost.org/ boost]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.63.0 1.66.0 1.67.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Boost provides free peer-reviewed portable C++ source libraries.&amp;lt;br /&amp;gt;Homepage: http://www.boost.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.boost.org/ boost-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.67.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Boost provides free peer-reviewed portable C++ source libraries.&amp;lt;br /&amp;gt;Homepage: http://www.boost.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://code.zmaw.de/projects/cdo cdo]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CDO is a collection of command-line operators to manipulate and analyse climate and NWP model data.&amp;lt;br /&amp;gt;Homepage: https://code.zmaw.de/projects/cdo &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://heasarc.gsfc.nasa.gov/fitsio/ cfitsio]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.430&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CFITSIO is a library of C and Fortran subroutines for reading and writing data files in FITS (Flexible Image Transport System) data format.&amp;lt;br /&amp;gt;Homepage: http://heasarc.gsfc.nasa.gov/fitsio/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://cmake.org/ cmake]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.10.3 3.11.0 3.11.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CMake, the cross-platform, open-source build system. CMake is a family of tools designed to build, test and package software.&amp;lt;br /&amp;gt;Homepage: https://cmake.org/&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.cp2k.org/ cp2k]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CP2K is a freely available (GPL) program, written in Fortran 95, to perform atomistic and molecular simulations of solid state, liquid, molecular and biological systems. It provides a general framework for different methods, such as density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW), and classical pair and many-body potentials.&amp;lt;br /&amp;gt;Homepage: http://www.cp2k.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | dakota-mpi&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.7.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.arm.com/products/development-tools/hpc-tools/cross-platform/forge ddt]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 18.1.2 18.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: ARM's HPC development tools: Distributed Debugging Tool and MAP Profiler&amp;lt;br /&amp;gt;Homepage: https://www.arm.com/products/development-tools/hpc-tools/cross-platform/forge &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://eigen.tuxfamily.org/index.php?title=Main_Page eigen]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.3.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.&amp;lt;br /&amp;gt;Homepage: http://eigen.tuxfamily.org/index.php?title=Main_Page &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://elpa.rzg.mpg.de elpa]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2016.05.003 2017.05.003 2018.05.001&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Eigenvalue SoLvers for Petaflop-Applications.&amp;lt;br /&amp;gt;Homepage: http://elpa.rzg.mpg.de &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://fenicsproject.org/ fenics]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.2.0 2018.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: FEniCS Project is a collection of free and open-source software components with the common goal to enable automated solution of differential equations. The components provide scientific computing tools for working with computational meshes, finite-element variational formulations of ordinary and partial differential equations, and numerical linear algebra.&amp;lt;br /&amp;gt;Homepage: https://fenicsproject.org/&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.fetk.org/ fetk]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Finite Element ToolKit (FETK) is a collaboratively developed, evolving collection of adaptive finite element method (AFEM) software libraries and tools for solving coupled systems of nonlinear geometric partial differential equations (PDE).&amp;lt;br /&amp;gt;Homepage: http://www.fetk.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.ffmpeg.org/ ffmpeg]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.4.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: A complete, cross-platform solution to record, convert and stream audio and video.&amp;lt;br /&amp;gt;Homepage: https://www.ffmpeg.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.fftw.org fftw]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.3.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data.&amp;lt;br /&amp;gt;Homepage: http://www.fftw.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.fftw.org fftw-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.3.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data.&amp;lt;br /&amp;gt;Homepage: http://www.fftw.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://fmtlib.net/ fmt]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.2 4.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: fmt (formerly cppformat) is an open-source formatting library.&amp;lt;br /&amp;gt;Homepage: http://fmtlib.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | foam-extend&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://hboehm.info/gc/ gc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.6.6&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Boehm-Demers-Weiser conservative garbage collector can be used as a garbage-collecting replacement for C malloc or C++ new.&amp;lt;br /&amp;gt;Homepage: http://hboehm.info/gc/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://gcc.gnu.org/ gcc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.3.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, and Ada, as well as libraries for these languages (libstdc++, ...).&amp;lt;br /&amp;gt;Homepage: http://gcc.gnu.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/gdb/gdb.html gdb]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 8.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The GNU Project Debugger&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/gdb/gdb.html &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://git-scm.com/ git]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.16.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.&amp;lt;br /&amp;gt;Homepage: http://git-scm.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gtk.org/ glib]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.3 2.22.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GLib is one of the base libraries of the GTK+ project&amp;lt;br /&amp;gt;Homepage: http://www.gtk.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://rashmikumari.github.io/g_mmpbsa/ g_mmpbsa]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Calculates the components of binding energy using the MM-PBSA method (except the entropic term), and the energetic contribution of each residue to the binding using an energy decomposition scheme.&amp;lt;br /&amp;gt;Homepage: https://rashmikumari.github.io/g_mmpbsa/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://gmplib.org/ gmp]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.1.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers.&amp;lt;br /&amp;gt;Homepage: http://gmplib.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.gnu.org/software/parallel/ gnu-parallel]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 20180322&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Tools for running commands in parallel on one or more nodes.&amp;lt;br /&amp;gt;Homepage: https://www.gnu.org/software/parallel/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://gnuplot.sourceforge.net/ gnuplot]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.2.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Portable, interactive function plotting utility.&amp;lt;br /&amp;gt;Homepage: http://gnuplot.sourceforge.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/google/googletest googletest]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.8.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Google Test is Google's C++ test framework. See the project page for more information. Getting-started information for Google Test is available in the Google Test Primer documentation. Google Mock is an extension to Google Test for writing and using C++ mock classes; see the separate Google Mock documentation. More detailed documentation for googletest (including build instructions) is in its googletest/README.md file.&amp;lt;br /&amp;gt;Homepage: https://github.com/google/googletest &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gromacs.org gromacs]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2016.5 2016.5-plumed-2.4.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. This is a CPU-only build, containing both MPI and thread-MPI builds.&amp;lt;br /&amp;gt;Homepage: http://www.gromacs.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/gsl/ gsl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/gsl/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/guile guile]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.2.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Guile is the GNU Ubiquitous Intelligent Language for Extensions, the official extension language for the GNU operating system.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/guile &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/Harminv harminv]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.4.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Harminv is a free program (and accompanying library) to solve the problem of harmonic inversion: given a discrete-time, finite-length signal that consists of a sum of finitely many sinusoids (possibly exponentially decaying) in a given bandwidth, it determines the frequencies, decay constants, amplitudes, and phases of those sinusoids.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/Harminv &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://support.hdfgroup.org/HDF5/ hdf5]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.2 1.8.20&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.&amp;lt;br /&amp;gt;Homepage: https://support.hdfgroup.org/HDF5/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://support.hdfgroup.org/HDF5/ hdf5-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.2 1.8.20&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.&amp;lt;br /&amp;gt;Homepage: https://support.hdfgroup.org/HDF5/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://ccb.jhu.edu/software/hisat2/index.shtml hisat2]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome).&amp;lt;br /&amp;gt;Homepage: https://ccb.jhu.edu/software/hisat2/index.shtml &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | hpn-ssh&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.7p1-14v15&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.x.org/ imake]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.0.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: imake is a Makefile generator that is intended to make it easier to develop software portably for multiple systems.&amp;lt;br /&amp;gt;Homepage: http://www.x.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://software.intel.com/en-us/intel-parallel-studio-xe intel]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.7 2018.2 2018.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Compiler toolchain including Intel compilers and Intel Math Kernel Library (MKL).&amp;lt;br /&amp;gt;Homepage: https://software.intel.com/en-us/intel-parallel-studio-xe &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://software.intel.com/en-us/intel-parallel-studio-xe intelmpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.7 2018.2 2018.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Intel's (MPICH-based) MPI implementation.&amp;lt;br /&amp;gt;Homepage: https://software.intel.com/en-us/intel-parallel-studio-xe &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | intelpython2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2018.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | intelpython3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2018.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://java.com/ java]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.8.0_162&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Java Platform, Standard Edition (Java SE) lets you develop and deploy Java applications on desktops and servers.&amp;lt;br /&amp;gt;Homepage: http://java.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://lammps.sandia.gov/ lammps]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 11May2018 22Sep2017&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is a classical molecular dynamics code. LAMMPS has potentials for solid-state materials (metals, semiconductors), soft matter (biomolecules, polymers), and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. It can be coupled to various programs. The following packages are not included in this version: KIM, MSCG, KOKKOS, USER-QUIP, USER-INTEL, USER-VTK.&amp;lt;br /&amp;gt;Homepage: http://lammps.sandia.gov/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/ivmai/libatomic_ops libatomic_ops]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.6.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: This package provides semi-portable access to hardware-provided atomic memory update operations on a number of architectures.&amp;lt;br /&amp;gt;Homepage: https://github.com/ivmai/libatomic_ops &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.hyperrealm.com/libconfig/ libconfig]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.7.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Libconfig is a simple library for processing structured configuration files&amp;lt;br /&amp;gt;Homepage: http://www.hyperrealm.com/libconfig/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/libctl libctl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: libctl is a free Guile-based library implementing flexible control files for scientific simulations.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/libctl &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://sourceware.org/libffi/ libffi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.2.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The libffi library provides a portable, high-level programming interface to various calling conventions. This allows a programmer to call any function specified by a call interface description at run time.&amp;lt;br /&amp;gt;Homepage: http://sourceware.org/libffi/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://sourceforge.net/p/libint/ libint]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.5 1.1.6&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Libint library is used to evaluate the traditional (electron repulsion) and certain novel two-body matrix elements (integrals) over Cartesian Gaussian functions used in modern atomic and molecular theory.&amp;lt;br /&amp;gt;Homepage: https://sourceforge.net/p/libint/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/libunistring/ libunistring]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.9.9&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: This library provides functions for manipulating Unicode strings and for manipulating C strings according to the Unicode standard.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/libunistring/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.tddft.org/programs/octopus/wiki/index.php/Libxc libxc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.0 4.0.3 4.0.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Libxc is a library of exchange-correlation functionals for density-functional theory. The aim is to provide a portable, well tested and reliable set of exchange and correlation functionals.&amp;lt;br /&amp;gt;Homepage: http://www.tddft.org/programs/octopus/wiki/index.php/Libxc &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://xmlsoft.org/ libxslt]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.32&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Libxslt is the XSLT C library developed for the GNOME project (but usable outside of the GNOME platform).&amp;lt;br /&amp;gt;Homepage: http://xmlsoft.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | libyaml&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.1.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: LibYAML is a YAML parser and emitter written in C.&amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://symas.com/lmdb lmdb]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.9.22&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: LMDB is a fast, memory-efficient database. With memory-mapped files, it has the read performance of a pure in-memory database while retaining the persistence of standard disk-based databases.&amp;lt;br /&amp;gt;Homepage: https://symas.com/lmdb &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.linuxfromscratch.org/blfs/view/svn/x/makedepend.html makedepend]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.0.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The makedepend package contains a C-preprocessor-like utility to determine build-time dependencies.&amp;lt;br /&amp;gt;Homepage: http://www.linuxfromscratch.org/blfs/view/svn/x/makedepend.html &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.mathworks.com/products/compiler/mcr/ mcr]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | R2018a&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The MATLAB Runtime is a standalone set of shared libraries that enables the execution of compiled MATLAB applications or components on computers that do not have MATLAB installed.&amp;lt;br /&amp;gt;Homepage: http://www.mathworks.com/products/compiler/mcr/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/Meep meep]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.4.3 1.5.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Meep (or MEEP) is a free finite-difference time-domain (FDTD) simulation software package developed at MIT to model electromagnetic systems.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/Meep &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/Meep meep-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.4.3 1.5.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Meep (or MEEP) is a free finite-difference time-domain (FDTD) simulation software package developed at MIT to model electromagnetic systems.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/Meep &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://mercurial.selenic.com/ mercurial]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.6&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Mercurial is a free, distributed source control management tool. It efficiently handles projects of any size and offers an easy and intuitive interface. &amp;lt;br /&amp;gt;Homepage: http://mercurial.selenic.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://glaros.dtc.umn.edu/gkhome/metis/metis/overview metis]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: METIS is a set of serial programs for partitioning graphs, partitioning finite element meshes, and producing fill reducing orderings for sparse matrices. The algorithms implemented in METIS are based on the multilevel recursive-bisection, multilevel k-way, and multi-constraint partitioning schemes.&amp;lt;br /&amp;gt;Homepage: http://glaros.dtc.umn.edu/gkhome/metis/metis/overview &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://software.intel.com/en-us/intel-mkl/ mkl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.7 2018.2 2018.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Intel's Math Kernel Library, for use with gcc modules (the intel modules include mkl already).&amp;lt;br /&amp;gt;Homepage: http://software.intel.com/en-us/intel-mkl/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/MPB mpb]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.6.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: MPB is a free and open-source software package for computing electromagnetic band structures and modes.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/MPB&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/MPB mpb-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.6.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: MPB is a free and open-source software package for computing electromagnetic band structures and modes.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/MPB&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.mpfr.org mpfr]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The MPFR library is a C library for multiple-precision floating-point computations with correct rounding.&amp;lt;br /&amp;gt;Homepage: http://www.mpfr.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.ncl.ucar.edu ncl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.4.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: NCL is an interpreted language designed specifically for scientific data analysis and visualization.&amp;lt;br /&amp;gt;Homepage: http://www.ncl.ucar.edu &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/nco/pynco nco]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.7.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Python bindings for NCO&amp;lt;br /&amp;gt;Homepage: https://github.com/nco/pynco &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.unidata.ucar.edu/software/netcdf/ netcdf]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.0 4.6.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: NetCDF (Network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.&amp;lt;br /&amp;gt;Homepage: http://www.unidata.ucar.edu/software/netcdf/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.unidata.ucar.edu/software/netcdf/ netcdf-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.6.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: NetCDF (Network Common Data Form) is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.&amp;lt;br /&amp;gt;Homepage: http://www.unidata.ucar.edu/software/netcdf/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/octave/ octave]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.4.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GNU Octave is a high-level interpreted language, primarily intended for numerical computations.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/octave/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://xianyi.github.com/OpenBLAS/ openblas]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.2.20&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.&amp;lt;br /&amp;gt;Homepage: http://xianyi.github.com/OpenBLAS/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.openfoam.com/ openfoam]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 17.12 4.1 5.0 5.0-debug&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: OpenFOAM is a free, open source CFD software package. OpenFOAM has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics and electromagnetics.&amp;lt;br /&amp;gt;Homepage: http://www.openfoam.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.open-mpi.org/ openmpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.7 2.1.3 3.0.1 3.0.2 3.1.0 3.1.0rc3 3.1.0rc4 3.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Open MPI Project is an open source MPI-3 implementation.&amp;lt;br /&amp;gt;Homepage: http://www.open-mpi.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.paraview.org paraview]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.5.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: ParaView is a scientific parallel visualizer.&amp;lt;br /&amp;gt;Homepage: http://www.paraview.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | perf-reports&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 18.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.perl.org/ perl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.20.3 5.26.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Larry Wall's Practical Extraction and Report Language&amp;lt;br /&amp;gt;Homepage: http://www.perl.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.mcs.anl.gov/petsc petsc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.8.4 3.8.4-debug 3.9.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: PETSc, pronounced PET-see (the S is silent), is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations.&amp;lt;br /&amp;gt;Homepage: http://www.mcs.anl.gov/petsc &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.astro.caltech.edu/~tjp/pgplot/ pgplot]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.2.2 5.2.2-x&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The PGPLOT Graphics Subroutine Library is a Fortran- or C-callable, device-independent graphics package for making simple scientific graphs. It is intended for making graphical images of publication quality with minimum effort on the part of the user. For most applications, the program can be device-independent, and the output can be directed to the appropriate device at run time.&amp;lt;br /&amp;gt;Homepage: http://www.astro.caltech.edu/~tjp/pgplot/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.cog-genomics.org/plink/1.9/ plink]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.07&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: plink-1.9-x86_64: Whole-genome association analysis toolset&amp;lt;br /&amp;gt;Homepage: https://www.cog-genomics.org/plink/1.9/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.plumed-code.org plumed]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.4.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: PLUMED is an open source library for free energy calculations in molecular systems which works together with some of the most popular molecular dynamics engines. Free energy calculations can be performed as a function of many order parameters with a particular focus on biological problems, using state-of-the-art methods such as metadynamics, umbrella sampling and Jarzynski-equation-based steered MD. The software, written in C++, can be easily interfaced with both Fortran and C/C++ codes.&amp;lt;br /&amp;gt;Homepage: http://www.plumed-code.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://trac.mcs.anl.gov/projects/parallel-netcdf pnetcdf]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Parallel netCDF: A Parallel I/O Library for NetCDF File Access&amp;lt;br /&amp;gt;Homepage: https://trac.mcs.anl.gov/projects/parallel-netcdf &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://python.org/ python]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.7.14 3.6.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Python is a programming language that lets you work more quickly and integrate your systems more effectively.&amp;lt;br /&amp;gt;Homepage: http://python.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.qhull.org/ qhull]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2015.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Qhull computes the convex hull, Delaunay triangulation, Voronoi diagram, halfspace intersection about a point, furthest-site Delaunay triangulation, and furthest-site Voronoi diagram. The source code runs in 2-d, 3-d, 4-d, and higher dimensions. Qhull implements the Quickhull algorithm for computing the convex hull.&amp;lt;br /&amp;gt;Homepage: http://www.qhull.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.quantum-espresso.org/ quantum-espresso]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.2.1 6.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Quantum ESPRESSO is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.&amp;lt;br /&amp;gt;Homepage: https://www.quantum-espresso.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.r-project.org/ r]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.1 3.5.0 4.0.1 R2018a&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: R is a free software environment for statistical computing and graphics.&amp;lt;br /&amp;gt;Homepage: http://www.r-project.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/vanzonr/rarray rarray]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: rarray is a C++ library for multidimensional arrays. It is a header-only implementation that uses templates, which allows most compilers to generate fast code.&amp;lt;br /&amp;gt;Homepage: https://github.com/vanzonr/rarray &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://rsync.samba.org/ rsync]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.1.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Utility that provides fast incremental file transfer; this module provides a newer version than is present in the operating system.&amp;lt;br /&amp;gt;Homepage: https://rsync.samba.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.rust-lang.org rust]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.26.1 1.28.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.&amp;lt;br /&amp;gt;Homepage: https://www.rust-lang.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.htslib.org/ samtools]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.8&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.&amp;lt;br /&amp;gt;Homepage: http://www.htslib.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.netlib.org/scalapack/ scalapack]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.0.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The ScaLAPACK (or Scalable LAPACK) library includes a subset of LAPACK routines redesigned for distributed memory MIMD parallel computers.&amp;lt;br /&amp;gt;Homepage: http://www.netlib.org/scalapack/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.scons.org/ scons]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SCons is a software construction tool.&amp;lt;br /&amp;gt;Homepage: http://www.scons.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.shengbte.org/ shengbte]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: ShengBTE is a software package for solving the Boltzmann Transport Equation for phonons. Also installed is the 'thirdorder' package of Python scripts.&amp;lt;br /&amp;gt;Homepage: http://www.shengbte.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://wci.llnl.gov/codes/silo/ silo]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.10.2-bsd&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Silo is a library for reading and writing a wide variety of scientific data to binary disk files.&amp;lt;br /&amp;gt;Homepage: https://wci.llnl.gov/codes/silo/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://singularity.lbl.gov singularity]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.5.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Singularity is a portable application stack packaging and runtime utility.&amp;lt;br /&amp;gt;Homepage: http://singularity.lbl.gov &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://spglib.sourceforge.net/ spglib]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Spglib is a C library for finding and handling crystal symmetries.&amp;lt;br /&amp;gt;Homepage: http://spglib.sourceforge.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.sqlite.org/ sqlite]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.23.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SQLite: SQL Database Engine in a C Library&amp;lt;br /&amp;gt;Homepage: https://www.sqlite.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://su2.stanford.edu su2]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.0.0 6.0.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: An open-source collection of software tools written in C++ for performing Partial Differential Equation (PDE) analysis and solving PDE-constrained optimization problems. The toolset is designed with computational fluid dynamics and aerodynamic shape optimization in mind.&amp;lt;br /&amp;gt;Homepage: http://su2.stanford.edu &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://subversion.apache.org/ subversion]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Subversion is an open source version control system.&amp;lt;br /&amp;gt;Homepage: http://subversion.apache.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://computation.llnl.gov/projects/sundials sundials]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.6.0 2.7.0 3.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SUNDIALS: SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers&amp;lt;br /&amp;gt;Homepage: http://computation.llnl.gov/projects/sundials &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://computation.llnl.gov/projects/sundials sundials-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SUNDIALS: SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers&amp;lt;br /&amp;gt;Homepage: http://computation.llnl.gov/projects/sundials &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.swig.org/ swig]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.12&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages.&amp;lt;br /&amp;gt;Homepage: http://www.swig.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | thirdorder&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The 'thirdorder' package of Python scripts, installed together with ShengBTE.&amp;lt;br /&amp;gt;Homepage: &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://trilinos.sandia.gov/ trilinos]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 12.12.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. A unique design feature of Trilinos is its focus on packages.&amp;lt;br /&amp;gt;Homepage: http://trilinos.sandia.gov/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://valgrind.org valgrind]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.13.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Valgrind: Debugging and profiling tools&amp;lt;br /&amp;gt;Homepage: http://valgrind.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://wci.llnl.gov/simulation/computer-codes/visit visit]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.13.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: VisIt is an open source, interactive, scalable visualization, animation and analysis tool.&amp;lt;br /&amp;gt;Homepage: https://wci.llnl.gov/simulation/computer-codes/visit &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.ks.uiuc.edu/Research/vmd vmd]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.4a12&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.&amp;lt;br /&amp;gt;Homepage: http://www.ks.uiuc.edu/Research/vmd &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.vtk.org vtk]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 8.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization. VTK consists of a C++ class library and several interpreted interface layers including Tcl/Tk, Java, and Python. VTK supports a wide variety of visualization algorithms, including scalar, vector, tensor, texture, and volumetric methods, and advanced modeling techniques such as implicit modeling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation.&amp;lt;br /&amp;gt;Homepage: http://www.vtk.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Modules_specific_to_Niagara&amp;diff=1376</id>
		<title>Modules specific to Niagara</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Modules_specific_to_Niagara&amp;diff=1376"/>
		<updated>2018-08-16T14:59:52Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;width:85%&amp;quot;&lt;br /&gt;
! style=&amp;quot;width: 25%&amp;quot; align=&amp;quot;center&amp;quot; | Module&lt;br /&gt;
! style=&amp;quot;width: 15%&amp;quot; align=&amp;quot;center&amp;quot; | Versions&lt;br /&gt;
! style=&amp;quot;width: 15%&amp;quot; align=&amp;quot;center&amp;quot; | Documentation&lt;br /&gt;
! align=&amp;quot;center&amp;quot; | Description&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.continuum.io/anaconda-overview anaconda2]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies to adopt a modern open data science analytics architecture.&amp;lt;br /&amp;gt;Homepage: https://www.continuum.io/anaconda-overview &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.continuum.io/anaconda-overview anaconda3]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Anaconda is a Python distribution for data science and scientific computing that bundles the conda package manager with a large collection of open-source packages. The anaconda3 module is based on Python 3.&amp;lt;br /&amp;gt;Homepage: https://www.continuum.io/anaconda-overview &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://arma.sourceforge.net/ armadillo]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 8.500.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Armadillo is an open-source C++ linear algebra library (matrix maths) aiming towards  a good balance between speed and ease of use. Integer, floating point and complex numbers are supported,  as well as a subset of trigonometric and statistics functions.&amp;lt;br /&amp;gt;Homepage: http://arma.sourceforge.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | arpack-ng&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.5.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | aspect&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.0.0 2.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://autotools.io autotools]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: This bundle collects the standard GNU build tools: Autoconf, Automake and libtool.&amp;lt;br /&amp;gt;Homepage: http://autotools.io &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://bazel.io/ bazel]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.11.1 0.15.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Bazel is a build tool that builds code quickly and reliably.  It is used to build the majority of Google's software.&amp;lt;br /&amp;gt;Homepage: http://bazel.io/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://blast.ncbi.nlm.nih.gov/ blast+]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.7.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Basic Local Alignment Search Tool, or BLAST, is an algorithm  for comparing primary biological sequence information, such as the amino-acid  sequences of different proteins or the nucleotides of DNA sequences.&amp;lt;br /&amp;gt;Homepage: http://blast.ncbi.nlm.nih.gov/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.boost.org/ boost]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.63.0 1.66.0 1.67.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Boost provides free peer-reviewed portable C++ source libraries.&amp;lt;br /&amp;gt;Homepage: http://www.boost.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.boost.org/ boost-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.67.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Boost provides free peer-reviewed portable C++ source libraries.&amp;lt;br /&amp;gt;Homepage: http://www.boost.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://code.zmaw.de/projects/cdo cdo]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CDO is a collection of command-line operators to manipulate and analyse climate and NWP (numerical weather prediction) model data.&amp;lt;br /&amp;gt;Homepage: https://code.zmaw.de/projects/cdo &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://heasarc.gsfc.nasa.gov/fitsio/ cfitsio]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.430&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CFITSIO is a library of C and Fortran subroutines for reading and writing data files in FITS (Flexible Image Transport System) data format.&amp;lt;br /&amp;gt;Homepage: http://heasarc.gsfc.nasa.gov/fitsio/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://cmake.org/ cmake]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.10.3 3.11.0 3.11.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CMake is a cross-platform, open-source family of tools designed to build, test and package software.&amp;lt;br /&amp;gt;Homepage: https://cmake.org/&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.cp2k.org/ cp2k]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CP2K is a freely available (GPL) program, written in Fortran 95, to perform atomistic and molecular simulations of solid state, liquid, molecular and biological systems. It provides a general framework for different methods, such as density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW), and classical pair and many-body potentials.&amp;lt;br /&amp;gt;Homepage: http://www.cp2k.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | dakota-mpi&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.7.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.arm.com/products/development-tools/hpc-tools/cross-platform/forge ddt]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 18.1.2 18.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: ARM's HPC development tools: Distributed Debugging Tool and MAP Profiler&amp;lt;br /&amp;gt;Homepage: https://www.arm.com/products/development-tools/hpc-tools/cross-platform/forge &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://eigen.tuxfamily.org/index.php?title=Main_Page eigen]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.3.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Eigen is a C++ template library for linear algebra:  matrices, vectors, numerical solvers, and related algorithms.&amp;lt;br /&amp;gt;Homepage: http://eigen.tuxfamily.org/index.php?title=Main_Page &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://elpa.rzg.mpg.de elpa]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2016.05.003 2017.05.003 2018.05.001&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Eigenvalue SoLvers for Petaflop-Applications (ELPA).&amp;lt;br /&amp;gt;Homepage: http://elpa.rzg.mpg.de &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://fenicsproject.org/ fenics]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.2.0 2018.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: FEniCS Project is a collection of free and open-source software components with the common goal to enable automated solution of differential equations. The components provide scientific computing tools for working with computational meshes, finite-element variational formulations of ordinary and partial differential equations, and numerical linear algebra.&amp;lt;br /&amp;gt;Homepage: https://fenicsproject.org/&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.fetk.org/ fetk]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Finite Element ToolKit (FETK) is a collaboratively developed, evolving collection of adaptive finite element method (AFEM) software libraries and tools for solving coupled systems of nonlinear geometric partial differential equations (PDE).&amp;lt;br /&amp;gt;Homepage: http://www.fetk.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.ffmpeg.org/ ffmpeg]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.4.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: A complete, cross-platform solution to record, convert and stream audio and video.&amp;lt;br /&amp;gt;Homepage: https://www.ffmpeg.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.fftw.org fftw]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.3.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: FFTW is a C subroutine library for computing the discrete Fourier transform (DFT)  in one or more dimensions, of arbitrary input size, and of both real and complex data.&amp;lt;br /&amp;gt;Homepage: http://www.fftw.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.fftw.org fftw-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.3.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: FFTW is a C subroutine library for computing the discrete Fourier transform (DFT)  in one or more dimensions, of arbitrary input size, and of both real and complex data.&amp;lt;br /&amp;gt;Homepage: http://www.fftw.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://fmtlib.net/ fmt]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.2 4.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: fmt (formerly cppformat) is an open-source formatting library.&amp;lt;br /&amp;gt;Homepage: http://fmtlib.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | foam-extend&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://hboehm.info/gc/ gc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.6.6&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Boehm-Demers-Weiser conservative garbage collector can be used as a garbage-collecting replacement for C malloc or C++ new.&amp;lt;br /&amp;gt;Homepage: http://hboehm.info/gc/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://gcc.gnu.org/ gcc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.3.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran and Ada, as well as libraries for these languages (e.g. libstdc++). The Java front end (GCJ) and libgcj were removed in GCC 7 and are not part of this version.&amp;lt;br /&amp;gt;Homepage: http://gcc.gnu.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/gdb/gdb.html gdb]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 8.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The GNU Project Debugger&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/gdb/gdb.html &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://git-scm.com/ git]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.16.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.&amp;lt;br /&amp;gt;Homepage: http://git-scm.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gtk.org/ glib]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.3 2.22.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GLib is one of the base libraries of the GTK+ project.&amp;lt;br /&amp;gt;Homepage: http://www.gtk.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://rashmikumari.github.io/g_mmpbsa/ g_mmpbsa]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: g_mmpbsa calculates the components of binding energy using the MM-PBSA method (except for the entropic term), as well as the energetic contribution of each residue to binding using an energy decomposition scheme.&amp;lt;br /&amp;gt;Homepage: https://rashmikumari.github.io/g_mmpbsa/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://gmplib.org/ gmp]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.1.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GMP is a free library for arbitrary precision arithmetic,  operating on signed integers, rational numbers, and floating point numbers. &amp;lt;br /&amp;gt;Homepage: http://gmplib.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.gnu.org/software/parallel/ gnu-parallel]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 20180322&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Tools for running commands in parallel on one or more nodes.&amp;lt;br /&amp;gt;Homepage: https://www.gnu.org/software/parallel/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://gnuplot.sourceforge.net/ gnuplot]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.2.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: A portable, interactive function-plotting utility.&amp;lt;br /&amp;gt;Homepage: http://gnuplot.sourceforge.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/google/googletest googletest]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.8.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Google Test is Google's C++ test framework. Google Mock is an extension to Google Test for writing and using C++ mock classes. More detailed documentation, including build instructions, is in the googletest/README.md file of the source distribution.&amp;lt;br /&amp;gt;Homepage: https://github.com/google/googletest &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gromacs.org gromacs]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2016.5 2016.5-plumed-2.4.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. This CPU-only installation provides both MPI and thread-MPI builds.&amp;lt;br /&amp;gt;Homepage: http://www.gromacs.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/gsl/ gsl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers.  The library provides a wide range of mathematical routines such as random number generators, special functions  and least-squares fitting.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/gsl/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/guile guile]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.2.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Guile is the GNU Ubiquitous Intelligent Language for Extensions,  the official extension language for the GNU operating system.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/guile &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/Harminv harminv]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.4.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Harminv is a free program (and accompanying library) to solve the problem of harmonic inversion -  given a discrete-time, finite-length signal that consists of a sum of finitely-many sinusoids (possibly exponentially  decaying) in a given bandwidth, it determines the frequencies, decay constants, amplitudes, and phases of those  sinusoids.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/Harminv &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://support.hdfgroup.org/HDF5/ hdf5]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.2 1.8.20&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: HDF5 is a data model, library, and file format for storing and managing data.  It supports an unlimited variety of datatypes, and is designed for flexible  and efficient I/O and for high volume and complex data.&amp;lt;br /&amp;gt;Homepage: https://support.hdfgroup.org/HDF5/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://support.hdfgroup.org/HDF5/ hdf5-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.2 1.8.20&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: HDF5 is a data model, library, and file format for storing and managing data.  It supports an unlimited variety of datatypes, and is designed for flexible  and efficient I/O and for high volume and complex data.&amp;lt;br /&amp;gt;Homepage: https://support.hdfgroup.org/HDF5/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://ccb.jhu.edu/software/hisat2/index.shtml hisat2]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads  (both DNA and RNA) against the general human population (as well as against a single reference genome).&amp;lt;br /&amp;gt;Homepage: https://ccb.jhu.edu/software/hisat2/index.shtml &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | hpn-ssh&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.7p1-14v15&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.x.org/ imake]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.0.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: imake is a Makefile-generator that is intended to make it easier to develop software  portably for multiple systems.&amp;lt;br /&amp;gt;Homepage: http://www.x.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://software.intel.com/en-us/intel-parallel-studio-xe intel]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.7 2018.2 2018.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Compiler toolchain including Intel compilers and Intel Math Kernel Library (MKL).&amp;lt;br /&amp;gt;Homepage: https://software.intel.com/en-us/intel-parallel-studio-xe &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://software.intel.com/en-us/intel-parallel-studio-xe intelmpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.7 2018.2 2018.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Intel's (MPICH-based) MPI implementation.&amp;lt;br /&amp;gt;Homepage: https://software.intel.com/en-us/intel-parallel-studio-xe &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | intelpython2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2018.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | intelpython3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2018.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://java.com/ java]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.8.0_162&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Java Platform, Standard Edition (Java SE) lets you develop and deploy  Java applications on desktops and servers.&amp;lt;br /&amp;gt;Homepage: http://java.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://lammps.sandia.gov/ lammps]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 11May2018 22Sep2017&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is a classical molecular dynamics code. LAMMPS has potentials for solid-state materials (metals, semiconductors), soft matter (biomolecules, polymers), and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. It can be coupled to various other programs. The following packages are not included in this build: KIM, MSCG, KOKKOS, USER-QUIP, USER-INTEL, USER-VTK.&amp;lt;br /&amp;gt;Homepage: http://lammps.sandia.gov/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/ivmai/libatomic_ops libatomic_ops]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.6.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: This package provides semi-portable access to hardware-provided atomic memory update operations on a number of architectures.&amp;lt;br /&amp;gt;Homepage: https://github.com/ivmai/libatomic_ops &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.hyperrealm.com/libconfig/ libconfig]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.7.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Libconfig is a simple library for processing structured configuration files.&amp;lt;br /&amp;gt;Homepage: http://www.hyperrealm.com/libconfig/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/libctl libctl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: libctl is a free Guile-based library implementing flexible control files for scientific simulations.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/libctl &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://sourceware.org/libffi/ libffi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.2.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The libffi library provides a portable, high-level programming interface to various calling conventions. This allows a programmer to call any function specified by a call interface description at run time.&amp;lt;br /&amp;gt;Homepage: http://sourceware.org/libffi/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://sourceforge.net/p/libint/ libint]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.5 1.1.6&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Libint library is used to evaluate the traditional (electron repulsion) and certain novel two-body matrix elements (integrals) over Cartesian Gaussian functions used in modern atomic and molecular theory.&amp;lt;br /&amp;gt;Homepage: https://sourceforge.net/p/libint/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/libunistring/ libunistring]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.9.9&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: This library provides functions for manipulating Unicode strings and for manipulating C strings  according to the Unicode standard.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/libunistring/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.tddft.org/programs/octopus/wiki/index.php/Libxc libxc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.0 4.0.3 4.0.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Libxc is a library of exchange-correlation functionals for density-functional theory.  The aim is to provide a portable, well tested and reliable set of exchange and correlation functionals.&amp;lt;br /&amp;gt;Homepage: http://www.tddft.org/programs/octopus/wiki/index.php/Libxc &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://xmlsoft.org/ libxslt]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.32&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Libxslt is the XSLT C library developed for the GNOME project (but usable outside of the GNOME platform).&amp;lt;br /&amp;gt;Homepage: http://xmlsoft.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | libyaml&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.1.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: LibYAML is a YAML parser and emitter written in C.&amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://symas.com/lmdb lmdb]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.9.22&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: LMDB is a fast, memory-efficient database. With memory-mapped files, it has the read performance  of a pure in-memory database while retaining the persistence of standard disk-based databases.&amp;lt;br /&amp;gt;Homepage: https://symas.com/lmdb &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.linuxfromscratch.org/blfs/view/svn/x/makedepend.html makedepend]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.0.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The makedepend package contains a C-preprocessor like utility to determine build-time dependencies.&amp;lt;br /&amp;gt;Homepage: http://www.linuxfromscratch.org/blfs/view/svn/x/makedepend.html &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.mathworks.com/products/compiler/mcr/ mcr]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | R2018a&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The MATLAB Runtime is a standalone set of shared libraries  that enables the execution of compiled MATLAB applications  or components on computers that do not have MATLAB installed.&amp;lt;br /&amp;gt;Homepage: http://www.mathworks.com/products/compiler/mcr/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/Meep meep]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.4.3 1.5.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Meep (or MEEP) is a free finite-difference time-domain (FDTD) simulation software package  developed at MIT to model electromagnetic systems.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/Meep &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/Meep meep-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.4.3 1.5.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Meep (or MEEP) is a free finite-difference time-domain (FDTD) simulation software package  developed at MIT to model electromagnetic systems.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/Meep &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://mercurial.selenic.com/ mercurial]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.6&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Mercurial is a free, distributed source control management tool. It efficiently handles projects of any size and offers an easy and intuitive interface. &amp;lt;br /&amp;gt;Homepage: http://mercurial.selenic.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://glaros.dtc.umn.edu/gkhome/metis/metis/overview metis]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: METIS is a set of serial programs for partitioning graphs, partitioning finite element meshes, and producing fill reducing orderings for sparse matrices. The algorithms implemented in METIS are based on the multilevel recursive-bisection, multilevel k-way, and multi-constraint partitioning schemes.&amp;lt;br /&amp;gt;Homepage: http://glaros.dtc.umn.edu/gkhome/metis/metis/overview &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://software.intel.com/en-us/intel-mkl/ mkl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.7 2018.2 2018.2.lua 2018.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Intel's Math Kernel Library, for use with gcc modules (the intel modules include mkl already).&amp;lt;br /&amp;gt;Homepage: http://software.intel.com/en-us/intel-mkl/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/MPB mpb]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.6.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: MPB is a free and open-source software package for computing electromagnetic band structures and modes.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/MPB&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/MPB mpb-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.6.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: MPB is a free and open-source software package for computing electromagnetic band structures and modes.&amp;lt;br /&amp;gt;Homepage:  http://ab-initio.mit.edu/wiki/index.php/MPB&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.mpfr.org mpfr]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The MPFR library is a C library for multiple-precision   floating-point computations with correct rounding.&amp;lt;br /&amp;gt;Homepage: http://www.mpfr.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.ncl.ucar.edu ncl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.4.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: NCL is an interpreted language designed specifically for scientific data analysis and visualization.&amp;lt;br /&amp;gt;Homepage: http://www.ncl.ucar.edu &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/nco/pynco nco]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.7.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Python bindings for NCO&amp;lt;br /&amp;gt;Homepage: https://github.com/nco/pynco &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.unidata.ucar.edu/software/netcdf/ netcdf]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.0 4.6.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: NetCDF (network Common Data Form) is a set of software libraries   and machine-independent data formats that support the creation, access, and sharing of array-oriented   scientific data.&amp;lt;br /&amp;gt;Homepage: http://www.unidata.ucar.edu/software/netcdf/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.unidata.ucar.edu/software/netcdf/ netcdf-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.6.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: NetCDF (network Common Data Form) is a set of software libraries   and machine-independent data formats that support the creation, access, and sharing of array-oriented   scientific data.&amp;lt;br /&amp;gt;Homepage: http://www.unidata.ucar.edu/software/netcdf/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/octave/ octave]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.4.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GNU Octave is a high-level interpreted language, primarily intended for numerical computations.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/octave/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://xianyi.github.com/OpenBLAS/ openblas]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.2.20&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.&amp;lt;br /&amp;gt;Homepage: http://xianyi.github.com/OpenBLAS/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.openfoam.com/ openfoam]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 17.12 4.1 5.0 5.0-debug&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: OpenFOAM is a free, open source CFD software package.   OpenFOAM has an extensive range of features to solve anything from complex fluid flows  involving chemical reactions, turbulence and heat transfer,   to solid dynamics and electromagnetics.&amp;lt;br /&amp;gt;Homepage: http://www.openfoam.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.open-mpi.org/ openmpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.7 2.1.3 3.0.1 3.0.2 3.1.0 3.1.0rc3 3.1.0rc4 3.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Open MPI Project is an open source implementation of the MPI standard.&amp;lt;br /&amp;gt;Homepage: http://www.open-mpi.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.paraview.org paraview]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.5.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: ParaView is a scientific parallel visualizer.&amp;lt;br /&amp;gt;Homepage: http://www.paraview.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | perf-reports&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 18.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.perl.org/ perl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.20.3 5.26.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Larry Wall's Practical Extraction and Report Language&amp;lt;br /&amp;gt;Homepage: http://www.perl.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.mcs.anl.gov/petsc petsc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.8.4 3.8.4-debug 3.9.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: PETSc, pronounced PET-see (the S is silent), is a suite of data structures and routines for the  scalable (parallel) solution of scientific applications modeled by partial differential equations.&amp;lt;br /&amp;gt;Homepage: http://www.mcs.anl.gov/petsc &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.astro.caltech.edu/~tjp/pgplot/ pgplot]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.2.2 5.2.2-x&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The PGPLOT Graphics Subroutine Library is a Fortran- or C-callable, device-independent graphics package for making simple scientific graphs. It is intended for making graphical images of publication quality with minimum effort on the part of the user. For most applications, the program can be device-independent, and the output can be directed to the appropriate device at run time.&amp;lt;br /&amp;gt;Homepage: http://www.astro.caltech.edu/~tjp/pgplot/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.cog-genomics.org/plink/1.9/ plink]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.07&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: plink-1.9-x86_64: Whole-genome association analysis toolset&amp;lt;br /&amp;gt;Homepage: https://www.cog-genomics.org/plink/1.9/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.plumed-code.org plumed]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.4.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: PLUMED is an open source library for free energy calculations in molecular systems which works together with some of the most popular molecular dynamics engines. Free energy calculations can be performed as a function of many order parameters with a particular focus on biological problems, using state-of-the-art methods such as metadynamics, umbrella sampling and Jarzynski-equation based steered MD. The software, written in C++, can be easily interfaced with both Fortran and C/C++ codes.&amp;lt;br /&amp;gt;Homepage: http://www.plumed-code.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://trac.mcs.anl.gov/projects/parallel-netcdf pnetcdf]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Parallel netCDF: A Parallel I/O Library for NetCDF File Access&amp;lt;br /&amp;gt;Homepage: https://trac.mcs.anl.gov/projects/parallel-netcdf &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://python.org/ python]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.7.14 3.6.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Python is a programming language that lets you work more quickly and integrate your systems  more effectively.&amp;lt;br /&amp;gt;Homepage: http://python.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.qhull.org/ qhull]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2015.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description:   Qhull computes the convex hull, Delaunay triangulation, Voronoi diagram,  halfspace intersection about a point, furthest-site Delaunay triangulation,  and furthest-site Voronoi diagram. The source code runs in 2-d, 3-d, 4-d, and  higher dimensions. Qhull implements the Quickhull algorithm for computing the  convex hull. &amp;lt;br /&amp;gt;Homepage: http://www.qhull.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.quantum-espresso.org/ quantum-espresso]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.2.1 6.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Quantum ESPRESSO is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.&amp;lt;br /&amp;gt;Homepage: https://www.quantum-espresso.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.r-project.org/ r]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.1 3.5.0 4.0.1 R2018a&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: R is a free software environment for statistical computing and graphics.&amp;lt;br /&amp;gt;Homepage: http://www.r-project.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/vanzonr/rarray rarray]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: rarray is a C++ library for multidimensional arrays. It is a header-only implementation that uses templates, which allows most compilers to generate fast code.&amp;lt;br /&amp;gt;Homepage: https://github.com/vanzonr/rarray &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://rsync.samba.org/ rsync]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.1.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Utility that provides fast incremental file transfer; this module provides a newer version than is present in the operating system.&amp;lt;br /&amp;gt;Homepage: https://rsync.samba.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.rust-lang.org rust]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.26.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Rust is a systems programming language that runs blazingly fast, prevents segfaults,  and guarantees thread safety.&amp;lt;br /&amp;gt;Homepage: https://www.rust-lang.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.htslib.org/ samtools]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.8&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.&amp;lt;br /&amp;gt;Homepage: http://www.htslib.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.netlib.org/scalapack/ scalapack]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.0.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The ScaLAPACK (or Scalable LAPACK) library includes a subset of LAPACK routines  redesigned for distributed memory MIMD parallel computers.&amp;lt;br /&amp;gt;Homepage: http://www.netlib.org/scalapack/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.scons.org/ scons]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SCons is a software construction tool.&amp;lt;br /&amp;gt;Homepage: http://www.scons.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.shengbte.org/ shengbte]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: ShengBTE is a software package for solving the Boltzmann Transport Equation for phonons.  Also installed is the 'thirdorder' package of Python scripts.&amp;lt;br /&amp;gt;Homepage: http://www.shengbte.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://wci.llnl.gov/codes/silo/ silo]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.10.2-bsd&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Silo is a library for reading and writing a wide variety of scientific data to binary disk files.&amp;lt;br /&amp;gt;Homepage: https://wci.llnl.gov/codes/silo/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://singularity.lbl.gov singularity]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.5.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Singularity is a portable application stack packaging and runtime utility.&amp;lt;br /&amp;gt;Homepage: http://singularity.lbl.gov &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://spglib.sourceforge.net/ spglib]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Spglib is a C library for finding and handling crystal symmetries.&amp;lt;br /&amp;gt;Homepage: http://spglib.sourceforge.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.sqlite.org/ sqlite]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.23.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SQLite: SQL Database Engine in a C Library&amp;lt;br /&amp;gt;Homepage: https://www.sqlite.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://su2.stanford.edu su2]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.0.0 6.0.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: An open-source collection of software tools written in C++ for performing Partial Differential Equation (PDE) analysis and solving PDE-constrained optimization problems. The toolset is designed with computational fluid dynamics and aerodynamic shape optimization in mind.&amp;lt;br /&amp;gt;Homepage: http://su2.stanford.edu &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://subversion.apache.org/ subversion]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description:  Subversion is an open source version control system.&amp;lt;br /&amp;gt;Homepage: http://subversion.apache.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://computation.llnl.gov/projects/sundials sundials]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.6.0 2.7.0 3.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SUNDIALS: SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers&amp;lt;br /&amp;gt;Homepage: http://computation.llnl.gov/projects/sundials &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://computation.llnl.gov/projects/sundials sundials-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SUNDIALS: SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers&amp;lt;br /&amp;gt;Homepage: http://computation.llnl.gov/projects/sundials &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.swig.org/ swig]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.12&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SWIG is a software development tool that connects programs written in C and C++ with  a variety of high-level programming languages.&amp;lt;br /&amp;gt;Homepage: http://www.swig.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | thirdorder&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The 'thirdorder' package of Python scripts, also installed as part of the shengbte module.&amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://trilinos.sandia.gov/ trilinos]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 12.12.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Trilinos Project is an effort to develop algorithms and enabling technologies  within an object-oriented software framework for the solution of large-scale, complex multi-physics  engineering and scientific problems. A unique design feature of Trilinos is its focus on packages.&amp;lt;br /&amp;gt;Homepage: http://trilinos.sandia.gov/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://valgrind.org valgrind]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.13.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Valgrind: Debugging and profiling tools&amp;lt;br /&amp;gt;Homepage: http://valgrind.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://wci.llnl.gov/simulation/computer-codes/visit visit]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.13.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: VisIt is an open-source, interactive, scalable visualization, animation, and analysis tool.&amp;lt;br /&amp;gt;Homepage: https://wci.llnl.gov/simulation/computer-codes/visit &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.ks.uiuc.edu/Research/vmd vmd]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.4a12&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.&amp;lt;br /&amp;gt;Homepage: http://www.ks.uiuc.edu/Research/vmd &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.vtk.org vtk]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 8.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization. VTK consists of a C++ class library and several interpreted interface layers including Tcl/Tk, Java, and Python. VTK supports a wide variety of visualization algorithms, including scalar, vector, tensor, texture, and volumetric methods, and advanced modeling techniques such as implicit modeling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation.&amp;lt;br /&amp;gt;Homepage: http://www.vtk.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Modules_specific_to_Niagara&amp;diff=1375</id>
		<title>Modules specific to Niagara</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Modules_specific_to_Niagara&amp;diff=1375"/>
		<updated>2018-08-16T14:58:58Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{| class=&amp;quot;wikitable sortable&amp;quot; style=&amp;quot;width:85%&amp;quot;&lt;br /&gt;
! style=&amp;quot;width: 25%&amp;quot; align=&amp;quot;center&amp;quot; | Module&lt;br /&gt;
! style=&amp;quot;width: 15%&amp;quot; align=&amp;quot;center&amp;quot; | Versions&lt;br /&gt;
! style=&amp;quot;width: 15%&amp;quot; align=&amp;quot;center&amp;quot; | Documentation&lt;br /&gt;
! align=&amp;quot;center&amp;quot; | Description&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.continuum.io/anaconda-overview anaconda2]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Built to complement the rich, open-source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies to adopt a modern open data science analytics architecture.&amp;lt;br /&amp;gt;Homepage: https://www.continuum.io/anaconda-overview &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.continuum.io/anaconda-overview anaconda3]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Built to complement the rich, open-source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies to adopt a modern open data science analytics architecture.&amp;lt;br /&amp;gt;Homepage: https://www.continuum.io/anaconda-overview &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://arma.sourceforge.net/ armadillo]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 8.500.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Armadillo is an open-source C++ linear algebra library (matrix maths) aiming towards a good balance between speed and ease of use. Integer, floating point and complex numbers are supported, as well as a subset of trigonometric and statistics functions.&amp;lt;br /&amp;gt;Homepage: http://arma.sourceforge.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | arpack-ng&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.5.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | aspect&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.0.0 2.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://autotools.io autotools]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: This bundle collects the standard GNU build tools: Autoconf, Automake and libtool.&amp;lt;br /&amp;gt;Homepage: http://autotools.io &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://bazel.io/ bazel]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.11.1 0.15.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Bazel is a build tool that builds code quickly and reliably.  It is used to build the majority of Google's software.&amp;lt;br /&amp;gt;Homepage: http://bazel.io/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://blast.ncbi.nlm.nih.gov/ blast+]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.7.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.&amp;lt;br /&amp;gt;Homepage: http://blast.ncbi.nlm.nih.gov/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.boost.org/ boost]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.63.0 1.66.0 1.67.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Boost provides free peer-reviewed portable C++ source libraries.&amp;lt;br /&amp;gt;Homepage: http://www.boost.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.boost.org/ boost-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.67.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Boost provides free peer-reviewed portable C++ source libraries.&amp;lt;br /&amp;gt;Homepage: http://www.boost.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://code.zmaw.de/projects/cdo cdo]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CDO is a collection of command-line operators to manipulate and analyse climate and NWP (numerical weather prediction) model data.&amp;lt;br /&amp;gt;Homepage: https://code.zmaw.de/projects/cdo &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://heasarc.gsfc.nasa.gov/fitsio/ cfitsio]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.430&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CFITSIO is a library of C and Fortran subroutines for reading and writing data files in FITS (Flexible Image Transport System) data format.&amp;lt;br /&amp;gt;Homepage: http://heasarc.gsfc.nasa.gov/fitsio/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://cmake.org/ cmake]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.10.3 3.11.0 3.11.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CMake, the cross-platform, open-source build system. CMake is a family of tools designed to build, test and package software.&amp;lt;br /&amp;gt;Homepage: https://cmake.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.cp2k.org/ cp2k]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: CP2K is a freely available (GPL) program, written in Fortran 95, to perform atomistic and molecular simulations of solid state, liquid, molecular and biological systems. It provides a general framework for different methods, such as density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW), and classical pair and many-body potentials.&amp;lt;br /&amp;gt;Homepage: http://www.cp2k.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | dakota-mpi&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.7.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.arm.com/products/development-tools/hpc-tools/cross-platform/forge ddt]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 18.1.2 18.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: ARM's HPC development tools: Distributed Debugging Tool and MAP Profiler&amp;lt;br /&amp;gt;Homepage: https://www.arm.com/products/development-tools/hpc-tools/cross-platform/forge &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://eigen.tuxfamily.org/index.php?title=Main_Page eigen]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.3.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.&amp;lt;br /&amp;gt;Homepage: http://eigen.tuxfamily.org/index.php?title=Main_Page &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://elpa.rzg.mpg.de elpa]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2016.05.003 2017.05.003 2018.05.001&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Eigenvalue SoLvers for Petaflop-Applications.&amp;lt;br /&amp;gt;Homepage: http://elpa.rzg.mpg.de &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://fenicsproject.org/ fenics]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.2.0 2018.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: FEniCS Project is a collection of free and open-source software components with the common goal to enable automated solution of differential equations. The components provide scientific computing tools for working with computational meshes, finite-element variational formulations of ordinary and partial differential equations, and numerical linear algebra.&amp;lt;br /&amp;gt;Homepage: https://fenicsproject.org/&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.fetk.org/ fetk]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Finite Element ToolKit (FETK) is a collaboratively developed, evolving collection of adaptive finite element method (AFEM) software libraries and tools for solving coupled systems of nonlinear geometric partial differential equations (PDE).&amp;lt;br /&amp;gt;Homepage: http://www.fetk.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.ffmpeg.org/ ffmpeg]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.4.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: A complete, cross-platform solution to record, convert and stream audio and video.&amp;lt;br /&amp;gt;Homepage: https://www.ffmpeg.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.fftw.org fftw]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.3.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data.&amp;lt;br /&amp;gt;Homepage: http://www.fftw.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.fftw.org fftw-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.3.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data.&amp;lt;br /&amp;gt;Homepage: http://www.fftw.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://fmtlib.net/ fmt]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.2 4.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: fmt (formerly cppformat) is an open-source formatting library.&amp;lt;br /&amp;gt;Homepage: http://fmtlib.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | foam-extend&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://hboehm.info/gc/ gc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.6.6&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Boehm-Demers-Weiser conservative garbage collector can be used as a garbage-collecting replacement for C malloc or C++ new.&amp;lt;br /&amp;gt;Homepage: http://hboehm.info/gc/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://gcc.gnu.org/ gcc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.3.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Ada, and Go, as well as libraries for these languages (libstdc++, libgfortran, ...).&amp;lt;br /&amp;gt;Homepage: http://gcc.gnu.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/gdb/gdb.html gdb]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 8.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The GNU Project Debugger&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/gdb/gdb.html &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://git-scm.com/ git]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.16.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.&amp;lt;br /&amp;gt;Homepage: http://git-scm.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gtk.org/ glib]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.3 2.22.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GLib is one of the base libraries of the GTK+ project&amp;lt;br /&amp;gt;Homepage: http://www.gtk.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://rashmikumari.github.io/g_mmpbsa/ g_mmpbsa]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Calculates the components of binding energy using the MM-PBSA method (except the entropic term), and the energetic contribution of each residue to the binding using an energy decomposition scheme.&amp;lt;br /&amp;gt;Homepage: https://rashmikumari.github.io/g_mmpbsa/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://gmplib.org/ gmp]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.1.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GMP is a free library for arbitrary-precision arithmetic, operating on signed integers, rational numbers, and floating point numbers.&amp;lt;br /&amp;gt;Homepage: http://gmplib.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.gnu.org/software/parallel/ gnu-parallel]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 20180322&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Tools for running commands in parallel on one or more nodes.&amp;lt;br /&amp;gt;Homepage: https://www.gnu.org/software/parallel/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://gnuplot.sourceforge.net/ gnuplot]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.2.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Portable, interactive function plotting utility.&amp;lt;br /&amp;gt;Homepage: http://gnuplot.sourceforge.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/google/googletest googletest]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.8.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Google Test is Google's C++ test framework. Getting started information is available in the Google Test Primer documentation. Google Mock is an extension to Google Test for writing and using C++ mock classes; see the separate Google Mock documentation. More detailed documentation for googletest (including build instructions) is in its googletest/README.md file.&amp;lt;br /&amp;gt;Homepage: https://github.com/google/googletest &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gromacs.org gromacs]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2016.5 2016.5-plumed-2.4.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. This is a CPU-only build, containing both MPI and thread-MPI builds.&amp;lt;br /&amp;gt;Homepage: http://www.gromacs.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/gsl/ gsl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/gsl/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/guile guile]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.2.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Guile is the GNU Ubiquitous Intelligent Language for Extensions, the official extension language for the GNU operating system.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/guile &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/Harminv harminv]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.4.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Harminv is a free program (and accompanying library) to solve the problem of harmonic inversion: given a discrete-time, finite-length signal that consists of a sum of finitely many sinusoids (possibly exponentially decaying) in a given bandwidth, it determines the frequencies, decay constants, amplitudes, and phases of those sinusoids.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/Harminv &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://support.hdfgroup.org/HDF5/ hdf5]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.2 1.8.20&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.&amp;lt;br /&amp;gt;Homepage: https://support.hdfgroup.org/HDF5/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://support.hdfgroup.org/HDF5/ hdf5-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.2 1.8.20&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.&amp;lt;br /&amp;gt;Homepage: https://support.hdfgroup.org/HDF5/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://ccb.jhu.edu/software/hisat2/index.shtml hisat2]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome).&amp;lt;br /&amp;gt;Homepage: https://ccb.jhu.edu/software/hisat2/index.shtml &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | hpn-ssh&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.7p1-14v15&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.x.org/ imake]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.0.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: imake is a Makefile generator that is intended to make it easier to develop software portably for multiple systems.&amp;lt;br /&amp;gt;Homepage: http://www.x.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://software.intel.com/en-us/intel-parallel-studio-xe intel]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.7 2018.2 2018.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Compiler toolchain including Intel compilers and Intel Math Kernel Library (MKL).&amp;lt;br /&amp;gt;Homepage: https://software.intel.com/en-us/intel-parallel-studio-xe &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://software.intel.com/en-us/intel-parallel-studio-xe intelmpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.7 2018.2 2018.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Intel's (MPICH-based) MPI implementation.&amp;lt;br /&amp;gt;Homepage: https://software.intel.com/en-us/intel-parallel-studio-xe &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | intelpython2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2018.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | intelpython3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2018.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://java.com/ java]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.8.0_162&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Java Platform, Standard Edition (Java SE) lets you develop and deploy  Java applications on desktops and servers.&amp;lt;br /&amp;gt;Homepage: http://java.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://lammps.sandia.gov/ lammps]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 11May2018 22Sep2017&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is a classical molecular dynamics code. LAMMPS has potentials for solid-state materials (metals, semiconductors), soft matter (biomolecules, polymers), and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale, and it can be coupled to other programs. The following packages are not included in this installation: KIM, MSCG, KOKKOS, USER-QUIP, USER-INTEL, USER-VTK.&amp;lt;br /&amp;gt;Homepage: http://lammps.sandia.gov/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/ivmai/libatomic_ops libatomic_ops]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 7.6.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: This package provides semi-portable access to hardware-provided atomic memory update operations on a number of architectures.&amp;lt;br /&amp;gt;Homepage: https://github.com/ivmai/libatomic_ops &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.hyperrealm.com/libconfig/ libconfig]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.7.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Libconfig is a simple library for processing structured configuration files.&amp;lt;br /&amp;gt;Homepage: http://www.hyperrealm.com/libconfig/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/libctl libctl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: libctl is a free Guile-based library implementing flexible control files for scientific simulations.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/libctl &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://sourceware.org/libffi/ libffi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.2.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The libffi library provides a portable, high-level programming interface to various calling conventions. This allows a programmer to call any function specified by a call interface description at run time.&amp;lt;br /&amp;gt;Homepage: http://sourceware.org/libffi/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://sourceforge.net/p/libint/ libint]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.5 1.1.6&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Libint library evaluates the traditional (electron repulsion) and certain novel two-body matrix elements (integrals) over Cartesian Gaussian functions used in modern atomic and molecular theory.&amp;lt;br /&amp;gt;Homepage: https://sourceforge.net/p/libint/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/libunistring/ libunistring]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.9.9&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: This library provides functions for manipulating Unicode strings and for manipulating C strings  according to the Unicode standard.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/libunistring/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.tddft.org/programs/octopus/wiki/index.php/Libxc libxc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.0 4.0.3 4.0.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Libxc is a library of exchange-correlation functionals for density-functional theory.  The aim is to provide a portable, well tested and reliable set of exchange and correlation functionals.&amp;lt;br /&amp;gt;Homepage: http://www.tddft.org/programs/octopus/wiki/index.php/Libxc &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://xmlsoft.org/ libxslt]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.32&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Libxslt is the XSLT C library developed for the GNOME project (but usable outside of the GNOME platform).&amp;lt;br /&amp;gt;Homepage: http://xmlsoft.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | libyaml&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.1.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: LibYAML is a YAML parser and emitter written in C.&amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://symas.com/lmdb lmdb]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.9.22&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: LMDB is a fast, memory-efficient database. With memory-mapped files, it has the read performance  of a pure in-memory database while retaining the persistence of standard disk-based databases.&amp;lt;br /&amp;gt;Homepage: https://symas.com/lmdb &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.linuxfromscratch.org/blfs/view/svn/x/makedepend.html makedepend]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.0.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The makedepend package contains a C-preprocessor-like utility to determine build-time dependencies.&amp;lt;br /&amp;gt;Homepage: http://www.linuxfromscratch.org/blfs/view/svn/x/makedepend.html &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.mathworks.com/products/compiler/mcr/ mcr]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | R2018a&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The MATLAB Runtime is a standalone set of shared libraries  that enables the execution of compiled MATLAB applications  or components on computers that do not have MATLAB installed.&amp;lt;br /&amp;gt;Homepage: http://www.mathworks.com/products/compiler/mcr/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/Meep meep]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.4.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Meep (or MEEP) is a free finite-difference time-domain (FDTD) simulation software package  developed at MIT to model electromagnetic systems.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/Meep &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/Meep meep-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.4.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Meep (or MEEP) is a free finite-difference time-domain (FDTD) simulation software package  developed at MIT to model electromagnetic systems.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/Meep &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://mercurial.selenic.com/ mercurial]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.6&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Mercurial is a free, distributed source control management tool. It efficiently handles projects of any size and offers an easy and intuitive interface. &amp;lt;br /&amp;gt;Homepage: http://mercurial.selenic.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://glaros.dtc.umn.edu/gkhome/metis/metis/overview metis]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: METIS is a set of serial programs for partitioning graphs, partitioning finite element meshes, and producing fill reducing orderings for sparse matrices. The algorithms implemented in METIS are based on the multilevel recursive-bisection, multilevel k-way, and multi-constraint partitioning schemes.&amp;lt;br /&amp;gt;Homepage: http://glaros.dtc.umn.edu/gkhome/metis/metis/overview &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://software.intel.com/en-us/intel-mkl/ mkl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2017.7 2018.2 2018.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Intel's Math Kernel Library, for use with gcc modules (the intel modules include mkl already).&amp;lt;br /&amp;gt;Homepage: http://software.intel.com/en-us/intel-mkl/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/MPB mpb]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.6.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: MPB is a free and open-source software package for computing electromagnetic band structures and modes.&amp;lt;br /&amp;gt;Homepage: http://ab-initio.mit.edu/wiki/index.php/MPB&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://ab-initio.mit.edu/wiki/index.php/MPB mpb-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.6.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: MPB is a free and open-source software package for computing electromagnetic band structures and modes.&amp;lt;br /&amp;gt;Homepage:  http://ab-initio.mit.edu/wiki/index.php/MPB&amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.mpfr.org mpfr]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The MPFR library is a C library for multiple-precision   floating-point computations with correct rounding.&amp;lt;br /&amp;gt;Homepage: http://www.mpfr.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.ncl.ucar.edu ncl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.4.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: NCL is an interpreted language designed specifically for scientific data analysis and visualization.&amp;lt;br /&amp;gt;Homepage: http://www.ncl.ucar.edu &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://nco.sourceforge.net/ nco]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.7.4&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: NCO (netCDF Operators) is a suite of command-line programs that manipulate and analyze data stored in netCDF files.&amp;lt;br /&amp;gt;Homepage: http://nco.sourceforge.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.unidata.ucar.edu/software/netcdf/ netcdf]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.0 4.6.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: NetCDF (network Common Data Form) is a set of software libraries   and machine-independent data formats that support the creation, access, and sharing of array-oriented   scientific data.&amp;lt;br /&amp;gt;Homepage: http://www.unidata.ucar.edu/software/netcdf/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.unidata.ucar.edu/software/netcdf/ netcdf-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.6.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: NetCDF (network Common Data Form) is a set of software libraries   and machine-independent data formats that support the creation, access, and sharing of array-oriented   scientific data.&amp;lt;br /&amp;gt;Homepage: http://www.unidata.ucar.edu/software/netcdf/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.gnu.org/software/octave/ octave]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.4.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: GNU Octave is a high-level interpreted language, primarily intended for numerical computations.&amp;lt;br /&amp;gt;Homepage: http://www.gnu.org/software/octave/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://xianyi.github.com/OpenBLAS/ openblas]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 0.2.20&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.&amp;lt;br /&amp;gt;Homepage: http://xianyi.github.com/OpenBLAS/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.openfoam.com/ openfoam]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 17.12 4.1 5.0 5.0-debug&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: OpenFOAM is a free, open-source CFD software package. OpenFOAM has an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence, and heat transfer, to solid dynamics and electromagnetics.&amp;lt;br /&amp;gt;Homepage: http://www.openfoam.com/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.open-mpi.org/ openmpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.7 2.1.3 3.0.1 3.0.2 3.1.0 3.1.0rc3 3.1.0rc4 3.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Open MPI Project develops an open-source implementation of the Message Passing Interface (MPI) standard.&amp;lt;br /&amp;gt;Homepage: http://www.open-mpi.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.paraview.org paraview]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.5.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: ParaView is a scientific parallel visualizer.&amp;lt;br /&amp;gt;Homepage: http://www.paraview.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | perf-reports&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 18.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: &amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.perl.org/ perl]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.20.3 5.26.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Larry Wall's Practical Extraction and Report Language&amp;lt;br /&amp;gt;Homepage: http://www.perl.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.mcs.anl.gov/petsc petsc]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.8.4 3.8.4-debug 3.9.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: PETSc, pronounced PET-see (the S is silent), is a suite of data structures and routines for the  scalable (parallel) solution of scientific applications modeled by partial differential equations.&amp;lt;br /&amp;gt;Homepage: http://www.mcs.anl.gov/petsc &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.astro.caltech.edu/~tjp/pgplot/ pgplot]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.2.2 5.2.2-x&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The PGPLOT Graphics Subroutine Library is a Fortran- or C-callable, device-independent graphics package for making simple scientific graphs. It is intended for making graphical images of publication quality with minimum effort on the part of the user. For most applications, the program can be device-independent, and the output can be directed to the appropriate device at run time.&amp;lt;br /&amp;gt;Homepage: http://www.astro.caltech.edu/~tjp/pgplot/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.cog-genomics.org/plink/1.9/ plink]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.07&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: PLINK is a whole-genome association analysis toolset.&amp;lt;br /&amp;gt;Homepage: https://www.cog-genomics.org/plink/1.9/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.plumed-code.org plumed]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.4.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: PLUMED is an open-source library for free energy calculations in molecular systems that works together with some of the most popular molecular dynamics engines. Free energy calculations can be performed as a function of many order parameters, with a particular focus on biological problems, using state-of-the-art methods such as metadynamics, umbrella sampling, and Jarzynski-equation-based steered MD. The software, written in C++, can be easily interfaced with both Fortran and C/C++ codes.&amp;lt;br /&amp;gt;Homepage: http://www.plumed-code.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://trac.mcs.anl.gov/projects/parallel-netcdf pnetcdf]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Parallel netCDF: A Parallel I/O Library for NetCDF File Access&amp;lt;br /&amp;gt;Homepage: https://trac.mcs.anl.gov/projects/parallel-netcdf &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://python.org/ python]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.7.14 3.6.5&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Python is a programming language that lets you work more quickly and integrate your systems  more effectively.&amp;lt;br /&amp;gt;Homepage: http://python.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.qhull.org/ qhull]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2015.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description:   Qhull computes the convex hull, Delaunay triangulation, Voronoi diagram,  halfspace intersection about a point, furthest-site Delaunay triangulation,  and furthest-site Voronoi diagram. The source code runs in 2-d, 3-d, 4-d, and  higher dimensions. Qhull implements the Quickhull algorithm for computing the  convex hull. &amp;lt;br /&amp;gt;Homepage: http://www.qhull.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.quantum-espresso.org/ quantum-espresso]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 6.2.1 6.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Quantum ESPRESSO is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.&amp;lt;br /&amp;gt;Homepage: https://www.quantum-espresso.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.r-project.org/ r]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.1 3.5.0 4.0.1 R2018a&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: R is a free software environment for statistical computing and graphics.&amp;lt;br /&amp;gt;Homepage: http://www.r-project.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://github.com/vanzonr/rarray rarray]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: rarray is a C++ library for multidimensional arrays. It is a header-only implementation that uses templates, which allows most compilers to generate fast code.&amp;lt;br /&amp;gt;Homepage: https://github.com/vanzonr/rarray &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://rsync.samba.org/ rsync]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.1.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Utility that provides fast incremental file transfer; this module provides a newer version than is present in the operating system.&amp;lt;br /&amp;gt;Homepage: https://rsync.samba.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.rust-lang.org rust]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.26.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Rust is a systems programming language that runs blazingly fast, prevents segfaults,  and guarantees thread safety.&amp;lt;br /&amp;gt;Homepage: https://www.rust-lang.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.htslib.org/ samtools]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.8&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing, and generating alignments in a per-position format.&amp;lt;br /&amp;gt;Homepage: http://www.htslib.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.netlib.org/scalapack/ scalapack]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.0.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The ScaLAPACK (or Scalable LAPACK) library includes a subset of LAPACK routines  redesigned for distributed memory MIMD parallel computers.&amp;lt;br /&amp;gt;Homepage: http://www.netlib.org/scalapack/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.scons.org/ scons]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SCons is a software construction tool.&amp;lt;br /&amp;gt;Homepage: http://www.scons.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.shengbte.org/ shengbte]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: ShengBTE is a software package for solving the Boltzmann Transport Equation for phonons.  Also installed is the 'thirdorder' package of Python scripts.&amp;lt;br /&amp;gt;Homepage: http://www.shengbte.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://wci.llnl.gov/codes/silo/ silo]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 4.10.2-bsd&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Silo is a library for reading and writing a wide variety of scientific data to binary disk files.&amp;lt;br /&amp;gt;Homepage: https://wci.llnl.gov/codes/silo/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://singularity.lbl.gov singularity]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.5.2&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Singularity is a portable application stack packaging and runtime utility.&amp;lt;br /&amp;gt;Homepage: http://singularity.lbl.gov &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://spglib.sourceforge.net/ spglib]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.10.3&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Spglib is a C library for finding and handling crystal symmetries.&amp;lt;br /&amp;gt;Homepage: http://spglib.sourceforge.net/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://www.sqlite.org/ sqlite]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.23.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SQLite: SQL Database Engine in a C Library&amp;lt;br /&amp;gt;Homepage: https://www.sqlite.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://su2.stanford.edu su2]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 5.0.0 6.0.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: An open-source collection of software tools written in C++ for performing Partial Differential Equation (PDE) analysis and solving PDE-constrained optimization problems. The toolset is designed with computational fluid dynamics and aerodynamic shape optimization in mind.&amp;lt;br /&amp;gt;Homepage: http://su2.stanford.edu &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://subversion.apache.org/ subversion]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.7&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Subversion is an open source version control system.&amp;lt;br /&amp;gt;Homepage: http://subversion.apache.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://computation.llnl.gov/projects/sundials sundials]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.6.0 2.7.0 3.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SUNDIALS: SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers&amp;lt;br /&amp;gt;Homepage: http://computation.llnl.gov/projects/sundials &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://computation.llnl.gov/projects/sundials sundials-mpi]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.1.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SUNDIALS: SUite of Nonlinear and DIfferential/ALgebraic Equation Solvers&amp;lt;br /&amp;gt;Homepage: http://computation.llnl.gov/projects/sundials &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.swig.org/ swig]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.0.12&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: SWIG is a software development tool that connects programs written in C and C++ with a variety of high-level programming languages.&amp;lt;br /&amp;gt;Homepage: http://www.swig.org/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | thirdorder&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: A package of Python scripts, installed alongside ShengBTE, for computing third-order interatomic force constants.&amp;lt;br /&amp;gt;Homepage:  &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://trilinos.sandia.gov/ trilinos]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 12.12.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Trilinos Project is an effort to develop algorithms and enabling technologies within an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. A unique design feature of Trilinos is its focus on packages.&amp;lt;br /&amp;gt;Homepage: http://trilinos.sandia.gov/ &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://valgrind.org valgrind]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 3.13.0&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: Valgrind: Debugging and profiling tools&amp;lt;br /&amp;gt;Homepage: http://valgrind.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [https://wci.llnl.gov/simulation/computer-codes/visit visit]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 2.13.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: VisIt is an open source, interactive, scalable visualization, animation and analysis tool.&amp;lt;br /&amp;gt;Homepage: https://wci.llnl.gov/simulation/computer-codes/visit &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.ks.uiuc.edu/Research/vmd vmd]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 1.9.4a12&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.&amp;lt;br /&amp;gt;Homepage: http://www.ks.uiuc.edu/Research/vmd &amp;lt;/div&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | [http://www.vtk.org vtk]&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | 8.1.1&lt;br /&gt;
| align=&amp;quot;center&amp;quot; | &lt;br /&gt;
| &amp;lt;div class=&amp;quot;mw-collapsible mw-collapsed&amp;quot; style=&amp;quot;white-space: pre-line;&amp;quot;&amp;gt;&amp;lt;br /&amp;gt;Description: The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization. VTK consists of a C++ class library and several interpreted interface layers including Tcl/Tk, Java, and Python. VTK supports a wide variety of visualization algorithms including: scalar, vector, tensor, texture, and volumetric methods; and advanced modeling techniques such as: implicit modeling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation.&amp;lt;br /&amp;gt;Homepage: http://www.vtk.org &amp;lt;/div&amp;gt;&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=R&amp;diff=1374</id>
		<title>R</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=R&amp;diff=1374"/>
		<updated>2018-08-15T16:35:54Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* The cluster R code */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[http://www.r-project.org/ R] is a programming language that continues to grow in popularity for data analysis. Code is very fast to write in R, but the resulting software is typically much slower than equivalent C or Fortran code; one should be wary of doing too much compute-intensive work in R.&lt;br /&gt;
&lt;br /&gt;
==Running R on Niagara==&lt;br /&gt;
&lt;br /&gt;
We currently have two families of R installed on Niagara. &lt;br /&gt;
* Anaconda R&lt;br /&gt;
* regular R&lt;br /&gt;
&lt;br /&gt;
Here we describe the differences between these packages.&lt;br /&gt;
&lt;br /&gt;
=== Anaconda R===&lt;br /&gt;
&lt;br /&gt;
Anaconda is a pre-assembled set of commonly-used data science tools, which recently added R to its suite of packages.  The source for this collection is [https://anaconda.org/r here]. &lt;br /&gt;
&lt;br /&gt;
As of 30 July 2018 the following Anaconda modules are available:&lt;br /&gt;
&lt;br /&gt;
    $ module avail anaconda&lt;br /&gt;
    ----------------- /scinet/niagara/software/2018a/modules/base ------------------&lt;br /&gt;
     anaconda2/5.1.0    python/2.7.14-anaconda5.1.0    r/3.4.3-anaconda5.1.0&lt;br /&gt;
     anaconda3/5.1.0    python/3.6.4-anaconda5.1.0&lt;br /&gt;
&lt;br /&gt;
Note that there is a single Anaconda R module available, and that none of these modules require a compiler to be loaded.  The Anaconda R module is R version 3.4.3, which comes from the Anaconda version 5.1.0.&lt;br /&gt;
&lt;br /&gt;
You load the module in the usual way:&lt;br /&gt;
&lt;br /&gt;
     $ module load r/3.4.3-anaconda5.1.0&lt;br /&gt;
     $ R&lt;br /&gt;
     &amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Regular R ===&lt;br /&gt;
&lt;br /&gt;
The base R program has also been installed from source.  This installation comes with no R packages installed other than the base installation.&lt;br /&gt;
&lt;br /&gt;
     $ module spider r&lt;br /&gt;
     --------------------------------------------------------------------------------&lt;br /&gt;
     r:&lt;br /&gt;
     --------------------------------------------------------------------------------&lt;br /&gt;
     Versions:&lt;br /&gt;
        r/3.4.3-anaconda5.1.0&lt;br /&gt;
        r/3.5.0&lt;br /&gt;
     $&lt;br /&gt;
     $ module spider r/3.5.0&lt;br /&gt;
     --------------------------------------------------------------------------------&lt;br /&gt;
     r: r/3.5.0&lt;br /&gt;
     --------------------------------------------------------------------------------&lt;br /&gt;
     You will need to load all module(s) on any one of the lines below before the &amp;quot;r/3.5.0&amp;quot; module is available to load.&lt;br /&gt;
     intel/2018.2&lt;br /&gt;
     $&lt;br /&gt;
     $ module load intel/2018.2 r/3.5.0&lt;br /&gt;
     $ R&lt;br /&gt;
     &amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
(The intel module is a prerequisite for the R module.) If you will be using Rmpi, you will need to load the openmpi module as well.--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Many optional packages are available for R which add functionality for specific domains; they are available through the [http://cran.r-project.org/mirrors.html Comprehensive R Archive Network (CRAN)]. &lt;br /&gt;
&lt;br /&gt;
R provides an easy way for users to install the libraries they need in their home directories rather than having them installed system-wide. Because there are so many optional packages that people could potentially want, we recommend that users who need additional packages install them this way. This is almost certainly the easiest way to handle the wide range of packages, keep them up to date, and ensure that different users' package choices don't conflict.&lt;br /&gt;
&lt;br /&gt;
In general, you can install the packages you need yourself, in your home directory; e.g.,&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ R &lt;br /&gt;
&amp;gt; install.packages(&amp;quot;package-name&amp;quot;, dependencies = TRUE)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
will download and compile the source for the package (and its dependencies) and install it in a personal library in your home directory, a version-specific subdirectory of &amp;lt;tt&amp;gt;${HOME}/R/&amp;lt;/tt&amp;gt; (inside R, &amp;lt;tt&amp;gt;Sys.getenv(&amp;quot;R_LIBS_USER&amp;quot;)&amp;lt;/tt&amp;gt; shows the exact path). You can specify another directory with the &amp;lt;tt&amp;gt;lib=&amp;lt;/tt&amp;gt; option; in that case, see &amp;lt;tt&amp;gt;help(&amp;quot;.libPaths&amp;quot;)&amp;lt;/tt&amp;gt; to make sure that R knows where to look for the packages you've compiled. Note that you must install packages while logged into a development node, because the compute nodes do not have write access to your home directory.&lt;br /&gt;
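&lt;br /&gt;
If you prefer a fixed library location, one option is to set R's standard &amp;lt;tt&amp;gt;R_LIBS_USER&amp;lt;/tt&amp;gt; environment variable before starting R. R adds that directory to &amp;lt;tt&amp;gt;.libPaths()&amp;lt;/tt&amp;gt; only if it exists, and &amp;lt;tt&amp;gt;install.packages&amp;lt;/tt&amp;gt; then typically installs there by default. The following is a minimal bash sketch under that assumption; the function name and the default directory &amp;lt;tt&amp;gt;$HOME/R/library&amp;lt;/tt&amp;gt; are hypothetical choices, not a SciNet convention.&lt;br /&gt;
```shell
#!/bin/bash
# Sketch (assumption): keep personal R packages in a fixed directory.
# R_LIBS_USER is a standard R environment variable; the function name and
# the default path $HOME/R/library are hypothetical choices.
setup_r_user_lib() {
    local libdir="${1:-$HOME/R/library}"
    # R only adds R_LIBS_USER to .libPaths() if the directory exists,
    # so create it first.
    mkdir -p "$libdir"
    export R_LIBS_USER="$libdir"
    # Print the path so callers can check it.
    printf '%s\n' "$R_LIBS_USER"
}
```
Because the variable only lives in the current shell, the same setting would also have to go into your job scripts (or your &amp;lt;tt&amp;gt;~/.Renviron&amp;lt;/tt&amp;gt; file) for jobs to find the installed packages.&lt;br /&gt;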
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Note that during the installation you may get warnings that the packages cannot be installed in e.g. /scinet/gpc/Applications/R/3.0.1/lib64/R/bin/. But after those messages, R should have succeeded in installing the package into your home directory.--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Running serial R jobs ===&lt;br /&gt;
&lt;br /&gt;
As with all serial jobs, if each of your R computations uses only a single core, you should bundle them so that all 40 cores of a node are doing work.  Examples of this can be found on [[Running_Serial_Jobs_on_Niagara|this]] page.&lt;br /&gt;
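&lt;br /&gt;
As one minimal alternative (an assumption here, not necessarily the approach used on that page), GNU &amp;lt;tt&amp;gt;xargs -P&amp;lt;/tt&amp;gt; can keep a fixed number of serial commands running at once until a list of commands is exhausted. In this bash sketch the function name and the command-file format (one shell command per line, e.g. &amp;lt;tt&amp;gt;Rscript myscript.R 7&amp;lt;/tt&amp;gt;) are hypothetical.&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: run independent serial commands, at most NCORES at a time.
# Assumes GNU xargs (for -a and -d); run_bundle and the command-file
# format (one shell command per line) are hypothetical.
run_bundle() {
    local ncores="${1:-40}"   # default: 40 cores per Niagara node
    local cmdfile="$2"
    # Read commands line by line from cmdfile and run each one in its own
    # bash, keeping up to ncores running; xargs returns once all finish.
    xargs -a "$cmdfile" -d '\n' -P "$ncores" -I {} bash -c {}
}
```
In a job script, a call such as &amp;lt;tt&amp;gt;run_bundle 40 commands.txt&amp;lt;/tt&amp;gt; would then occupy all cores of the node until the list is done.&lt;br /&gt;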
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
== Saving images from R in compute jobs ==&lt;br /&gt;
&lt;br /&gt;
To make use of the graphics capability of R, R insists on having an X server running, even if you're just writing to a file.  There is no X server on the compute nodes, and you'd get a message like&lt;br /&gt;
&lt;br /&gt;
 unable to open connection to X11 display ''&lt;br /&gt;
&lt;br /&gt;
To get around this issue, you can run a 'virtual' X server on the compute nodes by adding the following commands at the start of your job script:&lt;br /&gt;
&lt;br /&gt;
 # Make virtual X server command called Xvfb available:&lt;br /&gt;
 module load Xlibraries&lt;br /&gt;
  &lt;br /&gt;
 # Select a unique display number:&lt;br /&gt;
 let DISPLAYNUM=$UID%65274&lt;br /&gt;
 export DISPLAY=&amp;quot;:$DISPLAYNUM&amp;quot;&lt;br /&gt;
  &lt;br /&gt;
 # Start the virtual X server&lt;br /&gt;
 Xvfb $DISPLAY -fp $SCINET_FONTPATH -ac 2&amp;gt;/dev/null &amp;amp;&lt;br /&gt;
&lt;br /&gt;
After this, run R or Rscript as usual. The virtual X server will run in the background and will be killed when your job is done. Alternatively, you may want to kill it explicitly at the end of your job script using&lt;br /&gt;
&lt;br /&gt;
 # Kill any remaining Xvfb server&lt;br /&gt;
 pkill -u $UID Xvfb&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Rmpi (R with MPI) ==&lt;br /&gt;
&lt;br /&gt;
None of the R installations on Niagara have Rmpi installed by default. &lt;br /&gt;
&lt;br /&gt;
=== Installing Rmpi, version 3.5.0 ===&lt;br /&gt;
&lt;br /&gt;
Since Rmpi is not part of the standard R 3.5.0 installation, you have to install it yourself, whether you intend to use OpenMPI or IntelMPI.&lt;br /&gt;
&lt;br /&gt;
Installing the Rmpi package can be a bit challenging, since some additional parameters need to be given to the installation, which contain the path to various header files and libraries. These paths differ depending on what MPI version you are using.   &lt;br /&gt;
&lt;br /&gt;
The various MPI versions on Niagara are loaded with the module command. So the first thing to do is to decide which MPI implementation to use (OpenMPI or IntelMPI), and to type the corresponding &amp;quot;module load&amp;quot; command on the command line (as well as in your job scripts).&lt;br /&gt;
&lt;br /&gt;
Because the MPI modules define all the paths in environment variables, the following commands should work for installing against any OpenMPI version.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load intel/2018.2&lt;br /&gt;
$ module load openmpi/3.1.0&lt;br /&gt;
$ module load r/3.5.0&lt;br /&gt;
$&lt;br /&gt;
$ R&lt;br /&gt;
&amp;gt;&lt;br /&gt;
&amp;gt; install.packages(&amp;quot;Rmpi&amp;quot;,&lt;br /&gt;
                   configure.args =&lt;br /&gt;
                   c(paste(&amp;quot;--with-Rmpi-include=&amp;quot;, Sys.getenv(&amp;quot;SCINET_OPENMPI_ROOT&amp;quot;), &amp;quot;/include&amp;quot;, sep=&amp;quot;&amp;quot;),&lt;br /&gt;
                     paste(&amp;quot;--with-Rmpi-libpath=&amp;quot;, Sys.getenv(&amp;quot;SCINET_OPENMPI_ROOT&amp;quot;), &amp;quot;/lib&amp;quot;, sep=&amp;quot;&amp;quot;),&lt;br /&gt;
                     &amp;quot;--with-Rmpi-type=OPENMPI&amp;quot;))&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
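For IntelMPI, change &amp;lt;tt&amp;gt;OPENMPI&amp;lt;/tt&amp;gt; to &amp;lt;tt&amp;gt;MPICH2&amp;lt;/tt&amp;gt; in the last line, and make sure the include and library paths point to your IntelMPI installation rather than to &amp;lt;tt&amp;gt;SCINET_OPENMPI_ROOT&amp;lt;/tt&amp;gt;.&lt;br /&gt;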
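&lt;br /&gt;
A common pitfall in this step is joining the MPI root directory to its &amp;lt;tt&amp;gt;include&amp;lt;/tt&amp;gt; and &amp;lt;tt&amp;gt;lib&amp;lt;/tt&amp;gt; subdirectories. The same arguments can also be assembled in the shell and passed to &amp;lt;tt&amp;gt;R CMD INSTALL --configure-args&amp;lt;/tt&amp;gt;, for example when installing from a downloaded Rmpi source tarball. This bash sketch assumes that approach; the helper name is hypothetical, and it takes the MPI root (e.g. &amp;lt;tt&amp;gt;$SCINET_OPENMPI_ROOT&amp;lt;/tt&amp;gt;) and the Rmpi type as arguments.&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: build Rmpi's configure arguments from an MPI installation root.
# rmpi_configure_args is a hypothetical helper; the --with-Rmpi-* options
# are the ones used in the install.packages() call above.
rmpi_configure_args() {
    local mpi_root="$1"
    local mpi_type="${2:-OPENMPI}"   # MPICH2 for IntelMPI
    # Note the explicit slash before include and lib.
    printf '%s\n' "--with-Rmpi-include=${mpi_root}/include --with-Rmpi-libpath=${mpi_root}/lib --with-Rmpi-type=${mpi_type}"
}

# Example use (the tarball name is a placeholder):
#   R CMD INSTALL --configure-args="$(rmpi_configure_args "$SCINET_OPENMPI_ROOT")" Rmpi_VERSION.tar.gz
```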
&lt;br /&gt;
=== Running Rmpi ===&lt;br /&gt;
&lt;br /&gt;
To start using R with Rmpi, make sure you have all the required modules loaded (e.g. &amp;lt;tt&amp;gt;module load intel/2018.2 openmpi/3.1.0 r/3.5.0&amp;lt;/tt&amp;gt;), then launch it with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -np 1 R --no-save&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This starts a single master MPI process, along with the infrastructure needed to spawn additional processes.&lt;br /&gt;
&lt;br /&gt;
== Creating an R cluster ==&lt;br /&gt;
&lt;br /&gt;
The 'parallel' package allows you to use R to launch individual serial subjobs across multiple nodes.  This section describes how this is accomplished.&lt;br /&gt;
&lt;br /&gt;
=== Creating your Rscript wrapper ===&lt;br /&gt;
&lt;br /&gt;
The first thing to do is create a wrapper for Rscript.  This needs to be done because the R module needs to be loaded on all nodes, but the submission script only loads modules on the head node of the job.  The wrapper script, let's call it MyRscript.sh, is short:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
module load intel/2018.2 r/3.5.0&lt;br /&gt;
${SCINET_R_ROOT}/bin/Rscript --no-restore &amp;quot;$@&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The &amp;quot;--no-restore&amp;quot; flag prevents Rscript from loading your &amp;quot;workspace image&amp;quot;, if you have one saved.  Loading the image causes problems for the cluster.&lt;br /&gt;
&lt;br /&gt;
Once you've created your wrapper, make it executable:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ chmod u+x MyRscript.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Your wrapper is now ready to be used.&lt;br /&gt;
&lt;br /&gt;
=== The cluster R code ===&lt;br /&gt;
The R code we will run consists of two parts: the code that launches the cluster and does any pre- and post-analysis, and the code that runs on the individual cluster &amp;quot;nodes&amp;quot;.  Here is some code that demonstrates this functionality; let's call it MyClusterCode.R.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
######################################################&lt;br /&gt;
#&lt;br /&gt;
#  worker code&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
# first define the function which will be run on all the cluster nodes.  This is just a test function.  &lt;br /&gt;
# Put your real worker code here.&lt;br /&gt;
testfunc &amp;lt;- function(a) {&lt;br /&gt;
&lt;br /&gt;
  # this part is just to waste time&lt;br /&gt;
  b &amp;lt;- 0&lt;br /&gt;
  for (i in 1:10000) {&lt;br /&gt;
      b &amp;lt;- b + 1&lt;br /&gt;
  }&lt;br /&gt;
&lt;br /&gt;
  s &amp;lt;- Sys.info()['nodename']&lt;br /&gt;
  return(paste0(s, &amp;quot; &amp;quot;, a[1], &amp;quot; &amp;quot;, a[2]))&lt;br /&gt;
&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
######################################################&lt;br /&gt;
#&lt;br /&gt;
#  head node code&lt;br /&gt;
#&lt;br /&gt;
&lt;br /&gt;
# Create a bunch of index pairs to feed to the worker function.  These could be parameters,&lt;br /&gt;
# or whatever your code needs to vary across jobs.  Note that the worker function only &lt;br /&gt;
# takes a single argument; each entry in the list must contain all the information &lt;br /&gt;
# that the function needs to run.  In this example, each entry contains a list which&lt;br /&gt;
# contains two pieces of information, a pair of indices.&lt;br /&gt;
indexlist &amp;lt;- list()&lt;br /&gt;
index &amp;lt;- 1&lt;br /&gt;
for (i in 1:10) {&lt;br /&gt;
  for (j in 1:10) {&lt;br /&gt;
     indexlist[index] &amp;lt;- list(c(i,j))&lt;br /&gt;
     index &amp;lt;- index +1&lt;br /&gt;
   }&lt;br /&gt;
}&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
# Now set up the cluster.&lt;br /&gt;
&lt;br /&gt;
# First load the parallel library.&lt;br /&gt;
library(parallel)&lt;br /&gt;
&lt;br /&gt;
# Next find all the nodes which the scheduler has given to us.&lt;br /&gt;
# These are given by the SLURM_JOB_NODELIST environment variable.&lt;br /&gt;
nodelist &amp;lt;- Sys.getenv(&amp;quot;SLURM_JOB_NODELIST&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
node_ids &amp;lt;- unlist(strsplit(nodelist,split=&amp;quot;[^a-z0-9-]&amp;quot;))[-1]&lt;br /&gt;
&lt;br /&gt;
if (length(node_ids)&amp;gt;0) {&lt;br /&gt;
  expanded_ids &amp;lt;- lapply(node_ids, function (id) {&lt;br /&gt;
    ranges &amp;lt;- as.numeric(&lt;br /&gt;
      unlist(strsplit(id, split=&amp;quot;[-]&amp;quot;))&lt;br /&gt;
    )&lt;br /&gt;
    if (length(ranges)&amp;gt;1) seq(ranges[1], ranges[2], by=1) else ranges&lt;br /&gt;
  })&lt;br /&gt;
  &lt;br /&gt;
  nodelist &amp;lt;- sprintf(&amp;quot;nia%04d&amp;quot;, unlist(expanded_ids))&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
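# Now launch the cluster, using the list of nodes and our Rscript&lt;br /&gt;
# wrapper.  makePSOCKcluster starts one worker per entry in 'names', so this&lt;br /&gt;
# gives one worker per node; to use all 40 cores of each node you could pass&lt;br /&gt;
# rep(nodelist, each = 40) instead.&lt;br /&gt;
cl &amp;lt;- makePSOCKcluster(names = nodelist, rscript = &amp;quot;/path/to/your/MyRscript.sh&amp;quot;)&lt;br /&gt;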
&lt;br /&gt;
# Now run the worker code, using the parameter list we created above.&lt;br /&gt;
result &amp;lt;- clusterApplyLB(cl, indexlist, testfunc)&lt;br /&gt;
&lt;br /&gt;
# The results of all the jobs will now be put in the 'result' variable,&lt;br /&gt;
# in the order they were specified in the 'indexlist' variable.&lt;br /&gt;
&lt;br /&gt;
# Don't forget to stop the cluster when you're finished.&lt;br /&gt;
stopCluster(cl)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
You can, of course, add any post-processing code you need to the above code.&lt;br /&gt;
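&lt;br /&gt;
The node-list parsing in MyClusterCode.R assumes a compact Slurm list of the form &amp;lt;tt&amp;gt;nia[0001-0003,0010]&amp;lt;/tt&amp;gt;. Inside a job, Slurm's &amp;lt;tt&amp;gt;scontrol show hostnames&amp;lt;/tt&amp;gt; expands such a list in general, which is a handy way to check what your job received. The bash sketch below instead mirrors the R logic, under the same assumptions (a single prefix and at most one bracket group); the helper name is hypothetical.&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: expand a compact Slurm node list such as nia[0001-0003,0010]
# into one hostname per line, like the R code in MyClusterCode.R.
# Assumes a single prefix and at most one bracket group; expand_nodelist
# is a hypothetical helper. Inside a job,
#   scontrol show hostnames "$SLURM_JOB_NODELIST"
# handles the general case.
expand_nodelist() {
    local list="$1"
    # A single node has no brackets: print it unchanged.
    if [[ "$list" != *"["* ]]; then
        printf '%s\n' "$list"
        return
    fi
    local prefix="${list%%\[*}"    # text before the bracket, e.g. nia
    local inner="${list#*\[}"      # text after the bracket
    inner="${inner%\]}"            # drop the closing bracket
    local item lo hi width i
    for item in ${inner//,/ }; do  # comma-separated items
        lo="${item%-*}"            # range start (or the single number)
        hi="${item#*-}"            # range end (or the single number)
        width=${#lo}               # keep the zero padding, e.g. 4 digits
        i=$((10#$lo))              # force base 10 despite leading zeros
        hi=$((10#$hi))
        while [ "$i" -le "$hi" ]; do
            printf '%s%0*d\n' "$prefix" "$width" "$i"
            i=$((i + 1))
        done
    done
}
```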
&lt;br /&gt;
=== Submitting an R cluster job ===&lt;br /&gt;
You are now ready to submit your job to the Niagara queue.  The submission script is like most others:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --time=5:00:00&lt;br /&gt;
#SBATCH --job-name MyRCluster&lt;br /&gt;
&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
module load intel/2018.2 r/3.5.0&lt;br /&gt;
&lt;br /&gt;
${SCINET_R_ROOT}/bin/Rscript --no-restore MyClusterCode.R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Be sure to use whatever number of nodes, length of time, etc., is appropriate for your job.&lt;br /&gt;
&lt;br /&gt;
== SciNet's R Classes ==&lt;br /&gt;
&lt;br /&gt;
There is a dizzying amount of documentation available for programming in R; consult your favourite search engine.  That being said, SciNet runs several classes each year on using R for research:&lt;br /&gt;
* [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=msc1090&amp;amp;include=all&amp;amp;filter=Filter MSC1090]: Introduction to Computational BioStatistics with R.  This graduate-level, [https://ims.utoronto.ca IMS]-sponsored class is open to graduate students in IMS or other fields.  It is intended for those with little-to-no programming experience who wish to use R in scientific research.&lt;br /&gt;
* [https://support.scinet.utoronto.ca/education/browse.php?category=-1&amp;amp;search=ees1137&amp;amp;include=all&amp;amp;filter=Filter EES1137]: Quantitative Applications for Data Analysis.  [https://www.utsc.utoronto.ca/gradpes/ees1137h-quantitative-applications-data-analysis This class] is similar to MSC1090, but is taught at UTSC and is sponsored by the Department of Physical and Environmental Sciences.&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
	<entry>
		<id>https://docs.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=1209</id>
		<title>Data Management</title>
		<link rel="alternate" type="text/html" href="https://docs.scinet.utoronto.ca/index.php?title=Data_Management&amp;diff=1209"/>
		<updated>2018-08-08T20:34:36Z</updated>

		<summary type="html">&lt;p&gt;Afedosee: /* Scratch Disk Purging Policy */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Understanding the various file systems, and how to use them properly, is critical to optimizing your workflow and being a good SciNet citizen.  This page describes the various Niagara file systems, and how to properly use them.&lt;br /&gt;
&lt;br /&gt;
=Performance=&lt;br /&gt;
The file systems on SciNet, with the exception of archive, are [http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System GPFS], a high-performance file system which provides rapid reads and writes to large datasets in parallel from many nodes.  As a consequence of this design, however, '''the file system performs quite ''poorly'' at accessing data sets which consist of many, small files.'''  For instance, you will find that reading data in from one 16MB file is enormously faster than from 400 40KB files. Such small files are also quite wasteful of space, as the [https://en.wikipedia.org/wiki/Block_(data_storage) blocksize] for the scratch and project filesystems is 16MB. This is something you should keep in mind when planning your input/output strategy for runs on SciNet.&lt;br /&gt;
&lt;br /&gt;
For instance, if you run multi-process jobs, having each process write to a file of its own is not a scalable I/O solution. A directory gets locked by the first process accessing it, so all other processes have to wait for it. Not only has the code just become considerably less parallel, but chances are the file system will time out while waiting for your other processes, causing your program to crash mysteriously.&lt;br /&gt;
Consider using MPI-IO (part of the MPI-2 standard), which allows files to be opened simultaneously by different processes, or using a dedicated process for I/O to which all other processes send their data, and which subsequently writes this data to a single file.&lt;br /&gt;
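&lt;br /&gt;
When a workflow has already produced many small files, one common remedy is to pack them into a single archive before storing or moving them, so the file system handles one large file instead of thousands of small ones. Here is a minimal bash sketch using standard &amp;lt;tt&amp;gt;tar&amp;lt;/tt&amp;gt;; the function name and paths are illustrative.&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: bundle a directory of many small files into one tar archive.
# bundle_small_files is a hypothetical helper, not a SciNet tool.
bundle_small_files() {
    local srcdir="$1"     # directory holding the small files
    local archive="$2"    # output archive, e.g. results.tar
    # -C changes into the parent directory, so the archive stores paths
    # relative to it (e.g. small/f1.txt) instead of absolute paths.
    tar -cf "$archive" -C "$(dirname "$srcdir")" "$(basename "$srcdir")"
}
```
Individual members can later be listed with &amp;lt;tt&amp;gt;tar -tf&amp;lt;/tt&amp;gt; or extracted with &amp;lt;tt&amp;gt;tar -xf&amp;lt;/tt&amp;gt; without unpacking the whole archive.&lt;br /&gt;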
&lt;br /&gt;
= Purpose of each file system =&lt;br /&gt;
&lt;br /&gt;
Niagara accesses several different file systems.  Note that not all of these file systems are available to all users.&lt;br /&gt;
&lt;br /&gt;
== /home ==&lt;br /&gt;
/home is intended primarily for individual user files, common software or small datasets used by others in the same group, provided it does not exceed individual quotas. Otherwise you may consider /scratch or /project. /home is read-only on the compute nodes.&lt;br /&gt;
&lt;br /&gt;
== /scratch ==&lt;br /&gt;
/scratch is to be used primarily for temporary or transient files, for all the results of your computations and simulations, or for any material that can be easily recreated or reacquired. You may also use scratch for any intermediate step in your workflow, provided it does not induce too much I/O (Input/Output) or too many small files on this disk-based storage pool; otherwise you should consider the burst buffer (/bb). Once you have your final results, you may migrate those you want to keep for the long term to /project or /archive. /scratch is purged on a regular basis and has no backups.&lt;br /&gt;
&lt;br /&gt;
== /project ==&lt;br /&gt;
/project is intended for common group software, large static datasets, or any material that would be very costly for the group to reacquire or regenerate. &amp;lt;font color=red&amp;gt;Material on /project is expected to be relatively immutable over time.&amp;lt;/font&amp;gt; Temporary or transient files should be kept on scratch, not project. High data turnover induces the consumption of a lot of tapes on the TSM backup system, long after this material has been deleted, due to backup retention policies and the extra versions kept of the same file. Users abusing the project file system and using it as scratch will be flagged and contacted. Note that on Niagara /project is only available to groups with a RAC allocation.&lt;br /&gt;
&lt;br /&gt;
== /bb (burst buffer) ==&lt;br /&gt;
/bb, the [[Burst_Buffer| burst buffer]], is a very fast, very high performance alternative to /scratch, made of solid-state drives (SSD). You may request this resource if you anticipate a lot of IOPs (Input/Output Operations) or when you notice your job is not performing well running on scratch or project because of I/O (Input/Output) bottlenecks. See [[Burst_Buffer|here]] for more details.&lt;br /&gt;
&lt;br /&gt;
== /archive ==&lt;br /&gt;
/archive is a nearline storage pool for temporarily offloading semi-active material from any of the above file systems. In practice, users will offload or recall material as part of their regular workflow, or when they hit their quotas on scratch or project. That material can remain on HPSS for a few months to a few years. Note that on Niagara /archive is only available to groups with a RAC allocation.&lt;br /&gt;
&lt;br /&gt;
= Quotas and purging =&lt;br /&gt;
&lt;br /&gt;
You should familiarize yourself with the [[Data_Management#Purpose_of_each_file_system | various file systems]], what purpose they serve, and how to properly use them.  This table summarizes the various file systems.  &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! location&lt;br /&gt;
!colspan=&amp;quot;2&amp;quot;| quota&lt;br /&gt;
!align=&amp;quot;right&amp;quot;| block size&lt;br /&gt;
! expiration time&lt;br /&gt;
! backed up&lt;br /&gt;
! on login nodes&lt;br /&gt;
! on compute nodes&lt;br /&gt;
|-&lt;br /&gt;
| $HOME&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 100 GB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| read-only&lt;br /&gt;
|-&lt;br /&gt;
|rowspan=&amp;quot;6&amp;quot;| $SCRATCH&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 25 TB per user (dynamic per group)&lt;br /&gt;
|align=&amp;quot;right&amp;quot; rowspan=&amp;quot;6&amp;quot; | 16 MB&lt;br /&gt;
|rowspan=&amp;quot;6&amp;quot;| 2 months&lt;br /&gt;
|rowspan=&amp;quot;6&amp;quot;| no&lt;br /&gt;
|rowspan=&amp;quot;6&amp;quot;| yes&lt;br /&gt;
|rowspan=&amp;quot;6&amp;quot;| yes&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|up to 4 users per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|50TB&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|up to 11 users per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|125TB&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|up to 28 users per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|250TB&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|up to 60 users per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|400TB&lt;br /&gt;
|-&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|above 60 users per group&lt;br /&gt;
|align=&amp;quot;right&amp;quot;|500TB&lt;br /&gt;
|-&lt;br /&gt;
| $PROJECT&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 16 MB&lt;br /&gt;
| &lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|-&lt;br /&gt;
| $ARCHIVE&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| by group allocation&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| &lt;br /&gt;
|&lt;br /&gt;
| dual-copy&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|-&lt;br /&gt;
| $BBUFFER&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot;| 10 TB per user&lt;br /&gt;
|align=&amp;quot;right&amp;quot;| 1 MB&lt;br /&gt;
| very short&lt;br /&gt;
| no&lt;br /&gt;
| yes&lt;br /&gt;
| yes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;ul&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[https://docs.scinet.utoronto.ca/images/9/9a/Inode_vs._Space_quota_-_v2x.pdf Inode vs. Space quota (PROJECT and SCRATCH)]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;[https://docs.scinet.utoronto.ca/images/0/0e/Scratch-quota.pdf dynamic quota per group (SCRATCH)]&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Compute nodes do not have local storage.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Archive space is on [[HPSS|HPSS]].&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;Backup means a recent snapshot, not a copy of every file or version that ever existed.&amp;lt;/li&amp;gt;&lt;br /&gt;
&amp;lt;li&amp;gt;&amp;lt;p&amp;gt;&amp;lt;code&amp;gt;$BBUFFER&amp;lt;/code&amp;gt; stands for the [[Burst Buffer]], a faster parallel storage tier for temporary data.&amp;lt;/p&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How much Disk Space Do I have left? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;tt&amp;gt;'''/scinet/niagara/bin/diskUsage'''&amp;lt;/tt&amp;gt; command, available on the login nodes and datamovers, reports on your usage of the home, scratch, project and archive file systems in several ways: for example, how much disk space you and your group are using (with the -a option), how much your usage has changed over a given period (&amp;quot;delta information&amp;quot;), or plots of your usage over time. See the usage help below for details.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Usage: diskUsage [-h|-?] [-a] [-u &amp;lt;user&amp;gt;]&lt;br /&gt;
       -h|-?: help&lt;br /&gt;
       -a: list usage for all members of the group&lt;br /&gt;
       -u &amp;lt;user&amp;gt;: show usage for another user in your group&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check which of your directories contain more than 1000 files with the &amp;lt;tt&amp;gt;'''/scinet/niagara/bin/topUserDirOver1000list'''&amp;lt;/tt&amp;gt; command, and which contain more than 1 GB of data with the &amp;lt;tt&amp;gt;'''/scinet/niagara/bin/topUserDirOver1GBlist'''&amp;lt;/tt&amp;gt; command.&lt;br /&gt;
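&lt;br /&gt;
The following minimal sketch gives a similar per-directory file count for any tree, using only standard find, wc and sort. It is not the SciNet tool itself. The function name &amp;lt;code&amp;gt;count_files_per_dir&amp;lt;/code&amp;gt; and its arguments are hypothetical.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper, not the SciNet topUserDirOver1000list tool:
# for each immediate subdirectory of ROOT, count the regular files below it
# (recursively) and print the count, a tab and the path for every directory
# above LIMIT, largest count first.
count_files_per_dir() {
    local root="$1" limit="${2:-1000}"
    local d n
    find "$root" -mindepth 1 -maxdepth 1 -type d -print0 |
        while IFS= read -r -d '' d; do
            n=$(find "$d" -type f | wc -l)
            n=$((n))  # normalize: some wc versions pad the number with spaces
            if [ "$n" -gt "$limit" ]; then
                printf '%s\t%s\n' "$n" "$d"
            fi
        done | sort -rn
}

# Example: subdirectories of $SCRATCH holding more than 1000 files.
# count_files_per_dir "$SCRATCH" 1000
```
Because the count is recursive, a directory's total includes the files in its subdirectories. File names containing newlines would be overcounted.&lt;br /&gt;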
&lt;br /&gt;
Note: usage and quota information is updated only every 3 hours.&lt;br /&gt;
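&lt;br /&gt;
Because the reported figures can be up to 3 hours old, you may want to measure a single directory directly. The sketch below assumes GNU du and find are available. The helper name &amp;lt;code&amp;gt;usage_summary&amp;lt;/code&amp;gt; is hypothetical, and its numbers may differ slightly from the quota system's own accounting.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: report allocated space and an approximate inode count
# for one directory tree right now (GNU du and find assumed).
usage_summary() {
    local dir="$1" bytes inodes
    # Allocated space in bytes: -s summarizes, -B1 uses 1-byte blocks.
    bytes=$(du -s -B1 "$dir" | cut -f1)
    # Roughly one inode per entry; find lists every file, directory and
    # symlink, including DIR itself (hard links are counted once per path).
    inodes=$(find "$dir" | wc -l)
    printf 'space_bytes=%s inodes=%s\n' "$bytes" "$((inodes))"
}

# Example: usage_summary "$SCRATCH/myrun"
```
Both numbers matter, since the quotas above limit the number of files as well as the space they use.&lt;br /&gt;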
&lt;br /&gt;
== Scratch Disk Purging Policy ==&lt;br /&gt;
&lt;br /&gt;
To ensure that there is always enough space available for running jobs, '''on the 15th of each month we automatically delete files in /scratch that have not been accessed or modified in more than 2 months'''. Note that we recently changed the reference time to ''MostRecentOf(atime,ctime)''. This policy may be revised depending on its effectiveness. The purging process, and how to check whether your files will be deleted, are described below. If you have files scheduled for deletion, move them to a more permanent location, such as your departmental server, your /project space, or HPSS (if your PI has been allocated project or HPSS storage through the RAC).&lt;br /&gt;
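&lt;br /&gt;
To preview which of your files would qualify under this rule, you can ask find for files whose atime and ctime are both older than the cut-off, which is the same as their MostRecentOf(atime,ctime) being older. This is a sketch under assumptions: 2 months is approximated as 60 days, the helper name &amp;lt;code&amp;gt;purge_candidates&amp;lt;/code&amp;gt; is hypothetical, and the official monthly scan remains the authoritative list.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: list regular files under DIR whose atime AND ctime are
# both older than DAYS days, i.e. MostRecentOf(atime,ctime) is older than DAYS.
# 60 days is an assumed stand-in for "2 months"; the official scan decides.
purge_candidates() {
    local dir="${1:-$SCRATCH}" days="${2:-60}"
    # find truncates ages to whole days, so +60 matches 61 or more days.
    find "$dir" -type f -atime +"$days" -ctime +"$days"
}

# Example: count the files in your scratch space that look purgeable.
# purge_candidates "$SCRATCH" 60 | wc -l
```
Requiring both tests mirrors the reference time: a recent change to either timestamp keeps the file off the list.&lt;br /&gt;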
&lt;br /&gt;
On the '''first''' of each month, a list of files scheduled for purging is produced, and an email notification is sent to each user on that list. Users also get a shell notification every time they log in to Niagara. On or about the '''12th''' of each month, a second scan produces a more up-to-date assessment and another email notification is sent, so that users can double-check that they have taken care of all the files they need to relocate before the purging deadline. Those files will be automatically deleted on the '''15th''' of the same month unless they have been accessed or relocated in the meantime.&lt;br /&gt;
&lt;br /&gt;
If you have files scheduled for deletion, they will be listed in a file in /scratch/t/todelete/current whose name contains your userid and groupid. For example, user xxyz can check whether they have files scheduled for deletion by running the following command on a system that mounts /scratch (e.g. a Niagara login node): '''ls -l /scratch/t/todelete/current | grep xxyz'''. In the example below, the file name indicates that user xxyz belongs to group abc and has 9,560 files, taking up 1.0 TB, scheduled for deletion:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
 [xxyz@nia-login03 ~]$ ls -l /scratch/t/todelete/current | grep xxyz&lt;br /&gt;
 -rw-r----- 1 xxyz     root       1733059 Jan 17 11:46 3110001___xxyz_______abc_________1.00T_____9560files&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The file itself lists all files scheduled for deletion (in the last column) and can be viewed with standard commands such as more, less or cat, e.g.:&lt;br /&gt;
&lt;br /&gt;
'''more /scratch/t/todelete/current/3110001___xxyz_______abc_________1.00T_____9560files'''&lt;br /&gt;
&lt;br /&gt;
Similarly, you can check the other users in your group by grepping for your group name, e.g. '''ls -1 /scratch/t/todelete/current | grep abc'''. This lists every user in group abc (the group xxyz belongs to) who has files to be purged on the 15th. Members of the same group have access to each other's contents.&lt;br /&gt;
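&lt;br /&gt;
Because the file name packs the user, group, total size and file count into one string, a small parser can make the listing easier to read. This sketch assumes, based only on the single example above, that fields are separated by runs of underscores and that user and group names contain no underscores. The function name &amp;lt;code&amp;gt;parse_todelete_name&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a purge-list file name such as
#   3110001___xxyz_______abc_________1.00T_____9560files
# Assumed layout (inferred from one example): id, user, group, size, count,
# separated by runs of underscores.
parse_todelete_name() {
    # Turn every run of underscores into a single space, then split on spaces.
    # shellcheck disable=SC2046
    set -- $(basename "$1" | tr -s '_' ' ')
    printf 'user=%s group=%s size=%s files=%s\n' "$2" "$3" "$4" "${5%files}"
}

# Example: decode every purge list that mentions group abc.
# for f in /scratch/t/todelete/current/*abc*; do parse_todelete_name "$f"; done
```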
&lt;br /&gt;
'''NOTE:''' Preparing these assessments takes several hours, so if you change the access or modification time of a file in the meantime, the change will not be detected until the next cycle. For immediate feedback, run '''ls -lu''' on the file to see its atime and '''ls -lc''' to see its ctime. If the file's atime or ctime has been updated in the meantime, it will no longer be deleted on the 15th.&lt;br /&gt;
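&lt;br /&gt;
The same check can be scripted with GNU stat, which prints atime (%X) and ctime (%Z) as Unix timestamps. The sketch below prints MostRecentOf(atime,ctime) and its age in whole days. The helper name &amp;lt;code&amp;gt;purge_reference_time&amp;lt;/code&amp;gt; is hypothetical, and it does not replicate the exact logic of the official scan.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: print MostRecentOf(atime, ctime) of FILE as a Unix
# timestamp, followed by its age in whole days (GNU stat assumed:
# %X = atime, %Z = ctime, both in seconds since the epoch).
purge_reference_time() {
    local atime ctime ref
    atime=$(stat -c %X "$1") || return 1
    ctime=$(stat -c %Z "$1") || return 1
    if [ "$atime" -gt "$ctime" ]; then ref=$atime; else ref=$ctime; fi
    printf '%s %s\n' "$ref" "$(( ($(date +%s) - ref) / 86400 ))"
}

# Example: purge_reference_time "$SCRATCH/results/run1.dat"
```
An age comfortably below the purge cut-off means the file should not appear in the next scan.&lt;br /&gt;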
&lt;br /&gt;
= Moving data =&lt;br /&gt;
&lt;br /&gt;
Data for analysis and final results need to be moved to and from Niagara. There are several ways to do this.&lt;br /&gt;
&lt;br /&gt;
== Using rsync/scp ==&lt;br /&gt;
&lt;br /&gt;
Move amounts of less than about 10 GB through the login nodes.&lt;br /&gt;
* Niagara login nodes and datamovers are visible from outside SciNet.&lt;br /&gt;
* Use scp or rsync to niagara.scinet.utoronto.ca or niagara.computecanada.ca (the two names are equivalent).&lt;br /&gt;
* Transfers through the login nodes will time out for amounts larger than about 10 GB.&lt;br /&gt;
&lt;br /&gt;
Move amounts larger than 10 GB through the datamover nodes.&lt;br /&gt;
* From a Niagara login node, ssh to &amp;lt;code&amp;gt;nia-datamover1&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;nia-datamover2&amp;lt;/code&amp;gt;. From there you can transfer data to or from Niagara.&lt;br /&gt;
* Alternatively, you can log in, scp or rsync directly to the datamovers from outside:&lt;br /&gt;
  nia-datamover1.scinet.utoronto.ca&lt;br /&gt;
  nia-datamover2.scinet.utoronto.ca&lt;br /&gt;
* If you do this often, consider using [https://docs.computecanada.ca/wiki/Globus Globus], a web-based tool for data transfer.&lt;br /&gt;
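&lt;br /&gt;
The size rule above can be expressed as a small helper that picks a host from the number of bytes to transfer. This is a sketch under assumptions: the 10 GB cut-off is taken as 10 GiB, nia-datamover1 is chosen (nia-datamover2 would work equally), and USERNAME and REMOTE_DIR in the example are placeholders. The function &amp;lt;code&amp;gt;pick_transfer_host&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a Niagara host for a transfer of BYTES bytes,
# following the "about 10 GB" rule (assumed here to mean 10 GiB).
LIMIT_BYTES=$((10 * 1024 * 1024 * 1024))

pick_transfer_host() {
    if [ "$1" -gt "$LIMIT_BYTES" ]; then
        echo nia-datamover1.scinet.utoronto.ca
    else
        echo niagara.scinet.utoronto.ca
    fi
}

# Example (USERNAME and REMOTE_DIR are placeholders):
#   bytes=$(du -s -B1 mydata | cut -f1)
#   rsync -av --partial mydata/ "USERNAME@$(pick_transfer_host "$bytes"):REMOTE_DIR/"
```
The rsync option --partial keeps partially transferred files, so an interrupted large transfer can be resumed by rerunning the same command.&lt;br /&gt;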
&lt;br /&gt;
== Using Globus ==&lt;br /&gt;
Please see the comprehensive documentation on the [https://docs.computecanada.ca/wiki/Globus Compute Canada wiki] and on the [[Globus | SciNet Globus page]].&lt;br /&gt;
&lt;br /&gt;
The Niagara endpoint is &amp;quot;computecanada#niagara&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Moving data to HPSS/Archive/Nearline ==&lt;br /&gt;
HPSS is for long-term storage of data.&lt;br /&gt;
* [[HPSS]] is a tape-based storage solution and is SciNet's nearline (archive) facility.&lt;br /&gt;
* Storage space on HPSS is allocated through the annual [https://www.computecanada.ca/research-portal/accessing-resources/resource-allocation-competitions Compute Canada RAC allocation].&lt;br /&gt;
&lt;br /&gt;
=File/Ownership Management (ACL)=&lt;br /&gt;
* By default at SciNet, users within the same group can read (but not write) each other's files.&lt;br /&gt;
* You can use an access control list ('''ACL''') to allow your supervisor (or another user within your group) to manage files for you (i.e., create, move, rename and delete them), while you keep your access and permissions as the original owner of the files/directories. The same mechanism can also give individual users in other groups, or entire other groups, read and execute access to your files.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
===Using  setfacl/getfacl===&lt;br /&gt;
* To allow [supervisor] to manage files in /project/g/group/[owner] using '''setfacl''' and '''getfacl''' commands, follow the 3-steps below as the [owner] account from a shell:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ /scinet/gpc/bin/setfacl -d -m user:[supervisor]:rwx /project/g/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will inherit [supervisor] ownership by default from now on)&lt;br /&gt;
&lt;br /&gt;
2) $ /scinet/gpc/bin/setfacl -d -m user:[owner]:rwx /project/g/group/[owner]&lt;br /&gt;
   (but will also inherit [owner] ownership, ie, ownership of both by default, for files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
3) $ /scinet/gpc/bin/setfacl -Rm user:[supervisor]:rwx /project/g/group/[owner]&lt;br /&gt;
   (recursively modify all *existing* files/directories inside [owner] to also be rwx by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/getfacl /project/g/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ /scinet/gpc/bin/setfacl -b /project/g/group/[owner]&lt;br /&gt;
   (to remove any previously set ACL)&lt;br /&gt;
&lt;br /&gt;
PS: on the datamovers getfacl, setfacl and chacl will be on your path&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
For more information on using [http://linux.die.net/man/1/setfacl &amp;lt;tt&amp;gt;setfacl&amp;lt;/tt&amp;gt;] or [http://linux.die.net/man/1/getfacl &amp;lt;tt&amp;gt;getfacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
==Using mmputacl/mmgetacl==&lt;br /&gt;
* You can use GPFS's native '''mmputacl''' and '''mmgetacl''' commands. Their advantages are that you can set &amp;quot;control&amp;quot; permission and that [http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.doc%2Fgpfs31%2Fbl1adm1160.html POSIX or NFS v4 style ACLs] are supported. First, create a /tmp/supervisor.acl file with the following contents:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
user::rwxc&lt;br /&gt;
group::----&lt;br /&gt;
other::----&lt;br /&gt;
mask::rwxc&lt;br /&gt;
user:[owner]:rwxc&lt;br /&gt;
user:[supervisor]:rwxc&lt;br /&gt;
group:[othergroup]:r-xc&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
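&lt;br /&gt;
Writing this file by hand works fine. If you set up ACLs for several directories, a small generator avoids typos in the user and group names. The helper &amp;lt;code&amp;gt;write_supervisor_acl&amp;lt;/code&amp;gt; below is hypothetical: it reproduces exactly the entries shown above, with the owner, supervisor and (optional) other group passed as arguments.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Hypothetical helper: write a GPFS ACL file with the same entries as the
# example above. Arguments: OWNER SUPERVISOR [OTHERGROUP] [OUTFILE].
write_supervisor_acl() {
    local owner="$1" supervisor="$2" othergroup="$3"
    local out="${4:-/tmp/supervisor.acl}"
    {
        printf '%s\n' 'user::rwxc' 'group::----' 'other::----' 'mask::rwxc'
        # printf reuses its format for each argument: one line per user.
        printf 'user:%s:rwxc\n' "$owner" "$supervisor"
        if [ -n "$othergroup" ]; then
            printf 'group:%s:r-xc\n' "$othergroup"
        fi
    } > "$out"
}

# Example (SUPERVISOR_ID and OTHERGROUP_ID are placeholders):
#   write_supervisor_acl "$USER" SUPERVISOR_ID OTHERGROUP_ID
```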
&lt;br /&gt;
Then run the following two commands:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
1) $ mmputacl -i /tmp/supervisor.acl /project/g/group/[owner]&lt;br /&gt;
   (applies the ACL to the [owner] directory itself)&lt;br /&gt;
2) $ mmputacl -d -i /tmp/supervisor.acl /project/g/group/[owner]&lt;br /&gt;
   (every *new* file/directory inside [owner] will inherit this ACL by default, so both [owner] and [supervisor]&lt;br /&gt;
   get rwxc permissions on it, including on files/directories created by [supervisor])&lt;br /&gt;
&lt;br /&gt;
   $ mmgetacl /project/g/group/[owner]&lt;br /&gt;
   (to determine the current ACL attributes)&lt;br /&gt;
&lt;br /&gt;
   $ mmdelacl -d /project/g/group/[owner]&lt;br /&gt;
   (to remove the default ACL inherited by new files; run mmdelacl without -d to remove the directory's own ACL)&lt;br /&gt;
&lt;br /&gt;
   $ mmeditacl /project/g/group/[owner]&lt;br /&gt;
   (to create or change a GPFS access control list)&lt;br /&gt;
   (this command requires the EDITOR environment variable to be set, e.g. export EDITOR=/usr/bin/vi)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
NOTES:&lt;br /&gt;
* No GPFS built-in command recursively adds or removes ACL attributes on existing files. You need to apply mmputacl with the -i option, as above, to each file or directory individually. [[Recursive_ACL_script | Here is a sample bash script you can use for that purpose.]]&lt;br /&gt;
&lt;br /&gt;
* When a directory is copied into another directory that already has ACLs, mmputacl will not overwrite the copied directory's original Linux group permissions; this is why you may sometimes see an &amp;quot;#effective:r-x&amp;quot; note in the output of mmgetacl. If you want to give rwx permissions to everyone in your group, simply use the plain Unix 'chmod g+rwx' command. You can do that before or after copying the original material into the folder with the ACLs.&lt;br /&gt;
&lt;br /&gt;
* For PROJECT, your group's supervisor will need to set an appropriate ACL at the /project/G/GROUP level to let users from other groups access your files.&lt;br /&gt;
&lt;br /&gt;
* ACLs do not let you grant permissions on files or directories that you do not own.&lt;br /&gt;
&lt;br /&gt;
* We strongly recommend that you never give other users write permission on the top level of your home directory (/home/G/GROUP/[owner]): that would seriously compromise your privacy and would also disable SSH key authentication, among other things. If necessary, create specific subdirectories under your home directory in which other users can access and manipulate files.&lt;br /&gt;
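&lt;br /&gt;
As the first note says, applying an ACL to existing content means running mmputacl on each path. A minimal sketch of that loop with find follows. It is not the linked SciNet script; the function name &amp;lt;code&amp;gt;apply_acl_recursively&amp;lt;/code&amp;gt; and the MMPUTACL preview variable are hypothetical, and symbolic links are skipped as a precaution.&lt;br /&gt;
```shell
#!/usr/bin/env bash
# Sketch, not the linked SciNet script: apply an ACL file to an existing tree
# with mmputacl, one path at a time. MMPUTACL is a hypothetical hook so the
# commands can be previewed first (e.g. MMPUTACL=echo).
apply_acl_recursively() {
    local acl_file="$1" top="$2" cmd="${MMPUTACL:-mmputacl}"
    # Access ACL on every file and directory (symbolic links skipped).
    find "$top" ! -type l -exec "$cmd" -i "$acl_file" {} \;
    # Default (inherited) ACL on directories only.
    find "$top" -type d -exec "$cmd" -d -i "$acl_file" {} \;
}

# Example: preview first, then apply for real (OWNER is a placeholder).
#   MMPUTACL=echo apply_acl_recursively /tmp/supervisor.acl /project/g/group/OWNER
#   apply_acl_recursively /tmp/supervisor.acl /project/g/group/OWNER
```
Running it with MMPUTACL=echo prints each command without changing anything, which is a cheap way to check the scope before touching a large tree.&lt;br /&gt;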
&lt;br /&gt;
For more information on using [https://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs100.doc/bl1adm_mmputacl.htm &amp;lt;tt&amp;gt;mmputacl&amp;lt;/tt&amp;gt;] or [https://www.ibm.com/support/knowledgecenter/SSFKCN_4.1.0/com.ibm.cluster.gpfs.v4r1.gpfs100.doc/bl1adm_mmgetacl.htm &amp;lt;tt&amp;gt;mmgetacl&amp;lt;/tt&amp;gt;] see their man pages.&lt;br /&gt;
&lt;br /&gt;
==Recursive ACL script ==&lt;br /&gt;
You can use or adapt '''[[Recursive_ACL_script| this sample bash script]]''' to recursively add or remove ACL attributes using GPFS built-in commands.&lt;br /&gt;
&lt;br /&gt;
Courtesy of Agata Disks (http://csngwinfo.in2p3.fr/mediawiki/index.php/GPFS_ACL)&lt;/div&gt;</summary>
		<author><name>Afedosee</name></author>
	</entry>
</feed>