Balam
| Balam | |
|---|---|
| Installed | October 2023 |
| Operating System | Linux (Rocky 9.2) |
| Number of Nodes | 10 |
| Interconnect | InfiniBand |
| RAM/Node | 1 TB |
| Cores/Node | 64 |
| GPUs/Node | 4 A100-40GB |
| Login/Devel Node | balam-login01 |
| Vendor Compilers | cuda/intel/gcc |
| Queue Submission | Slurm |
Specifications
The Balam cluster is owned by the Acceleration Consortium at the University of Toronto and hosted at SciNet. The cluster consists of 10 x86_64 nodes, each with two Intel Xeon(R) Platinum 8358 32-core CPUs running at 2.6 GHz, 1 TB of RAM, and four NVIDIA A100 GPUs.
The nodes are interconnected with InfiniBand for internode communications and for disk I/O to the SciNet Niagara file systems. In total, the cluster contains 640 CPU cores and 40 GPUs.
Access is available only to those affiliated with the Acceleration Consortium. Support requests should be sent to support@scinet.utoronto.ca.
Getting started on Balam
Balam can be accessed directly.
ssh -Y MYCCUSERNAME@balam.scinet.utoronto.ca
Alternatively, the Balam login node balam-login01 can be reached via the Niagara cluster.
ssh -Y MYCCUSERNAME@niagara.scinet.utoronto.ca
ssh -Y balam-login01
Storage
The file system for Balam is currently shared with the Niagara cluster. See Niagara Storage for more details.
Loading software modules
You have two options for running code on Balam: use existing software, or compile your own. This section focuses on the former.
Other than essentials, all installed software is made available using module commands. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be found on the modules page.
Common module subcommands are (an example session follows this list):

module load <module-name>
: load the default version of a particular software.

module load <module-name>/<module-version>
: load a specific version of a particular software.

module purge
: unload all currently loaded modules.

module spider (or module spider <module-name>)
: list available software packages.

module avail
: list loadable software packages.

module list
: list loaded modules.
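For example, a typical sequence on Balam might look like this (using one of the CUDA versions listed further below):

module spider cuda          # see which versions of cuda are available
module load cuda/12.3.1     # load a specific version
module list                 # confirm which modules are now loaded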
Along with modifying common environment variables such as PATH and LD_LIBRARY_PATH, these modules also create a MODULE_MODULENAME_PREFIX environment variable, which can be used to access commonly needed directories of the installed software, such as its include and lib directories.
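As a sketch, assuming the cuda module follows the naming pattern above and so defines MODULE_CUDA_PREFIX, the variable could be used in a compile line like this (app.c is a hypothetical source file):

module load cuda/12.3.1 gcc/12.3.0
gcc -I${MODULE_CUDA_PREFIX}/include -L${MODULE_CUDA_PREFIX}/lib -o app app.c -lcudart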
There are handy abbreviations for the module commands: ml is the same as module list, and ml <module-name> is the same as module load <module-name>.
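For example:

ml cuda/12.3.1    # equivalent to: module load cuda/12.3.1
ml                # equivalent to: module list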
Software stacks: BalamEnv and CCEnv
On Balam, there are two available software stacks:
BalamEnv
A software stack with modules specific to Balam, tuned and compiled for this machine. This stack is loaded by default, but if it is not, it can be reloaded with
module load BalamEnv
This loads the default set of modules, which is currently the 2023a epoch.
No modules other than BalamEnv are loaded by default on Balam.
CCEnv
The same software stack as is available on the Alliance (formerly Compute Canada) general-purpose clusters, loaded with:
module load CCEnv
Or, if you want the same default modules loaded as on Béluga and Narval, then do
module load CCEnv StdEnv
or, if you want the same default modules loaded as on Cedar and Graham, do
module load CCEnv arch/avx2 StdEnv/2020
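For example, to switch to the Alliance stack and confirm which default modules came with it:

module load CCEnv StdEnv
module list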
Available compilers and interpreters
- In the BalamEnv, the cuda module must be loaded first for GPU software.
- To compile MPI code, you must additionally load an openmpi module.
CUDA
The currently installed CUDA versions are 11.8.0 and 12.3.1.
module load cuda/<version>
The current NVIDIA driver version is 535.104.12. Use nvidia-smi -a for full details.
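As an illustration, a minimal sketch of compiling a GPU program in the BalamEnv (saxpy.cu is a hypothetical source file; sm_80 is the compute capability of the A100):

module load cuda/12.3.1 gcc/12.3.0    # cuda first, then a compatible compiler
nvcc -arch=sm_80 -o saxpy saxpy.cu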
Other Compilers and Tools
Other available compiler modules are:
gcc/12.3.0
: GNU Compiler Collection, compatible with CUDA 12.3

gcc/13.2.0
: GNU Compiler Collection, incompatible with CUDA 12.3

intel/2023u1
: Intel compiler suite
OpenMPI
The openmpi/5.0.0 module becomes available once gcc/13.2.0 is loaded.
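A minimal sketch of building and test-running an MPI program with this stack (hello_mpi.c is a hypothetical source file; production runs should go through the Slurm scheduler listed in the table above):

module load gcc/13.2.0 openmpi/5.0.0
mpicc -o hello_mpi hello_mpi.c
mpirun -np 4 ./hello_mpi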