Mist
Mist | |
---|---|
Installed | Dec 2019 |
Operating System | Red Hat Enterprise Linux 7.6 |
Number of Nodes | 54 IBM AC922 |
Interconnect | Mellanox EDR |
RAM/Node | 256 GB |
GPUs/Node | 4 V100-SXM2-32GB |
Login/Devel Node | mist.scinet.utoronto.ca |
Vendor Compilers | IBM XL |
Queue Submission | Slurm |
Warning
Mist is in an early-user (beta) testing phase. All instructions below are temporary and subject to change.
Specifications
The Mist cluster is a GPU cluster of 54 IBM AC922 servers. Each node has 32 IBM Power9 cores, 256 GB of RAM, and 4 NVIDIA V100-SXM2-32GB GPUs connected by NVLink. The InfiniBand EDR interconnect provides GPUDirect RDMA capability.
Getting started on Mist
Mist is currently in a testing phase. The Mist login node, mist-login01, is reached through the Niagara cluster: log in to Niagara first, then connect to mist-login01 from there.
ssh -Y MYCCUSERNAME@niagara.scinet.utoronto.ca
ssh -Y mist-login01
Storage
The filesystem for Mist is shared with the Niagara cluster. See Niagara Storage for more details.
Loading software modules
You have two options for running code on Mist: use existing software, or compile your own. This section focuses on the former.
Other than essentials, all installed software is made available using module commands. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be found on the modules page.
Common module subcommands are:
- module load <module-name>: load the default version of a software package.
- module load <module-name>/<module-version>: load a specific version of a software package.
- module purge: unload all currently loaded modules.
- module spider (or module spider <module-name>): list available software packages.
- module avail: list loadable software packages.
- module list: list loaded modules.
Along with modifying common environment variables, such as PATH and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.
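As a minimal sketch of how these variables might be used when compiling by hand, the helper below builds include flags from the SCINET_MODULENAME_ROOT variables of the modules you name. The function name scinet_flags is hypothetical and not part of the module system. The naming rule (upper-cased module name, with other characters turned into underscores) is an assumption; check the actual names with `env | grep SCINET` after loading your modules. Library directories are left out on purpose, since they differ between packages (lib versus lib64).

```shell
# scinet_flags: hypothetical helper, not provided by the module system.
# Prints one -I flag per module name, using the SCINET_<MODULENAME>_ROOT
# variable that a loaded module is expected to export.
# Assumption: the variable name is the upper-cased module name, with any
# character outside A-Z and 0-9 replaced by an underscore.
scinet_flags() {
    flags=""
    for name in "$@"; do
        upper=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]' | tr -c 'A-Z0-9' '_')
        var="SCINET_${upper}_ROOT"
        # Indirect lookup of the variable whose name is stored in $var.
        eval "root=\${$var:-}"
        if [ -z "$root" ]; then
            echo "scinet_flags: $var is not set; is the $name module loaded?" >&2
            return 1
        fi
        flags="$flags -I$root/include"
    done
    printf '%s\n' "${flags# }"
}

# Possible use after 'module load cuda/10.2.89 gcc/7.5.0':
#   gcc $(scinet_flags cuda) -c host_code.c
```

The function fails with a message instead of emitting a broken flag when a variable is missing, which usually means the module has not been loaded in the current shell.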
There are handy abbreviations for the module commands: ml is the same as module list, and ml <module-name> is the same as module load <module-name>.
Tips for loading software
- We advise against loading modules in your .bashrc. This can lead to very confusing behaviour under certain circumstances. Our guidelines for .bashrc files can be found here.
- Instead, load modules by hand when needed, or by sourcing a separate script.
- Load run-specific modules inside your job submission script.
- Short names give default versions; e.g. cuda → cuda/10.2.89. It is usually better to be explicit about the versions, for future reproducibility.
- Modules often require other modules to be loaded first. Solve these dependencies by using module spider.
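To make the tip about explicit versions easier to follow, here is a small sketch that scans a job script and prints every module loaded without a version. The helper name unpinned_modules is hypothetical. It only recognises lines that begin with `module load`, so the `ml` shorthand is not covered.

```shell
# unpinned_modules: hypothetical helper that prints every module named on a
# "module load" line of the given script without an explicit /<version>.
unpinned_modules() {
    awk '$1 == "module" && $2 == "load" {
             for (i = 3; i <= NF; i++) {
                 if ($i ~ /^#/) break        # stop at a trailing comment
                 if ($i !~ /\//) print $i    # no "/" means no version given
             }
         }' "$1"
}

# Possible use before submitting:
#   unpinned_modules jobscript.sh
```

An empty result means every `module load` line in the script names an explicit version.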
Available compilers and interpreters
- The cuda module must be loaded first for GPU software.
- For most compiled software, one should use the GNU compilers (gcc for C, g++ for C++, and gfortran for Fortran). Loading an at (IBM Advance Toolchain) or gcc module makes these available.
- The IBM XL compiler suite (xlc_r, xlc++_r, xlf_r) is also available if you load one of the xl modules.
- To compile MPI code, you must additionally load an openmpi or spectrum-mpi module.
CUDA
The currently installed CUDA Toolkits are 10.1.243 and 10.2.89 (default).
module load cuda/<version>
- A compiler (GCC, XL or PGI) module must be loaded in order to use CUDA to build any code.
The current NVIDIA driver version is 440.33.01.
GNU Compilers
Available GCC modules are:
gcc/7.5.0
gcc/8.3.0
IBM XL Compilers
To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run
module load xl/16.1.1.3
IBM XL Compilers are enabled for use with NVIDIA GPUs, including support for OpenMP GPU offloading and integration with NVIDIA's nvcc command to compile host-side code for the POWER9 CPU. Information about the IBM XL Compilers can be found at the following links: IBM XL C/C++, IBM XL Fortran
OpenMPI
The openmpi/<version> module is available with different compilers, including GCC and XL. The spectrum-mpi/<version> module provides IBM Spectrum MPI.
PGI
To load the PGI compiler and its own OpenMPI environment, run:
module load pgi/19.10
module load openmpi/3.1.3-pgi-19.10
Software
Anaconda (Python)
Anaconda is a popular distribution of the Python programming language. It contains several common Python libraries, such as SciPy and NumPy, as pre-built packages, which eases installation. Anaconda is provided as the anaconda3 module.
To set up your own Python environment, load the module and create a conda environment:
module load anaconda3
conda create -n myPythonEnv python=3.7
- Note: By default, conda environments are located in $HOME/.conda/envs. The cache (downloaded tarballs and packages) is under $HOME/.conda/pkgs. You may run into disk quota problems if you create too many environments. To clean the conda cache, run "conda clean -y --all" and "rm -rf $HOME/.conda/pkgs/*" after installing packages.
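Before cleaning, it can help to see how much space the two directories named in the note actually use. The following is a minimal sketch; the helper name conda_usage is hypothetical, and the optional argument exists only so the function can be pointed at a different base directory.

```shell
# conda_usage: hypothetical helper that reports disk usage, in kilobytes, of
# the conda environment directory (envs) and package cache (pkgs).
# The base directory defaults to $HOME/.conda, as described in the note.
conda_usage() {
    base="${1:-$HOME/.conda}"
    for sub in envs pkgs; do
        if [ -d "$base/$sub" ]; then
            # du -sk prints "<kilobytes> <path>"; keep only the first field.
            kb=$(du -sk "$base/$sub" | awk '{print $1}')
        else
            kb=0
        fi
        printf '%s\t%s\n' "$sub" "$kb"
    done
}
```

Running `conda_usage` before and after `conda clean -y --all` shows how much of the quota the cache was holding.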
To activate the conda environment (this should be done before running python):
source activate myPythonEnv
Note that you SHOULD NOT use conda activate myPythonEnv to activate the environment, as this leads to a variety of problems. Once the environment is activated, you can update or install packages via conda or pip:
conda install <package_name> # preferred way to install packages
pip install <package_name>
To deactivate:
source deactivate
To remove a conda environment:
conda remove --name myPythonEnv --all
To verify that the environment was removed, run:
conda info --envs
Submitting Python Job
A single-gpu job example:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --time=1:00:0
#SBATCH -A <SOSCIP_PROJECT_ID> #For SOSCIP projects only
module load anaconda3
source activate myPythonEnv
python code.py ...
CuPy
CuPy is an open-source matrix library accelerated with NVIDIA CUDA. It also uses CUDA-related libraries including cuBLAS, cuDNN, cuRand, cuSolver, cuSPARSE, cuFFT and NCCL to make full use of the GPU architecture. CuPy is an implementation of NumPy-compatible multi-dimensional array on CUDA. CuPy consists of the core multi-dimensional array class, cupy.ndarray, and many functions on it. It supports a subset of numpy.ndarray interface.
CuPy can be installed into any conda environment. The Python packages numpy, six and fastrlock are required. cuDNN and NCCL are optional.
module load anaconda3/2019.10 cuda/10.2.89 gcc/7.5.0 cudnn/7.6.5.32 nccl/2.5.6
conda create -n cupy-env python=3.7 numpy six fastrlock
source activate cupy-env
# Building and installing CuPy will take a few minutes.
CFLAGS="-I$SCINET_CUDNN_ROOT/include -I$SCINET_NCCL_ROOT/include -I$SCINET_CUDA_ROOT/include" \
LDFLAGS="-L$SCINET_CUDNN_ROOT/lib64 -L$SCINET_NCCL_ROOT/lib" \
CUDA_PATH=$SCINET_CUDA_ROOT \
pip install cupy
IBM Watson Machine Learning Community Edition (PowerAI)
IBM Watson Machine Learning Community Edition (PowerAI) contains many popular ML packages, including TensorFlow, PyTorch, XGBoost and RAPIDS. It is distributed through the IBM Conda channel. To install packages from PowerAI, specify the IBM Conda channel when using Anaconda.
module load anaconda3
# <package_name> can be, e.g., powerai, tensorflow-gpu, keras, pytorch, powerai-rapids, py-xgboost-gpu, etc.
conda create --name wmlce_env -c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda <package_name>
source activate wmlce_env
NAMD
NAMD is a parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems.
v2.13
module load cuda/10.2.89 gcc/7.5.0 fftw/3.3.8 spectrum-mpi/10.3.1 namd/2.13
Running with one process per node
An example of the job script (using 1 node, one process per node, 32 CPU threads per process + 4 GPUs per process):
#!/bin/bash
#SBATCH --time=20:00
#SBATCH --gpus-per-node=4
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH -p compute_full_node
module load cuda/10.2.89 gcc/7.5.0 fftw/3.3.8 spectrum-mpi/10.3.1 namd/2.13
scontrol show hostnames > nodelist-$SLURM_JOB_ID
`which charmrun` -npernode 1 -hostfile nodelist-$SLURM_JOB_ID `which namd2` +setcpuaffinity +pemap 0-127:4 +idlepoll +ppn 32 +p $((32*SLURM_NTASKS)) stmv.namd
Running with one process per GPU
NAMD may scale better if using one process per GPU. Please do your own benchmark. An example of the job script (using 1 node, one process per GPU, 8 CPU threads per process):
#!/bin/sh
#SBATCH --time=20:00
#SBATCH --gpus-per-node=4
#SBATCH --ntasks=4
#SBATCH --nodes=1
#SBATCH -p compute_full_node
module load cuda/10.2.89 gcc/7.5.0 fftw/3.3.8 spectrum-mpi/10.3.1 namd/2.13
scontrol show hostnames > nodelist-$SLURM_JOB_ID
`which charmrun` -npernode 4 -hostfile nodelist-$SLURM_JOB_ID `which namd2` +setcpuaffinity +pemap 0-127:4 +idlepoll +ppn 8 +p $((8*SLURM_NTASKS)) stmv.namd
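The two job scripts above differ only in how the 32 cores of a node are divided among processes. The sketch below reproduces that arithmetic so the values of --ntasks, -npernode, +ppn and +p stay consistent with one another. The helper name namd_threads is hypothetical. Using it for more than one node extrapolates from the `+p $((ppn*SLURM_NTASKS))` pattern in the scripts and has not been confirmed by a benchmark here.

```shell
# namd_threads: hypothetical helper reproducing the arithmetic of the two
# NAMD job scripts above. Arguments: <nodes> <processes-per-node>.
# Assumes 32 physical cores per node, as stated in the specifications.
namd_threads() {
    nodes=$1
    per_node=$2
    cores=32
    if [ "$per_node" -lt 1 ] || [ $((cores % per_node)) -ne 0 ]; then
        echo "namd_threads: $per_node processes do not evenly divide $cores cores" >&2
        return 1
    fi
    ppn=$((cores / per_node))       # worker threads per process (+ppn)
    ntasks=$((nodes * per_node))    # value for #SBATCH --ntasks
    echo "--ntasks=$ntasks -npernode $per_node +ppn $ppn +p $((ppn * ntasks))"
}

namd_threads 1 1   # prints: --ntasks=1 -npernode 1 +ppn 32 +p 32
namd_threads 1 4   # prints: --ntasks=4 -npernode 4 +ppn 8 +p 32
```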
PyTorch
Installing from IBM Conda Channel
The easiest way to install PyTorch on Mist is to use IBM's Conda channel. Prepare a conda environment with Python 3.6 or 3.7, then install PyTorch from IBM's Conda channel.
module load anaconda3
conda create -n pytorch_env python=3.7
source activate pytorch_env
conda install -c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/ pytorch
Once the installation finishes, please clean the cache:
conda clean -y --all
rm -rf $HOME/.conda/pkgs/*
RAPIDS
RAPIDS is a suite of open-source software libraries for running end-to-end data science and analytics pipelines entirely on GPUs. The RAPIDS data science framework includes a collection of libraries: cuDF (GPU DataFrames), cuML (GPU Machine Learning Algorithms), cuStrings (GPU String Manipulation), etc.
Installing from IBM Conda Channel
The easiest way to install RAPIDS on Mist is to use IBM's Conda channel. Prepare a conda environment with Python 3.6 or 3.7, then install powerai-rapids from IBM's Conda channel.
module load anaconda3
conda create -n rapids_env python=3.7
source activate rapids_env
conda install -c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/ powerai-rapids
Once the installation finishes, please clean the cache:
conda clean -y --all
rm -rf $HOME/.conda/pkgs/*
TensorFlow and Keras
Installing from IBM Conda Channel
The easiest way to install TensorFlow and Keras on Mist is to use IBM's Conda channel. Prepare a conda environment with Python 3.6 or 3.7, then install tensorflow-gpu from IBM's Conda channel.
module load anaconda3
conda create -n tf_env python=3.7
source activate tf_env
conda install -c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/ tensorflow-gpu
Once the installation finishes, please clean the cache:
conda clean -y --all
rm -rf $HOME/.conda/pkgs/*
Testing and debugging
Test your code before you submit it to the cluster, both to check that it is correct and to find out what resources it needs.
- Small test jobs can be run on the login node. Rule of thumb: tests should run no more than a couple of minutes, take at most about 1-2 GB of memory, and use no more than one GPU and a few cores.
- For short tests that do not fit on a login node, or for which you need a dedicated node, request an interactive debug job with the debugjob command:
mist-login01:~$ debugjob --clean -g G
where G is the number of GPUs. G=1 gives an interactive session for 2 hours, G=4 gives a single node with 4 GPUs for 30 minutes, and G=8 (the maximum) gives 2 nodes, each with 4 GPUs, for 30 minutes. The --clean argument is optional but recommended: it starts the session without any modules loaded, which more closely mimics what happens when you submit a job script.
Submitting jobs
Once you have compiled and tested your code or workflow on the Mist login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. Your jobs will run on some of Mist's 53 compute nodes. When and where your job runs is determined by the scheduler.
Mist uses SLURM as its job scheduler. It is configured to allow only Single-GPU jobs and Full-node jobs (4 GPUs per node).
You submit jobs from a login node by passing a script to the sbatch command:
mist-login01:scratch$ sbatch jobscript.sh
This puts the job in the queue. It will run on the compute nodes in due course. In most cases, you should not submit from your $HOME directory, but rather from your $SCRATCH directory, so that the output of your compute job can be written out ($HOME is read-only on the compute nodes).
Example job scripts can be found below. Keep in mind:
- Scheduling is by single GPU or by full node, so request either 1 GPU or 4 GPUs per node.
- Your job's maximum walltime is 24 hours.
- Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).
- Compute nodes have no internet access.
- Your job script will not remember the modules you have loaded, so it needs to contain "module load" commands of all the required modules (see examples below).
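Two of the constraints in the list above can be checked mechanically before calling sbatch. The following is a minimal sketch, not a SciNet tool: the helper name check_job is hypothetical, and it only understands the `--gpus-per-node=N` and `--time=` spellings used on this page (minutes, minutes:seconds, or hours:minutes:seconds). Short options, the days-hours time format, and options passed on the sbatch command line are not handled.

```shell
# check_job: hypothetical pre-submission check for a Mist job script.
# Reports a problem if --gpus-per-node is not 1 or 4, or if --time is
# longer than 24 hours. Returns non-zero when a problem is found.
check_job() {
    awk '
        $1 != "#SBATCH" { next }
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^--gpus-per-node=/) {
                    split($i, kv, "="); gpus = kv[2] + 0
                } else if ($i ~ /^--time=/) {
                    split($i, kv, "=")
                    if (kv[2] ~ /-/) { unparsed = 1; continue }
                    n = split(kv[2], t, ":")
                    if (n == 3)      secs = t[1] * 3600 + t[2] * 60 + t[3]
                    else if (n == 2) secs = t[1] * 60 + t[2]
                    else             secs = t[1] * 60
                }
            }
        }
        END {
            bad = 0
            if (gpus != 1 && gpus != 4) {
                print "gpus-per-node must be 1 or 4, found: " gpus; bad = 1
            }
            if (unparsed) {
                print "days-hours time format is not handled by this check"; bad = 1
            } else if (secs > 24 * 3600) {
                print "walltime exceeds the 24 hour maximum"; bad = 1
            }
            exit bad
        }
    ' "$1"
}

# Possible use:
#   check_job jobscript.sh && sbatch jobscript.sh
```

A script with no `--gpus-per-node` line is reported as "found: 0", since the check cannot tell what the scheduler default would be.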
SOSCIP Users
If you are working on a SOSCIP project, please contact soscip-support@scinet.utoronto.ca to have your user account added to the SOSCIP project accounts. SOSCIP users need to submit jobs with an additional SLURM flag:
#SBATCH -A <SOSCIP_PROJECT_ID>
Single-GPU job script
A single-GPU job is allocated a quarter of a node: 1 GPU, 8 CPU cores (32 hardware threads), and about 58 GB of CPU memory. Never request CPUs or memory explicitly. When running an MPI program, set --ntasks to the number of MPI ranks. If multiple MPI ranks share one GPU, using the NVIDIA Multi-Process Service (MPS) is recommended.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --time=1:00:0
#SBATCH -A <SOSCIP_PROJECT_ID> #For SOSCIP projects only
module load anaconda3
source activate conda_env
python code.py ...
Full-node job script
A multi-GPU job should request at least one full node. Specify the "compute_full_node" partition to get all the resources on a node.
- An example for a 2-node, 8-rank OpenMPI job: (Each rank binds to 1 GPU and 8 physical CPU cores in this case)
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gpus-per-node=4
#SBATCH --ntasks=8
#SBATCH --time=1:00:00
#SBATCH -p compute_full_node
#SBATCH -A <SOSCIP_PROJECT_ID> #For SOSCIP projects only
module load cuda/10.2.89 gcc/7.5.0 openmpi/4.0.2
mpirun -bind-to core -map-by slot:PE=8 -report-bindings ./program
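For other node counts, the header of the example above follows a simple pattern: 4 GPUs per node and one MPI rank per GPU, so --ntasks is four times the number of nodes. The sketch below generates such a header. The helper name full_node_header is hypothetical, and the one-rank-per-GPU layout is taken from this example, not a requirement of the scheduler.

```shell
# full_node_header: hypothetical generator for the #SBATCH header of a
# full-node job with one MPI rank per GPU, following the example above.
# Arguments: <nodes> [walltime], where walltime defaults to 1:00:00.
full_node_header() {
    nodes=$1
    walltime=${2:-1:00:00}
    gpus_per_node=4                 # a full node has 4 GPUs
    cat <<EOF
#!/bin/bash
#SBATCH --nodes=$nodes
#SBATCH --gpus-per-node=$gpus_per_node
#SBATCH --ntasks=$((nodes * gpus_per_node))
#SBATCH --time=$walltime
#SBATCH -p compute_full_node
EOF
}

full_node_header 2
```

The module load and mpirun lines from the example can then be appended unchanged. SOSCIP users would also add their `#SBATCH -A` line.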