Mist
Installed: Dec 2019
Operating System: Red Hat Enterprise Linux 8.2
Number of Nodes: 54 IBM AC922
Interconnect: Mellanox EDR
RAM/Node: 256 GB
GPUs/Node: 4 x V100-SXM2-32GB
Login/Devel Node: mist.scinet.utoronto.ca
Vendor Compilers: NVCC, IBM XL
Queue Submission: Slurm

Specifications

Mist is a SciNet-SOSCIP joint GPU cluster consisting of 54 IBM AC922 servers. Each node of the cluster has 32 IBM Power9 cores, 256 GB of RAM and 4 NVIDIA V100-SXM2-32GB GPUs connected by NVLink. The cluster has an InfiniBand EDR interconnect providing GPU-Direct RDMA capability.

Important note: as of 2021, the majority of computer systems (laptops, desktops, and HPC) use the 64-bit x86 instruction set architecture (ISA) in microprocessors produced by Intel and AMD. This ISA is incompatible with Mist, whose hardware uses the 64-bit PPC ISA (set to little-endian mode). In practical terms, x86-compiled binaries (executables and libraries) cannot be installed on Mist. For this reason, the Niagara and Compute Canada software stacks (modules) cannot be made available on Mist, and using closed-source software is only possible when the vendor provides a compatible version of their application. Python applications almost always rely on bindings to libraries originally written in C or C++, and some of them are not available on PyPI or the various Conda channels as precompiled binaries compatible with Mist. The recommended way to use Python on Mist is to create a Conda environment and install packages from the anaconda (default) channel, where most popular packages have a linux-ppc64le (Mist-compatible) version available. Some popular machine learning packages should be installed from the internal Open-CE channel. Where a compatible Conda package cannot be found, installing from PyPI (pip install) can be attempted. Pip will try to compile the package's source code if no compatible precompiled wheel is available, so a compiler module (such as gcc/.core) should be loaded in advance. Some packages require tweaking of the source code or build procedure to compile successfully on Mist; please contact support if you need assistance.
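
As a minimal sketch of this last point (package and environment names are placeholders; see the Anaconda section below for creating environments), installing a PyPI package that has no ppc64le wheel might look like:

module load gcc/.core anaconda3
source activate myPythonEnv
pip install <package_name>    # pip builds from source when no compatible precompiled wheel exists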

Getting started on Mist

Mist can be accessed directly.

ssh -Y MYCCUSERNAME@mist.scinet.utoronto.ca

The Mist login node mist-login01 can also be accessed via the Niagara cluster.

ssh -Y MYCCUSERNAME@niagara.scinet.utoronto.ca
ssh -Y mist-login01

Storage

The filesystem for Mist is shared with the Niagara cluster. See Niagara Storage for more details.

Loading software modules

You have two options for running code on Mist: use existing software, or compile your own. This section focuses on the former.

Other than essentials, all installed software is made available using module commands. These modules set environment variables (PATH, etc.), allowing multiple, conflicting versions of a given package to be available. A detailed explanation of the module system can be found on the modules page and a list of Modules for Mist is also available.

Common module subcommands are:

  • module load <module-name>: load the default version of a particular software.
  • module load <module-name>/<module-version>: load a specific version of a particular software.
  • module purge: unload all currently loaded modules.
  • module spider (or module spider <module-name>): list available software packages.
  • module avail: list loadable software packages.
  • module list: list loaded modules.

Along with modifying common environment variables, such as PATH and LD_LIBRARY_PATH, these modules also create a SCINET_MODULENAME_ROOT environment variable, which can be used to access commonly needed software directories, such as /include and /lib.

There are handy abbreviations for the module commands. ml is the same as module list, and ml <module-name> is the same as module load <module-name>.
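
For example, a minimal session using these commands (versions taken from examples elsewhere on this page; pick the ones you actually need):

module spider cuda                 # list available CUDA versions and their prerequisites
module load cuda/11.0.3 gcc/9.4.0
module list                        # confirm what is loaded
module purge                       # start again from a clean slate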

Tips for loading software

  • We advise against loading modules in your .bashrc. This can lead to very confusing behaviour under certain circumstances. Our guidelines for .bashrc files can be found here.
  • Instead, load modules by hand when needed, or by sourcing a separate script.
  • Load run-specific modules inside your job submission script.
  • Short names give default versions; e.g. cuda → cuda/11.0.3. It is usually better to be explicit about the versions, for future reproducibility.
  • Modules often require other modules to be loaded first. Solve these dependencies by using module spider.

Available compilers and interpreters

  • The cuda module has to be loaded first for GPU software.
  • For most compiled software, one should use the GNU compilers (gcc for C, g++ for C++, and gfortran for Fortran). Loading a gcc module makes these available.
  • The IBM XL compiler suite (xlc_r, xlc++_r, xlf_r) is also available, if you load one of the xl modules.
  • To compile MPI code, you must additionally load an openmpi or spectrum-mpi module.

CUDA

The currently installed CUDA Toolkits are 11.0.3 and 10.2.2 (10.2.89):

module load cuda/11.0.3
module load cuda/10.2.2
  • A compiler (GCC, XL or NVHPC/PGI) module must be loaded in order to use CUDA to build any code.

The current NVIDIA driver version is 450.119.04.
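
A minimal sketch of building a small CUDA program (saxpy.cu is a placeholder file name):

module load cuda/11.0.3 gcc/9.4.0
nvcc -O2 -arch=sm_70 -o saxpy saxpy.cu    # V100 GPUs are compute capability 7.0
./saxpy                                   # small tests like this can run on the login node (see Testing and debugging)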

GNU Compilers

Available GCC modules are:

gcc/9.3.0 (must load CUDA 11)
gcc/8.5.0 (must load CUDA 10)
gcc/10.3.0 (w/o CUDA)
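
As an illustrative sketch (hello.c is a placeholder), compiling an OpenMP program with GCC could look like:

module load cuda/11.0.3 gcc/9.4.0
gcc -O2 -fopenmp -o hello hello.c
OMP_NUM_THREADS=8 ./hello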

IBM XL Compilers

To load the native IBM xlc/xlc++ and xlf (Fortran) compilers, run

module load xl/16.1.1.10

IBM XL Compilers are enabled for use with NVIDIA GPUs, including support for OpenMP GPU offloading and integration with NVIDIA's nvcc command to compile host-side code for the POWER9 CPU. Information about the IBM XL Compilers can be found at the following links: IBM XL C/C++, IBM XL Fortran
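
A hedged sketch of compiling with the XL compilers (prog.c and prog.f90 are placeholders; consult the IBM documentation for the exact offload options of your version):

module load cuda/11.0.3 xl/16.1.1.10
xlc_r -O3 -qsmp=omp -qoffload -o prog prog.c    # OpenMP program with GPU offload
xlf_r -O3 -qsmp=omp -o prog_f prog.f90          # host-only OpenMP Fortran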

OpenMPI

The openmpi/<version> module is available with different compilers, including GCC and XL. The spectrum-mpi/<version> module provides IBM Spectrum MPI.
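
A hedged sketch of building and running an MPI program (hello_mpi.c is a placeholder; module versions follow examples elsewhere on this page):

module load cuda/11.0.3 gcc/9.4.0 openmpi/4.1.1+ucx-1.10.0
mpicc -O2 -o hello_mpi hello_mpi.c
mpirun -np 4 ./hello_mpi    # for production runs, launch from a job script on a compute node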

NVHPC/PGI

The PGI compilers are provided as part of NVHPC (the NVIDIA HPC SDK).

module load nvhpc/21.3
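
As an illustrative sketch (jacobi.c is a placeholder), an OpenACC program could be compiled with the NVHPC compilers like so:

module load nvhpc/21.3
nvc -O2 -acc -o jacobi jacobi.c    # nvc/nvc++/nvfortran supersede the old pgcc/pgc++/pgfortran names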

Software

Amber20

Users who hold an Amber20 license can build Amber20 from source and run it on Mist. SOSCIP/SciNet does not provide an Amber license or the source code.

Building Amber20

Modules that are needed for building Amber20:

module load MistEnv/2021a cuda/10.2.2 gcc/8.5.0 anaconda3/2021.05 cmake/3.19.8

CMake configuration:

cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/where-amber-install -DCOMPILER=GNU -DMPI=FALSE -DCUDA=TRUE -DINSTALL_TESTS=TRUE -DDOWNLOAD_MINICONDA=FALSE -DOPENMP=TRUE -DNCCL=FALSE -DAPPLY_UPDATES=TRUE

Running Amber20

Amber20 does not scale beyond a single GPU on NVIDIA Pascal (P100) and later GPUs such as the V100. It is highly recommended to run Amber20 as a single-GPU job. A job example:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --time=1:00:0
#SBATCH --account=soscip-<SOSCIP-project-ID>

module load MistEnv/2021a cuda/10.2.2 gcc/8.5.0 anaconda3/2021.05
export PATH=$HOME/where-amber-install/bin:$PATH
export LD_LIBRARY_PATH=$HOME/where-amber-install/lib:$LD_LIBRARY_PATH
pmemd.cuda .... <parameters> ...

Anaconda (Python)

Anaconda is a popular distribution of the Python programming language. It contains several common Python libraries such as SciPy and NumPy as pre-built packages, which eases installation. Anaconda is provided via the anaconda3 modules.

To set up your own Python environment, load the module and create a conda environment:

module load anaconda3
conda create -n myPythonEnv python=3.8
  • Note: By default, conda environments are located in $HOME/.conda/envs. The cache (downloaded tarballs and packages) is under $HOME/.conda/pkgs. Users may run into disk quota problems if too many environments are created. To clean the conda cache, please run "conda clean -y --all" and "rm -rf $HOME/.conda/pkgs/*" after installing packages.

To activate the conda environment: (should be activated before running python)

source activate myPythonEnv

Note that you SHOULD NOT use conda activate myPythonEnv to activate the environment; this leads to all sorts of problems. Once the environment is activated, users can update or install packages via conda or pip:

conda install  <package_name> (preferred way to install packages)
pip install <package_name>
  • Once the installation finishes, please clean the cache:
conda clean -y --all
rm -rf $HOME/.conda/pkgs/*

To deactivate:

source deactivate

To remove a conda environment:

conda remove --name myPythonEnv --all

To verify that the environment was removed, run:

conda info --envs

Submitting Python Job

A single-gpu job example:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --time=1:00:0
#SBATCH --account=soscip-<SOSCIP_PROJECT_ID> #For SOSCIP projects only

module load anaconda3
source activate myPythonEnv
python code.py ...

CuPy

CuPy is an open-source matrix library accelerated with NVIDIA CUDA. It also uses CUDA-related libraries, including cuBLAS, cuDNN, cuRAND, cuSOLVER, cuSPARSE, cuFFT and NCCL, to make full use of the GPU architecture. CuPy is an implementation of a NumPy-compatible multi-dimensional array on CUDA. CuPy consists of the core multi-dimensional array class, cupy.ndarray, and many functions on it. It supports a subset of the numpy.ndarray interface.

CuPy can be installed into any conda environment. The Python packages numpy, six and fastrlock are required; cuDNN and NCCL are optional.

module load MistEnv/2021a cuda/11.0.3 gcc/9.4.0  nccl/2.9.9 anaconda3/2021.05
conda create -n cupy-env python=3.8 numpy six fastrlock
source activate cupy-env
CFLAGS="-I$MODULE_CUDNN_PREFIX/include -I$MODULE_NCCL_PREFIX/include -I$MODULE_CUDA_PREFIX/include" LDFLAGS="-L$MODULE_CUDNN_PREFIX/lib64 -L$MODULE_NCCL_PREFIX/lib" CUDA_PATH=$MODULE_CUDA_PREFIX pip install cupy
#building/installing CuPy will take a few minutes
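
Once the build completes, a quick sanity check (run on a node with a GPU, e.g. in a debugjob) might look like:

python -c "import cupy as cp; x = cp.arange(10); print(cp.asnumpy(x * 2))"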

Gromacs

GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers.

module load MistEnv/2021a cuda/10.2.2 gcc/8.5.0 gromacs/2019.6
module load MistEnv/2020a cuda/10.2.89 gcc/8.3.0 openmpi/3.1.5 gromacs/2019.6 (old RHEL 7 version for testing only)
  • The thread-MPI builds of GROMACS 2020 and 2021 support full GPU enablement of all key computational sections: the GPU is used throughout the timestep and repeated CPU-GPU transfers are eliminated. Users are advised to carefully verify their results.
module load MistEnv/2021a cuda/10.2.2 gcc/8.5.0 gromacs/2020.6
module load MistEnv/2021a cuda/11.0.3 gcc/9.4.0 gromacs/2021.2
module load MistEnv/2021a cuda/11.0.3 gcc/9.4.0 openmpi/4.1.1+ucx-1.10.0 gromacs/2021.2

Small/Medium Simulation

Due to the lack of PME domain decomposition support on GPU, Gromacs uses the CPU to calculate PME when using multiple GPUs. It is always recommended to use a single GPU for small and medium sized simulations with Gromacs. By using only 1 MPI rank (with OpenMP threads) on a single GPU, both non-bonded PP and PME are automatically offloaded to the GPU when possible.

  • Gromacs 2019 example:
#!/bin/bash
#SBATCH --time=20:00
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1

module load MistEnv/2021a cuda/10.2.2 gcc/8.5.0 gromacs/2019.6
export OMP_NUM_THREADS=8
export OMP_PLACES=cores
gmx mdrun -pin off -ntmpi 1 -ntomp 8  ... <other parameters>
  • Gromacs 2020 or 2021 example:
#!/bin/bash
#SBATCH --time=20:00
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1

module load MistEnv/2021a cuda/11.0.3 gcc/9.4.0 gromacs/2021.2
export OMP_NUM_THREADS=8
export OMP_PLACES=cores
gmx mdrun -pin off -ntmpi 1 -ntomp 8 -update gpu ... <other parameters>

Large Simulation

If the memory available to a single-GPU job (~58 GB) is not sufficient for the simulation, multiple GPUs can be used. It is suggested to start testing with one full node (4 GPUs) and force PME onto the GPU. Multiple PME ranks are not supported with PME on GPU, so if the GPU is used for the PME calculation, -npme (number of PME ranks) must be set to 1. If PME has less work than PP, it is suggested to run multiple ranks per GPU, so that the GPU holding the PME rank can also do some work on PP rank(s). When running multiple MPI ranks on the same GPU, NVIDIA Multi-Process Service (MPS) must be enabled.

  • An example using 4 GPUs, 7 PP ranks + 1 PME rank: (-pin on -pme gpu -npme 1 must be added to mdrun command in order to force GPU to do PME)
#!/bin/bash
#SBATCH --time=20:00
#SBATCH --gpus-per-node=4
#SBATCH --ntasks=8
#SBATCH --nodes=1
#SBATCH -p compute_full_node

module load MistEnv/2021a cuda/11.0.3  gcc/9.4.0  openmpi/4.1.1+ucx-1.10.0 gromacs/2021.2

mkdir -p /dev/shm/nvidia-mps
export CUDA_MPS_PIPE_DIRECTORY=/dev/shm/nvidia-mps
mkdir -p /dev/shm/nvidia-log
export CUDA_MPS_LOG_DIRECTORY=/dev/shm/nvidia-log
nvidia-cuda-mps-control -d

export OMP_NUM_THREADS=4
mpirun  -bind-to none gmx_mpi mdrun -pin on -pme gpu -npme 1 ... <add your parameters>
  • It is suggested to also test using --ntasks=4 and OMP_NUM_THREADS=8 if you receive a NOTE in Gromacs output saying "% performance was lost because the PME ranks had more work to do than the PP ranks". In this case, NVIDIA MPS is not needed since there is only one MPI rank per GPU.
  • Please note that solving PME on GPU is still only an initial implementation of this behaviour, and comes with a set of limitations outlined below:
* Only a PME order of 4 is supported on GPUs.
* PME will run on a GPU only when exactly one rank has a PME task, i.e. decompositions with multiple ranks doing PME are not supported.
* Only single precision is supported.
* Free energy calculations where charges are perturbed are not supported, because only single PME grids can be calculated.
* Only dynamical integrators are supported (i.e. leap-frog, Velocity Verlet, stochastic dynamics).
* LJ PME is not supported on GPUs.
  • An example using 4 GPUs, PME on CPU: (-pin on must be added to mdrun command for proper CPU thread bindings)
#!/bin/bash
#SBATCH --time=20:00
#SBATCH --gpus-per-node=4
#SBATCH --ntasks=8
#SBATCH --nodes=1
#SBATCH -p compute_full_node

module load MistEnv/2021a cuda/11.0.3  gcc/9.4.0  openmpi/4.1.1+ucx-1.10.0 gromacs/2021.2

mkdir -p /dev/shm/nvidia-mps
export CUDA_MPS_PIPE_DIRECTORY=/dev/shm/nvidia-mps
mkdir -p /dev/shm/nvidia-log
export CUDA_MPS_LOG_DIRECTORY=/dev/shm/nvidia-log
nvidia-cuda-mps-control -d

export OMP_NUM_THREADS=4
mpirun -bind-to none gmx_mpi mdrun -pin on  ... <add your parameters>

# "--ntasks=16, OMP_NUM_THREADS=2" and "--ntasks=4, OMP_NUM_THREADS=8" should also be tested.  
# num_Tasks(MPI_ranks) * num_OpenMP_threads = 32
  • NOTE: The above examples will NOT work with multiple nodes. If simulation is too large for a single GPU node, please contact SciNet/SOSCIP support.

NAMD

NAMD is a parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems.

2.14

module load MistEnv/2021a cuda/11.0.3 gcc/9.4.0 spectrum-mpi/10.4.0 namd/2.14

Running with single GPU

If you have many jobs to run, it is always suggested to use a single GPU per job. This makes jobs easier to schedule and gives better overall performance.

#!/bin/bash
#SBATCH --time=20:00
#SBATCH --gpus-per-node=1
#SBATCH --nodes=1

module load MistEnv/2021a cuda/11.0.3 gcc/9.4.0 spectrum-mpi/10.4.0 namd/2.14
scontrol show hostnames > nodelist-$SLURM_JOB_ID

`which charmrun` -npernode 1 -bind-to none -hostfile nodelist-$SLURM_JOB_ID `which namd2` +idlepoll +ppn 8 +p 8 stmv.namd

Running with one process per node (4 GPUs)

An example of the job script (using 1 node, one process per node, 32 CPU threads per process + 4 GPUs per process):

#!/bin/bash
#SBATCH --time=20:00
#SBATCH --gpus-per-node=4
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH -p compute_full_node

module load MistEnv/2021a cuda/11.0.3 gcc/9.4.0 spectrum-mpi/10.4.0 namd/2.14
scontrol show hostnames > nodelist-$SLURM_JOB_ID

`which charmrun` -npernode 1 -hostfile nodelist-$SLURM_JOB_ID `which namd2` +setcpuaffinity +pemap 0-127:4 +idlepoll +ppn 32 +p $((32*SLURM_NTASKS)) stmv.namd

Running with one process per GPU (4 GPUs)

NAMD may scale better if using one process per GPU. Please do your own benchmark. An example of the job script (using 1 node, one process per GPU, 8 CPU threads per process):

#!/bin/bash
#SBATCH --time=20:00
#SBATCH --gpus-per-node=4
#SBATCH --ntasks=4
#SBATCH --nodes=1
#SBATCH -p compute_full_node

module load MistEnv/2021a cuda/11.0.3 gcc/9.4.0 spectrum-mpi/10.4.0 namd/2.14
scontrol show hostnames > nodelist-$SLURM_JOB_ID

`which charmrun` -npernode 4 -hostfile nodelist-$SLURM_JOB_ID `which namd2` +setcpuaffinity +pemap 0-127:4 +idlepoll +ppn 8 +p $((8*SLURM_NTASKS)) stmv.namd

Open-CE

Open-CE is an IBM repository of feedstock collections, environment data, and scripts for building TensorFlow, PyTorch, XGBoost, and other related packages and dependencies. On the Mist cluster, Open-CE is distributed as a conda channel. Available packages and versions are listed on the Open-CE Releases page (https://github.com/open-ce/open-ce/releases). Currently only Python 3.7 and 3.8 are supported. Packages are built with CUDA 11.2 (only with Open-CE 1.3), 11.0 and 10.2.

  • Packages can be installed by setting Open-CE conda channel:
conda install -c /scinet/mist/ibm/open-ce/1.3 python=3.8 cudatoolkit=11.2 PACKAGE
or
conda install -c /scinet/mist/ibm/open-ce/1.2 python=3.8 cudatoolkit=11.0 PACKAGE
  • Once the installation finishes, please clean the cache:
conda clean -y --all
rm -rf $HOME/.conda/pkgs/*

PyTorch

Installing from IBM Open-CE Conda Channel

The easiest way to install PyTorch on Mist is to use IBM's Open-CE Conda channel. Users need to prepare a conda environment and install PyTorch from that channel.

module load anaconda3
conda create -n pytorch_env python=3.8 (or 3.7)
source activate pytorch_env
conda install -c /scinet/mist/ibm/open-ce/1.3 pytorch=1.8.1 cudatoolkit=11.2 (or 10.2)
or
conda install -c /scinet/mist/ibm/open-ce/1.2 pytorch=1.7.1 cudatoolkit=11.0 (or 10.2)

Once the installation finishes, please clean the cache:

conda clean -y --all
rm -rf $HOME/.conda/pkgs/*

Add the command below to your job script, before the python command, to get deterministic results; see details at https://github.com/pytorch/pytorch/issues/39849:

export CUBLAS_WORKSPACE_CONFIG=:4096:2
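
A quick check that PyTorch sees the GPU (run inside the activated environment on a node with a GPU; shown here only as a sketch):

python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"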

RAPIDS

RAPIDS is a suite of open-source software libraries that gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. The RAPIDS data science framework includes a collection of libraries: cuDF (GPU DataFrames), cuML (GPU Machine Learning Algorithms), cuStrings (GPU String Manipulation), etc.

Installing from IBM Conda Channel

The easiest way to install RAPIDS on Mist is to use IBM's Conda channel. Users need to prepare a conda environment with Python 3.6 or 3.7 and install powerai-rapids from IBM's Conda channel.

module load anaconda3
conda create -n rapids_env python=3.7
source activate rapids_env
conda install -c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda-early-access/ powerai-rapids

Once the installation finishes, please clean the cache:

conda clean -y --all
rm -rf $HOME/.conda/pkgs/*

TensorFlow and Keras

Installing from IBM Conda Channel

The easiest way to install TensorFlow and Keras on Mist is to use IBM's Open-CE Conda channel. Users need to prepare a conda environment and install TensorFlow from that channel.

module load anaconda3
conda create -n tf_env python=3.8 (or 3.7)
source activate tf_env
conda install -c /scinet/mist/ibm/open-ce/1.3 tensorflow==2.5.1 cudatoolkit=11.2 (or 10.2)
or
conda install -c /scinet/mist/ibm/open-ce/1.2 tensorflow==2.4.3 cudatoolkit=11.0 (or 10.2)

Once the installation finishes, please clean the cache:

conda clean -y --all
rm -rf $HOME/.conda/pkgs/*
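
Similarly, a quick check that TensorFlow sees the GPU (run inside the activated environment on a node with a GPU; shown here only as a sketch):

python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"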

Testing and debugging

You really should test your code before you submit it to the cluster, to check that it is correct and to find out what resources you need.

  • Small test jobs can be run on the login node. Rule of thumb: tests should run no more than a couple of minutes, take at most about 1-2 GB of memory, and use no more than one GPU and a few cores.
  • For short tests that do not fit on a login node, or for which you need a dedicated node, request an interactive debug job with the debugjob command:
mist-login01:~$ debugjob --clean -g G

where G is the number of GPUs. If G=1, this gives an interactive session for 2 hours, whereas G=4 gets you a single node with 4 GPUs for 30 minutes, and G=8 (the maximum) gets you 2 nodes, each with 4 GPUs, for 30 minutes. The --clean argument is optional but recommended, as it will start the session without any modules loaded, thus mimicking more closely what happens when you submit a job script.
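
For instance, a short interactive test session could look like this (module and environment names follow the examples above and are only placeholders):

mist-login01:~$ debugjob --clean -g 1
# ... once the interactive session starts on a compute node ...
module load anaconda3
source activate myPythonEnv
nvidia-smi       # confirm that one V100 is visible
python code.py   # short test run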

Submitting jobs

Once you have compiled and tested your code or workflow on the Mist login nodes, and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. Your jobs will run on some of Mist's 53 compute nodes. When and where your job runs is determined by the scheduler.

Mist uses SLURM as its job scheduler. It is configured to allow only single-GPU jobs and full-node jobs (4 GPUs per node).

You submit jobs from a login node by passing a script to the sbatch command:

mist-login01:scratch$ sbatch jobscript.sh

This puts the job in the queue. It will run on the compute nodes in due course. In most cases, you should not submit from your $HOME directory, but rather from your $SCRATCH directory, so that the output of your compute job can be written out (as mentioned above, $HOME is read-only on the compute nodes).

Example job scripts can be found below. Keep in mind:

  • Scheduling is by single GPU or by full node, so you can ask for either 1 GPU or 4 GPUs per node.
  • Your job's maximum walltime is 24 hours.
  • Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).
  • Compute nodes have no internet access.
  • Your job script will not remember the modules you have loaded, so it needs to contain "module load" commands for all the required modules (see examples below).
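
After submitting, the usual Slurm commands can be used to monitor and manage jobs; a brief sketch (job IDs are placeholders):

squeue -u $USER              # list your queued and running jobs
scontrol show job <jobid>    # detailed information about a job
scancel <jobid>              # cancel a job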

SOSCIP Users

  • SOSCIP is a consortium to bring together industrial partners and academic researchers and provide them with sophisticated advanced computing technologies and expertise to solve social, technical and business challenges across sectors and drive economic growth.

If you are working on a SOSCIP project, please contact soscip-support@scinet.utoronto.ca to have your user account added to the SOSCIP project account. SOSCIP users need to submit jobs with an additional SLURM flag to get higher priority:

#SBATCH -A soscip-<SOSCIP_PROJECT_ID>    #e.g. soscip-3-001
OR
#SBATCH --account=soscip-<SOSCIP_PROJECT_ID>

Single-GPU job script

For a single-GPU job, each job gets a quarter of a node, i.e. 1 GPU + 8 CPU cores (32 hardware threads) + ~58 GB of CPU memory. Users should never request CPUs or memory explicitly. If running an MPI program, users can set --ntasks to the number of MPI ranks. Do NOT set --ntasks for non-MPI programs.

  • It is suggested to use NVIDIA Multi-Process Service (MPS) if running multiple MPI ranks on one GPU.
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --time=1:00:0
#SBATCH --account=soscip-<SOSCIP_PROJECT_ID> #For SOSCIP projects only

module load anaconda3
source activate conda_env
python code.py ...

Full-node job script

If you are not sure whether your program can be executed on multiple GPUs, please follow the single-GPU job instructions above or contact SciNet/SOSCIP support.

Multi-GPU jobs should ask for a minimum of one full node (4 GPUs). Users need to specify the "compute_full_node" partition in order to get all the resources on a node.

  • An example for a 1-node job:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4
#SBATCH --ntasks=4 #this only affects MPI job
#SBATCH --time=1:00:00
#SBATCH -p compute_full_node
#SBATCH --account=soscip-<SOSCIP_PROJECT_ID> #For SOSCIP projects only

module load <modules you need>
<run your program>

Jupyter Notebooks

SciNet’s Jupyter Hub is a Niagara-type node; it has a different CPU architecture and no GPUs. Conda environments prepared on Mist will not work there properly. Users who need to use Jupyter Notebook to develop and test some aspects of their workflow can create their own server on the Mist login node and use an SSH tunnel to connect to it from outside. Users who choose to do so have to keep in mind that the login node is a shared resource, and heavy calculations should be done only on compute nodes. Processes (including IPython kernels used by the notebooks) are limited to one hour of total CPU time: idle time will not be counted toward this one hour, and use of multiple cores will count proportionally to the number of cores (i.e. a kernel using all 128 virtual cores on the node will be killed after 28 seconds). Idle notebooks can still burden the node by hogging system and GPU memory; please be mindful of other users and terminate notebooks when your work is done.

As an example, let us create a new Conda environment and activate it:

module load anaconda3
conda create -n jupyter_env python=3.7
source activate jupyter_env

Install the Jupyter Notebook server:

conda install notebook

Running the notebook server

When the Conda environment is active, enter:

jupyter-notebook

By default, the Jupyter Notebook server uses port 8888 (can be overridden with the --port option). If another user has already started their own server, the default port may be busy, in which case the server will be listening on a different port. Once launched, the server will output some information to the terminal that will include the actual port number used and a 48-character token. For example:

http://localhost:8890/?token=54c4090d……

In this example, the server is listening on port 8890.

Creating a tunnel

In order to access this port remotely (i.e. from your office or home), an SSH tunnel has to be established. Please refer to your SSH client’s documentation for instructions on how to do that. For the OpenSSH client (standard in most Linux distributions and macOS), a tunnel can be opened in a separate terminal session to the one where the Jupyter Notebook server is running. In the new terminal, issue this command:

ssh -L8888:localhost:8890 <username>@mist.scinet.utoronto.ca

(Replace <username> with your actual username.) The tunnel is open as long as this SSH connection is alive. In this example, we tunnel the Mist login node’s port 8890 (where our server is assumed to be running) to our home computer’s port 8888 (any other free port is fine). The notebook can be accessed in the browser at http://localhost:8888 (followed by /?token=54c4090d……, or the token can be entered on the webpage).

Using Jupyter on compute nodes

You can use the instructions here to set up a Jupyter Notebook server on a compute node (including a debugjob). We strongly discourage you from running an interactive notebook on a compute node (other than for a debugjob); scheduled jobs run at arbitrary times and are not meant to be interactive. Jupyter notebooks can be run non-interactively or converted to Python scripts.

To launch the Jupyter Notebook server, load the anaconda3 module and activate your environment as before (by adding the appropriate lines to the submission script, if you are not using the compute node with an interactive shell). Launching the server has to be done like so:

HOME=/dev/shm/$USER jupyter-notebook

That is because Jupyter will fail unless it can write to the home folder, which is read-only from compute nodes. This modification of the $HOME environment variable will carry over into the notebooks, which is usually not a problem, but in case the notebook relies on this environment variable (e.g. to read certain files), it can be reset manually in the notebook (import os; os.environ['HOME']=……).

Because compute nodes are not accessible from the Internet, tunneling has to be done twice, once from the remote location (office or home) to the Mist login node, and then from the login node to the compute node. Assuming the server is running on port 8890 of the mist006 node, open the first tunnel in a new terminal session in the remote computer:

ssh -L8888:localhost:9999 <username>@mist.scinet.utoronto.ca

where 9999 is any available port on the Mist login node (to test port availability enter ss -Hln src :9999 in the terminal when connected to the Mist login node; an empty output indicates that the port is free). In the same session in the login node that was created with the above command, open the second tunnel to the compute node:

ssh -L9999:localhost:8890 mist006

Be aware that the second tunnel will automatically disconnect once the job on the compute node times out or is relinquished. The Jupyter Notebook server running on the compute node can now be accessed from the browser as in the previous subsection.


Support

SciNet inquiries:

  • support@scinet.utoronto.ca

SOSCIP inquiries:

  • soscip-support@scinet.utoronto.ca