Main Page

From SciNet Users Documentation

Niagara

System architecture

  • Total of 60,000 Intel x86-64 cores.
  • 1,500 Lenovo SD530 nodes.
  • 2x Intel Skylake 6148 CPUs (40 cores @ 2.4 GHz per node), with hyperthreading to 80 threads and AVX-512.
  • 3.02 PFlops delivered / 4.6 PFlops theoretical (would have been #42 on the TOP500 in Nov '18).
  • 188 GiB / 202 GB RAM per node (at least 4 GiB/core for user jobs).
  • Operating system: Linux (CentOS 7).
  • Interconnect: EDR InfiniBand, Dragonfly+ topology with Adaptive Routing; 1:1 up to 432 nodes, effectively 2:1 beyond that.
  • No GPUs, no local disk.
  • Replaces the General Purpose Cluster (GPC) and Tightly Coupled System (TCS).

Migration to Niagara

Migration for Existing Users of the GPC

  • Accounts, $HOME & $PROJECT of active GPC users have been transferred to Niagara (except dot-files in ~).
  • Data stored in $SCRATCH will not be transferred automatically.
  • Users are to clean up $SCRATCH on the GPC as much as possible (remember, it is temporary data!). Then they can transfer what they need using the datamover nodes. Let us know if you need help.
  • To enable this transfer, there will be a short period during which you can have access to Niagara as well as to the GPC storage resources. This period will end no later than May 9, 2018.

For Non-GPC Users

  • Those of you new to SciNet, but with 2018 RAC allocations on Niagara, will have your accounts created and ready for you to log in.

  • New, non-RAC users: we are still working out the procedure to get access.

    If you can't wait, you can for now follow the old route of requesting a SciNet Consortium Account on the CCDB site.

Using Niagara: Logging in

As with all SciNet and CC compute systems, access to Niagara is via ssh (secure shell) only.

To access SciNet systems, first open a terminal window (e.g. MobaXterm on Windows).

Then ssh into the Niagara login nodes with your CC credentials:

$ ssh -Y MYCCUSERNAME@niagara.scinet.utoronto.ca

or

$ ssh -Y MYCCUSERNAME@niagara.computecanada.ca

  • The Niagara login nodes are where you develop, edit, compile, prepare and submit jobs.
  • These login nodes are not part of the Niagara compute cluster, but have the same architecture, operating system, and software stack.
  • The optional -Y is needed to open windows from the Niagara command-line onto your local X server.
  • To run on Niagara's compute nodes, you must submit a batch job.

Storage Systems and Locations

Home and scratch

You have a home and a scratch directory on the system, whose locations are given by

$HOME=/home/g/groupname/myccusername

$SCRATCH=/scratch/g/groupname/myccusername

nia-login07:~$ pwd
/home/s/scinet/rzon
 
nia-login07:~$ cd $SCRATCH
 
nia-login07:rzon$ pwd
/scratch/s/scinet/rzon

Project location

Users from groups with a RAC allocation will also have a project directory.

$PROJECT=/project/g/groupname/myccusername

IMPORTANT: Future-proof your scripts

Use the environment variables ($HOME, $SCRATCH, $PROJECT) instead of the actual paths, so that your scripts keep working if the underlying locations change.
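As a minimal sketch of this advice, the helper below creates a per-run directory under $SCRATCH without ever spelling out the /scratch path. The function name make_rundir and the run name used with it are hypothetical illustrations, not SciNet tools; the function is meant to be sourced into your shell or job script.

```shell
# Hypothetical helper: create a run directory under $SCRATCH and print its path.
# It relies only on the environment variable, never on the literal /scratch path.
make_rundir() {
  # Stop with a clear message if $SCRATCH is not defined
  local base="${SCRATCH:?SCRATCH is not set}"
  local dir="$base/$1"
  mkdir -p -- "$dir" && printf '%s\n' "$dir"
}
```

For example, cd "$(make_rundir myrun)" moves into $SCRATCH/myrun, creating it if needed, on whichever filesystem $SCRATCH currently points to.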

Storage Limits on Niagara

location | quota               | block size | expiration time | backed up | on login | on compute
$HOME    | 100 GB              | 1 MB       | -               | yes       | yes      | read-only
$SCRATCH | 25 TB               | 16 MB      | 2 months        | no        | yes      | yes
$PROJECT | by group allocation | 16 MB      | -               | yes       | yes      | yes
$ARCHIVE | by group allocation | -          | -               | dual-copy | no       | no
$BBUFFER | ?                   | 1 MB       | very short      | no        | ?        | ?
  • Compute nodes do not have local storage.
  • Archive space is on HPSS.
  • Backup means a recent snapshot, not an archive of all data that ever was.
  • $BBUFFER stands for the Burst Buffer, a functionality that is still being set up. It will be a faster parallel storage tier for temporary data.
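Files in $SCRATCH have a two-month expiration time. One rough way to spot candidates is to list files not modified in about the last 60 days, as in the sketch below. It assumes that a file's age is judged by its modification time, which this page does not state, so check SciNet's actual purge policy; the function name old_scratch_files and the 60-day default are illustrative.

```shell
# Hypothetical helper: list regular files under a directory (default: $SCRATCH)
# whose modification time is more than N days ago (default: 60, about 2 months).
# Assumption: expiration is based on modification time.
old_scratch_files() {
  local dir="${1:-$SCRATCH}"
  local days="${2:-60}"
  find "$dir" -type f -mtime +"$days" -print
}
```

Running old_scratch_files on a login node prints the paths, which you can then move to $PROJECT, archive, or delete.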

Moving data

Move amounts less than 10 GB through the login nodes.

  • Only the Niagara login nodes are visible from outside SciNet.
  • Use scp or rsync to niagara.scinet.utoronto.ca or niagara.computecanada.ca (no difference).
  • This will time out for amounts larger than about 10 GB.

Move amounts larger than 10 GB through the datamover node.

  • From a Niagara login node, ssh to nia-datamover1.
  • Transfers must originate from this datamover.
  • The other side (e.g. your machine) must be reachable from the outside.
  • If you do this often, consider using Globus, a web-based tool for data transfer.
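The rule of thumb above can be sketched as a small helper that measures a local file or directory and reports which route to use. The function name suggest_route is hypothetical, the 10 GB threshold is the approximate figure from this page, and the example commands in the comments use placeholder names (mydata, myuser@my.machine.example) that you must replace.

```shell
# Hypothetical helper: apply the ~10 GB rule of thumb to a local file or
# directory and print which route to use ("login" or "datamover").
suggest_route() {
  local limit_kib=$((10 * 1024 * 1024))   # ~10 GB, expressed in KiB
  local size_kib
  size_kib=$(du -sk -- "$1" | cut -f1)     # total size in KiB
  [ -n "$size_kib" ] || return 1           # path missing or unreadable
  if [ "$size_kib" -lt "$limit_kib" ]; then
    echo login
  else
    echo datamover
  fi
}

# Example transfers (placeholders):
#   login route, run on your own machine:
#     rsync -av mydata/ MYCCUSERNAME@niagara.scinet.utoronto.ca:/scratch/g/groupname/MYCCUSERNAME/mydata/
#   datamover route, run from Niagara (your machine must be reachable from outside):
#     ssh nia-datamover1
#     rsync -av myuser@my.machine.example:mydata/ "$SCRATCH/mydata/"
```

Note the direction of the datamover transfer: because transfers must originate from nia-datamover1, rsync is run there and pulls from your machine.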

Moving data to HPSS/Archive/Nearline using the scheduler.

  • HPSS is a tape-based storage solution, and is SciNet's nearline a.k.a. archive facility.
  • Storage space on HPSS is controlled through the annual RAC allocation.

Software and Libraries

Modules

Once you are on one of the login nodes, what software is already installed?

  • Other than the essentials, all software is installed as modules, managed with the module command.
  • Loading a module sets environment variables (PATH, etc.).
  • Modules allow multiple, conflicting versions of a package to be available.
  • module spider shows the available software.
nia-login07:~$ module spider
---------------------------------------------------
The following is a list of the modules currently av
---------------------------------------------------
  CCEnv: CCEnv
 
  NiaEnv: NiaEnv/2018a
 
  anaconda2: anaconda2/5.1.0
 
  anaconda3: anaconda3/5.1.0
 
  autotools: autotools/2017
    autoconf, automake, and libtool
 
  boost: boost/1.66.0
 
  cfitsio: cfitsio/3.430
 
  cmake: cmake/3.10.2 cmake/3.10.3
 
  ...
  • module load <module-name>: use particular software.
  • module purge: remove currently loaded modules.
  • module spider (or module spider <module-name>): list available software packages.
  • module avail: list loadable software packages.
  • module list: list loaded modules.

On Niagara, there are really two software stacks:

  1. A Niagara software stack tuned and compiled for this machine. This stack is available by default, but if not, it can be reloaded with

     module load NiaEnv

  2. The same software stack available on Compute Canada's General Purpose clusters Graham and Cedar, compiled (for now) for a previous generation of CPUs:

     module load CCEnv

     If you want the same default modules loaded as on Cedar and Graham, then afterwards also module load StdEnv.

Note: the *Env modules are sticky; remove them with --force.

Tips for loading software

  • We advise against loading modules in your .bashrc.

    This could lead to very confusing behaviour under certain circumstances.

  • Instead, load modules by hand when needed, or by sourcing a separate script.

  • Load run-specific modules inside your job submission script.

  • Short names give default versions; e.g. intel → intel/2018.2.

    It is usually better to be explicit about the versions, for future reproducibility.

  • Handy abbreviations:

        ml → module list
        ml NAME → module load NAME
        ml X → module X (for a module subcommand X, e.g. ml spider → module spider)
  • Modules sometimes require other modules to be loaded first.

    Solve these dependencies by using module spider.
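Following these tips (load modules by sourcing a separate script, pin explicit versions, load dependencies first), a minimal sketch of such a script could look like the one below. The file name ~/niagara_modules.sh and the function name load_my_modules are hypothetical; the versions are the ones used elsewhere on this page, where openmpi/3.1.0rc3 needs a compiler such as intel/2018.2 loaded first.

```shell
# Hypothetical file ~/niagara_modules.sh; use it with:
#   source ~/niagara_modules.sh && load_my_modules
# Versions are pinned explicitly for reproducibility.
load_my_modules() {
  module purge                    # start clean; sticky NiaEnv stays loaded without --force
  module load intel/2018.2        # compiler first...
  module load openmpi/3.1.0rc3    # ...then the MPI library that depends on it
}
```

The same lines can be copied into a job submission script, which keeps interactive sessions and batch jobs on identical software versions.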

Module spider

Oddly named, the module subcommand spider is the search-and-advice facility for modules.

nia-login07:~$ module load openmpi
Lmod has detected the error:  These module(s) exist but cannot be loaded as requested: "openmpi"
   Try: "module spider openmpi" to see how to load the module(s).
nia-login07:~$ module spider openmpi
------------------------------------------------------------------------------------------------------
  openmpi:
------------------------------------------------------------------------------------------------------
     Versions:
        openmpi/2.1.3
        openmpi/3.0.1
        openmpi/3.1.0rc3
 
------------------------------------------------------------------------------------------------------
  For detailed information about a specific "openmpi" module (including how to load the modules) use
  the module's full name.
  For example:
 
     $ module spider openmpi/3.1.0rc3
------------------------------------------------------------------------------------------------------
nia-login07:~$ module spider openmpi/3.1.0rc3
------------------------------------------------------------------------------------------------------
  openmpi: openmpi/3.1.0rc3
------------------------------------------------------------------------------------------------------
    You will need to load all module(s) on any one of the lines below before the "openmpi/3.1.0rc3"
    module is available to load.
 
      NiaEnv/2018a  gcc/7.3.0
      NiaEnv/2018a  intel/2018.2
nia-login07:~$ module load NiaEnv/2018a  intel/2018.2   # note: NiaEnv is usually already loaded
nia-login07:~$ module load openmpi/3.1.0rc3
nia-login07:~$ module list
Currently Loaded Modules:
  1) NiaEnv/2018a (S)   2) intel/2018.2   3) openmpi/3.1.0rc3
 
  Where:
   S:  Module is Sticky, requires --force to unload or purge

Can I Run Commercial Software?

  • Possibly, but you have to bring your own license for it.
  • SciNet and Compute Canada have an extremely large and broad user base of thousands of users, so we cannot provide licenses for everyone's favorite software.
  • Thus, the only commercial software installed and accessible is software that can benefit everyone: Compilers, math libraries and debuggers.
  • That means no Matlab, Gaussian, IDL, etc.
  • Open source alternatives like Octave, Python and R are available.
  • We are happy to help you to install commercial software for which you have a license.
  • In some cases, if you have a license, you can use software in the Compute Canada stack.

Compiling on Niagara: Example

nia-login07:~$ module list
Currently Loaded Modules:
  1) NiaEnv/2018a (S)
  Where:
   S:  Module is Sticky, requires --force to unload or purge
 
nia-login07:~$ module load intel/2018.2 gsl/2.4
 
nia-login07:~$ ls
main.c module.c
 
nia-login07:~$ icc -c -O3 -xHost -o main.o main.c
nia-login07:~$ icc -c -O3 -xHost -o module.o module.c
nia-login07:~$ icc  -o main module.o main.o -lgsl -mkl
 
nia-login07:~$ ./main

Testing

You really should test your code before you submit it to the cluster, to know whether your code is correct and what kind of resources you need.

  • Small test jobs can be run on the login nodes.

    Rule of thumb: a couple of minutes, at most about 1-2 GB of memory, a couple of cores.

  • You can run the ddt debugger on the login nodes after module load ddt.

  • For short tests that do not fit on a login node, or for which you need a dedicated node, request an
    interactive debug job with the salloc command:

    nia-login07:~$ salloc -pdebug --nodes N --time=1:00:00

    where N is the number of nodes. The duration of your interactive debug session can be at most one hour, can use at most N nodes, and each user can only have one such session at a time.

Submitting jobs

  • Niagara uses SLURM as its job scheduler.

  • You submit jobs from a login node by passing a script to the sbatch command:

    nia-login07:~$ sbatch jobscript.sh
  • This puts the job in the queue. It will run on the compute nodes in due course.

  • Jobs will run under their group's RRG allocation or, if the group has none, under a RAS allocation (previously called the 'default' allocation).

Keep in mind:

  • Scheduling is by node, so in multiples of 40 cores.

  • Maximum walltime is 24 hours.

  • Jobs must write to your scratch or project directory (home is read-only on compute nodes).

  • Compute nodes have no internet access.

    Download data you need beforehand on a login node.
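Because scheduling is by whole 40-core nodes, a core count is effectively rounded up to a whole number of nodes. The arithmetic is a ceiling division, sketched below; the function name nodes_for_cores is a hypothetical illustration.

```shell
# Hypothetical helper: Niagara schedules whole nodes of 40 cores, so a request
# for some number of cores is rounded up to a whole number of nodes.
CORES_PER_NODE=40

nodes_for_cores() {
  local cores=$1
  # Integer ceiling division: (cores + 39) / 40
  echo $(( (cores + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}
```

For example, nodes_for_cores 320 prints 8 (the node count of the MPI example further down), while nodes_for_cores 41 prints 2: one core beyond a full node still costs a second whole node.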

Example submission script (OpenMP)

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=40
#SBATCH --time=1:00:00
#SBATCH --job-name openmp_job
#SBATCH --output=openmp_output_%j.txt

# Run from the directory the job was submitted from
cd "$SLURM_SUBMIT_DIR"

# Load the compiler module the code was built with
module load intel/2018.2

# One OpenMP thread per requested core
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun ./openmp_example
# Just "./openmp_example" works too.
nia-login07:~$ sbatch openmp_job.sh
  • The first line indicates that this is a bash script.
  • Lines starting with #SBATCH go to SLURM.
  • sbatch reads these lines as a job request (which it gives the name openmp_job).
  • In this case, SLURM looks for one node with 40 cores for a single task, for 1 hour.
  • Once it has found such a node, it runs the script, which:
    • changes to the submission directory;
    • loads modules;
    • sets an environment variable;
    • runs the openmp_example application.

Example submission script (MPI)

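#!/bin/bash
#SBATCH --nodes=8
#SBATCH --ntasks=320
#SBATCH --time=1:00:00
#SBATCH --job-name mpi_job
#SBATCH --output=mpi_output_%j.txt

# Run from the directory the job was submitted from
cd "$SLURM_SUBMIT_DIR"

# Compiler first, then the MPI library built with it
module load intel/2018.2
module load openmpi/3.1.0rc3

# srun starts one MPI process per task (320 = 8 nodes x 40 cores)
srun ./mpi_example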
nia-login07:~$ sbatch mpi_job.sh
  • The first line indicates that this is a bash script.
  • Lines starting with #SBATCH go to SLURM.
  • sbatch reads these lines as a job request (which it gives the name mpi_job).
  • In this case, SLURM looks for 8 nodes with 40 cores each on which to run 320 tasks, for 1 hour.
  • Once it has found such nodes, it runs the script, which:
    • changes to the submission directory;
    • loads modules;
    • runs the mpi_example application.

Monitoring queued jobs

Once the job is in the queue, there are some commands you can use to monitor its progress.

  • squeue to show the job queue (squeue -u $USER for just your jobs);

  • squeue -j JOBID to get information on a specific job (alternatively, scontrol show job JOBID, which is more verbose);

  • squeue -j JOBID -o "%.9i %.9P %.8j %.8u %.2t %.10M %.6D %S" to get an estimate for when a job will run;

  • scancel -i JOBID to cancel the job (-i asks for confirmation first);

  • sinfo -pcompute to look at available nodes.

  • More utilities like those that were available on the GPC are under development.
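All of these commands need the JOBID. One way to capture it at submission time is sbatch's --parsable option, which prints only the job ID (followed by ";cluster" on multi-cluster setups). The wrapper below is a minimal sketch; the function name submit_and_get_id is hypothetical.

```shell
# Hypothetical helper: submit a job script and print just its job ID.
# "sbatch --parsable" prints "jobid" or "jobid;cluster"; keep the part before ";".
submit_and_get_id() {
  local out
  out=$(sbatch --parsable "$1") || return 1
  printf '%s\n' "${out%%;*}"
}

# Usage on a login node (mpi_job.sh is the example script from this page):
#   jobid=$(submit_and_get_id mpi_job.sh)
#   squeue -j "$jobid"
#   scancel -i "$jobid"
```

Declaring out separately from the assignment keeps sbatch's exit status, so a failed submission makes the function return non-zero instead of printing an empty ID.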

Data Management and I/O Tips

  • $HOME, $SCRATCH, and $PROJECT all use the parallel file system called GPFS.
  • Your files can be seen on all Niagara login and compute nodes.
  • GPFS is a high-performance file system which provides rapid reads and writes to large data sets in parallel from many nodes.
  • But accessing data sets which consist of many small files leads to poor performance.
  • Avoid reading and writing lots of small amounts of data to disk.
  • Many small files on the system waste space and are slower to access, read and write.
  • Write data out in binary: it is faster and takes less space.
  • The burst buffer (to come) is better for I/O-heavy jobs and to speed up checkpoints.
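When a run has already produced many small files, one common remedy (not specific to Niagara) is to pack them into a single archive so GPFS deals with one large file. The sketch below uses plain tar; the function name bundle_dir and the example paths are hypothetical.

```shell
# Hypothetical helper: pack a directory of many small files into a single tar
# archive, so GPFS handles one large file instead of thousands of small ones.
#   $1: directory to pack, $2: archive to create
bundle_dir() {
  local src="${1%/}"   # drop a trailing slash so basename works as expected
  tar -cf "$2" -C "$(dirname -- "$src")" "$(basename -- "$src")"
}

# Usage (hypothetical paths):
#   bundle_dir "$SCRATCH/run1/output" "$SCRATCH/run1/output.tar"
#   tar -tf "$SCRATCH/run1/output.tar" | head    # list the contents
```

Using -C stores paths relative to the parent directory, so the archive unpacks into a single output/ directory rather than a full /scratch/... hierarchy.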

Further information

Useful sites

Support

  • support@scinet.utoronto.ca
  • niagara@computecanada.ca