Burst Buffer
The Niagara burst buffer is a very fast, high-performance shared file system built from solid-state drives (SSDs). It is best suited to jobs that perform a large number of input/output operations per second (IOPS), more than the /scratch file system can handle, such as certain bioinformatics workflows and quantum chemistry calculations, and to codes that need to save large restart checkpoint files between jobs.
The setup of the Burst Buffer of the Niagara cluster is evolving as we come to better understand how best to use this resource. The preliminary setup is described below.
Short-term persistent burst buffer space ($BBUFFER)
To get access to burst buffer space that persists between jobs, a user must first request it. If you would like access, send an email to support@scinet.utoronto.ca explaining why you need the burst buffer. A quota of 10 TB will be set for each burst buffer user.
Users with short-term persistent burst buffer access will have a directory created on that resource. Its location is given by the $BBUFFER environment variable. Like $SCRATCH, the $BBUFFER directory is accessible from all Niagara login, compute and datamover nodes.
Unlike ramdisk (/dev/shm) or job-specific burst buffer space (explained below), files remain in your persistent burst buffer space between jobs. This makes the persistent burst buffer ideal for codes that have large restart checkpoint files to be saved between jobs.
The persistence of files on this burst buffer space is very limited, so users should still endeavour to clean up after each job by staging out final files to $SCRATCH and removing temporary files. A very short purging policy for the burst buffer (around 48 hours) will be implemented in the future.
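The clean-up step described above could be sketched as a small shell function. This is only an illustration, not an official SciNet script: the function name stage_out, the "results" subdirectory and the "myjob" directory name are all hypothetical, and the function assumes that everything worth keeping lives under "results".

```shell
#!/bin/bash
# Hypothetical helper: copy final results from a burst buffer working
# directory to a destination on scratch, then remove the working directory.
# The "results" subdirectory name is an assumption made for this sketch.
stage_out() {
    local src="$1"    # working directory under $BBUFFER
    local dest="$2"   # destination directory under $SCRATCH

    # Refuse to run on an empty or missing source directory.
    if [ -z "$src" ] || [ ! -d "$src" ]; then
        return 1
    fi

    mkdir -p "$dest" || return 1

    # Copy the results first; if the copy fails, leave the burst buffer
    # files in place so nothing is lost.
    if [ -d "$src/results" ]; then
        cp -r "$src/results" "$dest/" || return 1
    fi

    # Remove the working directory, including temporary files.
    rm -rf "$src"
}

# Only run when both variables are defined, i.e. for a user with
# persistent burst buffer access. "myjob" is a placeholder name.
if [ -n "${BBUFFER:-}" ] && [ -n "${SCRATCH:-}" ]; then
    stage_out "$BBUFFER/myjob" "$SCRATCH/myjob"
fi
```

Copying before deleting means a failed copy (for example, a full quota on $SCRATCH) leaves the data on the burst buffer rather than losing it.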
Users should test a burst buffer workflow using a short test job before using the burst buffer in production.
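A short test job might look like the following sketch, which only checks that the burst buffer directory can be written to and read from. The job name, the 15-minute time limit and the "bb_test" directory name are assumptions chosen for illustration; adjust them to your own workflow. The fallback to a temporary directory exists only so that the script can be tried on a machine where $BBUFFER is not defined.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=00:15:00
#SBATCH --job-name=bb_test

# Use $BBUFFER when it is defined; otherwise fall back to a temporary
# directory (for dry runs outside Niagara only).
bb_dir="${BBUFFER:-$(mktemp -d)}/bb_test"
mkdir -p "$bb_dir"

# Write a small file and read it back to confirm the directory is usable.
echo "hello burst buffer" > "$bb_dir/check.txt"
if [ "$(cat "$bb_dir/check.txt")" = "hello burst buffer" ]; then
    bb_status="ok"
else
    bb_status="failed"
fi
echo "burst buffer check: $bb_status"

# Remove the test directory so nothing is left behind.
rm -rf "$bb_dir"
```

Once a check like this passes, the same script can be extended with a small version of the real workload before moving to production runs.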
Per-job temporary burst buffer space ($SLURM_TMPDIR)
For every job on Niagara, the scheduler creates a temporary directory on the burst buffer called $SLURM_TMPDIR. The $SLURM_TMPDIR directory is empty when your job starts, and its contents are deleted after the job has finished.
$SLURM_TMPDIR is intended as a place for temporary files that do not fit in ramdisk (/dev/shm) and would suffer performance issues on the general /scratch file system. It is similar to the $SLURM_TMPDIR variable used on the general purpose Compute Canada systems Cedar and Graham, where this storage lives on node-local SSDs (which are not present on Niagara nodes).
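As a rough sketch of how a job script might use this space, the fragment below writes an intermediate file to $SLURM_TMPDIR and copies it somewhere permanent before the job ends. The file name "intermediate.dat" and the directory name "tmpdir_demo" are placeholders, and the fallbacks (a temporary directory when $SLURM_TMPDIR or $SCRATCH is not set) are assumptions added so the fragment can be tried outside a job.

```shell
#!/bin/bash
# Inside a job, use $SLURM_TMPDIR; outside the scheduler, fall back to a
# fresh temporary directory (for testing only).
workdir="${SLURM_TMPDIR:-$(mktemp -d)}"

# Write intermediate files here instead of on /scratch.
# "intermediate.dat" stands in for whatever your code produces.
echo "intermediate data" > "$workdir/intermediate.dat"

# $SLURM_TMPDIR is deleted when the job finishes, so copy anything worth
# keeping before the script exits. Falls back to the system temporary
# directory if $SCRATCH is not defined.
keep_dir="${SCRATCH:-${TMPDIR:-/tmp}}/tmpdir_demo"
mkdir -p "$keep_dir"
cp "$workdir/intermediate.dat" "$keep_dir/"
```

In a real job the copy step would normally be the last command in the script, after the main computation has finished.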