Burst Buffer
The Niagara burst buffer is a fast, high-performance shared file system built from solid-state drives (SSDs). While the overall bandwidth of the burst buffer is somewhat higher than that of the scratch file system, its true strength lies in handling a high number of I/O operations per second (IOPS). The ideal use cases are therefore jobs that involve many IOPS, too many for the /scratch file system, such as certain bioinformatics workflows and quantum chemistry calculations, as well as codes that need to save large restart checkpoint files between jobs.
The setup of the burst buffer on the Niagara cluster is evolving as we come to better understand how best to use this resource. The current setup is described below.
Short-term burst buffer space ($BBUFFER)
To use the burst buffer, a user must first request space on it. If you desire access, send an email to support@scinet.utoronto.ca explaining why your workflow needs the burst buffer. A quota of 10 TB will be set for each burst buffer user.
Users granted short-term burst buffer access will have a directory created on that resource. Its location is given by the $BBUFFER environment variable. Like $SCRATCH, the $BBUFFER directory is accessible from all Niagara login, compute, and datamover nodes.
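As a minimal sketch of how a job might use this space, the Python snippet below resolves $BBUFFER and creates a per-job working directory inside it. The subdirectory naming scheme (job_<jobid>) is an illustrative choice, not a SciNet convention.

#!/usr/bin/env python3
"""Sketch: locate the burst buffer directory and create a per-job
working directory inside it (the layout is illustrative only)."""
import os
from pathlib import Path

# $BBUFFER is set for users who have been granted burst buffer space.
bbuffer = os.environ.get("BBUFFER")
if bbuffer is None:
    raise SystemExit("$BBUFFER is not set: burst buffer access has not been granted.")

# Keep each job's files separate, e.g. by using the Slurm job ID.
job_id = os.environ.get("SLURM_JOB_ID", "interactive")
workdir = Path(bbuffer) / f"job_{job_id}"
workdir.mkdir(parents=True, exist_ok=True)
print(f"Using burst buffer working directory: {workdir}")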
Unlike ramdisk or job-specific burst buffer space (explained below), files in your short-term burst buffer space remain between jobs. This makes the burst buffer ideal for codes that must save large restart checkpoint files between jobs.
Users should endeavour to clean up after each job by staging final files out to $SCRATCH and removing temporary files. A very short purging policy for the burst buffer (around 48 hours) will be implemented in the future.
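The following is a minimal sketch of that end-of-job cleanup: copy the files worth keeping from the burst buffer to $SCRATCH, then delete the per-job directory. The directory layout and file patterns (job_<jobid>, checkpoint_*.dat, *.out) are hypothetical examples.

#!/usr/bin/env python3
"""Sketch: stage final results out to $SCRATCH and remove temporaries
from the burst buffer (paths and patterns are hypothetical)."""
import os
import shutil
from pathlib import Path

bbuffer = Path(os.environ["BBUFFER"])    # short-term burst buffer space
scratch = Path(os.environ["SCRATCH"])    # scratch space for staged-out results
job_id = os.environ.get("SLURM_JOB_ID", "interactive")

workdir = bbuffer / f"job_{job_id}"                  # per-job burst buffer directory
destdir = scratch / "results" / f"job_{job_id}"      # destination on scratch
destdir.mkdir(parents=True, exist_ok=True)

# Stage out what should be kept (here: checkpoints and output files).
for pattern in ("checkpoint_*.dat", "*.out"):
    for f in workdir.glob(pattern):
        shutil.copy2(f, destdir / f.name)

# Remove the per-job directory so old files do not consume the burst buffer quota.
shutil.rmtree(workdir, ignore_errors=True)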
Users should test their burst buffer workflow with a short test job before using the burst buffer in production.
Note that Niagara compute nodes have no local disks, so $SLURM_TMPDIR lives in memory (ramdisk). This is in contrast to the general-purpose Alliance (formerly Compute Canada) systems Cedar, Graham, Beluga, and Narval, where this variable points to a directory on a node-local SSD.
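The snippet below is a minimal sketch of one way a code could pick a fast working directory on Niagara; the fallback order shown (burst buffer first, then the $SLURM_TMPDIR ramdisk, then $SCRATCH) is an illustrative policy, not a SciNet recommendation. Remember that files in $SLURM_TMPDIR count against the node's memory and disappear when the job ends, whereas $BBUFFER persists between jobs.

#!/usr/bin/env python3
"""Sketch: choose a fast working directory, preferring the burst buffer,
then the ramdisk at $SLURM_TMPDIR, then $SCRATCH (illustrative policy)."""
import os
from pathlib import Path

def fast_workdir() -> Path:
    for var in ("BBUFFER", "SLURM_TMPDIR"):
        path = os.environ.get(var)
        if path:
            return Path(path)
    # Outside a job and without burst buffer access, fall back to $SCRATCH.
    return Path(os.environ["SCRATCH"])

print(f"Fast working directory: {fast_workdir()}")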