Quickstart new


Progressive approach to testing and running jobs on Niagara

We encourage users to adopt a progressive and explicit approach to testing, running, and scaling up jobs on Niagara. Here is a set of steps we suggest you follow.

Once you have compiled and tested your code or workflow on the Niagara login nodes and confirmed that it behaves correctly, you are ready to submit jobs to the cluster. Your jobs will run on some of Niagara's 1500 compute nodes. When and where your job runs is determined by the scheduler.

Niagara uses SLURM as its job scheduler. More advanced details of how to interact with the scheduler can be found on the Slurm page.

You submit jobs from a login node by passing a script to the sbatch command:

nia-login07:~$ sbatch jobscript.sh

This puts the job in the queue. It will run on the compute nodes in due course.
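
Once the job is submitted, you can monitor it from a login node with standard SLURM commands (the job ID below is just an example):

nia-login07:~$ squeue -u $USER
nia-login07:~$ scancel 1234567

squeue lists your queued and running jobs; scancel removes a job from the queue.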

Jobs will run under your group's RRG allocation or, if the group has none, under a RAS allocation (previously called a "default" allocation).
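
If your group holds more than one allocation, you can direct a job to a specific one with SLURM's standard --account option (the account name below is a placeholder):

nia-login07:~$ sbatch --account=rrg-groupname jobscript.sh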

Keep in mind:

  • Scheduling is by node, so in multiples of 40 cores (see the sample script after this list).
  • For users with an allocation, the maximum walltime is 24 hours. For those without an allocation, the maximum walltime is 12 hours.
  • Jobs must write their output to your scratch or project directory (home is read-only on compute nodes).
  • Compute nodes have no internet access, so move your data to Niagara before you submit your job.
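
Putting these points together, here is a minimal sketch of what jobscript.sh might look like for a full-node MPI job. The job name, modules, and executable are placeholders rather than a prescribed setup; load whatever your own code needs, and this sketch assumes $SCRATCH points to your scratch directory:

#!/bin/bash
#SBATCH --nodes=1                 # scheduling is by node: one node = 40 cores
#SBATCH --ntasks-per-node=40      # one MPI rank per core
#SBATCH --time=01:00:00           # requested walltime, within the 12/24-hour limits
#SBATCH --job-name=myjob          # placeholder job name

cd $SCRATCH                       # write output to scratch; home is read-only on compute nodes

module load gcc openmpi           # placeholder modules: load what your code was built with

mpirun ./myprogram                # placeholder executable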