The job scheduler SLURM, version 22.05.8, is installed on the OMNI cluster. The purpose of SLURM is to distribute jobs over the compute nodes in such a way that the available resources are used efficiently and wait times are minimized. More information on queuing jobs can be found here. The SLURM documentation is here.
Tip: SLURM is only available if the SLURM module is loaded (module load slurm). By default, the SLURM module is loaded at login, but it can be unloaded intentionally or by accident (e.g. with module purge). If the SLURM commands are not available, type module list to check whether the SLURM module is actually loaded. More information on modules here.
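The check described in the tip can be sketched as a small Bash function. This is a minimal sketch, not an official OMNI tool: the function name ensure_slurm is hypothetical, and it assumes that the module command is provided by the cluster environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure the SLURM commands are available.
ensure_slurm() {
    # If sbatch can be found, the SLURM module is loaded.
    if command -v sbatch >/dev/null 2>&1; then
        echo "SLURM is available"
        return 0
    fi
    # "module" only exists on systems with environment modules (assumption).
    if type module >/dev/null 2>&1; then
        module load slurm
        if command -v sbatch >/dev/null 2>&1; then
            echo "SLURM module loaded"
            return 0
        fi
    fi
    echo "SLURM is not available" >&2
    return 1
}

# "|| true" keeps the sketch harmless on machines without SLURM.
ensure_slurm || true
```

Called in a login shell after an accidental module purge, this would reload the module; on a machine without SLURM it only prints a message.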
A SLURM job gets its settings from several sources. These are, in descending order of priority:
- Command line parameters of the command used to queue the job.
- Environment variables, if specified.
- The parameters at the beginning of the job script, if specified.
- The default settings of SLURM.
That means, for example, that you can specify in a job script which settings are to be used, and if you need to deviate from these for a single run you can simply specify a command line option.
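The priority order can be illustrated with the time limit. The following is a sketch under assumptions: job.sh is a hypothetical file name, the time values are arbitrary, and SBATCH_TIMELIMIT is the sbatch input environment variable that corresponds to --time.

```shell
#!/usr/bin/env bash
# Write a small job script; job.sh is a hypothetical file name.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --time=0-01:00:00
#SBATCH --ntasks=1
echo "Running on $(hostname)"
EOF

# Only submit where sbatch exists, so the sketch is harmless elsewhere.
if command -v sbatch >/dev/null 2>&1; then
    # 1. Uses the setting from the job script: one hour.
    sbatch job.sh
    # 2. The environment variable overrides the job script: 30 minutes.
    SBATCH_TIMELIMIT=0-00:30:00 sbatch job.sh
    # 3. The command line option overrides both: 10 minutes.
    SBATCH_TIMELIMIT=0-00:30:00 sbatch --time=0-00:10:00 job.sh
else
    echo "sbatch not found; wrote job.sh only"
fi
```

Note that on a cluster this submits three short jobs. The job script itself never changes; only the way it is queued differs between the three runs.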
SLURM terminology
SLURM knows and mirrors the division of the cluster into nodes with several cores. When queuing jobs, there are several ways of requesting resources and it is important to know which term means what in SLURM. Here are some basic SLURM terms:
- A job is a self-contained computation that may encompass multiple tasks and is given specific resources like individual CPUs, a specific amount of RAM or entire nodes. These resources are said to have been allocated for the job.
- A task is a single run of a single process. By default, one task is run per node and one CPU is assigned per task.
- A partition (usually called a queue outside SLURM) is a waiting line into which users put their jobs.
- A CPU in SLURM means a single core. This is different from the more common terminology, where a CPU (a microprocessor chip) consists of multiple cores. SLURM uses the term "sockets" when talking about CPU chips.
Commands and options
Here are the most common SLURM commands you need as a user:
Command | Function |
---|---|
squeue | List jobs. |
sinfo | Show information about allocated and free nodes. |
sbatch | Queue a batch job. |
srun | Outside a job: queues a job with a single Linux command. Within a job: runs a Linux command once per task in parallel. |
spartition | Display partition information. ZIMT addition. |
scancel | Delete a job. |
sview | Launch the graphical user interface. |
scontrol | Show more detailed information (not all information is available to normal users). |
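Typical calls of the commands from the table above might look like the following sketch. The wrapper run_if_present is hypothetical and only keeps the example runnable on machines without SLURM; the job ID 123456 is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command only if it is installed.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped (not installed): $*"
    fi
}

# List only your own jobs.
run_if_present squeue -u "${USER:-$(id -un)}"
# Show the state of partitions and nodes.
run_if_present sinfo
# OMNI-specific partition overview.
run_if_present spartition
# Show details of one job; 123456 is a placeholder, not a real job ID.
run_if_present scontrol show job 123456
# Deleting that job would be: scancel 123456
```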
Many of these commands have options (command line arguments) which you can specify when calling them. The most important ones are summarized here:
Option | Function | OMNI |
---|---|---|
--partition, -p | Specifies the partition in which the job is to be queued. | Use spartition to list the available partitions; the default is marked with *. |
--nodes, -N | Sets the number of nodes on which the job is to be run. | 1 (or more for MPI applications) |
--ntasks, -n | Sets the number of tasks for the job. | 1-64 (depends on your application setup) |
--ntasks-per-node | Sets the maximum number of tasks per node. Usually important for MPI programs. | |
--cpus-per-task | Sets the number of cores per task. Usually important for OpenMP programs. | |
--mem | Sets the RAM limit per node in megabytes. The job will be terminated if the limit is exceeded; our nodes prohibit memory swapping. | Default: 3750 MB; max. 240 GB (hpc-node) or 480 GB (fat-node). Please use memory sparingly. For very memory-intensive applications, use the smp partition with 1530 GB RAM per node. |
--time, -t | Sets the time limit for the job. If the time limit is exceeded, the job will be terminated. Format: D-HH:MM:SS | See spartition for the default and min/max times of the different partitions. |
--gpus, -G | Number of GPUs to use. Analogously: --gres=gpu:X, where X is the number of GPUs. | Note that the OMNI cluster has nodes with different numbers of GPUs. |
--output=<filename> | For the sbatch command, this specifies the log file into which the stdout stream is directed. By default, this is a file named slurm-<JobID>.out, created in the directory from which sbatch was run. | |
--error=<filename> | For the sbatch command, this specifies the log file into which the stderr stream is directed. By default, stderr is redirected into the same file as stdout (see above). | |
--mail-type | Sets the events for which an e-mail is sent to the address specified by --mail-user. Possible options are BEGIN, END, FAIL, REQUEUE, ALL. | |
--mail-user=<address> | Specifies the recipient e-mail address for job notifications. | |
--no-requeue | Disables automatic job restart. | Default: Requeue=1 (--requeue) |
You can find a full list by entering man <command name> or <command name> --help, as well as in the SLURM documentation of sbatch.
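Several of these options combined in a job script header could look like the sketch below. All values are illustrative assumptions, not OMNI recommendations; the e-mail address is a placeholder, and %j is the sbatch file name pattern for the job ID. No partition is given, so the default partition would be used.

```shell
#!/bin/bash
# Run on a single node.
#SBATCH --nodes=1
# Four tasks in total, two cores per task.
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2
# RAM limit per node in megabytes.
#SBATCH --mem=8000
# Time limit in D-HH:MM:SS, here 30 minutes.
#SBATCH --time=0-00:30:00
# stdout (and, by default, stderr) go to this file.
#SBATCH --output=example-%j.out
# Send an e-mail when the job ends or fails (placeholder address).
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@example.org

if [ -n "${SLURM_JOB_ID:-}" ]; then
    # Inside a job: srun starts the command once per task.
    srun hostname
    status="ran inside job ${SLURM_JOB_ID}"
else
    # Started directly with bash: the #SBATCH lines are plain comments.
    status="not inside a SLURM job"
fi
echo "$status"
```

The comments sit on their own lines so that each #SBATCH line contains nothing but the option itself.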
Environment variables
SLURM uses a number of environment variables. Some can be used to set SLURM settings (unless they are superseded by command line arguments when the job is queued); for sbatch, they are listed in the documentation under job input variables. Other variables are set by SLURM when queuing the job. These can be used to obtain information about the job from within the job script. Only a few examples are given here; a more complete list is in the SLURM documentation under job output variables.
Variable | Function |
---|---|
SLURM_CPUS_PER_TASK | CPUs per task. |
SLURM_JOB_ID | ID number of the job. |
SLURM_JOB_NAME | Name of the job; for sbatch, this is the name of the job script by default. |
SLURM_JOB_NUM_NODES | Number of nodes allocated for the job. |
SLURM_JOB_PARTITION | Partition in which the job runs. |
SLURM_NTASKS | Number of tasks. |
SLURM_TASK_PID | Linux process ID of the corresponding task. |
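Inside a job script, these variables can be read like any other shell variable. The following is a minimal sketch; the fallback values after ":-" are an assumption of this example so that it also runs outside SLURM, or when a variable is not set for the job. Setting OMP_NUM_THREADS from SLURM_CPUS_PER_TASK is a common pattern for OpenMP programs, shown here as an illustration only.

```shell
#!/bin/bash
# Read job information; the ":-" defaults are assumptions of this sketch.
job_id="${SLURM_JOB_ID:-none}"
ntasks="${SLURM_NTASKS:-1}"
cpus_per_task="${SLURM_CPUS_PER_TASK:-1}"

# One OpenMP thread per core assigned to each task.
export OMP_NUM_THREADS="$cpus_per_task"

# Total number of cores used by all tasks together.
total_cores=$(( ntasks * cpus_per_task ))
echo "Job ${job_id}: ${ntasks} task(s) x ${cpus_per_task} core(s) = ${total_cores} core(s)"
```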