Graphics Processing Units (GPUs) are specialized processors whose massively parallel architecture lets them execute certain operations faster than regular CPUs. Because they are used to speed up such operations, they are also called accelerators (although GPUs are only one kind of accelerator). The OMNI cluster contains 10 GPU nodes with a total of 24 NVIDIA Tesla V100 GPUs.

For a program to use GPUs, the parts of the program that are best suited for GPU execution have to be adapted using a GPU programming model or library (e.g. CUDA, OpenACC, OpenCL, OpenMP).

On this page we describe how you can request the use of GPUs for your jobs. We also describe which software libraries are available for developing software for GPUs and how to use these libraries.

Requesting GPU nodes on OMNI

To request a GPU node, specify the gpu queue (partition) in your job script with --partition=gpu. In addition, specify the number of GPUs you need with the option --gres=gpu:<number>.

The 24 GPUs on the OMNI cluster are distributed over the 10 GPU nodes as follows:

Node                 GPUs per node
gpu-node[001-004]    4
gpu-node[005-008]    1
gpu-node[009-010]    2
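The table can also be read as a small lookup: a request with --gres=gpu:<n> can only be served by a node that has at least n GPUs, and the per-node counts add up to the 24 GPUs mentioned above. A minimal bash sketch of that arithmetic follows, with the values copied from the table on this page. The variable GPU_TABLE and the functions nodes_for_gpus and total_gpus are hypothetical names for illustration. To compare against the live configuration, `sinfo -p gpu -o "%N %G"` should list node names together with their generic resources (assuming GPUs are configured as gres on OMNI).

```bash
# Hypothetical lookup table: node range and GPUs per node, copied from the table above.
GPU_TABLE="gpu-node[001-004] 4
gpu-node[005-008] 1
gpu-node[009-010] 2"

# Print the node ranges whose nodes can satisfy --gres=gpu:<want> on a single node.
nodes_for_gpus() {
  local want=$1 range count
  while read -r range count; do
    if (( count >= want )); then
      printf '%s\n' "$range"
    fi
  done <<< "$GPU_TABLE"
}

# Sum (number of nodes in range) * (GPUs per node) over all rows.
total_gpus() {
  local range count span lo hi total=0
  while read -r range count; do
    span=${range#*\[}   # "001-004]"
    span=${span%\]}     # "001-004"
    lo=${span%-*}       # "001"
    hi=${span#*-}       # "004"
    # 10# forces base 10, so zero-padded numbers like 008 are not read as octal.
    (( total += (10#$hi - 10#$lo + 1) * count ))
  done <<< "$GPU_TABLE"
  echo "$total"
}
```

For example, nodes_for_gpus 4 prints only gpu-node[001-004], which is why a four-GPU request has to wait for one of those four nodes.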

Here is an example job script header:

#!/bin/bash
#SBATCH --time=0:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2
...

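This allocates two GPUs on one node of the gpu partition, i.e. on a node that has at least two GPUs. Depending on how many GPUs you need, you can set --gres=gpu:<number> to 1, 2 or 4. Note that a request for 4 GPUs can only be served by gpu-node[001-004], and a request for 2 GPUs only by gpu-node[001-004] or gpu-node[009-010]. The Slurm documentation describes a number of additional options for controlling GPU allocation.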
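Inside the job it can be useful to check that the allocation matches the request before starting a long computation. A minimal sketch, assuming Slurm exports CUDA_VISIBLE_DEVICES as a comma-separated list of device indices (e.g. "0,1") for jobs that request GPUs with --gres; whether this happens depends on the cluster's gres configuration. The functions count_visible_gpus and check_gpu_count are hypothetical helpers, not part of Slurm.

```bash
# Count the GPUs visible to this job, assuming CUDA_VISIBLE_DEVICES
# holds a comma-separated list of device indices (empty or unset = none).
count_visible_gpus() {
  local devices=${CUDA_VISIBLE_DEVICES:-}
  if [[ -z $devices ]]; then
    echo 0
    return
  fi
  local -a ids
  IFS=',' read -r -a ids <<< "$devices"
  echo "${#ids[@]}"
}

# Compare the visible GPUs with the number requested via --gres=gpu:<n>;
# prints a warning and returns non-zero on mismatch.
check_gpu_count() {
  local requested=$1 visible
  visible=$(count_visible_gpus)
  if (( visible != requested )); then
    echo "warning: requested $requested GPU(s), but $visible are visible" >&2
    return 1
  fi
}
```

With the header shown above, the body of the job script could call `check_gpu_count 2 || exit 1` before starting the GPU program.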

Developing for GPUs on OMNI

Most GPU-compatible modules are not immediately available on OMNI (and are not listed by module avail right away), because they are located in a separate software stack, the GPU modules. This separation is necessary for compatibility reasons. To switch to the GPU stack, enter the following command:

module load GpuModules

Once the GPU stack is loaded, the command module avail will give you an overview of all available modules in the GPU stack, as usual.

To switch back to the regular software stack, please enter:

module unload GpuModules

Please remember to include the corresponding module loading commands in your job scripts as well.
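If the switch to the GPU stack fails inside a batch job, the problem otherwise only shows up later as a missing module or command. A minimal sketch of a guard that could go near the top of a job script follows. The function name load_gpu_stack is hypothetical, and whether a failed `module load` is reported through its exit status depends on the module system in use.

```bash
# Hypothetical helper for job scripts: switch to the OMNI GPU software stack
# and report failure instead of silently continuing with the regular stack.
load_gpu_stack() {
  # 'module' is usually a shell function set up by the module system;
  # command -v finds shell functions as well as executables.
  if ! command -v module >/dev/null 2>&1; then
    echo "error: the 'module' command is not available in this shell" >&2
    return 1
  fi
  # The exit status of this call is returned to the caller.
  module load GpuModules
}
```

In a job script, you would call `load_gpu_stack || exit 1` after the #SBATCH lines and before loading any modules from the GPU stack.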

Updated at 15:19 on 8 February 2021 by Jan Steiner