Graphics Processing Units (GPUs) are specialized processors whose massively parallel architecture lets them execute certain operations faster than regular CPUs. Processors of this kind are also called accelerators; GPUs are one type of accelerator among others. The OMNI cluster contains 10 nodes with a total of 24 NVIDIA Tesla V100 GPUs.
A program can only use GPUs if it is adapted for them: the parts of the program best suited to GPU execution have to be rewritten with a GPU programming interface such as CUDA, OpenACC, OpenCL or OpenMP.
To request a GPU node, select the gpu queue (partition) in your job script with --partition=gpu. In addition, specify the number of required GPUs with the option --gres=gpu:N, where N is the number of GPUs.
The 24 GPUs on the OMNI cluster are distributed over the 10 GPU nodes as follows:
Here is an example job script header:
#!/bin/bash
#SBATCH --time=0:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2
...
This allocates one node from the gpu partition together with two of its GPUs. Depending on how many GPUs your job needs, you can set --gres=gpu:1, --gres=gpu:2 or --gres=gpu:4. The Slurm documentation describes a number of additional options for controlling GPU allocation.
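Inside a running job you can check how many GPUs were actually assigned. The following is a minimal sketch for the body of a job script that requests --gres=gpu:2. It assumes that Slurm exports CUDA_VISIBLE_DEVICES as a comma-separated list of GPU IDs for GPUs requested with --gres (common Slurm behavior for GPU resources, but not confirmed here for OMNI); the function count_visible_gpus and the variable requested_gpus are hypothetical names.

```bash
#!/bin/bash
# Hypothetical helper: count the GPU IDs listed in CUDA_VISIBLE_DEVICES.
# Assumption: Slurm sets this variable (e.g. "0,1") for GPUs requested
# with --gres. If it is unset or empty, the function prints 0.
count_visible_gpus() {
    gpu_list=${CUDA_VISIBLE_DEVICES:-}
    gpu_count=0
    saved_ifs=$IFS
    IFS=','
    # Unquoted expansion splits the list at the commas.
    for gpu_id in $gpu_list; do
        gpu_count=$((gpu_count + 1))
    done
    IFS=$saved_ifs
    echo "$gpu_count"
}

requested_gpus=2   # keep in sync with the N in --gres=gpu:N
visible_gpus=$(count_visible_gpus)
echo "GPUs visible to this job: $visible_gpus (requested: $requested_gpus)"
if [ "$visible_gpus" -ne "$requested_gpus" ]; then
    echo "Warning: expected $requested_gpus GPU(s), found $visible_gpus" >&2
fi
```

Because an unset variable counts as 0, the warning also catches a job that was accidentally submitted without --gres. The loop avoids bash arrays, so the snippet also runs under a plain POSIX sh.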
Most GPU-compatible modules are not immediately available on OMNI and do not appear in the output of module avail, because, for compatibility reasons, they are kept in a separate software stack, the GPU modules. To switch to the GPU stack, enter the following command:
module load GpuModules
Once the GPU stack is loaded, module avail lists all modules available in the GPU stack, just as it does for the regular stack.
To switch back to the regular software stack, please enter:
module unload GpuModules
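In job scripts it can help to check whether the GPU stack is currently active before loading it. This is a minimal sketch, assuming that the module system on OMNI records loaded modules in the colon-separated LOADEDMODULES environment variable (both Lmod and Environment Modules maintain it), possibly with a version suffix such as GpuModules/<version>; the function name gpu_stack_loaded is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) if GpuModules is loaded.
# Assumption: LOADEDMODULES looks like "GpuModules:CUDA/11.2", i.e. module
# names separated by colons, optionally followed by "/<version>".
gpu_stack_loaded() {
    # Surround the list with colons so every entry is delimited on both sides.
    case ":${LOADEDMODULES:-}:" in
        *:GpuModules:*|*:GpuModules/*) return 0 ;;
        *) return 1 ;;
    esac
}

if gpu_stack_loaded; then
    echo "GPU software stack is active"
else
    echo "GPU software stack is not loaded (run: module load GpuModules)"
fi
```

If your module system provides it, module is-loaded GpuModules may be a simpler way to perform the same check.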
Please remember to include the appropriate module loading commands in your job scripts as well.
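Putting the pieces together, a complete job script might look like the following minimal sketch, which extends the example header above. The module name CUDA and the program my_gpu_program are hypothetical placeholders; after loading GpuModules, use module avail to find the modules actually installed on OMNI.

```bash
#!/bin/bash
#SBATCH --time=0:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2

# Switch to the GPU software stack first; GPU-compatible modules
# only become visible after this step.
module load GpuModules

# Hypothetical module name: check `module avail` for the real one.
module load CUDA

# Hypothetical program name: replace with your own GPU application.
# srun starts one process per task (here 2 tasks on the node).
srun ./my_gpu_program
```

Loading the modules inside the script, rather than relying on the environment of the login shell, makes the job reproducible regardless of which stack was active at submission time.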