GPUs

Graphics Processing Units (GPUs) are specialized processors whose massively parallel architecture lets them execute certain operations faster than regular CPUs. They are therefore also called accelerators (GPUs are only one type of accelerator). The OMNI cluster contains 10 GPU nodes with a total of 24 NVIDIA Tesla V100 GPUs.

For a program to use GPUs, the parts of the program that are best suited for parallel execution have to be adapted using functions from a GPU programming library or framework (e.g. CUDA, OpenACC, OpenCL, OpenMP).

On this page we describe how you can request GPUs for your jobs. We also describe which software libraries are available for developing GPU software and how to use these libraries.

Requesting GPU nodes on OMNI

To request a GPU node, you need to specify the gpu queue (partition) in your job script. In addition, the number of required GPUs must be specified with the option --gres=gpu:<number>.

The 24 GPUs on the OMNI cluster are distributed over the 10 GPU nodes as follows:

node               #GPUs per node
gpu-node[001-004]  4
gpu-node[005-008]  1
gpu-node[009-010]  2
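
If you want to check the GPU nodes and their current state yourself, you can query Slurm directly. This is a minimal sketch using standard sinfo format options (output columns: node name, generic resources, node state):

# list each node of the gpu partition with its GRES and current state
sinfo --partition=gpu --Node --format="%n %G %t"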

Here is an example job script header:

#!/bin/bash
#SBATCH --time=0:30:00          # maximum runtime (h:mm:ss)
#SBATCH --nodes=1               # number of nodes
#SBATCH --ntasks-per-node=2     # tasks (processes) per node
#SBATCH --partition=gpu         # GPU queue (partition)
#SBATCH --gres=gpu:2            # number of GPUs per node
...

This will allocate one node from the GPU queue together with two of its GPUs. You may vary --gres=gpu:[1|2|4], depending on how many GPUs you need.
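
To verify inside a running job which GPUs have actually been assigned, you can, for example, call nvidia-smi. A minimal sketch, assuming the Slurm GRES setup exports CUDA_VISIBLE_DEVICES (as is common for gres/gpu configurations):

# inside the job script or an interactive shell on the allocated node
nvidia-smi                    # lists the GPUs visible to this job
echo "$CUDA_VISIBLE_DEVICES"  # GPU indices assigned by Slurm (if exported)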

Developing for GPUs on OMNI

After login, the OpenHPC and OMNI software stacks are loaded by default. The OpenHPC software stack includes various tools and libraries for scientific computing on HPC systems; the OMNI software stack comprises application software installed by user request. In addition, we provide a complete software stack from the Bright Cluster Manager, which provides GPU compilers and packages with GPU support tailored for machine learning applications. After loading the GPU stack, the other stacks become invisible to avoid incompatibilities.

Load the GPU software stack:
module load GpuModules

This will add the following components to your environment by default:

$ module list

Currently Loaded Modules:
  1) shared                                  6) openblas/dynamic/0.3.7      11) nccl2-cuda10.1-gcc/2.7.8
  2) slurm/omni/20.02.6                      7) protobuf3-gcc/3.8.0         12) gcc5/5.5.0
  3) python36                                8) cudnn7.6-cuda10.1/7.6.5.32  13) tensorflow2-py36-cuda10.1-gcc/2.0.0
  4) ml-pythondeps-py36-cuda10.1-gcc/3.3.0   9) hdf5_18/1.8.21
  5) keras-py36-cuda10.1-gcc/2.3.1          10) cuda10.1/toolkit/10.1.243
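
Among these modules, cuda10.1/toolkit provides the CUDA compiler nvcc, so you can build your own CUDA code directly on the cluster. A minimal sketch (saxpy.cu is a hypothetical placeholder for your own source file; run the resulting binary inside a GPU job as described above):

module load GpuModules
nvcc -O2 -o saxpy saxpy.cu   # saxpy.cu: hypothetical CUDA source file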

The command module avail gives you an overview of all modules available in GpuModules from the Bright Cluster Manager machine learning stack. After unloading the GPU software stack, the default environment with the OpenHPC and OMNI software stacks is loaded again. The corresponding command is:
module unload GpuModules

Please remember to include the module commands in your submission script and to allocate GPUs (see above); a complete example script is shown below.
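
As a minimal sketch, a complete submission script combining the GPU allocation and the GPU software stack might look as follows (train.py is a hypothetical placeholder for your own application):

#!/bin/bash
#SBATCH --time=0:30:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1

# load the GPU software stack (CUDA, cuDNN, TensorFlow, ...)
module load GpuModules

# run the application; train.py is a hypothetical placeholder
python3 train.py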

Updated at 15:19 on 8 February 2021 by Jan Steiner