The cluster is located in nine water-cooled double racks in the New Data Center at the Campus Hölderlinstraße.

This page contains the technical details of the cluster hardware.

Nodes

The cluster is divided into nodes. Each node has multiple cores and local memory (RAM). Access is provided via four login nodes (hpc-login01–hpc-login04). Hard disk storage is central; the individual nodes do not have local hard drives.

Compute nodes

The cluster has 439 regular compute nodes. Each of these nodes has 64 CPU cores and 256 gigabytes of RAM. The nodes are named hpc-node001–hpc-node439. An additional eight compute nodes (fat-node001–fat-node008) are equipped with more memory (512 GB).

All compute nodes have the same architecture. Each node holds two AMD EPYC 7452 processors with 32 cores per CPU. The CPU frequency is 2.35-3.35 GHz. Each core has separate L1 (32 kB) and L2 (512 kB) caches. The L3 cache is shared in blocks of 16 MB between groups of four cores, for a total of 128 MB per CPU. Each node’s RAM is divided logically into 8 NUMA domains of 32 GB each. Simultaneous multithreading (the AMD equivalent of Intel “Hyper-Threading”) is deactivated. The compute nodes are divided into multiple partitions/queues to accommodate different requirements (number of nodes, runtime, etc.). More information about the queues and their use can be found on our page about queuing jobs.
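
The layout described above can be verified directly on a node. The following is a minimal Python sketch, assuming a Linux node with the usual /sys interfaces; on an OMNI compute node it should report 64 cores, 8 NUMA domains and SMT switched off.

    # Inspect the CPU/NUMA layout of the node this process runs on.
    import glob
    import os
    import socket

    print("node:        ", socket.gethostname())
    print("cores total: ", os.cpu_count())                # expected: 64
    print("cores usable:", len(os.sched_getaffinity(0)))  # may be fewer inside a batch job

    numa_domains = glob.glob("/sys/devices/system/node/node[0-9]*")
    print("NUMA domains:", len(numa_domains))             # expected: 8

    try:
        with open("/sys/devices/system/cpu/smt/active") as f:
            print("SMT active:  ", f.read().strip())      # expected: 0 (deactivated)
    except FileNotFoundError:
        print("SMT status not exposed by this kernel")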

GPU nodes

In addition to the regular compute nodes, the cluster has 10 nodes that are each equipped with either 1, 2 or 4 GPUs of type NVIDIA Tesla V100.

The Tesla V100 supports vectorized double-precision floating-point operations. The other details of the GPU nodes (CPUs, RAM, etc.) are identical to those of the other compute nodes. Further information about GPU usage can be found on the GPU page.
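
A quick way to see how many V100s a particular GPU node provides is to query nvidia-smi, which we assume to be installed on the GPU nodes. A minimal Python sketch:

    # List the GPUs visible on the current node via nvidia-smi.
    import subprocess

    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,name,memory.total", "--format=csv,noheader"]
    ).decode()

    # Expected: one line per GPU (1, 2 or 4 on the GPU nodes), each naming a
    # Tesla V100 with 16 GB of memory.
    for line in output.strip().splitlines():
        print(line)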

SMP nodes

The OMNI cluster contains 2 nodes for shared-memory multiprocessing (SMP), also called fat nodes. Each of these nodes has 4 Intel CPUs of type Xeon Gold 5218 and 1536 GB of RAM. The nodes are called smp-node001 and smp-node002 and are available via the smp queue.

Please note that these nodes, due to their Intel CPUs, have a different architecture than all other nodes in the cluster. Please contact us if you have questions or problems concerning compatibility.
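
If your build is tuned for one of the two architectures, you can check at runtime which kind of CPU a node has. The sketch below reads /proc/cpuinfo on a Linux node; as an example of the difference, AVX-512 is offered by the Xeon Gold 5218 but not by the EPYC 7452.

    # Determine the CPU vendor and instruction-set flags of the current node.
    def cpu_vendor_and_flags(path="/proc/cpuinfo"):
        vendor, flags = None, set()
        with open(path) as f:
            for line in f:
                if line.startswith("vendor_id") and vendor is None:
                    vendor = line.split(":", 1)[1].strip()  # "GenuineIntel" or "AuthenticAMD"
                elif line.startswith("flags") and not flags:
                    flags = set(line.split(":", 1)[1].split())
        return vendor, flags

    vendor, flags = cpu_vendor_and_flags()
    print("CPU vendor:        ", vendor)
    print("AVX-512F available:", "avx512f" in flags)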

Storage

The cluster has a number of central file systems (central in the sense that they are available from every node). The first, with a total of around 10 TB of storage space, contains the home directories of all users, which are limited to 100 GB each. The workspaces are on a separate, but also central, file system with a total size of around 1 PB. Individual workspaces are unlimited in size but have a limited duration of 30 days (extendable three times by 30 days each). Additionally, the cluster has a so-called burst buffer, a file system for computations that need to read or write large amounts of data very quickly. The burst buffer consists physically of solid-state disks (SSDs) and has a total size of 32 TB. We describe its usage here.
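
Before writing large result files it can be useful to check how much space is left on these file systems. A minimal Python sketch; note that it reports file-system totals, not your personal 100 GB home quota, and that the workspace path shown is only a placeholder for the path of your own workspace.

    # Report total and free space of a central file system.
    import os
    import shutil

    def report(label, path):
        usage = shutil.disk_usage(path)
        print(f"{label:10s} total={usage.total / 1e12:8.2f} TB  free={usage.free / 1e12:8.2f} TB")

    report("home", os.path.expanduser("~"))
    # report("workspace", "/path/to/your/workspace")  # placeholder path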

Network

The nodes are connected via a fast InfiniBand interconnect; from the outside, the cluster is reached over the internet via the login nodes.
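
For jobs that span several nodes, the traffic between MPI ranks on different nodes runs over this InfiniBand fabric. A minimal sketch, assuming an MPI library and the mpi4py package are available, that shows how the ranks of a job are spread over the nodes:

    # Report which nodes the MPI ranks of a job run on.
    import socket
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Gather all hostnames on rank 0.
    hostnames = comm.gather(socket.gethostname(), root=0)
    if rank == 0:
        print(f"{size} ranks on {len(set(hostnames))} node(s):", sorted(set(hostnames)))

Launched through the batch system (for example with mpirun or srun), ranks placed on different nodes exchange this data over the interconnect.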

Technical data:

  • Nodes:
    • Compute nodes: 439
      • CPUs (2 per node): AMD EPYC 7452, 32 cores, 2.35-3.35 GHz, 128 MB Cache
      • RAM: 256 GB DDR4, 3200 MHz
    • GPU nodes: 10
      • CPUs (2 per node): AMD EPYC 7452, 32 cores, 2.35-3.35 GHz, 128 MB Cache
      • RAM: 256 GB DDR4, 3200 MHz
      • GPUs (1/2/4 per node): NVIDIA Tesla V100, 5120 CUDA cores, 16 GB HBM2 memory
    • SMP nodes (Fat Nodes): 2
      • CPUs (4 per node): Intel Xeon Gold 5218, 16 cores, 2.3-3.9 GHz, 22 MB Cache
      • RAM: 1536 GB DDR4, 2666/2933 MHz
    • Fat nodes: 8
      • CPUs (2 per node): AMD EPYC 7452, 32 cores, 2.35-3.35 GHz, 128 MB Cache
      • RAM: 512 GB DDR4, 3200 MHz
    • Login nodes: 4
      • CPUs (2 per node): AMD EPYC 7452, 32 cores, 2.35-3.35 GHz, 128 MB Cache
      • RAM: 512 GB DDR4, 3200 MHz
  • Total compute power: ca. 1044 TFlop/s (peak; see the sketch after this list)
  • Storage:
    • Home directories: 8.5 TB
    • Workspace directories: 1 PB, IBM Spectrum Scale
    • Burst Buffer: 32 TB SSD Storage
  • Network:
    • InfiniBand HDR100
    • Ethernet
  • Power consumption: ca. 240 kW
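
The quoted peak performance can be roughly reproduced from the CPU data above. The following back-of-the-envelope Python sketch assumes 16 double-precision floating-point operations per cycle per Zen 2 core (two 256-bit FMA units) and the 2.35 GHz base clock; neither assumption comes from this page, and the exact official figure also depends on which node types are counted.

    # Back-of-the-envelope peak-performance estimate for the compute nodes.
    cores_per_node = 2 * 32      # two EPYC 7452 CPUs with 32 cores each
    flops_per_cycle = 16         # assumed DP FLOPs per cycle per Zen 2 core
    base_clock_hz = 2.35e9

    node_peak = cores_per_node * flops_per_cycle * base_clock_hz
    print(f"per compute node:  {node_peak / 1e12:.2f} TFlop/s")        # about 2.4 TFlop/s
    print(f"439 compute nodes: {439 * node_peak / 1e12:.0f} TFlop/s")  # about 1056 TFlop/s

This lands in the same range as the quoted ca. 1044 TFlop/s.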

Operating system

The operating system on the cluster is CentOS Linux Release 8.6 (as of August 2022).

The cluster is operated with Bright Cluster Manager (version 9.1 as of August 2022).

Updated at 12:46 on 8 February 2021 by Gerd Pokorra