HPE Moonshot
 

ZIMT operates a system for High-Throughput Computing (HTC), which is mostly integrated into the OMNI cluster and appears as the HTC partition there. The partition consists of 4 login nodes and 41 compute nodes, which are physically located in special compact blades in an HPE Moonshot 1500 chassis inside the NDC. This page describes how to use them.

The term High-Throughput Computing (HTC) means the computation of a large number of small compute jobs which are usually independent of each other (trivially/embarrassingly parallel). This is in contrast to High-Performance Computing (HPC), which usually means larger jobs with highly interconnected subtasks.

Access and Login

If you have OMNI access, you can also log into the Moonshot nodes.

There are four login nodes, htc-login01 through htc-login04. Just like on OMNI, there is an alias htc which will bring you onto one of the four login nodes. You should use this alias whenever possible, since the load balancer will always direct you to the least busy login node. Connection happens via ssh just like on OMNI, by appending .zimt.uni-siegen.de to the node name or the alias.
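
For example, a connection via the alias could look like this (the <username> placeholder is only an example and stands for your own university account name):

ssh <username>@htc.zimt.uni-siegen.de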

Caution: logging in via password is only possible from within the University network or the Uni VPN. If you would like to log in from the outside, you need to set up password-less access via a public/private key pair first. Since your home directory is the same on OMNI and the Moonshot system, you only need to do this once.
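
A minimal sketch of setting up key-based access from your own computer; the key type and the <username> placeholder are only examples, and since the initial copy still asks for your password it has to be run from within the University network or the VPN:

ssh-keygen -t ed25519                              # generate a key pair on your own computer
ssh-copy-id <username>@htc.zimt.uni-siegen.de      # append the public key to authorized_keys in your home directory

Because the home directory is shared, the same key also works for OMNI afterwards.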

The remaining nodes, htc-node001 through htc-node041, are compute nodes and are not directly accessible from the outside.

Installed Software

In principle, all modules that are installed on OMNI are also available on the HTC nodes. However, due to the different CPU architectures it is not guaranteed that a module works just because it is available.

Caution: ZIMT has not tested all OMNI modules on the HTC nodes and you should always conduct your own tests with a given module before you use it productively.
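
A quick interactive test on one of the HTC login nodes could look like the following sketch; the module name is only a placeholder:

module avail                  # list the modules visible on the HTC nodes
module load <modulename>      # load the module you want to test, then run a short test case before using it productively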

Running computations

You can run compute jobs on the nodes htc-node001 through htc-node041 in the same way as on OMNI: by queuing SLURM jobs in the htc queue. Job and node status in the htc queue can be monitored as usual with squeue and sinfo, both from OMNI and from the HTC nodes. The individual SLURM commands are described here. The default walltime in the HTC queue is 12 hours, and the maximum walltime is 24 hours.
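
For example, the following commands show the state of the HTC nodes and your own jobs in the htc queue; they work both on OMNI and on the HTC login nodes:

sinfo -p htc                  # state of the nodes in the htc partition
squeue -p htc -u $USER        # your own jobs in the htc partition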

Caution: if you do not specify a queue (called a partition in SLURM terminology), the job will be put into the default queue (short) and will therefore run on OMNI and not on the HTC nodes. You have to include the following line:

#SBATCH --partition=htc

in your job script (or specify the htc partition when calling sbatch) if you want your job to run on the HTC nodes.
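
A minimal job script for the HTC nodes could look like the following sketch; the job name, the resource requests and the program name my_program are only placeholders that you need to adapt:

#!/bin/bash
#SBATCH --partition=htc          # run on the HTC nodes instead of OMNI
#SBATCH --job-name=htc-example   # example job name
#SBATCH --ntasks=1               # one independent task, typical for HTC workloads
#SBATCH --time=12:00:00          # requested walltime (maximum in the htc queue: 24 hours)

./my_program                     # placeholder for your own executable

The script is then submitted as usual with sbatch, for example: sbatch jobscript.sh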

Can HTC jobs be queued from OMNI and vice versa?

Partially yes. You can queue jobs from one system into the other as long as the difference in CPU architecture does not matter for your program. In particular, ZIMT does not currently support cross-compiling.

For example, it should be easily possible to queue a MATLAB job from OMNI into the HTC queue, because MATLAB is installed on both. However, if you want to compile a C or Fortran program for the HTC nodes, the compilation has to happen on the HTC front end (htc-login01 through htc-login04).
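
For example, compiling a small C program for the HTC nodes could look like this on one of the HTC login nodes; the compiler module and the file names are only placeholders:

module load gcc                      # load a compiler module available on the HTC nodes (example name)
gcc -O2 -o my_program my_program.c   # compile directly on the HTC front end, not on OMNI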

Updated at 16:32 on 16 March 2021 by Jan Steiner