Transition from HoRUS to OMNI
 

On this page we explain what you need to do to transition from the HoRUS to the OMNI cluster. You can also find a list of the most important differences between the two clusters.

Timetable

The cluster will be usable for all university members starting Monday, March 8. The documentation on the cluster website will be published at the same time. Just like on HoRUS, OMNI access using username and password will only be possible from within the Uni network or Uni VPN. However, if you already have a public key on HoRUS, login from the outside will be possible even on OMNI with your existing key.

Caution: the OMNI cluster has undergone a closed testing phase and we expect it to run stably. However, we reserve the right to shut the cluster down again, partially or completely, if necessary. We may also make some adjustments after a familiarization phase of 1-2 months, depending on the user feedback we receive until then.

The compute nodes of HoRUS will be turned off on March 31. The pre1 node on HoRUS will still be reachable after that point, so you can still access your data.

Limitations

Although the cluster itself is usable, some features are not yet available or not yet fully functional. These are in particular:

  • Jupyter: this software is not yet available due to technical difficulties. We hope to offer Jupyter functionality as soon as possible and will inform you when it is ready.
  • TensorFlow: TensorFlow is so tightly coupled with Jupyter that it will only be available once Jupyter is available.
  • CUDA and GPU development tools: these tools are not yet generally available. If you would like to test them, please contact us.
  • Burst Buffer: the burst buffer is not completely stable yet and might become temporarily unavailable. It is, however, generally usable. We describe how to use it here.
  • Compute nodes: in order to not overload the data center’s power supply, 60 out of the 434 nodes of OMNI have not been enabled yet.

We also recommend that you review your .bashrc and similar configuration files, as well as your job scripts. In particular, check which modules they load, as many modules now have different names.
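One way to find the module commands in your configuration files and job scripts is a small grep helper. The following is only a sketch: the function name find_module_lines is made up for this example, and the pattern assumes that module commands sit at the start of a line.

```shell
#!/usr/bin/env bash
# Sketch: list every "module load/add/swap/switch/unload/use" line in the given files.
# find_module_lines is a hypothetical helper name, not a cluster tool.
find_module_lines() {
    # -H: always print the file name, -n: print the line number, -E: extended regex.
    # Lines that start with "#" are skipped because the pattern is anchored
    # at the beginning of the line.
    grep -HnE '^[[:space:]]*module[[:space:]]+(load|add|swap|switch|unload|use)([[:space:]]|$)' "$@"
}
```

For example, find_module_lines ~/.bashrc ~/.bash_profile prints one line per match in the form file:line:command. The helper returns a non-zero exit status when nothing matches.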

Gaining Access

Even if you already have access on HoRUS, you still need to apply for OMNI use.

As an employee, you can request access by clicking “Ressourcen zum Wissenschaftlichen Rechnen (OMNI)” under “Meine Optionen” in the Nutzerkontenverwaltung. You will be redirected afterwards and have to agree to the Terms of Use. We describe the process in detail here.

Students will be able to use the cluster if they are added by an employee, as before. However, the process is slightly different: it also involves the Nutzerkontenverwaltung and is explained here.

You will get a welcome e-mail with the cluster address as before. For security reasons, the address is not posted anywhere on the cluster website. The SSH access works as before and is described here.

The HTTP(S) access with which you will be able to reach the Jupyter portal in the future is not yet available. We will inform you about the address of the Jupyter portal as soon as it is ready.

Data Transfer

The home directories on the OMNI cluster are the same as on HoRUS, so your data will be available immediately. After the HoRUS shutdown, access rights to the home directories will be changed so that only the owner can access them, whereas currently all users can access all home directories.
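If you want to see how your home directory is currently protected, a check along these lines may help. This is a sketch only: is_owner_only is a hypothetical helper name, and the check looks only at the classic permission bits, not at ACLs.

```shell
#!/usr/bin/env bash
# Sketch: report whether a directory is accessible to its owner only.
# is_owner_only is a hypothetical helper name, not a cluster tool.
is_owner_only() {
    local dir="${1:-$HOME}"
    local mode
    # ls -ld prints e.g. "drwx------"; characters 5-10 are the group and other bits.
    mode=$(ls -ld "$dir" | cut -c5-10)
    [ "$mode" = "------" ]
}
```

Called without an argument, the function checks $HOME. It succeeds if neither your group nor other users have any access rights to the directory.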

Data from your workspaces can be transferred from HoRUS to OMNI via the rsync tool. You should proceed as follows:

  1. Log on to OMNI.
  2. Create a new workspace if necessary and change into the intended target directory.
  3. Use the rsync command to copy data from HoRUS. Here is an example:

    rsync -r <Username>@<Horus address or SSH preset>:<Path to your files on Horus> .

    Of course, you can use another directory as the target instead of the current directory (.). The option -r ensures that all subdirectories of your data are copied.

  4. The transfer should then start. If the transfer is interrupted, either intentionally or unintentionally, you can resume it by calling rsync again, this time with the --append-verify option:

    rsync --append-verify -r <Username>@<Horus address or SSH preset>:<Path to your files on Horus> .

    This option of resuming after an interruption is the main reason we recommend rsync instead of scp.
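Steps 3 and 4 above can also be wrapped in a small retry loop, so that an interrupted transfer is resumed automatically. The following is a sketch under assumptions, not an official tool: the function name pull_from_horus and the variables RSYNC, MAX_TRIES and RETRY_DELAY are made up for this example, and their default values are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: copy a directory from HoRUS and resume with --append-verify after a failure.
# All names and defaults below are assumptions made for this example.
RSYNC="${RSYNC:-rsync}"             # command to run; can be replaced for a dry run
MAX_TRIES="${MAX_TRIES:-5}"         # total number of attempts
RETRY_DELAY="${RETRY_DELAY:-10}"    # seconds to wait between attempts

pull_from_horus() {
    local source="$1"
    local target="${2:-.}"
    local try=1
    # First attempt: plain recursive copy, as in step 3.
    "$RSYNC" -r "$source" "$target" && return 0
    # Further attempts: resume the interrupted transfer, as in step 4.
    while [ "$try" -lt "$MAX_TRIES" ]; do
        try=$((try + 1))
        sleep "$RETRY_DELAY"
        "$RSYNC" --append-verify -r "$source" "$target" && return 0
    done
    return 1
}
```

The first argument uses the same form as in the commands above, that is <Username>@<Horus address or SSH preset>:<Path to your files on Horus>. The second argument is the target directory and defaults to the current directory.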

New Hardware

The largest hardware change is that almost all nodes now have AMD CPUs instead of Intel CPUs as on HoRUS. The only exceptions are the SMP nodes, which contain four Intel CPUs each. Your self-written software will most likely need to be recompiled before it runs.

The OMNI cluster contains a number of hardware components that were not available before, especially:

  • A number of GPU nodes
  • A so-called burst buffer, meaning a storage partition which consists of SSDs and is intended for computations that need particularly fast file I/O.

You can find the complete hardware specifications of OMNI here.

New Software

There is new software available, including but not limited to:

  • The package manager Conda
  • The container system Singularity
  • The container orchestration system Kubernetes (please contact us if you want to use it)
  • In the near future: Jupyter
  • In the near future: the machine learning library TensorFlow

You can find an overview of the most important software products here.

Other Changes

  • Access rights to home directories will be changed to user only, instead of all users, starting with the shutdown of HoRUS. If you need specific access groups, please contact us (hpc-support@zimt.uni-siegen.de).
  • Changes concerning the SLURM queues:
    • The queue default settings have been tweaked.
    • The short queue is now the default.
    • There is now a debug queue which you can use for very short test runs and for debugging.
    • There is now a queue gpu for GPU jobs. How to use the actual GPUs is described here.
    • There is now an expert queue for particularly large jobs. Access to this queue will be granted by the HPC team only on request. Please contact us if you would like to use the expert queue.
  • Although we decide on a case-by-case basis which software we install centrally, we have created a few guidelines that software has to fulfill before we consider installing it. More on that here.
  • By default, a compiler module (gnu9) and an MPI module (openmpi4) are already loaded. Note that some modules may not be listed by module avail when you unload the compiler and MPI modules. You can always use the search function of the module system to find modules with the command module spider <full or partial module name>.
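The queue and module changes above could come together in a job script roughly as follows. This is a minimal sketch, not an official template: the job name, task count and time limit are assumptions, and the script only reports which queue it runs in. The module commands are guarded so that the script also runs on a machine without a module system.

```shell
#!/bin/bash
# Hypothetical job name.
#SBATCH --job-name=omni-check
# Use the debug queue, intended for very short test runs.
#SBATCH --partition=debug
#SBATCH --ntasks=1
# Assumed time limit; check the actual limits of the queue.
#SBATCH --time=00:05:00

# Load the compiler and MPI modules explicitly instead of relying on the defaults.
if command -v module >/dev/null 2>&1; then
    module load gnu9 openmpi4
    module list
fi

# Slurm sets SLURM_JOB_PARTITION inside a job; outside a job we print a note instead.
queue="${SLURM_JOB_PARTITION:-not inside a Slurm job}"
echo "Queue: ${queue}"
```

Such a script would be submitted with sbatch <script name>. If the --partition line is omitted, the job goes to the default queue.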

Updated at 20:40 on 4 March 2021 by Jan Steiner