The following instructions are intended for new users of the OMNI cluster. Some basic terms are introduced and links to more information are provided. Specifically, it is explained what a cluster is, how usage differs from a normal computer, how you get onto the cluster and where to turn if you have questions.
A cluster is essentially a large computer that is made up of many smaller computers. These smaller computers are called nodes. Each node has its own RAM (memory) and a specific number of processors, also called cores or CPUs. The OMNI cluster has 434 compute nodes for running computations, 4 login nodes which are directly accessible to users, and a number of specialized nodes. The normal compute nodes have 64 CPUs and 256 GB RAM each; a detailed list is here. On the OMNI cluster, individual nodes do not have their own hard drives. Instead, they share a central file system.
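To get a feel for these numbers, you can inspect the cores and memory of whatever Linux machine you are currently logged in to. The following is a minimal sketch that assumes a Linux system providing the nproc tool and the /proc/meminfo file. The variable names are chosen for illustration only. On a login node it reports the resources of that login node, not those of a compute node.

```shell
# Number of processing units (cores) available on this machine.
cores=$(nproc)

# Total RAM in kB as reported by the Linux kernel, converted to whole GiB (rounded down).
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_gib=$((mem_kb / 1024 / 1024))

echo "This machine has ${cores} cores and about ${mem_gib} GiB of RAM."
```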
Our cluster, like almost all clusters, runs Linux. If you do not know Linux well (yet), we offer several ways to familiarize yourself with it.
First, we offer a Linux introduction course each semester, alternating between German and English. Our English-language Linux course usually takes place in mid-July. You can find a list of our courses here.
Finally, we explain some basic terms here as well.
Many things about a cluster are identical to any other (Linux) computer. There are, however, some differences. The most important one is that you do not sit in front of the cluster, but connect to it from another computer. This is shown in the following image:
As you can see, you always connect to one of the login nodes. You typically only interact with compute nodes via the scheduler SLURM, which is described below. You can also see the shared file system.
Here are some additional differences:
Like on any Linux system, you have a home directory on the cluster. However, when you run computations, you should create a so-called workspace. Workspaces are physically located on another part of the file system which has a faster connection to the compute nodes. Unlike your home directory, they are unlimited in size. They do have a limited duration, however, and are deleted after expiring. You can find more about workspaces here.
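As a rough illustration, creating a workspace might look like the sketch below. It assumes that the cluster provides the ws_allocate and ws_list commands of the widely used hpc-workspace tools, which this page does not state explicitly. The workspace name and the duration are hypothetical values. Please check the workspace documentation for the commands and limits that actually apply.

```shell
# Hypothetical values: choose your own workspace name and duration in days.
ws_name="my_first_workspace"
ws_days=7

if command -v ws_allocate >/dev/null 2>&1; then
    # Create a workspace that expires after the given number of days.
    ws_allocate "$ws_name" "$ws_days"
    # List your existing workspaces together with their remaining lifetime.
    ws_list
else
    echo "ws_allocate not found; the cluster may use a different workspace tool."
fi
```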
Operating systems like Linux use so-called environments in order to decide which program is run when a specific command is typed. Paths to executable files and various settings, among other things, are stored in environment variables. Since many users with different needs work on a cluster, often different versions of the same software are installed. If everyone then used the same environment, commands would be ambiguous. The environment on the cluster is therefore modular and can be exchanged easily.
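The role of the environment can be demonstrated with the PATH variable, which lists the directories that are searched when you type a command. The following sketch is a self-contained illustration and not part of the cluster setup: it creates two hypothetical versions of a made-up command called mytool in a temporary directory and shows that changing PATH alone decides which version runs.

```shell
# Create two directories that each contain a different "version" of the same command.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/v1/bin" "$demo_dir/v2/bin"
printf '#!/bin/sh\necho "mytool version 1"\n' > "$demo_dir/v1/bin/mytool"
printf '#!/bin/sh\necho "mytool version 2"\n' > "$demo_dir/v2/bin/mytool"
chmod +x "$demo_dir/v1/bin/mytool" "$demo_dir/v2/bin/mytool"

original_path=$PATH

# With the v1 directory first in PATH, typing "mytool" runs version 1.
PATH="$demo_dir/v1/bin:$original_path"
out_v1=$(mytool)
echo "$out_v1"

# Exchange the environment: with the v2 directory first, the same command runs version 2.
PATH="$demo_dir/v2/bin:$original_path"
out_v2=$(mytool)
echo "$out_v2"

# Restore the original environment and clean up.
PATH=$original_path
rm -rf "$demo_dir"
```

Loading a module does essentially the same thing for real software: it adjusts PATH and related variables so that the desired version is found first.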
For many programs installed on the cluster, environments are pre-defined in so-called modules. You can then simply load a module to obtain the environment. How to do that is described here.
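In practice, working with modules usually comes down to three commands. The sketch below assumes that the cluster uses the common module command (Environment Modules or Lmod). The module name gcc is a hypothetical placeholder, so pick a real name from the list of available modules.

```shell
# Hypothetical module name; pick a real one from the output of "module avail".
module_name="gcc"

if type module >/dev/null 2>&1; then
    # List the modules that are installed on the cluster.
    module avail
    # Set up the environment for the chosen software.
    module load "$module_name"
    # Show which modules are currently loaded.
    module list
else
    echo "module command not found; run this on a cluster login node."
fi
```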
The cluster is available to university members free of charge. You can also register students, who may then use the cluster as well. All you need to do is register your user account for cluster usage and set up a connection to the cluster.
Registering your account is described for both employees and students here.
Connecting happens via the Secure Shell Protocol (SSH) and is in principle possible from Windows, Linux and Mac OS systems. Setting up an SSH connection is described here.
The address of the cluster will be e-mailed to you upon registration. The cluster is available from both the university network and the internet.
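Once you have the address, logging in from a terminal typically takes a single ssh command. The sketch below uses placeholder values for the user name and the host name, since the real address is only sent to you by e-mail. It first checks that an SSH client is installed and then prints the login command instead of running it.

```shell
# Placeholders (not real values): your account name and the address from the registration e-mail.
cluster_user="your_username"
cluster_host="cluster.example.org"

# Check that an SSH client is installed on your own computer.
if command -v ssh >/dev/null 2>&1; then
    ssh -V
else
    echo "No ssh client found; install an SSH client such as OpenSSH first."
fi

# The login command; it is only printed here because the values above are placeholders.
login_cmd="ssh ${cluster_user}@${cluster_host}"
echo "$login_cmd"
```

After replacing the placeholders, run the printed command yourself to open a session on one of the login nodes.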
How do I run computations on the cluster?
Computations on the cluster are run in so-called jobs. You define how many resources (CPUs, RAM) your job needs and for how long. You also typically provide a job script which details which program(s) are to be run in the job. You then put the job into a waiting queue and the scheduler SLURM decides, based on job size and other factors, when to run it. How to create job scripts and queue jobs is explained here; more information on SLURM is here.
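To make this more concrete, the following sketch writes a small job script and submits it. The #SBATCH options used (--job-name, --ntasks, --cpus-per-task, --mem, --time) are standard SLURM options, but the file name, the requested resources and the placeholder command are assumptions made for illustration. The cluster may require further settings, such as a partition, so please consult the job script documentation before adapting it.

```shell
# Hypothetical file name for the job script.
job_script="example_job.sh"

# Write the job script; the quoted 'EOF' keeps $(hostname) unexpanded until the job runs.
cat > "$job_script" <<'EOF'
#!/bin/bash
# Name shown in the queue.
#SBATCH --job-name=example
# One task (process) with four CPUs.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# RAM per node.
#SBATCH --mem=8G
# Run time limit (hh:mm:ss).
#SBATCH --time=00:30:00

# The program(s) to run in the job; a placeholder command is used here.
echo "Running on $(hostname)"
EOF

if command -v sbatch >/dev/null 2>&1; then
    # Put the job into the queue, then list your own queued and running jobs.
    sbatch "$job_script"
    squeue -u "$USER"
else
    echo "sbatch not found; submit $job_script from a cluster login node."
fi
```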
The cluster website covers the most important topics and provides links to the documentation of installed software. If you are looking for help for a specific Linux command, built-in help functions are also available, especially the manual function of Linux, which you can access with the command man <command name>. If the developer has written a man page for their software, it is then displayed.
If the website does not answer your question, you can send an e-mail to email@example.com or visit our weekly consultation hour. These two channels are also the best way to report problems and to request software installations on the cluster.
ZIMT also offers training courses on high performance computing and cluster usage. Upcoming courses are listed here.
Additionally, ZIMT offers consulting for development and optimization of your software. ZIMT experts are available to review your software or otherwise advise you in person. If you would like consulting, you can also send an e-mail to