The cluster has multiple file systems which serve different purposes. For normal users, this means your data is in one of the following directories (and their subdirectories):

  • Your home directory in /home/<YourUsername>. This is the default directory for all your data. Home directories are limited to 100 GB per user. You can find out more about home directories below.
  • The workspace directory in /work. Workspaces are for short-term storage of large amounts of data. Every time you create a workspace, a subdirectory is created here for you. That subdirectory is called your workspace. Workspaces are not limited in size, but are limited in time. The total disk space available on the cluster is 1 Petabyte. More on this below in the Workspaces section.
  • The burst buffer directory /fast. This directory is physically located on a partition with solid state disks (SSDs). It is intended for computations where large amounts of data need to be moved quickly. The burst buffer is only 32 TB in size. More on using the burst buffer below in the Burst Buffer section.
  • The groupspaces are directories for teams/departments, which can be found under the /group directory. The leader of the team/department must request a groupspace from the HPC-Team. Further information can be found in the Groupspaces section.
  • Your NAS or XNAS (available on OMNI at /nas and /xnas, respectively), if you have been given one by ZIMT. Note that these directories should only be used for data transfer; they are not fast enough for computations or software installations. See the NAS section below for more information.

The file systems are the same on every login and compute node.

Additionally, every node has local storage where temporary files are stored and removed after job termination. Read more about this in the Temporary directory section.

Home directory

Every user automatically gets a home directory into which you can put your data. This directory has the path /home/<YourUsername>. The size of your home directory is limited to 100 GB.
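
If you want to check how much space your home directory currently occupies, a simple option is the standard du command (a general Linux tool, shown here only as a sketch; the command may take a while for large directories):

du -sh /home/<YourUsername>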

Home directory snapshots

Daily snapshots are made of all home directories; these snapshots are deleted after 30 days. If you lose files in your home directory, you can change to the directory /home/.snapshot. In that directory you can find the daily snapshots, each in its own subfolder. Caution: These snapshots should not be regarded as reliable backups. We recommend that you back up your data yourself on another computer in addition to the snapshots.

You can simply copy files that you want to restore back to your normal home directory:

cd /home/.snapshot/daily.<Date>_0010/<Your Username>
cp <File or files> <Your home directory>

Example:

cd /home/.snapshot/daily.2020-08-04_0010/demo_user/
cp file1 file2 /home/demo_user

To copy entire folders with their contents (recursively), use the option -r. Please note that existing data at the target location may be overwritten. The cp command offers options such as -i to give you more control over this. You can show the help with man cp.

Example: Copy folder exampledir back to home and confirm overwriting for every file:

cd /home/.snapshot/daily.2020-08-04_0010/demo_user/
cp -i -r exampledir /home/demo_user

Example: Restore all deleted files in folder exampledir without overwriting any existing files:

cd /home/.snapshot/daily.2020-08-04_0010/demo_user/
cp -n -r exampledir /home/demo_user

Caution: On the HoRUS cluster it was possible to read other people's home directories. This is no longer the case on OMNI. You can, however, still make files and directories available to other users with the chmod command (see also Linux basics).
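
As a small sketch of how this can be used (the file name is a placeholder): the first command lets other users enter your home directory, the second makes an individual file readable for them:

chmod o+x /home/<YourUsername>
chmod o+r /home/<YourUsername>/<File>

You can revoke the permissions again with chmod o-x and chmod o-r, respectively.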

Workspaces

For your compute jobs it is recommended not to use the home directory but rather to create a so-called workspace. This has two advantages: First, there is no size limit for workspaces, and second, the workspaces are located physically on another hard drive with a faster connection to the compute nodes. Workspaces have a limited duration: after the workspace expires, it is deleted. You can extend this duration up to three times.

Caution: There is no automatic backup for workspaces!

Create and extend workspaces

You can create a new workspace with the command

ws_allocate <WS name> <duration>

where the duration has to be given in days. The maximum possible duration without extensions is 30 days.

Caution: if you leave out the duration, the workspace will be allocated for one day only.

The workspace will be created as a subdirectory of /work/ws-tmp/ and its name consists of your username and the workspace name you specified. The workspace can be accessed like any other folder with the cd command. In the following example:

$ ws_allocate test1 4
Info: creating workspace.
/work/ws-tmp/demo_user-test1
remaining extensions  : 3
remaining time in days: 4

you can see that a workspace named test1 with an initial duration of 4 days has been created and is available via cd /work/ws-tmp/demo_user-test1.

If you want to extend an existing workspace, you need to enter

ws_extend <WS name> <duration>

with the name of an existing workspace and a new duration. You can extend the duration three times, by a maximum of 30 days each time. If you enter the name of a workspace that does not exist, it will be created as if you had used ws_allocate.
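
For example, to give the workspace test1 from the example above a new duration of 30 days:

ws_extend test1 30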

The ws_allocate command also has some additional features which you can see with man ws_allocate.

Selecting the filesystem manually

Unlike on HoRUS, on OMNI you can select the filesystem for your workspace. The regular workspace filesystem is called work; the burst buffer described below is called fast. You can use the command ws_list -l to display the available file systems.

When doing a ws_allocate or ws_extend, you can specify the filesystem with the -F option. If you do not use -F, the default (work) will be used:

ws_allocate -F [work|fast] <WS name> <duration>
ws_extend -F [work|fast] <WS name> <duration>

Caution: When you extend a workspace, you need to specify the same file system as for the original ws_allocate. That means, if, for example, you allocated a workspace with ws_allocate -F fast, you also need to extend it with ws_extend -F fast. If the workspace was originally created without the -F option, then the default (work) was used and you do not need to specify it again.

E-Mail notifications

The workspace mechanism can send you an e-mail before a workspace expires.

We recommend that you always use this function to avoid data losses.

The corresponding command is then:

ws_allocate <WS name> <duration> -r <number of days> -m <your e-mail address>

With the option -m you can specify the e-mail address, and with -r you specify how many days before expiration you want to be warned. If you do not want to re-enter your e-mail address every time, you can put a text file named .ws_user.conf in your home directory. In that file, write your address according to the following example:

mail: demo_user@uni-siegen.de

Note that there needs to be a space after the colon (YAML syntax).
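
A short sketch, reusing the demo_user address and the workspace name from the examples above: create the configuration file once, then request a warning seven days before expiration when allocating a workspace:

echo "mail: demo_user@uni-siegen.de" > ~/.ws_user.conf
ws_allocate test1 30 -r 7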

You can also create a calendar entry with

ws_send_ical <WS name> <e-mail address>

List your workspaces

You can list your existing workspaces by entering

ws_list

Release (delete) a workspace

If you do not need a workspace any more, you can release it. Caution: all data in this workspace will be unavailable from that point on.

To do that you can use the command

ws_release <Workspace name>
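
For example, to release the workspace test1 from the examples above:

ws_release test1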

Restore a deleted workspace

As mentioned above, expired workspaces are no longer available, but the data is not deleted immediately. The data inside a workspace is kept for 10 days after the workspace has expired or been released before being deleted completely. It is therefore possible to restore the data if a workspace has expired accidentally. To do that, follow these steps:

  1. You can list your expired workspaces via:

    $ ws_restore -l 
    <user>-<old-workspace>-<number>
            unavailable since Tue Jun 12 09:30:01 2018
  2. Create a new workspace:

    ws_allocate <new-workspace> <duration>
  3. Restore the expired workspace with the command ws_restore inside the new workspace. For that, you need the complete name of the old workspace (which includes your user name and an ID number), which you can get via ws_restore -l:

    ws_restore <user>-<old-workspace>-<number> <new-workspace>

    The new workspace will contain the old one in a subdirectory.

  4. Type the displayed text. This serves to make automatic workspace restoration impossible.
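
Putting the steps together, a sketch with placeholder names (the new workspace name restored is arbitrary; the exact ID number of the expired workspace is shown by ws_restore -l):

ws_restore -l
ws_allocate restored 10
ws_restore demo_user-test1-<number> restored

Afterwards, the old data can be found in a subdirectory of /work/ws-tmp/demo_user-restored.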

Burst Buffer

The OMNI cluster has a so-called burst buffer, meaning a fast storage partition. It consists of SSDs and has a size of 32 TB.

There are two things you need to be aware of in relation to the burst buffer:

  • The burst buffer is less stable than the other filesystems, therefore you should move the data to a regular workspace as soon as your computation is complete.
  • The burst buffer with its 32 TB size is not very large and is shared between all cluster users. Please only use it if you really need the higher speed.

Creating a burst buffer workspace

Functionally, your directories on the burst buffer are also workspaces. Most commands therefore work identically to the previous section. To create a workspace in the burst buffer you need to use ws_allocate as usual, but you need to specify that you want to use the /fast filesystem.

ws_allocate -F fast <WS name> <duration>

Note that you also need to specify the -F fast option for a ws_extend on the burst buffer.

You can use the command ws_list -l to display a list of all file systems where you can create workspaces:

$ ws_list -l
available filesystems:
fast
work (default)
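
Once your computation is finished, copy the results from the burst buffer back to a regular workspace, as recommended above. A sketch with placeholder paths and directory names (use the workspace paths printed by ws_allocate or shown by ws_list):

cp -r <path of your fast workspace>/results <path of your work workspace>/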

Temporary directory

Applications often create temporary data. As a user, you are usually not aware of it, because the operating system provides a special TMP directory (/tmp) for these files. Each node has a TMP directory in local storage that is not accessible from any other node. Temporary data is not needed after program termination and is therefore ordinarily deleted by the application itself.

In the past, it occasionally happened that temporary data was not removed properly after job termination and caused the local storage to fill up. We have since implemented a mechanism that automatically cleans up the directory without interfering with other jobs.

Most applications make use of well-established environment variables to determine the storage location for temporary data. If your application or self-written program/script does not use these variables, take note of the information below:

  • At job start a distinct directory is created at /tmp/slurm_<user-id>.<job-id> on each participating compute node. After the job has finished, this directory is automatically deleted.

  • The path to the temporary directory is set in the environment variables $TMP, $TEMP, $TMPDIR, $TMP_DIR and $TMPSESS.

    Please modify any applications where a temporary directory is explicitly mentioned, e.g. in a config file or via a program option (-tmp_dir=$TMP / -temp=$TMP).

    Furthermore, scripts which directly use /tmp need to be modified to use the environment variable $TMP (a minimal sketch follows at the end of this section).

  • When starting a Shell in interactive mode (srun --pty ... /bin/bash), the environment variables need to be set manually. You may use the following command: export TMP=/tmp/slurm_${SLURM_JOB_USER}.${SLURM_JOB_ID}

  • To check whether your data is placed at the right location (in /tmp/slurm_<user-id>.<job-id>, and not directly in /tmp), connect to one of the compute nodes of a running job and search for your files.

    List all your jobs and get the corresponding compute nodes:
    squeue -u <user-id>
    Connect to one of the nodes participating in the job:
    ssh hpc-nodeXXX
    Find your files under /tmp (skipping directories you cannot read):
    find /tmp ! -readable -prune -o -user <user-id> -print
    Close connection to compute node: exit
    Data located directly in /tmp will not be removed automatically. Please review your scripts and applications to set the temporary storage path properly, or manually delete your files at the end of your job (see the next bullet).

  • Please delete your temporary files manually in case the redirection does not work for you. For example, add the following commands at the end of your job script or place them in a separate script which you call from your job script:

    # List the IDs of all of your jobs currently running on this node
    job_list=`/cm/shared/apps/slurm/current/bin/squeue --noheader --format=%i --user=$USER --node=localhost` || exit 0
    # Clean up only if the current job is your only job on this node
    # (the current job is still listed in squeue while this runs)
    if [ "$job_list" = "$SLURM_JOB_ID" ] ; then
        rm -rf /tmp/slurm_${SLURM_JOB_USER}.*
    fi

    This will remove only your data in /tmp and only if you have no other job running on that node.
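
To illustrate the advice above on using the environment variables, here is a minimal job script sketch; the program name and its option are placeholders for whatever your application expects:

#!/bin/bash
#SBATCH --time=01:00:00

# Write scratch data to the per-job directory provided by the cluster
# (/tmp/slurm_<user-id>.<job-id>) instead of a hard-coded /tmp path.
my_program --scratch-dir=$TMP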

Groupspaces

For teams and departments, we offer the possibility to set up a groupspace, which is a directory on the OMNI cluster. Only the members of your group have access to this directory. This allows you to centrally manage software installations and data for your workgroup on OMNI. A groupspace behaves similarly to your home directory: it is accessible on both the login and compute nodes and can be accessed from outside via ssh. This differentiates it from group drives (XNAS), which can only be used from the login nodes. The size of a groupspace is limited to 100 GB.

A group directory can be set up on request via support@zimt.uni-siegen.de. The owner and responsible person of the group directory is the person who leads the team/department (professor, etc.). This person must be authorized to access OMNI as described here. In addition, this person can add or remove group members via the ZIMT Self-Service-Portal. Note: The group leader can request only one groupspace.

If your team/department has already set up its group for XNAS, you can use the same group for your groupspace. However, if your group does not have one, we will create a new group when you apply, using the naming scheme hpc_<AG-Name>.

You will then find your groupspace on OMNI under /group/<AG-Name>.

A groupspace is suitable in the following cases:

  • If your workgroup uses software that should only be accessible to the group members (e.g. for licensing reasons).
  • If your workgroup uses software that cannot be installed centrally by the HPC-Team (for example, because it would require a disproportionately high maintenance effort, or because nobody outside the workgroup uses it).
  • If several people use the same input data or other resources.
  • If one of the group members installs software that is also used by other group members.

Software-Installation

The installation of software by you as a workgroup is explicitly allowed in the groupspace. The HPC-Team will be pleased to advise and support you. However, we cannot undertake the complete installation on your behalf, but we can help with troubleshooting as part of the usual support (e.g. in the consultation hours or in separate consultation meetings).

You can modify the group ownership of files using the chgrp command. Caution: By default, new files and directories are created with the ownership of the primary group of the creating user. The primary group for most OMNI users is unix-user. Therefore, in general, you need to change the group ownership to your workgroup for all newly created files and directories.
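
A short sketch, assuming the naming scheme hpc_<AG-Name> described above (the directory name is a placeholder):

# Recursively assign your workgroup as the group owner of a directory and its contents
chgrp -R hpc_<AG-Name> /group/<AG-Name>/<directory>

# Optional: the setgid bit makes new files created inside the directory inherit its group
chmod g+s /group/<AG-Name>/<directory>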

Permissions for the groupspace and its subfolders and files can be changed using the chmod command. Of course, files that are to be used by all members of the workgroup need the appropriate group permissions. You can set permissions for the group with the g option, for example:

chmod g+x <FileName>

would make the file executable for the group. More details can be found in the manual page of the chmod command.

University Storage Services (NAS/XNAS)

In order to make data transfer easier for the users of the OMNI cluster, we offer the option to use the storage services NAS and XNAS on the cluster. Access to these network shares on OMNI is meant for data transfers only, which is why they are only accessible from the login nodes. It is important to note that we do not provide automatic backups of these network shares; users are responsible for their own backups when using them. It is also not possible to make the network shares accessible to third parties. However, it is possible to make them accessible to members of the internal group if you use the XNAS option.

In order to access the booked storage services from the OMNI cluster, users have to execute the command kinit and enter the password of their ZIMT account:

$ kinit
Password for <Username>@UNI-SIEGEN.DE:

Please note that your password will not be shown while you enter it.

After executing the kinit command, the network shares will be accessible at the path /nas/<Username> for NAS and /xnas/<Username> for the XNAS option, respectively.
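
For example, a sketch of copying input data from your NAS share into a workspace before a computation (the directory names are placeholders):

cp -r /nas/<Username>/<input data> /work/ws-tmp/<Username>-<WS name>/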

Updated at 17:03 on 8 February 2021 by Gerd Pokorra