The HoRUS cluster, like almost all large computers, uses Linux as its operating system (in this case CentOS Linux 7.4.1708). This page gives a brief introduction to Linux with a few common commands, especially those that concern working on the cluster. A more extensive and very good tutorial can be found here.
Linux also contains a built-in help mechanism which can be reached from the console. The
man command shows the so-called man page (“Man” for “Manual”) which is a help text built into the program (if the program contains one). For example, the command:
$ man sbatch
will show the man page written by the SLURM developers for the
sbatch command. A man page can be scrolled up and down with the arrow keys and is exited by pressing q.
Many commands also have an internal help function which is often accessible with
<Command name> -h or
<Command name> --help. Often it is identical to the program’s man page.
The directory structure in Linux is a tree structure: there is a top level directory which is called the root directory or
/. All other directories (also called folders) are subdirectories of root or subdirectories of subdirectories.
This structure is slightly different from that of Windows. While individual hard drives are assigned letters in Windows and the division is clear, Linux uses so-called mount points: the directory structure is (mostly) identical on every Linux system and the different hard drives are said to be mounted at a specific point in that structure. The advantage for you as a user is that you generally do not need to worry about the physical hard drives.
In the special case of the HoRUS cluster there are two directories which you will mostly use: your home directory (
/home/<YourUsername>) and your workspaces (
/work/). If you develop your own software, you might need to link libraries. In this case you might need to know the installation directories for some software packages. Most of the software on the cluster is installed in
/cm/shared/apps/. Caution: These directory paths may change at any time! It is always more useful to use environment variables if possible instead of hard-coded file paths. The use of environment variables is explained below.
What you can do in Linux depends on the user with which you are logged in. Usually you will have exactly one user name which is identical to your ZIMT ID (g-number). The user with the highest permissions in Linux is called superuser or root user (
root). You will never be given root permissions on the cluster; these are reserved for ZIMT administrators.
Users are assigned to groups. Each user has a primary group and may belong to any number of additional groups. You can show the groups to which you belong with the command groups.
Every file and every directory in Linux belongs to a user and only that user (or the
root user) decides what can be done with this file. If you create a new file, you are the owner. Each file is also assigned to a group. By default this is the primary group of the user who created it.
There are three types of access permissions for files and directories: read, write and execute. The commands for displaying and changing these permissions will be explained below. The permissions can be set separately for three types of users: for the owner, for the group the file belongs to, and for everybody else (Other). That way, you can determine that, for example, other users may read a file but not write to it.
An important way of saving yourself a lot of work is the auto-complete function of Linux. If you only enter part of a command and then press the Tab key, the command will be automatically completed, as long as it is not ambiguous. For example, it is not possible to enter
sb and press Tab to obtain
sbatch because the cluster has multiple commands that start with
sb. In that case you press the Tab key a second time to obtain a list of all commands starting with sb:
$ sb
sb sbatch sbcast
As you can see, typing
sba would be unambiguous. If there are a lot of options, you will be asked if you really want to see the complete list (you can test this by entering
s and pressing Tab twice).
The auto-complete function does not only work with commands but also with directory paths.
As with Windows, Linux has a large number of processes running at any time, including the ones that you explicitly started (by typing a command). Sometimes it is necessary to monitor the status of a process or to forcefully terminate it. The
top command is for that purpose.
top is roughly equivalent to the Task Manager in Windows.
The interface of
top typically looks similar to the following example:
top - 10:08:18 up 53 days, 23:05, 14 users, load average: 0,12, 0,21, 0,48
Tasks: 334 total, 1 running, 333 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1,6 us, 0,3 sy, 0,0 ni, 98,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem : 14854156+total, 749016 free, 1722716 used, 14606982+buff/cache
KiB Swap: 12582908 total, 12360456 free, 222452 used. 14465592+avail Mem

  PID USER      PR NI   VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 2610 otheruser 20  0 180156 3936 1240 S  4,3  0,0 43:08.94 sshd
 2619 otheruser 20  0 180188 3968 1240 S  4,0  0,0 43:01.12 sshd
 2291 otheruser 20  0  67804 2576 1800 S  1,0  0,0  9:34.55 sftp-server
13770 demo_user 20  0 168156 2564 1636 R  0,7  0,0  0:00.07 top
   39 root      20  0      0    0    0 S  0,3  0,0  1:25.02 ksoftirqd/6
 6133 root      20  0      0    0    0 S  0,3  0,0  0:00.84 kworker/u48:1
    1 root      20  0 191612 3340 1672 S  0,0  0,0 14:56.12 systemd
    2 root      20  0      0    0    0 S  0,0  0,0  0:02.23 kthreadd
    3 root      20  0      0    0    0 S  0,0  0,0  0:22.88 ksoftirqd/0
    8 root      rt  0      0    0    0 S  0,0  0,0  0:01.71 migration/0
    9 root      20  0      0    0    0 S  0,0  0,0  0:00.00 rcubh
   10 root      20  0      0    0    0 S  0,0  0,0 81:45.44 rcu_sched
   11 root      rt  0      0    0    0 S  0,0  0,0  0:11.99 watchdog/0
In this (shortened and anonymized) view the running processes are listed.
top can be operated with commands consisting of individual letters, for example
q will quit
top. On the left you can see the process ID of each process. This is a unique number that Linux assigns to each process. If you need to terminate a process, you can type
k (for “kill”) and enter the process ID, and Linux will forcefully terminate it (provided you have the permissions). Next, the owner of the process is shown. As you can see, there are a lot of system processes which are owned by
root; those normally do not concern you. You can filter the processes so that only those of a specific user are shown by typing
u and then the name of the user. The next columns show information about how much memory and CPU power the process is using. Finally, the name of the process is displayed.
Tip: The percentage in the
%CPU column is in relation to a single CPU, not the entire computer (or entire node in the case of a cluster). If you run a parallel program (one with multiple threads), the number in this column may be larger than 100%.
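Besides the interactive k key in top, a process can also be terminated directly from the console with the standard kill command. The following is a minimal sketch, assuming a harmless sleep process that is started only for illustration; the variable names are made up for this example.

```shell
#!/bin/bash
# Start a harmless long-running process in the background (illustration only).
sleep 300 &
pid=$!    # $! holds the process ID of the most recent background command
echo "Started sleep with process ID $pid"

# Ask the process to terminate, then wait until the shell has cleaned it up.
kill "$pid"
wait "$pid" 2>/dev/null || true

# kill -0 sends no signal; it only checks whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then
    still_running=yes
else
    still_running=no
fi
echo "Still running: $still_running"
```

As with k in top, this only works for processes that you own.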
You will primarily interact with Linux, especially on remote systems, via a text interface, the so-called console (also referred to as shell or terminal). Note that in the case of the cluster the default console is the Bash console. Like most Linux systems, the cluster has multiple consoles installed which differ slightly. For example, you can alternatively use the C shell (with the command
csh) if you are more experienced with that.
All commands described here can either be entered by hand into the console, or be written into a text file (a so-called script) one after the other. The shell can then execute this script and the result is the same as if you had typed the commands by hand. This is also called shell scripting and is one of the major reasons for the popularity of Linux: repetitive tasks can be easily automated.
- The pound sign
# begins a comment.
- The asterisk sign
* is a so-called wildcard and can be used as a placeholder when arbitrary characters are required. For example, when searching for all PDF files (the search function will be explained in more detail below), one can search for *.pdf.
- The pipe symbol
| serves to send the output of one command as input into another command. Multiple commands can be chained in this way.
- The semicolon
; separates independent commands from each other. Entering multiple commands with semicolons in between is equivalent to entering these commands one after the other.
- The ampersand symbol
& at the end of a command will execute that command in the background. You can continue working with the console while the command is running by pressing Enter again. This is especially useful if you enter a command that opens a window; otherwise, the console would be blocked as long as the window is open.
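The special characters listed above can be tried out together in a short script. This is only a sketch: it works in a temporary directory, and the file names a.pdf, b.pdf and notes.txt are invented for the example.

```shell
#!/bin/bash
# Work in a throwaway directory so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Semicolon: three independent commands on a single line.
touch a.pdf; touch b.pdf; touch notes.txt

# Wildcard: *.pdf matches every name in this directory ending in .pdf.
# Pipe: the output of ls becomes the input of wc -l, which counts lines.
pdf_count=$(ls *.pdf | wc -l)
echo "Number of PDF files: $pdf_count"

# Ampersand: run a command in the background, then wait for it to finish.
sleep 1 &
wait
echo "Background command finished"

# Clean up the temporary directory again.
cd / && rm -rf "$workdir"
```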
Moving around directories
The most common operations in the Linux console are movement between directories and manipulation of files in the current directory. Here are the most important commands that you should know. Independently of the command there are two more special symbols that you should know which are used with file paths: the period
. denotes the current directory and two periods
.. denote the directory one level above the current one.
Note: Linux does differentiate between upper and lower case. A command
test and a command
Test may have different functions. The same is true for file and directory names.
You can change to a different directory with the command
cd (for “change directory”). You can either enter a relative path (meaning relative to your current position) or an absolute one. You can tell the difference by the fact that absolute paths begin with a forward slash
/ (Note: unlike Windows, Linux uses forward slashes in paths). If the current directory has a subdirectory named
example you can change to that with
$ cd example
With .. denoting the directory above, you can also specify a relative path outside the current directory. If you are inside a directory
mydir/dir1 and mydir contains another subdirectory named
mydir/dir2, you can reach the parent directory mydir with
$ cd ..
or the sibling directory with
$ cd ../dir2
You can also specify an absolute path, for example the previous command is identical to
$ cd /home/demo_user/mydir/dir2
if mydir is inside
demo_user's home directory.
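The difference between relative and absolute paths can be checked with a small experiment. The sketch below assumes a temporary directory tree with the illustrative names mydir, dir1 and dir2; nothing in your home directory is changed.

```shell
#!/bin/bash
# Build a small example tree in a temporary location (names are illustrative).
base=$(mktemp -d)
mkdir -p "$base/mydir/dir1" "$base/mydir/dir2"

# Absolute path: starts with a forward slash and works from anywhere.
cd "$base/mydir/dir1" || exit 1

# Relative path: .. moves one level up, here into mydir.
cd .. || exit 1
parent=$(basename "$PWD")

# Relative path to a sibling: go back into dir1, then up and into dir2.
cd dir1 || exit 1
cd ../dir2 || exit 1
sibling=$(basename "$PWD")

echo "Parent: $parent, sibling: $sibling"
```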
Show directory path
With the command
pwd (“Print working directory”) you can display where you are:
$ pwd
/home/demo_user
Show directory contents
The ls command (for “list”) shows files and subdirectories in the current directory:
$ ls
ex1.txt ex2 ex3.dat
In this example the folder contains a subfolder named
ex2 and two files. You can also display the contents of another folder:
$ ls ex2
ex4.txt
Here, the content of
ex2 is displayed; it contains another text file. The folder given to ls does not have to be a subdirectory of the current one.
You can also display additional details. The option
-l will show a table:
$ ls -l
total 4
-rw-r--r-- 1 demo_user hpc-gpr-hiwis    2 23. Jul 10:17 ex1.txt
drwxr-xr-x 2 demo_user hpc-gpr-hiwis 4096 23. Jul 10:24 ex2
-rw-r--r-- 1 demo_user hpc-gpr-hiwis    2 23. Jul 10:19 ex3.dat
Here you can see the following additional information: on the left are the permissions which have already been explained. Note that the directory is marked with a
d. Next you can see the owner of the file or folder, in this case
demo_user, and the owning group, in this case
hpc-gpr-hiwis. Then you can see the file size in bytes. The two example text files only contain one character each and are therefore only 2 bytes in size. The displayed size for folders is only the amount of metadata about that folder; the size of the files contained in the folder is not included in this figure. Finally you can see the date of the last change and the file or folder name.
Show hidden files and folders
A hidden file or folder in Linux is one whose name begins with a period (
.). These can also be shown with the
ls command when using the option -a:
$ ls -la
total 24
drwxr-xr-x  4 demo_user hpc-gpr-hiwis  4096 23. Jul 17:06 .
drwxr-xr-x 56 demo_user hpc-gpr-hiwis 12288 23. Jul 17:06 ..
-rw-r--r--  1 demo_user hpc-gpr-hiwis     2 23. Jul 10:17 ex1.txt
drwxr-xr-x  2 demo_user hpc-gpr-hiwis  4096 23. Jul 10:24 ex2
-rw-r--r--  1 demo_user hpc-gpr-hiwis     2 23. Jul 10:19 ex3.dat
-rw-r--r--  1 demo_user hpc-gpr-hiwis     2 23. Jul 17:06 .hidden_ex.txt
drwxr-xr-x  2 demo_user hpc-gpr-hiwis  4096 23. Jul 10:43 test1
As you can see in this example, it is also possible to combine options. In this case,
ls -la is equivalent to
ls -l -a. This works with many Linux programs, although it is not guaranteed.
Create a directory
The mkdir command (“Make directory”) will create a directory with the specified name:
$ mkdir test1
$ ls
ex1.txt ex2 ex3.dat test1
Rename or copy directory
The mv command (“Move”) will move a file or directory. It is also the usual method for renaming things.
$ ls    # Original contents
ex1.txt ex2 ex3.dat test1
$ mv ex1.txt renamed.txt
$ ls    # Changed contents
ex2 ex3.dat renamed.txt test1
cp (“Copy”) will copy a directory or file. Unlike
mv, when a directory including all its contents is to be copied, the option
-r (“recursive”) needs to be specified.
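The difference between copying and moving a directory can be sketched as follows. The example runs in a temporary directory; the names ex2_copy and ex2_backup are chosen only for illustration.

```shell
#!/bin/bash
# Work in a temporary directory so that no real data is affected.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a directory with one file in it.
mkdir ex2
echo "x" > ex2/ex4.txt

# Copying a directory including its contents requires -r.
cp -r ex2 ex2_copy

# Moving (here: renaming) a directory works without any extra option.
mv ex2_copy ex2_backup

# The original ex2 still exists; ex2_copy is now called ex2_backup.
ls
ls ex2_backup
```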
The chmod command will change the permissions for a directory or file (only if you are allowed to change them, of course). There are several different ways to specify the permissions; the simplest one is:
$ chmod u+x ex1.dat
In this example, a user (
u) adds (
+) the permission for themselves to execute (
x) the file
ex1.dat. Alternatives to u are g (group), o (others) and
a (all). Possible permissions are
r for read,
w for write and
x for execute. To remove permissions, a minus
- is used instead of the plus.
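Adding and removing permissions can be tried out safely on a scratch file. The following sketch assumes a temporary file created with mktemp; the variable names after_add and after_remove are invented for this example.

```shell
#!/bin/bash
# Create an empty scratch file; mktemp makes it readable and writable
# for the owner only.
file=$(mktemp)

# u+x: add execute permission for the owner.
chmod u+x "$file"
if [ -x "$file" ]; then after_add=executable; else after_add=not-executable; fi

# u-x: remove the execute permission again.
chmod u-x "$file"
if [ -x "$file" ]; then after_remove=executable; else after_remove=not-executable; fi

# go+r: additionally allow the group and all other users to read the file.
chmod go+r "$file"
ls -l "$file"

echo "After u+x: $after_add, after u-x: $after_remove"
rm -f "$file"
```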
Search for directories and files
The find command will search a folder and all subfolders for files and directories with a specific name. Partial names are also possible.
$ find . -type f -name "ex*"
./ex2/ex4.txt
./ex3.dat
In this example,
-type f is used to only find files, not directories. All files whose name starts with
ex are listed. The
find command also has a number of options to narrow down search results. It also can do things like execute specific commands on each file with
-exec <command name>.
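The -exec option can be illustrated with a short sketch. It assumes a temporary directory containing a few made-up files; inside -exec, the placeholder {} stands for each file that was found and \; marks the end of the command to run.

```shell
#!/bin/bash
# Set up a small example tree in a temporary directory.
workdir=$(mktemp -d)
mkdir "$workdir/ex2"
touch "$workdir/ex3.dat" "$workdir/ex2/ex4.txt" "$workdir/other.txt"
cd "$workdir" || exit 1

# List all regular files whose name starts with "ex".
# The directory ex2 itself is skipped because of -type f.
find . -type f -name "ex*" | sort

# Run ls -l once for every file that was found.
find . -type f -name "ex*" -exec ls -l {} \;

# Count the matches by piping the result into wc -l.
found=$(find . -type f -name "ex*" | wc -l)
echo "Files found: $found"
```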
Search text inside files
The grep command is used to search text in text files. To use it, enter
$ grep [options] "Text" filename
Strictly speaking, the text only needs to be in quotation marks if it contains spaces. Instead of a single file name, wildcards can also be used (e.g.
*.txt). Important options are for example
-r (searches recursively, i.e. also in subfolders) and
-i (ignores capitalization). A more complete list of options can be shown with grep --help or man grep.
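The effect of the options -i and -r can be seen in a small test. The sketch below assumes two invented text files in a temporary directory, one of them in a subfolder.

```shell
#!/bin/bash
# Create two small text files, one of them in a subfolder.
workdir=$(mktemp -d)
mkdir "$workdir/sub"
printf 'Hello cluster\nsecond line\n' > "$workdir/a.txt"
printf 'hello again\n' > "$workdir/sub/b.txt"
cd "$workdir" || exit 1

# Case-sensitive search in a single file.
grep "Hello" a.txt

# Wildcard instead of a file name: only files in the current directory.
grep -i "hello" *.txt

# -r also searches subfolders, -i ignores capitalization.
matches=$(grep -r -i "hello" . | wc -l)
echo "Matching lines: $matches"
```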
A script in the context of Linux is a file in which a number of commands are listed. As already mentioned, practically all commands that can be entered in the console can also be written into a script, and executing the script is identical to calling the listed commands one by one.
A (shell) script always starts with
#!<console name>, e.g.
#!/bin/bash. This is the complete path to the executable with which to run the script. This does not necessarily have to be a Linux shell. For example, a script could also start with
#!/usr/bin/python and would then be executed like a Python script.
To execute a script it needs to be made executable (see above in the section about file permissions). Then it can, if the absolute or relative path is specified, be executed like any other command.
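Creating, enabling and running a script can be sketched in a few lines. The script name hello.sh is hypothetical, and the example assumes that the temporary directory allows files to be executed.

```shell
#!/bin/bash
# Work in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write a tiny script; its first line names the program that runs it.
cat > hello.sh <<'EOF'
#!/bin/bash
echo "Hello from a script"
EOF

# Make the script executable for the owner.
chmod u+x hello.sh

# Run it via its relative path (./ refers to the current directory).
output=$(./hello.sh)
echo "$output"
```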
In addition to arguments that are explicitly typed into the script, variables can be used. Variables in Bash serve basically the same purpose as in any other programming language and work in a similar way. The main difference to most languages is the fact that to get a variable’s contents its name needs to be prefixed by
$. Variables are defined with
var=Value. Note that there must not be a space on either side of the equals sign. Variables in the Linux console are text strings.
In the following example, several operations are being performed on the same file:
# Variable definition
file1="/home/demo_user/exampledir/ex1.txt"
# Output the entire text file to the console.
cat $file1
# Copy the file.
cp $file1 copy_example.txt
This demonstrates the usefulness of variables: if the user wants to apply the same operations to another file, the file name only needs to be changed at one point in the script instead of everywhere.
A variable is only ever available inside the console or script in which it was defined. However there are so-called environment variables that are also available in all sub-processes (processes that were launched by the current process). That way, before a script or program is called, settings can be specified for it. Environment variables are set via
export var=Value. A list of all environment variables that are set can be obtained with the env command.
In every Linux system, a large number of environment variables are set, either by the system or by the installed software. For example, there is always a variable
USER which shows the user name of the person who is logged in.
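The difference between an ordinary variable and an environment variable becomes visible when a sub-process is started. In this sketch the names local_var and shared_var are made up, and bash -c is used to launch a child shell.

```shell
#!/bin/bash
# An ordinary variable: only known inside this shell.
local_var="only here"

# An environment variable: also passed on to sub-processes.
export shared_var="visible to children"

# Start a child shell and let it print both variables.
# The single quotes ensure the child, not the parent, expands them.
child_output=$(bash -c 'echo "local=[$local_var] shared=[$shared_var]"')
echo "$child_output"
```

The child shell prints an empty value for local_var because that variable was never exported.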
Command line parameters
In the special case of shell scripts, additional variables are also available. For example, the variables $1,
$2 and so on up to
$9 recall the arguments with which the script was started (the command line parameters). If a script is started with:
$ Skript.sh -f 5.0
the variables would be $1=-f and
$2=5.0. That way, you can pass settings directly to the script.
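How command line parameters arrive inside a script can be checked with a small sketch. The script name show_args.sh is hypothetical; $# is a standard shell variable holding the number of arguments.

```shell
#!/bin/bash
# Write a small script that prints its first two arguments.
workdir=$(mktemp -d)
cat > "$workdir/show_args.sh" <<'EOF'
#!/bin/bash
echo "first=$1 second=$2 count=$#"
EOF

# Run the script with two arguments. Passing it to bash explicitly
# avoids having to make it executable first.
result=$(bash "$workdir/show_args.sh" -f 5.0)
echo "$result"
```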
The PATH Variable
The environment variable
PATH has a special function: whenever a command is entered into the console, the directories listed in
PATH are searched for this command. This also means that a command will not be found if its location is not added to
PATH. To add a directory to
PATH, you can prepend or append the new directory, separated by a colon:
$ export PATH=$PATH:/home/demo_user/exampledir
The order of directories is important because identically named commands may be in multiple directories on the
PATH. The first command found is always executed.
Caution: Mistakes when manipulating
PATH may have severe consequences because important commands may not be available any more.
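One cautious way to experiment with PATH is to save the old value first and restore it afterwards. The sketch below assumes a hypothetical command hello_cluster in a temporary directory; command -v is a shell built-in that, like which, reports where a command was found.

```shell
#!/bin/bash
# Create a directory containing one small executable script.
workdir=$(mktemp -d)
mkdir "$workdir/bin"
cat > "$workdir/bin/hello_cluster" <<'EOF'
#!/bin/bash
echo "found via PATH"
EOF
chmod u+x "$workdir/bin/hello_cluster"

# Remember the old value so it can be restored later.
old_path=$PATH

# Append the new directory; existing commands keep their priority.
export PATH="$PATH:$workdir/bin"

# The command can now be called by name alone.
location=$(command -v hello_cluster)
result=$(hello_cluster)
echo "$location: $result"

# Restore the previous PATH.
export PATH=$old_path
```

Appending rather than prepending means that an existing command with the same name would still be found first.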
Settings like exported environment variables or loaded modules only exist as long as the current console is open. If you log out from the cluster or the script with the settings has finished running, the settings are lost. It is however possible to make settings permanent. The most important place for this is the file
.bashrc. It is located in your home directory and is called whenever a new Bash console is launched (including upon login). Commands that you enter in that file will then be executed. The
.bashrc file is good for permanently saving environment variables or other settings you need often.
You can also put settings into a shell script and make them available with the command
source <script name>. Of course you can also put that
source command into the .bashrc file.
Caution: Since the
.bashrc file is called at every login, a faulty
.bashrc can lead to you being completely unable to log in! Always make sure that your settings in
.bashrc do not contain mistakes. You can test settings by entering them into the console by hand. If you make a mistake then, you can reset the previous settings by logging out and logging back in.
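In addition to testing settings by hand, Bash can check a file for syntax errors without executing it by using the -n option. The following is only a sketch: the candidate file is a temporary file with a deliberately broken if statement, not your real .bashrc.

```shell
#!/bin/bash
# Write some candidate settings to a scratch file.
# The closing "fi" is left out on purpose to provoke a syntax error.
candidate=$(mktemp)
cat > "$candidate" <<'EOF'
export MY_SETTING=1
if [ -d "$HOME" ]; then
    echo "home directory exists"
EOF

# bash -n reads the file and reports syntax errors without running it.
if bash -n "$candidate" 2>/dev/null; then
    syntax_ok=yes
else
    syntax_ok=no
fi
echo "Syntax OK: $syntax_ok"
rm -f "$candidate"
```

Note that this check only finds syntax errors; a wrong directory in PATH, for example, would not be detected.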
- The Linux console does not have an “Undo” function. You should always look out for spelling errors and often make backups, especially if you work on a Linux system where you have root permissions. It is entirely possible to destroy an entire Linux installation by accident.
- File extensions are not as important in Linux as they are in Windows, in particular they are not used to determine what type of file it is. It is however recommended to use consistent extensions (a common one is
.sh for shell scripts), so a person can see at a glance what the file type is.
- Every command in Linux is really a program (or script), even the built-in Linux commands. You can see the location of a program executable with
which <program name>. This is especially useful if multiple versions of the same software are installed and you want to make sure you are using the correct one.
- You can define commands as a so-called alias, meaning a shorthand for another command. For example
alias myjobs="squeue -u demo_user" will create a command named
myjobs which instructs SLURM to list all jobs for
demo_user. Like environment variables, aliases need to be in
.bashrc to be permanently available.
- In addition to wildcards there is another way to specify patterns of characters, so-called regular expressions (regex). These allow very complex patterns but are also extremely hard to learn.
- You can calculate the size of files and directories with
du (for “Disk Usage”). Important options are for example
-h (for “human-readable”), which shows the size with a unit (e.g. K for kilobytes or M for megabytes), and
-s, which prints a single total for each argument (otherwise, every subfolder would be listed separately, which may become confusing). Example:
$ du -sh *
5,4M abaqus
8,0K abaqus_plugins
4,0K abaqus.rpy
32K all_users_2018-02-07.txt
4,0K bin
4,0K bsp.f90
- Like Windows, Linux allows symbolic links. With
ls -l you can see what is a link to another file or directory.
$ ls -l
-rw-r--r-- 1 demo_user hpc-gpr-hiwis 56 30. Jul 09:24 ex3.dat
lrwxrwxrwx 1 demo_user hpc-gpr-hiwis  7  1. Aug 09:47 ex3_link.dat -> ex3.dat
You can create your own links with the command
ln -s <target file> <link name>.
- The -l option for the
ls command is used so often that many Linux systems, including the one on the cluster, offer a command
ll which is an alias for ls -l.
- The cd command without any argument changes to your home directory;
cd - changes to the previous directory (the one before the last call of cd).
- The up-arrow and down-arrow keys can be used to scroll through previously entered console commands. The
history command will list previously entered commands. This is especially useful if you do not remember the syntax of a command that you previously entered. You can use
history | grep <command name> to find it again. By default, Bash saves this history in the file .bash_history in your home directory.
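The tips about symbolic links and du can be tried out together. This sketch assumes a temporary directory and reuses the file names ex3.dat and ex3_link.dat from the example above; readlink is a standard command that prints the target of a link.

```shell
#!/bin/bash
# Work in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a file and a symbolic link pointing to it.
echo "data" > ex3.dat
ln -s ex3.dat ex3_link.dat

# In the long listing the link is marked with an l and an arrow.
ls -l

# readlink prints the target; reading the link reads the original file.
target=$(readlink ex3_link.dat)
content=$(cat ex3_link.dat)
echo "Link points to $target, content: $content"

# Summarized, human-readable size of the current directory.
du -sh .
```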