The OMNI cluster, like almost all large computers, uses Linux as its operating system (in this case CentOS Linux 8.3). This page gives a brief introduction to Linux with a few general concepts and some common commands, especially those important for working on the cluster.
The information on this page is mostly identical to the content of our Linux introduction courses as well as the Linux Interactive Video Tutorial which has been co-developed by us. Another very good tutorial can be found here.
This page contains information about the following topics:
- Directory structure of Linux
- File permissions
- Auto-complete function
- Processes
- Some basic console commands
- Special characters
- Commands for navigating directories
- Creating executable scripts
- Custom Linux settings
- General tips and tricks about Linux.
Linux also contains a built-in help mechanism which can be reached from the console. The man
command shows the so-called man page (“Man” for “Manual”) which is a help text built into the program (if the program contains one). For example, the command:
$ man sbatch
will show the man page written by the SLURM developers for the sbatch
command. A man page can be scrolled up and down with the arrow keys and is exited by pressing q
.
Many commands also have an internal help function which is often accessible with <Command name> -h
or <Command name> --help
. Often it is identical to the program’s man page.
Directory structure
The directory structure in Linux is a tree: there is a single top-level directory, which is called the root directory or /
. All other directories (also called folders) are subdirectories of root or subdirectories of subdirectories.
This structure differs from that of Windows. While Windows assigns a letter to each individual drive, Linux uses so-called mount points: the directory structure is (mostly) identical on every Linux system, and each hard drive is mounted at a specific point in that structure. The advantage for you as a user is that you generally do not need to worry about the physical hard drives.
In the special case of the OMNI cluster there are two directories which you will mostly use: your home directory (/home/<YourUsername>
) and your workspaces (/work/ws-tmp
). If you develop your own software, you might need to link libraries. In this case you might need to know the installation directories for some software packages. Most of the software on the cluster is installed in /cm/shared/apps/
.
Caution: These directory paths may change at any time! It is always more useful to use environment variables if possible instead of hard-coded file paths. The use of environment variables is explained below.
Permissions
What you can do in Linux depends on the user account with which you are logged in. Usually you will have exactly one user name which is identical to your ZIMT ID (g-number). The user with the highest permissions in Linux is called superuser or root user (root). You will never be given root permissions on the cluster; these are reserved for ZIMT administrators.
Users are assigned to groups. Each user has a primary group and may belong to any number of additional groups. You can show the groups to which you belong with the command id <YourUsername>
.
Every file and every directory in Linux belongs to a user and only that user (or the root
user) decides what can be done with this file. If you create a new file, you are the owner. Each file is also assigned to a group. By default this is the primary group of the user who created it.
There are three types of access permissions for files and directories: read, write and execute. The commands for displaying and changing these permissions will be explained below. The rights can be set separately for three types of users: for the owner, for the group it belongs to, or for everybody else (Other). That way, you can determine that, for example, other users may read a file but not write to it.
Change permissions
The chmod
command will change the permissions for a directory or file (only if you are allowed to change them, of course). There are several ways to specify the permissions; the simplest one is:
$ chmod u+x ex1.dat
In this example, the owner of the file (u, for “user”) is given (+) the permission to execute (x) the file ex1.dat. Alternatives to u are g (group), o (other) and a (all). Possible permissions are r for read, w for write and x for execute. To remove permissions, a minus - is used instead of the plus.
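The following sketch tries out these options on a throwaway file. The temporary directory created with mktemp is only an assumption made so that no real files are touched; the file name ex1.dat is taken from the example above.

```shell
#!/bin/bash
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
touch "$workdir/ex1.dat"

# Start from a known state: the owner may read and write, nobody else may do anything.
chmod u=rw,go= "$workdir/ex1.dat"

# Give the owner (u) the execute permission (x).
chmod u+x "$workdir/ex1.dat"

# Give the group (g) and all others (o) the read permission (r).
chmod go+r "$workdir/ex1.dat"

# Take the write permission (w) away from the owner.
chmod u-w "$workdir/ex1.dat"

# Show the result; the permission column should read -r-xr--r--
ls -l "$workdir/ex1.dat"
perms=$(ls -l "$workdir/ex1.dat" | cut -c1-10)
```

Several changes can also be combined in one call by separating them with commas, as in the first chmod line.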
Auto-complete
An important feature to save you a lot of work is the auto-complete function of Linux. If you only enter part of a command and then press the Tab key, the command will be automatically completed, as long as it is not ambiguous. For example, it is not possible to enter sb
and press Tab to obtain sbatch
because the cluster has multiple commands that start with sb
. In that case you press the Tab key a second time to obtain a list of all commands starting with sb
:
$ sb
sb sbatch sbcast
As you can see, typing e.g. sba
would be unambiguous. If there are a lot of options, you will be asked if you really want to see the complete list (you can test this by entering s
and pressing Tab twice).
The auto-complete function does not only work with commands but also with directory paths.
Processes
As with Windows, Linux has a large number of processes running at any time, including the ones that you explicitly started (by typing a command). Sometimes it is necessary to monitor the status of a process or to forcefully terminate it. The top
command is for that purpose. top
is roughly equivalent to the Task Manager in Windows.
The interface of top
typically looks similar to the following example:
top - 10:08:18 up 53 days, 23:05, 14 users, load average: 0,12, 0,21, 0,48
Tasks: 334 total, 1 running, 333 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1,6 us, 0,3 sy, 0,0 ni, 98,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem : 14854156+total, 749016 free, 1722716 used, 14606982+buff/cache
KiB Swap: 12582908 total, 12360456 free, 222452 used. 14465592+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2610 otheruser 20 0 180156 3936 1240 S 4,3 0,0 43:08.94 sshd
2619 otheruser 20 0 180188 3968 1240 S 4,0 0,0 43:01.12 sshd
2291 otheruser 20 0 67804 2576 1800 S 1,0 0,0 9:34.55 sftp-server
13770 demo_user 20 0 168156 2564 1636 R 0,7 0,0 0:00.07 top
39 root 20 0 0 0 0 S 0,3 0,0 1:25.02 ksoftirqd/6
6133 root 20 0 0 0 0 S 0,3 0,0 0:00.84 kworker/u48:1
1 root 20 0 191612 3340 1672 S 0,0 0,0 14:56.12 systemd
2 root 20 0 0 0 0 S 0,0 0,0 0:02.23 kthreadd
3 root 20 0 0 0 0 S 0,0 0,0 0:22.88 ksoftirqd/0
8 root rt 0 0 0 0 S 0,0 0,0 0:01.71 migration/0
9 root 20 0 0 0 0 S 0,0 0,0 0:00.00 rcu_bh
10 root 20 0 0 0 0 S 0,0 0,0 81:45.44 rcu_sched
11 root rt 0 0 0 0 S 0,0 0,0 0:11.99 watchdog/0
In this (shortened and anonymized) view the running processes are listed. top
can be operated with commands consisting of individual letters, for example q
will quit top
. On the left you can see the process ID of each process. This is a unique number that Linux assigns to each process. If you need to terminate a process, you can type k
(for “kill”) and enter the process ID, and Linux will forcefully terminate it (provided you have the permissions). Next, the owner of the process is shown. As you can see, there are a lot of system processes which are owned by root; those normally do not concern you. You can filter the processes so only those of a specific user are shown by typing u and then the name of the user. The next columns show information about how much memory and CPU power the process is using. Finally, the name of the process is displayed.
Tip: The percentage in the %CPU
column is in relation to a single CPU, not the entire computer. If you run a parallel program (one with multiple threads), the number in this column may be larger than 100%.
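Outside of top, a process can also be terminated from the console or from a script with the standard kill command and the process ID. The following is a minimal sketch, assuming Bash; the sleep command merely stands in for a long-running program of your own.

```shell
#!/bin/bash
# Start a long-running command in the background (the trailing & does that).
sleep 300 &

# Bash stores the process ID of the most recent background command in $!.
pid=$!
echo "Started process $pid"

# Ask the process to terminate (kill sends the TERM signal by default).
kill "$pid"

# Collect the exit status; a process ended by a signal reports a value above 128.
status=0
wait "$pid" || status=$?
echo "Process $pid ended with status $status"
```

You can only terminate processes that you own, which is the same restriction that applies inside top.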
Basic console commands
You will primarily interact with Linux, especially on remote systems, via a text interface, the so-called console (also referred to as shell or terminal). Note that in the case of the cluster the default console is the Bash console. Like most Linux systems, the cluster has multiple consoles installed which differ slightly. For example, you can alternatively use the C shell (with the command csh
) if you are more experienced with that.
All commands described here can either be entered by hand into the console, or be written into a text file (a so-called script) one after the other. The shell can then execute this script and the result is the same as if you had typed the commands by hand. This is also called shell-scripting and is one of the major reasons for the popularity of Linux – repetitive tasks can be easily automated.
Special characters
- The pound sign # begins a comment.
- The asterisk * is a so-called wildcard and can be used as a placeholder where arbitrary characters are allowed. For example, when searching for all PDF files (the search function will be explained in more detail below), one can search for *.pdf. There are a number of additional wildcards.
- The pipe symbol | sends the output of one command as input into another command. Multiple commands can be chained in this way.
- The semicolon ; separates independent commands from each other. Entering multiple commands with semicolons in between is equivalent to entering these commands one after the other.
- The ampersand & at the end of a command will execute that command in the background. You can continue working with the console while the command is running by pressing Enter again. This is especially useful if you enter a command that opens a window; otherwise, the console would be blocked as long as the window is open.
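The special characters can be tried out together in a few lines. This is only a sketch: the file names are invented for illustration and everything happens inside a temporary directory.

```shell
#!/bin/bash
# Everything after a pound sign is a comment and is ignored by the shell.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The semicolon separates independent commands on one line.
touch report.pdf; touch slides.pdf; touch notes.txt

# The asterisk matches any sequence of characters: list only the PDF files.
ls *.pdf

# The pipe sends the output of ls into wc, which counts the lines.
pdf_count=$(ls *.pdf | wc -l)
echo "Number of PDF files: $pdf_count"

# The ampersand runs a command in the background; the shell continues at once.
sleep 1 &
echo "The background command has process ID $!"

# Wait for the background command to finish before the script ends.
wait
```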
Navigating around directories
The most common operations in the Linux console are movement between directories and manipulation of files in the current directory. Here are the most important commands that you should know. Independently of the command there are two more special symbols that you should know which are used with file paths: The period .
denotes the current directory and two periods ..
denote the directory one level above the current one.
Note: Linux does differentiate between upper and lower case. A command test
and a command Test
may have different functions. The same is true for file and directory names.
Change directory
You can change to a different directory with the command cd
(for “change directory”). You can either enter a relative (relative to your current position) or an absolute path. You can tell the difference by the fact that absolute paths begin with a slash /
(Note: unlike Windows, Linux uses forward slashes in paths). If the current directory has a subdirectory named Example
you can change to that with
$ cd Example
Because .. denotes the directory above, you can also specify a relative path that leads outside the current directory. If you are inside a directory mydir/sub1, and mydir contains another subdirectory named sub2, you can reach the parent directory mydir with
$ cd ..
or the sibling directory with
$ cd ../sub2
You can also specify an absolute path. For example, the previous command is identical to
$ cd /home/demo_user/mydir/sub2
if mydir
is inside demo_user
’s home directory.
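The difference between relative and absolute paths can be tried out with the layout described above. In this sketch the directories mydir, sub1 and sub2 are created inside a temporary directory instead of a real home directory, which is an assumption made only to keep the example harmless.

```shell
#!/bin/bash
# Build the example layout inside a temporary directory.
base=$(mktemp -d)
mkdir -p "$base/mydir/sub1" "$base/mydir/sub2"

# Absolute path: starts with a slash and works from anywhere.
cd "$base/mydir/sub1" || exit 1

# Relative path: one level up, into mydir.
cd .. || exit 1
pwd

# Relative path: down into sub1 again.
cd sub1 || exit 1

# Relative path: up one level and down into the sibling sub2.
cd ../sub2 || exit 1
here=$(pwd)
echo "$here"
```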
Show directory path
With the command pwd
(“Print working directory”) you can display where you are:
$ pwd
/home/demo_user
Show directory contents
The ls
command (for “list”) shows files and subdirectories in the current directory:
$ ls
ex1.txt ex2 ex3.dat
In this example the folder contains a subfolder named ex2
and two files. You can also display the contents of another folder:
$ ls ex2
ex4.txt
Here, the content of ex2 is displayed; it contains another text file. The directory given to ls does not have to be a subdirectory of the current one.
You can also display additional details. The option -l
will show a table:
$ ls -l
total 4
-rw-r--r-- 1 demo_user hpc-gpr-hiwis 2 23. Jul 10:17 ex1.txt
drwxr-xr-x 2 demo_user hpc-gpr-hiwis 4096 23. Jul 10:24 ex2
-rw-r--r-- 1 demo_user hpc-gpr-hiwis 2 23. Jul 10:19 ex3.dat
Here you can see the following additional information: On the left are the permissions which have already been explained. Note that the directory is marked with d
. Next you can see the owner of the file or folder, in this case demo_user
, and the owning group, in this case hpc-gpr-hiwis
. Then you can see the file size in bytes. The two example text files only contain one character each (plus a line break) and are therefore only 2 bytes in size. The size displayed for folders is only the amount of metadata about that folder; the size of the files contained in the folder is not included in this figure. Finally you can see the date of the last change and the file or folder name.
Show hidden files and folders
A hidden file or folder in Linux is one whose name begins with .
. These can also be shown with the ls
command when using the option -a
.
$ ls -la
total 24
drwxr-xr-x 4 demo_user hpc-gpr-hiwis 4096 23. Jul 17:06 .
drwxr-xr-x 56 demo_user hpc-gpr-hiwis 12288 23. Jul 17:06 ..
-rw-r--r-- 1 demo_user hpc-gpr-hiwis 2 23. Jul 10:17 ex1.txt
drwxr-xr-x 2 demo_user hpc-gpr-hiwis 4096 23. Jul 10:24 ex2
-rw-r--r-- 1 demo_user hpc-gpr-hiwis 2 23. Jul 10:19 ex3.dat
-rw-r--r-- 1 demo_user hpc-gpr-hiwis 2 23. Jul 17:06 .hidden_ex.txt
drwxr-xr-x 2 demo_user hpc-gpr-hiwis 4096 23. Jul 10:43 test1
As you can see in this example, it is also possible to combine options. In this case, ls -la
is equivalent to ls -l -a
. This works with many Linux programs, although it is not guaranteed.
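The effect of the -a option can be checked with a small experiment. The file names below follow the listing above; the temporary directory is an assumption made for this sketch.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create one visible file, one hidden file and one subdirectory.
echo "a" > ex1.txt
echo "b" > .hidden_ex.txt
mkdir ex2

# Plain ls leaves out names that begin with a period.
visible=$(ls)
echo "$visible"

# -a includes hidden entries as well as . and ..
all=$(ls -a)
echo "$all"

# Combined options: long listing including hidden entries.
ls -la
```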
Create a directory
The mkdir
command (“Make directory”) will create a directory with the specified name:
$ mkdir test1
$ ls
ex1.txt ex2 ex3.dat test1
Rename or copy directory
The mv
command (“Move”) will move a file or directory. It is also the usual method for renaming things.
$ ls # Original contents
ex1.txt ex2 ex3.dat test1
$
$ mv ex1.txt renamed.txt
$
$ ls # Changed contents
ex2 ex3.dat renamed.txt test1
The command cp
(“Copy”) will copy a directory or file. Unlike mv
, when a directory including all its contents is to be copied, the option -r
(“recursive”) needs to be specified.
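A short sketch of moving, renaming and copying, using the file names from the example above inside a temporary directory (the names ex2_backup and ex2_old are invented for illustration):

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir ex2
echo "x" > ex2/ex4.txt
echo "y" > ex1.txt

# Rename a file: mv with a new name in the same directory.
mv ex1.txt renamed.txt

# Copy a single file.
cp renamed.txt copy.txt

# Copy a directory with everything inside it; -r is required here.
cp -r ex2 ex2_backup

# Move (rename) a directory; no extra option is needed.
mv ex2 ex2_old

# List everything, including the contents of subdirectories.
ls -R
```

Note that mv and cp overwrite an existing target file without asking unless you add the -i option.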
Search for directories and files
The find
command will search a folder and all subfolders for files and directories with a specific name. Partial names are also possible.
$ find . -type f -name "ex*"
./ex2/ex4.txt
./ex3.dat
In this example, -type f
is used to only find files, not directories. All files whose name starts with ex
are listed. The find
command also has a number of options to narrow down search results. It can also execute a command on each file it finds with -exec <command name> {} \;
.
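The following sketch shows a few typical find calls on a small test tree. The file names are invented for illustration; in the -exec form, {} stands for each file that was found and the escaped semicolon \; ends the command.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir ex2
echo "one" > ex1.txt
echo "two" > ex2/ex4.txt
echo "three" > other.dat

# Find regular files whose name starts with "ex", here and in all subfolders.
find . -type f -name "ex*"

# Find directories only.
find . -type d -name "ex*"

# Run a command on every match: wc -c prints the size of each file in bytes.
find . -type f -name "*.txt" -exec wc -c {} \;

# Count the matches by piping the list into wc.
txt_count=$(find . -type f -name "*.txt" | wc -l)
echo "Text files found: $txt_count"
```

The pattern is put in quotation marks so that find, and not the shell, interprets the wildcard.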
Search text inside files
The grep
command is used to search text in text files. To use it, enter
$ grep [Options] "Text" Filename
Strictly speaking, the text only needs to be in quotation marks if it contains spaces or other characters that have a special meaning in the shell. Instead of a single file name, wildcards can also be used (e.g. *.txt
). Important options are for example -r
(searches recursively, i.e. also in subfolders) and -i
(ignores capitalization). A more complete list of options can be shown with grep --help
.
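As a sketch of how these options behave, the following lines search a few invented log files. The -l option used at the end is a further standard grep option that prints only the names of matching files.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir sub
printf 'Error: disk full\nall fine\n' > log1.txt
printf 'no problems\n' > log2.txt
printf 'error: timeout\n' > sub/log3.txt

# Case-sensitive search in one file.
grep "Error" log1.txt

# Wildcard instead of a single file name; -i ignores capitalization.
grep -i "error" *.txt

# -r searches the current directory and all subfolders.
grep -r -i "error" .

# -l prints only the names of the matching files; count them with wc.
match_count=$(grep -r -i -l "error" . | wc -l)
echo "Files containing the text: $match_count"
```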
Scripts: creation and execution
A script in the context of Linux is a file in which a number of commands are listed. As already mentioned, practically all commands that can be entered in the console can also be written into a script, and executing the script is identical to calling the listed commands one by one.
A (shell) script always starts with #!<console name>
, e.g. #!/bin/bash
. This is the complete path to the executable with which to run the script. This does not necessarily have to be a Linux shell. For example, a script could also start with #!/usr/bin/python
and would then be executed like a Python script.
To execute a script it needs to be made executable (see the section about file permissions and chmod
). Then it can, if the absolute or relative path is specified, be executed like any other command.
$ ./example.sh
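The complete cycle of writing a script, making it executable and running it could look like the following sketch. The script is written with a so-called here-document only to keep the example self-contained; normally you would create example.sh in a text editor.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write a small script; the quoted 'EOF' keeps the text exactly as typed.
cat > example.sh <<'EOF'
#!/bin/bash
# This script runs two ordinary console commands.
echo "Hello from example.sh"
pwd
EOF

# Make the script executable for its owner.
chmod u+x example.sh

# Run it via a relative path and keep its output.
output=$(./example.sh)
echo "$output"
```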
(Environment) variables
In addition to arguments that are explicitly typed into the script, variables can be used. Variables in Bash serve basically the same purpose as in any other programming language and work in a similar way. The main difference from most languages is that to get a variable’s contents, its name needs to be prefixed by $. Variables are defined with var=Value. Note that there must not be a space on either side of the equals sign. Variables in the Linux console are text strings.
In the following example, several operations are being performed on the same file:
#!/bin/bash
# Variable definition
file1="/home/demo_user/exampledir/ex1.txt"
# Outputs file contents to console.
# The quotation marks keep the path intact even if it contains spaces.
cat "$file1"
# Copies the file.
cp "$file1" copy_example.txt
This demonstrates the usefulness of variables: If the user wants to apply the same operations to another file, the file name only needs to be changed at one point in the script instead of everywhere.
Environment variables
A variable is only ever available inside the console or script in which it was defined. However there are so-called environment variables that are also available in all sub-processes (processes that were launched by the current process). That way, before a script or program is called, settings can be specified for it. Environment variables are set via export var=Value
. A list of all environment variables that are set can be obtained with the printenv
command.
In every Linux system, a large number of environment variables are set, either by the system or by the installed software. For example, there is always a variable USER
which shows the user name of the person who is logged in.
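The difference between an ordinary variable and an environment variable becomes visible as soon as a sub-process is started. In this sketch the variable names local_setting and SHARED_SETTING are invented for illustration, and a second Bash started with bash -c plays the role of the sub-process.

```shell
#!/bin/bash
# An ordinary variable: visible only in this shell.
local_setting="only here"

# An environment variable: also visible in sub-processes.
export SHARED_SETTING="visible to children"

# Start a child shell and let it print both variables.
# The single quotes make sure the child, not this shell, expands them.
child_output=$(bash -c 'echo "local=[$local_setting] shared=[$SHARED_SETTING]"')
echo "$child_output"

# printenv with a name prints the value of that environment variable.
printenv SHARED_SETTING
```

The child shell prints an empty value for local_setting because that variable was never exported.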
Command line parameters
In the special case of shell scripts, additional variables are also available. For example, the variables $0, $1, $2 and so on contain the name under which the script was called and the arguments with which it was started (the command line parameters). If a script is started with:
$ ./Script.sh -f 5.0
the variables would be $0=./Script.sh, $1=-f, $2=5.0. That way, you can pass settings directly to the script.
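A small sketch that makes the command line parameters visible. The script is created in a temporary directory; the variable $#, which holds the number of arguments, is a standard shell variable that is not described above.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a script that prints the parameters it was started with.
cat > Script.sh <<'EOF'
#!/bin/bash
echo "Script name:     $0"
echo "First argument:  $1"
echo "Second argument: $2"
echo "Number of arguments: $#"
EOF
chmod u+x Script.sh

# Start the script with two arguments and keep its output.
output=$(./Script.sh -f 5.0)
echo "$output"
```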
The PATH Variable
The environment variable PATH
has a special function: Whenever a command is entered into the console, the directories listed in PATH
are searched for this command. This also means that a command can only be called by its name alone if its location is listed in PATH
. To add a directory to PATH
, you can prepend or append the new directory, separated by a colon:
$ export PATH=$PATH:/home/demo_user/exampledir
The order of directories is important because identically named commands may be in multiple directories on the PATH
. The first command found is always executed.
Caution: Mistakes when manipulating PATH
may have severe consequences because important commands may not be available any more.
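The effect of PATH can be observed with a self-made command. In this sketch the command name hello_example and the directory are invented; command -v is a shell built-in that prints the location the shell would use for a command.

```shell
#!/bin/bash
workdir=$(mktemp -d)
mkdir "$workdir/exampledir"

# Create a small command named "hello_example" in that directory.
cat > "$workdir/exampledir/hello_example" <<'EOF'
#!/bin/bash
echo "hello from exampledir"
EOF
chmod u+x "$workdir/exampledir/hello_example"

# Append the directory: existing commands with the same name keep priority.
export PATH="$PATH:$workdir/exampledir"

# Show where the shell finds the command.
found=$(command -v hello_example)
echo "$found"

# The command can now be called by its name alone.
greeting=$(hello_example)
echo "$greeting"
```

Always including the old value via $PATH, as done here, is what keeps the existing commands available.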
Custom settings
Settings like exported environment variables or loaded modules only exist as long as the current console is open. If you log out from the cluster, or the script with the settings has finished running, the settings are lost. It is however possible to make settings permanent. The most important way to do that is the file .bashrc
. It is located in your home directory and is called whenever a new Bash console is launched (including upon login). Commands that you enter in that file will then be executed. The .bashrc
file is good for permanently saving environment variables or other settings you need often.
You can also put settings into a shell script and make them available with the command source <script name>
. Of course you can also put that source
command into the .bashrc
file.
Caution: Since the .bashrc file is executed at every login, a faulty .bashrc can make it impossible for you to log in at all! Always make sure that your settings in .bashrc do not contain mistakes. You can test settings by entering them into the console by hand. If you make a mistake then, you can reset the previous settings by logging out and logging back in.
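One possible way to keep such settings manageable is to collect them in a separate file and test that file before it is referenced from .bashrc. The following is only a sketch: the file name my_settings.sh and the variable EXAMPLE_DATA_DIR are invented, and bash -n only checks the syntax without executing anything.

```shell
#!/bin/bash
workdir=$(mktemp -d)

# A settings file as it might be kept next to .bashrc (hypothetical name and content).
cat > "$workdir/my_settings.sh" <<'EOF'
export EXAMPLE_DATA_DIR="/tmp/example_data"
alias ll='ls -l'
EOF

# Check the file for syntax errors without executing it.
bash -n "$workdir/my_settings.sh" && echo "No syntax errors found"

# Load the settings into the current shell.
source "$workdir/my_settings.sh"
echo "$EXAMPLE_DATA_DIR"
```

A syntax check does not catch every mistake (for example a wrong path), so testing the settings by hand in an open console remains useful.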
General advice
- The Linux console does not have an “Undo” function. You should always look out for spelling errors and make backups regularly, especially if you work on a Linux system where you have root permissions. It is entirely possible to destroy an entire Linux installation by accident.
- File extensions are not as important in Linux as they are in Windows; in particular, they are not used to determine what type of file it is. It is however recommended to use consistent extensions (a common one is .sh for shell scripts), so a human can see at a glance what the file type is.
- Almost every command in Linux is really a program (or script), even the built-in Linux commands. You can see the location of a program executable with which <program name>. This is especially useful if multiple versions of the same software are installed and you want to make sure you are using the correct one.
- You can define commands as a so-called alias, meaning a shorthand for another command. For example alias myjobs="squeue -u demo_user" will create a command named myjobs which instructs SLURM to list all jobs for demo_user. Like environment variables, aliases need to be in .bashrc to be permanently available.
- In addition to wildcards there is another way to specify patterns of characters, so-called regular expressions (regex). These allow very complex patterns but are also extremely hard to learn.
- You can calculate the size of files and directories with du (for “Disk Usage”). Important options are for example -h (for “human-readable”), which will show the size with a unit (e.g. GB for gigabytes), and -s, which will print only one total for each file or directory given (otherwise, every subfolder would be listed separately, which may become confusing). Example:
  $ du -sh *
  5,4M abaqus
  8,0K abaqus_plugins
  4,0K abaqus.rpy
  32K all_users_2018-02-07.txt
  4,0K bin
  4,0K bsp.f90
- Like Windows, Linux allows symbolic links. With ls -l you can see what is a link to another file or directory.
  $ ls -l
  -rw-r--r-- 1 demo_user hpc-gpr-hiwis 56 30. Jul 09:24 ex3.dat
  lrwxrwxrwx 1 demo_user hpc-gpr-hiwis 7 1. Aug 09:47 ex3_link.dat -> ex3.dat
  You can create your own links with the command ln -s <target file> <link name>.
- The -l option for the ls command is used so often that many Linux systems, including the one on the cluster, offer a command ll which is an alias for ls -l.
- The cd command without any argument changes to your home directory, cd - changes to the previous directory (the one before the last call of cd).
- The up-arrow and down-arrow keys can be used to scroll through previously entered console commands. The history command will list previously entered commands. This is especially useful if you do not remember the syntax of a command that you previously entered. You can use history | grep <command name> to find it again. Bash saves this history in the file ~/.bash_history.
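Two of the tips above, symbolic links and du, can be tried out safely in a temporary directory. This is a minimal sketch; the file names follow the listing above.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "some data" > ex3.dat

# Create a symbolic link that points to ex3.dat.
ln -s ex3.dat ex3_link.dat

# In the long listing the link starts with "l" and shows its target.
ls -l

# Reading through the link gives the content of the target file.
link_content=$(cat ex3_link.dat)
echo "$link_content"

# Total size of the current directory in human-readable units.
du -sh .
```

Deleting the link with rm ex3_link.dat removes only the link; the target file ex3.dat stays in place.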