Linux Basics

The HoRUS cluster, like almost all large computers, uses Linux as its operating system (in this case CentOS Linux 7.4.1708). This page gives a brief introduction to Linux and a few common commands, especially those that concern working on the cluster. A more extensive and very good tutorial can be found here.

This page contains information about the following topics: the directory structure of Linux, some basic console commands, tips for creating executable scripts, and general advice about Linux.

Linux also contains a built-in help mechanism which can be reached from the console. The man command shows the so-called man page (“Man” for “Manual”) which is a help text built into the program (if the program contains one). For example, the command:

$ man sbatch

will show the man page written by the SLURM developers for the sbatch command. A man page can be scrolled up and down with the arrow keys and is exited by pressing q.

Many commands also have an internal help function which is often accessible with <Command name> -h or <Command name> --help. Often it is identical to the program’s man page.

Directory structure

The directory structure in Linux is a tree structure: there is a top level directory which is called the root directory or /. All other directories (also called folders) are subdirectories of root or subdirectories of subdirectories.

This structure is slightly different from that of Windows. While Windows assigns a letter to each hard drive, which makes the division between drives obvious, Linux uses so-called mount points: the directory structure is (mostly) identical on every Linux system, and each hard drive is mounted at a specific point in that structure. The advantage for you as a user is that you generally do not need to worry about the physical hard drives.

In the special case of the HoRUS cluster there are two directories which you will mostly use: your home directory (/home/<YourUsername>) and your workspaces (/work/). If you develop your own software, you might need to link libraries, and for that you might need to know the installation directories of some software packages. Most of the software on the cluster is installed in /cm/shared/apps/.

Caution: These directory paths may change at any time! Where possible, use environment variables instead of hard-coded file paths. The use of environment variables is explained below.

Permissions

What you can do in Linux depends on the user as which you are logged in. Usually you will have exactly one user name, which is identical to your ZIMT ID (g-number). The user with the highest permissions in Linux is called superuser or root user (root). You will never be given root permissions on the cluster; these are reserved for ZIMT administrators.

Users are assigned to groups. Each user has a primary group and may belong to any number of additional groups. You can show the groups to which you belong with the command id <YourUsername>.
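As a small illustration of the id command: called without a user name it reports on the user who is currently logged in. The options -gn and -Gn (print the primary group name and all group names, respectively) are standard options of id; the variable names are only chosen for this sketch.

```shell
# Full identity of the current user: numeric IDs plus names.
id

# Name of the primary group only.
primary_group=$(id -gn)

# Names of all groups the user belongs to, separated by spaces.
all_groups=$(id -Gn)

echo "Primary group: $primary_group"
echo "All groups:    $all_groups"
```

The primary group also appears in the list of all groups.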

Every file and every directory in Linux belongs to a user and only that user (or the root user) decides what can be done with this file. If you create a new file, you are the owner. Each file is also assigned to a group. By default this is the primary group of the user who created it.

There are three types of access permissions for files and directories: read, write and execute. The commands for displaying and changing these permissions will be explained below. The permissions can be set separately for three types of users: for the owner, for the group the file belongs to, and for everybody else (Other). That way you can specify, for example, that other users may read a file but not write to it.

Auto-complete

An important way of saving yourself a lot of work is the auto-complete function of Linux. If you only enter part of a command and then press the Tab key, the command will be automatically completed, as long as it is not ambiguous. For example, it is not possible to enter sb and press Tab to obtain sbatch because the cluster has multiple commands that start with sb. In that case you press the Tab key a second time to obtain a list of all commands starting with sb:

$ sb
sb sbatch sbcast

As you can see, typing sba would be unambiguous. If there are a lot of options, you will be asked if you really want to see the complete list (you can test this by entering s and pressing Tab twice).

The auto-complete function does not only work with commands but also with directory paths.

Processes

As with Windows, Linux has a large number of processes running at any time, including the ones that you explicitly started (by typing a command). Sometimes it is necessary to monitor the status of a process or to forcefully terminate it. The top command is for that purpose. top is roughly equivalent to the Task Manager in Windows.

The interface of top typically looks similar to the following example:

top - 10:08:18 up 53 days, 23:05, 14 users, load average: 0,12, 0,21, 0,48
Tasks: 334 total, 1 running, 333 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1,6 us, 0,3 sy, 0,0 ni, 98,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
KiB Mem : 14854156+total, 749016 free, 1722716 used, 14606982+buff/cache
KiB Swap: 12582908 total, 12360456 free, 222452 used. 14465592+avail Mem

PID   USER      PR NI VIRT    RES  SHR  S %CPU %MEM TIME+    COMMAND 
2610  otheruser 20 0  180156  3936 1240 S 4,3  0,0  43:08.94 sshd 
2619  otheruser 20 0  180188  3968 1240 S 4,0  0,0  43:01.12 sshd 
2291  otheruser 20 0  67804   2576 1800 S 1,0  0,0  9:34.55  sftp-server 
13770 demo_user 20 0  168156  2564 1636 R 0,7  0,0  0:00.07  top 
39    root      20 0  0       0    0    S 0,3  0,0  1:25.02  ksoftirqd/6 
6133  root      20 0  0       0    0    S 0,3  0,0  0:00.84  kworker/u48:1 
1     root      20 0  191612  3340 1672 S 0,0  0,0  14:56.12 systemd 
2     root      20 0  0       0    0    S 0,0  0,0  0:02.23  kthreadd 
3     root      20 0  0       0    0    S 0,0  0,0  0:22.88  ksoftirqd/0 
8     root      rt 0  0       0    0    S 0,0  0,0  0:01.71  migration/0 
9     root      20 0  0       0    0    S 0,0  0,0  0:00.00  rcu_bh 
10    root      20 0  0       0    0    S 0,0  0,0  81:45.44 rcu_sched 
11    root      rt 0  0       0    0    S 0,0  0,0  0:11.99  watchdog/0

This (shortened and anonymized) view lists the running processes. top is operated with single-letter commands, for example q will quit top. On the left you can see the process ID of each process. This is a unique number that Linux assigns to each process. If you need to terminate a process, you can type k (for “kill”) and enter the process ID, and Linux will forcefully terminate it (provided you have the permissions). Next, the owner of the process is shown. As you can see, there are a lot of system processes which are owned by root; those normally do not concern you. You can filter the list so that only the processes of a specific user are shown by typing u and then the name of the user. The next columns show information about how much memory and CPU power the process is using. Finally, the name of the process is displayed.

Tip: The percentage in the %CPU column is in relation to a single CPU, not the entire computer (or entire node in the case of a cluster). If you run a parallel program (one with multiple threads), the number in this column may be larger than 100%.
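Processes can also be started, inspected and terminated directly from the console, without the interactive top interface. The following is a minimal sketch that uses a harmless sleep command as a stand-in for a long-running program. It relies on the shell variable $! (the process ID of the most recent background command) and on the kill command, which are standard in Bash; kill -0 sends no signal and only tests whether the process exists.

```shell
# Start a harmless long-running process in the background.
sleep 300 &
pid=$!
echo "Started background process with PID $pid"

# Check that the process exists without disturbing it.
if kill -0 "$pid" 2>/dev/null; then
    echo "Process $pid is running"
fi

# Terminate the process and wait until it is really gone.
kill "$pid"
wait "$pid" 2>/dev/null || true

# Check again; the process should no longer exist.
if kill -0 "$pid" 2>/dev/null; then
    still_running=yes
else
    still_running=no
fi
echo "Still running: $still_running"
```

For a single non-interactive snapshot of the process list, top can usually be called in batch mode with top -b -n 1.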

Basic console commands

You will primarily interact with Linux, especially on remote systems, via a text interface, the so-called console (also referred to as shell or terminal). Note that in the case of the cluster the default console is the Bash console. Like most Linux systems, the cluster has multiple consoles installed which differ slightly. For example, you can alternatively use the C shell (with the command csh) if you are more experienced with that.

All commands described here can either be entered by hand into the console, or be written into a text file (a so-called script) one after the other. The shell can then execute this script and the result is the same as if you had typed the commands by hand. This is called shell scripting and is one of the major reasons for the popularity of Linux: repetitive tasks can be easily automated.

Special characters

  • The pound sign # begins a comment.
  • The asterisk sign * is a so-called wildcard and can be used as a placeholder, when arbitrary characters are required. For example, when searching for all PDF files (the search function will be explained in more detail below), one can search for *.pdf. There are a number of additional wildcards.
  • The pipe symbol | serves to send the output of one command as input into another command. Multiple commands can be chained in this way.
  • The semicolon ; separates independent commands from each other. Entering multiple commands with semicolons in between is equivalent to entering these commands one after the other.
  • The ampersand symbol & at the end of a command will execute that command in the background. You can continue working with the console while the command is running by pressing Enter again. This is especially useful if you enter a command that opens a window; otherwise, the console would be blocked as long as the window is open.
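The special characters listed above can be tried out together in a short sketch. It works in a temporary directory created with mktemp -d so that none of your own files are touched; the file names are made up for the example, and touch (create an empty file) and wc -l (count lines) are standard commands that are not otherwise covered on this page.

```shell
# Work in a throwaway directory so nothing in your own files is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create three empty example files; everything after '#' is ignored.
touch report.pdf notes.pdf data.txt

# Wildcard: *.pdf expands to every file name ending in .pdf.
ls *.pdf

# Pipe: the file list is sent to wc, which counts the lines.
pdf_count=$(ls *.pdf | wc -l)
echo "Number of PDF files: $pdf_count"

# Semicolon: two independent commands on one line.
echo "first"; echo "second"

# Ampersand: sleep runs in the background, the next command starts at once.
sleep 1 &
echo "not waiting for sleep"
wait    # block until all background commands have finished
```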

Moving around directories

The most common operations in the Linux console are movement between directories and manipulation of files in the current directory. Here are the most important commands that you should know. Independently of the command there are two more special symbols that you should know which are used with file paths: the period . denotes the current directory and two periods .. denote the directory one level above the current one.

Note: Linux does differentiate between upper and lower case. A command test and a command Test may have different functions. The same is true for file and directory names.

Change directory

You can change to a different directory with the command cd (for “change directory”). You can either enter a relative path (meaning relative to your current position) or an absolute one. You can tell the difference by the fact that absolute paths begin with a forward slash / (Note: unlike Windows, Linux uses forward slashes in paths). If the current directory has a subdirectory named example you can change to that with

$ cd example

Because .. denotes the directory above, you can also specify a relative path outside the current directory. If you are inside a directory mydir/dir1, and mydir contains another subdirectory mydir/dir2, you can reach the parent directory mydir with

$ cd ..

or the sibling directory with

$ cd ../dir2

You can also specify an absolute path, for example the previous command is identical to

$ cd /home/demo_user/mydir/dir2

if mydir is inside demo_user's home directory.
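The difference between relative and absolute paths can be tried out with the following sketch. It builds a small directory layout (mydir with the two subdirectories dir1 and dir2) inside a temporary directory, so the location under /tmp or similar is an assumption of the example and not a path on the cluster. mkdir and pwd are explained further down this page.

```shell
# Build the example layout in a temporary directory (hypothetical names).
base=$(mktemp -d)
mkdir -p "$base/mydir/dir1" "$base/mydir/dir2"

cd "$base/mydir/dir1" || exit 1   # absolute path: starts with /
cd ..                             # relative path: up to mydir
parent=$(pwd)
cd dir1                           # relative path: down into dir1 again
cd ../dir2                        # relative path: up, then into the sibling
sibling=$(pwd)

echo "Parent directory:  $parent"
echo "Sibling directory: $sibling"
```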

Show directory path

With the command pwd (“Print working directory”) you can display where you are:

$ pwd
/home/demo_user

Show directory contents

The ls command (for “list”) shows files and subdirectories in the current directory:

$ ls 
ex1.txt ex2 ex3.dat

In this example the folder contains a subfolder named ex2 and two files. You can also display the contents of another folder:

$ ls ex2 
ex4.txt

Here the content of ex2 is displayed: it contains another text file. The directory you pass to ls does not have to be a subdirectory of the current one.

You can also display additional details. The option -l will show a table:

$ ls -l 
total 4 
-rw-r--r-- 1 demo_user hpc-gpr-hiwis    2 23. Jul 10:17 ex1.txt 
drwxr-xr-x 2 demo_user hpc-gpr-hiwis 4096 23. Jul 10:24 ex2 
-rw-r--r-- 1 demo_user hpc-gpr-hiwis    2 23. Jul 10:19 ex3.dat

Here you can see the following additional information. On the left are the permissions, which have already been explained. Note that the directory is marked with a d. Next you can see the owner of the file or folder, in this case demo_user, and the owning group, in this case hpc-gpr-hiwis. Then you can see the file size in bytes. The two example text files only contain one character each and are therefore only 2 bytes in size. The size displayed for a folder is only the amount of metadata about that folder; the size of the files it contains is not included in this figure. Finally you can see the date of the last change and the file or folder name.

Show hidden files and folders

A hidden file or folder in Linux is one whose name begins with a period (.). These can also be shown with the ls command when using the option -a.

$ ls -la 
total 24 
drwxr-xr-x 4  demo_user hpc-gpr-hiwis 4096  23. Jul 17:06 . 
drwxr-xr-x 56 demo_user hpc-gpr-hiwis 12288 23. Jul 17:06 .. 
-rw-r--r-- 1  demo_user hpc-gpr-hiwis 2     23. Jul 10:17 ex1.txt 
drwxr-xr-x 2  demo_user hpc-gpr-hiwis 4096  23. Jul 10:24 ex2 
-rw-r--r-- 1  demo_user hpc-gpr-hiwis 2     23. Jul 10:19 ex3.dat 
-rw-r--r-- 1  demo_user hpc-gpr-hiwis 2     23. Jul 17:06 .hidden_ex.txt 
drwxr-xr-x 2  demo_user hpc-gpr-hiwis 4096  23. Jul 10:43 test1

As you can see in this example, it is also possible to combine options. In this case, ls -la is equivalent to ls -l -a. This works with many Linux programs, although it is not guaranteed.

Create a directory

The mkdir command (“Make directory”) will create a directory with the specified name:

$ mkdir test1 
$ ls 
ex1.txt ex2 ex3.dat test1

Rename or copy directory

The mv command (“Move”) will move a file or directory. It is also the usual method for renaming things.

$ ls # Original contents 
ex1.txt ex2 ex3.dat test1 
$
$ mv ex1.txt renamed.txt 
$
$ ls # Changed contents 
ex2 ex3.dat renamed.txt test1

The command cp (“Copy”) will copy a directory or file. Unlike mv, when a directory including all its contents is to be copied, the option -r (“recursive”) needs to be specified.
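The following sketch contrasts cp and mv for files and directories. It runs in a temporary directory and recreates a few of the example names used on this page; the names ex1_copy.txt, ex2_copy and ex2_backup are made up for the illustration.

```shell
# Set up a small example layout in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch ex1.txt
mkdir ex2
touch ex2/ex4.txt

cp ex1.txt ex1_copy.txt       # copy a single file
cp -r ex2 ex2_copy            # copy a directory including its contents
mv ex1_copy.txt renamed.txt   # rename the copied file
mv ex2_copy ex2_backup        # mv needs no -r for directories

# List everything recursively to see the result.
ls -R
```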

Change permissions

The chmod command will change the permission for a directory or file (only if you are allowed to change them, of course). There are several different ways to specify the permissions, the simplest one is:

$ chmod u+x ex1.dat

In this example, the owner of the file (u, for “user”) is given (+) the permission to execute (x) the file ex1.dat. Alternatives to u are g (group), o (other) and a (all). Possible permissions are r for read, w for write and x for execute. To remove permissions, a minus - is used instead of the plus.
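Adding and removing permissions can be observed with ls -l before and after each change. The sketch below is only an illustration on an empty file in a temporary directory; it combines several letters in one call (go-rwx removes read, write and execute for group and others at once), which chmod allows. cut -c1-10 is used to pick out the first ten characters of the ls -l line, i.e. the file type and the permissions.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch ex1.dat

ls -l ex1.dat          # permissions before any change
chmod u+x ex1.dat      # owner may now execute the file
chmod go-rwx ex1.dat   # group and others lose all permissions
ls -l ex1.dat          # permissions afterwards

# The first ten characters of the ls -l line are the file type and permissions.
perms=$(ls -l ex1.dat | cut -c1-10)
echo "Permission string: $perms"
```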

Search for directories and files

The find command will search a folder and all subfolders for files and directories with a specific name. Partial names are also possible.

$ find . -type f -name "ex*" 
./ex2/ex4.txt 
./ex3.dat

In this example, -type f is used to only find files, not directories. All files whose name starts with ex are listed. The find command also has a number of options to narrow down search results. It also can do things like execute specific commands on each file with -exec <command name>.
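A few variations of find, including -exec, can be sketched as follows. The example recreates the file layout used above in a temporary directory. In the -exec form, {} is replaced by each file that was found and \; marks the end of the command to run; wc -c (count bytes) is just a stand-in for whatever command you want to apply.

```shell
# Recreate the example layout in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir ex2
touch ex1.txt ex3.dat ex2/ex4.txt

# Files whose name starts with "ex".
find . -type f -name "ex*"

# Directories whose name starts with "ex".
find . -type d -name "ex*"

# Run a command on every text file that is found.
find . -type f -name "*.txt" -exec wc -c {} \;

# Count the text files by piping the result into wc.
txt_count=$(find . -type f -name "*.txt" | wc -l)
echo "Number of .txt files: $txt_count"
```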

Search text inside files

The grep command is used to search text in text files. To use it, enter

$ grep [options] "Text" filename

Strictly speaking, the text only needs to be in quotation marks if it contains spaces or other characters with a special meaning in the shell (such as *). Instead of a single file name, wildcards can also be used (e.g. *.txt). Important options are for example -r (searches recursively, i.e. also in subfolders) and -i (ignores capitalization). A more complete list of options can be shown with grep --help.
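The effect of the options -i and -r can be seen in a small sketch. The file names and their contents are invented for the example; -c, which makes grep print the number of matching lines instead of the lines themselves, is a standard grep option not mentioned above.

```shell
# Create two small text files in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Hello cluster\nhello world\ngoodbye\n' > notes.txt
mkdir sub
printf 'hello again\n' > sub/more.txt

# Case-sensitive search: only the line with a capital H matches.
grep "Hello" notes.txt
matches_exact=$(grep -c "Hello" notes.txt)

# -i ignores capitalization: both hello lines match.
grep -i "hello" notes.txt
matches_nocase=$(grep -c -i "hello" notes.txt)

# -r also searches the files in subfolders.
grep -r -i "hello" .

echo "Exact matches: $matches_exact, ignoring case: $matches_nocase"
```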

Scripts: creation and execution

A script in the context of Linux is a file in which a number of commands are listed. As already mentioned, practically all commands that can be entered in the console can also be written into a script, and executing the script is identical to calling the listed commands one by one.

A (shell) script always starts with #!<console name>, e.g. #!/bin/bash. This is the complete path to the executable with which to run the script. This does not necessarily have to be a Linux shell. For example, a script could also start with #!/usr/bin/python and would then be executed like a Python script.

To execute a script it needs to be made executable (see above in the section about file permissions). Then it can, if the absolute or relative path is specified, be executed like any other command.

$ ./example.sh
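The complete cycle of creating a script, making it executable and running it might look like the following sketch. The script is written with a so-called here-document (cat > file <<'EOF' ... EOF), which is simply a way to create the file from within this example; normally you would write example.sh with a text editor. The script content itself is made up for the illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write a two-line script; the first line names the interpreter.
cat > example.sh <<'EOF'
#!/bin/bash
echo "User: $(id -un)"
echo "Directory: $(pwd)"
EOF

# Make the script executable for its owner, then run it via a relative path.
chmod u+x example.sh
output=$(./example.sh)
echo "$output"
```

Without the chmod step, the call ./example.sh would be refused with a “Permission denied” message.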

(Environment) variables

In addition to arguments that are explicitly typed into the script, variables can be used. Variables in Bash serve basically the same purpose as in any other programming language and work in a similar way. The main difference from most languages is that a variable's name must be prefixed with $ to get its contents. Variables are defined with var=Value. Note that there must not be a space on either side of the equals sign. Variables in the Linux console are text strings.

In the following example, several operations are being performed on the same file:

# Variable definition
file1="/home/demo_user/exampledir/ex1.txt"

# Output the entire text file to the console.
# The quotation marks keep paths that contain spaces in one piece.
cat "$file1"

# Copy the file.
cp "$file1" copy_example.txt

This demonstrates the usefulness of variables: if the user wants to apply the same operations to another file, the file name only needs to be changed at one point in the script instead of everywhere.

Environment variables

A variable is only ever available inside the console or script in which it was defined. However there are so-called environment variables that are also available in all sub-processes (processes that were launched by the current process). That way, before a script or program is called, settings can be specified for it. Environment variables are set via export var=Value. A list of all environment variables that are set can be obtained with the printenv command.

In every Linux system, a large number of environment variables are set, either by the system or by the installed software. For example, there is always a variable USER which shows the user name of the person who is logged in.
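The difference between an ordinary variable and an environment variable becomes visible as soon as a sub-process is started. In the sketch below, bash -c '...' starts a new Bash process that runs the quoted command; the variable names local_var and EXPORTED_VAR are hypothetical, and ${name:-unset} is Bash syntax for "the value of name, or the text unset if the variable does not exist".

```shell
# An ordinary variable: known only to the current shell.
local_var="only here"

# An environment variable: passed on to all sub-processes.
export EXPORTED_VAR="visible to children"

# Ask a newly started Bash process what it can see.
child_local=$(bash -c 'echo "${local_var:-unset}"')
child_exported=$(bash -c 'echo "${EXPORTED_VAR:-unset}"')

echo "Child sees local_var as:    $child_local"
echo "Child sees EXPORTED_VAR as: $child_exported"

# printenv with a name prints just that one environment variable.
printenv EXPORTED_VAR
```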

Command line parameters

In the special case of shell scripts, additional variables are also available. For example, the variable $0 holds the name under which the script was called, and the variables $1, $2 and so on up to $9 hold the arguments with which the script was started (the command line parameters). If a script is started with:

$ ./Script.sh -f 5.0

the variables would be $0=./Script.sh, $1=-f and $2=5.0. That way, you can pass settings directly to the script.
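A script that simply prints its own command line parameters makes this easy to try out. The sketch below creates such a script in a temporary directory and runs it with bash <script>, which works without making the file executable. The variable $#, which holds the number of arguments, is a standard Bash variable that is not mentioned above.

```shell
workdir=$(mktemp -d)

# A script that reports how it was called.
cat > "$workdir/Script.sh" <<'EOF'
#!/bin/bash
echo "Script name: $0"
echo "First argument: $1"
echo "Second argument: $2"
echo "Number of arguments: $#"
EOF

# Run the script with two command line parameters.
output=$(bash "$workdir/Script.sh" -f 5.0)
echo "$output"
```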

The PATH Variable

The environment variable PATH has a special function: whenever a command is entered into the console, the directories listed in PATH are searched for this command. This also means that a command will not be found if its location is not added to PATH. To add a directory to PATH, you can prepend or append the new directory, separated by a colon:

$ export PATH=$PATH:/home/demo_user/exampledir

The order of directories is important because identically named commands may be in multiple directories on the PATH. The first command found is always executed.

Caution: Mistakes when manipulating PATH may have severe consequences because important commands may not be available any more.
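The effect of PATH can be observed with a self-made command. In this sketch, a tiny script named mytool (a hypothetical name, assumed not to exist anywhere else on the system) is placed in a temporary directory, which is then appended to PATH. command -v is a Bash built-in that, like which, reports the location that would be used for a command.

```shell
# Create a directory containing one small executable script.
tooldir=$(mktemp -d)
printf '#!/bin/bash\necho "hello from mytool"\n' > "$tooldir/mytool"
chmod u+x "$tooldir/mytool"

# Before the change, the shell cannot find the command.
command -v mytool || echo "mytool not found yet"

# Append the directory; the existing entries keep their priority.
export PATH="$PATH:$tooldir"

# Now the command is found in the new directory.
found=$(command -v mytool)
echo "mytool found at: $found"
```

Because the old value $PATH is part of the new value, all previously available commands stay available; forgetting it is a typical way to lose access to standard commands.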

Custom settings

Settings like exported environment variables or loaded modules only exist as long as the current console is open. If you log out from the cluster, or the script containing the settings has finished running, the settings are lost. It is however possible to make settings permanent. The most important tool for this is the file .bashrc. It is located in your home directory and is executed whenever a new Bash console is launched (including upon login), so any commands that you enter in that file are run each time. The .bashrc file is a good place for permanently saving environment variables or other settings you need often.

You can also put settings into a shell script and make them available with the command source <script name>. Of course you can also put that source command into the .bashrc file.

Caution: Since the .bashrc file is executed at every login, a faulty .bashrc can leave you completely unable to log in! Always make sure that your settings in .bashrc do not contain mistakes. You can test settings by entering them into the console by hand first. If you make a mistake there, you can restore the previous settings by logging out and logging back in.
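Keeping settings in a separate file and loading them with source can be tried out without touching .bashrc at all. The sketch below writes one setting into a temporary file and loads it into the current shell; the variable name MY_PROJECT_DIR and its value are purely hypothetical. The single period used here is the portable spelling of the source command and behaves the same way in Bash.

```shell
# Write a settings file (normally you would create it with an editor).
settings=$(mktemp)
cat > "$settings" <<'EOF'
# Hypothetical project setting, for illustration only.
export MY_PROJECT_DIR="$HOME/projects/demo"
EOF

# Load the settings into the current shell.
# ". file" is equivalent to "source file".
. "$settings"

echo "MY_PROJECT_DIR is now: $MY_PROJECT_DIR"
```

Running the file as a normal script instead would set the variable only inside the sub-process of that script, and it would be gone once the script ends.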

General advice

  • The Linux console does not have an “Undo” function. You should always look out for spelling errors and often make backups, especially if you work on a Linux system where you have root permissions. It is entirely possible to destroy an entire Linux installation by accident.
  • File extensions are not as important in Linux as they are in Windows; in particular, they are not used to determine the type of a file. It is however recommended to use consistent extensions (a common one is .sh for shell scripts), so that a person can see at a glance what the file type is.
  • Every command in Linux is really a program (or script), even the built-in Linux commands. You can see the location of a program executable with which <program name>. This is especially useful if multiple versions of the same software are installed and you want to make sure you are using the correct one.
  • You can define commands as a so-called alias, meaning a shorthand for another command. For example alias myjobs="squeue -u demo_user" will create a command named myjobs which instructs SLURM to list all jobs for demo_user. Like environment variables, aliases need to be in .bashrc to be permanently available.
  • In addition to wildcards there is another way to specify patterns of characters, so-called regular expressions (regex). These allow very complex patterns but are also extremely hard to learn.
  • You can calculate the size of files and directories with du (for “Disk Usage”). Important options are for example -h (for “human-readable”), which shows the size with a unit (e.g. M for megabytes), and -s (for “summarize”), which prints only one total for each argument (otherwise, every subfolder would be listed separately, which may become confusing). Example:
    $ du -sh * 
    5,4M    abaqus 
    8,0K    abaqus_plugins 
    4,0K    abaqus.rpy 
    32K all_users_2018-02-07.txt 
    4,0K    bin 
    4,0K    bsp.f90
  • Like Windows, Linux allows symbolic links. With ls -l you can see what is a link to another file or directory.
    $ ls -l 
    -rw-r--r-- 1 demo_user hpc-gpr-hiwis 56 30. Jul 09:24 ex3.dat 
    lrwxrwxrwx 1 demo_user hpc-gpr-hiwis 7   1. Aug 09:47 ex3_link.dat -> ex3.dat

    You can create your own links with the command ln -s <target file> <link name>.
    The -l option for the ls command is used so often that many Linux systems, including the one on the cluster, offer a command ll which is an alias for ls -l.

  • The cd command without any argument changes to your home directory, cd - changes to the previous directory (the one before the last call of cd).
  • The up-arrow and down-arrow keys can be used to scroll through previously entered console commands. The history command will list previously entered commands. This is especially useful if you do not remember the syntax of a command that you previously entered. You can use history | grep <command name> to find it again. Bash saves this history in the file ~/.bash_history.
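Two of the tips above, symbolic links and du, can be tried out safely in a temporary directory. The following is a minimal sketch with made-up file contents; readlink, which prints the target of a symbolic link, is a standard command that is not mentioned in the list.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small data file and a symbolic link pointing to it.
printf 'some example data\n' > ex3.dat
ln -s ex3.dat ex3_link.dat

# The link is marked with an l and shows its target after "->".
ls -l

# Print only the target of the link.
target=$(readlink ex3_link.dat)
echo "ex3_link.dat points to: $target"

# Reading through the link gives the content of the original file.
cat ex3_link.dat

# One human-readable total for the whole directory.
du -sh .
```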

Updated at 17:37 on 12 August 2018 by Jan Philipp Stephan