Jupyter is a collection of tools for working interactively with data and source code. It is widely used in data science but has many other uses as well.
You can find the Jupyter documentation here.
Overview of Jupyter
Jupyter consists of several components. The most important ones are:
- Jupyter Notebooks: a file format (with the extension .ipynb) consisting of cells that may contain code in various languages, text in Markdown format and other elements. Jupyter Notebook is also the name of the default user interface for editing files of this type. The Jupyter Notebook interface is completely web-based, meaning it runs in your browser. When you run an individual cell, the input you provided in it (e.g. Python source code) is executed and its output appears below the cell. The form of the output depends on the cell.
- Jupyter kernels: the programs that run the code inside the code cells. Typically, each notebook has exactly one kernel assigned to it. Which kernel to pick depends on the language(s) and other features needed inside the notebook. A very common example is the Python kernel, which runs code in Python cells. Jupyter kernels may run on the same machine as the notebook interface or on another machine (e.g. the compute nodes of a cluster). In the latter case, a JupyterHub server needs to be running.
- JupyterLab: an Integrated Development Environment (IDE) which also serves to edit, run and manage Jupyter notebooks. JupyterLab contains more features than the Notebook interface, but it is also completely web-based. On the OMNI cluster, the JupyterLab interface will be made the default interface.
- JupyterHub: the server component of Jupyter. It allows multiple users to start Jupyter kernels on the same computer.
- Jupyter Enterprise Gateway: While the notebooks are edited on the front end (i.e. on the login nodes in the case of a cluster), the kernels and the actual computations usually run on a different computer (in the case of a cluster, in a Slurm job running on a compute node). Jupyter Enterprise Gateway handles the connections between the two.
All these components will be available on the OMNI cluster.
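To make the notebook file format more concrete: an .ipynb file is a JSON document whose top-level "cells" list holds the individual cells. The following minimal sketch counts the cells of each type. It assumes notebook format version 4; the function names and the sample notebook are made up for illustration.

```python
import json
from collections import Counter


def load_notebook(path):
    """Read an .ipynb file; it is plain JSON, so json.load is enough."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_cell_types(notebook):
    """Count cells by their "cell_type" (e.g. "code" or "markdown")."""
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))


# Hypothetical, hand-written notebook in format version 4.
sample_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
}

print(dict(count_cell_types(sample_notebook)))
```

For a real file, the same count could be obtained with count_cell_types(load_notebook("my_notebook.ipynb")), where the file name is a placeholder.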
Working with Jupyter on OMNI
You can reach the Jupyter portal from the Uni network or Uni VPN by entering the address in your browser (you can find the address in your welcome e-mail).
You can log in normally using your username and password, provided you have access to our systems.
You will then be redirected to the so-called Control Panel of JupyterHub. Usually you will have to start a new JupyterHub server by clicking the “Start My Server” button. Once your server is running, you will get to the JupyterLab interface where you can start editing notebooks.
In the JupyterLab interface you can see your files on the left (by default this is your home directory on the cluster) and a Linux console on the right. This is essentially the same console that you would see if you logged into the cluster via SSH. Here you can, for example, create workspaces if needed.
Notebooks and kernels on the cluster
When you want to work with notebooks, you either need to open an existing notebook by double-clicking it on the left, or create a new notebook via “File” -> “New” -> “Notebook”.
When you create a new notebook, you need to select a kernel. For an existing notebook you can see in the upper right corner which kernel it uses.
Caution: Only kernels that have “via SLURM” in their name actually run on the compute nodes! You should launch kernels on the compute nodes whenever possible so you do not slow down the front end for everyone. We reserve the right to kill processes that take up too much compute power on the login nodes without prior warning.
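One way to check from inside a notebook where the kernel is actually running is to look at the host name and the environment. The sketch below assumes that a kernel started via SLURM inherits the SLURM_JOB_ID environment variable that Slurm sets inside a job. Whether this holds for the kernels on OMNI is an assumption, and the function name is made up for illustration.

```python
import os
import socket


def describe_kernel_location(env=None):
    """Report the host name and, if present, the Slurm job ID of this process."""
    env = os.environ if env is None else env
    job_id = env.get("SLURM_JOB_ID")  # set by Slurm inside a job (assumed to reach the kernel)
    return {
        "hostname": socket.gethostname(),
        "slurm_job_id": job_id,
        "in_slurm_job": job_id is not None,
    }


info = describe_kernel_location()
if info["in_slurm_job"]:
    print(f"Kernel runs in Slurm job {info['slurm_job_id']} on {info['hostname']}")
else:
    print(f"No Slurm job detected; kernel runs on {info['hostname']}")
```

If no job is detected and the host name is that of a login node, the kernel is most likely one of those that do not run via SLURM.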
When you start a kernel that uses SLURM, a SLURM job is launched for you automatically. In the worst case, you may have to wait until the job starts. You can display and manage these jobs just like any normal SLURM job with the corresponding SLURM commands (e.g. squeue).
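The kernel jobs can also be listed from a notebook cell or a script. The following is a sketch, not an official tool: it calls squeue with the -u (user), -h (no header) and -o (output format) options and splits each line into fields. The helper names and the sample output line are hypothetical.

```python
import getpass
import subprocess

# squeue format fields: job ID, job name, state, elapsed time, time limit.
SQUEUE_FORMAT = "%i|%j|%T|%M|%l"
FIELD_NAMES = ("job_id", "name", "state", "elapsed", "time_limit")


def parse_squeue_output(text):
    """Turn '|'-separated squeue lines into a list of dictionaries.

    Assumes that job names do not contain the '|' character.
    """
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        jobs.append(dict(zip(FIELD_NAMES, line.split("|"))))
    return jobs


def list_my_jobs(user=None):
    """Run squeue for one user (default: the current user) and parse the result."""
    user = user or getpass.getuser()
    result = subprocess.run(
        ["squeue", "-u", user, "-h", "-o", SQUEUE_FORMAT],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_squeue_output(result.stdout)


# Hypothetical output line, so the parser can be tried without Slurm.
sample = "123456|jupyter-kernel|RUNNING|12:34|1:00:00\n"
print(parse_squeue_output(sample))
```

On the cluster, list_my_jobs() would return one dictionary per queued or running job of the current user.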
Caution: The SLURM jobs that contain your kernel have a time limit just like any SLURM job. When the time runs out, the kernel will stop. However, the notebook will stay open and you can continue to work with it. You can simply restart the kernel via the corresponding button in JupyterLab. By default, the kernel “Python 3.7 via SLURM” will be launched in the short queue, meaning your time limit will be one hour.
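Since the kernel stops when the time limit is reached, it helps to be able to read the durations that Slurm reports. squeue prints them as days-hours:minutes:seconds, with days and hours shown only when needed. The following sketch converts such a string to seconds; the function name is made up for illustration, and values that are not numeric durations (such as UNLIMITED) are mapped to None.

```python
def slurm_time_to_seconds(value):
    """Convert a duration as printed by squeue to seconds.

    Accepts 'MM:SS', 'HH:MM:SS' and 'D-HH:MM:SS'.
    Returns None for anything else (e.g. 'UNLIMITED').
    """
    value = value.strip()
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        if not day_part.isdigit():
            return None
        days = int(day_part)
    parts = value.split(":")
    if not 2 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        return None
    numbers = [int(p) for p in parts]
    if len(numbers) == 2:
        numbers.insert(0, 0)  # no hours field printed
    hours, minutes, seconds = numbers
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print(slurm_time_to_seconds("1:00:00"))  # 3600, i.e. a one-hour limit
```

Combined with the elapsed time of a job, this gives a rough idea of how long a kernel can keep running before it has to be restarted.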