The Linaro Forge suite offers a set of tools, available both graphically and via the command line, tailored for debugging and profiling both parallel and sequential programs. It supports many parallel architectures and models, including MPI, CUDA, and OpenMP. It consists of the following tools:

  1. Linaro DDT: This tool specializes in parallel high-performance application debugging.
  2. Linaro MAP: This tool is designed for performance profiling and optimization guidance.
  3. Linaro Performance Reports: This tool summarizes and characterizes both scalar and MPI application performance.

The documentation is available online.

On the OMNI cluster, Linaro Forge version 23.1.1 is installed. To use it, load the debugger/linaro_forge module using the following command:

$ module load debugger/linaro_forge

The user guide is also available as a PDF inside the installation directory, at the following path:

/cm/shared/omni/apps/linaro_forge/23.1.1/doc/userguide-forge.pdf
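
After loading the module, it can be useful to confirm that the Forge commands used on this page are actually on your PATH. The following is a minimal sketch; the helper name check_tools is hypothetical and not part of Linaro Forge.

```shell
# Report which of the given commands are missing from PATH.
# Prints one "missing: NAME" line per absent command and returns
# non-zero if at least one command is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Check the three Forge commands described on this page.
check_tools ddt map perf-report || echo "Try: module load debugger/linaro_forge"
```

If any command is reported as missing, the module is most likely not loaded in the current shell.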

Linaro DDT

Linaro DDT is a powerful graphical debugger suitable for many different development environments. This tool supports:

  • C, C++, and all derivatives of Fortran, including Fortran 90.
  • Limited support for Python. For more information, see Python debugging.
  • Parallel languages/models including MPI, UPC, and Fortran 2008 Co-arrays.
  • GPU languages such as HMPP, OpenMP Accelerators, CUDA, CUDA Fortran and HIP.

The following steps show the process to use this tool to debug an application.

  1. First, load the corresponding module using the following command:

    $ module load debugger/linaro_forge
  2. Start Linaro DDT as a graphical application in a new window using the following command:

    $ ddt &

    This opens the following window where Linaro DDT is selected by default.

    Linaro Forge Window

    Note: Like all other graphical applications on the cluster, your SSH connection needs to be established with X11 support, and an X server needs to be running on your PC; see also here.

  3. To run and debug a program, select the RUN option. This opens a new window where the application you want to debug should be selected. Depending upon the type of application, choose the correct options like OpenMP, MPI, CUDA, etc. Then click on the Run button, which opens the DDT interface.

    Linaro DDT New Application Interface

    Note: When compiling the program, add a debug flag to your compile command. For most compilers, this is -g. It is recommended to turn off compiler optimizations as they can produce unexpected results when debugging.

  4. The DDT interface opens a source code viewer where you can set breakpoints, control program execution, trace stacks, etc., for debugging the application. For more details, visit the online documentation.

    Linaro DDT Interface

  5. To save the session, select File ‣ Save Session. This exports HTML and text files, which can be viewed using the following command:

    $ firefox name.html

    In this command, replace name.html with your own file name. This opens a Firefox window showing an overview of the application.

    Linaro DDT HTML Output

  6. To end your current session, select File ‣ End Session.
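
The two notes in the steps above (X11 support and debug flags) can be checked before starting the GUI. The sketch below only tests whether the DISPLAY variable is set, which is a necessary but not sufficient sign of working X11 forwarding. The helper name have_display, the source file dotprod.c, and the choice of gcc are assumptions made for illustration.

```shell
# Example compile line with debug symbols and optimizations disabled
# (assumes gcc and a hypothetical source file dotprod.c):
#   gcc -g -O0 -fopenmp dotprod.c -o dotprod

# Return success if an X11 display is advertised in the environment.
have_display() {
    [ -n "${DISPLAY:-}" ]
}

if have_display; then
    echo "DISPLAY is set to $DISPLAY; the DDT window should be able to open."
else
    echo "DISPLAY is not set; reconnect with X11 forwarding (for example ssh -X) or use ddt --offline."
fi
```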

Using the command line

To run DDT without the GUI, use the following command:

  $ ddt --offline ./executable_name.exe

Use ddt --help to learn more about the command. The output is generated as an HTML file.

Example:

(base) [g056127]$ ddt --offline --openmp-threads=8 ./dotprod
Linaro Forge 23.1.1 - Linaro DDT

Debugging                : /home/g056127/openmp/dotprod
MPI implementation       : Auto-Detect (Open MPI)
Memory debugging enabled : No

Offline log written to: 'dotprod_1p_1n_4t_2024-02-15_19-51.html'
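
Because the offline log name contains a timestamp, it changes on every run. One way to pick up the most recent log is sketched below; the helper name newest_file is hypothetical, and the sketch assumes the log is written to the directory you pass in.

```shell
# Print the most recently modified file in directory $1 whose name ends in $2.
# Prints nothing and returns 1 if no such file exists.
newest_file() {
    dir=$1
    suffix=$2
    newest=
    for f in "$dir"/*"$suffix"; do
        [ -e "$f" ] || continue          # skip the unexpanded pattern when nothing matches
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] || return 1
    printf '%s\n' "$newest"
}

# Usage after an offline run in the current directory:
#   report=$(newest_file . .html) && firefox "$report"
```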

Linaro MAP

MAP is a source-level profiler that shows how much time was spent on each line of code. It shows the longest-running lines of code and explains why.

The following steps show the process to use this tool to profile an application.

  1. First, load the corresponding module using the following command:

    $ module load debugger/linaro_forge/23.1.1
  2. Start Linaro MAP as a graphical application in a new window using the following command:

    $ map &

    This opens the following window where Linaro MAP is selected by default.

    Linaro MAP Window

    Note: Like all other graphical applications on the cluster, your SSH connection needs to be established with X11 support, and an X server needs to be running on your PC; see also here.

  3. To profile a program, select the PROFILE option. This opens an interface to select the program and configure the operation. Depending upon the type of application, choose the correct options like OpenMP, MPI, CUDA, etc. Then click on the Run button, which opens the MAP interface.

    Linaro MAP New Application Interface

  4. The MAP interface displays the source code, application activity, and metrics like memory usage, CPU usage, etc. It shows the memory and CPU usage for each line of the code.

    Linaro MAP Interface

  5. You can save the MAP profile with File ‣ Save Profile Data as. The file is saved with the extension .map. The profile can also be exported as JSON with File ‣ Export Profile Data as JSON. A saved .map profile can be viewed again by choosing the second option, LOAD PROFILE DATA FILE, on the Linaro MAP home interface.

Using the command line

To run MAP without the GUI, use the following command:

$ map --profile ./executable_name.exe

Use map --help to learn more about the command. This command generates a MAP output file with the extension .map.

Example:

(base) [g056127]$ map --profile --openmp-threads=8 ./dotprod
Linaro Forge 23.1.1 - Linaro MAP

Profiling            : /home/g056127/openmp/dotprod
Linaro Forge sampler : preload
MPI implementation   : Auto-Detect (Open MPI)

MAP analysing program...
MAP gathering samples...
MAP generated /home/g056127/openmp/dotprod_1p_1n_8t_2024-02-15_19-57.map

Linaro MAP profiling summary
============================
Profiling time:      4 seconds
Peak process memory: 8042729472 B (~7.49 GiB)

Compute:             100.0%     (3.8s) |=========|
MPI:                   0.0%     (0.0s) |
I/O:                   0.0%     (0.0s) |
  (based on time on the main thread)
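
The generated file name in the example above appears to encode the number of processes, nodes, and threads (1p, 1n, 8t). The sketch below extracts these values so you can confirm that a run used the thread count you asked for. The pattern is inferred from this one example rather than from documented behavior, and the helper name run_shape is hypothetical.

```shell
# Extract "processes nodes threads" from a file name such as
# dotprod_1p_1n_8t_2024-02-15_19-57.map.
# Prints nothing if the name does not follow the assumed pattern.
run_shape() {
    printf '%s\n' "$1" |
        sed -n 's/.*_\([0-9][0-9]*\)p_\([0-9][0-9]*\)n_\([0-9][0-9]*\)t_.*/\1 \2 \3/p'
}

run_shape dotprod_1p_1n_8t_2024-02-15_19-57.map   # prints: 1 1 8
```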

Linaro Performance Reports

Linaro Performance Reports provides the most effective way to characterize and understand the performance of HPC application runs.

One single-page HTML report answers a range of vital questions for any HPC site:

  • Is this application optimized for the system it is running on?
  • Does it benefit from running at this scale?
  • Are there I/O or networking bottlenecks affecting performance?
  • Which hardware, software, or configuration changes can be made to improve performance further?

There are three different methods to generate the performance report, which are described below. You can use any one of the methods.

  1. Using an executable file: To generate the performance report, use the following command:

    $ perf-report ./executable_name.exe

    Replace executable_name.exe with your program. This command generates two files, with .html and .txt extensions, which can be viewed in a browser.

  2. Using a previously generated MAP output file: If you have already generated a .map output file using the MAP tool, you can use the following command to generate the performance report:

    $ perf-report profile.map

    Replace profile.map with your MAP output file. This command generates two files, with .html and .txt extensions, which can be viewed in a browser.

  3. Using the MAP interface: You can also view or export the performance report from the Reports menu in the menu bar of the MAP interface, as shown below:

    Reports Options in MAP Interface

    For more details, visit the online documentation.
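
If several .map files have accumulated, the second method can be applied to each of them in a loop. The following is a minimal sketch; the function name report_all and the DRY_RUN switch are hypothetical conveniences, and only the perf-report call itself comes from the steps above.

```shell
# Generate a performance report for every .map file in directory $1
# (default: current directory). With DRY_RUN=1 the commands are only
# printed, which is a safe way to preview what would be run.
report_all() {
    dir=${1:-.}
    for profile in "$dir"/*.map; do
        [ -e "$profile" ] || continue    # no .map files: nothing to do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "perf-report $profile"
        else
            perf-report "$profile"
        fi
    done
}

# Preview first, then run for real:
#   DRY_RUN=1 report_all .
#   report_all .
```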

Sample bash script

The following is an example job script that can be used to debug offline, profile, and generate the performance report when submitting your job to the HPC nodes.

#!/bin/bash
#SBATCH --job-name=profiler         # name for your job
#SBATCH --partition=short           # partition to run in
#SBATCH --ntasks=1                  # total number of tasks across all nodes
#SBATCH --ntasks-per-node=16        # number of tasks per node (treated as a maximum when --ntasks is set)
#SBATCH --time=00:30:00             # total run time limit (HH:MM:SS)
#SBATCH --output=omni_%x_%j.out     # where to save the output (%j = job ID, %x = job name)
#SBATCH --error=omni_%x_%j.err      # where to save error messages (%j = job ID, %x = job name)
#SBATCH --mem=75G                   # 75 GB RAM

# Purge modules to get a pristine environment:
module purge

# Load default module
module load DefaultModules

# Load Linaro Forge
module load debugger/linaro_forge

# Command to debug offline
# replace `executable_name.exe` with your executable
ddt --offline ./executable_name.exe

# Command to generate a MAP profile
# replace `executable_name.exe` with your executable
# replace the output file name 'profile' as needed
# this generates a 'profile.map' file, which can be viewed with the Linaro MAP tool.

map --profile --output=profile ./executable_name.exe

# command to generate performance report
perf-report ./executable_name.exe
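
A job that waits in the queue only to fail because the executable path is wrong is wasted time. A small guard like the following could be placed before the ddt, map, and perf-report lines of the script. It is a sketch only, and the function name require_executable is hypothetical.

```shell
# Succeed only if $1 is an existing regular file with execute permission;
# otherwise print an error message to stderr and return 1.
require_executable() {
    if [ ! -f "$1" ] || [ ! -x "$1" ]; then
        echo "error: $1 is missing or not executable" >&2
        return 1
    fi
}

# In the job script, before calling the Forge tools:
#   require_executable ./executable_name.exe || exit 1
```

The script itself is submitted with sbatch, for example sbatch jobscript.sh, where jobscript.sh stands for whatever name you saved it under.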

We highly recommend reading the online documentation provided by Linaro Limited.

Updated at 22:29 on 15 February 2024 by Amir Thapa Magar