ARMPerfReports

Arm Performance Reports is a tool for characterizing and understanding the performance of both scalar and MPI applications. Results are presented in a single-page HTML file and can be used to identify performance-affecting problems such as optimization and scalability issues, as well as I/O or network bottlenecks. A major advantage of the tool is its low overhead: it uses the adaptive sampling technology of Arm MAP, which keeps the overhead at around 5% even for large-scale applications with thousands of MPI processes.

Supported Platforms

Supported hardware architectures are:

Intel and AMD (x86_64)
Armv8-A (AArch64)
Intel Xeon Phi (KNL)
IBM Power (ppc64 and ppc64le)

Moreover, the following MPI implementations are supported:

Open MPI
MPICH
MVAPICH
Intel MPI
Cray MPT
SGI MPI
HPE MPI
IBM Platform MPI
Bullx MPI
Spectrum MPI

A wide range of compilers is also supported, including:

GNU C/C++/Fortran
LLVM Clang
Intel Parallel Studio XE
PGI Compiler
Arm C/C++/Fortran Compiler
Cray Compiling Environment
NVIDIA CUDA Compiler
IBM XL C/C++/Fortran Compiler

On Intel and AMD (x86_64) architectures, NVIDIA CUDA applications are also supported. Detailed information about the specific supported versions of these platforms can be found here.

Generating a performance report

To generate a performance report, wrap the provided perf-report command around your normal (MPI) program launch command, as in the following example:

$ perf-report mpiexec <mpi-options> a.out

Arm Performance Reports then generates and links the appropriate wrapper libraries before the program starts. When the program finishes, a performance report is created and saved to your current working directory in both text and HTML format.
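If jobs are launched from a Python driver script rather than typed interactively, the same wrapping can be done programmatically. The following is a minimal sketch, assuming `perf-report` is available on `PATH`; the helper names `wrap_with_perf_report` and `run_with_perf_report` and the launch line `mpiexec -n 4 ./a.out input.dat` are hypothetical and only illustrate prefixing an existing launch command.

```python
import shlex
import subprocess


def wrap_with_perf_report(launch_command: str) -> list[str]:
    """Prefix an existing launch command with perf-report (hypothetical helper).

    The launch command is split with shell-like quoting rules, so quoted
    arguments survive intact.
    """
    return ["perf-report", *shlex.split(launch_command)]


def run_with_perf_report(launch_command: str) -> int:
    """Run the wrapped command and return its exit status.

    Assumes perf-report is on PATH (e.g. after loading the relevant module).
    """
    completed = subprocess.run(wrap_with_perf_report(launch_command), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical launch line; replace it with your usual mpiexec command.
    cmd = wrap_with_perf_report("mpiexec -n 4 ./a.out input.dat")
    # Print the full command line as it would be typed in a shell.
    print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to inspect or log the exact command before submitting a job.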

Examining a performance report

The basic structure of a performance report is always the same, so different reports can easily be compared with each other. The sections of the performance report are explained below.

Report summary

In the report summary, the total wall-clock time spent by the program is divided into three parts:

Compute - time spent running application code
MPI - time spent in MPI calls
I/O - time spent in filesystem I/O
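Because the three parts together make up the total wall-clock time, each share is simply that part's time divided by the sum of all three. The sketch below shows this arithmetic on hypothetical timings (not measured data); the `TimeBreakdown` class is an illustrative helper, not part of Arm Performance Reports, which computes these figures itself. Comparing two runs in this way mirrors how reports with the same structure can be compared, for example to spot MPI time growing at higher process counts.

```python
from dataclasses import dataclass


@dataclass
class TimeBreakdown:
    """Wall-clock time split into the three summary categories, in seconds."""

    compute: float
    mpi: float
    io: float

    @property
    def total(self) -> float:
        # The three parts together make up the whole wall-clock time.
        return self.compute + self.mpi + self.io

    def percentages(self) -> dict[str, float]:
        """Share of the total wall-clock time per category, in percent."""
        total = self.total
        if total <= 0:
            raise ValueError("total time must be positive")
        return {
            "Compute": 100.0 * self.compute / total,
            "MPI": 100.0 * self.mpi / total,
            "I/O": 100.0 * self.io / total,
        }

    def dominant(self) -> str:
        """Name of the category with the largest share."""
        shares = self.percentages()
        return max(shares, key=shares.get)


if __name__ == "__main__":
    # Hypothetical timings for two runs of the same program.
    small_run = TimeBreakdown(compute=150.0, mpi=40.0, io=10.0)
    large_run = TimeBreakdown(compute=40.0, mpi=45.0, io=15.0)
    for label, run in (("small run", small_run), ("large run", large_run)):
        shares = run.percentages()
        summary = ", ".join(f"{name} {pct:.1f}%" for name, pct in shares.items())
        print(f"{label}: {summary} (dominant: {run.dominant()})")
```

In this made-up example the small run is dominated by compute time, while in the large run MPI time has become the largest share, the kind of shift that would point towards a scalability issue.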
