ARMPerfReports

From HPC Wiki
Revision as of 10:17, 22 August 2019 by Fabian-orland-a681@rwth-aachen.de (talk | contribs) (Created page with "Arm Performance Reports is a tool to characterize and understand the performance of both scalar and MPI applications. Results are provided in a single page HTML file. Those re...")

Arm Performance Reports is a tool for characterizing and understanding the performance of both scalar and MPI applications. Results are provided in a single-page HTML file. They can be used to identify performance-affecting problems such as optimization and scalability issues as well as I/O or network bottlenecks. A major advantage of the tool is its low overhead: it uses Arm MAP's adaptive sampling technology, which results in an overhead of about 5% even for large-scale applications with thousands of MPI processes.

Supported Platforms

Supported hardware architectures are:

  • Intel and AMD (x86_64)
  • Armv8-A (AArch64)
  • Intel Xeon Phi (KNL)
  • IBM Power (ppc64 and ppc64le)

Moreover, the following MPI implementations are supported:

  • Open MPI
  • MPICH
  • MVAPICH
  • Intel MPI
  • Cray MPT
  • SGI MPI
  • HPE MPI
  • IBM Platform MPI
  • Bullx MPI
  • Spectrum MPI

A wide range of compilers is also supported, including:

  • GNU C/C++/Fortran
  • LLVM Clang
  • Intel Parallel Studio XE
  • PGI Compiler
  • Arm C/C++/Fortran Compiler
  • Cray Compiling Environment
  • NVIDIA CUDA Compiler
  • IBM XL C/C++/Fortran Compiler

On Intel and AMD (x86_64) architectures, NVIDIA CUDA applications are also supported. Detailed information about specific version numbers of the supported platforms can be found here.

Generating a performance report

To generate a performance report, wrap the provided perf-report command around your normal (MPI) program startup, as in the following example:

$ perf-report mpiexec <mpi-options> a.out

Arm Performance Reports will then generate and link the appropriate wrapper libraries before the program starts. At the end of the program run, a performance report is created and saved to your current working directory in both text and HTML format.
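The wrapping pattern above can be sketched as a small shell helper that only composes the launch line, so it can be inspected before it is placed in a job script. This is a minimal sketch, not part of the tool: the function name perf_report_cmd is hypothetical, and the process count and program name are placeholders. It assumes the launcher is mpiexec with its standard -n option.

```shell
#!/bin/sh
# Hypothetical helper (not part of Arm Performance Reports):
# print "perf-report mpiexec -n <nprocs> <program> [args...]".
# Nothing is executed; the line is only written to standard output.
perf_report_cmd() {
    nprocs="$1"   # number of MPI processes (placeholder value)
    shift         # remaining arguments: the program and its options
    printf 'perf-report mpiexec -n %s' "$nprocs"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Show the command that would profile a.out on 4 processes.
perf_report_cmd 4 ./a.out
```

The printed line has the same shape as the example above, with the number of processes filling the role of the MPI options.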

Examining a performance report

The basic structure of the performance report is always the same, so that different reports can easily be compared with each other. The sections of the performance report are explained below.

Report summary

In the report summary, the total wall-clock time spent by the program is divided into three parts:

  • Compute - time spent running application code
  • MPI - time spent in MPI calls
  • I/O - time spent in filesystem I/O
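To illustrate how such a three-way split translates into percentages, the following sketch computes each category's share of the total wall-clock time. The function name breakdown and the timings passed to it are made up for illustration; real values come from the generated report, and the tool's own rounding may differ.

```shell
#!/bin/sh
# Hypothetical helper: given seconds spent in compute, MPI and I/O,
# print each category's share of the total wall-clock time.
breakdown() {
    awk -v c="$1" -v m="$2" -v io="$3" 'BEGIN {
        total = c + m + io
        if (total <= 0) exit 1   # nothing to divide
        printf "Compute %.1f%%\n", 100 * c / total
        printf "MPI %.1f%%\n", 100 * m / total
        printf "I/O %.1f%%\n", 100 * io / total
    }'
}

# Illustrative run: 60 s compute, 30 s MPI, 10 s I/O.
breakdown 60 30 10
```

Because the three parts together cover the whole run, the shares add up to 100%. A large MPI or I/O share points to communication or filesystem bottlenecks rather than application code.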