Difference between revisions of "Application benchmarking"

From HPC Wiki
==== Calling clock_gettime ====
 #include <time.h>
 
 static double TimeSpecToSeconds(struct timespec* ts)
 {
     return (double)ts->tv_sec + (double)ts->tv_nsec / 1000000000.0;
 }
 
 struct timespec start;
 struct timespec end;
 double elapsedSeconds;
 
 if (clock_gettime(CLOCK_MONOTONIC, &start))
     { /* handle error */ }
 
 /* Do stuff */
 
 if (clock_gettime(CLOCK_MONOTONIC, &end))
     { /* handle error */ }
 
 elapsedSeconds = TimeSpecToSeconds(&end) - TimeSpecToSeconds(&start);
 

Revision as of 13:30, 17 January 2019

Overview

Application benchmarking is an elementary skill for any performance engineering effort. Because it is the basis for every other activity, it is crucial to measure results in an accurate, deterministic and reproducible way. The following components are required for meaningful application benchmarking:

  • Timing: How to accurately measure time in software.
  • Documentation: Because there are many influences, it is essential to document all possible performance-relevant influences.
  • System configuration: Modern systems allow many performance-relevant settings to be adjusted, such as clock speed, memory settings and cache organisation, as well as OS settings.
  • Resource allocation and affinity control: Which resources are used and how work is mapped onto them.

Because so many things can go wrong while benchmarking, it is important to be sceptical of good results. Especially for very good results, one has to check whether the result is reasonable. Furthermore, results must be deterministic and reproducible; if required, the statistical distribution over multiple runs has to be documented.
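One way to document the distribution over multiple runs is to report the minimum, median and maximum of the measured runtimes. The following is a minimal sketch under that assumption; the names run_stats and summarize_runs are hypothetical and not part of any tool mentioned on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Summary of the runtimes of repeated benchmark runs (hypothetical type). */
typedef struct {
    double min;
    double median;
    double max;
} run_stats;

/* Ascending comparison of two doubles for qsort. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts the n runtimes in place and returns their min, median and max. */
static run_stats summarize_runs(double *t, size_t n)
{
    assert(t != NULL && n > 0);
    qsort(t, n, sizeof t[0], cmp_double);

    run_stats s;
    s.min = t[0];
    s.max = t[n - 1];
    /* For an even count the median is the mean of the two middle values. */
    s.median = (n % 2) ? t[n / 2] : 0.5 * (t[n / 2 - 1] + t[n / 2]);
    return s;
}
```

A large gap between the median and the maximum hints at outliers, for example caused by OS noise, and is worth recording alongside the result.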

All of the following examples use the Likwid Performance Tools for tool support.

Timing

For benchmarking, an accurate so-called wallclock timer (an end-to-end stopwatch) is required. Every timer has a minimal time resolution that it can measure. Therefore, if the code region to be measured runs for a shorter time than this, the measurement must be extended until it reaches a duration that the timer can resolve. There are OS-specific routines (POSIX and Windows) as well as solutions specific to a programming model or programming language. The latter have the advantage of being portable across operating systems. In any case, one has to read the documentation of the implementation used to verify the exact properties of the routine.
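Extending a measurement that is too short can be sketched as follows: the code region is repeated, and the repetition count is doubled until one timed batch lasts at least a chosen minimum duration. This is only an illustration based on the POSIX clock_gettime() call; the names wall_seconds, kernel and time_per_call, as well as the doubling strategy, are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Current value of the monotonic clock in seconds. */
static double wall_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1.0e-9;
}

/* Hypothetical stand-in for the code region to be measured. */
static volatile double sink;
static void kernel(void)
{
    double s = 0.0;
    for (int i = 0; i < 1000; i++) {
        s += (double)i * 0.5;
    }
    sink = s; /* keep the compiler from removing the loop */
}

/* Calls fn in batches, doubling the batch size until one batch takes at
 * least min_seconds. Returns the average time per call of that batch and
 * stores the batch size in *reps_out if reps_out is not NULL. */
static double time_per_call(void (*fn)(void), double min_seconds, long *reps_out)
{
    assert(fn != NULL && min_seconds > 0.0);
    long reps = 1;
    for (;;) {
        double start = wall_seconds();
        for (long i = 0; i < reps; i++) {
            fn();
        }
        double elapsed = wall_seconds() - start;
        if (elapsed >= min_seconds) {
            if (reps_out != NULL) {
                *reps_out = reps;
            }
            return elapsed / (double)reps;
        }
        reps *= 2;
    }
}
```

A call such as time_per_call(kernel, 0.1, &reps) would then report the average runtime of kernel over a batch that ran for at least 0.1 seconds. The minimum duration should be chosen well above the timer resolution.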

Recommended timing routines are

  • clock_gettime(), a POSIX-compliant timing function (man page), which is recommended as a replacement for the widespread gettimeofday()
  • MPI_Wtime and omp_get_wtime, standardized programming-model-specific timing routines for MPI and OpenMP
  • Timing in instrumented Likwid regions based on cycle counters for very short measurements

While programming-language-specific solutions also exist (e.g. in C++ and Fortran), it is recommended to use the OS solution. In the case of Fortran, this requires providing a wrapper function for the C call (see example below).

Examples

Calling clock_gettime

Fortran example
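The C side of such a wrapper might look like the sketch below. It is not the original example from this page; the function name wall_time is hypothetical, and the Fortran interface shown in the comment assumes a compiler that supports the Fortran 2003 ISO_C_BINDING module.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical wrapper around clock_gettime() that can be called from
 * Fortran. A matching Fortran interface could be declared as:
 *
 *   interface
 *     function wall_time() bind(C, name="wall_time") result(t)
 *       use, intrinsic :: iso_c_binding, only: c_double
 *       real(c_double) :: t
 *     end function wall_time
 *   end interface
 *
 * Returns the monotonic clock value in seconds, or -1.0 on error. */
double wall_time(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return -1.0;
    }
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1.0e-9;
}
```

On the Fortran side, the elapsed time is the difference between two calls to wall_time(), one before and one after the code region.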

Documentation

Without proper documentation of code generation, system state and runtime modalities, it can be difficult to reproduce performance results. Best practice is to automate the logging of build settings, system state and runtime settings using benchmark scripts. Still, too much automation can also introduce errors, or hinder a fast workflow through inflexible benchmarking or a lack of transparency about what actually happens. Therefore, it is recommended to also execute steps by hand in addition to automated benchmark execution.
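Part of this logging can be done by the benchmark itself. The sketch below writes the compiler version and basic system information into the benchmark output. It assumes a POSIX system (uname()) and a GCC-compatible compiler that predefines the __VERSION__ and __OPTIMIZE__ macros; the function name log_environment is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/utsname.h>

/* Writes compiler and operating system information to out.
 * Returns 0 on success and -1 if the system information is unavailable. */
static int log_environment(FILE *out)
{
    struct utsname u;
    if (uname(&u) < 0) {
        return -1;
    }

#ifdef __VERSION__
    /* Predefined by GCC-compatible compilers (assumption). */
    fprintf(out, "compiler: %s\n", __VERSION__);
#else
    fprintf(out, "compiler: unknown\n");
#endif

#ifdef __OPTIMIZE__
    fprintf(out, "optimization: enabled\n");
#else
    fprintf(out, "optimization: disabled\n");
#endif

    fprintf(out, "system: %s %s %s\n", u.sysname, u.release, u.machine);
    fprintf(out, "host: %s\n", u.nodename);
    return 0;
}
```

Calling log_environment(stdout) at the start of every run keeps this information together with the measured numbers. Settings such as clock frequency or affinity still have to be recorded by other means.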

System configuration

Affinity control