Performance Engineering

From HPC Wiki

Revision as of 09:46, 16 January 2019

Introduction

HPC is driven by demanding application performance requirements. There are many options for improving the performance of an application code. In the following, it is assumed that a given algorithm is executed on a given HPC system.

The following factors influence the performance:

* Implementation of the algorithm (programming language, optimisation techniques)
* Compiler used and compiler options
* Machine and operating system configuration
* Runtime setup (pinning and resource allocation)

Generic iterative procedure for performance engineering

The following steps form a minimal performance engineering process:

* Define a relevant test case that reflects production behavior
* Acquire a runtime profile to determine in which parts of the code the processing time is spent
* For each code part (hot spot) identified in the runtime profile, perform:
   * Static code analysis
   * Instrumentation-based hardware performance counter profiling
   * Application benchmarking (thread and data set scaling)
* Based on the data acquired by the above activities, narrow down the performance issues
* Improve performance by changing the runtime setup or the implementation

These steps need to be repeated until the required or a good enough performance is reached. After an optimisation, steps must be taken to ensure that the optimised variants are actually used and take effect in regular production.

Carrying out the above procedure requires several special skills beyond standard software engineering:

* Perform application benchmarking
* Create a runtime profile
* Create a performance profile

These skills are documented in separate articles and are assumed in the following.
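The following is a minimal sketch of application benchmarking with data set scaling, assuming a hypothetical STREAM-like triad kernel `a[i] = b[i] + s * c[i]` as the code under test. Each data set size is run several times and the fastest run is kept to reduce measurement noise; reporting the median instead is an equally common choice. Thread scaling is not shown, and in pure Python the interpreter overhead dominates, so the numbers illustrate the method rather than the behavior of the hardware.

```python
import time


def kernel(a, b, c, s):
    """Hypothetical code under test: a[i] = b[i] + s * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + s * c[i]


def benchmark(n, repetitions=5):
    """Run the kernel on arrays of length n and return the best runtime in seconds."""
    a = [0.0] * n
    b = [1.0] * n
    c = [2.0] * n
    best = float("inf")
    for _ in range(repetitions):
        start = time.perf_counter()
        kernel(a, b, c, 3.0)
        best = min(best, time.perf_counter() - start)
    return best


def scaling_study(sizes, repetitions=5):
    """Data set scaling: return (n, seconds, element updates per second) per size."""
    results = []
    for n in sizes:
        t = benchmark(n, repetitions)
        results.append((n, t, n / t if t > 0 else float("inf")))
    return results


if __name__ == "__main__":
    for n, t, rate in scaling_study([10**k for k in range(3, 6)]):
        print(f"n={n:>8d}  time={t:.6f} s  {rate / 1e6:8.2f} M updates/s")
```

Plotting the update rate against the data set size is what reveals, for a compiled kernel, the transitions between cache levels and main memory.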

Strategies for performance analysis

After defining a benchmark case, benchmarking the application and profiling its performance, interpreting and analysing the results is the first difficult task in any performance engineering effort. While there is no silver bullet for performance analysis, several strategies provide guidelines for different levels of expertise. Note that in complicated cases the software developer carrying out the process needs a certain level of experience to succeed. It is therefore recommended to consult an experienced HPC consultant at the local HPC center if the simpler approaches do not lead to progress.

Three approaches are described in more detail:

* Threshold-based performance analysis process, based on the proven approach of the EU CoE POP project, for a rough initial performance analysis that is also suited for beginners
* Performance-pattern-based process for more complicated cases, targeted at experienced software developers
* An instruction-count-based approach applicable to the special case of instruction-based codes on SIMD architectures
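The threshold-based idea can be sketched as follows, assuming the commonly published POP definitions: load balance is the average useful computation time per process divided by the maximum, communication efficiency is the maximum useful computation time divided by the total runtime, and parallel efficiency is their product. The 0.8 cut-off and the sample timings are assumptions chosen for illustration. The full POP methodology further breaks communication efficiency down into serialisation and transfer efficiency and also considers computational scalability; both are omitted here.

```python
from statistics import mean

# Assumed cut-off: metrics below this value are flagged for closer inspection.
THRESHOLD = 0.8


def pop_efficiencies(useful_times, runtime):
    """POP-style efficiency metrics.

    useful_times: non-empty list of per-process useful computation times (s)
    runtime: total elapsed wall-clock time of the run (s)
    """
    avg_useful = mean(useful_times)
    max_useful = max(useful_times)
    load_balance = avg_useful / max_useful
    comm_efficiency = max_useful / runtime
    return {
        "load_balance": load_balance,
        "communication_efficiency": comm_efficiency,
        "parallel_efficiency": load_balance * comm_efficiency,
    }


def flag_issues(metrics, threshold=THRESHOLD):
    """Return the names of all metrics that fall below the threshold."""
    return sorted(name for name, value in metrics.items() if value < threshold)


if __name__ == "__main__":
    # Hypothetical measurement: useful computation time of 4 processes.
    metrics = pop_efficiencies([8.0, 9.0, 10.0, 6.0], runtime=12.0)
    for name, value in metrics.items():
        print(f"{name:26s} {value:.3f}")
    print("below threshold:", flag_issues(metrics))
```

With these hypothetical timings, load balance is 0.825 and communication efficiency about 0.83, so each passes individually, while their product, a parallel efficiency of 0.6875, falls below the cut-off and points the analysis at both factors.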