High-performance computing (HPC) is driven by demanding application performance requirements. There are many ways to improve the performance of an application code. The following assumes that a given algorithm is executed on a given HPC system.
The following factors influence the performance:
- Implementation of the algorithm (programming language, optimisation)
- Compiler used and compiler options
- Machine and operating system configuration
- Runtime setup (pinning and resource allocation)
Generic iterative procedure for performance engineering
A minimal performance engineering process requires the following steps:
- Define a relevant test case which reflects production behavior
- Acquire a runtime profile to determine in which parts of the code the processing time is spent
- For all code parts (hot spots) of the runtime profile perform:
  * Static code analysis
  * Instrumentation-based hardware performance counter profiling
  * Application benchmarking (thread and data set scaling)
- Based on the data acquired by the above activities, narrow down the performance issues
- Improve performance by changing runtime setup or implementation
These steps are repeated until the required, or a good enough, performance is reached. After an optimisation, steps must be taken to ensure that the optimised variants are actually used and take effect in regular production.
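As an illustration of the runtime profile step, the following is a minimal sketch that uses Python's standard `cProfile` and `pstats` modules. The functions `test_case`, `hot_kernel` and `cheap_setup` are hypothetical stand-ins for a real test case and its code parts. Production HPC codes are usually profiled with dedicated tools, so this only shows the principle of ranking code parts by the time spent in them.

```python
import cProfile
import io
import pstats


def hot_kernel(n):
    """Hypothetical hot spot: a deliberately expensive sum of squares."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup(n):
    """Hypothetical inexpensive part of the test case."""
    return list(range(n))


def test_case():
    """Hypothetical stand-in for a test case that reflects production behavior."""
    data = cheap_setup(1_000)
    return hot_kernel(200_000) + len(data)


def runtime_profile(func, top=5):
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = runtime_profile(test_case)
print(report)
```

In the printed report, the functions with the largest cumulative time are the candidates for the detailed hot spot analysis described in the steps above.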
Carrying out the above procedure requires several special skills beyond standard software engineering:
- Perform application benchmarking
- Create a runtime profile
- Create a performance profile
These skills are documented in separate articles and are assumed in the following.
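To give a flavour of what a thread scaling benchmark yields, the sketch below derives speedup and parallel efficiency from measured runtimes. The function name `scaling_table` and the runtimes in `measured` are made up for illustration and are not results from a real system. Efficiency is taken relative to the smallest measured thread count, which is one common convention.

```python
def scaling_table(timings):
    """Compute speedup and parallel efficiency per thread count.

    timings maps thread count -> runtime in seconds. The run with the
    smallest thread count serves as the baseline.
    """
    base_threads = min(timings)
    base_time = timings[base_threads]
    rows = []
    for threads in sorted(timings):
        speedup = base_time / timings[threads]
        # Ideal speedup relative to the baseline is threads / base_threads.
        efficiency = speedup * base_threads / threads
        rows.append((threads, speedup, efficiency))
    return rows


# Made-up runtimes in seconds, for illustration only.
measured = {1: 8.0, 2: 4.0, 4: 2.5, 8: 2.0}

rows = scaling_table(measured)
for threads, speedup, efficiency in rows:
    print(f"{threads:3d} threads: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

A falling efficiency at higher thread counts, as in the made-up data, indicates that the code part stops scaling and deserves a closer look.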
Strategies for performance analysis
Once a benchmark case has been defined and application benchmarking and performance profiling have been carried out, interpreting and analysing the results is the first difficult task in any performance engineering effort. While there is no silver bullet for performance analysis, several strategies provide guidelines for different levels of expertise. In complicated cases, the software developer carrying out the process needs a certain level of experience to succeed. It is therefore recommended to consult an experienced HPC consultant at the local HPC center if the simpler approaches yield no progress.
Three approaches are described in more detail:
- Threshold-based performance analysis process, based on the proven EU COE POP project approach, for a rough initial performance analysis that is also suited to beginners
- Performance-pattern-based process for more complicated cases, targeted at experienced software developers
- An instruction-count-based approach applicable to the special case of instruction based codes on SIMD architectures