Performance Engineering

From HPC Wiki

Introduction

HPC is about meeting high application performance requirements, and there are many ways to improve the performance of an application code. The following assumes that a given algorithm is executed on a given HPC system.

The following factors influence the performance:

  • Implementation of the algorithm (programming language, optimisation techniques)
  • Compiler used and compiler options
  • Machine and operating system configuration
  • Runtime setup (pinning and resource allocation)

Generic iterative procedure for performance engineering

A minimal performance engineering process requires the following steps:

  • Define a relevant test case which reflects production behavior
  • Acquire a runtime profile to determine in which parts of the code the processing time is spent
  • For each hot spot (a code part that dominates the runtime profile), perform:
    • Static code analysis
    • Instrumentation-based hardware performance counter profiling
    • Application benchmarking (thread and data set scaling)
  • Based on the data acquired by the above activities, narrow down the performance issues
  • Improve performance by changing the runtime setup or the implementation
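The runtime-profile step in the list above can be sketched with a small example. This is only an illustration using Python's standard `cProfile` and `pstats` modules; real HPC codes are usually profiled with dedicated tools. The functions `slow_kernel`, `cheap_setup`, `test_case` and `hot_spots` are hypothetical names introduced here and do not come from this article.

```python
import cProfile
import pstats


def slow_kernel(n):
    """Hypothetical hot spot: an O(n^2) loop nest."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


def cheap_setup(n):
    """Hypothetical cheap part: O(n) initialisation."""
    return [i for i in range(n)]


def test_case():
    """Stand-in for a test case that reflects production behaviour."""
    cheap_setup(300)
    return slow_kernel(300)


def hot_spots(func, top_n=3):
    """Profile func() and return the top_n (function name, own time) pairs."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) to
    # (primitive calls, total calls, own time, cumulative time, callers).
    rows = [(name, own) for (_, _, name), (_, _, own, _, _) in stats.stats.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top_n]


if __name__ == "__main__":
    for name, seconds in hot_spots(test_case):
        print(f"{name:40s} {seconds:.6f} s")
```

Ranking by own time (rather than cumulative time) points at the function that does the work itself, which is the natural candidate for the static analysis and counter measurements that follow.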

These steps must be repeated until the required, or a good enough, performance is reached. After applying an optimisation, it must be ensured that the optimised variant is actually used and takes effect in regular production.
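One iteration of this loop can be checked with a small harness that confirms an optimised variant still produces the same result and measures both variants on the same test case. The sketch below is a minimal illustration under stated assumptions: the two `sum_squares` variants, `best_time` and `compare` are hypothetical names, and taking the minimum over a few repetitions is just one simple way to reduce timing noise.

```python
import time


def baseline_sum_squares(n):
    """Hypothetical baseline: explicit loop over 0..n-1."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def optimised_sum_squares(n):
    """Hypothetical optimised variant: closed form of the same sum."""
    return (n - 1) * n * (2 * n - 1) // 6


def best_time(func, arg, repeats=5):
    """Return the shortest wall-clock time of several runs of func(arg)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        best = min(best, time.perf_counter() - start)
    return best


def compare(baseline, optimised, arg):
    """Check that both variants agree, then return their best run times."""
    if baseline(arg) != optimised(arg):
        raise ValueError("optimised variant changes the result")
    return best_time(baseline, arg), best_time(optimised, arg)


if __name__ == "__main__":
    t_base, t_opt = compare(baseline_sum_squares, optimised_sum_squares, 100_000)
    print(f"baseline:  {t_base:.6f} s")
    print(f"optimised: {t_opt:.6f} s")
```

Running the same comparison in the production environment, with the production build and runtime setup, is what shows whether the optimisation actually takes effect there.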

Carrying out the above procedure requires several special skills beyond standard software engineering. These skills are documented in separate articles and are assumed in the following.

Strategies for performance analysis

Once a benchmark case has been defined and application benchmarking and performance profiling have been carried out, interpreting and analysing the results is the first difficult task in any performance engineering effort. While there is no silver bullet for performance analysis, several strategies provide guidelines for different levels of expertise. Note that in complicated cases the software developer carrying out the process needs a certain level of experience to succeed. It is therefore recommended to consult an experienced HPC consultant at the local HPC center if the simpler approaches yield no progress.

Three approaches are described in more detail:

  • ProPE PE Process: Threshold-based performance analysis process, built on the proven approach of the EU CoE [POP project], for a rough initial performance analysis; also suited for beginners
  • Performance Patterns: Performance-pattern-based process for more complicated cases, targeted at experienced software developers
  • Instruction Counting: An instruction-count-based approach applicable to the special case of instruction-throughput-limited codes on SIMD architectures
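To give a feel for what a threshold-based analysis looks like, the sketch below computes the efficiency metrics commonly associated with the POP methodology (load balance, communication efficiency and their product, parallel efficiency) from per-process computation times and flags those below a threshold. It is a minimal illustration, not the ProPE process itself: the function names, the sample timings and the 0.8 threshold are assumptions made for this example.

```python
from statistics import mean


def efficiencies(compute_times, runtime):
    """Compute POP-style efficiencies.

    compute_times: time each process spent in useful computation (seconds)
    runtime: total wall-clock runtime of the parallel run (seconds)
    """
    load_balance = mean(compute_times) / max(compute_times)
    communication = max(compute_times) / runtime
    return {
        "load_balance": load_balance,
        "communication_efficiency": communication,
        "parallel_efficiency": load_balance * communication,
    }


def flag_issues(metrics, threshold=0.8):
    """Return the names of all metrics below the (assumed) threshold."""
    return [name for name, value in metrics.items() if value < threshold]


if __name__ == "__main__":
    # Hypothetical measurements for four processes.
    metrics = efficiencies([8.0, 6.0, 4.0, 6.0], runtime=10.0)
    for name, value in metrics.items():
        print(f"{name:26s} {value:.2f}")
    print("below threshold:", flag_issues(metrics))
```

In this made-up example the load balance is 0.75 and the communication efficiency is 0.8, so the sketch would point towards uneven work distribution rather than communication overhead as the first thing to investigate.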