Performance profiling
Introduction
In this context, performance profiling means relating performance metric measurements to the execution of source code. The data sources are typically the operating system, a runtime system, or measurement facilities in the hardware. Modern processors provide so-called hardware performance monitoring (HPM) units, which measure hardware events from which performance metrics can be derived. This article focuses on HPM-related performance metrics. Using HPM units requires dedicated profiling tools. HPM metrics give a very detailed view of the interaction between software and hardware while introducing little or no overhead, so every serious performance engineering effort should use an HPM tool for profiling. A very good overview of the HPM capabilities of many x86 processor architectures can be found in the Likwid Wiki.
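HPM units consist of programmable counters in different parts of the chip. On Intel processors, every core has at least four general-purpose counters, and many more counters are located in different parts of the Uncore (the parts of the chip outside the cores, such as the last-level cache and the memory controllers).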
There are two basic ways to use HPM units:
- End-to-end measurements: A counter is programmed and started. It then counts every occurrence of its event on its part of the hardware, regardless of which code causes it. The counter can be read while running or after being stopped. The advantage is that practically no overhead is introduced during the measurement. The measurement is very accurate, but it only yields aggregate values (totals or averages) for whole regions of code. To measure individual regions, an instrumentation API usually has to be used, and the code must be pinned to specific processors, because the counters belong to the hardware and not to a software thread. Also, only one fixed event set can be measured per run. The Likwid tool likwid-perfctr is based on this approach.
- Sampling-based measurements: Events are related to source code by statistical sampling. Counters are configured and started; when a counter exceeds a preset threshold, an interrupt is triggered and the current program counter is read out. This information is stored and analyzed later. Sampling-based tools introduce overhead by triggering interrupts and by additional bookkeeping during the measurement. Because the results are based on statistical evaluation, they are also subject to measurement errors. On the other hand, the code needs to be neither pinned nor instrumented, the complete application can be measured and analyzed in one run, and measuring multiple events is no problem. Most advanced tools employ sampling. On Linux, sampling is accessible through the Linux Perf interface.
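End-to-end counting over a whole program run can be tried without any instrumentation using the `perf stat` command of Linux Perf. The following is a minimal Python sketch, assuming that `perf` is installed and that the kernel allows counting for the current user (this is governed by `/proc/sys/kernel/perf_event_paranoid`). The helper names `run_perf_stat` and `parse_perf_stat_csv` are hypothetical. The sketch relies on `perf stat -x,` printing one CSV line per event to stderr, with the counter value in the first field and the event name in the third.

```python
import shutil
import subprocess


def parse_perf_stat_csv(text):
    """Parse `perf stat -x,` output into {event_name: value}.

    Values perf could not obtain ("<not supported>", "<not counted>")
    become None; lines that are not CSV counter lines are skipped.
    """
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip empty lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split(",")
        if len(fields) < 3 or not fields[2]:
            continue
        value, event = fields[0], fields[2]
        try:
            counts[event] = float(value)
        except ValueError:
            # Keep "<not supported>"-style markers as None; anything else
            # (e.g. an error message that happens to contain commas) is ignored.
            if value.startswith("<"):
                counts[event] = None
    return counts


def run_perf_stat(cmd, events=("instructions", "cycles")):
    """Count `events` end-to-end over the whole run of `cmd`.

    Returns None if no `perf` binary is found on PATH.
    """
    if shutil.which("perf") is None:
        return None
    result = subprocess.run(
        ["perf", "stat", "-x,", "-e", ",".join(events), "--", *cmd],
        capture_output=True,
        text=True,
    )
    # perf stat writes its counter report to stderr, not stdout.
    return parse_perf_stat_csv(result.stderr)


if __name__ == "__main__":
    counts = run_perf_stat(["sleep", "0.1"])
    if counts is None:
        print("perf not found")
    else:
        for event, value in counts.items():
            print(f"{event}: {value}")
```

Because `perf stat` counts the whole process, this yields totals for the entire run; a per-region breakdown would still require instrumentation as described above. Reported event names may carry modifiers, for example `instructions:u` when perf can only count user-space events.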
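The statistical nature of sampling can be illustrated with a small simulation. The following Python sketch is not a real profiler and does not access any hardware counters: the workload is a hypothetical trace of `(function, events)` steps, the function name stands in for the program counter, and all helper names are made up for illustration. `sample_profile` mimics overflow-based sampling with a fixed sampling period, while `exact_counts` gives the totals that end-to-end counting of each function as a separate region would report.

```python
from collections import Counter


def exact_counts(trace):
    """Exact events per function, as if each function were an
    instrumented region measured with end-to-end counting."""
    totals = Counter()
    for func, events in trace:
        totals[func] += events
    return totals


def sample_profile(trace, period):
    """Simulate overflow-based sampling with a fixed sampling period.

    The counter is incremented by the events of each step; every time it
    reaches `period`, an 'interrupt' records the function executing at
    that moment (the stand-in for the program counter) and the counter
    is re-armed by subtracting the period.
    """
    samples = Counter()
    counter = 0
    for func, events in trace:
        counter += events
        # A single step may overflow the counter more than once.
        while counter >= period:
            samples[func] += 1
            counter -= period
    return samples


def estimate_counts(samples, period):
    """Each sample is taken to represent `period` events."""
    return {func: n * period for func, n in samples.items()}


if __name__ == "__main__":
    # Hypothetical workload: an expensive and a cheap step alternating.
    trace = [("compute", 7), ("io", 1)] * 100
    period = 10
    est = estimate_counts(sample_profile(trace, period), period)
    for func, exact in exact_counts(trace).items():
        print(f"{func}: exact={exact} estimated={est.get(func, 0)}")
```

For this strictly periodic trace the program prints `compute: exact=700 estimated=600` and `io: exact=100 estimated=200`. The total of 800 events is recovered, but the cheap `io` steps coincide with counter overflows disproportionately often and are overestimated, a simple instance of the measurement errors mentioned above. With `period=1` every event is sampled and the estimate becomes exact, at the price of one interrupt per event.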