Difference between revisions of "Perf"

From HPC Wiki

Revision as of 09:09, 23 September 2019

Overview and Installation

The Linux Perf tool provides a variety of ways to measure, monitor, and present performance data. It builds on the Linux perf_event_open system call [1], available since Linux 2.6.32.

To install Perf, use the linux-tools-common package on Debian-based systems and the perf package on SuSE.

Some features require special permissions to be granted to users. This can be done by adjusting the pseudo file `/proc/sys/kernel/perf_event_paranoid`. According to the perf help text (Linux 4.18), this file can contain the following values with the respective meanings:

  • -1 : Allow use of (almost) all events by all users. Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
  • >= 0 : Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN. Disallow raw tracepoint access by users without CAP_SYS_ADMIN
  • >= 1 : Disallow CPU event access by users without CAP_SYS_ADMIN
  • >= 2 : Disallow kernel profiling by users without CAP_SYS_ADMIN
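
The levels above are cumulative: a higher value includes the restrictions of all lower ones. The following minimal sketch reads the current setting and lists the restrictions that apply. The function names are hypothetical, and the restriction texts are shortened from the list above.

```python
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")

# (minimum level, restriction) pairs, shortened from the perf help text above.
RESTRICTIONS = [
    (0, "no ftrace function tracepoint or raw tracepoint access without CAP_SYS_ADMIN"),
    (1, "no CPU event access without CAP_SYS_ADMIN"),
    (2, "no kernel profiling without CAP_SYS_ADMIN"),
]


def restrictions_for(level):
    """Return all restrictions that are active at the given paranoid level."""
    return [text for minimum, text in RESTRICTIONS if level >= minimum]


def read_paranoid_level(path=PARANOID_PATH):
    """Return the current level, or None if the file is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


level = read_paranoid_level()
if level is None:
    print("perf_event_paranoid is not available on this system")
else:
    print(f"perf_event_paranoid = {level}")
    for restriction in restrictions_for(level):
        print(f"  - {restriction}")
```

Changing the value requires root privileges; this sketch only reads it.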

Introduction

In this section, some basics are described.

Hardware Performance Monitoring Counters (PMCs)

Hardware PMCs are small extensions of processors, which usually consist of a set of at least two registers. In the first register, the software (operating system) specifies the processor-internal event to be counted and controls the measurement, for example starting and stopping it. The second register is incremented each time the event occurs. In addition, the operating system can set up a threshold after which the PMC generates an interrupt, for example one interrupt every 1 million instructions or every 300,000 cache misses. This interrupt mechanism serves multiple purposes, among them event-based sampling.
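
The interplay of count register and threshold can be pictured with a toy model. This is purely illustrative and is neither hardware nor kernel code; the class and method names are hypothetical.

```python
class ToyCounter:
    """Toy model of one PMC: a count register plus an overflow threshold."""

    def __init__(self, threshold):
        self.threshold = threshold  # events between two interrupts
        self.count = 0              # models the incrementing register
        self.interrupts = 0         # interrupts raised so far

    def record(self, occurrences=1):
        """Count event occurrences and return how many interrupts they trigger."""
        self.count += occurrences
        fired = self.count // self.threshold - self.interrupts
        self.interrupts += fired
        return fired


# One interrupt per 1 million instructions, as in the example above.
instructions = ToyCounter(threshold=1_000_000)
print(instructions.record(2_500_000), "interrupts after the first batch")
print(instructions.record(500_000), "interrupt(s) after the second batch")
```

Each interrupt gives the operating system a chance to note what the program was doing at that moment, which is the idea behind sampling.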

Available Events

Use the perf list command to list available events. These fall into several types. The availability of events depends on the kernel version and the user's permissions.


Hardware Events

These are a small set of Hardware Performance Monitoring Counter events that Intel defined as architectural performance events (Intel Manual 3, Section 19.1 [2]). While they seem to be more or less valid on Intel platforms, they are not necessarily reliable on AMD. For example, some Linux versions counted Level-1 instruction cache misses as cache-misses, while on Intel this event usually refers to last-level cache misses. Use them carefully, since the mapping to the underlying hardware events is specified by the Linux developers (mostly those of the processor vendor).

Software Events

Software events are not counted via Hardware PMCs, but are generated, monitored, and handled by the operating system.

Hardware Cache Events

These events relate to processor events for different caches, TLBs, and branch prediction. As with Hardware Events, they have to be specified by kernel developers. Some of the events might not be available on a given system, for example if there is no hardware PMC event that relates to a given Hardware Cache Event.

Kernel PMU Events

In addition to Hardware PMCs, which are provided per hardware thread, other components of the processor (or devices) can provide their own performance monitoring counters. These include incrementing registers such as RAPL, TSC, MPERF, and APERF, as well as uncore components and iGPUs.

Raw Events

You can also specify hardware PMC events by their actual ID. Refer to the processor manual to find the ID of the event that you want to monitor. Usually, the umask is defined in bits 8-15 and the event is specified in bits 0-7. For the event `LD_BLOCKS.STORE_FORWARD` on 4th Generation Intel Core Processors, the umask and event are 0x02 and 0x03, respectively (Intel Manual 3, Table 19-7 [3]). Hence, the raw event encoding is r203 (umask 0x02 in the upper byte, event 0x03 in the lower byte).
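
The bit layout described above can be turned into a small helper. This is a minimal sketch that assumes only the two fields named in the text (event in bits 0-7, umask in bits 8-15); the function name is hypothetical, and other fields that a processor may define are not handled.

```python
def raw_event(event, umask):
    """Build a perf raw event string from an 8-bit event ID and an 8-bit umask."""
    if not 0 <= event <= 0xFF:
        raise ValueError("event must fit into bits 0-7")
    if not 0 <= umask <= 0xFF:
        raise ValueError("umask must fit into bits 8-15")
    # The umask occupies the upper byte, the event the lower byte.
    return f"r{(umask << 8) | event:x}"


# LD_BLOCKS.STORE_FORWARD with event 0x03 and umask 0x02, as given above.
print(raw_event(event=0x03, umask=0x02))
```

The resulting string can be passed to the -e option, for example `perf stat -e r203 <command>`.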


Tracepoint Events

In addition to incrementing counters, the kernel is instrumented with tracepoints, which allow you to capture specific events within the kernel. Most of these also provide access to some arguments, which are highly event-specific. For a definition of these events, check your tracefs mountpoint, usually /sys/kernel/debug/tracing.
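
Tracepoints appear in tracefs as directories of the form events/<subsystem>/<event>, each containing a format file that describes the event's arguments. The following minimal sketch lists them in the subsystem:event notation that perf uses. The function name is hypothetical, the default path is the usual mountpoint named above, and reading it typically requires root privileges.

```python
from pathlib import Path


def list_tracepoints(tracing_root="/sys/kernel/debug/tracing"):
    """Return tracepoints as 'subsystem:event' strings, or [] if unreadable."""
    events_dir = Path(tracing_root) / "events"
    names = []
    try:
        subsystems = sorted(p for p in events_dir.iterdir() if p.is_dir())
    except OSError:
        return names  # tracefs not mounted here or not accessible
    for subsystem in subsystems:
        try:
            for event in sorted(subsystem.iterdir()):
                # Only real events carry a 'format' file with their arguments.
                if (event / "format").is_file():
                    names.append(f"{subsystem.name}:{event.name}")
        except OSError:
            continue  # skip subsystems that cannot be read
    return names


tracepoints = list_tracepoints()
print(f"{len(tracepoints)} tracepoints found")
for name in tracepoints[:5]:
    print(name)
```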

Measuring Events with perf stat

Use the perf stat command to measure available events. It sets up the hardware and software counters and collects the information either for the given process or for the requested CPU(s) (meaning hardware threads). perf stat provides various command line arguments (see perf-stat man page [4]). Some of the important ones are:

  • -d / --detailed collects more events, can be provided multiple times
  • -I / --interval-print <ms> prints the measurements every <ms> milliseconds
  • -e / --event= specifies the event(s) to be measured
  • -x / --field-separator SEP prints the statistics in a CSV-like format, using SEP as the separator
  • -C / --cpu=<cpu-list> measures the events on the given list of CPUs
  • -A / --no-aggr does not aggregate counts across all monitored CPUs
  • -a / --all-cpus monitors all CPUs
  • -o / --output <file> writes the output to <file> instead of stderr

Examples

`perf stat make -j` provides a general overview of how well make -j performed.

`perf stat -d -d make -j` provides a more detailed overview of how well make -j performed.

`perf stat -a -I 1000` provides statistics for the whole system every second.

`perf stat -e instructions -I 1000 -x , -o stat.csv` provides instruction statistics for the whole system every second and saves them in stat.csv. This can be used to monitor instructions per second (IPS) over time.
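
A file such as stat.csv from the last example can be post-processed into an IPS series. The following minimal sketch assumes the interval CSV layout from the perf-stat man page (timestamp, counter value, unit, event name, followed by further fields) and that each line holds the count for its own interval. The function name is hypothetical, the sample lines are made up for illustration, and the exact fields can differ between perf versions.

```python
import csv


def events_per_second(lines, event="instructions"):
    """Return (timestamp_s, events_per_second) tuples for one event."""
    rates = []
    previous = 0.0
    for row in csv.reader(lines):
        if not row or row[0].lstrip().startswith("#"):
            continue  # skip blank lines and comment headers
        if len(row) < 4 or row[3] != event:
            continue
        try:
            timestamp = float(row[0])
        except ValueError:
            continue
        try:
            count = float(row[1])
        except ValueError:
            previous = timestamp  # e.g. "<not counted>": skip this interval
            continue
        duration = timestamp - previous
        if duration > 0:
            rates.append((timestamp, count / duration))
        previous = timestamp
    return rates


# Made-up sample lines in the assumed layout, not real measurements.
SAMPLE = [
    "# started on (illustrative header)",
    "     1.000100000,2000200000,,instructions,8000000000,100.00,,",
    "     2.000100000,3000000000,,instructions,8000000000,100.00,,",
]

for timestamp, rate in events_per_second(SAMPLE):
    print(f"{timestamp:.3f} s: {rate:.3e} instructions/s")
```

To process a real file, pass an open file object instead of SAMPLE, for example `events_per_second(open("stat.csv"))`.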

Overprovision of Events

If you specify more events than the hardware can count at the same time, the operating system measures them in time slices, choosing a different set of events for each time slice. perf stat reports this to the user: a percentage in parentheses after a result gives the share of the measurement time during which that event was actually counted.
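
When an event was only counted during part of the measurement, the raw count is extrapolated to the full period. A minimal sketch of this scaling, assuming the usual linear extrapolation from the time the event was enabled and the time it was actually running on a counter; the function names are hypothetical:

```python
def scale_count(raw_count, time_enabled, time_running):
    """Extrapolate a multiplexed count to the full enabled time."""
    if time_running == 0:
        return None  # the event never got a counter, nothing to extrapolate
    return raw_count * time_enabled / time_running


def running_percentage(time_enabled, time_running):
    """Share of the enabled time during which the event was really counted."""
    if time_enabled == 0:
        return 0.0
    return 100.0 * time_running / time_enabled


# Example: the event was on a counter for a quarter of the measurement.
print(scale_count(raw_count=500, time_enabled=1000, time_running=250))
print(running_percentage(time_enabled=1000, time_running=250))
```

The extrapolation assumes that the event rate is similar in the time slices that were not measured, so results with a low percentage should be treated with care.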

Child-processes

When monitoring a process, all child processes and threads will also be monitored. This can be avoided by providing the -i, --no-inherit flag.

Event-based Sampling with perf record

With perf record, you can set up event-based sampling. The samples are written to a log file, which is typically processed later with perf report to create a profile. (TODO)

Recording

Profile

Trace

Function Top