Micro benchmarking

From HPC Wiki

Microbenchmarking is about measuring the time or performance of small to very small building blocks of real programs. This can be a common data access pattern, a sequence of operations or even a single instruction.

Introduction

Microbenchmarking is an indispensable tool in performance engineering and fulfills many purposes. Among other things, it:

  • provides upper limits for sustained performance
  • creates knowledge about performance behavior
  • helps find performance bugs in architectures
  • reveals undocumented processor performance properties
  • quantifies the cost of programming model constructs and runtime environments
  • provides input for performance models
  • helps to understand how software interacts with the hardware

One important feature of microbenchmarking is that it is not a black box: it is a tool to create knowledge and a deeper understanding.

Recommended Tools

The difficulty in microbenchmarking is to measure exactly what you are interested in. Because the things you want to measure are usually very small, accurate timing is a problem. Separating the different influences on the result may also be difficult. For example, when implementing a microbenchmark in a programming language, one must ensure that the language does not add overhead that distorts the results. It is therefore usually recommended to use existing benchmarks or tools that make it easier to produce meaningful results.

STREAM benchmark

The STREAM benchmark is the industry standard for measuring node-level sustained memory bandwidth. It is a very simple single-file implementation of streaming loop kernels (Copy, Scale, Add and Triad) and should come close to the attainable memory bandwidth on any architecture. Threading is implemented using OpenMP. For meaningful results, thread affinity must be controlled, i.e. threads must be pinned to cores. Measuring main memory bandwidth is the sole purpose of this benchmark.

likwid-bench

likwid-bench is a benchmarking application and a framework for rapid prototyping of multi-threaded assembly kernels. Adding a new benchmark amounts to creating a simple text file and recompiling. The framework takes care of threaded execution and pinning, data allocation and placement, time measurement, and result presentation. likwid-bench comes with a large collection of architecture-specific optimized kernels for various SIMD instruction set extensions. At the moment it is only available for x86 processors running Linux (support for Arm and POWER9 is in beta).

One main advantage of likwid-bench is that kernels are implemented directly in assembly language, ruling out any influence of higher abstraction layers such as the compiler. This allows processor performance properties to be measured accurately. likwid-bench can be used for all kinds of bandwidth and instruction throughput measurements. The fine-grained control over thread and data placement also makes it possible to measure on-board interconnect bandwidth.

The Bandwidth Benchmark

The Bandwidth Benchmark is a new project whose main focus is to provide a teaching benchmark application that can also serve as the basis for your own developments. It is heavily inspired by John McCalpin's STREAM benchmark (https://www.cs.virginia.edu/stream/). In contrast to STREAM, it has the added benefit that the code is a blueprint for a minimal benchmark application, with a generic Makefile and modules for aligned array allocation, accurate timing and affinity settings. These components can be used standalone in your own benchmark project. Like STREAM, the benchmark is suited to measuring sustained memory bandwidth, but it comes with more streaming loop kernels covering many basic data access patterns.

EPCC OpenMP micro-benchmark suite

Intel MPI Benchmarks

DGEMM (Linpack) benchmark

IOR Parallel filesystem I/O benchmark

Links and further information