Load Imbalance

Description

The "Load Imbalance" pattern describes a common problem in parallelized applications: work is not distributed equally over all processing units, so some units do more work than others. The units with less work finish early and then wait at a synchronization point until the units with more work have finished their tasks.

Symptoms

Saturating or sub-linear speedup as more processing units are added
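
To illustrate why imbalance limits speedup, here is a minimal sketch under simplifying assumptions: all processing units are equally fast, work is given in arbitrary time units, and synchronization itself costs nothing. The function name and the work distributions are hypothetical and only serve the example.

```python
def speedup_with_imbalance(work_per_unit):
    """Speedup over serial execution when all units wait at one barrier."""
    serial_time = sum(work_per_unit)    # a single unit would do all the work
    parallel_time = max(work_per_unit)  # the most loaded unit finishes last
    return serial_time / parallel_time


# Hypothetical distributions of 100 work units over 4 processing units.
balanced = [25, 25, 25, 25]
imbalanced = [40, 20, 20, 20]

print(speedup_with_imbalance(balanced))    # 4.0
print(speedup_with_imbalance(imbalanced))  # 2.5
```

In this model the speedup is bounded by the total work divided by the largest share, no matter how many units are available.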

Detection

How to detect load imbalance depends on what counts as 'work' in the application. If floating-point operations are the smallest unit of work, you can count them per processing unit with hardware performance monitoring tools and compare the counts:

LIKWID with performance groups FLOPS_DP and FLOPS_SP
PAPI with papi_mflops() or PAPI_SP_OPS and PAPI_DP_OPS events
perf offers fp_arith_inst_retired.* events

If other operations are your smallest unit of work and no hardware performance events are available to count them, measure close to the processing units instead and look at the data transfers, since data is the input for your work.

LIKWID with performance groups DATA and L1
PAPI and perf also provide events for load/store counting at each CPU core and data transfers between core and L1 cache
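
Once per-core counts are available from any of the tools above, they can be condensed into a single imbalance figure. The following is a sketch, not a feature of LIKWID, PAPI, or perf: the function names are hypothetical, the counter values are made up, and it assumes the chosen event is a fair proxy for work.

```python
from statistics import mean


def imbalance_factor(counts_per_core):
    """Ratio of the busiest core's count to the average count (1.0 = balanced)."""
    return max(counts_per_core) / mean(counts_per_core)


def wait_fraction(counts_per_core):
    """Average share of the busiest core's work that the cores spend waiting."""
    return 1.0 - mean(counts_per_core) / max(counts_per_core)


# Made-up per-core event counts, e.g. retired floating-point operations.
counts = [8_000_000_000, 4_000_000_000, 4_000_000_000, 4_000_000_000]

print(f"imbalance factor: {imbalance_factor(counts):.2f}")
print(f"wait fraction:    {wait_fraction(counts):.2f}")
```

An imbalance factor close to 1.0 suggests an even distribution; larger values mean that at least one core does noticeably more than the average.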

Possible optimizations and/or fixes

Balance the work over all processing units as evenly as possible.
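
One possible way to balance work, assuming it can be split into independent tasks whose costs are known or can be estimated in advance, is a greedy heuristic: sort the tasks by decreasing cost and always give the next task to the unit with the least load so far. This is only a sketch of one approach; the function name and task costs are hypothetical, and applications with unpredictable task costs may need dynamic work distribution at run time instead.

```python
import heapq


def balance_tasks(task_costs, num_units):
    """Greedily assign tasks, largest first, to the least loaded unit."""
    heap = [(0, unit) for unit in range(num_units)]  # (current load, unit id)
    assignment = [[] for _ in range(num_units)]
    for cost in sorted(task_costs, reverse=True):
        load, unit = heapq.heappop(heap)  # unit with the smallest load so far
        assignment[unit].append(cost)
        heapq.heappush(heap, (load + cost, unit))
    return assignment


# Hypothetical task costs in arbitrary time units.
tasks = [7, 5, 4, 3, 3, 2]

# Splitting the list into two contiguous blocks gives loads of 16 and 8.
naive = [tasks[:3], tasks[3:]]
print([sum(unit) for unit in naive])     # [16, 8]

# The greedy assignment yields equal loads for this input.
balanced = balance_tasks(tasks, 2)
print([sum(unit) for unit in balanced])  # [12, 12]
```

The heuristic does not always find the best possible distribution, but it is cheap and usually much better than splitting the task list into equal-sized blocks.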
