SynchronizationOverhead

From HPC Wiki
[[Category:Performance Pattern]]
 

Latest revision as of 09:30, 5 September 2019

Description

The pattern "Synchronization overhead" describes the performance limitation caused by frequent synchronization calls in parallel environments. At each synchronization point, threads that finish their workload early have to wait for slower threads.
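The waiting described above can be made concrete with a small calculation. The following is a minimal sketch, not taken from the original page: the function name `barrier_idle_time` and the per-thread work times are hypothetical, and the barrier itself is assumed to cost nothing.

```python
def barrier_idle_time(work_times):
    """Return the idle time of each thread at a barrier after one work phase.

    work_times: seconds each thread needs for its share of the work.
    No thread may leave the barrier before the slowest one has arrived.
    """
    slowest = max(work_times)
    return [slowest - t for t in work_times]


# Hypothetical per-thread work times in seconds (assumed, not measured).
times = [1.0, 1.5, 2.0, 4.0]
idle = barrier_idle_time(times)
print(idle)  # [3.0, 2.5, 2.0, 0.0]

# Fraction of the total core time spent waiting in this phase.
print(sum(idle) / (len(times) * max(times)))  # 0.46875
```

The loss is paid again at every synchronization point, so it scales with how often the threads synchronize.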


Symptoms

  • Speedup decreases as more cores are added
  • No speedup for small problem sizes
  • Cores are busy, but floating-point (FP) performance is low
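The first two symptoms can be reproduced with a toy cost model. This is only an illustrative sketch under strong assumptions that the original page does not make: the work is perfectly divisible, and each synchronization costs a time proportional to the number of cores. The function `modeled_speedup` and all parameter values are hypothetical.

```python
def modeled_speedup(work, n_cores, n_syncs, sync_cost_per_core):
    """Speedup of a toy model: divisible work plus synchronization cost.

    ASSUMPTION: one synchronization costs sync_cost_per_core * n_cores seconds.
    Real barrier costs depend on the machine and the runtime.
    """
    parallel_time = work / n_cores + n_syncs * sync_cost_per_core * n_cores
    return work / parallel_time


# Hypothetical numbers: 1000 synchronizations, 0.1 ms per core each.
for cores in (1, 4, 16):
    large = modeled_speedup(1.0, cores, 1000, 1e-4)   # 1 s of work
    small = modeled_speedup(0.01, cores, 1000, 1e-4)  # 10 ms of work
    print(cores, round(large, 3), round(small, 3))
```

In this model the larger problem speeds up at first and then slows down as cores are added, while the small problem never reaches a speedup of 1 because synchronization dominates its runtime.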


Detection

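  • Large non-FP instruction count (growing with the number of cores used)
  • Low (seemingly good) CPI, which is misleading: waiting threads typically execute many cheap instructions without doing useful work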

LIKWID reports the CPI (cycles per instruction) in all performance groups. To obtain the ratio of instructions to FP operations, use the FLOPS_DP or FLOPS_SP group.
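The two detection metrics are simple ratios of counter totals. The sketch below shows the arithmetic only; it does not call LIKWID. The helper names and all counter values are made up for illustration and would in practice be replaced by values read from a measurement.

```python
def cpi(cycles, instructions):
    """Cycles per instruction; lower usually looks better."""
    return cycles / instructions


def instructions_per_flop(instructions, flops):
    """Retired instructions per floating-point operation."""
    return instructions / flops


# Made-up counter totals for the same problem on 2 and 8 cores (assumption).
runs = {
    2: {"cycles": 4.0e9, "instructions": 4.0e9, "flops": 1.0e9},
    8: {"cycles": 8.0e9, "instructions": 16.0e9, "flops": 1.0e9},
}

for cores, c in sorted(runs.items()):
    print(cores,
          cpi(c["cycles"], c["instructions"]),
          instructions_per_flop(c["instructions"], c["flops"]))
```

In this invented data set the CPI improves from 1.0 to 0.5 when going from 2 to 8 cores, while the number of instructions per FP operation grows from 4 to 16. That combination, a better CPI with the same amount of useful FP work, is the signature this pattern describes.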

Possible optimizations and/or fixes

  • Reduce the number of synchronization points
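One way to reduce the number of synchronization points is to synchronize once per block of steps instead of after every step. The sketch below uses Python's `threading.Barrier` to count how often the threads actually synchronize. The function `run` and its parameters are hypothetical, and the approach is only valid under the assumption that the steps between two barriers do not depend on other threads' results.

```python
import threading


def run(n_threads, n_steps, sync_every):
    """Do n_steps of thread-private work, with a barrier every sync_every steps.

    Returns (number of barrier episodes, combined result of all threads).
    """
    episodes = 0

    def count_episode():
        # Called by exactly one thread each time all threads have arrived.
        nonlocal episodes
        episodes += 1

    barrier = threading.Barrier(n_threads, action=count_episode)
    partial = [0] * n_threads  # one private accumulator per thread

    def worker(tid):
        for step in range(1, n_steps + 1):
            partial[tid] += step  # placeholder for real, independent work
            if step % sync_every == 0:
                barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return episodes, sum(partial)


print(run(4, 100, 1))   # (100, 20200): barrier after every step
print(run(4, 100, 25))  # (4, 20200): same result with 25x fewer barriers
```

Both variants compute the same result, but the second one pays the waiting cost only 4 times instead of 100 times.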


Applicable applications or algorithms or kernels