[[Category:Performance Pattern]]
 
== Description ==

The pattern "Latency bound data access" describes cases in which the in-core throughput is limited by data transfer latencies.
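
For illustration, a minimal sketch in C (not taken from the original page): pointer chasing over a linked list is a typical latency-bound access pattern, because the address of each load is only known after the previous load has completed, so the misses cannot overlap and the loop progresses at roughly one memory latency per element.

<syntaxhighlight lang="C">
#include <stddef.h>

/* Node of a linked list scattered over memory. Traversal is latency
 * bound: the address of the next load is only known after the current
 * load has finished, so out-of-order execution cannot overlap the misses. */
typedef struct node {
    struct node *next;
    double payload;
} node_t;

double traverse(const node_t *head)
{
    double sum = 0.0;
    for (const node_t *p = head; p != NULL; p = p->next) {
        sum += p->payload;   /* one dependent load per iteration */
    }
    return sum;
}
</syntaxhighlight>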
  
 
== Symptoms ==

A simple bandwidth model is much too optimistic (see the estimate below).

* Low bandwidth utilization
* Low cache hit ratio
* Frequent evictions and/or replacements of cache lines
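
As a rough numerical illustration (all numbers are assumptions chosen only for this example): traversing 10<sup>7</sup> list elements of 8 bytes each moves 80 MB of data. A bandwidth model assuming 20 GB/s predicts about 4 ms, but if every access is a dependent cache miss of roughly 100 ns latency, the traversal needs about 10<sup>7</sup> &times; 100 ns = 1 s, i.e. more than two orders of magnitude longer than the bandwidth model suggests.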
  
== Detection ==

Use the LIKWID groups CACHE(S), DATA and MEM. The L2, L2CACHE, L3 and L3CACHE groups are also informative, especially concerning cache hit ratios and evictions/replacements of cache lines.
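
A possible measurement with likwid-perfctr (the core list and the executable name are placeholders; the available groups depend on the CPU) could look like:

<pre>
likwid-perfctr -C 0-3 -g L3CACHE ./a.out
</pre>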
  
 
  
  
 
== Possible optimizations and/or fixes ==

You cannot reduce the latency itself, but you can try to keep the data in the caches as long as possible to avoid frequent reloading from high-latency memory levels.
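
One generic way to do this (a sketch only; the kernel and the tile size are illustrative and not taken from the original page) is loop blocking, so that the data of one tile is reused from cache before it can be evicted:

<syntaxhighlight lang="C">
#include <stddef.h>

#define BLOCK 64   /* tile size; a tuning parameter chosen here as an example */

/* Blocked matrix transpose: the loop nest works on BLOCK x BLOCK tiles,
 * so the cache lines touched by the strided accesses to b are reused
 * within the tile instead of being evicted and reloaded later. */
void transpose_blocked(size_t n, const double *a, double *b)
{
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            for (size_t i = ii; i < ii + BLOCK && i < n; i++) {
                for (size_t j = jj; j < jj + BLOCK && j < n; j++) {
                    b[j * n + i] = a[i * n + j];
                }
            }
        }
    }
}
</syntaxhighlight>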
  
  
 
== Applicable applications or algorithms or kernels ==
