The pattern "Latency bound data access" describes cases in which in-core throughput is limited by data transfer latencies rather than by the available transfer bandwidth.
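As an illustration of the kind of access this pattern covers, the following minimal sketch contrasts a streaming loop with a dependent index chase. Both functions and their names are hypothetical and not taken from LIKWID; the assumption is simply that a load whose address depends on the previous load cannot be overlapped with it, so each step pays the full access latency.

```c
#include <stddef.h>

/* Streaming access: the addresses are known in advance and unit-stride,
 * so loads can overlap and hardware prefetchers can help. */
double sum_sequential(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i];
    return sum;
}

/* Dependent access: the next index is only known once the current load
 * has completed, so the loads are serialized. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t idx = start;
    for (size_t s = 0; s < steps; ++s)
        idx = next[idx];
    return idx;
}
```

If `next` describes a random permutation over an array much larger than the caches, a loop like `chase` would be expected to show the signatures listed below.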
A simple bandwidth model is much too optimistic for such code. Typical signatures are:
- Low bandwidth utilization
- Low cache hit ratio
- Frequent evictions and/or replacements of cache lines in the caches
Use the LIKWID groups CACHE(S), DATA and MEM. The L2, L2CACHE, L3 and L3CACHE groups are also informative, especially regarding the cache hit ratio and the evictions/replacements of cache lines (CLs).
Possible optimizations and/or fixes
You cannot reduce the latency itself, but you can try to keep the data in the caches for as long as possible to avoid frequent reloads from high-latency units such as main memory.
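One common way to keep data in cache longer is loop blocking (tiling). The sketch below applies it to a matrix transpose; the technique, the function names and the `TILE` value are assumptions chosen for illustration, not something the text above prescribes, and the best tile size depends on the cache sizes of the target machine.

```c
#include <stddef.h>

/* Hypothetical tile edge length; tune it so that a source tile and a
 * destination tile fit together in the targeted cache level. */
#define TILE 32

/* Naive transpose: b is written with stride n, so for large n nearly
 * every store touches a different cache line. */
void transpose_naive(size_t n, const double *a, double *b)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            b[j * n + i] = a[i * n + j];
}

/* Blocked transpose: works on TILE x TILE sub-matrices so that the cache
 * lines of both tiles are reused before they are evicted. */
void transpose_blocked(size_t n, const double *a, double *b)
{
    for (size_t ii = 0; ii < n; ii += TILE) {
        size_t i_end = (ii + TILE < n) ? ii + TILE : n;
        for (size_t jj = 0; jj < n; jj += TILE) {
            size_t j_end = (jj + TILE < n) ? jj + TILE : n;
            for (size_t i = ii; i < i_end; ++i)
                for (size_t j = jj; j < j_end; ++j)
                    b[j * n + i] = a[i * n + j];
        }
    }
}
```

Both functions compute the same result; whether the blocked variant actually helps can be checked by comparing the hit ratios and eviction counts of the two versions with the groups named above.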