Difference between revisions of "CodeCompositionIneffective"

From HPC Wiki

Latest revision as of 16:17, 3 September 2019

Description

The pattern "Code composition - inefficient instructions" describes the use of one kind of instruction where a more efficient kind exists, e.g. scalar instead of vectorized FP instructions.

Symptoms

The symptoms resemble those of Instruction Overhead.

For a piece of high-level code, the compiler emits many instructions although fewer would suffice. A common example is scalar (non-vectorized) instructions in a loop that could be vectorized.


Detection

For code with FP arithmetic, check whether scalar instructions dominate in data-parallel loops.

LIKWID groups: FLOPS_DP, FLOPS_SP (see the vectorization ratio metric)
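
To make the vectorization ratio concrete, here is a minimal sketch that assumes it is the share of packed (SIMD) FP instructions among all FP instructions, in percent. The function name and the two counter arguments are hypothetical; the actual event names depend on the architecture and are listed in the LIKWID group definition.

```c
#include <assert.h>

/* Percentage of packed (SIMD) FP instructions among all FP instructions.
 * Both arguments are hypothetical counter readings, e.g. taken from a
 * measurement with the FLOPS_DP group. Returns 0 if no FP instruction
 * was counted. */
double vectorization_ratio(double scalar_instr, double packed_instr)
{
    assert(scalar_instr >= 0.0 && packed_instr >= 0.0);
    double total = scalar_instr + packed_instr;
    return total > 0.0 ? 100.0 * packed_instr / total : 0.0;
}
```

A value close to 0 in a data-parallel loop suggests that the loop runs mostly scalar code and is a candidate for this pattern.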


Possible optimizations and/or fixes

  • Give the compiler hints so that it can use the more efficient instructions (e.g. #pragma simd)
  • Reorganize the data access pattern
  • Avoid loop-carried dependencies
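
The hints above can be sketched in C as follows. This is an illustration under stated assumptions, not a recipe: it uses the OpenMP form #pragma omp simd rather than the Intel-specific #pragma simd, and the function names axpy and sum are made up for the example. Whether a compiler actually vectorizes these loops depends on the compiler, its flags and the target architecture.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += a * x[i]. The restrict qualifiers promise that x and y do not
 * overlap, which removes a possible dependency between iterations. */
void axpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    assert(x != NULL && y != NULL);
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Sum of all elements. The accumulator s is a loop-carried dependency;
 * declaring it as a reduction allows the compiler to keep partial sums
 * in SIMD lanes and combine them after the loop. */
double sum(size_t n, const double *x)
{
    assert(x != NULL);
    double s = 0.0;
    #pragma omp simd reduction(+:s)
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
```

With GCC or Clang the pragmas are honored when compiling with -fopenmp-simd (or -fopenmp); without such a flag they are ignored and the code still computes the same result. Note that a SIMD reduction may add the elements in a different order than the scalar loop, so floating-point results can differ slightly in the last digits.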


Applicable applications or algorithms or kernels