Difference between revisions of "CodeCompositionIneffective"
Latest revision as of 15:17, 3 September 2019
Description
The pattern "Code composition - inefficient instructions" describes code that uses one kind of instruction although a more efficient kind exists, e.g. scalar instead of vectorized FP instructions.
Symptoms
Similar to the Instruction Overhead pattern.
For a piece of high-level code, the compiler emits many instructions although the same work could be done with fewer. A common example is non-vectorized (scalar) code.
Detection
For code with FP arithmetic, check whether scalar instructions dominate in data-parallel loops.
LIKWID groups: FLOPS_DP, FLOPS_SP (see vectorization ratio)
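As a rough illustration of what a vectorization ratio expresses, the following sketch computes the share of packed (SIMD) FP instructions among all counted FP instructions. It assumes the ratio is defined as packed / (scalar + packed) in percent; the exact events and formula LIKWID uses depend on the architecture. The struct `fp_counts` and its field names are hypothetical and are not LIKWID identifiers.

```c
#include <stdint.h>

/* Hypothetical container for raw counter values; the field names are
   illustrative only and do not correspond to LIKWID event names. */
typedef struct {
    uint64_t scalar_ops; /* retired scalar FP instructions */
    uint64_t packed_ops; /* retired packed (SIMD) FP instructions */
} fp_counts;

/* Percentage of counted FP instructions that were packed.
   Returns 0 when no FP instructions were counted. */
double vectorization_ratio(fp_counts c)
{
    uint64_t total = c.scalar_ops + c.packed_ops;
    if (total == 0) {
        return 0.0;
    }
    return 100.0 * (double)c.packed_ops / (double)total;
}
```

A value near 0 for a data-parallel loop suggests that scalar instructions dominate; a value near 100 suggests the loop was vectorized.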
Possible optimizations and/or fixes
- Give the compiler hints so that it can use the more efficient instructions (e.g. #pragma simd).
- Reorganize the access pattern, e.g. so that loops access memory with unit stride.
- Avoid loop-carried dependencies.
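The fixes listed above can be sketched in C as follows. This is a minimal example under a few assumptions: it uses the OpenMP `#pragma omp simd` directive in place of the compiler-specific `#pragma simd`, it uses `restrict` to tell the compiler that the arrays do not overlap, and the function names `axpy` and `dot` are chosen for illustration only. Whether the compiler actually emits vector instructions depends on the compiler, its flags (e.g. enabling OpenMP SIMD support) and the target architecture.

```c
#include <stddef.h>

/* Unit-stride loop with independent iterations. The restrict qualifiers
   promise that x and y do not alias, and the pragma asks the compiler
   to vectorize the loop. */
void axpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* The accumulation into sum is a loop-carried dependency. The reduction
   clause states that partial sums may be formed in SIMD lanes and
   combined after the loop. */
double dot(size_t n, const double *restrict x, const double *restrict y)
{
    double sum = 0.0;
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

If the pragmas are not enabled, the compiler ignores them and the functions still compute the same results, so the hints can be added without changing program semantics. Note that a vectorized reduction may sum in a different order, which can change floating-point rounding slightly.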