Difference between revisions of "InstructionOverhead"
Latest revision as of 07:21, 4 September 2019
Description
The pattern "Instruction Overhead" describes the situation where the compiler emits many more instructions for a piece of high-level code than are actually needed to do the work. One common example is non-vectorized (scalar) code.
Symptoms
Instruction Overhead causes low application performance combined with good scaling behavior across cores. The performance is insensitive to the problem size.
Detection
- Low CPI (cycles per instruction) value, close to the theoretical limit
- Large non-FP instruction count (constant with respect to the number of cores)
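Both indicators above can be derived from raw hardware counter readings. The following is a minimal sketch, assuming you already have totals for cycles, retired instructions, and floating-point instructions from a profiling tool. The function names and the example numbers are hypothetical and not taken from any particular tool.

```c
#include <stdint.h>

/* Cycles per instruction from two raw counter totals.
 * Returns 0.0 if no instructions were counted. */
double cpi(uint64_t cycles, uint64_t instructions)
{
    if (instructions == 0)
        return 0.0;
    return (double)cycles / (double)instructions;
}

/* Fraction of retired instructions that are NOT floating-point.
 * Returns 0.0 for inconsistent input (no instructions, or more
 * FP instructions than total instructions). */
double non_fp_fraction(uint64_t instructions, uint64_t fp_instructions)
{
    if (instructions == 0 || fp_instructions > instructions)
        return 0.0;
    return (double)(instructions - fp_instructions) / (double)instructions;
}
```

For illustration only: on a hypothetical core that can retire four instructions per cycle, the lowest possible CPI is 0.25, so `cpi(1000, 4000)` sits exactly at that limit. A value of `non_fp_fraction` close to 1 for a numerical kernel would then point toward this pattern. The actual limit depends on the microarchitecture.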
Possible optimizations and/or fixes
The appropriate fix depends on the kind of instructions involved. If the code uses scalar FP instructions, enable vectorization to reduce the number of instructions.
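As a sketch of what "enabling vectorization" can look like in practice, the loop below is written so that a compiler has a good chance of vectorizing it. The kernel (`axpy`) is a hypothetical example, not one named by this page. The `restrict` qualifiers are an assumption that the two arrays never overlap; if they do, the qualifiers must be removed.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for i in [0, n).
 * 'restrict' promises the compiler that x and y do not alias,
 * which removes one common obstacle to auto-vectorization. */
void axpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

With GCC, for example, auto-vectorization is enabled at `-O3`, and `-fopt-info-vec` prints which loops were vectorized. Other compilers use different flags and reports. Whether the loop is actually vectorized should be confirmed from the compiler report or the generated assembly rather than assumed.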