MicroArchitecturalAnomalies
Latest revision as of 07:25, 4 September 2019
Description
The pattern 'Microarchitectural anomalies' describes performance-limiting factors that are caused by the hardware design. For example, if your system has a single load unit that can issue one load per cycle, code that performs many loads may be limited by that design decision. Other cases are penalties incurred in specific situations, such as a mispredicted branch, which forces the pipeline to be flushed and refilled.
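The load-unit case can be made concrete with a small throughput bound. The following is a minimal sketch, not a full performance model: the function name, its parameters, and the port counts are assumptions chosen for illustration and do not describe any particular processor.

```python
def min_cycles_per_iteration(loads, stores, arith_ops,
                             load_ports=1, store_ports=1, arith_ports=1):
    """Lower bound on cycles per loop iteration from port throughput alone.

    Each port is assumed to accept one operation per cycle, so the
    busiest group of ports determines the bound.
    """
    return max(loads / load_ports,
               stores / store_ports,
               arith_ops / arith_ports)


# Hypothetical loop body: 3 loads, 1 store, 2 arithmetic operations,
# executed on a core with a single load unit.
bound = min_cycles_per_iteration(loads=3, stores=1, arith_ops=2)
print(bound)  # 3.0, the single load unit is the bottleneck
```

With these assumed numbers, adding a second load unit (`load_ports=2`) would move the bottleneck to the arithmetic operations.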
Symptoms
The symptoms vary, but in the most general terms: a large deviation of the measured performance from a performance model based on load/store and arithmetic throughput.
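One way to put a number on such a deviation is to compare measured cycles against the model prediction. This is only a sketch: the function names and the tolerance of 1.2 are assumptions for illustration, and a sensible threshold depends on how accurate the model is for the code at hand.

```python
def model_discrepancy(measured_cycles, predicted_cycles):
    """Ratio of measured to predicted cycles (1.0 means the model is met)."""
    return measured_cycles / predicted_cycles


def looks_anomalous(measured_cycles, predicted_cycles, tolerance=1.2):
    """Flag a measurement that is slower than the model by more than
    the given factor. The default tolerance is an arbitrary assumption."""
    return model_discrepancy(measured_cycles, predicted_cycles) > tolerance


# Hypothetical numbers: the model predicts 3.0 cycles per iteration.
print(looks_anomalous(4.5, 3.0))  # True, 50% slower than predicted
print(looks_anomalous(3.2, 3.0))  # False, within the tolerance
```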
Detection
Because these anomalies can occur anywhere on the chip, there is no single way to detect them. Useful starting points:
- Mispredicted branches: LIKWID group BRANCH
- Stalls: all events that match *STALL*, such as RESOURCE_STALLS_RS (stalls at the reservation station), RESOURCE_STALLS_SB (stalls due to the store buffer), ...
- Penalties: all events that match *CYCLES*, such as UOPS_ISSUED_STALL_CYCLES, UOPS_EXECUTED_STALL_CYCLES and UOPS_RETIRED_STALL_CYCLES
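The wildcard matching in the list above can be sketched with Python's standard `fnmatch` module. The event list here is hard-coded from the names mentioned in this article plus one unrelated event for contrast; in practice it would come from the event list of your own machine, and the helper name `select_events` is an assumption for illustration.

```python
from fnmatch import fnmatchcase

# Event names taken from the list above, plus INSTR_RETIRED_ANY as an
# illustrative non-matching event. Real lists are architecture specific.
EVENTS = [
    "RESOURCE_STALLS_RS",
    "RESOURCE_STALLS_SB",
    "UOPS_ISSUED_STALL_CYCLES",
    "UOPS_EXECUTED_STALL_CYCLES",
    "UOPS_RETIRED_STALL_CYCLES",
    "INSTR_RETIRED_ANY",
]


def select_events(events, pattern):
    """Return the event names that match a shell-style pattern."""
    return [name for name in events if fnmatchcase(name, pattern)]


stall_events = select_events(EVENTS, "*STALL*")
cycle_events = select_events(EVENTS, "*CYCLES*")
print(len(stall_events))  # 5
print(len(cycle_events))  # 3
```

Note that the UOPS_*_STALL_CYCLES events match both patterns, so the two categories overlap.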
Possible optimizations and/or fixes
If a workaround exists that does not itself reduce the performance of your code, try it.