Likwid
Latest revision as of 12:09, 19 July 2024
LIKWID is a tool suite for performance-oriented programmers and administrators. The name LIKWID stands for 'Like I Knew What I'm Doing'.
General
LIKWID provides a set of helpful tools for analysis of systems and applications:
- likwid-topology: Show system topology ranging from thread topology to cache and finally to NUMA topology
- likwid-pin: Pin application threads to specified CPUs
- likwid-perfctr: Measure hardware counters for an application and show derived metrics
- likwid-powermeter: Measure energy consumption of an application
- likwid-bench: Microbenchmarking suite running hand-tuned assembly benchmarks
- likwid-setFrequencies: Manipulate CPU and Uncore frequencies
- likwid-features: Manipulate hardware features (e.g. (de)activate prefetchers)
- likwid-memsweeper: Sweep the memory of NUMA domains and evict cache lines from the last-level cache
- likwid-perfscope: Similar to likwid-perfctr but provides live-plotting of the measured values
- likwid-mpirun: MPI wrapper for likwid-pin and likwid-perfctr
likwid-topology
$ likwid-topology
--------------------------------------------------------------------------------
CPU name: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
CPU type: Intel Core Haswell processor
CPU stepping: 3
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets: 1
Cores per socket: 4
Threads per core: 2
--------------------------------------------------------------------------------
HWThread Thread Core Socket Available
0 0 0 0 *
1 0 1 0 *
2 0 2 0 *
3 0 3 0 *
4 1 0 0 *
5 1 1 0 *
6 1 2 0 *
7 1 3 0 *
--------------------------------------------------------------------------------
Socket 0: ( 0 4 1 5 2 6 3 7 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level: 1
Size: 32 kB
Cache groups: ( 0 4 ) ( 1 5 ) ( 2 6 ) ( 3 7 )
--------------------------------------------------------------------------------
Level: 2
Size: 256 kB
Cache groups: ( 0 4 ) ( 1 5 ) ( 2 6 ) ( 3 7 )
--------------------------------------------------------------------------------
Level: 3
Size: 8 MB
Cache groups: ( 0 4 1 5 2 6 3 7 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains: 1
--------------------------------------------------------------------------------
Domain: 0
Processors: ( 0 4 1 5 2 6 3 7 )
Distances: 10
Free memory: 791.02 MB
Total memory: 7867 MB
--------------------------------------------------------------------------------
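The hardware thread table above can be read as a mapping from physical cores to the hardware threads (SMT siblings) that share them. The following is a minimal Python sketch of that reading, assuming the table rows are pasted in as text; the helper name `threads_per_core` is hypothetical and not part of LIKWID.

```python
from collections import defaultdict

# Rows copied from the "HWThread Thread Core Socket Available" table above.
TABLE = """\
0 0 0 0 *
1 0 1 0 *
2 0 2 0 *
3 0 3 0 *
4 1 0 0 *
5 1 1 0 *
6 1 2 0 *
7 1 3 0 *
"""


def threads_per_core(table):
    """Group hardware thread IDs by (socket, core) (hypothetical helper)."""
    cores = defaultdict(list)
    for line in table.splitlines():
        fields = line.split()
        # Skip anything that is not a data row of the table.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        hwthread, _thread, core, socket = (int(f) for f in fields[:4])
        cores[(socket, core)].append(hwthread)
    return dict(cores)


siblings = threads_per_core(TABLE)
for (socket, core), hwthreads in sorted(siblings.items()):
    print(f"socket {socket}, core {core}: hardware threads {hwthreads}")
```

For this machine the grouping pairs hardware threads 0 and 4, 1 and 5, and so on, which matches the L1 and L2 cache groups in the output: the two hardware threads of a core share its private caches.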
likwid-pin
$ likwid-pin -c <cpu-selection> <application>
$ likwid-pin -c 0,1,2 Work/popen-intel-test/testomp
[pthread wrapper]
[pthread wrapper] MAIN -> 0
[pthread wrapper] PIN_MASK: 0->1 1->2
[pthread wrapper] SKIP MASK: 0x0
threadid 139924335359744 -> core 1 - OK
threadid 139924326967040 -> core 2 - OK
Hello World from thread 0 (CPU 0)
Hello World from thread 2 (CPU 2)
Hello World from thread 1 (CPU 1)
Number of threads = 3
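In the example above, the CPU selection `0,1,2` is a plain list of hardware thread IDs. As an illustration of how such a list can be interpreted, here is a small Python sketch that expands comma-separated IDs and simple ranges such as `0-2`. It is not LIKWID's own parser, the function name `expand_cpu_list` is hypothetical, and LIKWID's domain-based selection expressions are deliberately not handled.

```python
def expand_cpu_list(selection):
    """Expand a selection like "0,1,2" or "0-2" into a list of IDs.

    Hypothetical helper for illustration only; it handles plain IDs and
    simple ranges, nothing else.
    """
    cpus = []
    for part in selection.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            cpus.extend(range(first, last + 1))  # ranges are inclusive
        else:
            cpus.append(int(part))
    return cpus


print(expand_cpu_list("0,1,2"))
print(expand_cpu_list("0-2"))
```

Both calls yield the same three IDs. In the transcript above, the main thread is placed on the first ID and the two additional threads on the remaining ones, as the `MAIN` and `PIN_MASK` lines show.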
likwid-perfctr
$ likwid-perfctr -C <cpu-selection> -g <eventset/group> <application>
$ likwid-perfctr -C 0,1,2 -g DATA Work/popen-intel-test/testomp
--------------------------------------------------------------------------------
CPU name: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
CPU type: Intel Core Haswell processor
CPU clock: 3.39 GHz
--------------------------------------------------------------------------------
Hello World from thread 0 (CPU 0)
Hello World from thread 2 (CPU 2)
Hello World from thread 1 (CPU 1)
Number of threads = 3
--------------------------------------------------------------------------------
Group 1: DATA
+-------------------------+---------+----------+----------+---------+
| Event | Counter | Core 0 | Core 1 | Core 2 |
+-------------------------+---------+----------+----------+---------+
| INSTR_RETIRED_ANY | FIXC0 | 4303562 | 2545665 | 1253780 |
| CPU_CLK_UNHALTED_CORE | FIXC1 | 4236458 | 5044024 | 1519026 |
| CPU_CLK_UNHALTED_REF | FIXC2 | 12003768 | 14291288 | 4304060 |
| MEM_UOPS_RETIRED_LOADS | PMC0 | 1057917 | 414152 | 362871 |
| MEM_UOPS_RETIRED_STORES | PMC1 | 407235 | 42647 | 180565 |
| UOPS_RETIRED_ALL | PMC2 | 5282522 | 4780311 | 1496133 |
+-------------------------+---------+----------+----------+---------+
+------------------------------+---------+----------+---------+----------+--------------+
| Event | Counter | Sum | Min | Max | Avg |
+------------------------------+---------+----------+---------+----------+--------------+
| INSTR_RETIRED_ANY STAT | FIXC0 | 8103007 | 1253780 | 4303562 | 2.701002e+06 |
| CPU_CLK_UNHALTED_CORE STAT | FIXC1 | 10799508 | 1519026 | 5044024 | 3599836 |
| CPU_CLK_UNHALTED_REF STAT | FIXC2 | 30599116 | 4304060 | 14291288 | 1.019971e+07 |
| MEM_UOPS_RETIRED_LOADS STAT | PMC0 | 1834940 | 362871 | 1057917 | 611646.6667 |
| MEM_UOPS_RETIRED_STORES STAT | PMC1 | 630447 | 42647 | 407235 | 210149 |
| UOPS_RETIRED_ALL STAT | PMC2 | 11558966 | 1496133 | 5282522 | 3.852989e+06 |
+------------------------------+---------+----------+---------+----------+--------------+
+----------------------+-----------+-----------+-----------+
| Metric | Core 0 | Core 1 | Core 2 |
+----------------------+-----------+-----------+-----------+
| Runtime (RDTSC) [s] | 0.0123 | 0.0123 | 0.0123 |
| Runtime unhalted [s] | 0.0012 | 0.0015 | 0.0004 |
| Clock [MHz] | 1197.1911 | 1197.2475 | 1197.1954 |
| CPI | 0.9844 | 1.9814 | 1.2116 |
| Load to store ratio | 2.5978 | 9.7112 | 2.0096 |
| Load ratio | 0.2003 | 0.0866 | 0.2425 |
| Store ratio | 0.0771 | 0.0089 | 0.1207 |
+----------------------+-----------+-----------+-----------+
+---------------------------+-----------+-----------+-----------+-----------+
| Metric | Sum | Min | Max | Avg |
+---------------------------+-----------+-----------+-----------+-----------+
| Runtime (RDTSC) [s] STAT | 0.0369 | 0.0123 | 0.0123 | 0.0123 |
| Runtime unhalted [s] STAT | 0.0031 | 0.0004 | 0.0015 | 0.0010 |
| Clock [MHz] STAT | 3591.6340 | 1197.1911 | 1197.2475 | 1197.2113 |
| CPI STAT | 4.1774 | 0.9844 | 1.9814 | 1.3925 |
| Load to store ratio STAT | 14.3186 | 2.0096 | 9.7112 | 4.7729 |
| Load ratio STAT | 0.5294 | 0.0866 | 0.2425 | 0.1765 |
| Store ratio STAT | 0.2067 | 0.0089 | 0.1207 | 0.0689 |
+---------------------------+-----------+-----------+-----------+-----------+
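The derived metrics in the tables above follow from the raw event counts. The Python sketch below recomputes some of them for the run shown here. The formulas are an assumption inferred from the numbers in the output (for example, CPI as unhalted core cycles divided by retired instructions); the authoritative definitions are in the performance group file that ships with LIKWID for the respective architecture. The helper names `derived_metrics` and `stat` are hypothetical.

```python
# Raw per-core counts copied from the "Group 1: DATA" event table
# (list order: Core 0, Core 1, Core 2).
COUNTS = {
    "INSTR_RETIRED_ANY": [4303562, 2545665, 1253780],
    "CPU_CLK_UNHALTED_CORE": [4236458, 5044024, 1519026],
    "MEM_UOPS_RETIRED_LOADS": [1057917, 414152, 362871],
    "MEM_UOPS_RETIRED_STORES": [407235, 42647, 180565],
    "UOPS_RETIRED_ALL": [5282522, 4780311, 1496133],
}


def derived_metrics(counts, core):
    """Recompute derived metrics for one core (formulas are assumptions)."""
    instr = counts["INSTR_RETIRED_ANY"][core]
    cycles = counts["CPU_CLK_UNHALTED_CORE"][core]
    loads = counts["MEM_UOPS_RETIRED_LOADS"][core]
    stores = counts["MEM_UOPS_RETIRED_STORES"][core]
    uops = counts["UOPS_RETIRED_ALL"][core]
    return {
        "CPI": cycles / instr,
        "Load to store ratio": loads / stores,
        "Load ratio": loads / uops,
        "Store ratio": stores / uops,
    }


def stat(values):
    """Sum, minimum, maximum and average across cores, as in the STAT rows."""
    return {
        "Sum": sum(values),
        "Min": min(values),
        "Max": max(values),
        "Avg": sum(values) / len(values),
    }


for core in range(3):
    metrics = derived_metrics(COUNTS, core)
    line = ", ".join(f"{name} = {value:.4f}" for name, value in metrics.items())
    print(f"Core {core}: {line}")

print(stat(COUNTS["INSTR_RETIRED_ANY"]))
```

Rounded to four decimal places, the recomputed values agree with the per-core metric table, and the statistics for `INSTR_RETIRED_ANY` agree with the corresponding STAT row.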
For further tips on how to use LIKWID, see the Likwid Wiki.