Likwid

From HPC Wiki
Revision as of 08:24, 4 September 2019 by Daniel-schurhoff-de23@rwth-aachen.de

LIKWID is a tool suite for performance-oriented programmers and administrators. The name LIKWID stands for 'Like I Knew What I'm Doing'.

General

LIKWID provides the following tools for analyzing systems and applications:

  • likwid-topology: Show the system topology, from hardware threads through the cache hierarchy to NUMA domains
  • likwid-pin: Pin application threads to specified CPUs
  • likwid-perfctr: Measure hardware counters for an application and show derived metrics
  • likwid-powermeter: Measure energy consumption of an application
  • likwid-bench: Microbenchmarking suite running hand-tuned assembly benchmarks
  • likwid-setFrequencies: Manipulate CPU and Uncore frequencies
  • likwid-features: Toggle hardware features (e.g. enable or disable hardware prefetchers)
  • likwid-memsweeper: Clean the last-level cache (L3) and NUMA domains
  • likwid-perfscope: Like likwid-perfctr, but with live plotting of the measured values
  • likwid-mpirun: Wrapper for starting MPI applications with likwid-pin and likwid-perfctr

likwid-topology

$ likwid-topology
--------------------------------------------------------------------------------
CPU name:	Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
CPU type:	Intel Core Haswell processor
CPU stepping:	3
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets:		1
Cores per socket:	4
Threads per core:	2
--------------------------------------------------------------------------------
HWThread	Thread		Core		Socket		Available
0		0		0		0		*
1		0		1		0		*
2		0		2		0		*
3		0		3		0		*
4		1		0		0		*
5		1		1		0		*
6		1		2		0		*
7		1		3		0		*
--------------------------------------------------------------------------------
Socket 0:		( 0 4 1 5 2 6 3 7 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level:			1
Size:			32 kB
Cache groups:		( 0 4 ) ( 1 5 ) ( 2 6 ) ( 3 7 )
--------------------------------------------------------------------------------
Level:			2
Size:			256 kB
Cache groups:		( 0 4 ) ( 1 5 ) ( 2 6 ) ( 3 7 )
--------------------------------------------------------------------------------
Level:			3
Size:			8 MB
Cache groups:		( 0 4 1 5 2 6 3 7 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains:		1
--------------------------------------------------------------------------------
Domain:			0
Processors:		( 0 4 1 5 2 6 3 7 )
Distances:		10
Free memory:		791.02 MB
Total memory:		7867 MB
--------------------------------------------------------------------------------
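On this machine, each L1 and L2 cache group is exactly the pair of hardware threads that share a physical core (the SMT siblings). The "Socket 0" line lists the threads core by core. The following minimal Python sketch rebuilds those groups from the thread table. It assumes only the column order HWThread, Thread, Core, Socket shown above. The helper names are hypothetical and are not part of LIKWID.

```python
# Hardware thread table copied from the likwid-topology output above:
# HWThread  Thread  Core  Socket  Available
THREAD_TABLE = """\
0 0 0 0 *
1 0 1 0 *
2 0 2 0 *
3 0 3 0 *
4 1 0 0 *
5 1 1 0 *
6 1 2 0 *
7 1 3 0 *
"""


def parse_threads(text):
    """Return (hwthread, thread, core, socket) tuples, ignoring the Available column."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            rows.append(tuple(int(f) for f in fields[:4]))
    return rows


def smt_groups(rows):
    """Group hardware threads sharing a physical core, ordered by (socket, core), then SMT thread."""
    by_core = {}
    for hw, thread, core, socket in rows:
        by_core.setdefault((socket, core), []).append((thread, hw))
    return [[hw for _, hw in sorted(members)] for _, members in sorted(by_core.items())]


def socket_order(rows, socket=0):
    """Flatten the SMT groups of one socket, like the 'Socket 0:' line."""
    on_socket = [r for r in rows if r[3] == socket]
    return [hw for group in smt_groups(on_socket) for hw in group]


rows = parse_threads(THREAD_TABLE)
print(smt_groups(rows))    # [[0, 4], [1, 5], [2, 6], [3, 7]]
print(socket_order(rows))  # [0, 4, 1, 5, 2, 6, 3, 7]
```

Taking the first entry of each group (HWThreads 0, 1, 2, 3) yields one thread per physical core. This is a common CPU selection for likwid-pin when SMT siblings should not share a core.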

likwid-pin

$ likwid-pin -c <cpu-selection> <application>
$ likwid-pin -c 0,1,2 Work/popen-intel-test/testomp
[pthread wrapper] 
[pthread wrapper] MAIN -> 0
[pthread wrapper] PIN_MASK: 0->1  1->2  
[pthread wrapper] SKIP MASK: 0x0
	threadid 139924335359744 -> core 1 - OK
	threadid 139924326967040 -> core 2 - OK
Hello World from thread 0 (CPU 0)
Hello World from thread 2 (CPU 2)
Hello World from thread 1 (CPU 1)
Number of threads = 3
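The [pthread wrapper] lines show how the selection 0,1,2 was applied. The main thread (MAIN) stays on CPU 0, and the two threads it creates (indices 0 and 1 in PIN_MASK) go to CPUs 1 and 2. The minimal Python sketch below reproduces that assignment under a few assumptions:

- It handles plain CPU IDs and inclusive ranges only. likwid-pin accepts further selection syntaxes, which are not parsed here.
- It ignores the skip mask (0x0 here).
- Its function names are hypothetical.

```python
def expand_cpu_list(expr):
    """Expand a plain CPU list such as "0,1,2" or "0-3,6" into a list of IDs.

    Sketch only: comma-separated IDs and inclusive ranges, nothing else.
    """
    cpus = []
    for part in expr.split(","):
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
            cpus.extend(range(first, last + 1))
        else:
            cpus.append(int(part))
    return cpus


def pin_plan(cpus, num_threads):
    """Return (main_cpu, {created_thread_index: cpu}), assigning CPUs in list order.

    Mirrors the "MAIN -> 0" and "PIN_MASK: 0->1  1->2" lines above. Raising an
    error when there are more threads than CPUs is a choice of this sketch,
    not documented LIKWID behavior.
    """
    if not 1 <= num_threads <= len(cpus):
        raise ValueError("need between 1 and len(cpus) threads")
    main_cpu = cpus[0]
    created = {i: cpus[i + 1] for i in range(num_threads - 1)}
    return main_cpu, created


print(pin_plan(expand_cpu_list("0,1,2"), 3))  # (0, {0: 1, 1: 2})
```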

likwid-perfctr

$ likwid-perfctr -C <cpu-selection> -g <eventset/group> <application>
$ likwid-perfctr -C 0,1,2 -g DATA Work/popen-intel-test/testomp
--------------------------------------------------------------------------------
CPU name:	Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
CPU type:	Intel Core Haswell processor
CPU clock:	3.39 GHz
--------------------------------------------------------------------------------
Hello World from thread 0 (CPU 0)
Hello World from thread 2 (CPU 2)
Hello World from thread 1 (CPU 1)
Number of threads = 3
--------------------------------------------------------------------------------
Group 1: DATA
+-------------------------+---------+----------+----------+---------+
|          Event          | Counter |  Core 0  |  Core 1  |  Core 2 |
+-------------------------+---------+----------+----------+---------+
|    INSTR_RETIRED_ANY    |  FIXC0  |  4303562 |  2545665 | 1253780 |
|  CPU_CLK_UNHALTED_CORE  |  FIXC1  |  4236458 |  5044024 | 1519026 |
|   CPU_CLK_UNHALTED_REF  |  FIXC2  | 12003768 | 14291288 | 4304060 |
|  MEM_UOPS_RETIRED_LOADS |   PMC0  |  1057917 |   414152 |  362871 |
| MEM_UOPS_RETIRED_STORES |   PMC1  |   407235 |    42647 |  180565 |
|     UOPS_RETIRED_ALL    |   PMC2  |  5282522 |  4780311 | 1496133 |
+-------------------------+---------+----------+----------+---------+

+------------------------------+---------+----------+---------+----------+--------------+
|             Event            | Counter |    Sum   |   Min   |    Max   |      Avg     |
+------------------------------+---------+----------+---------+----------+--------------+
|    INSTR_RETIRED_ANY STAT    |  FIXC0  |  8103007 | 1253780 |  4303562 | 2.701002e+06 |
|  CPU_CLK_UNHALTED_CORE STAT  |  FIXC1  | 10799508 | 1519026 |  5044024 |      3599836 |
|   CPU_CLK_UNHALTED_REF STAT  |  FIXC2  | 30599116 | 4304060 | 14291288 | 1.019971e+07 |
|  MEM_UOPS_RETIRED_LOADS STAT |   PMC0  |  1834940 |  362871 |  1057917 |  611646.6667 |
| MEM_UOPS_RETIRED_STORES STAT |   PMC1  |   630447 |   42647 |   407235 |       210149 |
|     UOPS_RETIRED_ALL STAT    |   PMC2  | 11558966 | 1496133 |  5282522 | 3.852989e+06 |
+------------------------------+---------+----------+---------+----------+--------------+
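Each STAT row reduces one event over the measured cores: Sum, Min, Max and the arithmetic mean (Avg). The short Python sketch below reproduces these rows from the per-core counts in the first table. The dictionary and function names are illustrative.

```python
# Per-core raw counts (Core 0, Core 1, Core 2) from the DATA group table above.
COUNTS = {
    "INSTR_RETIRED_ANY": [4303562, 2545665, 1253780],
    "CPU_CLK_UNHALTED_CORE": [4236458, 5044024, 1519026],
    "CPU_CLK_UNHALTED_REF": [12003768, 14291288, 4304060],
    "MEM_UOPS_RETIRED_LOADS": [1057917, 414152, 362871],
    "MEM_UOPS_RETIRED_STORES": [407235, 42647, 180565],
    "UOPS_RETIRED_ALL": [5282522, 4780311, 1496133],
}


def stat_row(values):
    """Return (Sum, Min, Max, Avg) over the measured cores."""
    total = sum(values)
    return total, min(values), max(values), total / len(values)


for event, values in COUNTS.items():
    total, low, high, avg = stat_row(values)
    print(f"{event} STAT: Sum={total} Min={low} Max={high} Avg={avg:.4f}")
```

For raw counts, Sum is the total over all measured threads. For example, the three threads retired 8103007 instructions in this run.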

+----------------------+-----------+-----------+-----------+
|        Metric        |   Core 0  |   Core 1  |   Core 2  |
+----------------------+-----------+-----------+-----------+
|  Runtime (RDTSC) [s] |    0.0123 |    0.0123 |    0.0123 |
| Runtime unhalted [s] |    0.0012 |    0.0015 |    0.0004 |
|      Clock [MHz]     | 1197.1911 | 1197.2475 | 1197.1954 |
|          CPI         |    0.9844 |    1.9814 |    1.2116 |
|  Load to store ratio |    2.5978 |    9.7112 |    2.0096 |
|      Load ratio      |    0.2003 |    0.0866 |    0.2425 |
|      Store ratio     |    0.0771 |    0.0089 |    0.1207 |
+----------------------+-----------+-----------+-----------+
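The derived metrics are simple ratios of the raw counters. The formulas in this Python sketch were checked against the tables above and reproduce them after rounding. The authoritative definitions live in LIKWID's performance group file for DATA on this architecture. CPU_CLOCK_HZ is an assumption taken from the rounded "CPU clock: 3.39 GHz" header, so Clock [MHz] comes out slightly below the table (about 1196 instead of 1197 MHz).

```python
# Raw counts per core (Core 0, Core 1, Core 2) from the DATA group output above.
FIXC0 = [4303562, 2545665, 1253780]    # INSTR_RETIRED_ANY
FIXC1 = [4236458, 5044024, 1519026]    # CPU_CLK_UNHALTED_CORE
FIXC2 = [12003768, 14291288, 4304060]  # CPU_CLK_UNHALTED_REF
PMC0 = [1057917, 414152, 362871]       # MEM_UOPS_RETIRED_LOADS
PMC1 = [407235, 42647, 180565]         # MEM_UOPS_RETIRED_STORES
PMC2 = [5282522, 4780311, 1496133]     # UOPS_RETIRED_ALL

# Assumption: nominal clock from the rounded "CPU clock: 3.39 GHz" header line.
CPU_CLOCK_HZ = 3.39e9


def data_metrics(core):
    """Derived DATA-group metrics for one core, as ratios of raw counts."""
    return {
        "Runtime unhalted [s]": FIXC1[core] / CPU_CLOCK_HZ,
        "Clock [MHz]": 1e-6 * CPU_CLOCK_HZ * FIXC1[core] / FIXC2[core],
        "CPI": FIXC1[core] / FIXC0[core],
        "Load to store ratio": PMC0[core] / PMC1[core],
        "Load ratio": PMC0[core] / PMC2[core],
        "Store ratio": PMC1[core] / PMC2[core],
    }


for core in range(3):
    metrics = data_metrics(core)
    print(f"Core {core}: " + ", ".join(f"{k}={v:.4f}" for k, v in metrics.items()))
```

The Clock [MHz] row shows that the cores ran at roughly 1.2 GHz during this short run, well below the nominal 3.4 GHz. Keep this in mind when interpreting cycle-based metrics such as CPI.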

+---------------------------+-----------+-----------+-----------+-----------+
|           Metric          |    Sum    |    Min    |    Max    |    Avg    |
+---------------------------+-----------+-----------+-----------+-----------+
|  Runtime (RDTSC) [s] STAT |    0.0369 |    0.0123 |    0.0123 |    0.0123 |
| Runtime unhalted [s] STAT |    0.0031 |    0.0004 |    0.0015 |    0.0010 |
|      Clock [MHz] STAT     | 3591.6340 | 1197.1911 | 1197.2475 | 1197.2113 |
|          CPI STAT         |    4.1774 |    0.9844 |    1.9814 |    1.3925 |
|  Load to store ratio STAT |   14.3186 |    2.0096 |    9.7112 |    4.7729 |
|      Load ratio STAT      |    0.5294 |    0.0866 |    0.2425 |    0.1765 |
|      Store ratio STAT     |    0.2067 |    0.0089 |    0.1207 |    0.0689 |
+---------------------------+-----------+-----------+-----------+-----------+
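The metric STAT rows apply the same Sum/Min/Max/Avg reductions to the per-core metric values, so not every entry has a physical meaning. The Sum of Clock [MHz] (3591.6340) is not a clock rate, and Avg CPI is an unweighted mean of the per-core ratios. Let c_i be CPU_CLK_UNHALTED_CORE and n_i be INSTR_RETIRED_ANY on core i, so that CPI_i = c_i / n_i. Weighting each core by its instruction count then gives a different aggregate for this run:

```latex
\overline{\mathrm{CPI}} = \frac{1}{3}\sum_{i=0}^{2} \frac{c_i}{n_i}
  = \frac{0.9844 + 1.9814 + 1.2116}{3} \approx 1.3925

\mathrm{CPI}_{\mathrm{agg}} = \frac{\sum_{i=0}^{2} c_i}{\sum_{i=0}^{2} n_i}
  = \frac{10799508}{8103007} \approx 1.3328
```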

For further usage tips, see the LIKWID Wiki.