Likwid

From HPC Wiki

LIKWID is a tool suite for performance-oriented programmers and administrators. The name LIKWID stands for "Like I Knew What I'm Doing".

General

LIKWID provides a set of helpful tools for analysis of systems and applications:

  • likwid-topology: Show the system topology, from the hardware thread layout through the cache hierarchy to the NUMA domains
  • likwid-pin: Pin application threads to specified CPUs
  • likwid-perfctr: Measure hardware counters for an application and show derived metrics
  • likwid-powermeter: Measure energy consumption of an application
  • likwid-bench: Microbenchmarking suite running hand-tuned assembly benchmarks
  • likwid-setFrequencies: Manipulate CPU and Uncore frequencies
  • likwid-features: Manipulate hardware features (e.g. (de)activate prefetchers)
  • likwid-memsweeper: Clean L3 and NUMA domains
  • likwid-perfscope: Similar to likwid-perfctr but provides live-plotting of the measured values
  • likwid-mpirun: MPI wrapper for likwid-pin and likwid-perfctr

likwid-topology

$ likwid-topology
--------------------------------------------------------------------------------
CPU name:	Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
CPU type:	Intel Core Haswell processor
CPU stepping:	3
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets:		1
Cores per socket:	4
Threads per core:	2
--------------------------------------------------------------------------------
HWThread	Thread		Core		Socket		Available
0		0		0		0		*
1		0		1		0		*
2		0		2		0		*
3		0		3		0		*
4		1		0		0		*
5		1		1		0		*
6		1		2		0		*
7		1		3		0		*
--------------------------------------------------------------------------------
Socket 0:		( 0 4 1 5 2 6 3 7 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level:			1
Size:			32 kB
Cache groups:		( 0 4 ) ( 1 5 ) ( 2 6 ) ( 3 7 )
--------------------------------------------------------------------------------
Level:			2
Size:			256 kB
Cache groups:		( 0 4 ) ( 1 5 ) ( 2 6 ) ( 3 7 )
--------------------------------------------------------------------------------
Level:			3
Size:			8 MB
Cache groups:		( 0 4 1 5 2 6 3 7 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains:		1
--------------------------------------------------------------------------------
Domain:			0
Processors:		( 0 4 1 5 2 6 3 7 )
Distances:		10
Free memory:		791.02 MB
Total memory:		7867 MB
--------------------------------------------------------------------------------
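The thread table shows that hardware threads 0 and 4 (and likewise 1/5, 2/6 and 3/7) are SMT siblings on the same physical core. This is why they appear together in the L1 and L2 cache groups and why the socket lists its threads as ( 0 4 1 5 2 6 3 7 ). To place one thread per physical core, you would pick 0,1,2,3 rather than 0,4,1,5. The following Python sketch is only an illustration and not part of LIKWID. It assumes the column layout HWThread, Thread, Core, Socket, Available printed above and groups hardware threads by (socket, core).

```python
# Illustrative sketch (not LIKWID code): group hardware threads by physical
# core using the "Hardware Thread Topology" table printed by likwid-topology.
# Assumption: columns are HWThread, Thread, Core, Socket, Available.
from collections import defaultdict

TOPOLOGY_TABLE = """\
HWThread  Thread  Core  Socket  Available
0         0       0     0       *
1         0       1     0       *
2         0       2     0       *
3         0       3     0       *
4         1       0     0       *
5         1       1     0       *
6         1       2     0       *
7         1       3     0       *
"""

def threads_per_core(table: str) -> dict[tuple[int, int], list[int]]:
    """Map (socket, core) -> hardware thread IDs sharing that physical core."""
    cores: dict[tuple[int, int], list[int]] = defaultdict(list)
    for line in table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        hwthread, _smt_index, core, socket = map(int, fields[:4])
        cores[(socket, core)].append(hwthread)
    return dict(cores)

def one_thread_per_core(table: str) -> list[int]:
    """Pick the lowest-numbered hardware thread of every physical core."""
    return sorted(min(ts) for ts in threads_per_core(table).values())

print(threads_per_core(TOPOLOGY_TABLE))
print(one_thread_per_core(TOPOLOGY_TABLE))
```

For this machine the second call yields [0, 1, 2, 3], which could then be passed as a CPU list such as "-c 0,1,2,3".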

likwid-pin

$ likwid-pin -c <cpu-selection> <application>
$ likwid-pin -c 0,1,2 Work/popen-intel-test/testomp
[pthread wrapper] 
[pthread wrapper] MAIN -> 0
[pthread wrapper] PIN_MASK: 0->1  1->2  
[pthread wrapper] SKIP MASK: 0x0
	threadid 139924335359744 -> core 1 - OK
	threadid 139924326967040 -> core 2 - OK
Hello World from thread 0 (CPU 0)
Hello World from thread 2 (CPU 2)
Hello World from thread 1 (CPU 1)
Number of threads = 3
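In the output above, the main thread stays on CPU 0 ("MAIN -> 0"), and the two threads created by the OpenMP runtime are placed on CPUs 1 and 2 in creation order ("PIN_MASK: 0->1 1->2"). The Python sketch below reproduces that assignment for a plain numeric CPU list. It is a hypothetical helper, not LIKWID code. It handles only comma-separated IDs and ranges like "0-3", not LIKWID's richer selection syntax, and it deliberately refuses more threads than selected CPUs instead of guessing how likwid-pin handles that case.

```python
# Hypothetical helper (not LIKWID code): expand a plain CPU list such as
# "0,1,2" or "0,2-4" and assign threads in the order the likwid-pin output
# above suggests: main thread -> first CPU, created threads -> next CPUs.

def expand_cpu_list(selection: str) -> list[int]:
    """Expand "0,2-4" into [0, 2, 3, 4]."""
    cpus: list[int] = []
    for part in selection.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            cpus.extend(range(start, end + 1))
        elif part:
            cpus.append(int(part))
    return cpus

def pin_plan(selection: str, num_threads: int) -> dict[int, int]:
    """Thread index (0 = main thread) -> CPU ID."""
    cpus = expand_cpu_list(selection)
    if num_threads > len(cpus):
        # This sketch does not model what likwid-pin does in this case.
        raise ValueError("more threads than selected CPUs")
    return {thread: cpus[thread] for thread in range(num_threads)}

print(pin_plan("0,1,2", 3))  # {0: 0, 1: 1, 2: 2}
```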

likwid-perfctr

$ likwid-perfctr -C <cpu-selection> -g <eventset/group> <application>
$ likwid-perfctr -C 0,1,2 -g DATA Work/popen-intel-test/testomp
--------------------------------------------------------------------------------
CPU name:	Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
CPU type:	Intel Core Haswell processor
CPU clock:	3.39 GHz
--------------------------------------------------------------------------------
Hello World from thread 0 (CPU 0)
Hello World from thread 2 (CPU 2)
Hello World from thread 1 (CPU 1)
Number of threads = 3
--------------------------------------------------------------------------------
Group 1: DATA
+-------------------------+---------+----------+----------+---------+
|          Event          | Counter |  Core 0  |  Core 1  |  Core 2 |
+-------------------------+---------+----------+----------+---------+
|    INSTR_RETIRED_ANY    |  FIXC0  |  4303562 |  2545665 | 1253780 |
|  CPU_CLK_UNHALTED_CORE  |  FIXC1  |  4236458 |  5044024 | 1519026 |
|   CPU_CLK_UNHALTED_REF  |  FIXC2  | 12003768 | 14291288 | 4304060 |
|  MEM_UOPS_RETIRED_LOADS |   PMC0  |  1057917 |   414152 |  362871 |
| MEM_UOPS_RETIRED_STORES |   PMC1  |   407235 |    42647 |  180565 |
|     UOPS_RETIRED_ALL    |   PMC2  |  5282522 |  4780311 | 1496133 |
+-------------------------+---------+----------+----------+---------+

+------------------------------+---------+----------+---------+----------+--------------+
|             Event            | Counter |    Sum   |   Min   |    Max   |      Avg     |
+------------------------------+---------+----------+---------+----------+--------------+
|    INSTR_RETIRED_ANY STAT    |  FIXC0  |  8103007 | 1253780 |  4303562 | 2.701002e+06 |
|  CPU_CLK_UNHALTED_CORE STAT  |  FIXC1  | 10799508 | 1519026 |  5044024 |      3599836 |
|   CPU_CLK_UNHALTED_REF STAT  |  FIXC2  | 30599116 | 4304060 | 14291288 | 1.019971e+07 |
|  MEM_UOPS_RETIRED_LOADS STAT |   PMC0  |  1834940 |  362871 |  1057917 |  611646.6667 |
| MEM_UOPS_RETIRED_STORES STAT |   PMC1  |   630447 |   42647 |   407235 |       210149 |
|     UOPS_RETIRED_ALL STAT    |   PMC2  | 11558966 | 1496133 |  5282522 | 3.852989e+06 |
+------------------------------+---------+----------+---------+----------+--------------+
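The STAT table is a column-wise summary of the per-core counter table: Sum, Min, Max and the arithmetic mean across the measured cores. The Python sketch below recomputes these values from the numbers of this example run. It illustrates the arithmetic only and is not LIKWID's implementation.

```python
# Sketch: recompute the STAT columns (Sum, Min, Max, Avg) of the counter
# table from the per-core values of the example run (Core 0, 1, 2).
from statistics import mean

COUNTERS = {
    "INSTR_RETIRED_ANY":       [4303562, 2545665, 1253780],
    "CPU_CLK_UNHALTED_CORE":   [4236458, 5044024, 1519026],
    "CPU_CLK_UNHALTED_REF":    [12003768, 14291288, 4304060],
    "MEM_UOPS_RETIRED_LOADS":  [1057917, 414152, 362871],
    "MEM_UOPS_RETIRED_STORES": [407235, 42647, 180565],
    "UOPS_RETIRED_ALL":        [5282522, 4780311, 1496133],
}

def counter_stats(values: list[int]) -> dict[str, float]:
    """Column-wise statistics over the per-core counts."""
    return {"Sum": sum(values), "Min": min(values),
            "Max": max(values), "Avg": mean(values)}

for event, per_core in COUNTERS.items():
    s = counter_stats(per_core)
    print(f"{event:24} sum={s['Sum']:>9} min={s['Min']:>9} "
          f"max={s['Max']:>9} avg={s['Avg']:.4f}")
```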

+----------------------+-----------+-----------+-----------+
|        Metric        |   Core 0  |   Core 1  |   Core 2  |
+----------------------+-----------+-----------+-----------+
|  Runtime (RDTSC) [s] |    0.0123 |    0.0123 |    0.0123 |
| Runtime unhalted [s] |    0.0012 |    0.0015 |    0.0004 |
|      Clock [MHz]     | 1197.1911 | 1197.2475 | 1197.1954 |
|          CPI         |    0.9844 |    1.9814 |    1.2116 |
|  Load to store ratio |    2.5978 |    9.7112 |    2.0096 |
|      Load ratio      |    0.2003 |    0.0866 |    0.2425 |
|      Store ratio     |    0.0771 |    0.0089 |    0.1207 |
+----------------------+-----------+-----------+-----------+
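The metrics are computed from the raw counters of the DATA group. The following formulas reproduce the values above and can be checked by hand:

- CPI = CPU_CLK_UNHALTED_CORE / INSTR_RETIRED_ANY
- Load to store ratio = MEM_UOPS_RETIRED_LOADS / MEM_UOPS_RETIRED_STORES
- Load ratio and Store ratio = loads or stores divided by UOPS_RETIRED_ALL
- Clock [MHz] = nominal clock × CPU_CLK_UNHALTED_CORE / CPU_CLK_UNHALTED_REF
- Runtime unhalted = CPU_CLK_UNHALTED_CORE / nominal clock

The Python sketch below recomputes these metrics. It assumes the rounded nominal clock of 3.39 GHz from the header, so Clock [MHz] comes out a little below LIKWID's value, which is based on a more precise clock measurement. The effective clock of roughly 1.2 GHz is well below the nominal 3.39 GHz.

```python
# Sketch: recompute the DATA group metrics from the per-core counters.
# Formulas are inferred from the example output (they reproduce its values).
# CLOCK_HZ is the rounded "CPU clock" from the header; this is an assumption.
CLOCK_HZ = 3.39e9

def data_metrics(instr, clk_core, clk_ref, loads, stores, uops):
    return {
        "Runtime unhalted [s]": clk_core / CLOCK_HZ,
        "Clock [MHz]": 1e-6 * CLOCK_HZ * clk_core / clk_ref,
        "CPI": clk_core / instr,
        "Load to store ratio": loads / stores,
        "Load ratio": loads / uops,
        "Store ratio": stores / uops,
    }

# Per core: INSTR, CLK_CORE, CLK_REF, LOADS, STORES, UOPS_ALL
CORES = {
    0: (4303562, 4236458, 12003768, 1057917, 407235, 5282522),
    1: (2545665, 5044024, 14291288, 414152, 42647, 4780311),
    2: (1253780, 1519026, 4304060, 362871, 180565, 1496133),
}

for core, counters in CORES.items():
    m = data_metrics(*counters)
    print(f"Core {core}: " + ", ".join(f"{k}={v:.4f}" for k, v in m.items()))
```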

+---------------------------+-----------+-----------+-----------+-----------+
|           Metric          |    Sum    |    Min    |    Max    |    Avg    |
+---------------------------+-----------+-----------+-----------+-----------+
|  Runtime (RDTSC) [s] STAT |    0.0369 |    0.0123 |    0.0123 |    0.0123 |
| Runtime unhalted [s] STAT |    0.0031 |    0.0004 |    0.0015 |    0.0010 |
|      Clock [MHz] STAT     | 3591.6340 | 1197.1911 | 1197.2475 | 1197.2113 |
|          CPI STAT         |    4.1774 |    0.9844 |    1.9814 |    1.3925 |
|  Load to store ratio STAT |   14.3186 |    2.0096 |    9.7112 |    4.7729 |
|      Load ratio STAT      |    0.5294 |    0.0866 |    0.2425 |    0.1765 |
|      Store ratio STAT     |    0.2067 |    0.0089 |    0.1207 |    0.0689 |
+---------------------------+-----------+-----------+-----------+-----------+
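The metric STAT table applies Sum, Min, Max and Avg to each metric column. For ratio-type metrics, the Sum has no physical meaning (for example, the CPI Sum of 4.1774 or the Clock Sum of 3591.6340 MHz). The Avg is an unweighted mean of the per-core values. A CPI for the whole run, weighted by the work each core did, can instead be computed from the summed counters. The short Python sketch below contrasts the two approaches using the counters from this run.

```python
# Sketch: unweighted mean of per-core CPI (the "Avg" column of the STAT
# table) versus a CPI computed from the summed counters of all cores.
instr = [4303562, 2545665, 1253780]   # INSTR_RETIRED_ANY per core
cycles = [4236458, 5044024, 1519026]  # CPU_CLK_UNHALTED_CORE per core

per_core_cpi = [c / i for c, i in zip(cycles, instr)]
mean_cpi = sum(per_core_cpi) / len(per_core_cpi)   # what "Avg" reports
aggregate_cpi = sum(cycles) / sum(instr)           # total cycles / total instructions

print(f"mean of per-core CPI: {mean_cpi:.4f}")       # 1.3925
print(f"aggregate CPI:        {aggregate_cpi:.4f}")  # 1.3328
```

The aggregate CPI is lower than the Avg. Core 0 retired the most instructions and also had the lowest CPI, but the unweighted mean gives it no extra weight.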

For further tips on how to use LIKWID, check the LIKWID Wiki.