Difference between revisions of "OpenMP in Small Bites/NUMA"
OpenMP in Small Bites/NUMA
(Created page with "{{Infobox OpenMP in Small Bites}}") |
m (Tweak page sorting and title) |
||
(10 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
− | {{ | + | [[Category:Tutorials|Non-Uniform Memory Access (NUMA) Effects in OpenMP]]<nowiki /> |
+ | {{DISPLAYTITLE:Non-Uniform Memory Access (NUMA) Effects in OpenMP}}<nowiki /> | ||
+ | {{Syllabus OpenMP in Small Bites}}<nowiki /> | ||
+ | __TOC__ | ||
+ | |||
+ | This video shows how a non-uniform memory access (NUMA) architecture influences the performance of OpenMP programs. It explains how to distribute data and threads across NUMA domains and how to avoid uncontrolled data or thread migration. | ||
+ | |||
+ | === Video === <!--T:5--> | ||
+ | |||
+ | <youtube width="600" height="340" right>MhlM-GiS1EM</youtube> | ||
+ | |||
+ | ([[Media:hpc.nrw_11_NUMA.pdf | Slides as pdf]]) | ||
+ | |||
+ | === Quiz === <!--T:5--> | ||
+ | |||
+ | |||
+ | {{hidden begin | ||
+ | |title = 1. Why is it important to initialize your data in parallel when executing on a NUMA architecture? | ||
+ | }} | ||
+ | <quiz display=simple> | ||
+ | { | ||
+ | |type="()"} | ||
+ | + Click and submit to see the answer | ||
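+ | || Initializing the data in parallel distributes the data among the different sockets, because on systems with a first-touch policy each memory page is placed in the NUMA domain of the thread that first writes to it. When the data is later accessed in a hotspot region with the same access pattern, remote memory accesses are avoided. | ||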
+ | </quiz> | ||
+ | {{hidden end}} | ||
+ | |||
+ | {{hidden begin | ||
+ | |title = 2. Why is it important to bind the threads? | ||
+ | }} | ||
+ | <quiz display=simple> | ||
+ | { | ||
+ | |type="()"} | ||
+ | + Click and submit to see the answer | ||
+ | || Otherwise the operating system might decide to migrate a thread from one core to another. This results in a performance penalty due to the context switch and, on NUMA architectures, potentially remote data accesses. | ||
+ | </quiz> | ||
+ | {{hidden end}} | ||
+ | |||
+ | {{hidden begin | ||
+ |title = 3. Given a NUMA architecture with two sockets of six cores each: How can you place the threads of an OpenMP program running with 4 threads across both sockets and bind each thread to a core? | ||
+ | }} | ||
+ | <quiz display=simple> | ||
+ | { | ||
+ | |type="()"} | ||
+ | + Click and submit to see the answer | ||
+ | || Set <code>OMP_PROC_BIND=spread</code> and <code>OMP_PLACES=cores</code> | ||
+ | </quiz> | ||
+ | {{hidden end}} |
Latest revision as of 16:42, 4 December 2020
Tutorial | |
---|---|
Title: | OpenMP in Small Bites |
Provider: | HPC.NRW |
Contact: | tutorials@hpc.nrw |
Type: | Multi-part video |
Topic Area: | Programming Paradigms |
License: | CC-BY-SA |
Syllabus
| |
1. Overview | |
2. Worksharing | |
3. Data Scoping | |
4. False Sharing | |
5. Tasking | |
6. Tasking and Data Scoping | |
7. Tasking and Synchronization | |
8. Loops and Tasks | |
9. Tasking Example: Sudoku Solver | |
10. Task Scheduling | |
11. Non-Uniform Memory Access |
This video shows how a non-uniform memory access (NUMA) architecture influences the performance of OpenMP programs. It explains how to distribute data and threads across NUMA domains and how to avoid uncontrolled data or thread migration.
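As an illustration of distributing data across NUMA domains, the following is a minimal sketch in C, not taken from the video. It assumes an operating system with a first-touch page placement policy (the usual default on Linux). The function names `alloc_first_touch` and `sum_same_pattern` are hypothetical and chosen for this example only.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles and initialize them in parallel.
 * Assumption: under a first-touch policy, each page ends up in the NUMA
 * domain of the thread that writes to it first. */
double *alloc_first_touch(size_t n)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;

    /* schedule(static) gives every thread one contiguous chunk. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        a[i] = 1.0;

    return a;
}

/* Hypothetical hotspot: uses the same static schedule as the initialization,
 * so each thread mostly reads the chunk it touched first. */
double sum_same_pattern(const double *a, size_t n)
{
    double sum = 0.0;

    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += a[i];

    return sum;
}
```

The sketch only pays off if the threads stay where they are between the two loops, which is why thread binding matters as well. With GCC, OpenMP is enabled with `-fopenmp`; without that flag the pragmas are ignored and the code runs serially.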
Video
Quiz
1. Why is it important to initialize your data in parallel when executing on a NUMA architecture?
2. Why is it important to bind the threads?
3. Given a NUMA architecture with two sockets of six cores each: How can you place the threads of an OpenMP program running with 4 threads across both sockets and bind each thread to a core?
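To check the effect of binding settings such as those asked about in question 3, a small C program can report the place each thread runs in. The following is a sketch under stated assumptions: the function name `report_thread_places` is hypothetical, and `omp_get_place_num` and `omp_get_num_places` require a runtime that supports OpenMP 4.5 or later.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: print the place of every thread in a parallel region
 * and return the number of threads in that region. */
int report_thread_places(void)
{
    int nthreads = 1;

#ifdef _OPENMP
    #pragma omp parallel
    {
        /* One thread records the team size; single ends with a barrier. */
        #pragma omp single
        nthreads = omp_get_num_threads();

        /* omp_get_place_num() returns -1 if the thread is not bound. */
        #pragma omp critical
        printf("thread %d of %d runs in place %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads(),
               omp_get_place_num(), omp_get_num_places());
    }
#else
    printf("compiled without OpenMP: one thread, no places\n");
#endif

    return nthreads;
}
```

Calling this function from `main` and running the program with `OMP_NUM_THREADS=4`, `OMP_PLACES=cores` and `OMP_PROC_BIND=spread` should show four distinct place numbers. How place numbers map to sockets depends on the system, so a tool that shows the machine topology is needed to confirm that two threads landed on each socket.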