OpenMP in Small Bites
| Tutorial | |
| --- | --- |
| Title: | OpenMP in Small Bites |
| Provider: | HPC.NRW |
| Contact: | tutorials@hpc.nrw |
| Type: | Multi-part video |
| Topic Area: | Programming Paradigms |
| License: | CC-BY-SA |
Syllabus

1. Overview
2. Worksharing
3. Data Scoping
4. False Sharing
5. Tasking
6. Tasking and Data Scoping
7. Tasking and Synchronization
8. Loops and Tasks
9. Tasking Example: Sudoku Solver
10. Task Scheduling
11. Non-Uniform Memory Access
Introduction
Welcome to the HPC.NRW OpenMP Online Tutorial!
OpenMP (Open Multi-Processing) is the de facto standard for parallel shared-memory programming. With a set of compiler directives and API functions, it provides a portable and scalable model for expressing parallelism.
This tutorial is aimed at novice HPC users as an initial introduction to shared-memory programming with OpenMP.
How to proceed through this tutorial?
The tutorial is made up of 11 sections. Each covers a separate, stand-alone topic, but they are designed to be worked through in order.
Each section consists of a short video, followed by a couple of quiz questions for self-assessment. Everything in the tutorial is platform-independent and works on any operating system for which an OpenMP-compatible compiler is available. Although most examples are written in C/C++, the fundamental concepts apply to Fortran as well.
If you have any questions or encounter problems, you can contact us via e-mail at tutorials@hpc.nrw.
Who created this tutorial?
This tutorial has been developed within the framework of the HPC.NRW project. It is part of a series of online tutorials on various HPC-related topics, all of which were created by HPC.NRW members. Other topics include, for example, Linux and Gprof, and new tutorials continue to be developed.
The speaker is Dr. Christian Terboven from RWTH Aachen University. Christian works at the university's IT center and has been an active member of the OpenMP language committee for many years. Video editing was done primarily by him and Marc-André Hermanns (RWTH Aachen University). The quiz sections were primarily developed by Tim Cramer (RWTH Aachen University). Other contributions came from practically all HPC.NRW members.
Topics
Overview
This session provides a brief history of OpenMP and then introduces the parallel region, one of the most fundamental concepts of OpenMP, used to mark code regions that are meant to be processed by multiple threads in parallel.
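As a minimal sketch (not taken from the video), a parallel region in C can look like this; the number of threads is typically controlled via the OMP_NUM_THREADS environment variable:

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    /* The code inside the parallel region is executed
       by every thread of the team. */
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
```

Compile with an OpenMP flag, e.g. gcc -fopenmp hello.c.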
Worksharing
This session presents the concept of OpenMP worksharing, loop scheduling and synchronization mechanisms. After this tutorial session, the programmer is already familiar with the most commonly used OpenMP constructs and API functions.
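For illustration, a small sketch of a worksharing loop (the function name scale is hypothetical, not from the video): the iterations are divided among the threads of the enclosing parallel region, and the schedule clause controls how.

```c
/* Scales a vector: a[i] = s * b[i]; iterations are shared among threads. */
void scale(double *a, const double *b, double s, int n) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        a[i] = s * b[i];
    }
    /* implicit barrier at the end of the combined construct */
}
```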
Data Scoping
This session provides an overview of managing one of the most challenging parts of OpenMP: Data Scoping. It discusses the differences between private, firstprivate, lastprivate and shared variables and shows how to implement a scalable reduction.
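As an illustrative sketch of a scalable reduction (not taken from the video): each thread accumulates into its own private copy of the variable, and the copies are combined when the loop ends.

```c
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    /* reduction(+:sum) gives each thread a private copy of sum,
       initialized to 0, and adds all copies together at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; ++i) {
        sum += 1.0 / i;   /* the loop variable i is implicitly private */
    }

    printf("harmonic sum = %f\n", sum);
    return 0;
}
```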
False Sharing
This session explains the concept of caches in parallel computer architectures, discusses the problem of false sharing, shows how it influences the performance of OpenMP programs and how to avoid it.
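A classic illustration (a sketch, not from the video; the 64-byte cache line and the 64-thread upper bound are assumptions) is an array of per-thread counters: without padding, neighboring counters share a cache line, and every update invalidates that line in the other threads' caches. Padding each counter to a full cache line avoids this:

```c
#include <omp.h>

#define CACHE_LINE 64    /* assumed cache-line size in bytes */
#define MAX_THREADS 64   /* assumed upper bound on the thread count */

/* Padding keeps each counter on its own cache line. */
struct padded_counter {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
};

long count_even(const int *data, int n) {
    struct padded_counter counts[MAX_THREADS] = {{0}};
    int nthreads = 1;

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        #pragma omp single
        nthreads = omp_get_num_threads();

        #pragma omp for
        for (int i = 0; i < n; ++i)
            if (data[i] % 2 == 0)
                counts[tid].value++;   /* no false sharing due to padding */
    }

    long total = 0;
    for (int t = 0; t < nthreads; ++t)
        total += counts[t].value;
    return total;
}
```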
Tasking
This session introduces another way to express parallelism in OpenMP: Tasking. This concept enables the programmer to parallelize code regions with non-canonical loop forms or regions which do not use loops at all (including recursive algorithms). The session explains how to use OpenMP Tasking, how to synchronize, how to deal with cut-off strategies and how an OpenMP runtime environment manages the tasks in queues.
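A common sketch of task parallelism is the naive recursive Fibonacci (illustrative; a production code would add a cut-off that stops creating tasks below some n):

```c
#include <stdio.h>

long fib(long n) {
    if (n < 2) return n;
    long x, y;

    #pragma omp task shared(x)   /* child task computes x */
    x = fib(n - 1);

    #pragma omp task shared(y)   /* child task computes y */
    y = fib(n - 2);

    #pragma omp taskwait         /* wait for both child tasks */
    return x + y;
}

int main(void) {
    long result;
    #pragma omp parallel
    #pragma omp single           /* one thread spawns the task tree */
    result = fib(30);
    printf("fib(30) = %ld\n", result);
    return 0;
}
```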
Tasking and Data Scoping
This session deepens the knowledge of OpenMP Tasking and Data Scoping by means of an example that covers typical scenarios. Furthermore, aspects of the lifetime of a variable are discussed.
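A small sketch of the default scoping rules for tasks (illustrative, not from the video): variables that are shared in the enclosing context stay shared inside the task; everything else becomes firstprivate, i.e. the task works on a copy captured at task creation.

```c
#include <stdio.h>

int main(void) {
    int a = 1;               /* shared in the parallel region */

    #pragma omp parallel
    #pragma omp single
    {
        int b = 2;           /* private to the executing thread */

        /* Default scoping: a stays shared; b becomes firstprivate,
           so the task sees the value b had when it was created. */
        #pragma omp task
        printf("a = %d, b = %d\n", a, b);
    }
    return 0;
}
```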
Tasking and Synchronization
This session discusses different synchronization mechanisms for OpenMP Tasking.
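As an illustrative sketch of one such mechanism (not from the video): a taskgroup waits for its child tasks and all of their descendants, whereas taskwait only waits for direct children.

```c
#include <stdio.h>

void run(int n) {
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp taskgroup   /* waits for all descendants, too */
        {
            for (int i = 0; i < n; ++i) {
                #pragma omp task   /* i becomes firstprivate by default */
                printf("processed item %d\n", i);
            }
        }
        printf("all %d tasks done\n", n);
    }
}

int main(void) { run(8); return 0; }
```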
Loops and Tasks
This session discusses the taskloop construct of OpenMP.
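A minimal sketch (the function name saxpy is just a conventional label, not from the video): taskloop packages loop iterations into tasks instead of using the classic worksharing scheme; the grainsize clause hints how many iterations each task should get.

```c
/* y = a*x + y, with iterations distributed as tasks. */
void saxpy(float *y, const float *x, float a, int n) {
    #pragma omp parallel
    #pragma omp single   /* one thread creates the tasks */
    {
        #pragma omp taskloop grainsize(1024)
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
}
```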
Tasking Example: Sudoku Solver
This part shows an example of a Sudoku solver using OpenMP Tasking.
Task Scheduling
This session discusses how task scheduling works in OpenMP.
Non-Uniform Memory Access
This session shows how a non-uniform memory access (NUMA) architecture influences the performance of OpenMP programs. It explains how to distribute data and threads across NUMA domains and how to avoid uncontrolled data or thread migration.
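A common technique in this context (a sketch assuming the operating system uses a first-touch page-placement policy; the helper name numa_aware_alloc is hypothetical): initialize data in parallel with the same schedule as the compute loops, so each page ends up on the NUMA node of the thread that will use it.

```c
#include <stdlib.h>

double *numa_aware_alloc(size_t n) {
    double *a = malloc(n * sizeof(double));

    /* First touch: the thread that first writes a page determines
       on which NUMA node the page is placed. Using the same static
       schedule as later compute loops keeps data close to its users. */
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; ++i)
        a[i] = 0.0;

    return a;
}
```

Running with, e.g., OMP_PLACES=cores and OMP_PROC_BIND=close pins the threads so that the data placement established by the first touch is not lost to thread migration.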