OpenMP in Small Bites

From HPC Wiki
Revision as of 13:25, 11 February 2022

Title: OpenMP in Small Bites
Provider: HPC.NRW

Type: Multi-part video
Topic Area: Programming Paradigms
License: CC-BY-SA

1. Overview
2. Worksharing
3. Data Scoping
4. False Sharing
5. Tasking
6. Tasking and Data Scoping
7. Tasking and Synchronization
8. Loops and Tasks
9. Tasking Example: Sudoku Solver
10. Task Scheduling
11. Non-Uniform Memory Access


Welcome to the HPC.NRW OpenMP Online Tutorial!

OpenMP (Open Multi-Processing) is the de facto standard for shared-memory parallel programming. With a set of compiler directives and API functions, it provides a portable and scalable model for expressing parallelism.

This tutorial is aimed at novice HPC users and serves as an initial introduction to shared-memory programming with OpenMP.

How to proceed through this tutorial?

The tutorial is made up of 11 sections. Each covers a separate stand-alone topic, but they are designed to be worked through in order.

Each section consists of a short video followed by a few quiz questions for self-assessment. Everything in the tutorial is platform-independent and works on any operating system for which an OpenMP-compatible compiler is available. Although most examples are written in C/C++, the fundamental concepts also apply to Fortran.

If you have any questions or encounter problems, you can contact us via e-mail at

Who created this tutorial?

This tutorial has been developed within the framework of the HPC.NRW project. It is part of a series of online tutorials on various HPC-related topics, all of which were created by HPC.NRW members. Other topics include, for example, Linux and Gprof, and new tutorials continue to be developed.

The speaker is Dr. Christian Terboven from RWTH Aachen University. Christian works at the university's IT center and has been an active member of the OpenMP language committee for many years. Video editing was done primarily by himself and Marc-André Hermanns (RWTH Aachen University). The quiz section was primarily developed by Tim Cramer (RWTH Aachen University). Other contributions came from practically all HPC.NRW members.



Overview

This session provides a brief history of OpenMP and then introduces the parallel region, one of the most fundamental concepts of OpenMP, used to mark code regions that are meant to be processed by multiple threads in parallel.
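As a rough illustration of a parallel region, the following C sketch lets every thread of the team print a greeting and increment a shared counter. The function name count_region_threads is hypothetical and not taken from the videos; the _OPENMP guards are only there so that the code also builds as a serial program when OpenMP is not enabled.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns how many threads executed the region. */
int count_region_threads(void)
{
    int count = 0; /* declared outside the region, so it is shared */

    /* Every thread of the team executes the following block once. */
    #pragma omp parallel
    {
        int id = 0; /* declared inside the region, so it is private */
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        printf("Hello from thread %d\n", id);

        /* The atomic update avoids a data race on the shared counter. */
        #pragma omp atomic
        count++;
    }
    return count;
}
```

With GCC, for instance, OpenMP is enabled with the -fopenmp flag, and the team size can usually be controlled with the OMP_NUM_THREADS environment variable.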


Worksharing

This session shows the concept of OpenMP worksharing, loop scheduling and synchronization mechanisms. After this session, the programmer already knows the most commonly used OpenMP constructs and API functions.
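The three ideas named above can be sketched in a few lines of C. The helpers saxpy and array_max are illustrative names chosen here, and the chunk size of 64 is an arbitrary assumption rather than a recommendation.

```c
/* Worksharing: the loop iterations are divided among the threads.
 * schedule(static) gives each thread one contiguous block of iterations. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Synchronization: each thread finds a local maximum, then the local
 * results are merged one thread at a time. Assumes n >= 1. */
int array_max(int n, const int *v)
{
    int result = v[0];

    #pragma omp parallel
    {
        int local = v[0];

        /* schedule(dynamic, 64) hands out chunks of 64 iterations on demand. */
        #pragma omp for schedule(dynamic, 64) nowait
        for (int i = 1; i < n; i++) {
            if (v[i] > local)
                local = v[i];
        }

        /* Only one thread at a time may execute the critical region. */
        #pragma omp critical
        {
            if (local > result)
                result = local;
        }
    }
    return result;
}
```

A static schedule tends to suit loops with uniform work per iteration, while a dynamic schedule can help when the cost per iteration varies.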

Data Scoping

This session provides an overview of one of the most challenging parts of OpenMP: data scoping. It discusses the differences between private, firstprivate, lastprivate and shared variables and shows how to implement a scalable reduction.
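A minimal C sketch of these clauses follows. The function names dot and scoping_demo and the offset value of 10 are made up for illustration; default(none) is used here only to force every variable's scope to be stated explicitly.

```c
/* Reduction: each thread accumulates into a private copy of sum,
 * and the copies are combined when the loop ends. */
double dot(int n, const double *a, const double *b)
{
    double sum = 0.0;

    #pragma omp parallel for default(none) shared(n, a, b) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* firstprivate: every thread's copy of offset starts with the value 10.
 * lastprivate: after the loop, last holds the value written by the
 * sequentially last iteration. Assumes n > 0. */
int scoping_demo(int n)
{
    int offset = 10;
    int last = -1;

    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; i++) {
        last = i + offset;
    }
    return last;
}
```

Without the reduction clause, all threads would update the shared variable sum concurrently, which is a data race.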

False Sharing

This session explains the concept of caches in parallel computer architectures, discusses the problem of false sharing, shows how it influences the performance of OpenMP programs and how to avoid it.
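One common way to avoid false sharing is to pad per-thread data so that no two threads write to the same cache line. The sketch below assumes a cache-line size of 64 bytes, which is typical but not universal; padded_counter and count_positive are hypothetical names.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define CACHE_LINE 64 /* assumed cache-line size in bytes */

/* Each counter occupies a full (assumed) cache line, so the value
 * fields of two neighbouring counters are 64 bytes apart. */
typedef struct {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
} padded_counter;

/* Counts the positive entries of v; returns -1 if allocation fails. */
long count_positive(int n, const int *v)
{
    int max_threads = 1;
#ifdef _OPENMP
    max_threads = omp_get_max_threads();
#endif
    padded_counter *counters = calloc((size_t)max_threads, sizeof *counters);
    if (counters == NULL)
        return -1;

    #pragma omp parallel
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Each thread writes only to its own padded counter. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            if (v[i] > 0)
                counters[tid].value++;
        }
    }

    long total = 0;
    for (int t = 0; t < max_threads; t++)
        total += counters[t].value;

    free(counters);
    return total;
}
```

With a plain array of long instead of padded_counter, several counters would share one cache line, and every increment could invalidate that line in the other cores' caches. In practice, a reduction clause or a thread-local accumulator is often the simpler fix.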


Tasking

This session introduces another way to express parallelism in OpenMP: tasking. This concept enables the programmer to parallelize code regions with non-canonical loop forms or regions which do not use loops at all (including recursive algorithms). The session explains how to use OpenMP tasking, how to synchronize, how to deal with cut-off strategies and how an OpenMP runtime environment manages the tasks in queues.
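A recursive Fibonacci computation is a frequently used toy example for tasking with a cut-off. The sketch below is not necessarily the example used in the video; the names fib, fib_task and fib_serial and the cut-off value of 15 are assumptions made for illustration.

```c
#define CUTOFF 15 /* hypothetical cut-off: below this, no new tasks */

static long fib_serial(int n)
{
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

static long fib_task(int n)
{
    long x, y;

    /* Cut-off strategy: small subproblems run serially, because the
     * task creation overhead would outweigh the work. */
    if (n < CUTOFF)
        return fib_serial(n);

    /* x and y must be shared so that the child tasks write to the
     * variables of this function instance. */
    #pragma omp task shared(x)
    x = fib_task(n - 1);

    #pragma omp task shared(y)
    y = fib_task(n - 2);

    /* Wait for both child tasks before reading x and y. */
    #pragma omp taskwait
    return x + y;
}

long fib(int n)
{
    long result = 0;

    /* One thread creates the root of the task tree; the other
     * threads of the team pick up the generated tasks. */
    #pragma omp parallel
    #pragma omp single
    result = fib_task(n);

    return result;
}
```

The exponential algorithm is only a vehicle for showing the task structure, not an efficient way to compute Fibonacci numbers.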

Tasking and Data Scoping

This session deepens the knowledge of OpenMP Tasking and Data Scoping by using an example which includes typical scenarios. Furthermore, aspects of the lifetime of a variable are discussed.
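A linked-list traversal is one typical scenario in which task data scoping matters. In the following sketch, which uses the hypothetical names node and process_list, each task needs its own copy of the loop pointer, because the creating thread keeps advancing that pointer while the tasks are still pending.

```c
#include <stddef.h>

typedef struct node {
    int value;
    int result;
    struct node *next;
} node;

/* Stores the square of each node's value in its result field. */
void process_list(node *head)
{
    #pragma omp parallel
    #pragma omp single
    {
        for (node *p = head; p != NULL; p = p->next) {
            /* firstprivate captures the current value of p when the
             * task is created. This is also the default for p here,
             * but it is spelled out for clarity. */
            #pragma omp task firstprivate(p)
            p->result = p->value * p->value;
        }
    } /* All tasks have finished after the barrier that ends the single construct. */
}
```

Regarding lifetime: a variable that a task accesses as shared has to stay alive until the task has finished, which is why code that shares local variables with tasks typically waits for them, for example with taskwait, before leaving the enclosing scope.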

Tasking and Synchronization

This session discusses different synchronization mechanisms for OpenMP tasking.
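As one possible illustration, the sketch below orders three tasks with depend clauses and then waits for them with taskwait. The function name pipeline and the arithmetic are invented for this example.

```c
/* Computes b = 2 * (input + 1) and c = 3 * (input + 1) in tasks and
 * returns their sum. */
int pipeline(int input)
{
    int a = 0, b = 0, c = 0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Producer: writes a. */
        #pragma omp task shared(a) depend(out: a)
        a = input + 1;

        /* Consumers: both must wait for the producer, but they may
         * run concurrently with each other. */
        #pragma omp task shared(a, b) depend(in: a) depend(out: b)
        b = a * 2;

        #pragma omp task shared(a, c) depend(in: a) depend(out: c)
        c = a * 3;

        /* Waits for the child tasks of the current task. */
        #pragma omp taskwait
    }
    return b + c;
}
```

Note that taskwait only waits for the direct child tasks, whereas a taskgroup construct also waits for all descendant tasks created inside it.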

Loops and Tasks

This session discusses the taskloop construct of OpenMP.
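A minimal sketch of taskloop in C, assuming a compiler that supports OpenMP 4.5 or later. The function name scale and the grainsize of 1024 are arbitrary choices for illustration.

```c
/* Multiplies every element of v by factor. */
void scale(int n, double factor, double *v)
{
    #pragma omp parallel
    #pragma omp single
    {
        /* The iterations are packaged into tasks; grainsize(1024)
         * asks for roughly 1024 iterations per task. */
        #pragma omp taskloop grainsize(1024)
        for (int i = 0; i < n; i++) {
            v[i] *= factor;
        }
        /* Unless nogroup is specified, taskloop waits here for all
         * of the tasks it generated. */
    }
}
```

Unlike a worksharing loop, a taskloop is encountered by a single thread, and the tasks it generates are then executed by the threads of the team.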

Tasking Example: Sudoku Solver

This session shows an example of a Sudoku solver using OpenMP tasking.

Task Scheduling

This session discusses how task scheduling works in OpenMP.

Non-Uniform Memory Access

This session shows how a non-uniform memory access (NUMA) architecture influences the performance of OpenMP programs. It explains how to distribute data and threads across NUMA domains and how to avoid uncontrolled data or thread migration.
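One widely used technique for distributing data is parallel initialization. The sketch below assumes a first-touch page placement policy (the default on Linux, but an assumption about the target system), under which a memory page is placed in the NUMA domain of the thread that first writes to it. The function names alloc_first_touch and sum_array are hypothetical.

```c
#include <stdlib.h>

/* Allocates n doubles and initializes them to 1.0 in parallel, so that
 * each thread touches first the part of the array it will work on later.
 * Returns NULL if the allocation fails. */
double *alloc_first_touch(long n)
{
    double *a = malloc((size_t)n * sizeof *a);
    if (a == NULL)
        return NULL;

    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++) {
        a[i] = 1.0;
    }
    return a;
}

/* Uses the same static schedule as the initialization, so each thread
 * mostly reads the data it touched first. */
double sum_array(const double *a, long n)
{
    double sum = 0.0;

    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

This only pays off if threads do not migrate between cores. Binding threads, for example by setting the environment variables OMP_PROC_BIND and OMP_PLACES, is the usual way to keep them in place.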