Difference between revisions of "Building LLVM/Clang with OpenMP Offloading to NVIDIA GPUs"
m |
|||
Line 1: | Line 1: | ||
[[Category:HPC-Developer]] | [[Category:HPC-Developer]] | ||
+ | <!-- | ||
Clang 7.0, released in September 2018, has support for offloading to NVIDIA GPUs. | Clang 7.0, released in September 2018, has support for offloading to NVIDIA GPUs. | ||
These instructions will guide you through the process of building the Clang compiler on Linux. | These instructions will guide you through the process of building the Clang compiler on Linux. | ||
While this page refers to version 7.0, it should be applicable (with possibly minor adaptions) to later versions. | While this page refers to version 7.0, it should be applicable (with possibly minor adaptions) to later versions. | ||
It's recommended to get the latest release from https://releases.llvm.org/! | It's recommended to get the latest release from https://releases.llvm.org/! | ||
− | + | --> | |
+ | This guide describes how to build the Clang compiler with OpenMP support for offloading computational tasks to Nvidia GPUs. A working Linux environment with GCC (8.3.0) and CMake (3.15.6) is assumed for the build process. LLVM/Clang ([https://github.com/llvm/llvm-project/releases 10.0.0] or later) is recommended, because some bugs relevant to OpenMP GPU-Offloading were found in earlier versions of LLVM/Clang in [https://github.com/pc2/OMP-Offloading our tests]. | ||
+ | <!-- | ||
== Determine GPU Architectures == | == Determine GPU Architectures == | ||
Line 14: | Line 17: | ||
A clearly structured table can be found on [https://en.wikipedia.org/wiki/CUDA#GPUs_supported Wikpedia] or in NVIDIA's [https://developer.nvidia.com/cuda-gpus developer documentation]. | A clearly structured table can be found on [https://en.wikipedia.org/wiki/CUDA#GPUs_supported Wikpedia] or in NVIDIA's [https://developer.nvidia.com/cuda-gpus developer documentation]. | ||
As an example, the "Tesla P100" has compute capability 6.0 while the more recent Volta GPU "Tesla V100" is listed with 7.0. | As an example, the "Tesla P100" has compute capability 6.0 while the more recent Volta GPU "Tesla V100" is listed with 7.0. | ||
+ | --> | ||
+ | |||
+ | == Determine GPU(s) on Compute Node == | ||
+ | |||
+ | First of all, we need to determine whether the GPU(s) on a compute node can be correctly identified by using the command <code>nvidia-smi</code>. As an example, the output below shows two Nvidia RTX 2080 Ti GPUs on one compute node in the OCuLUS system at [https://pc2.uni-paderborn.de/ Paderborn Center for Parallel Computing], Paderborn University, Germany. | ||
+ | |||
+ | <pre> | ||
+ | +-----------------------------------------------------------------------------+ | ||
+ | | NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 | | ||
+ | |-------------------------------+----------------------+----------------------+ | ||
+ | | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | ||
+ | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | ||
+ | |===============================+======================+======================| | ||
+ | | 0 GeForce RTX 208... Off | 00000000:03:00.0 Off | N/A | | ||
+ | | 31% 35C P0 64W / 250W | 0MiB / 11019MiB | 0% Default | | ||
+ | +-------------------------------+----------------------+----------------------+ | ||
+ | | 1 GeForce RTX 208... Off | 00000000:84:00.0 Off | N/A | | ||
+ | | 35% 34C P0 35W / 250W | 0MiB / 11019MiB | 0% Default | | ||
+ | +-------------------------------+----------------------+----------------------+ | ||
+ | |||
+ | +-----------------------------------------------------------------------------+ | ||
+ | | Processes: GPU Memory | | ||
+ | | GPU PID Type Process name Usage | | ||
+ | |=============================================================================| | ||
+ | | No running processes found | | ||
+ | +-----------------------------------------------------------------------------+ | ||
+ | </pre> | ||
+ | As can be seen, the Nvidia driver version is 440.33.01 and the CUDA version is 10.2. We are now ready to build LLVM/Clang with OpenMP support for GPU-offloading. | ||
+ | <!-- | ||
== Install Prerequisites == | == Install Prerequisites == | ||
Line 64: | Line 96: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
Again the last step is optional if you are skipping <code>compiler-rt</code>. | Again the last step is optional if you are skipping <code>compiler-rt</code>. | ||
+ | --> | ||
+ | |||
+ | == Download LLVM/Clang (10.0.0 or later) == | ||
+ | |||
+ | LLVM/Clang (10.0.0) can be obtained by running: | ||
+ | |||
+ | <syntaxhighlight lang="bash"> | ||
+ | curl -Ls https://github.com/llvm/llvm-project/archive/llvmorg-10.0.0.tar.gz | tar zxf - | ||
+ | </syntaxhighlight> | ||
+ | |||
+ | Whereas the latest version on GitHub can be downloaded by running: | ||
+ | |||
+ | <syntaxhighlight lang="bash"> | ||
+ | git clone https://github.com/llvm/llvm-project.git | ||
+ | </syntaxhighlight> | ||
== Build the Compiler == | == Build the Compiler == | ||
− | + | <!-- | |
With the sources in place let's proceed to configure and build the compiler. | With the sources in place let's proceed to configure and build the compiler. | ||
Projects using CMake are usually built in a separate directory: | Projects using CMake are usually built in a separate directory: | ||
Line 166: | Line 213: | ||
This should give you some <code>libomptarget-nvptx-sm_??.bc</code> libraries as mentioned in the warning message. | This should give you some <code>libomptarget-nvptx-sm_??.bc</code> libraries as mentioned in the warning message. | ||
+ | --> | ||
+ | To support OpenMP GPU-offloading two building steps for LLVM/Clang are required: first compile LLVM/Clang with GCC and then bootstrap LLVM/Clang itself. | ||
+ | |||
+ | === Build LLVM/Clang with GCC === | ||
+ | |||
+ | The following commands can be used to compile and install Clang as well as necessary libraries. See https://llvm.org/docs/ for the explanation of the cmake options. | ||
+ | <pre> | ||
+ | cmake \ | ||
+ | -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;libcxx;libcxxabi;lld;openmp" \ | ||
+ | -DCMAKE_BUILD_TYPE=Release \ | ||
+ | -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \ | ||
+ | -DLLVM_ENABLE_ASSERTIONS=ON \ | ||
+ | -DLLVM_ENABLE_BACKTRACES=ON \ | ||
+ | -DLLVM_ENABLE_WERROR=OFF \ | ||
+ | -DBUILD_SHARED_LIBS=OFF \ | ||
+ | -DLLVM_ENABLE_RTTI=ON \ | ||
+ | -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_61 \ | ||
+ | -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=35,37,50,52,60,61,70,75 \ | ||
+ | -DCMAKE_C_COMPILER=gcc \ | ||
+ | -DCMAKE_CXX_COMPILER=g++ \ | ||
+ | -G "Unix Makefiles" the-llvm-project-directory/llvm | ||
+ | make -j 64 | ||
+ | make install | ||
+ | </pre> | ||
+ | |||
+ | === Bootstrap LLVM/Clang === | ||
+ | |||
+ | The following commands can be used to bootstrap Clang by itself. Please note GNU's libstdc++ (instead of libc++ from LLVM) is used during linking. | ||
+ | <pre> | ||
+ | cmake \ | ||
+ | -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;libcxx;libcxxabi;lld;openmp" \ | ||
+ | -DCMAKE_BUILD_TYPE=Release \ | ||
+ | -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \ | ||
+ | -DLLVM_ENABLE_ASSERTIONS=ON \ | ||
+ | -DLLVM_ENABLE_BACKTRACES=ON \ | ||
+ | -DLLVM_ENABLE_WERROR=OFF \ | ||
+ | -DBUILD_SHARED_LIBS=OFF \ | ||
+ | -DLLVM_ENABLE_RTTI=ON \ | ||
+ | -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_61 \ | ||
+ | -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=35,37,50,52,60,61,70,75 \ | ||
+ | -DCMAKE_C_COMPILER=clang \ | ||
+ | -DCMAKE_CXX_COMPILER=clang++ \ | ||
+ | -G "Unix Makefiles" the-llvm-project-directory/llvm | ||
+ | make -j 64 | ||
+ | make install | ||
+ | </pre> | ||
== Done == | == Done == | ||
+ | Now, we have successfully installed the Clang compiler with OpenMP GPU-offloading support. Code samples of OpenMP GPU-offloading and more information can be found at https://github.com/pc2/OMP-Offloading. | ||
+ | |||
+ | <!-- | ||
Following the instructions up to this point you should now have a fully working Clang compiler with support for OpenMP offloading! | Following the instructions up to this point you should now have a fully working Clang compiler with support for OpenMP offloading! | ||
<span style="font-size:85%;">This guide was originally published as a blog post: https://www.hahnjo.de/blog/2018/10/08/clang-7.0-openmp-offloading-nvidia.html</span> | <span style="font-size:85%;">This guide was originally published as a blog post: https://www.hahnjo.de/blog/2018/10/08/clang-7.0-openmp-offloading-nvidia.html</span> | ||
+ | --> |
Revision as of 23:28, 27 March 2020
This guide describes how to build the Clang compiler with OpenMP support for offloading computational tasks to Nvidia GPUs. A working Linux environment with GCC (8.3.0) and CMake (3.15.6) is assumed for the build process. LLVM/Clang 10.0.0 or later is recommended, because our tests found bugs affecting OpenMP GPU offloading in earlier versions of LLVM/Clang.
Determine GPU(s) on Compute Node
First, check that the GPU(s) on the compute node are correctly identified by running the command nvidia-smi. As an example, the output below shows two Nvidia RTX 2080 Ti GPUs on one compute node of the OCuLUS system at the Paderborn Center for Parallel Computing, Paderborn University, Germany.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:03:00.0 Off |                  N/A |
| 31%   35C    P0    64W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:84:00.0 Off |                  N/A |
| 35%   34C    P0    35W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
As can be seen, the Nvidia driver version is 440.33.01 and the CUDA version is 10.2. We are now ready to build LLVM/Clang with OpenMP support for GPU offloading.
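The check above can also be scripted. The following is a minimal sketch, not part of the original guide: it lists the GPUs in a compact form and converts a CUDA compute capability into the sm_XY name that the build options later in this guide expect. The helper name cc_to_sm is hypothetical. The RTX 2080 Ti shown above has compute capability 7.5, which corresponds to sm_75 (and to the entry 75 in a compute-capability list).

```shell
#!/usr/bin/env bash
# Convert a CUDA compute capability such as "7.5" into the "sm_75" form.
# cc_to_sm is a hypothetical helper name used only for this sketch.
cc_to_sm() {
  printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# List index, name and driver version of every GPU the driver can see.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader
else
  echo "nvidia-smi not found: is the Nvidia driver installed and on the PATH?" >&2
fi

# Compute capability 7.5 (e.g. RTX 2080 Ti) as an architecture name.
cc_to_sm 7.5
```

The compute capability of a given GPU model can be looked up in Nvidia's developer documentation.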
Download LLVM/Clang (10.0.0 or later)
LLVM/Clang (10.0.0) can be obtained by running:
curl -Ls https://github.com/llvm/llvm-project/archive/llvmorg-10.0.0.tar.gz | tar zxf -
Alternatively, the latest development version can be cloned from GitHub by running:
git clone https://github.com/llvm/llvm-project.git
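As a small sketch of how the two download routes relate to the source directory used in the build step, the helpers below print the tarball URL for a given release and the directory name that the GitHub archive is expected to unpack into. The function names and the LLVM_VERSION variable are hypothetical, and the unpacked directory name assumes GitHub's usual repository-ref naming for archives.

```shell
#!/usr/bin/env bash
# Release to fetch; 10.0.0 is the version used throughout this guide.
LLVM_VERSION="${LLVM_VERSION:-10.0.0}"

# URL of the release tarball on GitHub (same pattern as the curl command above).
llvm_tarball_url() {
  printf 'https://github.com/llvm/llvm-project/archive/llvmorg-%s.tar.gz\n' "$1"
}

# Directory the tarball is expected to unpack into (assumption: GitHub names
# archive directories "<repository>-<ref>").
llvm_src_dir() {
  printf 'llvm-project-llvmorg-%s\n' "$1"
}

echo "tarball:    $(llvm_tarball_url "$LLVM_VERSION")"
echo "CMake path: $(llvm_src_dir "$LLVM_VERSION")/llvm"

# A git clone of the same release without the full history would be:
#   git clone --depth 1 --branch "llvmorg-$LLVM_VERSION" \
#       https://github.com/llvm/llvm-project.git
```

With a plain git clone, the source directory is simply llvm-project, so the path passed to CMake would be llvm-project/llvm.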
Build the Compiler
OpenMP GPU offloading requires two build steps for LLVM/Clang: first compile LLVM/Clang with GCC, then bootstrap it, i.e. rebuild LLVM/Clang with the Clang produced in the first step.
Build LLVM/Clang with GCC
The following commands can be used to compile and install Clang as well as necessary libraries. See https://llvm.org/docs/ for the explanation of the cmake options.
cmake \
  -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;libcxx;libcxxabi;lld;openmp" \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_ENABLE_BACKTRACES=ON \
  -DLLVM_ENABLE_WERROR=OFF \
  -DBUILD_SHARED_LIBS=OFF \
  -DLLVM_ENABLE_RTTI=ON \
  -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_61 \
  -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=35,37,50,52,60,61,70,75 \
  -DCMAKE_C_COMPILER=gcc \
  -DCMAKE_CXX_COMPILER=g++ \
  -G "Unix Makefiles" the-llvm-project-directory/llvm
make -j 64
make install
Bootstrap LLVM/Clang
The following commands can be used to bootstrap Clang, i.e. to rebuild it with the Clang installed in the previous step (which must be on the PATH). Please note that GNU's libstdc++ (instead of libc++ from LLVM) is used during linking.
cmake \
  -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;libcxx;libcxxabi;lld;openmp" \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_ENABLE_BACKTRACES=ON \
  -DLLVM_ENABLE_WERROR=OFF \
  -DBUILD_SHARED_LIBS=OFF \
  -DLLVM_ENABLE_RTTI=ON \
  -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_61 \
  -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=35,37,50,52,60,61,70,75 \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -G "Unix Makefiles" the-llvm-project-directory/llvm
make -j 64
make install
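The two build steps differ only in the compiler they use, so they can be wrapped in one function. The sketch below is one possible arrangement, not the procedure used by the original authors. It assumes a separate build directory per stage (build-stage1, build-stage2), an install prefix given by a hypothetical PREFIX variable (CMAKE_INSTALL_PREFIX is not set in the commands above, so they install into CMake's default location), and a source checkout at SRC. The names run, build_stage and DRY_RUN are likewise hypothetical. By default the script only prints the commands; set DRY_RUN=0 to execute them. The -S and -B options require CMake 3.13 or later.

```shell
#!/usr/bin/env bash
# Assumed locations; adjust to your system.
PREFIX="${PREFIX:-$HOME/opt/llvm}"      # where both stages are installed
SRC="${SRC:-$PWD/llvm-project}"         # llvm-project checkout or unpacked tarball
DRY_RUN="${DRY_RUN:-1}"                 # 1: print commands only, 0: run them

# Print a command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then printf '+ %s\n' "$*"; else "$@"; fi
}

# Configure, build and install one stage.
# Stage 1 uses GCC; stage 2 uses the Clang installed by stage 1.
build_stage() {
  local stage="$1" cc cxx jobs
  case "$stage" in
    1) cc=gcc; cxx=g++ ;;
    2) cc="$PREFIX/bin/clang"; cxx="$PREFIX/bin/clang++" ;;
    *) echo "unknown stage: $stage" >&2; return 1 ;;
  esac
  jobs="$(nproc 2>/dev/null || echo 4)"
  run cmake -S "$SRC/llvm" -B "build-stage$stage" -G "Unix Makefiles" \
    -DCMAKE_INSTALL_PREFIX="$PREFIX" \
    -DCMAKE_C_COMPILER="$cc" -DCMAKE_CXX_COMPILER="$cxx" \
    -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;libcxx;libcxxabi;lld;openmp" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DLLVM_ENABLE_BACKTRACES=ON \
    -DLLVM_ENABLE_WERROR=OFF \
    -DBUILD_SHARED_LIBS=OFF \
    -DLLVM_ENABLE_RTTI=ON \
    -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_61 \
    -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=35,37,50,52,60,61,70,75
  run make -C "build-stage$stage" -j "$jobs"
  run make -C "build-stage$stage" install
}

build_stage 1
build_stage 2
```

Passing the full path of the stage-1 compiler avoids depending on the PATH, and a user-writable prefix avoids needing root for make install. If the GPUs in use are not of compute capability 6.1, CLANG_OPENMP_NVPTX_DEFAULT_ARCH can be changed accordingly (for example sm_75 for the RTX 2080 Ti).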
Done
The Clang compiler is now installed with OpenMP GPU-offloading support. Code samples of OpenMP GPU-offloading and more information can be found at https://github.com/pc2/OMP-Offloading.
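To check the installation, a small OpenMP program can be compiled for the GPU. The following is a sketch under several assumptions: the newly built clang is first on the PATH, Clang can locate a CUDA toolkit installation, and the target GPU is an RTX 2080 Ti (sm_75; override with the hypothetical ARCH variable). The file name offload_check.c is also hypothetical. The program uses omp_is_initial_device() to report whether the target region ran on the host or on a device.

```shell
#!/usr/bin/env bash
ARCH="${ARCH:-sm_75}"          # GPU architecture to compile for (assumption)
workdir="$(mktemp -d)"         # scratch directory for the test program

# Minimal OpenMP offloading test program.
cat > "$workdir/offload_check.c" <<'EOF'
#include <stdio.h>
#include <omp.h>

int main(void) {
  int on_host = 1;
  /* The target region is offloaded if a device is available. */
  #pragma omp target map(from: on_host)
  { on_host = omp_is_initial_device(); }
  printf("target region ran on the %s\n", on_host ? "host" : "device");
  return 0;
}
EOF

# Compile and run only if both the compiler and the driver tools are present.
if command -v clang >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
  clang -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
        -Xopenmp-target -march="$ARCH" \
        "$workdir/offload_check.c" -o "$workdir/offload_check" \
    && "$workdir/offload_check"
else
  echo "clang or nvidia-smi not found; skipping the offload check" >&2
fi
```

If the binary fails to start because the OpenMP runtime libraries are not found, adding the lib directory of the installation to LD_LIBRARY_PATH may help.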