Difference between revisions of "User:Robert-schade-e757@uni-paderborn.de"

@@ Line 1: / Line 1: @@
-[[Category:HPC-Developer]]
-= Clang 11.0 and CUDA 10.1 =
-= Clang 7.0 and CUDA 9.2 =
-Clang 7.0, released in September 2018, has support for offloading to NVIDIA GPUs.
-These instructions will guide you through the process of building the Clang compiler on Linux.
-While this page refers to version 7.0, it should be applicable (possibly with minor adaptations) to later versions.
-It's recommended to get the latest release from https://releases.llvm.org/.
-== Determine GPU Architectures ==
-As of this writing, Clang's OpenMP implementation for NVIDIA GPUs doesn't support multiple GPU architectures in a single binary.
-This means that you have to know the target GPU when compiling an OpenMP application.
-Additionally, Clang needs compatible runtime libraries for every architecture that you'll want to use in the future.
-So first of all you need to gather a list of GPU models that you are going to run on and map them to a list of architectures.
-A clearly structured table can be found on [https://en.wikipedia.org/wiki/CUDA#GPUs_supported Wikipedia] or in NVIDIA's [https://developer.nvidia.com/cuda-gpus developer documentation].
-As an example, the "Tesla P100" has compute capability 6.0 while the more recent Volta GPU "Tesla V100" is listed with 7.0.
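The mapping from a compute capability to the values used in the build commands on this page can be sketched as a few shell functions. This is only an illustration: <code>cc_to_number</code>, <code>cc_to_sm</code>, and <code>cc_list</code> are hypothetical helper names and not part of Clang or CUDA.

```shell
# Hypothetical helpers (not part of Clang or CUDA) that turn compute
# capabilities such as "6.0" into the spellings used by the build commands.

cc_to_number() {
    # Drop the dot: 6.0 -> 60
    printf '%s\n' "$1" | tr -d '.'
}

cc_to_sm() {
    # Add the sm_ prefix: 6.0 -> sm_60
    printf 'sm_%s\n' "$(cc_to_number "$1")"
}

cc_list() {
    # Join several capabilities with commas: 3.5 6.0 7.0 -> 35,60,70
    out=""
    for cc in "$@"; do
        out="${out:+$out,}$(cc_to_number "$cc")"
    done
    printf '%s\n' "$out"
}

cc_to_sm 6.0          # Tesla P100, prints sm_60
cc_list 3.5 6.0 7.0   # prints 35,60,70
```

The first form is the one expected by <code>CLANG_OPENMP_NVPTX_DEFAULT_ARCH</code>, the second the one expected by <code>LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES</code>.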
-== Install Prerequisites ==
-Building LLVM requires some software:
-* First you'll need some standard tools like <code>make</code>, <code>tar</code>, and <code>xz</code>. If you don't have them installed, please consult your distribution's instructions on how to get them.
-* For the build process a compiler already needs to be installed. Most Linux systems default to the [https://gcc.gnu.org/ GNU Compiler Collection (gcc)]. Please ensure that you have at least version 4.8 or refer to some online tutorials on how to install one for your system. If you happen to have an older installation of Clang, any version greater than version 3.1 should be fine.
-* Additionally, LLVM requires a (more or less) recent CMake, at least version 3.4.3. If your distribution doesn't provide an adequate version, see https://cmake.org/ on how to get it.
-* For the runtime libraries, the system needs both <code>libelf</code> and its development headers.
-* Last but not least, you'll need the CUDA toolkit by NVIDIA. However, the latest CUDA 10.0 is not yet compatible with Clang 7.0. For that release, it's recommended to use [https://developer.nvidia.com/cuda-92-download-archive version 9.2]. This release also has support for Volta GPUs, which may already be found in some HPC systems.
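A quick way to see which of the command-line tools from the list above are missing is sketched below. <code>missing_tools</code> is a hypothetical helper name, and the check assumes that the CUDA toolkit puts <code>nvcc</code> on your <code>PATH</code>. It does not cover <code>libelf</code> or version requirements, which still need to be checked separately.

```shell
# Hypothetical helper: print the names of all given commands
# that cannot be found on PATH (empty output means none are missing).
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="${missing:+$missing }$tool"
        fi
    done
    printf '%s\n' "$missing"
}

# Assumption: nvcc from the CUDA toolkit is expected on PATH.
not_found=$(missing_tools make tar xz cmake gcc nvcc)
if [ -n "$not_found" ]; then
    echo "Missing tools: $not_found"
else
    echo "All listed tools were found."
fi
```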
-== Download and Extract Sources ==
-The [https://llvm.org/ LLVM project] consists of multiple components.
-For the purpose of this guide, you need at least the LLVM Core libraries, Clang and the OpenMP project.
-Download their tarballs from https://releases.llvm.org/:
-<syntaxhighlight lang="bash">
- $ wget https://releases.llvm.org/7.0.0/llvm-7.0.0.src.tar.xz
- $ wget https://releases.llvm.org/7.0.0/cfe-7.0.0.src.tar.xz
- $ wget https://releases.llvm.org/7.0.0/openmp-7.0.0.src.tar.xz
-</syntaxhighlight>
-You might also want to download and build <code>compiler-rt</code>:
-<syntaxhighlight lang="bash">
- $ wget https://releases.llvm.org/7.0.0/compiler-rt-7.0.0.src.tar.xz
-</syntaxhighlight>
-This will give you some runtime libraries that are required to use Clang's sanitizers.
-A detailed explanation would go beyond the scope of this page, but you can take a look at the documentation of
-[https://clang.llvm.org/docs/AddressSanitizer.html ASan], [https://clang.llvm.org/docs/LeakSanitizer.html LSan],
-[https://clang.llvm.org/docs/MemorySanitizer.html MSan], and [https://clang.llvm.org/docs/ThreadSanitizer.html TSan].
-(Please keep in mind that these links document the current development, so not all features might be available in a released version!)
-It's highly recommended to verify the integrity of the downloaded archives.
-Each file has been signed by the release manager and you can find both the public key and <code>.sig</code> files next to the files you have just downloaded.
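Verification with GnuPG could look roughly like the following sketch. It assumes that the signature is published as the tarball URL with <code>.sig</code> appended and that the release manager's public key has already been imported with <code>gpg --import</code>. <code>sig_url</code> and <code>verify_tarball</code> are hypothetical helper names.

```shell
# Hypothetical helper: derive the signature URL for a release tarball.
# Assumption: the signature lives at "<tarball URL>.sig".
sig_url() {
    printf '%s.sig\n' "$1"
}

# Hypothetical helper: download the signature next to an already downloaded
# tarball and check it. Requires the signer's public key in your keyring.
verify_tarball() {
    tarball=$(basename "$1")
    wget -q "$(sig_url "$1")" -O "$tarball.sig" &&
        gpg --verify "$tarball.sig" "$tarball"
}

# Example (needs network access and the imported key):
# verify_tarball https://releases.llvm.org/7.0.0/llvm-7.0.0.src.tar.xz
```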
-The next step is to unpack the tarballs (the last command may be skipped if you don't want to build <code>compiler-rt</code>):
-<syntaxhighlight lang="bash">
- $ tar xf llvm-7.0.0.src.tar.xz
- $ tar xf cfe-7.0.0.src.tar.xz
- $ tar xf openmp-7.0.0.src.tar.xz
- $ tar xf compiler-rt-7.0.0.src.tar.xz
-</syntaxhighlight>
-This should leave you with three or four directories named <code>llvm-7.0.0.src</code>, <code>cfe-7.0.0.src</code>, <code>openmp-7.0.0.src</code>, and (optionally) <code>compiler-rt-7.0.0.src</code>.
-All these components can be built together if the directories are correctly nested:
-<syntaxhighlight lang="bash">
- $ mv cfe-7.0.0.src llvm-7.0.0.src/tools/clang
- $ mv openmp-7.0.0.src llvm-7.0.0.src/projects/openmp
- $ mv compiler-rt-7.0.0.src llvm-7.0.0.src/projects/compiler-rt
-</syntaxhighlight>
-Again, the last command is optional if you are skipping <code>compiler-rt</code>.
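If you want a single script that works with or without <code>compiler-rt</code>, the moves can be wrapped in a small function that skips directories that were never extracted. This is a sketch; <code>place_component</code> is a hypothetical helper name.

```shell
# Hypothetical helper: move an extracted component into the LLVM source
# tree, or print a note if the component was not extracted.
place_component() {
    src=$1
    dest=$2
    if [ -d "$src" ]; then
        mv "$src" "$dest"
    else
        echo "Skipping $src (not extracted)"
    fi
}

# Usage from the directory that holds the extracted sources:
# place_component cfe-7.0.0.src         llvm-7.0.0.src/tools/clang
# place_component openmp-7.0.0.src      llvm-7.0.0.src/projects/openmp
# place_component compiler-rt-7.0.0.src llvm-7.0.0.src/projects/compiler-rt
```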
-== Build the Compiler ==
-With the sources in place let's proceed to configure and build the compiler.
-Projects using CMake are usually built in a separate directory:
-<syntaxhighlight lang="bash">
- $ mkdir build
- $ cd build
-</syntaxhighlight>
-The next steps will be pretty IO-intensive, so it might be a good idea to put the build directory on a locally attached disk (or even an SSD).
-Next CMake needs to generate <code>Makefile</code>s which will eventually be used for compilation:
-<syntaxhighlight lang="bash">
- $ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$(pwd)/../install \
-	-DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_60 \
-	-DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=35,60,70 ../llvm-7.0.0.src
-</syntaxhighlight>
-Of course, you can use any other generator that CMake supports.
-The first two flags are standard for CMake projects:
-<code>CMAKE_BUILD_TYPE=Release</code> turns on optimizations and disables debug information.
-<code>CMAKE_INSTALL_PREFIX</code> specifies where the final binaries and libraries will be installed.
-Be sure to choose a permanent location if you are building in a temporary directory.
-The other two options are related to the GPU architectures as mentioned above.
-<code>CLANG_OPENMP_NVPTX_DEFAULT_ARCH</code> sets the default architecture when not passing the value during compilation.
-You should adjust the default to match the environment you'll be using most of the time.
-The architecture must be prefixed with <code>sm_</code>, so Clang configured with the above command will build for the Tesla P100 by default.<br/>
-<code>LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES</code> applies to the runtime libraries:
-It specifies a list of architectures that the libraries will be built for.
-As you cannot run on GPUs without a compatible runtime, you should pass all architectures you care about.
-Also, please note that the values are passed without the dot, so compute capability 7.0 becomes <code>70</code>.
-If everything went right you should see something like the following towards the end of the output:
-<syntaxhighlight lang="bash">
--- Found LIBOMPTARGET_DEP_LIBELF: /usr/lib64/libelf.so
--- Found PkgConfig: /usr/bin/pkg-config (found version "0.27.1")
--- Found LIBOMPTARGET_DEP_LIBFFI: /usr/lib64/libffi.so
--- Found LIBOMPTARGET_DEP_CUDA_DRIVER: <<<REDACTED>>>/libcuda.so
--- LIBOMPTARGET: Building offloading runtime library libomptarget.
--- LIBOMPTARGET: Not building aarch64 offloading plugin: machine not found in the system.
--- LIBOMPTARGET: Building CUDA offloading plugin.
--- LIBOMPTARGET: Not building PPC64 offloading plugin: machine not found in the system.
--- LIBOMPTARGET: Not building PPC64le offloading plugin: machine not found in the system.
--- LIBOMPTARGET: Building x86_64 offloading plugin.
--- LIBOMPTARGET: Building CUDA offloading device RTL.
-</syntaxhighlight>
-In this case the system also has <code>libffi</code> installed which allows building a plugin that offloads to the host (here: <code>x86_64</code>).
-This is mostly used for testing and not required for offloading to GPUs.
-Now comes the time-consuming part:
-<syntaxhighlight lang="bash">
- $ make -j8
-</syntaxhighlight>
-Using the <code>-j</code> parameter (short for <code>--jobs</code>) you can allow <code>make</code> to run multiple commands concurrently.
-Usually the number of cores in your server is a reasonable choice which can speed up the compilation by a good deal.
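Instead of hard-coding the number of jobs, you can derive it from the machine. The sketch below uses <code>nproc</code> from GNU coreutils and falls back to <code>getconf</code>; <code>job_count</code> is a hypothetical helper name.

```shell
# Hypothetical helper: print the number of online processors,
# falling back to 1 if neither nproc nor getconf reports a value.
job_count() {
    n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    printf '%s\n' "$n"
}

# Show the make invocation that would be used on this machine.
echo "make -j$(job_count)"
```

On shared login nodes you may want to pick a smaller value so that other users are not affected.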
-Afterwards the built libraries and binaries need to be installed:
-<syntaxhighlight lang="bash">
- $ make -j8 install
-</syntaxhighlight>
-== Rebuild the OpenMP Runtime Libraries with Clang ==
-If you tried to compile an application with OpenMP offloading right now, Clang would print the following message:
- clang-7: warning: No library 'libomptarget-nvptx-sm_60.bc' found in the default clang lib directory or in LIBRARY_PATH. Expect degraded performance due to no inlining of runtime functions on target devices. [-Wopenmp-target]
-As you'd expect from a warning, you can run perfectly fine without these "bitcode libraries".
-However, GPUs are meant as accelerators, so you want your application to run as fast as possible.
-To get the missing libraries you'll need to recompile the OpenMP project, using Clang built in the previous step.
-Instead of only rebuilding the OpenMP project, it's also possible to repeat the "Build the Compiler" step entirely.
-That's usually referred to as "bootstrapping" because Clang is compiling its own source code.
-This is usually preferred when installing a released version of a compiler.<br/>
-Anyway, the following will explain building only the OpenMP runtime libraries which will get you the required files much faster.
-To do so, first create a new build directory:
-<syntaxhighlight lang="bash">
- $ cd ..
- $ mkdir build-openmp
- $ cd build-openmp
-</syntaxhighlight>
-Now configure the project with CMake using the Clang compiler built in the previous step:
-<syntaxhighlight lang="bash">
- $ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$(pwd)/../install \
-	-DCMAKE_C_COMPILER=$(pwd)/../install/bin/clang \
-	-DCMAKE_CXX_COMPILER=$(pwd)/../install/bin/clang++ \
-	-DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=35,60,70 \
-	../llvm-7.0.0.src/projects/openmp
-</syntaxhighlight>
-The flags are the same as above except that we want to use a different compiler.
-With CMake this can be adjusted with <code>CMAKE_C_COMPILER</code> and <code>CMAKE_CXX_COMPILER</code>.
-If you installed the binaries to a different location, you need to adapt their values accordingly.
-Build and install the OpenMP runtime libraries:
-<syntaxhighlight lang="bash">
- $ make -j8
- $ make -j8 install
-</syntaxhighlight>
-This should give you some <code>libomptarget-nvptx-sm_??.bc</code> libraries as mentioned in the warning message.
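To confirm that a bitcode library exists for every architecture you configured, a check along these lines may help. It is a sketch: <code>missing_bitcode</code> is a hypothetical helper name, and the library directory is passed in explicitly because the exact install location depends on your prefix.

```shell
# Hypothetical helper: print the compute capabilities for which no
# libomptarget-nvptx-sm_<cap>.bc file exists in the given directory.
# $1: library directory, $2: comma-separated list as passed to CMake.
missing_bitcode() {
    libdir=$1
    missing=""
    for cap in $(printf '%s' "$2" | tr ',' ' '); do
        if [ ! -f "$libdir/libomptarget-nvptx-sm_$cap.bc" ]; then
            missing="${missing:+$missing }$cap"
        fi
    done
    printf '%s\n' "$missing"
}

# Example, assuming the libraries were installed to ../install/lib:
# missing_bitcode ../install/lib 35,60,70
```

Empty output means that all requested libraries were found.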
-== Done ==
-Following the instructions up to this point you should now have a fully working Clang compiler with support for OpenMP offloading!
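As a rough illustration of how the new compiler might be invoked, the sketch below assembles a compile command for an OpenMP offloading application. The flags <code>-fopenmp-targets=nvptx64-nvidia-cuda</code> and <code>-Xopenmp-target -march=...</code> are not covered by the instructions above and are shown here as an assumption about typical usage; <code>offload_compile_cmd</code> and <code>app.c</code> are hypothetical names.

```shell
# Hypothetical helper: print a clang command line for OpenMP offloading.
# $1: source file, $2 (optional): GPU architecture such as sm_70.
# Without $2 the default from CLANG_OPENMP_NVPTX_DEFAULT_ARCH applies.
offload_compile_cmd() {
    src=$1
    arch=${2:-}
    cmd="clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda"
    if [ -n "$arch" ]; then
        cmd="$cmd -Xopenmp-target -march=$arch"
    fi
    printf '%s %s\n' "$cmd" "$src"
}

offload_compile_cmd app.c         # uses the configured default architecture
offload_compile_cmd app.c sm_70   # overrides it for a Tesla V100
```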
-<span style="font-size:85%;">This guide was originally published as a blog post: https://www.hahnjo.de/blog/2018/10/08/clang-7.0-openmp-offloading-nvidia.html</span>

Latest revision as of 17:06, 2 November 2020
