Difference between revisions of "GPU Tutorial/SAXPY CUDA C"
GPU Tutorial/SAXPY CUDA C
(4 intermediate revisions by the same user not shown) | |||
Line 5: | Line 5: | ||
This video discusses the SAXPY via NVIDIA CUDA C/C++. | This video discusses the SAXPY via NVIDIA CUDA C/C++. | ||
+ | CUDA is an application programming interface (API) for NVIDIA GPUs. In general, CUDA works with many programming languages, but this tutorial focuses on C/C++. CUDA gives access to a GPU's instruction set, which means we have to go through everything step by step, since many things do not happen automatically. | ||
=== Video === <!--T:5--> | === Video === <!--T:5--> | ||
Line 68: | Line 69: | ||
{ | { | ||
|type="()"} | |type="()"} | ||
− | - MyKernel() | + | - MyKernel(); |
|| Wrong. This would just execute an ordinary function. | || Wrong. This would just execute an ordinary function. | ||
− | - CUDA.run(NoBlocks, NoThreads, MyKernel()) | + | - CUDA.run(NoBlocks, NoThreads, MyKernel()); |
|| Wrong. There is no CUDA.run() | || Wrong. There is no CUDA.run() | ||
− | + MyKernel<<<NoBlocks, NoThreads>>>() | + | + MyKernel<<<NoBlocks, NoThreads>>>(); |
|| Correct | || Correct | ||
− | - __global(NoBlocks, NoThreads)__ MyKernel() | + | - __global(NoBlocks, NoThreads)__ MyKernel(); |
|| Wrong. __global__ and other modifiers cannot take arguments and are part of a function definition, not of a launch. | || Wrong. __global__ and other modifiers cannot take arguments and are part of a function definition, not of a launch. | ||
</quiz> | </quiz> | ||
Line 81: | Line 82: | ||
{{hidden begin | {{hidden begin | ||
− | |title = 5. Inside your kernel function, how do you distribute your data over the threads?}} | + | |title = 5. Inside your kernel function, how do you distribute your data over the GPU threads?}} |
<quiz display=simple> | <quiz display=simple> | ||
{ | { | ||
Line 89: | Line 90: | ||
+ Each thread has an index attached to it, which is addressed via threadIdx.x | + Each thread has an index attached to it, which is addressed via threadIdx.x | ||
|| Correct | || Correct | ||
− | - If you use array-element-wise operations | + | - If you use array-element-wise operations, e.g.: y.=a.*x.+b. This is managed by the NVIDIA preprocessor. |
|| Wrong. There are no element-wise operators in C/C++ | || Wrong. There are no element-wise operators in C/C++ | ||
- You flag a line to be parallelized via keywords, e.g.: __device__ y=a*x+b | - You flag a line to be parallelized via keywords, e.g.: __device__ y=a*x+b | ||
|| Wrong. These modifiers are used at function definitions. | || Wrong. These modifiers are used at function definitions. | ||
</quiz> | </quiz> | ||
{{hidden end}} | {{hidden end}} |
Latest revision as of 11:17, 3 January 2022
Tutorial | |
---|---|
Title: | Introduction to GPU Computing |
Provider: | HPC.NRW |
Contact: | tutorials@hpc.nrw |
Type: | Multi-part video |
Topic Area: | GPU computing |
License: | CC-BY-SA |
Syllabus | |
1. Introduction | |
2. Several Ways to SAXPY: CUDA C/C++ | |
3. Several Ways to SAXPY: OpenMP | |
4. Several Ways to SAXPY: Julia | |
5. Several Ways to SAXPY: NUMBA |
This video discusses the SAXPY via NVIDIA CUDA C/C++. CUDA is an application programming interface (API) for NVIDIA GPUs. In general, CUDA works with many programming languages, but this tutorial focuses on C/C++. CUDA gives access to a GPU's instruction set, which means we have to go through everything step by step, since many things do not happen automatically.
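To illustrate what "step by step" means in practice, here is a minimal sketch of SAXPY (y = a*x + y) in CUDA C/C++. It is not taken from the video. The helper name `saxpyOnGpu`, the block size of 256 threads and the test values are assumptions chosen for illustration. Only standard CUDA runtime calls are used (`cudaMalloc`, `cudaMemcpy`, `cudaFree`, `cudaGetLastError`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each GPU thread updates one element, y[i] = a * x[i] + y[i].
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index of this thread
    if (i < n)                                      // the grid may contain more threads than elements
        y[i] = a * x[i] + y[i];
}

// Hypothetical host helper (name chosen for this sketch): performs every step by hand.
// Returns true if all CUDA calls succeeded.
bool saxpyOnGpu(int n, float a, const float *h_x, float *h_y)
{
    const size_t bytes = n * sizeof(float);
    float *d_x = nullptr, *d_y = nullptr;

    // Step 1: allocate memory on the GPU (device).
    if (cudaMalloc(&d_x, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_y, bytes) != cudaSuccess) { cudaFree(d_x); return false; }

    // Step 2: copy the input data from the host to the device.
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);

    // Step 3: launch the kernel with enough blocks to cover all n elements.
    const int threadsPerBlock = 256;  // assumed block size
    const int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    saxpy<<<numBlocks, threadsPerBlock>>>(n, a, d_x, d_y);
    cudaError_t launchErr = cudaGetLastError();

    // Step 4: copy the result back; this call waits for the kernel to finish.
    cudaError_t copyErr = cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);

    // Step 5: free the device memory.
    cudaFree(d_x);
    cudaFree(d_y);
    return launchErr == cudaSuccess && copyErr == cudaSuccess;
}

int main()
{
    const int n = 4;
    float x[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    float y[n] = {1.0f, 1.0f, 1.0f, 1.0f};

    if (!saxpyOnGpu(n, 2.0f, x, y)) {
        fprintf(stderr, "A CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i)
        printf("y[%d] = %f\n", i, y[i]);
    return 0;
}
```

Assuming the NVIDIA compiler is installed and the file is saved as `saxpy.cu`, it can be built with `nvcc saxpy.cu -o saxpy`. None of the allocation, copy or cleanup steps happens implicitly, which is the point made above.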
Video
Quiz
1. Which features does CUDA add to C/C++?
2. What is a kernel?
3. How do you flag a function to be a kernel?
4. Let's say you coded your kernel function called "MyKernel". How do you run it?
5. Inside your kernel function, how do you distribute your data over the GPU threads?
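The points raised by questions 3 to 5 can be sketched in one small program: a function becomes a kernel through the `__global__` modifier, it is launched with the `<<<blocks, threads>>>` syntax placed between the function name and its argument list, and inside the kernel each thread picks its data element from `blockIdx`, `blockDim` and `threadIdx`. The names `writeIndex` and `launchWriteIndex` and the launch size of 2 blocks with 4 threads each are assumptions made for this illustration, not part of the tutorial.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __global__ marks writeIndex as a kernel: it runs on the GPU and is launched from the host.
__global__ void writeIndex(int *out, int n)
{
    // threadIdx.x is the index within a block; adding the block offset gives a global index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = i;  // each thread writes exactly one element
}

// Hypothetical host helper: launches the kernel and copies the result into h_out,
// which must hold numBlocks * threadsPerBlock integers. Returns true on success.
bool launchWriteIndex(int *h_out, int numBlocks, int threadsPerBlock)
{
    const int n = numBlocks * threadsPerBlock;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess)
        return false;

    // Kernel launch: the configuration goes between the name and the arguments.
    writeIndex<<<numBlocks, threadsPerBlock>>>(d_out, n);

    // The copy waits for the kernel and also reports errors raised during the launch.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess;
}

int main()
{
    int h_out[8];
    if (!launchWriteIndex(h_out, 2, 4)) {
        fprintf(stderr, "A CUDA call failed\n");
        return 1;
    }
    for (int i = 0; i < 8; ++i)
        printf("%d ", h_out[i]);
    printf("\n");
    return 0;
}
```

With 2 blocks of 4 threads, the eight threads cover the global indices 0 to 7, so every array element is written by exactly one thread. Using `threadIdx.x` alone would only distinguish threads within a single block.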