GPU Tutorial/SAXPY CUDA C
([[Media:GPU_tutorial_saxpy_cuda_c.pdf |Slides as pdf]])
{{hidden begin
|title = 4. Let's say you coded your kernel function called "MyKernel". How do you run it?}}
<quiz display=simple>
{
|type="()"}
- MyKernel()
|| Wrong
- CUDA.run(NoBlocks, NoThreads, MyKernel())
|| Wrong
+ MyKernel<<<NoBlocks, NoThreads>>>()
|| Correct
- __global(NoBlocks, NoThreads)__ MyKernel()
|| Wrong
</quiz>
{{hidden end}}
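The launch syntax from the quiz can be sketched as a complete program. The names MyKernel, NoBlocks, and NoThreads come from the quiz; the kernel body and the block/thread counts are illustrative placeholders.

```cuda
#include <cstdio>

// Placeholder kernel; __global__ marks it as callable from the host.
__global__ void MyKernel() {
    // Every launched thread executes this body once.
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    const int NoBlocks  = 2;   // thread blocks in the grid (arbitrary choice)
    const int NoThreads = 4;   // threads per block (arbitrary choice)
    // Triple-chevron launch: the configuration sits between the kernel
    // name and its (empty) argument list.
    MyKernel<<<NoBlocks, NoThreads>>>();
    cudaDeviceSynchronize();   // wait for the asynchronous kernel to finish
    return 0;
}
```

Note that the launch is asynchronous: without the `cudaDeviceSynchronize()` the host may exit before the kernel's output appears.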


{{hidden begin
|title = 5. Inside your kernel function, how do you distribute your data over the threads?}}
<quiz display=simple>
{
|type="()"}
- You don't have to, CUDA does that automatically for you.
|| Wrong
+ Each thread has an index attached to it, which is addressed via threadIdx.x
|| Correct
- If you use array-element-wise operations (like y.=a.*x.+b ), this is managed by the NVIDIA preprocessor.
|| Wrong
- You flag a line to be parallelized via keywords, e.g.: __device__ y=a*x+b
|| Wrong
</quiz>
{{hidden end}}
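The indexing scheme from the quiz answer can be sketched inside a SAXPY kernel: each thread combines its built-in coordinates (threadIdx.x, blockIdx.x, blockDim.x) into one unique global index and handles exactly one array element. The parameter names are illustrative.

```cuda
// Each thread computes one element of y = a*x + y.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    // Unique global index for this thread: block offset plus
    // position within the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard: the grid may contain more threads than array elements.
    if (i < n)
        y[i] = a * x[i] + y[i];
}
```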


=== Introduction Quiz === <!--T:5-->
{{hidden begin
|title = 1. For which kind of program can we expect improvements with GPUs?}}
<quiz display=simple>
{
|type="()"}
- serial programs
|| Wrong: CPU: optimized for low latency (strong single thread); GPU: optimized for throughput (massive parallelism)
+ parallel programs
|| Correct: CPU: optimized for low latency (strong single thread); GPU: optimized for throughput (massive parallelism)
</quiz>
{{hidden end}}


{{hidden begin
|title = 2. What does GPU stand for?}}
<quiz display=simple>
{
|type="()"}
+ graphics processing unit
|| Correct
- grand powerful unit
|| Wrong
</quiz>
{{hidden end}}


{{hidden begin
|title = 3. Why do we expect an overhead in the GPU timings?}}
<quiz display=simple>
{
|type="()"}
- The data must be copied to an extra device first and has to be transferred back later
|| Correct, but this is not the whole answer.
- A GPU core is "weaker" than a CPU core
|| Correct, but this is not the whole answer.
- For "small" problems like the SAXPY, the whole power of a GPU is rarely used
|| Correct, but this is not the whole answer.
+ All of the above
|| Correct!
</quiz>
{{hidden end}}
Revision as of 12:20, 11 November 2021
| Tutorial | |
|---|---|
| Title: | Introduction to GPU Computing |
| Provider: | HPC.NRW |
| Contact: | tutorials@hpc.nrw |
| Type: | Multi-part video |
| Topic Area: | GPU computing |
| License: | CC-BY-SA |
Syllabus
1. Introduction
2. Several Ways to SAXPY: CUDA C/C++
3. Several Ways to SAXPY: OpenMP
4. Several Ways to SAXPY: Julia
5. Several Ways to SAXPY: NUMBA
This video discusses the SAXPY operation (single-precision a*x plus y) implemented in NVIDIA CUDA C/C++.
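A complete SAXPY in CUDA C/C++ can be sketched as follows. This is not the exact code from the video, only a standard pattern: allocate device buffers, copy the input over, launch the kernel, and copy the result back. Array size and block size are arbitrary choices.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread updates one element: y[i] = a*x[i] + y[i].
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;                  // 1M elements (arbitrary)
    const size_t bytes = n * sizeof(float);

    // Host data: x = 1, y = 2 everywhere.
    float *x = (float *)malloc(bytes), *y = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Device copies of x and y.
    float *d_x, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    // Round the block count up so every element is covered.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 2.0f, d_x, d_y);

    // Copy the result back; cudaMemcpy synchronizes with the kernel.
    cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f (expected 2*1 + 2 = 4)\n", y[0]);

    cudaFree(d_x); cudaFree(d_y); free(x); free(y);
    return 0;
}
```

The two `cudaMemcpy` transfers are exactly the overhead discussed in question 3 of the Introduction Quiz: for a problem this small, moving data across the PCIe bus can cost more than the computation saves.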
Video
Quiz
1. Which features does CUDA add to C/C++?
2. What is a kernel?
3. How do you flag a function to be a kernel?
4. Let's say you coded your kernel function called "MyKernel". How do you run it?
5. Inside your kernel function, how do you distribute your data over the threads?
Introduction Quiz
1. For which kind of program can we expect improvements with GPUs?
2. What does GPU stand for?
3. Why do we expect an overhead in the GPU timings?