{{DISPLAYTITLE:GPU Computing (CUDA C)}}
{{Syllabus Introduction to GPU Computing}}
__TOC__

This video discusses the SAXPY via NVIDIA CUDA C/C++.

CUDA is an application programming interface (API) for NVIDIA GPUs. In general, CUDA works with many programming languages, but this tutorial focuses on C/C++. CUDA gives access to a GPU's instruction set, which means we have to go through everything step by step, since many things do not happen automatically.
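
To make this concrete, here is a minimal sketch of a complete SAXPY in CUDA C along the lines of what the video walks through; the function name <code>saxpy</code>, the array size, and the launch configuration are illustrative choices, not prescribed by the video:

<syntaxhighlight lang="cuda">
#include <cstdio>
#include <cstdlib>

// Kernel: each GPU thread computes one element of y = a*x + y.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global thread index
    if (i < n)                                      // the last block may overshoot n
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Allocate and initialize the arrays on the host (CPU).
    float *x = (float *)malloc(bytes);
    float *y = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Allocate device (GPU) memory and copy the input over --
    // nothing happens automatically here.
    float *d_x, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks of 256 threads to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 2.0f, d_x, d_y);

    // Copy the result back to the host and release all memory.
    cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", y[0]);  // expect 4.0
    cudaFree(d_x); cudaFree(d_y);
    free(x); free(y);
    return 0;
}
</syntaxhighlight>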
  
 
=== Video === <!--T:5-->

<youtube width="600" height="340" right>rgqORzT1oCw</youtube>

([[Media:GPU_tutorial_saxpy_cuda_c.pdf | Slides as pdf]])
  
 
=== Quiz === <!--T:5-->

{{hidden begin
|title = 1. Which features does CUDA add to C/C++?}}
<quiz display=simple>
{
|type="()"}
- new functions
|| CUDA does not only add new functions, but all of these features.
- new syntax
|| CUDA does not only add new syntax, but all of these features.
- GPU support
|| CUDA does not only add GPU support, but all of these features.
+ All of the above
|| Correct
</quiz>
{{hidden end}}

{{hidden begin
|title = 2. What is a kernel?}}
<quiz display=simple>
{
|type="()"}
- It's a flag you can set to automatically parallelize any function.
|| Unfortunately, life is not that easy.
+ It's the part of your code that is run on the GPU.
|| Correct
- It's a new CUDA function that activates the GPU.
|| Wrong
</quiz>
{{hidden end}}

{{hidden begin
|title = 3. How do you flag a function to be a kernel?}}
<quiz display=simple>
{
|type="()"}
- __host__
|| Wrong. This specifies a function that runs on the CPU.
- __device__
|| Wrong. This does specify a function that runs on the GPU, but it also has to be called from the GPU, while we want a kernel to be launched by the CPU.
+ __global__
|| Correct
- __GPU__
|| Wrong. This modifier doesn't exist.
</quiz>
{{hidden end}}
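
For reference, this is how the three qualifiers look in source code; the function names here are made up for illustration:

<syntaxhighlight lang="cuda">
// Runs on the CPU and is called from the CPU (the default for plain functions).
__host__ float twice_on_cpu(float a) { return 2.0f * a; }

// Runs on the GPU, but can only be called from GPU code, e.g. from a kernel.
__device__ float twice_on_gpu(float a) { return 2.0f * a; }

// A kernel: runs on the GPU and is launched from the CPU. Must return void.
__global__ void my_kernel(float *data)
{
    data[threadIdx.x] = twice_on_gpu(data[threadIdx.x]);
}
</syntaxhighlight>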

{{hidden begin
|title = 4. Let's say you coded your kernel function called "MyKernel". How do you run it?}}
<quiz display=simple>
{
|type="()"}
- MyKernel();
|| Wrong. This would just execute an ordinary function.
- CUDA.run(NoBlocks, NoThreads, MyKernel());
|| Wrong. There is no CUDA.run().
+ MyKernel<<<NoBlocks, NoThreads>>>();
|| Correct
- __global(NoBlocks, NoThreads)__ MyKernel();
|| Wrong. __global__ and other modifiers can't have arguments and are part of a function definition, not of a launch.
</quiz>
{{hidden end}}
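
A short sketch of such a launch in context; the variable names follow the question, while the block size of 256 and the kernel body are illustrative assumptions:

<syntaxhighlight lang="cuda">
__global__ void MyKernel(int n, float a, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * y[i];
}

// ... on the host, with d_y already allocated on the device:
// int NoThreads = 256;                             // threads per block
// int NoBlocks  = (n + NoThreads - 1) / NoThreads; // rounded up to cover all n
// MyKernel<<<NoBlocks, NoThreads>>>(n, 2.0f, d_y);
</syntaxhighlight>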

{{hidden begin
|title = 5. Inside your kernel function, how do you distribute your data over the GPU threads?}}
<quiz display=simple>
{
|type="()"}
- You don't have to, CUDA does that automatically for you.
|| Wrong
+ Each thread has an index attached to it, which is addressed via threadIdx.x
|| Correct
- If you use array-element-wise operations, e.g.: y.=a.*x.+b . This is managed by the NVIDIA preprocessor.
|| Wrong. There are no element-wise operators in C/C++.
- You flag a line to be parallelized via keywords, e.g.: __device__ y=a*x+b
|| Wrong. These modifiers are used at function definitions.
</quiz>
{{hidden end}}
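
Note that threadIdx.x only numbers the threads within a single block; as soon as the launch uses more than one block, the block index has to be folded in to obtain a unique element per thread. A sketch (kernel name and body are illustrative):

<syntaxhighlight lang="cuda">
__global__ void scale(int n, float a, float *y)
{
    // threadIdx.x: index within the block, blockIdx.x: index of the block,
    // blockDim.x: threads per block -> together they give a unique global index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)          // the last block may run past the end of the array
        y[i] = a * y[i];
}
</syntaxhighlight>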
