{{DISPLAYTITLE:GPU Computing (CUDA C)}}
{{Syllabus Introduction to GPU Computing}}
__TOC__
This video discusses SAXPY via NVIDIA CUDA C/C++.
CUDA is an application programming interface (API) for NVIDIA GPUs. In general, CUDA works with many programming languages, but this tutorial focuses on C/C++. CUDA gives access to a GPU's instruction set, which means we have to go through everything step by step, since many things do not happen automatically.
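To make these steps concrete, here is a minimal SAXPY (y = a·x + y) in CUDA C, roughly along the lines of what the video develops. This is an illustrative sketch, not the exact code from the slides; it uses unified memory via cudaMallocManaged and omits error checking for brevity.

```cuda
#include <stdio.h>

// Kernel: each thread handles one element of the arrays.
__global__ void saxpy(int n, float a, float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overshoot
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));       // unified memory, visible
    cudaMallocManaged(&y, n * sizeof(float));       // to both CPU and GPU
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y); // launch the kernel
    cudaDeviceSynchronize();                        // wait for the GPU

    printf("y[0] = %f\n", y[0]);                    // expect 2*1 + 2 = 4
    cudaFree(x); cudaFree(y);
    return 0;
}
```

Compile with nvcc, e.g. <code>nvcc saxpy.cu -o saxpy</code>.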
=== Video === <!--T:5-->

<youtube width="600" height="340" right>rgqORzT1oCw</youtube>
([[Media:GPU_tutorial_saxpy_cuda_c.pdf |Slides as pdf]])
+ | |||
+ | |||
=== Quiz === <!--T:5-->
{{hidden begin
|title = 1. Which features does CUDA add to C/C++?}}
<quiz display=simple>
{
|type="()"}
- new functions
|| CUDA does not only add new functions, but all of these features.
- new syntax
|| CUDA does not only add new syntax, but all of these features.
- GPU support
|| CUDA does not only add GPU support, but all of these features.
+ All of the above
|| Correct
</quiz>
{{hidden end}}
{{hidden begin
|title = 2. What is a kernel?}}
<quiz display=simple>
{
|type="()"}
- It's a flag you can set to automatically parallelize any function.
|| Unfortunately, life is not that easy.
+ It's the part of your code that is run on the GPU.
|| Correct
- It's a new CUDA function that activates the GPU.
|| Wrong
</quiz>
{{hidden end}}
{{hidden begin
|title = 3. How do you flag a function to be a kernel?}}
<quiz display=simple>
{
|type="()"}
- __host__
|| Wrong. This specifies a function that runs on the CPU.
- __device__
|| Wrong. This does specify a function that runs on the GPU, but it also has to be called from the GPU, while we want a kernel to be launched by the CPU.
+ __global__
|| Correct
- __GPU__
|| Wrong. This modifier doesn't exist.
</quiz>
{{hidden end}}
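For reference, the three function space qualifiers behave as follows. This is an illustrative sketch; the function names are made up for the example.

```cuda
__host__ void on_cpu(void) { }          // runs on the CPU, called from the CPU
                                        // (__host__ is the default and may be omitted)

__device__ float square(float v)        // runs on the GPU, callable only from GPU code
{
    return v * v;
}

__global__ void my_kernel(float *out)   // a kernel: runs on the GPU,
{                                       // launched from the CPU
    out[threadIdx.x] = square(2.0f);    // kernels may call __device__ functions
}
```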
+ | |||
{{hidden begin
|title = 4. Let's say you coded your kernel function called "MyKernel". How do you run it?}}
<quiz display=simple>
{
|type="()"}
- MyKernel();
|| Wrong. This would just execute an ordinary function.
- CUDA.run(NoBlocks, NoThreads, MyKernel());
|| Wrong. There is no CUDA.run().
+ MyKernel<<<NoBlocks, NoThreads>>>();
|| Correct
- __global(NoBlocks, NoThreads)__ MyKernel();
|| Wrong. __global__ and other modifiers can't have arguments and are part of a function's definition, not its launch.
</quiz>
{{hidden end}}
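A sketch of a typical launch, reusing the illustrative names MyKernel, NoBlocks, and NoThreads from the question (a real kernel would usually take arguments):

```cuda
__global__ void MyKernel(void) { /* work happens here */ }

int main(void)
{
    int NoThreads = 256;                 // threads per block
    int NoBlocks  = 16;                  // blocks in the grid

    // The execution configuration goes between the function name
    // and the argument list; the launch is asynchronous.
    MyKernel<<<NoBlocks, NoThreads>>>();
    cudaDeviceSynchronize();             // wait for the GPU to finish
    return 0;
}
```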
+ | |||
+ | |||
{{hidden begin
|title = 5. Inside your kernel function, how do you distribute your data over the GPU threads?}}
<quiz display=simple>
{
|type="()"}
- You don't have to; CUDA does that automatically for you.
|| Wrong
+ Each thread has an index attached to it, which is addressed via threadIdx.x.
|| Correct
- If you use array-element-wise operations, e.g. y .= a .* x .+ b. This is managed by the NVIDIA preprocessor.
|| Wrong. There are no element-wise operators in C/C++.
- You flag a line to be parallelized via keywords, e.g. __device__ y = a*x + b.
|| Wrong. These modifiers are used at function definitions.
</quiz>
{{hidden end}}
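When the grid has fewer threads than there are array elements, a common pattern is to let each thread stride over the array (a "grid-stride loop"), sketched here for SAXPY. Illustrative only, not from the slides:

```cuda
__global__ void saxpy_stride(int n, float a, float *x, float *y)
{
    int start  = blockIdx.x * blockDim.x + threadIdx.x; // this thread's index
    int stride = gridDim.x * blockDim.x;                // total threads in grid

    // Each thread processes elements start, start+stride, start+2*stride, ...
    for (int i = start; i < n; i += stride)
        y[i] = a * x[i] + y[i];
}
```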
''Latest revision as of 12:17, 3 January 2022''

{| class="wikitable"
! colspan="2" | Tutorial
|-
| Title: || Introduction to GPU Computing
|-
| Provider: || HPC.NRW
|-
| Contact: || tutorials@hpc.nrw
|-
| Type: || Multi-part video
|-
| Topic Area: || GPU computing
|-
| License: || CC-BY-SA
|}

=== Syllabus ===
# Introduction
# Several Ways to SAXPY: CUDA C/C++
# Several Ways to SAXPY: OpenMP
# Several Ways to SAXPY: Julia
# Several Ways to SAXPY: NUMBA