GPU Tutorial/SAXPY CUDA C
{{hidden begin
|title = 1. Which features does CUDA add to C/C++?}}
<quiz display=simple>
{
|type="()"}
- new functions
|| CUDA does not add only new functions; it adds all of these features.
- new syntax
|| CUDA does not add only new syntax; it adds all of these features.
- GPU support
|| CUDA does not add only GPU support; it adds all of these features.
+ All of these features
|| Correct
</quiz>
{{hidden end}}
{{hidden begin
|title = 3. How do you flag a function to be a kernel?}}
<quiz display=simple>
{
|type="()"}
- __host__
|| Wrong. This specifies a function that runs on the CPU.
- __device__
|| Wrong. This does specify a function that runs on the GPU, but such a function must also be called from GPU code, while a kernel is launched by the CPU.
+ __global__
|| Correct
- __GPU__
|| Wrong. This qualifier doesn't exist.
</quiz>
{{hidden end}}
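As a short sketch of how the three qualifiers from this question combine in practice (the function names here are illustrative, not from the video):

```cuda
#include <cstdio>

// __device__: runs on the GPU and is callable only from GPU code.
__device__ float axpy(float a, float x, float y) {
    return a * x + y;
}

// __global__: a kernel -- runs on the GPU, but is launched from the CPU.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = axpy(a, x[i], y[i]);
}

// __host__ (the default for unqualified functions): runs on the CPU.
__host__ void report(int n) {
    printf("launched saxpy over %d elements\n", n);
}
```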
{{hidden begin
|title = 4. Let's say you coded your kernel function called "MyKernel". How do you run it?}}
<quiz display=simple>
{
|type="()"}
- MyKernel();
|| Wrong. This would just call it like an ordinary function.
- CUDA.run(NoBlocks, NoThreads, MyKernel());
|| Wrong. There is no CUDA.run().
+ MyKernel<<<NoBlocks, NoThreads>>>();
|| Correct
- __global(NoBlocks, NoThreads)__ MyKernel();
|| Wrong. __global__ and the other qualifiers take no arguments and belong to the function definition, not to the launch.
</quiz>
{{hidden end}}
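A minimal launch sketch, reusing the quiz's `MyKernel`, `NoBlocks`, and `NoThreads` names (the concrete values are illustrative): the execution configuration in triple angle brackets goes between the kernel name and its argument list.

```cuda
__global__ void MyKernel() { /* kernel body */ }

int main() {
    int NoBlocks = 2, NoThreads = 128;     // execution configuration
    MyKernel<<<NoBlocks, NoThreads>>>();   // launch 2 blocks x 128 threads
    cudaDeviceSynchronize();               // kernel launches are asynchronous
    return 0;
}
```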
{{hidden begin
|title = 5. Inside your kernel function, how do you distribute your data over the GPU threads?}}
<quiz display=simple>
{
|type="()"}
+ Each thread has an index attached to it, which is addressed via threadIdx.x
|| Correct
- If you use array-element-wise operations, e.g. y .= a .* x .+ b; these are managed by the NVIDIA preprocessor.
|| Wrong. There are no element-wise operators in C/C++.
- You flag a line to be parallelized via keywords, e.g. __device__ y=a*x+b
|| Wrong. These qualifiers are used in function definitions, not on individual statements.
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
</quiz>
{{hidden end}}
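The `threadIdx.x` pattern from question 5 typically looks like this in a kernel: each thread computes its unique global index and handles exactly one array element.

```cuda
__global__ void saxpy(int n, float a, const float *x, float *y) {
    // Unique global index: block offset plus thread offset within the block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // guard: the grid may hold more threads than n
        y[i] = a * x[i] + y[i];
}
```

A matching launch would round the block count up, e.g. `saxpy<<<(n + 255) / 256, 256>>>(n, a, x, y);`, so that every element is covered.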
Latest revision as of 11:17, 3 January 2022
| Tutorial | |
|---|---|
| Title: | Introduction to GPU Computing |
| Provider: | HPC.NRW |
| Contact: | tutorials@hpc.nrw |
| Type: | Multi-part video |
| Topic Area: | GPU computing |
| License: | CC-BY-SA |
Syllabus

1. Introduction
2. Several Ways to SAXPY: CUDA C/C++
3. Several Ways to SAXPY: OpenMP
4. Several Ways to SAXPY: Julia
5. Several Ways to SAXPY: NUMBA
This video discusses the SAXPY via NVIDIA CUDA C/C++. CUDA is an application programming interface (API) for NVIDIA GPUs. In general, CUDA works with many programming languages, but this tutorial focuses on C/C++. CUDA gives access to a GPU's instruction set, which means we have to go through everything step by step, since many things do not happen automatically.
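Putting the pieces together, a minimal CUDA C SAXPY along the lines discussed in the video might look as follows. This is a sketch, assuming an NVIDIA GPU and the CUDA toolkit (`nvcc`); it uses unified memory to stay short, though explicit `cudaMalloc`/`cudaMemcpy` would work just as well.

```cuda
#include <cstdio>

// SAXPY kernel: y = a*x + y, one element per GPU thread.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    const float a = 2.0f;
    float *x, *y;

    // Unified memory is accessible from both CPU and GPU.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch with enough blocks of 256 threads to cover all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, a, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expect 4.0 = 2.0 * 1.0 + 2.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```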
Video
Quiz
1. Which features does CUDA add to C/C++?
2. What is a kernel?
3. How do you flag a function to be a kernel?
4. Let's say you coded your kernel function called "MyKernel". How do you run it?
5. Inside your kernel function, how do you distribute your data over the GPU threads?