Difference between revisions of "GPU Tutorial/SAXPY CUDA C"

Latest revision as of 12:17, 3 January 2022

Tutorial
Title:	Introduction to GPU Computing
Provider:	HPC.NRW
Contact:	tutorials@hpc.nrw
Type:	Multi-part video
Topic Area:	GPU computing
License:	CC-BY-SA
Syllabus
1. Introduction
2. Several Ways to SAXPY: CUDA C/C++
3. Several Ways to SAXPY: OpenMP
4. Several Ways to SAXPY: Julia
5. Several Ways to SAXPY: NUMBA

This video discusses SAXPY (single-precision a·x plus y) implemented with NVIDIA CUDA C/C++. CUDA is an application programming interface (API) for NVIDIA GPUs. CUDA can be used from many programming languages, but this tutorial focuses on C/C++. CUDA gives access to a GPU's instruction set, which means we have to go through everything step by step, since many things do not happen automatically.
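The explicit steps mentioned above (allocate device memory, copy the input, launch a kernel, copy the result back) can be sketched as follows. This is a minimal illustration, not the code shown in the video: the names `saxpy` and `saxpy_max_error`, the block size of 256 threads and the test values are assumptions made for this example, and error checking of the CUDA calls is left out for brevity.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// __global__ marks a kernel: a function that runs on the GPU
// and is launched from host (CPU) code.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    // Global index of this thread across all blocks.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // guard: the last block may contain surplus threads
        y[i] = a * x[i] + y[i];
}

// Hypothetical helper for this sketch: runs SAXPY on the GPU with
// x[i] = 1 and y[i] = 2 and returns the largest absolute deviation
// from the expected value a * 1 + 2.
float saxpy_max_error(int n, float a)
{
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Step 1: allocate memory on the device.
    float *d_x = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);

    // Step 2: copy the input data from host to device.
    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice);

    // Step 3: launch the kernel with enough blocks to cover n elements.
    int threads = 256;  // assumed block size
    int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, a, d_x, d_y);

    // Step 4: copy the result back (this call waits for the kernel to finish).
    cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost);

    // Step 5: release the device memory.
    cudaFree(d_x);
    cudaFree(d_y);

    float expected = a * 1.0f + 2.0f;
    float max_err = 0.0f;
    for (int i = 0; i < n; ++i)
        max_err = std::fmax(max_err, std::fabs(y[i] - expected));
    return max_err;
}

int main()
{
    std::printf("max error: %f\n", saxpy_max_error(1 << 20, 2.0f));
    return 0;
}
```

Assuming the file is saved as `saxpy.cu`, it can be compiled with `nvcc saxpy.cu -o saxpy`. The two host-to-device copies and the copy back are part of the overhead one should expect when timing small problems such as SAXPY on a GPU.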

Video

(Slides as pdf)

Quiz

1. Which features does CUDA add to C/C++?

2. What is a kernel?

3. How do you flag a function to be a kernel?

4. Let's say you coded your kernel function called "MyKernel". How do you run it?

5. Inside your kernel function, how do you distribute your data over the GPU threads?

@@ Line 94: / Line 94: @@
 - You flag a line to be parallelized via keywords, e.g.: __device__ y=a*x+b
 || Wrong. These modifiers are used at function definitions.
-</quiz>
-{{hidden end}}
-=== Introduction Quiz === <!--T:5-->
-{{hidden begin
-|title = 1. For which kind of program can we expect improvements with GPUs?}}
-<quiz display=simple>
-{
-|type="()"}
-- serial programs
-|| Wrong: CPU: optimized for low latency (strong single thread); GPU: optimized for throughput (massive parallelism)
-+ parallel programs
-|| Correct: CPU: optimized for low latency (strong single thread); GPU: optimized for throughput (massive parallelism)
-</quiz>
-{{hidden end}}
-{{hidden begin
-|title = 2. What does GPU stand for?}}
-<quiz display=simple>
-{
-|type="()"}
-+ graphics processing unit
-|| Correct
--  grand powerful unit
-|| Wrong
-</quiz>
-{{hidden end}}
-{{hidden begin
-|title = 3. Why do we expect an overhead in the GPU timings?}}
-<quiz display=simple>
-{
-|type="()"}
-- The data must be copied to an extra device first and has to be transferred back later
-|| Correct, but this is not the whole answer.
-- A GPU core is "weaker" than a CPU core
-|| Correct, but this is not the whole answer.
-- For "small" problems like the SAXPY, the whole power of a GPU is rarely used
-|| Correct, but this is not the whole answer.
-+ All of the above
-|| Correct!
 </quiz>
 {{hidden end}}

	new functions
	new syntax
	GPU support
	All of the above

	__host__
	__device__
	__global__
	__GPU__

	MyKernel();
	CUDA.run(NoBlocks, NoThreads, MyKernel());
	MyKernel<<<NoBlocks, NoThreads>>>();
	__global(NoBlocks, NoThreads)__ MyKernel();

	You don't have to, CUDA does that automatically for you.
	Each thread has an index attached to it, which is accessed via threadIdx.x.
	If you use array-element-wise operations, e.g.: y.=a.*x.+b. This is managed by the NVIDIA preprocessor.
	You flag a line to be parallelized via keywords, e.g.: __device__ y=a*x+b

	It's a flag you can set to automatically parallelize any function.
	It's the part of your code that is run on the GPU.
	It's a new CUDA function that activates the GPU.