GPU Computing (CUDA C)

From HPC Wiki

Tutorial
Title: Introduction to GPU Computing
Provider: HPC.NRW

Contact: tutorials@hpc.nrw
Type: Multi-part video
Topic Area: GPU computing
License: CC-BY-SA
Syllabus

1. Introduction
2. Several Ways to SAXPY: CUDA C/C++
3. Several Ways to SAXPY: OpenMP
4. Several Ways to SAXPY: Julia
5. Several Ways to SAXPY: NUMBA

This video shows how to implement SAXPY (single-precision a*x + y) with NVIDIA CUDA C/C++.
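For orientation, a minimal sketch of a SAXPY program in CUDA C/C++ might look like the following. It is not taken from the video or the slides: the kernel name `saxpy`, the problem size, the input values and the block size of 256 threads are all illustrative assumptions, and error handling is kept to a single check after the launch.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

// Kernel (hypothetical name): each thread updates one element, y[i] = a*x[i] + y[i].
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    // Global index of this thread across all blocks.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may contain surplus threads
        y[i] = a * x[i] + y[i];
    }
}

int main()
{
    const int n = 1 << 20;  // assumed problem size, for illustration only
    const float a = 2.0f;
    const size_t bytes = n * sizeof(float);

    // Allocate and fill the host arrays.
    float *h_x = (float *)malloc(bytes);
    float *h_y = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) {
        h_x[i] = 1.0f;
        h_y[i] = 2.0f;
    }

    // Allocate the device arrays and copy the input data to the GPU.
    float *d_x = nullptr;
    float *d_y = nullptr;
    cudaMalloc((void **)&d_x, bytes);
    cudaMalloc((void **)&d_y, bytes);
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks to cover all n elements (rounding up).
    const int threadsPerBlock = 256;  // assumed block size
    const int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    saxpy<<<numBlocks, threadsPerBlock>>>(n, a, d_x, d_y);

    // Report launch errors such as an invalid configuration.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy the result back; this copy waits for the kernel to finish.
    cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);

    // With x = 1, y = 2 and a = 2, every element should equal 4.
    float maxErr = 0.0f;
    for (int i = 0; i < n; ++i) {
        maxErr = fmaxf(maxErr, fabsf(h_y[i] - 4.0f));
    }
    printf("max error: %f\n", maxErr);

    cudaFree(d_x);
    cudaFree(d_y);
    free(h_x);
    free(h_y);
    return maxErr == 0.0f ? 0 : 1;
}
```

Assuming the file is saved as `saxpy.cu` (a hypothetical name), it could be compiled with `nvcc saxpy.cu -o saxpy`. Note that the two host-to-device copies and the copy back are part of the total runtime, which matters for a small problem like SAXPY.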

Video

(Slides as pdf)


Quiz

1. Which features does CUDA add to C/C++?

new functions
new syntax
GPU support
All of the above


2. What is a kernel?

It's a flag you can set to automatically parallelize any function.
It's the part of your code that is run on the GPU.
It's a new CUDA function that activates the GPU.


3. How do you flag a function to be a kernel?

__host__
__device__
__global__
__GPU__

4. Suppose you have written a kernel function called "MyKernel". How do you launch it?

MyKernel();
CUDA.run(NoBlocks, NoThreads, MyKernel());
MyKernel<<<NoBlocks, NoThreads>>>();
__global(NoBlocks, NoThreads)__ MyKernel();


5. Inside your kernel function, how do you distribute your data over the GPU threads?

You don't have to; CUDA does that automatically for you.
Each thread has an index attached to it, which is accessed via threadIdx.x.
If you use array-element-wise operations, e.g. y.=a.*x.+b, this is managed by the NVIDIA preprocessor.
You flag a line to be parallelized via keywords, e.g.: __device__ y=a*x+b



Introduction Quiz

1. For which kind of program can we expect improvements with GPUs?

serial programs
parallel programs


2. What does GPU stand for?

graphics processing unit
grand powerful unit


3. Why do we expect an overhead in the GPU timings?

The data must be copied to an extra device first and has to be transferred back later
A GPU core is "weaker" than a CPU core
For "small" problems like the SAXPY, the whole power of a GPU is rarely used
All of the above