Difference between revisions of "Benchmarking & Scaling Tutorial/Introduction"

Revision as of 12:40, 24 September 2021

Tutorial
Title:	Benchmarking & Scaling
Provider:	HPC.NRW
Contact:	tutorials@hpc.nrw
Type:	Online
Topic Area:	Performance Analysis
License:	CC-BY-SA
Syllabus
1. Introduction & Theory
2. Interactive Manual Benchmarking
3. Automated Benchmarking using a Job Script
4. Automated Benchmarking using JUBE
5. Plotting & Interpreting Results

Scalability

Users who start running applications on an HPC system often assume that the more resources (compute nodes / cores) they use, the faster their code will run, i.e. they expect linear behaviour: twice the cores, half the runtime. Unfortunately, this is not the case for the majority of applications. How the runtime of a program changes with the amount of resources it uses is referred to as scalability. For parallel programs, a fundamental limit is given by Amdahl's law: it takes into account that only a certain part of the work in your code is done in parallel, so the speedup is ultimately limited by the sequential part of the program.

Strong Scaling

Amdahl's Law

We assume that the total execution time $T$ of a program consists of

- $t_{s}$, the time spent in the part of the code that can only run serially
- $t_{p}$, the time spent in the part of the code that can be parallelized
- $t_{o}$, parallel overheads, e.g. due to communication

The execution time of the serial program is then

 $T(1)=t_{s}+t_{p}$

The runtime of the parallel program, with the parallelizable work perfectly divided among $P$ processors, is then given by

 $T(P)=t_{s}+{\frac {t_{p}}{P}}+t_{o}(P)$

The term $t_{p}/P$ is the reduced time spent in the parallelizable part when its work is shared among $P$ CPUs. The total speedup $S$ is defined as the ratio of the sequential to the parallel runtime:

 $S={\frac {T(1)}{T(P)}}$

The efficiency is the speedup per processor, i.e.

 $E={\frac {S}{P}}={\frac {T(1)}{P\cdot T(P)}}$
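
To get a feeling for these definitions, the runtime model can be evaluated numerically. The following Python sketch is only illustrative: the function names are hypothetical, and the values for $t_s$, $t_p$ and the overhead model $t_o(P) = 0.05 \log_2 P$ seconds are made-up assumptions, not measurements.

```python
import math
from typing import Callable


def no_overhead(p: int) -> float:
    """Default overhead model: t_o(P) = 0 for every P."""
    return 0.0


def runtime(p: int, t_s: float, t_p: float,
            overhead: Callable[[int], float] = no_overhead) -> float:
    """Modelled runtime T(P) = t_s + t_p / P + t_o(P).

    The overhead model should return 0 for P = 1, so that
    T(1) = t_s + t_p as in the text.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    return t_s + t_p / p + overhead(p)


def speedup(t1: float, tp: float) -> float:
    """S = T(1) / T(P)."""
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """E = S / P = T(1) / (P * T(P))."""
    return t1 / (p * tp)


def log_overhead(p: int) -> float:
    """Hypothetical overhead model: t_o(P) = 0.05 * log2(P) seconds."""
    return 0.05 * math.log2(p)


if __name__ == "__main__":
    # Hypothetical example: 1 s serial part, 99 s parallelizable part.
    t_s, t_p = 1.0, 99.0
    t1 = runtime(1, t_s, t_p, log_overhead)
    for p in (1, 2, 4, 8, 16, 32, 64):
        tp = runtime(p, t_s, t_p, log_overhead)
        print(f"P={p:3d}  T={tp:7.2f} s  "
              f"S={speedup(t1, tp):6.2f}  E={efficiency(t1, tp, p):5.2f}")
```

The `speedup` and `efficiency` functions do not depend on the model: they can be applied unchanged to runtimes you measure yourself, with `t1` being the runtime on a single processor.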

Speedup and Efficiency

Since $t_{o}>0$, and writing $F={\frac {t_{s}}{t_{s}+t_{p}}}$ for the serial fraction of the code, we obtain upper bounds for the speedup and the efficiency:

 $S\leq {\frac {1}{F+(1-F)/P}}$

 $E\leq {\frac {1}{PF+(1-F)}}$
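
The step from the definitions of $T(1)$ and $T(P)$ to these bounds can be written out explicitly. Dropping the positive overhead $t_{o}(P)$ from the denominator can only make the fraction larger, and dividing numerator and denominator by $t_{s}+t_{p}$ replaces $t_{s}$ by $F$ and $t_{p}$ by $1-F$. Dividing the result by $P$ gives the bound on $E$:

 $S={\frac {t_{s}+t_{p}}{t_{s}+{\frac {t_{p}}{P}}+t_{o}(P)}}\leq {\frac {t_{s}+t_{p}}{t_{s}+{\frac {t_{p}}{P}}}}={\frac {1}{{\frac {t_{s}}{t_{s}+t_{p}}}+{\frac {t_{p}}{t_{s}+t_{p}}}\cdot {\frac {1}{P}}}}={\frac {1}{F+(1-F)/P}}$

 $E={\frac {S}{P}}\leq {\frac {1}{P\left(F+(1-F)/P\right)}}={\frac {1}{PF+(1-F)}}$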

This places an upper limit on the strong scalability, i.e. on how much faster we can solve a problem of fixed size $N$ by increasing $P$. This result is known as Amdahl's Law.
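
The bound is easy to tabulate. The following minimal Python sketch assumes a hypothetical serial fraction of $F = 0.05$; the function names are illustrative only. It also makes the asymptotic limit visible: for $P \to \infty$ the term $(1-F)/P$ vanishes, so the speedup can never exceed $1/F$, here 20, no matter how many processors are used.

```python
def amdahl_speedup(f_serial: float, p: int) -> float:
    """Amdahl upper bound on the speedup: S <= 1 / (F + (1 - F) / P)."""
    return 1.0 / (f_serial + (1.0 - f_serial) / p)


def amdahl_efficiency(f_serial: float, p: int) -> float:
    """Amdahl upper bound on the efficiency: E <= 1 / (P*F + (1 - F))."""
    return 1.0 / (p * f_serial + (1.0 - f_serial))


if __name__ == "__main__":
    f = 0.05  # hypothetical serial fraction of 5 %
    for p in (1, 2, 4, 16, 64, 256, 1024):
        print(f"P={p:5d}  S <= {amdahl_speedup(f, p):6.2f}  "
              f"E <= {amdahl_efficiency(f, p):5.3f}")
    # For P -> infinity the (1 - F) / P term vanishes, leaving S <= 1 / F.
    print(f"P -> infinity:  S <= 1/F = {1.0 / f:.1f}")
```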

Consequences

Because of the serial fraction $F$ and the overheads $t_{o}$, the speedup never grows linearly with $P$, so the efficiency never reaches 100%.
Examples:
- For P=2 processors, to achieve E=0.9, you have to parallelize 89% of the code
- For P=10 processors, to achieve E=0.9, you have to parallelize 99% of the code
- For P=10 processors, to achieve E=0.5, you have to parallelize 89% of the code
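
The percentages in these examples follow from solving the efficiency bound $E=1/(PF+(1-F))$ for the serial fraction, which gives $F=(1/E-1)/(P-1)$. The Python sketch below (the function name is hypothetical) does exactly that. Since it ignores the overhead $t_o$, the result is the minimum parallel fraction required; a real code with communication overheads needs even more.

```python
def required_parallel_fraction(p: int, e: float) -> float:
    """Minimum parallel fraction 1 - F needed to reach efficiency e on p processors.

    Solves E = 1 / (P*F + (1 - F)) for the serial fraction:
    F = (1/E - 1) / (P - 1). Overheads t_o are ignored.
    """
    if p < 2:
        raise ValueError("p must be at least 2")
    if not 0.0 < e <= 1.0:
        raise ValueError("e must lie in (0, 1]")
    f_serial = (1.0 / e - 1.0) / (p - 1)
    return 1.0 - f_serial


if __name__ == "__main__":
    for p, e in ((2, 0.9), (10, 0.9), (10, 0.5)):
        share = 100 * required_parallel_fraction(p, e)
        print(f"P={p:2d}, E={e}: parallelize at least {share:.1f} % of the code")
```

Run as a script, it prints 88.9 %, 98.8 % and 88.9 %, which round to the 89 %, 99 % and 89 % quoted above.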

[Figure: amdahl.png]
[Figure: amdahl_overhead.png]

Weak Scaling

Gustafson's Law

Speedup and Efficiency

Consequences

Knowing about speedup and efficiency, we can now try to measure them ourselves.


