Scaling

From HPC Wiki

In the most general sense, scalability is defined as the ability to handle more work as the size of the computer or application grows. Scalability, or scaling, is widely used to describe the ability of hardware and software to deliver greater computational power when the amount of resources is increased. HPC clusters should be scalable; in other words, the capacity of the whole system can be increased proportionally by adding more hardware. For software, scalability is sometimes referred to as parallelization efficiency: the ratio between the actual speedup and the ideal speedup obtained when using a certain number of processors. This tutorial focuses on software scalability and discusses two common types of scaling. The speedup in parallel computing can be defined as


Speedup = t1 / tN

where t1 is the computational time for running the software using one processor, and tN is the computational time for running the same software with N processors. Ideally, we would like software to have a linear speedup that is equal to the number of processors (speedup = N), as that would mean that every processor contributes 100% of its computational power. Unfortunately, this is a very challenging goal for real-world applications to attain.
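The definition can be turned into a small helper. The following is a minimal sketch; the function names `speedup` and `efficiency` and the timings are chosen here for illustration and are not part of any library or measurement.

```python
def speedup(t1: float, tn: float) -> float:
    """Speedup = t1 / tN for the same problem."""
    if t1 <= 0 or tn <= 0:
        raise ValueError("run times must be positive")
    return t1 / tn


def efficiency(t1: float, tn: float, n: int) -> float:
    """Fraction of the ideal (linear) speedup N that was actually achieved."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return speedup(t1, tn) / n


# Illustrative timings only (not measurements): 120 s on 1 processor, 20 s on 8.
s = speedup(120.0, 20.0)
e = efficiency(120.0, 20.0, 8)
print(f"speedup = {s:.2f}, efficiency = {e:.0%}")
```

With these made-up numbers the speedup is 6 on 8 processors, so each processor contributes 75% of its computational power on average.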


Scaling tests

The primary challenge of parallel computing is deciding how best to break up a problem into individual pieces that can each be computed separately. Large applications are usually not developed and tested using the full problem size and/or number of processors right from the start, as this comes with long waits and a high usage of resources. It is therefore advisable to scale these factors down at first, which also makes it possible to estimate the resources required for the full run more accurately (see Resource planning). Scalability testing measures the ability of an application to perform well or better with varying problem sizes and numbers of processors. It does not test the application's general functionality or correctness.
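A scaling test of this kind can be driven by a small timing harness. The sketch below is one possible shape, not a prescribed tool: `time_runs` and `toy_run` are hypothetical names, and `toy_run` is a serial stand-in that would be replaced by a call that launches the real application with the given processor count and problem size.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def time_runs(
    run: Callable[[int, int], None],
    configs: Iterable[Tuple[int, int]],
    repeats: int = 3,
) -> Dict[Tuple[int, int], float]:
    """Return the best wall-clock time in seconds for each (n_procs, size) pair."""
    results = {}
    for n_procs, size in configs:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(n_procs, size)
            best = min(best, time.perf_counter() - start)
        results[(n_procs, size)] = best
    return results


def toy_run(n_procs: int, size: int) -> None:
    """Stand-in workload (assumption): ignores n_procs and runs serially."""
    sum(i * i for i in range(size))


if __name__ == "__main__":
    # Scaled-down problem sizes first; the full size comes later.
    configs = [(1, 10_000), (1, 20_000), (1, 40_000)]
    for (n_procs, size), seconds in time_runs(toy_run, configs).items():
        print(f"procs={n_procs} size={size} best={seconds:.6f} s")
```

Taking the minimum over a few repeats reduces the influence of noise from other activity on the machine; whether the minimum or the mean is the better summary depends on the system being measured.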

Strong or Weak Scaling

Applications can generally be divided into strong scaling and weak scaling applications. Please note that the terms strong and weak themselves do not give any information whatsoever on how well an application actually scales. The definitions of strong and weak scaling introduced under Scaling tests are restated below, together with details on how to calculate the efficiency and speedup for each.

Strong Scaling

In the case of strong scaling, the number of processors is increased while the problem size remains constant. This also results in a reduced workload per processor. Strong scaling is mostly used for long-running CPU-bound applications to find a setup that results in a reasonable runtime with moderate resource costs. The individual workload must be kept high enough to keep all processors fully occupied. The additional speedup gained by increasing the number of processes usually diminishes more or less continuously.

In an ideal world, a problem would scale linearly, that is, the program would speed up by a factor of N when it runs on a machine with N nodes. (Of course, as N → ∞ the proportionality cannot hold, because communication time must then dominate.) The goal when solving a problem that scales strongly is therefore to decrease the time it takes to solve the problem by using a more powerful computer. These are typically CPU-bound problems, and they are the hardest ones for which to obtain something close to a linear speedup.
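To make the search for a setup with reasonable runtime and moderate resource cost concrete, the following sketch tabulates speedup and efficiency for a fixed problem size and picks the largest processor count that stays above an efficiency threshold. The timings are invented for illustration, and both the function names and the 70% threshold are assumptions rather than a recommended standard.

```python
def strong_scaling_table(timings):
    """timings maps processor count N -> run time for one fixed problem size.

    Returns a list of (N, time, speedup, efficiency) rows sorted by N.
    The run with the fewest processors is used as the baseline, so the
    speedup is relative to that run (normally N = 1).
    """
    base_n = min(timings)
    base_t = timings[base_n]
    rows = []
    for n in sorted(timings):
        s = base_t / timings[n]
        rows.append((n, timings[n], s, s / (n / base_n)))
    return rows


def largest_efficient_n(timings, min_efficiency=0.7):
    """Largest N whose parallel efficiency is still at least min_efficiency."""
    ok = [n for n, _, _, eff in strong_scaling_table(timings) if eff >= min_efficiency]
    return max(ok)


# Invented timings in seconds, for illustration only (not measurements).
example = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 160.0, 16: 110.0, 32: 95.0}
for n, t, s, eff in strong_scaling_table(example):
    print(f"N={n:3d}  time={t:7.1f} s  speedup={s:5.2f}  efficiency={eff:4.0%}")
print("largest N with efficiency >= 70%:", largest_efficient_n(example))
```

For these example numbers the run time keeps falling up to N = 32, but the efficiency drops below 70% beyond N = 8, which is the kind of trade-off a strong scaling test is meant to expose.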

Amdahl’s law and strong scaling

In 1967, Amdahl pointed out that the speedup is limited by the fraction of the serial part of the software that is not amenable to parallelization. Amdahl’s law can be formulated as follows

Speedup = 1 / (s + p / N)

where s is the proportion of execution time spent on the serial part, p is the proportion of execution time spent on the part that can be parallelized (so that s + p = 1), and N is the number of processors. Amdahl’s law states that, for a fixed problem, the upper limit of the speedup is determined by the serial fraction of the code: as N grows, the speedup approaches 1 / s. This fixed-problem setting is what is called strong scaling: the problem size stays fixed while the number of processing elements is increased.

Strong scaling is typically motivated by programs that take a long time to run (that is, CPU-bound programs). The goal is to find a "sweet spot" that allows the computation to complete in a reasonable amount of time without wasting too many cycles on parallel overhead. In strong scaling, a program is considered to scale linearly if the speedup (in terms of work units completed per unit time) is equal to the number of processing elements used (N). In general, it is harder to achieve good strong scaling at larger process counts, since the communication overhead of many algorithms increases in proportion to the number of processes used.
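Amdahl’s law is easy to evaluate numerically. The sketch below assumes p = 1 - s, which follows from s and p being proportions of the same execution time; the serial fraction of 5% and the function names are chosen purely for illustration.

```python
def amdahl_speedup(s: float, n: int) -> float:
    """Speedup = 1 / (s + p / N), with p = 1 - s."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    p = 1.0 - s
    return 1.0 / (s + p / n)


def amdahl_limit(s: float) -> float:
    """Upper bound on the speedup as N grows without bound: 1 / s."""
    return float("inf") if s == 0 else 1.0 / s


s = 0.05  # assumed serial fraction, chosen for illustration
for n in (1, 4, 16, 64, 256, 1024):
    sp = amdahl_speedup(s, n)
    print(f"N={n:5d}  speedup={sp:6.2f}  efficiency={sp / n:6.1%}")
print(f"limit for s={s}: {amdahl_limit(s):.1f}")
```

Even with only 5% serial work the speedup can never exceed 20, no matter how many processors are added, and the efficiency falls steadily as N increases.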