Large applications are usually not developed and tested with the full problem size and/or number of processors right from the start, as this comes with long waits and high resource usage. It is therefore advisable to scale these factors down at first, which also makes it possible to estimate the resources required for the full run more accurately for the purposes of resource planning. Scalability testing measures how well an application performs as the problem size and the number of processors vary. It does not test the application's general functionality or correctness.
Strong or Weak Scaling
Applications can generally be divided into strong scaling and weak scaling applications. Please note that the terms strong and weak themselves do not give any information whatsoever on how well an application actually scales.
In case of strong scaling, the number of processors is increased while the problem size remains constant. This also results in a reduced workload per processor.
Strong scaling is mostly used for long-running CPU-bound applications to find a setup which results in a reasonable runtime with moderate resource costs. The workload per processor must be kept high enough to keep all processors fully occupied. The additional speedup gained from each further increase in the number of processors usually diminishes more or less continuously.
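The results of a strong scaling test are commonly summarized as speedup and parallel efficiency relative to the smallest run. The following is a minimal sketch of that bookkeeping, not a prescribed tool: the function name `strong_scaling_table` and the timings are made up for illustration, and the definitions used (speedup as baseline time divided by measured time, efficiency as speedup divided by the ideal speedup) are the usual convention rather than something fixed by this article.

```python
def strong_scaling_table(runtimes):
    """Return (processors, speedup, efficiency) rows for a fixed problem size.

    runtimes maps a processor count to the measured wall-clock time in seconds.
    The run with the fewest processors serves as the baseline.
    """
    base_procs = min(runtimes)
    base_time = runtimes[base_procs]
    rows = []
    for procs in sorted(runtimes):
        speedup = base_time / runtimes[procs]
        # Ideal speedup relative to the baseline is procs / base_procs.
        efficiency = speedup / (procs / base_procs)
        rows.append((procs, speedup, efficiency))
    return rows


# Hypothetical timings in seconds, for illustration only.
example = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}
for procs, speedup, efficiency in strong_scaling_table(example):
    print(f"{procs:>4d} procs  speedup {speedup:5.2f}  efficiency {efficiency:4.0%}")
```

A steadily falling efficiency column is the typical sign that the workload per processor has become too small to keep all processors busy.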
In case of weak scaling, both the number of processors and the problem size are increased. This also results in a constant workload per processor.
Weak scaling is mostly used for large memory-bound applications whose memory requirements cannot be satisfied by a single node. Such applications usually scale well to higher core counts, because their memory access strategies often focus on the nearest neighboring nodes while ignoring those further away, and these strategies therefore scale well themselves. Upscaling is usually limited only by the available resources or the maximum problem size.
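For a weak scaling test, the problem size of each run has to grow with the processor count, and the ideal outcome is a constant runtime. The sketch below illustrates both steps under stated assumptions: `weak_scaling_plan` and `weak_scaling_efficiency` are hypothetical helper names, the plan assumes that the work grows linearly with the problem size (which need not hold for every algorithm), and the timings are invented.

```python
def weak_scaling_plan(base_size, proc_counts):
    """Problem size per run so that the workload per processor stays constant.

    Assumes the work grows linearly with the problem size.
    """
    return {procs: base_size * procs for procs in proc_counts}


def weak_scaling_efficiency(runtimes):
    """Efficiency per processor count: baseline runtime / measured runtime.

    runtimes maps a processor count to the wall-clock time in seconds of the
    run whose problem size was scaled up accordingly.
    """
    base_time = runtimes[min(runtimes)]
    return {procs: base_time / runtimes[procs] for procs in sorted(runtimes)}


print(weak_scaling_plan(1_000_000, [1, 2, 4, 8]))

# Hypothetical timings in seconds, for illustration only.
measured = {1: 60.0, 2: 61.5, 4: 64.0, 8: 70.0}
for procs, eff in weak_scaling_efficiency(measured).items():
    print(f"{procs:>4d} procs  efficiency {eff:4.0%}")
```

An efficiency close to 100% means that the runtime stayed nearly constant while the problem grew with the number of processors.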
For scalability tests, one needs to pay close attention to the testing environment, because with increasing problem size and number of cores one is bound to reach the limits of the system at some point. It is also helpful to test the application on different systems, as various factors must be taken into account when more than one node is used:
- Interconnect speed and latency
- Maximum memory per node
- Processors per node
- Maximum number of processors (nodes)
- System variables and restrictions (e.g. stack size)
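Some of these factors can be inspected directly on a node before a test run. The following sketch uses only the Python standard library and assumes a Unix-like system (the `resource` module does not exist on Windows); the function name `node_limits` is made up for illustration. Interconnect properties and the maximum number of nodes cannot be read this way and have to be taken from the documentation of the system.

```python
import os
import resource  # Unix-only standard-library module


def node_limits():
    """Collect a few per-node limits that matter for scalability tests."""
    soft_stack, hard_stack = resource.getrlimit(resource.RLIMIT_STACK)
    try:
        # Physical memory in bytes; these sysconf names are not available
        # on every platform, so fall back to None.
        memory = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError, AttributeError):
        memory = None
    return {
        "logical_cpus": os.cpu_count(),
        "memory_bytes": memory,
        # A value equal to resource.RLIM_INFINITY means "unlimited".
        "stack_soft_limit": soft_stack,
        "stack_hard_limit": hard_stack,
    }


for name, value in node_limits().items():
    print(f"{name}: {value}")
```

Note that `os.cpu_count()` reports the logical CPUs of the machine, which may be more than a batch job is actually allowed to use; on Linux, `len(os.sched_getaffinity(0))` is one way to see the CPUs available to the current process.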
For applications using MPI, optimizing the MPI settings can also dramatically improve application performance. MPI applications also require a certain amount of memory for each MPI process, so the total memory required obviously increases with the number of processors and MPI processes used.
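The per-process memory requirement can be folded into a rough check of whether a planned run still fits into the memory of a node. The sketch below is a back-of-the-envelope estimate only: the function name `fits_in_node_memory` and all numbers are hypothetical, and it assumes a fixed MPI overhead per process, whereas the real overhead depends on the MPI implementation and its settings and should be measured.

```python
def fits_in_node_memory(ranks_per_node, app_bytes_per_rank,
                        mpi_bytes_per_rank, node_bytes):
    """Return (needed_bytes, fits) for one node.

    Assumes every MPI process needs the same application memory plus a
    fixed MPI overhead; both values are inputs, not measured here.
    """
    needed = ranks_per_node * (app_bytes_per_rank + mpi_bytes_per_rank)
    return needed, needed <= node_bytes


GIB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical setup: 48 MPI processes per node, 2 GiB of application data
# and 200 MiB of MPI overhead per process, 128 GiB of memory per node.
needed, fits = fits_in_node_memory(48, 2 * GIB, 200 * MIB, 128 * GIB)
print(f"need {needed / GIB:.1f} GiB per node, fits: {fits}")
```

If the estimate does not fit, fewer MPI processes per node or a smaller problem size per process are the obvious knobs to adjust.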