Difference between revisions of "Batch-Scheduler"

From HPC Wiki
This page gives an overview of how to use a Batch-Scheduler and what pitfalls may exist. A more general description of why Batch-Schedulers are needed can be found [[Scheduling_Basics|here]]. There are different Schedulers around, e.g. [[SLURM]], [[LSF]] and [[Torque]]. Click [[Schedulers|here]] to figure out which one you need.
  
 
__TOC__
 
== Usage ==
 
If you want to execute a program in the batch system, you need to submit a [[Jobscript|jobscript]] tailored to the scheduler in use. From this jobscript, the scheduler needs to learn:
* How many resources your program needs (e.g. time and memory)
* Which [[Parallel_Programming|parallelization]] you are using for your program
While the specifics of how to provide this information depend on the scheduler in use, some general rules apply to most of them. How to apply these rules is shown in the [[Jobscript-examples|example scripts]] (or in pages referenced there). Not all pitfalls apply to all batch systems.

== Pitfalls ==
There are some general problems one needs to keep in mind:
* If you request more resources than the hardware can offer, the scheduler might not reject the job; it may then be stuck in the queue forever.
* Be careful about whether the memory limit is per process or in total.  
* The scheduler might not support [[Binding/Pinning|pinning]] so you might want to do this manually.
* There might be per-user quotas for the usage of the cluster.
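
If pinning has to be done by hand, one option (assuming an OpenMP program on a Linux cluster; the values shown are illustrative, and <code>./my_program</code> is a placeholder) is to set the standard OpenMP binding variables, or to restrict the process to specific cores with <code>taskset</code>:

```shell
#!/bin/bash
# Manual pinning sketch; the right knobs depend on your runtime and hardware.
export OMP_PROC_BIND=close   # keep threads close to each other
export OMP_PLACES=cores      # one thread per physical core
# Alternatively, restrict the whole process to specific cores (0-3 here):
# taskset -c 0-3 ./my_program
echo "binding: OMP_PROC_BIND=$OMP_PROC_BIND, OMP_PLACES=$OMP_PLACES"
```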
  
 
== Serial Jobs ==
 
Serial jobs execute programs which do not use any kind of parallelism. Thus, you typically only need to specify the time and memory resources your job needs. However, some batch systems allow both exclusive and non-exclusive usage of nodes. Take care not to block a whole node for a program that needs just one core!
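
As a sketch, a minimal serial jobscript in [[SLURM]] syntax could look as follows (the resource values are illustrative, and <code>./my_serial_program</code> is a placeholder; LSF and Torque use different directives):

```shell
#!/bin/bash
#SBATCH --job-name=serial_example  # name shown in the queue
#SBATCH --time=00:10:00            # requested wall-clock time (hh:mm:ss)
#SBATCH --ntasks=1                 # a serial job is a single process on a single core
#SBATCH --mem-per-cpu=1G           # memory for that single core

node=$(hostname)
echo "serial job running on $node"
# ./my_serial_program
```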

== Shared Memory Jobs ==
In terms of hardware, shared memory parallelization means that you use multiple cores on the same node (which therefore share the node's memory). This means that you need to tell the scheduler that the requested cores should actually be on the same node. Furthermore, you should match the number of threads spawned to the number of cores you requested (e.g. by explicitly setting the [[OpenMP]] environment variable OMP_NUM_THREADS).
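
A sketch of a shared memory jobscript, again in [[SLURM]] syntax with illustrative values (<code>--cpus-per-task</code> requests cores within one node, and <code>./my_openmp_program</code> is a placeholder):

```shell
#!/bin/bash
#SBATCH --time=00:30:00       # wall-clock limit
#SBATCH --ntasks=1            # one process ...
#SBATCH --cpus-per-task=4     # ... with four cores, all on the same node
#SBATCH --mem=4G              # check whether your site counts memory per process or in total

# Match the thread count to the allocated cores
# (SLURM sets SLURM_CPUS_PER_TASK inside the job; 4 is a fallback for testing):
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-4}
echo "running with $OMP_NUM_THREADS OpenMP threads"
# ./my_openmp_program
```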
 
== Distributed Memory Jobs ==
Distributed memory jobs run on multiple cores that may be spread across several nodes. They are usually started via [[MPI]], since it handles the correct start-up of the program. Again, pay attention that the MPI library and the resource requests match.
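
A distributed memory jobscript could look like the following sketch ([[SLURM]] syntax; the values are illustrative and <code>./my_mpi_program</code> is a placeholder):

```shell
#!/bin/bash
#SBATCH --time=01:00:00       # wall-clock limit
#SBATCH --ntasks=8            # eight MPI ranks; they may be spread over several nodes
#SBATCH --mem-per-cpu=2G      # memory per rank, not in total (see Pitfalls)

ranks=${SLURM_NTASKS:-8}      # SLURM_NTASKS is set inside the job; 8 is a fallback
echo "launching $ranks MPI ranks"
# srun starts one process per requested task:
# srun ./my_mpi_program
```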
 
== Hybrid Jobs ==
Hybrid parallelization means that you run a job across several nodes (e.g. using [[MPI]]) while using shared memory parallelization (e.g. [[OpenMP]]) within each of them. This means that you need to specify at least the number of nodes and that you want to use more than one core per node. Distributing the job across the nodes is usually handled by the scheduler. However, not all schedulers fully support the parallelization within each node; in that case, it has to be set up manually.
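
A hybrid jobscript sketch in [[SLURM]] syntax (one MPI rank per node, several OpenMP threads per rank; values are illustrative and <code>./my_hybrid_program</code> is a placeholder):

```shell
#!/bin/bash
#SBATCH --time=02:00:00       # wall-clock limit
#SBATCH --nodes=2             # two nodes
#SBATCH --ntasks-per-node=1   # one MPI rank per node
#SBATCH --cpus-per-task=8     # eight cores per rank for the OpenMP threads

# One OpenMP team per rank, sized to that rank's cores (8 is a fallback for testing):
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-8}
echo "2 ranks with $OMP_NUM_THREADS threads each"
# srun ./my_hybrid_program
```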
 
 
== Advanced Usage ==
 
Apart from the aforementioned types of jobs, the scheduler might offer even more options:
*Jobs across multiple nodes (distributed jobs or hybrid jobs) can also be parallelized without MPI. This goes beyond the scope of this page.
*Jobs running for several days should be split into smaller packages. Among the advantages are reduced queueing times and higher robustness (e.g. against node failures). The splitting can be done either by submitting the parts manually or by using chain jobs.
*Sometimes it may be necessary to run the same program with different arguments (e.g. when determining hyperparameters). In this case, an array job can be used.
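
As an illustration, an array job in [[SLURM]] syntax uses the <code>--array</code> directive, and each task reads its own ID from SLURM_ARRAY_TASK_ID (values and <code>./my_program</code> are placeholders):

```shell
#!/bin/bash
#SBATCH --time=00:30:00       # limit per array task, not for the whole array
#SBATCH --ntasks=1            # each array task is one process here
#SBATCH --array=1-10          # ten independent tasks with IDs 1..10

# Each task sees its own ID (1 is a fallback outside of SLURM):
task=${SLURM_ARRAY_TASK_ID:-1}
echo "array task $task"
# ./my_program --parameter "$task"
```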

Revision as of 16:57, 14 November 2018
