Batch-Scheduler
This page gives an overview of how to use a batch scheduler and which pitfalls may exist. A more general description of why batch schedulers are needed can be found here. Several different schedulers are in use, e.g. SLURM, LSF, and Torque; check here to figure out which one you need.
Usage
To execute a program through the batch system, you need to submit a jobscript tailored to the scheduler in use. From this script the scheduler needs to learn:
- How many resources your program needs (e.g. time and memory)
- Which parallelization you are using for your program
While the specifics of how to provide this information depend on the scheduler in use, some general rules apply to most of them. How to apply these rules should be answered by the example scripts (or referenced there). Note that not all pitfalls apply to all batch systems.
Pitfalls
There are some general problems one needs to keep in mind:
- If you request more resources than the hardware can offer, the scheduler might not reject the job; instead, the job may be stuck in the queue forever.
- Be careful about whether the memory limit is per process or in total.
- The scheduler might not support pinning so you might want to do this manually.
- There might be per-user quotas for the usage of the cluster.
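If the scheduler does not pin processes and threads to cores for you, it can be done manually. A minimal sketch, assuming OpenMP and a Linux system; `OMP_PROC_BIND` and `OMP_PLACES` are standard OpenMP environment variables, `taskset` is a standard Linux tool, and `./my_program` is a placeholder:

```shell
# Keep each OpenMP thread on its own physical core, close to the master thread:
export OMP_PROC_BIND=close
export OMP_PLACES=cores

# Alternatively, restrict the whole process to cores 0-3 at launch:
taskset -c 0-3 ./my_program
```

Which variant is appropriate depends on how (and whether) your scheduler already places processes, so check its documentation first.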
Serial Jobs
Serial jobs execute programs that do not use any kind of parallelism. Thus, you typically only need to specify the time and memory resources your job needs. However, some batch systems allow both exclusive and non-exclusive usage of nodes; pay attention that you do not block a whole node for a program that only needs one core!
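As a sketch, assuming SLURM, a serial jobscript could look as follows; the option names follow the standard `sbatch` interface, while the job name, limits, and `./my_serial_program` are placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=serial-test
#SBATCH --ntasks=1            # a serial job needs exactly one task (one core)
#SBATCH --time=00:30:00       # wall-clock limit (hh:mm:ss)
#SBATCH --mem-per-cpu=1G      # note: memory per core here, not in total

./my_serial_program
```

The script would be submitted with `sbatch jobscript.sh`; other schedulers use different directives (e.g. `#BSUB` for LSF, `#PBS` for Torque) but request the same kind of information.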
Shared Memory Jobs
In hardware terms, shared memory parallelization means that you use multiple cores that are on the same node (and therefore share its memory). This means that you need to tell the scheduler that the requested cores should actually be on the same node. Furthermore, you should synchronize the number of threads spawned with the number of cores you requested (e.g. by explicitly setting the corresponding OpenMP environment variable).
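A sketch of such a jobscript, assuming SLURM and OpenMP; `./my_openmp_program` and the concrete limits are placeholders:

```shell
#!/bin/bash
#SBATCH --nodes=1             # all cores must be on the same node
#SBATCH --ntasks=1            # one process...
#SBATCH --cpus-per-task=8     # ...with 8 cores
#SBATCH --time=01:00:00
#SBATCH --mem=8G              # note: memory for the whole job here

# Synchronize the thread count with the requested cores:
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

./my_openmp_program
```

Deriving `OMP_NUM_THREADS` from `SLURM_CPUS_PER_TASK` (set by SLURM inside the job) avoids the common mistake of spawning more threads than cores were requested.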
Distributed Memory Jobs
Distributed memory jobs run across multiple nodes. They are usually started via MPI, since the MPI library handles the correct start-up of the program across the nodes. Again, pay attention that the MPI library and the resource requests match.
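A sketch of an MPI jobscript, assuming SLURM; `./my_mpi_program` and the limits are placeholders:

```shell
#!/bin/bash
#SBATCH --ntasks=32           # 32 MPI ranks; the scheduler may spread them across nodes
#SBATCH --time=02:00:00
#SBATCH --mem-per-cpu=2G

# srun starts one process per requested task, so the resource request
# and the actual number of MPI ranks stay in sync:
srun ./my_mpi_program
```

Depending on the cluster, `mpirun`/`mpiexec` may be used instead of `srun`; in that case make sure the MPI library was built with scheduler support so it picks up the allocated resources.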
Hybrid Jobs
Hybrid parallelization means that you run a job on different nodes (e.g. using MPI) while also using shared memory parallelization (e.g. OpenMP) on each of them. This means that you need to specify at least the number of nodes as well as the number of cores per node. Distributing the job across the nodes is usually handled by the scheduler. However, not all schedulers fully support the parallelization within each node; in this case, it has to be set up manually.
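A sketch of a hybrid jobscript, assuming SLURM with MPI+OpenMP; `./my_hybrid_program` and the concrete sizes are placeholders:

```shell
#!/bin/bash
#SBATCH --nodes=4             # 4 nodes...
#SBATCH --ntasks-per-node=1   # ...with one MPI rank per node...
#SBATCH --cpus-per-task=16    # ...and 16 cores (OpenMP threads) per rank
#SBATCH --time=04:00:00

# One OpenMP thread per requested core on each node:
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun ./my_hybrid_program
```

The product nodes × tasks-per-node × cpus-per-task is the total number of cores the job occupies; keeping the three values explicit makes it easy to check that request and program match.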
Advanced Usage
Apart from the aforementioned types of jobs, the scheduler might offer even more types:
- Jobs across multiple nodes (distributed jobs or hybrid jobs) can also be parallelized without MPI. This goes beyond the scope of this page.
- Jobs running for several days should be split into smaller packages. Among the advantages are reduced queuing times and higher stability (e.g. against node failures). The splitting can be done either by submitting the parts manually or by using chain jobs.
- Sometimes it may be necessary to run the same program with different arguments (e.g. determining hyperparameters). In this case an array job may be used.
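Assuming SLURM, chain jobs and array jobs can be sketched as follows; the job ID, script names, and `./my_program` are placeholders:

```shell
# Chain job: the second part starts only after the first finished successfully.
# The job ID (here 12345) is reported by the first sbatch call.
sbatch part1.sh                             # prints e.g. "Submitted batch job 12345"
sbatch --dependency=afterok:12345 part2.sh

# Array job: run the same jobscript 10 times with indices 1..10.
sbatch --array=1-10 my_array_job.sh
# Inside my_array_job.sh, SLURM sets SLURM_ARRAY_TASK_ID to the index,
# which can select the argument (e.g. a hyperparameter) for this run:
#   ./my_program --config "config_${SLURM_ARRAY_TASK_ID}.txt"
```

Other schedulers offer equivalent mechanisms (e.g. `bsub -w` dependencies and `qsub -t` arrays), so the same splitting strategy carries over.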