If you are writing a jobscript for a SLURM batch system, the magic cookie is "#SBATCH". To use it, start a new line in your script with "#SBATCH". Following that, you can put one of the parameters shown below, where the word written in <...> should be replaced with a value.
|--output=<path>||path to the file where the job (error) output is written to|
|--time=<runlimit>||runtime limit in the format hours:min:sec; once the time specified is up, the job will be killed by the scheduler|
|--mem=<memlimit>||job memory request per node, usually an integer followed by a unit suffix (e. g. --mem=1G for 1 GB)|
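As a small illustration of the hours:min:sec format expected by --time, the following sketch converts a number of seconds into that form. The function name to_timelimit is hypothetical and not part of SLURM; it only uses shell arithmetic and printf.

```shell
# Convert a number of seconds into the hours:min:sec form used by --time.
# (to_timelimit is a hypothetical helper, not a SLURM command.)
to_timelimit() {
    local total="$1"
    printf '%02d:%02d:%02d\n' \
        $(( total / 3600 )) \
        $(( (total % 3600) / 60 )) \
        $(( total % 60 ))
}

# Example: 15 min 30 sec = 930 seconds
to_timelimit 930   # prints 00:15:30
```

The result can be pasted into a jobscript line such as "#SBATCH --time=00:15:30".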
Parallel programming (read more here):
Settings for OpenMP:
|--nodes=1||start a parallel job for a shared-memory system on only one node|
|--cpus-per-task=<num_threads>||number of threads to execute OpenMP application with|
|--ntasks-per-core=<num_hyperthreads>||number of hyperthreads per core; i. e. any value greater than 1 will turn on hyperthreading (the possible maximum depends on your CPU)|
|--ntasks-per-node=1||for OpenMP, use one task per node only|
Settings for MPI:
|--nodes=<num_nodes>||start a parallel job for a distributed-memory system on several nodes|
|--cpus-per-task=1||for MPI, use one task per CPU|
|--ntasks-per-node=<num_procs>||number of processes per node (the possible maximum depends on your nodes)|
|--mail-type=<type>||type can be one of BEGIN, END, FAIL, REQUEUE or ALL (where a mail will be sent each time the status of your process changes)|
|--mail-user=<email_address>||email address to send notifications to|
A more complete list of sbatch settings can be found in the official sbatch documentation.
This command submits the job you defined in your jobscript to the batch system:
$ sbatch jobscript.sh
Just like any other incoming job, your job will first be queued. Then, the scheduler decides when your job will be run. The more resources your job requires, the longer it may have to wait before it starts.
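When submissions are scripted, it is often handy to keep the job id for later use with squeue or scancel. The following is a minimal sketch, assuming an sbatch version that supports the --parsable option (which prints the job id, optionally followed by ";" and a cluster name, instead of the usual message). The function name submit_job is hypothetical.

```shell
# Submit a jobscript and print only the numeric job id.
# Assumption: sbatch supports --parsable on this system.
submit_job() {
    local script="$1"
    local out
    # --parsable prints "jobid" or "jobid;cluster"
    out=$(sbatch --parsable "$script") || return 1
    # Drop an optional ";cluster" suffix
    echo "${out%%;*}"
}

# Usage:
#   job_id=$(submit_job jobscript.sh)
#   echo "Submitted job $job_id"
```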
You can check the current status of your submitted jobs and their job ids with the following shell command. A job can either be pending, PD (waiting for free nodes to run on), or running, R (the jobscript is currently being executed). This command will also print the time (hours:min:sec) that your job has been running for.
$ squeue -u <user_id>
Please add the parameter --start to the squeue command in order to report the expected start time and the resources to be allocated for pending jobs. Please note that this start time is not guaranteed and might change due to high-priority jobs or job backfilling.
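To query the state of a single job from a script, the state code can be requested directly. This is a sketch, assuming squeue's -h (no header), -j (select a job id) and -o "%t" (compact state code) options; the function name describe_job is hypothetical, and only the two states described above are spelled out.

```shell
# Print a short description of a job's state.
describe_job() {
    local job_id="$1"
    local state
    # -h: no header line, -j: select the job, -o "%t": compact state (PD, R, ...)
    state=$(squeue -h -j "$job_id" -o "%t" 2>/dev/null)
    case "$state" in
        PD) echo "job $job_id is pending" ;;
        R)  echo "job $job_id is running" ;;
        "") echo "job $job_id is not in the queue" ;;
        *)  echo "job $job_id is in state $state" ;;
    esac
}

# Usage:
#   describe_job <job_id>
```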
If you submitted a job by accident or realised that your job might not be running correctly, you can always remove it from the queue, or terminate it while it is running, by typing:
$ scancel <job_id>
Furthermore, information about current and past jobs can be accessed via:
with more detailed information at the Slurm documentation of this command
Array and Chain Jobs
This creates an array job with *2* subjobs (indices 1..4 with a step of 2, i.e. 1 and 3) where only *one* may be executed at a time, in a random order. An explicit order can be forced either by submitting each subjob at the end of the one before (which may prolong pending) or by using the dependencies feature, which results in a chain job.
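A jobscript matching this description could look like the sketch below. The --array value is an assumption reconstructed from the description (range 1-4, step 2, at most one subjob running at a time via "%1"); the job name and the input file naming scheme are hypothetical.

```shell
#!/bin/bash
### Job name (hypothetical)
#SBATCH --job-name=ARRAYJOB
### Indices 1-4 with step 2 (i.e. 1 and 3); "%1" lets only one subjob run at a time
#SBATCH --array=1-4:2%1

### SLURM sets SLURM_ARRAY_TASK_ID to the index of the current subjob.
### The ":-1" fallback is only there so the script can be tried outside SLURM.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

### Hypothetical naming scheme: one input file per subjob
input_file="input_${task_id}.dat"
echo "Subjob ${task_id} would process ${input_file}"
```

Each subjob runs the same script and differs only in the value of SLURM_ARRAY_TASK_ID.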
The available conditions for chain jobs are
|after:<jobID>||job can start once job <jobID> has started execution|
|afterany:<jobID>||job can start once job <jobID> has terminated|
|afterok:<jobID>||job can start once job <jobID> has terminated successfully|
|afternotok:<jobID>||job can start once job <jobID> has terminated upon failure|
|singleton||job can start once any previous job with identical name and user has terminated|
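The conditions above are passed to sbatch through its --dependency option. The following sketch submits several jobscripts as a chain in which each job starts only after the previous one has terminated successfully (afterok). It assumes sbatch supports --parsable for obtaining the job id; the function name submit_chain and the script names are hypothetical.

```shell
# Submit the given jobscripts as a chain job using afterok dependencies.
submit_chain() {
    local prev="" out script
    for script in "$@"; do
        if [ -z "$prev" ]; then
            # First job in the chain: no dependency
            out=$(sbatch --parsable "$script") || return 1
        else
            # Later jobs wait for successful termination of the previous job
            out=$(sbatch --parsable --dependency="afterok:${prev}" "$script") || return 1
        fi
        # --parsable prints "jobid" or "jobid;cluster"; keep the id only
        prev="${out%%;*}"
        echo "$script -> job $prev"
    done
}

# Usage:
#   submit_chain step1.sh step2.sh step3.sh
```

Replacing afterok with afterany or afternotok selects one of the other conditions from the table.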
This serial job will run a given executable, in this case "myapp.exe".
#!/bin/bash

### Job name
#SBATCH --job-name=MYJOB

### File for the output
#SBATCH --output=MYJOB_OUTPUT

### Time your job needs to execute, e. g. 15 min 30 sec
#SBATCH --time=00:15:30

### Memory your job needs per node, e. g. 1 GB
#SBATCH --mem=1G

### The last part consists of regular shell commands:
### Change to working directory
cd /home/usr/workingdirectory

### Execute your application
myapp.exe
If you'd like to run a parallel job on a cluster that is managed by SLURM, you have to tell the batch system so by launching your program with the command "srun <my_executable>" in your jobscript.
This OpenMP job will start the parallel program "myapp.exe" with 24 threads.
#!/bin/bash

### Job name
#SBATCH --job-name=OMPJOB

### File for the output
#SBATCH --output=OMPJOB_OUTPUT

### Time your job needs to execute, e. g. 30 min
#SBATCH --time=00:30:00

### Memory your job needs per node, e. g. 500 MB
#SBATCH --mem=500M

### Number of threads to use, e. g. 24
#SBATCH --cpus-per-task=24

### The last part consists of regular shell commands:
### Set the number of threads in your cluster environment to the value specified above
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

### Change to working directory
cd /home/usr/workingdirectory

### Run your parallel application
srun myapp.exe
This MPI job will start the parallel program "myapp.exe" with 12 processes.
#!/bin/bash

### Job name
#SBATCH --job-name=MPIJOB

### File for the output
#SBATCH --output=MPIJOB_OUTPUT

### Time your job needs to execute, e. g. 50 min
#SBATCH --time=00:50:00

### Memory your job needs per node, e. g. 250 MB
#SBATCH --mem=250M

### Use more than one node for parallel jobs on distributed-memory systems, e. g. 2
#SBATCH --nodes=2

### Number of CPUS per task (for distributed-memory parallelisation, use 1)
#SBATCH --cpus-per-task=1

### Disable hyperthreading by setting the tasks per core to 1
#SBATCH --ntasks-per-core=1

### Number of processes per node, e. g. 6 (6 processes on 2 nodes = 12 processes in total)
#SBATCH --ntasks-per-node=6

### The last part consists of regular shell commands:
### Set the number of threads in your cluster environment to 1, as specified above
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

### Change to working directory
cd /home/usr/workingdirectory

### Run your parallel application
srun myapp.exe
Please find more elaborate SLURM job scripts for running a hybrid MPI+OpenMP program in a batch job and for running multiple shared-memory / OpenMP programs at a time in one batch job.
Site specific notes
- --output= should not be used on RRZE's clusters; the submit filter already sets suitable defaults automatically
- --mem=<memlimit> must not be used on RRZE's clusters
- the first line of the job script should be
module commands won't work in the job script
- to have a clean environment in job scripts, it is recommended to add unset SLURM_EXPORT_ENV to the job script. Otherwise, the job will inherit some settings from the submitting shell.
- access to the parallel file system has to be specified by #SBATCH --constraint=parfs or the command line shortcut
- access to hardware performance counters (e.g. to be able to use likwid-perfctr) has to be requested by #SBATCH --constraint=hwperf or the command line shortcut -C hwperf. Only request that feature if you really want to access the hardware performance counters, as the feature interferes with the automatic system monitoring.
- multiple features have to be requested in a single --constraint= statement, listing all required features separated by ampersand, e.g.
- for Intel MPI, RRZE recommends the usage of
srun shall be used, the additional command line argument --mpi=pmi2 is required. The command line option
mpirun only works if you
- -u user does not have any effect, as you always only see your own jobs
- --mem=<memlimit> must not be used on RWTH's clusters
- the OMP_NUM_THREADS environment variable must not be set or overwritten on RWTH's clusters in OpenMP and Hybrid jobs; it is set by the system automatically.
- access to hardware performance counters in order to use likwid-perfctr or Intel VTune is available using the
- in order to start an MPI or Hybrid application, please use $MPIEXEC $FLAGS_MPI_BATCH ./a.out instead of the srun command; these environment variables are set by the module system according to the MPI vendor in use.
- the shebang of the batch script must be #!/usr/local_rwth/bin/zsh (otherwise the modules are not accessible)