Jobscript

From HPC Wiki

General

A jobscript is used to submit the job you wish to execute to a batch system. It is essentially a shell script (sh-file) and uses the same format, but it can additionally carry instructions for the scheduler. Besides shell commands, it can contain lines prefixed with a so-called magic cookie, e.g. #BSUB on LSF systems. These directives let you specify many parameters, e.g. the run time and memory your application requires or, if your code runs in parallel, the number of compute slots to use.


Structure

Like a regular sh-file, your jobscript should start with a shebang (#!) that names the shell used to run it, e.g. if you are using the Z shell (zsh):

#!/usr/bin/env zsh

Usually, this first line is followed by several directives using magic cookies, which are explained in more depth in the next section. The third part of a jobscript consists of shell commands, for example to change to your working directory and to execute your application.
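A minimal sketch of the three parts, assuming a SLURM cluster. The job name, time limit, memory and output file name are placeholder values, and the `echo` line merely stands in for your own application:

```shell
#!/usr/bin/env bash
# Part 1: shebang (bash here; a zsh shebang works the same way)

# Part 2: scheduler directives (SLURM syntax assumed; all values are placeholders)
#SBATCH --job-name=my_job
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=1G
#SBATCH --output=my_job.%j.out

# Part 3: shell commands
# Change to the directory the job was submitted from. SLURM sets SLURM_SUBMIT_DIR;
# when the script is run outside the batch system, fall back to the current directory.
workdir="${SLURM_SUBMIT_DIR:-$PWD}"
cd "$workdir" || exit 1

# Placeholder for your application, e.g. ./my_program
echo "Job running on $(uname -n) in $PWD"
```

Because the directives start with `#`, the shell treats them as comments, so the same file can also be run directly for a quick test; only the scheduler interprets them. In the output file name, `%j` is replaced by the job ID.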


Magic Cookies

The magic cookie differs from scheduler to scheduler, so first find out which batch system your cluster uses: LSF uses #BSUB, SLURM uses #SBATCH and Torque uses #PBS. Depending on your batch system, the LSF, SLURM and Torque pages provide more information on how to use magic cookies in your jobscript.
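To compare the three syntaxes, the sketch below prints the directives for one and the same request, 1 hour of wall-clock time and 4 cores on a single node. `print_directives` is a hypothetical helper written only for this comparison; check the exact options against your site's documentation:

```shell
# Print the directives for the same request (1 hour wall-clock time,
# 4 cores on one node) in the syntax of the given scheduler.
# Hypothetical helper for comparison only.
print_directives() {
  case "$1" in
    lsf)
      echo '#BSUB -W 1:00'              # wall-clock limit in [hours:]minutes
      echo '#BSUB -n 4'                 # number of slots
      echo '#BSUB -R "span[hosts=1]"'   # keep all slots on one host
      ;;
    slurm)
      echo '#SBATCH --time=01:00:00'    # wall-clock limit in hh:mm:ss
      echo '#SBATCH --nodes=1'          # one node
      echo '#SBATCH --ntasks=4'         # four tasks
      ;;
    torque)
      echo '#PBS -l walltime=01:00:00'  # wall-clock limit in hh:mm:ss
      echo '#PBS -l nodes=1:ppn=4'      # one node, four processors per node
      ;;
    *)
      echo "unknown scheduler: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `print_directives slurm` prints the three #SBATCH lines, which would go directly below the shebang.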

The big advantage of jobscripts is that parameters prefixed with magic cookies are treated just like command-line arguments to the submission command. Because they are set inside the jobscript, they are easier to adjust and to look up later.
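In SLURM, for instance, `#SBATCH --time=00:10:00` in the script has the same effect as `sbatch --time=00:10:00 jobscript.sh`, and an option given on the command line takes precedence over the one in the script. Since the directives live in the file, a simple `grep` shows what a job requested. A minimal sketch; `show_directives` is a hypothetical helper that defaults to SLURM's cookie:

```shell
# Print the magic-cookie lines of a jobscript, e.g. to check what a job requested.
# Hypothetical helper; the cookie defaults to SLURM's "#SBATCH"
# (pass "#BSUB" for LSF or "#PBS" for Torque).
show_directives() {
  local script="$1"
  local cookie="${2:-#SBATCH}"
  # Only lines that start with the cookie are matched.
  grep -- "^${cookie}" "$script"
}
```

Usage: `show_directives jobscript.sh` for SLURM, or `show_directives jobscript.sh '#BSUB'` for LSF.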


Job Submission

Depending on your scheduler, see the LSF, SLURM or Torque page for details on how to submit your job to the batch system that controls the compute resources. The basic submission commands are:

Scheduler   Shell command
LSF         $ bsub < jobscript.sh
SLURM       $ sbatch jobscript.sh
Torque      $ qsub jobscript.sh
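Each submission command prints the ID of the new job, which you need for the status and cancel commands. The sketch below extracts that ID from the printed message so a script can keep it in a variable. `job_id_from` is a hypothetical helper, and the message formats in its comments are the usual ones but may differ slightly between versions and sites:

```shell
# Extract the numeric job ID from the message printed at submission time.
# Hypothetical helper; assumed message formats:
#   LSF:    Job <12345> is submitted to queue <normal>.
#   SLURM:  Submitted batch job 12345
#   Torque: 12345.servername
job_id_from() {
  local scheduler="$1"
  local message="$2"
  case "$scheduler" in
    lsf)    printf '%s\n' "$message" | sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p' ;;
    slurm)  printf '%s\n' "$message" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p' ;;
    torque) printf '%s\n' "${message%%.*}" ;;   # keep the part before the first dot
    *)      echo "unknown scheduler: $scheduler" >&2; return 1 ;;
  esac
}
```

Usage: `job_id=$(job_id_from slurm "$(sbatch jobscript.sh)")`. On SLURM, `sbatch --parsable jobscript.sh` already prints the job ID in a machine-readable form, so the parsing step can be skipped there.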

Note that all incoming jobs (each defined in a jobscript) are added to a queue, and the scheduler decides when each job runs. The waiting time depends on various factors, e.g. the run time and memory you requested in your jobscript. As a rule of thumb, the more resources your job requests, the longer it will wait in the queue.

You can check the current status of your submitted jobs and their IDs at any time with the following commands, which are also explained on the scheduler pages mentioned above.

Scheduler   Shell command          Output
LSF         $ bjobs                The status (STAT) is either "PEND" (pending) or "RUN" (running). Once all your jobs have finished, the command prints "No unfinished job found".
SLURM       $ squeue -u <user_id>  The state (ST) is either "PD" (pending) or "R" (running). The TIME column shows how long the job has been running ([days-]hours:minutes:seconds).
Torque      $ qstat -u <user_id>   The most common states (S) are "R" (running), "Q" (queued) and "H" (held). The output also shows the requested time limit and the time elapsed since the job started running.
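Each scheduler abbreviates job states differently in the status commands above. When a script has to react to the state, a small mapping to common words helps. `job_state` is a hypothetical helper that covers only the most common codes; all others map to "other":

```shell
# Map scheduler-specific state codes to a common word.
# Hypothetical helper covering only common codes:
#   LSF bjobs (STAT):  PEND, RUN, PSUSP, DONE, EXIT
#   SLURM squeue (ST): PD, R
#   Torque qstat (S):  Q, R, H, C
job_state() {
  case "$1:$2" in
    lsf:PEND|slurm:PD|torque:Q)     echo pending ;;
    lsf:RUN|slurm:R|torque:R)       echo running ;;
    lsf:PSUSP|torque:H)             echo held ;;
    lsf:DONE|lsf:EXIT|torque:C)     echo finished ;;
    *)                              echo other ;;
  esac
}
```

With SLURM, for example, `squeue -u "$USER" -h -o "%i %t"` prints one "job ID, state code" pair per line without a header, and each state code can then be passed to `job_state slurm`.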

To cancel a job that you submitted, use one of these commands:

Scheduler   Shell command
LSF         $ bkill <job_id>
SLURM       $ scancel <job_id>
Torque      $ qdel <job_id>
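To cancel several jobs at once, the cancel command can simply be called once per job ID. A minimal sketch; `cancel_jobs` is a hypothetical helper, and setting `DRY_RUN=1` makes it print the commands instead of running them:

```shell
# Cancel several jobs with the cancel command of the given scheduler.
# Hypothetical helper; with DRY_RUN=1 the commands are only printed.
cancel_jobs() {
  local scheduler="$1"
  local cmd id
  shift
  case "$scheduler" in
    lsf)    cmd=bkill ;;
    slurm)  cmd=scancel ;;
    torque) cmd=qdel ;;
    *)      echo "unknown scheduler: $scheduler" >&2; return 1 ;;
  esac
  for id in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$cmd $id"      # show what would be run
    else
      "$cmd" "$id"         # actually cancel the job
    fi
  done
}
```

Usage: `DRY_RUN=1 cancel_jobs slurm 1234 1235` to check the commands, then the same call without `DRY_RUN=1`. To cancel all of your own jobs, SLURM offers `scancel -u "$USER"` and LSF offers `bkill 0`.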
