Jobscript

From HPC Wiki

General

A jobscript is used to submit a job to a batch system. It looks much like an ordinary shell script (sh file) and uses the same syntax, but it can also carry instructions for the scheduler. Besides shell commands, it may contain lines that start with the so-called magic cookie #BSUB. These lines let you specify job parameters, e.g. the time and memory your application requires or, if your code runs in parallel, the number of compute slots to use.


#BSUB Usage

To use the magic cookie, start a new line in your script with "#BSUB", followed by one of the parameters shown below. Replace each word written in <...> with an actual value.

Basic settings:

Parameter    Function
-J <name>    job name
-o <path>    path to the file where the job output is written
-e <path>    path to the file for the job error output

Requesting resources:

Parameter        Function                                     Default
-W <runlimit>    runtime limit in the format [hour:]minute    00:15
-M <memlimit>    memory limit per process in MB               512
-S <stacklimit>  stack size limit per process in MB           10

Once the runtime limit is reached, the scheduler kills the job.
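
As an illustration, the basic and resource settings above might be combined into a jobscript like the following minimal sketch. The job name, file names, and limit values are placeholders chosen for this example, and the final echo merely stands in for the call to your own program.

```shell
#!/usr/bin/env bash
# Minimal jobscript sketch; all names and values below are example
# placeholders, not recommended settings.

# Job name, output file and error file
#BSUB -J example_job
#BSUB -o example_job.out
#BSUB -e example_job.err

# Runtime limit of 30 minutes ([hour:]minute) and 1024 MB memory per process
#BSUB -W 0:30
#BSUB -M 1024

# From here on the file is ordinary shell code.
job_message="example_job running on $(uname -n)"
echo "$job_message"

# Placeholder: replace the two lines above with the call to your own program.
```

Since every "#BSUB" line starts with "#", a plain shell treats it as a comment, so the script can also be tried locally without the batch system.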

Parallel programming:

Parameter          Function
-a openmp          start a parallel job for a shared-memory system
-n <num_threads>   number of threads to execute the OpenMP application with
-a openmpi         start a parallel job for a distributed-memory system
-np <num_procs>    number of processes to execute the MPI application with
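
For a shared-memory job, the two OpenMP parameters from the table could be combined as in the sketch below. The job name and the thread count of 4 are arbitrary example values. Exporting OMP_NUM_THREADS by hand and reading the slot count from LSB_DJOB_NUMPROC are assumptions about the cluster setup; some sites may set the thread count for you when -a openmp is used, so check your local documentation.

```shell
#!/usr/bin/env bash
# OpenMP jobscript sketch; the job name and thread count are example values.
#BSUB -J openmp_example
#BSUB -a openmp
#BSUB -n 4

# Assumption: the scheduler exports the number of allocated slots in
# LSB_DJOB_NUMPROC. Fall back to 4 when the variable is not set, e.g.
# when the script is tried outside the batch system.
export OMP_NUM_THREADS="${LSB_DJOB_NUMPROC:-4}"
echo "Running with $OMP_NUM_THREADS OpenMP threads"

# Placeholder: start your OpenMP program here.
```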

The big advantage of a jobscript is that parameters prefixed with "#BSUB" are treated just like command line arguments. Setting them inside the jobscript makes them easier to adjust and to look up later.
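
Because the settings live in the file itself, they can be listed later with standard tools. The sketch below writes a small example jobscript to a temporary file (its contents are hypothetical) and then collects its #BSUB parameters into the form they would take as command line arguments.

```shell
#!/usr/bin/env bash
# Create a throwaway example jobscript (hypothetical contents).
script=$(mktemp)
cat > "$script" <<'EOF'
#!/usr/bin/env bash
#BSUB -J example_job
#BSUB -W 0:30
echo "hello from the job"
EOF

# Strip the "#BSUB" prefix from each directive line and join the
# remaining options with spaces, as they would appear on a command line.
bsub_options=$(sed -n 's/^#BSUB[[:space:]]*//p' "$script" | paste -sd ' ' -)
echo "$bsub_options"

rm -f "$script"
```

For this example file the collected options are -J example_job -W 0:30.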


Job Submission

$ bsub < jobscript.sh

Note that after you submit your job to the batch system as shown above, it may take some time before it leaves the queue and starts running. The scheduler decides when a job runs. The waiting time depends on various factors, e.g. the time and memory you requested in your jobscript. As a rule of thumb, the more resources your job needs (execution time, memory), the longer it will be queued.

You can check the current status (pending or running) and the IDs of your submitted jobs at any time with the following shell command. Once all your jobs have finished, the command prints "No unfinished job found".

$ bjobs

To remove a job that you submitted, use this command:

$ bkill <job_id>
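
When submission is scripted, the job ID needed for bkill can be taken from the message that bsub prints. The sketch below assumes the message has the form "Job <id> is submitted to queue <name>."; the sample line with ID 12345 and queue "normal" is made up so that the snippet can be tried without a batch system.

```shell
#!/usr/bin/env bash
# Made-up sample of the submission message. On a real cluster you
# would capture it instead with:
#   submit_msg=$(bsub < jobscript.sh)
submit_msg='Job <12345> is submitted to queue <normal>.'

# Extract the digits between the first pair of angle brackets.
job_id=$(printf '%s\n' "$submit_msg" | sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p')
echo "Submitted job id: $job_id"

# On a real cluster the ID can then be passed on, for example:
#   bjobs "$job_id"
#   bkill "$job_id"
```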