LSF
Revision as of 12:18, 29 March 2018
General
LSF is a type of scheduler. The abbreviation stands for "Load Sharing Facility" (the product was originally sold as Platform LSF). It monitors and controls the workload of a supercomputer's batch system. The batch system is meant for applications that need a lot of resources, and unlike the login-nodes, users cannot access it directly.
#BSUB Usage
If you are writing a jobscript for an LSF batch system, the magic cookie is "#BSUB". To use it, start a new line in your script with "#BSUB" and follow it with one of the parameters shown below. Replace the placeholder written in <...> with an actual value.
Basic settings:
Parameter | Function |
-J <name> | job name |
-o <path> | path to the file where the job output is written |
-e <path> | path to the file for the job error output (if not set, errors are written to the output file as well) |
Requesting resources:
Parameter | Function | Default |
-W <runlimit> | runtime limit in the format [hour:]minute; once the time specified is up, the job will be killed by the scheduler | 00:15 |
-M <memlimit> | memory limit per process in MB | 512 |
-S <stacklimit> | limit of stack size per process in MB | 10 |
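The Bash sketch below makes the [hour:]minute format of -W concrete by converting a runlimit into total minutes. The function name runlimit_to_minutes is hypothetical and is not an LSF command. It only does arithmetic on the string, so it runs without LSF.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of LSF): convert a -W runlimit given as
# [hour:]minute into the total number of minutes.
runlimit_to_minutes() {
  local limit="$1" hours=0 minutes
  if [[ "$limit" == *:* ]]; then
    hours="${limit%%:*}"    # part before the colon
    minutes="${limit#*:}"   # part after the colon
  else
    minutes="$limit"        # no colon: the value is in minutes only
  fi
  # 10# forces base 10, so values with leading zeros such as "08" work
  echo $(( 10#$hours * 60 + 10#$minutes ))
}

runlimit_to_minutes 1:20    # prints 80 (the serial example below)
runlimit_to_minutes 00:15   # prints 15 (the default)
runlimit_to_minutes 45      # prints 45
```

The minute part is not restricted to 0 to 59, so -W 1:20 and -W 80 request the same 80 minutes.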
Parallel programming:
Parameter | Function |
-a openmp | start a parallel job for a shared-memory system (OpenMP) |
-n <num_threads> | number of threads the OpenMP application runs with |
-a openmpi | start a parallel job for a distributed-memory system (Open MPI) |
-n <num_procs> | number of processes the MPI application runs with |
Email notifications:
Parameter | Function |
-B | send email to the job submitter when the job starts running |
-N | send email to the job submitter when the job has finished |
-u <email_address> | recipient of the notification emails |
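Each parameter in the tables above becomes one #BSUB line in a jobscript. The hypothetical Bash function below is a sketch of that mapping. It prints a serial jobscript that also uses -e and the email options, which the examples further down do not cover. The file names, the working directory and the address are placeholders, not site defaults.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an LSF command): print a serial jobscript.
# Arguments: job name, runlimit ([hour:]minute), memory per process in MB,
# email address for the -B/-N notifications.
make_jobscript() {
  local name="$1" runlimit="$2" mem="$3" email="$4"
  cat <<EOF
#!/usr/bin/env zsh
#BSUB -J ${name}
#BSUB -o ${name}_OUTPUT.txt
#BSUB -e ${name}_ERROR.txt
#BSUB -W ${runlimit}
#BSUB -M ${mem}
#BSUB -B
#BSUB -N
#BSUB -u ${email}
cd /home/user/mywork
myapp.exe
EOF
}
```

bsub reads the jobscript from standard input, so piping the output, as in `make_jobscript MYJOB 1:20 1000 user@example.com | bsub`, should behave like `bsub < jobscript.sh`.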
Job Submission
The following command submits the job defined in your jobscript to the batch system. Do not leave out the input redirection "<": without it, your job is still submitted, but all the resource requests in your jobscript are ignored.
$ bsub < jobscript.sh
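Scripts often need to keep the job ID that bsub reports. The following is a minimal sketch that assumes bsub's usual confirmation message, of the form "Job <12345> is submitted to queue <normal>.". The function extract_job_id is hypothetical, and the exact wording of the message may differ between LSF versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read bsub's confirmation line from stdin and print
# only the numeric job ID between the first pair of angle brackets.
extract_job_id() {
  sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Example on a captured confirmation line:
echo 'Job <12345> is submitted to default queue <normal>.' | extract_job_id   # prints 12345
```

On a login node, this could be used as `jobid=$(bsub < jobscript.sh | extract_job_id)`.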
Like any other incoming job, your job is first queued, and the scheduler then decides when it runs. The more resources your job requests, the longer it may have to wait before it starts.
You can check the current status and the job ID of each of your submitted jobs with the following shell command. An unfinished job is usually either pending ("PEND") or running ("RUN"). Once all of your jobs are done, the command prints "No unfinished job found".
$ bjobs
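bjobs prints one line per job, with the job ID in the first column and the status in the STAT column. The Bash sketch below assumes the default bjobs column layout (JOBID, USER, STAT, ...). The functions job_state and wait_for_job are hypothetical. wait_for_job needs a real LSF installation and simply polls once a minute.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given bjobs output on stdin, print the STAT column
# (third field) of the line whose JOBID (first field) matches $1.
job_state() {
  awk -v id="$1" '$1 == id { print $3; exit }'
}

# Sketch of a polling loop (requires bjobs): keep waiting while the job is
# pending, running or suspended (PSUSP, USUSP, SSUSP).
wait_for_job() {
  local id="$1" state
  while :; do
    state="$(bjobs "$id" 2>/dev/null | job_state "$id")"
    case "$state" in
      PEND|RUN|*SUSP) sleep 60 ;;
      *) break ;;
    esac
  done
  echo "job $id left the queue with state: ${state:-unknown}"
}

printf '%s\n' 'JOBID USER STAT QUEUE' '1234 user RUN normal' | job_state 1234   # prints RUN
```

An empty state means bjobs no longer lists the job, which also ends the loop.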
If you submitted a job by accident or notice that it is not running correctly, you can remove it from the queue, or terminate it if it is already running, by typing:
$ bkill <job_id>
Jobscript Examples
This serial job will run a given executable, in this case "myapp.exe".
#!/usr/bin/env zsh
### Job name
#BSUB -J MYJOB
### File where the output should be written
#BSUB -o MYJOB_OUTPUT.txt
### Time your job needs to execute, e. g. 1 h 20 min
#BSUB -W 1:20
### Memory your job needs, e. g. 1000 MB
#BSUB -M 1000
### Stack limit per process, e. g. 20 MB
#BSUB -S 20
### The last part consists of regular shell commands:
### Change to working directory
cd /home/user/mywork
### Execute your application
myapp.exe
This OpenMP job will start the parallel program "myapp.exe" with 24 threads.
#!/usr/bin/env zsh
### Job name
#BSUB -J OMPJOB
### File where the output should be written
#BSUB -o OMPJOB_OUTPUT
### Time your job needs to execute, e. g. 15 min
#BSUB -W 0:15
### Memory your job needs, e. g. 1000 MB
#BSUB -M 1000
### Stack limit per process, e. g. 50 MB
#BSUB -S 50
### Request 24 compute slots (in this case: threads)
#BSUB -n 24
### Execute as shared-memory job
#BSUB -a openmp
### Change to working directory
cd /home/user/mywork
### Execute your application
myapp.exe
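Depending on the site, -a openmp may already set OMP_NUM_THREADS to the number of requested slots. Whether yours does is an assumption you should check. As a defensive sketch to place before myapp.exe, the lines below keep an existing OMP_NUM_THREADS. Otherwise they fall back to LSB_DJOB_NUMPROC, the slot count LSF exports to the job, and to 1 if neither variable is set. The function omp_threads is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical fallback: prefer OMP_NUM_THREADS, then LSF's slot count
# LSB_DJOB_NUMPROC, then 1 (empty values count as unset).
omp_threads() {
  echo "${OMP_NUM_THREADS:-${LSB_DJOB_NUMPROC:-1}}"
}

OMP_NUM_THREADS="$(omp_threads)"
export OMP_NUM_THREADS
echo "running with $OMP_NUM_THREADS OpenMP threads"
```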
This Open MPI job will start the parallel program "myapp.exe" with 4 processes.
#!/usr/bin/env zsh
### Job name
#BSUB -J MPIJOB
### File where the output should be written
#BSUB -o MPIJOB_OUTPUT
### Time your job needs to execute, e. g. 30 min
#BSUB -W 0:30
### Memory your job needs, e. g. 1024 MB
#BSUB -M 1024
### Stack limit per process, e. g. 50 MB
#BSUB -S 50
### Request 4 compute slots (in this case: processes)
#BSUB -n 4
### Execute as distributed-memory job with OpenMPI
#BSUB -a openmpi
### Change to working directory
cd /home/user/mywork
### Execute your application
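### mpirun starts one process per slot requested with -n
### (assumes Open MPI was built with LSF support and picks up the allocation)
mpirun myapp.exe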
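According to the table of resource parameters, -M limits memory per process. The memory a parallel job may use in total therefore grows with the number of processes. The sketch below shows this arithmetic for the MPI example above (4 processes × 1024 MB), assuming those per-process semantics. total_memory_mb is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total memory in MB = processes x per-process limit (-M).
total_memory_mb() {
  local procs="$1" mem_per_proc="$2"
  echo $(( procs * mem_per_proc ))
}

total_memory_mb 4 1024   # prints 4096
```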