LSF

From HPC Wiki

Latest revision as of 07:23, 4 September 2019

General

LSF (Platform Load Sharing Facility) is a batch scheduler. For an overview of what a scheduler does, see the Scheduler page or the Scheduling Basics.



#BSUB Usage

If you are writing a jobscript for an LSF batch system, the magic cookie is "#BSUB". To use it, start a new line in your script with "#BSUB", followed by one of the parameters shown below. Replace each word written in <...> with a value.

Basic settings:

Parameter | Function
-J <name> | job name
-o <path> | path to the file where the job output is written
-e <path> | path to the file for the job error output (if not set, error output is written to the output file as well)
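Because every directive starts with "#", a shell that runs the jobscript directly treats the #BSUB lines as ordinary comments; only the batch system interprets them. The following sketch illustrates this with a throwaway script. The job name, output file name and the use of mktemp are assumptions made for the example.

```shell
# Create a throwaway jobscript (hypothetical names, for illustration only)
jobscript=$(mktemp)
cat > "$jobscript" <<'EOF'
#!/bin/sh
#BSUB -J DEMO
#BSUB -o DEMO_OUTPUT.txt
echo "hello from the job"
EOF

# Run it locally, without LSF: the shell skips the #BSUB lines as comments
output=$(sh "$jobscript")
echo "$output"

# List the directive lines that the batch system would read from the script
directives=$(grep '^#BSUB' "$jobscript")
echo "$directives"
```

Running a jobscript locally like this can be a quick way to catch shell errors before submitting it.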

Requesting resources:

Parameter | Function | Default
-W <runlimit> | runtime limit in the format [hour:]minute; once the specified time is up, the scheduler kills the job | 00:15
-M <memlimit> | memory limit per process in MB | 512
-S <stacklimit> | stack size limit per process in MB | 10
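The [hour:]minute format of -W is easy to get wrong when a runtime is only known in minutes. As a small sketch, the helper below converts a number of minutes into that format. The function name to_runlimit is made up for this example, and it assumes a plain decimal number without leading zeros.

```shell
# Convert a runtime in minutes to the hour:minute form accepted by -W.
# to_runlimit is a hypothetical helper, not part of LSF.
to_runlimit() {
    case $1 in
        ''|*[!0-9]*) echo "usage: to_runlimit <minutes>" >&2; return 1 ;;
    esac
    printf '%d:%02d\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

to_runlimit 80   # 1 h 20 min
to_runlimit 15   # the default limit of 15 min
```

The result can be pasted into a directive such as "#BSUB -W 1:20".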

Parallel programming:

Parameter | Function
-a openmp | start a parallel job for a shared-memory system
-n <num_threads> | number of threads to execute the OpenMP application with
-a openmpi | start a parallel job for a distributed-memory system
-n <num_procs> | number of processes to execute the MPI application with

Email notifications:

Parameter | Function
-B | send email to the job submitter when the job starts running
-N | send email to the job submitter when the job has finished
-u <email_address> | recipient of emails

Job Submission

The following command submits the job you defined in your jobscript to the batch system. The less-than sign < is essential: if it is left out, your job will still be submitted, but all the resource requests in your jobscript will be ignored.

$ bsub < jobscript.sh
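Since forgetting the "<" fails silently, one option is to wrap the submission in a small function that always passes the script on standard input. This is only a sketch: submit_job is a hypothetical helper, and the fallback message for machines without LSF is an addition for local testing.

```shell
# Submit a jobscript via standard input so that its #BSUB lines are read.
# submit_job is a hypothetical wrapper, not an LSF command.
submit_job() {
    script=$1
    if [ ! -r "$script" ]; then
        echo "cannot read jobscript: $script" >&2
        return 1
    fi
    if command -v bsub >/dev/null 2>&1; then
        bsub < "$script"
    else
        # No LSF on this machine: only show what would be executed
        echo "bsub not found; would run: bsub < $script"
    fi
}
```

On a cluster login node, "submit_job jobscript.sh" then behaves like the command above.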

Just like any other incoming job, your job will first be queued. The scheduler then decides when your job will run. The more resources your job requires, the longer it may have to wait before it starts.

You can check the current status of your submitted jobs and their job IDs with the following shell command. A job is either pending (PEND, waiting for free nodes to run on) or running (RUN, the jobscript is currently being executed). If all of your jobs have finished, the command will print "No unfinished jobs found".

$ bjobs
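With many jobs in the queue, a summary per state can be handier than the full list. The sketch below counts pending and running jobs, assuming the default bjobs layout in which the state is the third column after a header line. The function name count_states and the sample listing (user, hosts, job IDs) are invented for illustration.

```shell
# Count PEND and RUN jobs in bjobs-style output read from standard input.
# Assumes a header line and the state in column 3 (default bjobs layout).
count_states() {
    awk 'NR > 1 { count[$3]++ }
         END { printf "PEND=%d RUN=%d\n", count["PEND"], count["RUN"] }'
}

# Invented sample listing, standing in for real bjobs output
sample='JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1001 alice RUN normal login1 node07 MYJOB Sep 4 07:00
1002 alice PEND normal login1 OMPJOB Sep 4 07:05'

printf '%s\n' "$sample" | count_states
```

On the cluster, the same function can be fed directly with "bjobs | count_states".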

If you want to see what a job has done so far, you can use the utility bpeek. It prints the output that your job has written up to now:

$ bpeek <job_id>

If you submitted a job by accident or realised that your job might not be running correctly, you can always remove it from the queue, or terminate it if it is already running, by typing:

$ bkill <job_id>

Array and Chain Jobs

#BSUB -J "ChainJob[1-4]%1"

This directive creates an array job with 4 subjobs, of which only one may be executed at a time (the "%1" suffix), in a random order. An explicit order can be enforced either by having each subjob submit the next one at its end (which may prolong the time spent pending) or by using the dependencies feature, which results in a chain job.
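Inside each subjob, LSF sets the environment variable LSB_JOBINDEX to the index of that subjob (1 to 4 in the directive above). A minimal sketch of using it to select per-subjob input follows; the function name and the input file naming scheme are hypothetical, and the fallback to index 1 is only there so the snippet also runs outside the batch system.

```shell
# Pick the input file that belongs to the current array subjob.
# subjob_input and the input_<index>.dat naming are hypothetical.
subjob_input() {
    # LSB_JOBINDEX is set by LSF for array jobs; default to 1 for local runs
    echo "input_${LSB_JOBINDEX:-1}.dat"
}

echo "this subjob processes $(subjob_input)"
```

In a jobscript, the application could then be started with the selected file as its argument.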

#BSUB -w <condition>

The condition is a logical expression. It can combine several logical expressions with the logical operators && (AND), || (OR) and ! (NOT). The job is only executed once the entire condition is satisfied.
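As an illustration, a condition can be assembled from LSF's done(...) dependency function, which is satisfied once the named job has finished successfully. The helper below joins several jobs with && so that a job waits for all of them. The function name dependency_all_done and the job IDs are made up for this sketch.

```shell
# Build a dependency condition that requires all given jobs to be done.
# dependency_all_done is a hypothetical helper; done() is evaluated by LSF.
dependency_all_done() {
    cond=""
    for job in "$@"; do
        if [ -z "$cond" ]; then
            cond="done($job)"
        else
            cond="$cond && done($job)"
        fi
    done
    printf '%s\n' "$cond"
}

dependency_all_done 1001 1002
```

The resulting string can be placed after "#BSUB -w", in double quotes, or passed on the command line, for example bsub -w "$(dependency_all_done 1001 1002)" < jobscript.sh.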

Jobscript Examples

This serial job will run a given executable, in this case "myapp.exe".

#!/usr/bin/env zsh

### Job name
#BSUB -J MYJOB

### File where the output should be written
#BSUB -o MYJOB_OUTPUT.txt

### Time your job needs to execute, e. g. 1 h 20 min
#BSUB -W 1:20

### Memory your job needs, e. g. 1000 MB 
#BSUB -M 1000

### Stack limit per process, e. g. 20 MB
#BSUB -S 20

### The last part consists of regular shell commands:
### Change to working directory
cd /home/user/mywork

### Execute your application
myapp.exe

This OpenMP job will start the parallel program "myapp.exe" with 24 threads.

#!/usr/bin/env zsh

### Job name
#BSUB -J OMPJOB

### File where the output should be written
#BSUB -o OMPJOB_OUTPUT

### Time your job needs to execute, e. g. 15 min
#BSUB -W 0:15

### Memory your job needs, e. g. 1000 MB 
#BSUB -M 1000

### Stack limit per process, e. g. 50 MB
#BSUB -S 50

### Request 24 compute slots (in this case: threads)
#BSUB -n 24

### Execute as shared-memory job
#BSUB -a openmp

### Change to working directory
cd /home/user/mywork

### Execute your application
myapp.exe

This OpenMPI job will start the parallel program "myapp.exe" with 4 processes.

#!/usr/bin/env zsh

### Job name
#BSUB -J MPIJOB

### File where the output should be written
#BSUB -o MPIJOB_OUTPUT

### Time your job needs to execute, e. g. 30 min
#BSUB -W 0:30

### Memory your job needs, e. g. 1024 MB 
#BSUB -M 1024

### Stack limit per process, e. g. 50 MB
#BSUB -S 50

### Request 4 compute slots (in this case: processes)
#BSUB -n 4

### Execute as distributed-memory job with OpenMPI
#BSUB -a openmpi

### Change to working directory
cd /home/user/mywork

### Execute your application
myapp.exe

References

More LSF jobscript examples: https://doc.itc.rwth-aachen.de/display/CC/Example+scripts