Difference between revisions of "Torque"

From HPC Wiki

Revision as of 14:54, 20 April 2018

General

Torque is a job scheduler. For an overview of what a scheduler does, see the scheduler page.

Job Submission

This command submits the job you defined in your jobscript to the batch system:

$ qsub jobscript.sh

Like any other incoming job, your job is first placed in a queue. The scheduler then decides when it will run. The more resources your job requests, the longer it may have to wait before it starts.

You can check the status and job ids of your submitted jobs with the following shell command. The most common job states are Q (queued: the job is waiting for free nodes), R (running: the jobscript is being executed) and H (on hold: the job is stopped and is not waiting for resources). The output also shows how long each job has been running and its time limit.

$ qstat -u <user_id>

If you submitted a job by accident, or realise that it is not running correctly, you can remove it from the queue, or terminate it if it is already running, by typing:

$ qdel <job_id>
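
The three commands above are often combined in a small wrapper script. The following is a minimal sketch, assuming that qsub prints the full job id in the form <number>.<server> on standard output. The function name submit_job is hypothetical and not part of Torque.

```shell
#!/bin/bash
# Hypothetical helper: submit a jobscript and print the numeric job id.
# Assumes qsub writes the full job id (e.g. <number>.<server>) to stdout.
submit_job() {
    local script="$1"
    local full_id
    full_id=$(qsub "$script") || return 1   # propagate a failed submission
    echo "${full_id%%.*}"                   # drop everything from the first dot on
}
```

A typical use would be to store the result with job_id=$(submit_job jobscript.sh), check on the job with qstat -u "$USER", and pass "$job_id" to qdel if the job has to be removed.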

#PBS Usage

If you are writing a jobscript for a Torque batch system, the magic cookie is "#PBS". Start a line of your script with "#PBS", followed by one of the parameters listed below; replace each word written in <...> with a value.

Basic settings:

Parameter    Function
-N <name>    job name
-j oe        merge the output and error logs into a single log file named <job_name>.o<job_id>
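
To make the directive syntax concrete, here is a minimal sketch that writes a small jobscript and then prints the "#PBS" lines it contains. The file name minimal_job.sh, the job name hello_job and the helper list_pbs_directives are placeholders chosen for illustration.

```shell
#!/bin/bash
# Write a minimal jobscript; file name and job name are placeholders.
cat > minimal_job.sh <<'EOF'
#!/bin/bash
#PBS -N hello_job
#PBS -j oe
echo "Hello from $(hostname)"
EOF

# Hypothetical helper: print the #PBS directives found in a script.
list_pbs_directives() {
    grep '^#PBS' "$1"
}

list_pbs_directives minimal_job.sh
```

With these two settings, the combined output and error log of the job would be written to hello_job.o<job_id>.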

Requesting resources:

Parameter                 Function
-l walltime=<runlimit>    runtime limit in the format hours:minutes:seconds; once this time is up, the scheduler kills the job
-l mem=<memlimit>         memory limit for the job, given as an integer followed by a unit, e.g. 400mb (use -l pmem=<memlimit> for a limit per process)
-l nodes=1:ppn=1          request a single processor for a sequential application
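
Since the walltime has to be given as hours:minutes:seconds, a small conversion helper can be handy when the runtime estimate is known in seconds. This is only a sketch; the function name to_walltime is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert a number of seconds to hours:minutes:seconds.
to_walltime() {
    local total="$1"
    printf '%02d:%02d:%02d\n' $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

to_walltime 5400   # prints 01:30:00
```

The result can then be used in a directive such as "#PBS -l walltime=01:30:00".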

E-mail notifications:

Parameter    Function
-m a         receive a mail if your job gets aborted
-m b         receive a mail when your job starts running
-m e         receive a mail when your job has finished
-m abe       enable all mail options above

Parallel programming (read more on the Parallel Programming page):

Parameter                               Function
-l nodes=1:ppn=<num_threads>            request <num_threads> cores on a single node for a shared-memory (OpenMP) application
-l nodes=<num_nodes>:ppn=<num_procs>    request <num_procs> cores on each of <num_nodes> nodes for a distributed-memory (MPI) application
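
Inside a running job, Torque provides the path of a node list in the environment variable PBS_NODEFILE. The sketch below assumes that resources were requested with -l nodes=...:ppn=... and that the node file contains one line per allocated processor slot. The helper name count_slots and the application name my_mpi_app are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count the processor slots allocated to the job.
# Assumes PBS_NODEFILE lists one host name per allocated slot.
count_slots() {
    grep -c '' "$PBS_NODEFILE"
}

# Typical use inside a jobscript (kept as comments, since they need a real job):
#   export OMP_NUM_THREADS=$(count_slots)        # OpenMP on a single node
#   mpirun -np "$(count_slots)" ./my_mpi_app     # MPI across the allocated slots
```

Deriving the thread or process count from the allocation, rather than hard-coding it, keeps the jobscript consistent with whatever was requested in the "#PBS -l" line.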

Jobscript Examples

TODO
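
As a starting point, the following sketch combines the parameters described above into a jobscript for a short sequential job. The job name, the limits and the echo workload are placeholders. PBS_O_WORKDIR and PBS_JOBID are environment variables that Torque sets for a running job; the fallbacks only exist so that the script can also be tried outside the batch system.

```shell
#!/bin/bash
#PBS -N example_job
#PBS -j oe
#PBS -l walltime=00:10:00
#PBS -l mem=400mb
#PBS -l nodes=1:ppn=1
#PBS -m abe

# Change to the directory the job was submitted from; fall back to the
# current directory when the script is run outside the batch system.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Placeholder workload; replace with the actual application.
message="Job ${PBS_JOBID:-unknown} running on $(hostname)"
echo "$message"
```

Submitted with qsub, this job would write its combined output and error log to example_job.o<job_id>.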

References

Overview of how to write a jobscript for Torque

Job submission on Torque

Guide to the Torque scheduler