From HPC Wiki
Jump to: navigation, search

Gaussian is a computational chemistry software package widely-used by researchers in academia as well as in chemical industry. Its capabilities range from molecular mechanics calculations to various electronic structure methods, such as Hartree-Fock self-consistent field (HF-SCF), density functional theory (DFT), Møller-Plesset perturbation theory, configuration interaction and coupled cluster methods. One useful, but overlooked feature of Gaussian is the capability to save the intermediate results of computation as a checkpoint file. The checkpoint file can be used not only for visualization (e.g. molecular structure and molecular orbitals etc), but also for

  • creating concise Gaussian input file by reading the molecule information stored in the checkpoint file
  • pipelining continuous computations by starting with faster, albeit less accurate, methods, so that the more accurate, but time-consuming, computations can be facilitated by using the previously generated checkpoint file
  • last but not least, restarting a Gaussian computation, if the job was terminated due to unexpected incidents, e.g. exceeding the specified time limit in workload manager (WLM), hardware crash or a power supply failure.

In this tutorial we would like to show you how to utilize the intermediate results stored in the checkpoint file generated by Gaussian during computation to facilitate more accurate, but computationally expensive runs as final production. Because these production runs usually take much longer computation time and might be terminated prematurely, we would also like to guide you to have a periodic check and backup of the checkpoint file as well as to restart some typical computations in Gaussian (e.g. restart geometry optimization and vibrational frequency computation).

The jobscript used in this tutorial is intended to provide a WLM-agnostic solution to run Gaussian computation with the checkpoint file. The examples shown below use Gaussian 16 installed on the Noctua system at Paderborn Center for Parallel Computing (PC²).

Gaussian computation on multiple compute nodes

The molecular structure of caffeine.

The Gaussian computation of caffeine (see the figure) is used here to demonstrate how to run HF-SCF calculation with small basis set and generate the initial checkpoint file for later usage in more accurate computations. Moreover, the generated checkpoint file can be checked for integrity. It will also have a backup, if the integrity check can be passed successfully, and serve to restart unfinished Gaussian computation in previous run. The Gaussian input file is shown below:

# hf/sto-3g opt

We love Café

0 1
O    2.383524    1.065652    0.000000
O    0.332955   -3.082123    0.000000
N   -1.057329   -1.245565    0.000000
N   -0.465323    2.223753    0.000000
N    1.319793   -1.009521    0.000000
N   -2.286017    0.870360    0.000000
C    0.000000    0.927977    0.000000
C   -1.140130    0.130872    0.000000
C    1.344937    0.405397    0.000000
C    0.190307   -1.864298    0.000000
C   -1.831401    2.134481    0.000000
C   -2.275717   -2.049078    0.000000
C    0.349100    3.433269    0.000000
C    2.606002   -1.713972    0.000000
H   -2.459303    3.025784    0.000000
H   -2.875732   -1.821570    0.894667
H   -1.972156   -3.102774    0.000000
H   -2.875732   -1.821570   -0.894667
H   -0.326272    4.300612    0.000000
H    0.989438    3.453412    0.892939
H    0.989438    3.453412   -0.892939
H    3.392927   -0.950608    0.000000
H    2.684548   -2.353005   -0.891658
H    2.684548   -2.353005    0.891658

Here is a brief description of the Gaussian input file:

  • %chk=caffeine.chk: specifies the name of the checkpoint file
  • %mem=128GB: specifies 128 GB memory will be used for the computation. This may be too much for the HF-SCF computation, but it definitely helps in the later, more expensive geometry optimization and vibrational frequency computations with the second-order Møller-Plesset method (MP2).
  • %cpu=0-39: specifies a total of 40 CPU cores (from CPU core 0 to 39) in a compute node of Noctua to be used for this computation.
  • # hf/sto-3g opt: specifies the molecular geometry optimization at the HF-SCF level with the sto-3g basis set.
  • We love Café: specifies the title line of this Gaussian computation, which can be any one line text.
  • The remaining lines in the Gaussian input file specify the information of the caffeine molecule: total molecular charge, spin-multiplicity and molecular structure.

The Slurm jobscript is shown below and this Gaussian computation can be submitted with sbatch, where is the Gaussian input file given above.

#SBATCH --job-name=g16cafe
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=40
#SBATCH --time=10:00:00
#SBATCH --output=g16-%j.out
#SBATCH --error=g16-%j.err
# load Gaussian 16 - the two commands below may be adapted to your local HPC system
module reset
module load g16
# create Gaussian scratch and stripe the data
MYSCRATCH=$(mktemp -d G16.XXXXX)
# This Lustre setting can be adapted to your local HPC cluster.
lfs osts > /dev/null 2>&1 && lfs setstripe -c $(($(lfs osts | wc -l) - 1)) .
# compute nodes for TCP-Linda
# - btnode: boot node of Gaussian
# - cnodes: list of compute nodes
btnode=$(scontrol show hostnames | head  -n  1 )
cnodes=$(scontrol show hostnames | paste -s -d,)
# run Gaussian computation
g16 -w=${cnodes} $1 &
# backup the checkpoint file for restarting Gaussian computation
# - If you want backup, please set bkmin accordingly, e.g. bkmin=60 for backup in every 60 minutes.
# - If you do NOT want backup, simply set bkmin to a very large value, e.g. bkmin=$((2**30)).
while true; do
  for i in $(seq 1 ${bkmin}); do
    # terminate the job, if g16 is not running
    [[ $(hostname) == ${btnode} ]] && [[ -z $(ps -o pid= -o ucmd= -p ${g16pid}) ]] && exit 0
    sleep 1m
  # backup the checkpoint file in every bkmin minutes
  for i in $(ls *.chk); do
    while true; do
      rsync -auq ${i} ${i}.b4bk
      chkchk ${i}.b4bk > /dev/null 2>&1
      if [[ $? -eq 0 ]]; then
        mv ${i}.b4bk ${i}.lastbk

This hf/sto-3g computation can be fast on two compute nodes in Noctua, each of them has 40 CPU cores.

Here is a short description of the Slurm jobscript for running Gaussian computation.

  • #SBATCH --nodes=2: specifies the number of compute nodes for TCP-Linda parallelization. Here we are using 2 compute nodes.
  • #SBATCH --ntasks-per-node=1: specifies one TCP-Linda instance on each compute node.
  • #SBATCH --cpus-per-task=40: uses 40 CPU cores for one Lindaworker for the node-level parallelization.
  • Lines below # load Gaussian 16: the following two commands may be adapted to your local HPC system.
    • module reset will reset to the default system environment modules. It's only supported by Lmod[1]. The TCL-based Environment Modules, however, does not support this command[2].
    • module load g16 will load the Gaussian 16 program package in your environment.
  • Lines below # create Gaussian scratch and stripe the data: for some Gaussian computation large scratch files will be created. Therefore it is necessary to optimize the file I/O performance.
    • In the Slurm script given above we create the scratch directory on the Lustre parallel filesystem in Noctua and use all object storage targets (OSTs) for better file I/O performance[3].
    • Please adapt these settings to your local HPC clusters to achieve better file I/O performance.
  • Lines below # compute nodes for TCP-Linda: specifies the boot node and compute nodes for the Gaussian computation. The list of compute nodes will be passed to Gaussian for cluster-level parallelization.
  • Lines below # run Gaussian computation: runs the Gaussian computation in background and stores the g16 process ID on boot node for monitoring.

A Gaussian computation may be terminated by some unexpected incidents, e.g. exceeded time limit or hardware failure. It is necessary to keep the checkpoint file in order to restart previously failed Gaussian computation. The checkpoint file, however, may also be corrupted and then it may be impossible to restart the Gaussian computation any longer. In the last part of the script we provide a backup solution to such problems.

  • By default the checkpoint file will be backed up in every 60 minutes. This backup period can be altered by changing the value assigned to bkmin.
  • The file name of the backup is <checkpoint_file_name>.lastbk.
  • To prevent the backup copy of the checkpoint file from being corrupted, a temporary file for checkpoint is used in the script and its integrity is checked by using chkchk in the Gaussian package. This temporary file will become the real backup copy <checkpoint_file_name>.lastbk, if its integrity can be correctly verified.
  • If you do NOT want to have backup of the checkpoint file, simply set bkmin to a very large value, e.g. bkmin=$((2**30)), which is more than 2000 years.

After the checkpoint file as well as its backup are saved, we can proceed to perform a series of continuous Gaussian computations by reading the molecule information stored in the checkpoint file as well as restart an interrupted Gaussian computation.

Read molecule specification from the checkpoint file

The initial HF-SCF calculation is fast, but the obtained results for molecular properties may be less accurate. To refine the computational results further we need to employ the DFT method with larger basis sets and use the hf/sto-3g optimized molecular structure as a starting point, which can be read from the checkpoint file directly. As a consequence the Gaussian input file for the DFT computation can be very concise.

# b3lyp/6-311g(d) geom=allcheck opt

The line # b3lyp/6-311g(d) geom=allcheck opt specifies the B3LYP functional in combination with the 6-311g(d) basis set for the computation. All molecule information (e.g. total charge, spin multiplicity and molecular geometry optimized at the hf/sto-3g level) will be read from the checkpoint file as the starting point for subsequent geometry optimization.

Read basis sets and initial wavefunction from the checkpoint file

Besides the molecule specification the basis sets as well as the initial guess for the wavefunction can also be read from the checkpoint file generated in previous computation (e.g. the b3lyp/6-311g(d) computation). Here is the Gaussian input file for the MP2 calculation with the 6-311g(d) basis set and initial wavefunction, both of which are stored and read from the checkpoint file.

# mp2 chkbasis guess=read geom=allcheck opt

The Gaussian keywords:

  • chkbasis reads the basis set from the checkpoint file
  • guess=read reads the initial guess of the wavefunction from the checkpoint file
  • geom=allcheck reads all molecule specification from the checkpoint file

The output of this Gaussian computation confirms that the basis set, initial wavefunction and the molecule specification are all read from the checkpoint file.

Structure from the checkpoint file:  "caffeine.chk"
Basis read from chk: "caffeine.chk" (5D, 7F)
Initial guess from the checkpoint file:  "caffeine.chk"

Therefore the Gaussian input file becomes very concise and the present computation can be facilitated by deploying existing results obtained at the lower levels of computations (e.g. HF-SCF and DFT).

Restart molecular geometry optimization from the checkpoint file

The MP2 computation can be much slower than DFT, what to do, if the molecular geometry optimization is terminated by some unexpected incidents (e.g. hardware failure)? The solution is quite simple, just put the opt=restart option in the Gaussian input file and re-submit the computation. All existing information (e.g. basis sets, wavefunction and molecular structures during the geometry optimization) can be read from the checkpoint file. Here is the complete, but also very concise, Gaussian input file to restart the unfinished geometry optimization.

# mp2 chkbasis guess=read geom=allcheck opt=restart

The following line in the output file confirms that the aforementioned molecular information is successfully recovered from the checkpoint file.

Restoring state from the checkpoint file "caffeine.chk".

Restart vibrational frequency computation from the checkpoint file

The vibrational frequency computation at the MP2 level with 6-311g(d) basis set can be even more time-consuming than the geometry optimization, because the analytic 2nd order energy derivatives must be evaluated. Using the caffeine molecule as an example here is the Gaussian input file and the keyword freq specifies the computation of molecular vibrational frequency.

# mp2 chkbasis guess=read geom=allcheck freq

If such an expensive computation crashes because of something unexpected, it can be easily restarted with merely one single keyword restart. The following gives you the Gaussian input file to restart the molecular frequency computation.

# restart

The following line in the output file of Gaussian indicates the computation is restarting from the previous state.

(Restarting at link  811)

Other useful guides

In this short tutorial we have presented how to use the checkpoint file in Gaussian computation to

  • read the basis sets, molecular wavefunction and molecule specification
  • pipeline a series of computations and
  • restart molecular geometry optimization and vibrational frequency computation.

In addition a backup solution for the checkpoint file is provided in the jobscript, which checks the integrity of the checkpoint file and keeps the latest backup available.

More information on the usage of the checkpoint file in Gaussian can be found at: