Getting Started
Access
Depending on the specific supercomputer, one either has to register for a user account or write a project proposal and apply for computing resources that way. The respective pages are linked in this overview:
IT Center - RWTH Aachen [1] | RRZE - FAU Erlangen [2] | ZIH - TU Dresden [3] |
Once this is done and login credentials have been supplied, one can proceed to the login.
Login
Most HPC systems are Unix-based environments with shell (command-line) access.
To log in, one usually uses ssh [4] to reach the respective Login Nodes (computers reserved for user logins).
IT Center - RWTH Aachen | RRZE - FAU Erlangen | ZIH - TU Dresden |
cluster.rz.rwth-aachen.de | cshpc.rrze.fau.de | taurus.hrsk.tu-dresden.de |
Once there, the user can interact with the system and run (very small) programs to generally test the system/software.
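As a minimal sketch of such a login, the shell snippet below wraps the ssh call in a small function. The host name is the RWTH Aachen login node from the table above; the user name ab123456 and the function name hpc_login are placeholders chosen for illustration and do not come from any site's documentation.

```shell
#!/bin/sh
# Placeholder account name (assumption); replace it with the one you were given.
HPC_USER="${HPC_USER:-ab123456}"
# Login node of the IT Center - RWTH Aachen, taken from the table above.
HPC_HOST="${HPC_HOST:-cluster.rz.rwth-aachen.de}"

# hpc_login is a hypothetical helper: it opens an interactive shell
# on the login node via ssh.
hpc_login() {
    ssh "${HPC_USER}@${HPC_HOST}"
}

# Nothing connects until the function is called:
#   hpc_login
```

Calling hpc_login afterwards opens the session. Typing ssh ab123456@cluster.rz.rwth-aachen.de directly does the same thing.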
Schedulers or How-To-Run-Applications-on-a-supercomputer
To run any significant program or workload on a supercomputer, schedulers [5] are generally employed. Apart from the above-mentioned Login Nodes, there are usually far more Backend Nodes (computers exclusively reserved for computing). The scheduler decides who gets how many of those, and for how long.
In order to run your application this way, you have to tell the scheduler what your application needs in terms of
- time
- compute resources (how many cpus/sockets/nodes)
- memory resources (how much RAM/storage)
- how to actually execute your application
All of this obviously has to fit within the limits of the running system. If you ask for more than there is, chances are the scheduler will accept the job and wait until the missing hardware has been bought and installed, in other words forever.
These requirements are usually written down in a jobscript. Once you have a jobscript ready, with the help of jobscript examples, colleagues or the support team, you can submit it to the respective batch scheduler. Information about the available hardware can be found in the following table.
IT Center - RWTH Aachen [6] | RRZE - Erlangen [7] | ZIH - Dresden [8] |
LSF | SLURM |
After submission, the application is executed as soon as the scheduler has allocated a set of nodes (computers) to your 'job'. Usually email notification on the start and finish of jobs is available as an option. If the specified time runs out before your application finishes and exits, it will be terminated.
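The four things the scheduler needs to know (time, compute resources, memory, and how to execute the application) could be written down roughly as follows for SLURM. This is only a sketch: the job name, all resource values, the mail address and the application ./my_app are made-up placeholders, and sites often require further options, so the site's own jobscript examples take precedence.

```shell
#!/bin/bash
#SBATCH --job-name=hello_hpc          # placeholder job name
#SBATCH --time=00:05:00               # time limit (hh:mm:ss); the job is terminated after this
#SBATCH --nodes=1                     # compute resources: number of nodes
#SBATCH --ntasks=4                    # compute resources: number of processes
#SBATCH --mem-per-cpu=1G              # memory per allocated CPU
#SBATCH --mail-type=BEGIN,END         # optional email on start and finish
#SBATCH --mail-user=you@example.org   # placeholder address

# The #SBATCH lines are plain comments to the shell; only the scheduler reads them.

# SLURM sets SLURM_JOB_ID inside a job; the fallback text is used anywhere else.
JOB_MESSAGE="job ${SLURM_JOB_ID:-not-under-slurm} running on $(hostname)"
echo "$JOB_MESSAGE"

# How to actually execute the application (hypothetical binary, left commented out):
# srun ./my_app
```

With SLURM, such a file is submitted with sbatch jobscript.sh. LSF uses #BSUB lines and bsub < jobscript.sh instead, with different option names.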
Modules or How-To-Use-Software-Without-Installing-Everything-Yourself
A lot of applications rely on third-party software, a prominent example being compilers. This software can usually be loaded with the module system. Depending on the site, different modules are available, but there are usually common ones like the Intel or GCC compilers.
A few common commands are
module list | lists loaded modules |
module avail | lists available (loadable) modules |
module load/unload x | loads/unloads module x |
module switch x y | switches out module x for module y |
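If you repeatedly need the same set of modules, loading them can be automated with an sh file, so that you only have to run the file once and it loads all the modules you need. Because module changes the environment of the current shell, the file has to be sourced (for example with . ./file.sh) rather than started as a separate process.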
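Such a file might look like the sketch below. The file name modules.sh and the module name gcc are assumptions; which names actually exist differs per site and can be checked with module avail. The snippet only uses the module commands from the table above and first checks that the module command is present at all.

```shell
# modules.sh (hypothetical file name)
# Load it into the current shell with:  . ./modules.sh

# Space-separated list of modules to load (assumed name; see `module avail`).
MODULES_TO_LOAD="gcc"

if command -v module >/dev/null 2>&1; then
    for m in $MODULES_TO_LOAD; do
        module load "$m"
    done
    # Show what is loaded now.
    module list
else
    echo "module command not found; is this a cluster login node?" >&2
fi
```

Sourcing the file matters: if it is run as a separate process, the loaded modules disappear again when that process ends.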
Parallelizing or How-To-Use-More-Than-One-Core
Unfortunately, the development of computers has reached the point where you cannot simply make a processor run faster, because the physics does not work out. The current solution is therefore to split the work into multiple, partly independent parts, which are then executed in parallel. This is similar to cleaning your house, where everybody takes care of a few rooms; on a supercomputer it is usually done with OpenMP or MPI. However, just as there is only one vacuum cleaner to share, there are limits on how fast you can get, even with a large number of processors working on your problem in parallel.
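One common way to put a number on this limit is Amdahl's law, which is not mentioned above and is brought in here only as an illustration: if a fraction p of the work can be parallelized and the rest (the vacuum cleaner) cannot, the speedup on n processors is at most 1 / ((1 - p) + p / n). The sketch below evaluates this with awk; the function name amdahl and the value p = 0.9 are arbitrary choices.

```shell
#!/bin/sh
# amdahl P N: upper bound on the speedup for parallel fraction P on N processors,
# computed as 1 / ((1 - P) + P / N) and printed with two decimals.
amdahl() {
    LC_ALL=C awk -v p="$1" -v n="$2" \
        'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / n) }'
}

# 90 % of the work is assumed to be parallelizable.
for n in 1 2 4 16 256; do
    echo "$n processors: speedup at most $(amdahl 0.9 "$n")"
done
```

With these assumptions, 16 processors give a speedup of at most 6.40 and 256 processors at most 9.66; no number of processors gets past 10, because the remaining 10 % of the work still runs on a single core.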