Difference between revisions of "Getting Started"

From HPC Wiki

Revision as of 11:22, 21 December 2017

Access

Depending on the specific supercomputer, one either has to register for a user account or write a project proposal and apply for computing resources that way. The respective pages are linked in this overview:

IT Center - RWTH Aachen [1]
RRZE - FAU Erlangen [2]
ZIH - TU Dresden [3]

After this is done and login credentials have been supplied, one can proceed to the login.


Login

Most HPC systems are Unix-based environments with shell (command-line) access.

To log in, one usually uses ssh [4] to reach the respective Login Nodes (computers reserved for user logins).

IT Center - RWTH Aachen:  cluster.rz.rwth-aachen.de
RRZE - FAU Erlangen:      cshpc.rrze.fau.de
ZIH - TU Dresden:         taurus.hrsk.tu-dresden.de
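A login from a local Unix machine can then look like the following sketch; the hostname is taken from the table above, and the username ab123456 is a placeholder for your own account name:

```shell
# Connect to a login node (replace ab123456 with your own username
# and pick the login node of your site from the table above)
ssh ab123456@cluster.rz.rwth-aachen.de

# If you need graphical output, X11 forwarding can usually be enabled:
ssh -X ab123456@cluster.rz.rwth-aachen.de
```

These commands require an existing account on the respective cluster, so they are command templates rather than something to run as-is.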

Once there, the user can interact with the system and run (very small) programs for general testing of the system/software.


Schedulers or How-To-Run-Applications-on-a-supercomputer

To run any significant program or workload on a supercomputer, schedulers [5] are generally employed. Apart from the above-mentioned Login Nodes, there are usually far more Backend Nodes (computers exclusively reserved for computing). The scheduler decides who gets how many of those and for how long.

In order to run your application this way, you have to tell the Scheduler what your application needs in terms of

  • time
  • compute resources (how many cpus/sockets/nodes)
  • memory resources (how much RAM/storage)
  • how to actually execute your application

which obviously has to fit within the boundaries of the running system. If you ask for more than there is, chances are the scheduler will accept the job and then wait for you to buy and install the missing hardware, i.e. forever. Information about the available hardware can be found in the following table.

This is usually done with a Jobscript. When you have this jobscript ready, with the help of jobscript examples, colleagues or the Support, you can submit it to the respective Batch-Scheduler.

IT Center - RWTH Aachen [6]:  LSF
RRZE - Erlangen [7]:
ZIH - Dresden [8]:            SLURM
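For sites running SLURM, a minimal jobscript covering the four points above could look like this sketch; the directives shown are standard SLURM options, but the concrete values and the application name are invented examples:

```shell
#!/bin/bash
# Minimal SLURM jobscript sketch -- values are examples only;
# check your site's documentation for valid limits and partitions.
#SBATCH --job-name=my_simulation
#SBATCH --time=01:30:00          # time: wall-clock limit (hh:mm:ss)
#SBATCH --nodes=2                # compute resources: number of nodes
#SBATCH --ntasks-per-node=24     # compute resources: processes per node
#SBATCH --mem-per-cpu=2G         # memory resources: RAM per CPU core
#SBATCH --mail-type=END          # optional: email when the job finishes

# how to actually execute your application:
srun ./my_application
```

The script is then submitted with `sbatch jobscript.sh`, and `squeue -u $USER` shows its state while it waits for nodes.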

After this, the application is executed as soon as a set of nodes (computers) is allocated to your 'job' by the scheduler. Usually there is (optional) email notification on start/finish of jobs. If the specified time runs out before your application finishes and exits, it will be terminated.


Modules or How-To-Use-Software-Without-installing-everything-yourself

A lot of applications rely on 3rd-party software, one prominent example being compilers. This software is usually loadable with the module system. Depending on the site, different modules are available, but there are usually common ones like the Intel or GCC compilers.

A few common commands are

module list            lists loaded modules
module avail           lists available (loadable) modules
module load/unload x   loads/unloads module x
module switch x y      switches out module x for module y

If you recurrently need lots of modules, this loading can be automated with a shell script, so that you just have to execute the script once and it loads all the modules you need.
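Such a script could look like the following sketch; the module names and versions are invented examples, since the available modules differ from site to site (`module avail` shows what exists on your cluster):

```shell
#!/bin/bash
# load_my_modules.sh -- sketch of an environment setup script;
# module names/versions below are examples and will differ per site
module load intel/19.0
module load openmpi/4.0
module load python/3.8
module list   # confirm what is loaded now
```

Run it with `source load_my_modules.sh`, so the loaded modules remain in effect in your current shell.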


Parallelizing or How-To-Use-More-Than-One-Core

Unfortunately, processor development is currently at a point where you cannot just make a single processor run faster, because the physics simply does not work out. Therefore, the current solution is to split the work into multiple, partly independent parts, which are then executed in parallel. Similar to cleaning your house, where everybody takes care of a few rooms, on a supercomputer this is usually done with OpenMP or MPI. However, just like with the single vacuum cleaner that everybody has to share, there are limits on how fast you can get, even with a big number of processors working on your problem in parallel.

Message Passing Interface (MPI) is similar to the way humans interact to solve problems: every process 'thinks' on its own and can communicate with the others by sending messages. Open Multi-Processing (OpenMP), on the other hand, works more like a warehouse, where every branch in the city can store its wares/results and access the wares/results of everybody else using the same warehouse. Similar to a real warehouse, there are logistical limits on how many branches can have access to the same memory and how big it can be. Therefore, OpenMP is usually employed for the threads within one node, which share memory, and MPI to communicate across nodes. Both can be used simultaneously.
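On the command line, combining both typically looks like the following sketch; the process/thread counts and the program name are invented, and the exact mpirun flags differ between MPI implementations:

```shell
# Hybrid MPI+OpenMP launch sketch: OpenMP threads share memory within
# a node, while MPI processes exchange messages across nodes.
export OMP_NUM_THREADS=12        # threads per MPI process (within a node)
mpirun -np 4 ./my_application    # 4 MPI processes (spread across nodes)
```

On most clusters, such a line goes inside the jobscript rather than being typed on a login node.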


File Transfer or How-to-get-your-stuff-onto-the-supercomputer

There are usually several ways to get your stuff (files) onto the supercomputer. Sometimes there are computers specifically reserved for this purpose:

RWTH Aachen [9]:          cluster-copy.rz.rwth-aachen.de, cluster-copy2.rz.rwth-aachen.de
RRZE - FAU Erlangen [10]: no dedicated copy nodes
ZIH - TU Dresden [11]:    taurusexport.hrsk.tu-dresden.de

Common tools to do the copying are rsync, which mirrors directories (folders) between the supercomputer and your local machine; scp, which is useful for a few single files or specified file lists; and lastly the commonly used ftp or its encrypted variants sftp and ftps. It is best to check in the links above which protocol/tool your supercomputer supports and move on from there.