Getting Started


ToDo:

Write the linked pages shell, Login_Nodes, Jobscript, jobscript-examples, Support, Batch-Scheduler, LSF, SLURM, OpenMP, MPI, rsync, scp, ftp

Access or "How-to-be-allowed-onto-the-supercomputer"

Depending on the specific supercomputer, one either has to register for a user account or has to write a project proposal and apply for computing resources that way. The respective pages are linked in this overview:

  • IT Center - RWTH Aachen [1]
  • RRZE - FAU Erlangen [2]
  • ZIH - TU Dresden [https://doc.zih.tu-dresden.de/hpc-wiki/bin/view/Compendium/Access]

After this is done and login credentials are supplied, one can proceed to the login.


Login or "How-to-now-actually-connect-to-the-supercomputer"

Most HPC systems are Unix-based environments with shell (command-line) access.

To log in, one usually uses ssh [https://wickie.hlrs.de/platforms/index.php/Secure_Shell_ssh] to reach the respective Login_Nodes (computers reserved for people just like you who want to connect to the supercomputer).

  • IT Center - RWTH Aachen: cluster.rz.rwth-aachen.de
  • RRZE - FAU Erlangen: cshpc.rrze.fau.de
  • ZIH - TU Dresden: taurus.hrsk.tu-dresden.de
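
For example, logging in to the Dresden login node listed above would look roughly like this (ab123456 is only a placeholder for your own username):

  # replace ab123456 with your own username (placeholder)
  ssh ab123456@taurus.hrsk.tu-dresden.de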

Once there, the user can interact with the system and run (small) programs to generally test the system/software.


Schedulers or "How-To-Run-Applications-on-a-supercomputer"

To run any significant program or workload on a supercomputer, schedulers [https://en.wikipedia.org/wiki/Job_scheduler] are generally employed. Apart from the above-mentioned Login Nodes there are usually far more Backend Nodes (computers exclusively reserved for computing, which you cannot connect to directly). A program called a scheduler decides who gets how many of those nodes and for how long.

In order to run your application, you have to tell the scheduler what your application needs in terms of

  • time
  • compute resources (how many CPUs/sockets/nodes)
  • memory resources (how much RAM/storage)
  • how to actually execute your application

which obviously has to fit within the limits of the actual system. If you ask for more than there is, chances are the scheduler will accept the job and wait until you buy and install the missing hardware, i.e. forever. Information about the available hardware can be found in the following table.

Running larger computations on the above-mentioned backend nodes is usually done with a Jobscript. When you have this jobscript ready, with the help of jobscript-examples, colleagues or your local Support, you can submit it to the respective Batch-Scheduler.

  • IT Center - RWTH Aachen [https://doc.itc.rwth-aachen.de/display/CC/Hardware+of+the+RWTH+Compute+Cluster]: LSF
  • RRZE - Erlangen [7]
  • ZIH - Dresden [8]: SLURM
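
To make this more concrete, here is a minimal sketch of what a SLURM jobscript could look like. The directives below are standard SLURM options, but the resource numbers, module name and program name are only placeholders that have to be adapted to your system:

  #!/bin/bash
  #SBATCH --job-name=my_test_job     # name shown in the queue
  #SBATCH --nodes=1                  # compute resources: one node
  #SBATCH --ntasks=4                 # four processes on that node
  #SBATCH --time=00:30:00            # requested wall time: 30 minutes
  #SBATCH --mem=4G                   # requested memory
  #SBATCH --mail-type=END            # optional email notification when the job finishes

  module load gcc                    # load the software the job needs (example module)
  srun ./my_application              # how to actually execute your application

Such a script would then be submitted with 'sbatch jobscript.sh'. On an LSF system the directives are written as #BSUB lines and the job is submitted with bsub instead.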

After this, the application is executed once a set of nodes (computers) is allocated to your 'job' by the scheduler. Usually there is (optional) email notification on start/finish of a job. If the specified time runs out before your application finishes and exits, it will be terminated by the scheduler.


Modules or "How-To-Use-Software-Without-installing-everything-yourself"

A lot of applications rely on 3rd-party software, prominent examples being compilers or special math libraries. This software is usually loadable with the module system. Depending on the institution, different modules are available, but there are usually common ones like the Intel or GCC compilers.

A few common commands, to be entered on the supercomputer's command line, are

  • module list: lists loaded modules
  • module avail: lists available (loadable) modules
  • module load/unload x: loads/unloads module x
  • module switch x y: swaps out module x for module y

If you recurrently need lots of modules, this loading can be automated with an sh file, so that you only have to execute the file once and it loads all the modules you need.
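
A minimal sketch of such a file, assuming purely hypothetical module names (check 'module avail' for what actually exists on your system):

  #!/bin/bash
  # load_my_modules.sh - loads all modules needed for my project (module names are examples)
  module load gcc
  module load openmpi
  module load python

Typically you would run it with 'source load_my_modules.sh', so that the loaded modules remain active in your current shell.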


Parallelizing or "How-To-Use-More-Than-One-Core"

Unfortunately, the development of computers is currently at a point where you cannot simply make a single processor run faster, because the semiconductor physics do not work that way. The current solution is therefore to split the work into multiple, ideally independent parts, which are then executed in parallel. Similar to cleaning your house, where everybody takes care of a few rooms, on a supercomputer this is usually done with parallel programming paradigms like OpenMP or MPI. However, just as there is only one vacuum cleaner in the whole house, which not everybody can use at the same time, there are limits on how fast you can get, even with a big number of processors (people) working on your problem (cleaning the house) in parallel.

Message Passing Interface (MPI) is similar to the way humans tackle problems together: every process 'works' (cleans) on its own and can communicate with the others by sending messages (talking and listening). Open Multi-Processing (OpenMP), on the other hand, works more like communication via a pin board: there is one shared memory (the pin board in the analogy) where everybody can see what everybody else is doing, how far they have gotten, and which results they have (the bathroom is already clean). Similar to the physical world, there are logistical limits on how many people can use the memory (pin board) efficiently and on how big it can be. Therefore, OpenMP is usually employed for the different processes within one node (a computer, corresponding to your house in the example) and MPI is used to communicate across nodes (similar to talking to the neighbours to see how far their house-cleaning has gotten). Both can be used simultaneously.
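
In practice, a hybrid MPI+OpenMP run is often launched from the command line (or from within a jobscript) roughly like this; the exact launcher (mpirun, mpiexec or srun) and its options depend on your site's MPI installation and scheduler, and the application name is a placeholder:

  export OMP_NUM_THREADS=4       # each MPI process may use 4 OpenMP threads
  mpirun -np 8 ./my_application  # start 8 MPI processes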


File Transfer or "How-to-get-your-data-onto-or-off-the-supercomputer"

To get your data (files) onto the supercomputer or back to your local machine, there are usually different ways. Sometimes there are computers specifically reserved for this purpose (copy nodes):

  • RWTH Aachen [9]: cluster-copy.rz.rwth-aachen.de, cluster-copy2.rz.rwth-aachen.de
  • RRZE - FAU Erlangen [10]: no dedicated copy nodes
  • ZIH - TU Dresden [11]: taurusexport.hrsk.tu-dresden.de

If available, please use these nodes to copy your data onto or off the supercomputer, since you will get a better/faster connection and disturb other users less. Also, the tools mentioned below might only work on those nodes.

Commonly used and widely supported copying tools are rsync, which mirrors directories (folders) between the supercomputer and your local machine; scp, which is useful for a few single files or specified file lists; and lastly the commonly used ftp, or its encrypted variants sftp and ftps. It is best to check on the above links which protocol/tool your supercomputer supports and then go from there.
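
As a small example, using the RWTH copy node listed above (the username and paths are placeholders and have to be adapted to your site):

  # mirror a local directory onto the supercomputer
  rsync -avz ./my_project/ ab123456@cluster-copy.rz.rwth-aachen.de:~/my_project/
  # copy a single result file back to the local machine
  scp ab123456@cluster-copy.rz.rwth-aachen.de:~/my_project/results.dat .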