ToDo:
Write the linked pages shell, Log-in Nodes, sh-file, vim, Jobscript, jobscript-examples, Support, Batch-Scheduler, LSF, SLURM, OpenMP, MPI, rsync, scp, ftp, Modules, File_Transfer
Access or "How-to-be-allowed-onto-the-supercomputer"
Depending on the specific supercomputer, one either has to register to get a user account or has to write a project proposal and apply for computing resources that way. The respective pages are linked in this overview.
After this is done and login credentials are supplied, one can proceed to
Log-in or "How-to-now-actually-connect-to-the-supercomputer"
Most HPC systems are Unix-based environments with shell (command-line) access.
To log in, one usually uses ssh to reach the respective Log-in Nodes (computers reserved for people just like you who want to connect to the supercomputer). Sometimes this access is restricted, so that you can only connect when you are within the university/facility and its network. To access the Log-in Nodes externally anyway, one can 'pretend to be inside the network' by using a Virtual Private Network (VPN).
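A minimal sketch of such a log-in, with a made-up user name and host name (your facility's documentation has the real ones):

    # Connect to a (hypothetical) log-in node via ssh.
    ssh jdoe@login.hpc.example.org

    # If access is restricted, connect to your facility's VPN first,
    # then run the same ssh command.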
Once there, the user can interact with the system and run (small) programs to generally test the system/software.
File_Transfer or "How-to-get-your-data-onto-or-off-the-supercomputer"
To get your data (files) onto the supercomputer or back to your local machine, there are usually several ways. Sometimes there are computers specifically reserved for this purpose, called copy Nodes.
Please use these nodes to copy your data onto or off the supercomputer if they are available, since you will get a better/faster connection and disturb other users less. Also, the tools mentioned below might only work on those nodes. If there are no dedicated copy nodes, you can usually use the Log-in Nodes for this purpose.
Commonly used and widely supported copying tools are rsync, which mirrors directories (folders) between the supercomputer and your local machine; scp, which is useful for a few single files or specified file lists; and lastly the commonly used ftp or its encrypted variants sftp and ftps. It is best to check on the above links which protocol/tool your supercomputer supports and move on from there.
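Two sketches of what such a transfer might look like; the user name, host name, and paths are made up:

    # Mirror a local directory onto the (hypothetical) copy node with rsync:
    # -a preserves permissions and timestamps, -v is verbose, -z compresses.
    rsync -avz ./my_project/ jdoe@copy.hpc.example.org:~/my_project/

    # Copy a single result file back to the local machine with scp.
    scp jdoe@copy.hpc.example.org:~/my_project/results.dat .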
Schedulers or "How-To-Run-Applications-on-a-supercomputer"
To run any significant program or workload on a supercomputer, generally schedulers are employed. Apart from the above-mentioned Log-in Nodes, there are usually far more Backend Nodes (computers exclusively reserved for computing, which you cannot connect to directly). A program called a scheduler decides who gets how many of those for what time. Please use the scheduler for everything that is not a simple, small test running for only a few minutes. More than 98% of the power of a supercomputer can only be accessed via the scheduler, and you will block the Log-in Nodes for everybody if you run your calculations there :/
When you log in, you can run commands on the Log-in Nodes interactively: you type, you hit return, the command gets executed. A scheduler works differently: you submit a series of commands (in the form of a file) and tell it how long you think these commands will take (also in the file) and how many resources they will approximately need in terms of:
- time: if the specified time runs out before your application finishes and exits, the application will be terminated by the scheduler.
- compute resources: how many CPUs ('calculation thingies'), sockets ('CPU-houses') and nodes ('computers') you need
- memory resources: how much RAM ('very fast memory, similar to the few books you have at home') you need
- how to actually execute your application
The scheduler then runs this series of commands when a part of the supercomputer that fits your requirements is free. This usually happens with a delay (sometimes you have to wait a day or two) and not instantly. Furthermore, you cannot change the series of commands afterwards; you can only terminate the 'job' and submit a new one in case of an error.
The file specifying this series of commands and the requirements is called a Jobscript and is specific to/different for each scheduler; a minimal sketch follows below. When you have this jobscript ready, with the help of jobscript-examples, colleagues or your local Support, you can submit it to the respective Batch-Scheduler of your facility. The Scheduler then waits until a set of nodes (computers) is free and then allocates those to compute your 'job'. Sometimes there is an (optional) email notification when your job starts/finishes computation.
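As an illustration, here is a minimal jobscript sketch for SLURM; the job name, resource numbers, partition name and program are made up, and other schedulers such as LSF use a different syntax (see the linked pages):

    #!/bin/bash
    #SBATCH --job-name=my_test      # name shown in the queue
    #SBATCH --time=00:30:00         # time: terminated after 30 minutes
    #SBATCH --nodes=1               # compute resources: one node ...
    #SBATCH --ntasks=4              # ... running four tasks
    #SBATCH --mem=4G                # memory resources: 4 GB of RAM
    #SBATCH --partition=compute     # hypothetical partition/queue name

    # How to actually execute the application:
    ./my_program input.dat

With SLURM, such a file would then be submitted with sbatch jobscript.sh; squeue shows its state in the queue, and scancel terminates it.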
The requirements you specify obviously have to fit within the boundaries of your facility's system. If you ask for more than there is, chances are the scheduler will take your job and wait until you buy and install the missing hardware in your facility -> probably forever. Information about the available hardware can be found in the Hardware Overview. You can find more information about parallelizing programs here.
Modules or "How-To-Use-Software-Without-installing-everything-yourself"
Since a lot of applications rely on third-party software, there is a program on most supercomputers called the Module system. With this system, other software, like compilers or special math libraries, is easily loadable and usable. Depending on the institution, different modules might be available, but there are usually common ones like the Intel or GCC compilers.
A few common commands to enter on the supercomputer command line to talk to the module system are:
    module list             lists loaded modules
    module avail            lists available (loadable) modules
    module load/unload x    loads/unloads module x
    module switch x y       switches out module x for module y
If you recurrently need lots of modules, this loading can be automated with an sh-file (as sketched below), so that you just have to execute the file once and it loads all the modules you need.
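A minimal sketch of such an sh-file; the module names are made up, so check module avail for the ones your system really offers:

    #!/bin/bash
    # load_my_modules.sh -- load all modules for my workflow in one go
    module load gcc           # hypothetical compiler module
    module load openmpi       # hypothetical MPI module
    module load my_math_lib   # hypothetical math library module
    module list               # show what is now loaded

Note that the file has to be sourced (source load_my_modules.sh) rather than run as a separate program, so that the loaded modules remain in your current shell.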
Parallelizing or "How-To-Use-More-Than-One-Core"
Computer development is currently at a point where you cannot just make a processor run faster, because semiconductor physics simply does not work that way. Therefore the current solution is to split the work into multiple, ideally independent parts, which are then executed in parallel. Similar to cleaning your house, where everybody takes care of a few rooms, on a supercomputer this is usually done with parallel programming paradigms like Open Multi-Processing (OpenMP) or the Message Passing Interface (MPI). However, just like there is only one vacuum cleaner in the whole house, which not everybody can use at the same time, there are limits on how fast you can get, even with a big number of processing units/CPUs/cores (analogous to people in the metaphor) working on your problem (cleaning the house) in parallel.
MPI is similar to the way humans interact with problems: every process 'works' (cleans) on its own and can communicate with the others by sending messages (talking and listening). OpenMP, on the other hand, works more like communication via a pin board: there is one shared memory (the pin board in the analogy) where everybody can see what everybody else is doing, how far they have gotten, and which results they got (the bathroom is already clean). Similar to the physical world, there are logistical limits on how many people can use the memory (pin board) efficiently and on how big it can be. Therefore OpenMP is usually employed for the different processes within one node (computer - corresponds to your house in the example) and MPI to communicate across nodes (similar to talking to the neighbours to see how far along their house-cleaning is). Both can be used simultaneously.
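At the command line (or inside a jobscript), running such a hybrid program might look like the following sketch; the program name and the numbers are made up, and the exact MPI launcher (mpirun, mpiexec, srun, ...) depends on your system:

    # Run 4 MPI processes, each of which spawns 8 OpenMP threads.
    export OMP_NUM_THREADS=8           # threads per process, read by OpenMP
    mpirun -np 4 ./my_hybrid_program   # -np sets the number of MPI processes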