File Transfer

From HPC Wiki


There are usually several ways to get your data (files) onto the supercomputer or back to your local machine. Some sites reserve dedicated computers for this purpose, called copy nodes.

If copy nodes are available to you, it is recommended to use them to move data to or from the supercomputer, since this results in a better connection and disturbs other users less. Additionally, the tools mentioned below might only work on these nodes. If there are no dedicated copy nodes, you can usually use the login nodes for this purpose.



Secure Copy (scp)

This is generally the easiest way to transfer single files. It builds on ssh and usually works on every machine that you can connect to via ssh. It can be used as follows:

$ scp your_username@remotehost.edu:foobar.txt /some/local/directory  

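This copies the file foobar.txt from the remote machine to your local directory. A remote path without a leading slash is interpreted relative to your home directory on the remote machine. Turning this around also works: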

$ scp foobar.txt your_username@remotehost.edu:/some/remote/directory 

and copies the local file to the remote machine. More examples of scp and its usage are listed under References.
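The same pattern extends to whole directories and to servers that listen on a non-default ssh port. The following is a minimal sketch, not a site-specific recipe: the helper name fetch_dir and all host names and paths are placeholders chosen for illustration. It relies only on the standard scp options -r (recursive), -p (preserve modification times and modes), and -P (port).

```shell
# Hypothetical helper: copy a remote directory tree to a local directory.
# Arguments: <user@host> <remote_dir> <local_dir> [port]
fetch_dir() {
    remote="$1"
    remote_dir="$2"
    local_dir="$3"
    port="${4:-22}"   # fall back to 22, the default ssh port

    # -r copies directories recursively,
    # -p preserves modification times and modes,
    # -P (capital letter) selects the ssh port.
    scp -r -p -P "$port" "$remote:$remote_dir" "$local_dir"
}

# Example call with placeholder names (uncomment to use):
# fetch_dir your_username@remotehost.edu results /some/local/directory
```

Wrapping the call in a function keeps the option set in one place. For large directory trees, rsync (described further down) is usually the better choice.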

Beware: scp or sftp might be confused by output on STDOUT/STDERR generated during your normal login session!
Most frequent example: you want a certain set of modules loaded automatically at each login, so you have edited your $HOME/.bashrc to contain "module load ..." commands. These in turn write to STDOUT, so messages like "loading module ..." appear on your terminal while logging in. This is fine for interactive logins, but scp, sftp, or graphical clients like Filezilla might fail for no apparent reason.
Solution: put such commands after a check that the shell is in fact an interactive one, like:
 ########################
 # do NOT generate output unless we're an interactive shell:
 ########################
 [ -z "$PS1" ] && return
The prompt string #1 (PS1) is only set in an interactive shell. If it is empty, the shell is not interactive, and return stops reading .bashrc at this point, so none of the commands after this line are executed.
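The effect of this guard can be checked locally, without any remote machine. The following sketch writes a hypothetical rc file to a temporary location and sources it twice, once without PS1 (as a non-interactive shell would) and once with PS1 set. The echo line is only a stand-in for "module load ...", which may not exist on your local machine.

```shell
# Write a hypothetical rc file with the guard placed before any command that prints.
rc_file="$(mktemp)"
cat > "$rc_file" <<'EOF'
# Settings that never print anything are safe for every kind of shell.
export EDITOR=vi

# Stop here for non-interactive shells (scp, sftp, rsync over ssh).
[ -z "$PS1" ] && return

# Only interactive shells reach this point, so output is harmless here.
echo "loading module ..."   # stand-in for: module load ...
EOF

# Source the file without PS1, as a non-interactive shell would see it.
quiet_output="$(unset PS1; . "$rc_file")"

# Source it again with a prompt set, mimicking an interactive login.
chatty_output="$(PS1='$ '; . "$rc_file")"

printf 'non-interactive: [%s]\n' "$quiet_output"
printf 'interactive:     [%s]\n' "$chatty_output"
rm -f "$rc_file"
```

The first pair of brackets should be empty, which is what scp and sftp need. Anything that must be available to non-interactive sessions as well, such as exported variables, belongs above the guard.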

rsync

rsync is a more sophisticated and more efficient file transfer mechanism. It is mainly used to copy multiple files or even whole directory trees between different locations. Its basic usage is like this:

$ rsync [options] [source] [destination]

or in a concrete example

$ rsync -azvh -e ssh /my/source/directory/ user@123.123.123.123:/my/destination/directory

where rsync runs on top of ssh to copy the source directory to the remote server over the network. The options -azvh stand for archive mode (-a, which recurses into subdirectories and preserves time stamps, permissions, and symbolic links), compression during the transfer (-z), verbose output (-v), and human-readable numbers (-h), and are thus a good starting point. The option -r, which recursively synchronizes all subdirectories, is already part of -a.
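One detail of the concrete example is easy to miss: the trailing slash on the source directory. With the slash, rsync copies the contents of the directory; without it, rsync copies the directory itself into the destination. The following sketch shows the difference using two local throwaway directories, so it can be tried without a remote server. All directory and file names are made up for illustration.

```shell
# Throwaway directories for a local demonstration (hypothetical names).
work_dir="$(mktemp -d)"
mkdir -p "$work_dir/source/sub" "$work_dir/dest_a" "$work_dir/dest_b"
echo "alpha" > "$work_dir/source/a.txt"
echo "beta"  > "$work_dir/source/sub/b.txt"

# Trailing slash: the contents of source/ end up directly in dest_a/.
rsync -a "$work_dir/source/" "$work_dir/dest_a"

# No trailing slash: the directory itself is copied, creating dest_b/source/.
rsync -a "$work_dir/source" "$work_dir/dest_b"

# List both results to compare the layouts.
(cd "$work_dir" && find dest_a dest_b | sort)
```

The same rule applies when the destination is a remote location of the form user@host:/path.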

Because rsync compares the source and destination beforehand (by default using file size and modification time) and transfers only the necessary differences (delta), it is quite efficient at mirroring multiple files or whole directory trees.
Further documentation can be found in the basic rsync guide and, in more depth, in the in-depth rsync guide, both listed under References.
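To see which files rsync considers out of date before actually transferring anything, the options -n (--dry-run) and -i (--itemize-changes) can be combined. The sketch below is a local illustration under made-up directory and file names: it synchronizes once, adds a single file to the source, and then asks rsync what a second run would transfer.

```shell
# Throwaway source and destination directories (hypothetical names).
work_dir="$(mktemp -d)"
mkdir -p "$work_dir/source" "$work_dir/dest"
echo "alpha" > "$work_dir/source/a.txt"
echo "beta"  > "$work_dir/source/b.txt"

# First run: everything is new, so both files are copied.
rsync -a "$work_dir/source/" "$work_dir/dest"

# Add one file to the source only.
echo "gamma" > "$work_dir/source/c.txt"

# -n (--dry-run) reports what would be transferred without copying anything;
# -i (--itemize-changes) prints one line per item that differs.
pending="$(rsync -a -n -i "$work_dir/source/" "$work_dir/dest")"
echo "$pending"
```

Only c.txt should be reported as a file to transfer, since a.txt and b.txt already match in size and modification time. Dropping -n performs the copy.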

File Transfer Protocol (ftp)

The file transfer protocol (ftp) is a network protocol used to exchange files with a server. Usually a separate program like Filezilla or WinSCP (for Windows) is employed to use ftp. If available, one can simply connect to the copy nodes using this method. A separate program with an intuitive graphical user interface can be more flexible and easier to use for beginners.

Using Filezilla looks, for example, like this:

[Figure: Filezilla.png]

You enter the address of the respective copy node into the 'server' field and authenticate with your username and password. The server then appears on the right and your local machine on the left.

References

Basic Examples of SCP usage

Basic rsync guide

More in-depth rsync guide

Filezilla, a free FTP client

WinSCP, a free Windows FTP client