File Transfer

From HPC Wiki

There are usually several ways to get your data (files) onto the supercomputer or back to your local machine. Some sites reserve dedicated computers for this purpose, called copy nodes.

If copy nodes are available to you, it is recommended to use them to move data to or from the supercomputer, since this results in a better connection and disturbs other users less. Additionally, the tools mentioned below might only work on these nodes. If there are no dedicated copy nodes, you can usually use the Login Nodes for this purpose.



Secure Copy (scp)

This is generally the easiest way to transfer single files. It builds on ssh and usually works on every machine that you can connect to via ssh. It can be used as follows:

$ scp your_username@remotehost.edu:foobar.txt /some/local/directory  

This copies the file foobar.txt from your home directory on the remote machine into the local directory /some/local/directory. Turning this around also works:

$ scp foobar.txt your_username@remotehost.edu:/some/remote/directory 

This copies the local file to the remote machine. More examples of scp usage are listed in the references.
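
The two directions shown above extend to whole directories with the -r option. The following is a minimal sketch, not a site-specific recipe: the host name is the placeholder from the examples above, the directory names are hypothetical, and the small run helper is only an illustration that prints each command unless RUN=1 is set, so the sketch can be tried without a reachable server.

```shell
# Placeholder account and host, taken from the examples above.
remote="your_username@remotehost.edu"

# Illustration helper: print the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Upload a local directory (hypothetical name "results") with everything in it.
# -r copies recursively, -p preserves modification times and file modes.
run scp -r -p results "$remote:/some/remote/directory"

# Download a remote directory (hypothetical name "output") the same way.
run scp -r -p "$remote:/some/remote/directory/output" /some/local/directory
```

Without RUN=1 the two commands are only printed, which is a convenient way to check paths before starting a long transfer.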

Beware: scp or sftp might be confused by STDOUT/STDERR output generated during your normal login session!
Most frequent example: you want a certain set of modules loaded automatically at each login, so you have edited your $HOME/.bashrc to contain "module load ..." commands. These in turn write output to STDOUT, such as "loading module ..." appearing on your terminal while logging in. This is fine for interactive logins, but scp, sftp, or clients like Filezilla might fail for no apparent reason.
Solution: put such commands only after a check that the shell is in fact an interactive one, like:
 ########################
 # do NOT generate output unless we're an interactive shell:
 ########################
 [ -z "$PS1" ] && return
The prompt string #1 (PS1) is only set in interactive shells. If it is empty, return stops reading the file at this point and hands control back to the caller, so none of the commands below the check are executed.
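
The effect of such a check can be tried out without touching your real $HOME/.bashrc. The sketch below is an illustration under a few assumptions: it writes a throwaway startup file to a temporary location, uses the equivalent test on $- (bash includes the letter i there for interactive shells) instead of PS1, and the echo line merely stands in for "module load ..." output.

```shell
# Write a sample startup file to a temporary location (stand-in for ~/.bashrc).
rc_file="$(mktemp)"
cat > "$rc_file" <<'EOF'
# Leave the file early unless the shell is interactive:
# bash puts the letter "i" into $- for interactive shells.
case $- in
    *i*) ;;
    *) return ;;
esac
echo "loading module ..."
EOF

# Source the file from a non-interactive bash and capture anything it prints.
output="$(bash -c '. "$1"' _ "$rc_file")"

if [ -z "$output" ]; then
    echo "non-interactive shell: startup file stayed silent"
else
    echo "unexpected output: $output"
fi

rm -f "$rc_file"
```

If the captured output is empty, the startup file is safe for non-interactive sessions of the kind scp and sftp open.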

rsync

A slightly more sophisticated command is rsync. It is mainly used to copy multiple files or to mirror whole directories across different locations. Its basic usage works like this:

$ rsync [options] [source] [destination]

or, in a concrete example:

$ rsync -azvh -e ssh /my/source/directory/ user@123.123.123.123:/my/destination/directory

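where rsync runs on top of ssh to copy the contents of the source directory to the remote server over the network. The options -azvh are a good starting point: -a (archive mode) recurses into subdirectories and preserves time stamps, permissions and symbolic links, -z compresses the data during transfer, -v lists the transferred files, and -h prints sizes in human readable form. Since -a already implies the recursive option -r, -r only needs to be given explicitly when -a is not used. Note the trailing slash on the source: with it, the contents of the directory are copied; without it, the directory itself is created inside the destination.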

Because rsync compares the source and destination directories (by default using file size and modification time) and transfers only the files that differ, it is quite efficient when mirroring many files repeatedly. Further documentation can be found in the basic and the more in-depth rsync guides listed in the references.
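
That only differing files are transferred can be observed locally, since rsync treats local and remote paths alike. The following sketch assumes rsync is installed and uses two throwaway directories created with mktemp; the file names are hypothetical. The option -n (--dry-run) previews a run without copying anything, which is also useful before a large transfer to a remote machine.

```shell
# Scratch directories standing in for the source and the destination.
src="$(mktemp -d)"
dst="$(mktemp -d)"
printf 'first\n'  > "$src/a.txt"
printf 'second\n' > "$src/b.txt"

if command -v rsync >/dev/null 2>&1; then
    # Initial mirror: the trailing slash on the source copies its contents.
    rsync -a "$src/" "$dst/"

    # Change one file, then preview the next run; -n copies nothing.
    printf 'first, edited\n' > "$src/a.txt"
    rsync -avhn "$src/" "$dst/"

    # Run for real; b.txt is unchanged and is skipped.
    rsync -a "$src/" "$dst/"
else
    echo "rsync is not installed on this machine"
fi

# When done, remove the scratch directories with: rm -rf "$src" "$dst"
```

For a remote destination, the same options apply with a target of the form user@host:/path, as in the concrete example above.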

File Transfer Protocol (ftp)

The File Transfer Protocol (ftp) is a network protocol used to exchange files with a server. Usually a separate program like Filezilla or WinSCP (for Windows) is employed to use ftp. If available, one can simply connect to the copy nodes using this method. A separate program with an intuitive graphical user interface can be more flexible and easier to use for beginners.

Using Filezilla, for example, looks like this:

Filezilla.png

Enter the address of the respective copy node in the 'server' field and authenticate with your username and password. The server then appears on the right and your local machine on the left.

References

Basic examples of scp usage

Basic rsync guide

More in-depth rsync guide

Filezilla, a free FTP client

WinSCP, a free Windows FTP client