File Transfer

From HPC Wiki

Latest revision as of 08:55, 7 September 2020


There are usually several ways to get your data (files) onto the supercomputer or back to your local machine. Some systems have computers reserved specifically for this purpose, called copy nodes.

If copy nodes are available to you, it is recommended to use them for moving data to or from the supercomputer, since this usually gives you a better connection and disturbs other users less. Additionally, the tools mentioned below might only work on these nodes. If there are no dedicated copy nodes, you can usually use the login nodes for this purpose.



Secure Copy (scp)

This is generally the easiest way to transfer single files. It builds on ssh and usually works on every machine that you can connect to via ssh. It can be used as follows:

$ scp your_username@remotehost.edu:foobar.txt /some/local/directory  

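This copies the file foobar.txt from the remote machine into /some/local/directory on your local machine. A remote path without a leading slash, like foobar.txt here, is interpreted relative to your home directory on the remote machine. Turning this around also works: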

$ scp foobar.txt your_username@remotehost.edu:/some/remote/directory 

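which copies the local file foobar.txt into /some/remote/directory on the remote machine. To copy a whole directory, add the -r option (scp -r). More examples of scp usage can be found at http://www.hypexr.org/linux_scp_help.php.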

Beware: scp and sftp can be confused by output on STDOUT/STDERR generated during your normal login session!
The most frequent example: you want a certain set of modules loaded automatically at each login, so you have added "module load ..." commands to your $HOME/.bashrc. These commands in turn print messages such as "loading module ..." that appear on your terminal while logging in. This is fine for interactive logins, but scp, sftp, or graphical clients like FileZilla might then fail for no apparent reason.
Solution: place such commands only after a check that the shell is in fact interactive, for example:
 ########################
 # do NOT generate output unless we're an interactive shell:
 ########################
 [ -z "$PS1" ] && return
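The primary prompt string (PS1) is only set in interactive shells. If it is empty, the shell is non-interactive, and return stops reading the rest of .bashrc: all commands below the check, including your module load lines, are skipped, and the calling program (e.g. scp) continues without any extra output.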
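To see the effect of the PS1 check without editing your real $HOME/.bashrc, the following sketch writes the check into a temporary file and sources it twice: once in a non-interactive shell, similar to the one scp or sftp start on the remote side, and once in an interactive shell started with bash -i. The echo line is a made-up stand-in for the output of module load, and the sketch assumes bash and env are available on your local machine.

```shell
#!/usr/bin/env bash
# Sketch: the PS1 guard keeps a startup file silent in non-interactive shells.
# Uses a throwaway file instead of your real .bashrc.
rc=$(mktemp)
trap 'rm -f "$rc"' EXIT

cat > "$rc" <<'EOF'
[ -z "$PS1" ] && return
echo "loading module ..."
EOF
# (the echo above is a hypothetical stand-in for "module load ..." output)

# Non-interactive shell (like the one scp/sftp trigger): PS1 is unset,
# so the guard returns before the echo. --norc skips your own rc files.
noninteractive_out=$(env -u PS1 bash --norc -c ". \"$rc\"")

# Interactive shell (-i): bash sets a default PS1, so the echo runs.
# Warnings about the missing terminal go to stderr and are discarded;
# an empty HISTFILE keeps this test shell from writing any history.
interactive_out=$(env -u PS1 HISTFILE= bash --norc -i -c ". \"$rc\"" 2>/dev/null)

printf 'non-interactive: [%s]\n' "$noninteractive_out"
printf 'interactive:     [%s]\n' "$interactive_out"
```

The non-interactive result should be empty, while the interactive one contains the message. On the cluster itself, running ssh your_username@remotehost.edu true is a quick check: if it prints output from your startup files, scp and sftp may fail as described above.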

rsync

rsync is a more sophisticated and often more efficient file transfer tool. It is mainly used to copy many files or whole directory trees between locations. Its basic usage is:

$ rsync [options] [source] [destination]

For example:

$ rsync -azvh -e ssh /my/source/directory/ user@123.123.123.123:/my/destination/directory

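Here rsync runs on top of ssh (-e ssh) to copy the source directory over the network to the remote server. Note the trailing slash on the source: it makes rsync copy the contents of /my/source/directory into the destination, whereas without it a subdirectory named directory would be created inside the destination. The options -azvh are a good starting point: -a (archive mode) copies recursively and preserves time stamps, permissions and symbolic links, -z compresses the data during transfer, -v lists the transferred files, and -h prints sizes in a human-readable format. Another common option is -r, which recursively synchronizes all subdirectories; it is already included in -a.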

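Because rsync first compares source and destination (by default by file size and modification time) and then transfers only what has changed, it is very efficient at mirroring many files or whole directory trees. For remote transfers, only the changed parts of modified files (delta) are sent over the network, and an interrupted transfer can simply be restarted with the same command, skipping files that were already copied completely.
Further documentation can be found in a basic rsync guide [1] and, in more depth, in the German-language ubuntuusers wiki [2].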

File Transfer Protocol (ftp)

The File Transfer Protocol (ftp) is a network protocol for exchanging files with a server. It is usually used through a separate program such as FileZilla or, on Windows, WinSCP. If available, you can simply connect to the copy nodes this way. Such a program with an intuitive graphical user interface can be more flexible and easier to use, especially for beginners. On HPC systems these clients typically connect via SFTP, the ssh-based file transfer protocol, rather than plain ftp; this is why they are affected by login output in the same way as scp and sftp.

Using FileZilla, for example, looks like this:

[Image: Filezilla.png, screenshot of the FileZilla client]

Paste the address of the respective copy node into the 'server' field and authenticate with your username and password. The files on the server are then shown on the right and those on your local machine on the left.

Further Material

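Basic examples of scp usage: http://www.hypexr.org/linux_scp_help.php

FileZilla, a free FTP and SFTP client: https://filezilla-project.org/

WinSCP, a free SFTP, SCP and FTP client for Windows: https://winscp.net/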

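References

[1] Basic Rsync guide: http://www.createdbypete.com/articles/a-practical-guide-to-using-rsync
[2] Ubuntu Users Guide to Rsync (German, in depth): https://wiki.ubuntuusers.de/rsync/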