Difference between revisions of "Admin Guide Data Transfer with Globus Online"

From HPC Wiki
Admin Guide Data Transfer with Globus Online
(Author: Marcel Rodekamp)
 
[[Category:HPC-Admin|Data Transfer with Globus Online]]
[[Category:HPC.NRW-Best-Practices|Data Transfer with Globus Online]]
  
 

Revision as of 16:15, 2 November 2020


This article shows how to handle Globus as a tool to transfer large amounts of data between servers. It is described here with the example of the Bielefeld cluster system.

GridFTP (Globus Toolkit)

Globus transfers large amounts of data from server to server without requiring you to stay logged in and watch the transfer. To use Globus, create an account at https://globus.org and connect the Bielefeld cluster's endpoint to it. To find the Bielefeld cluster, search for

phyadmin#influx1

and authenticate with your local Bielefeld username and password.

After setting up the Bielefeld cluster and another server as endpoints, you can start a transfer between them either via the web interface or on the command line:

globus transfer [OPTIONS] SOURCE_ENDPOINT_ID[:SOURCE_PATH] DEST_ENDPOINT_ID[:DEST_PATH]
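As a minimal sketch of how the syntax above is filled in, the following builds and prints a recursive directory transfer without submitting it. The endpoint IDs, the paths, the label, and the helper `build_transfer` are placeholders introduced for illustration only. The real endpoint IDs can be looked up in the web interface or, assuming the CLI is installed and logged in as described further down, with `globus endpoint search 'phyadmin#influx1'`.

```shell
# Placeholder endpoint IDs (hypothetical); replace with the IDs of your endpoints.
SRC_EP="aaaaaaaa-0000-0000-0000-000000000001"
DST_EP="bbbbbbbb-0000-0000-0000-000000000002"

# Hypothetical helper: print the transfer command for one directory.
#   $1 = path on the source endpoint, $2 = path on the destination endpoint
# Note: paths containing spaces would need extra quoting in the printed command.
build_transfer() {
    printf 'globus transfer --recursive --label %s %s:%s %s:%s\n' \
        "bielefeld-to-remote" "$SRC_EP" "$1" "$DST_EP" "$2"
}

# Dry run: only print the command so it can be checked before it is submitted.
TRANSFER_CMD=$(build_transfer /home/user/data/ /scratch/user/data/)
echo "$TRANSFER_CMD"
```

Once the printed command looks right, run it as-is. The CLI reports a task ID on submission, and `globus task show <task-id>` can then be used to check the progress of the transfer.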

You can also connect your local machine to Globus. To do so, download and extract Globus Connect Personal (globusconnectpersonal):

wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
tar -xzf globusconnectpersonal-latest.tgz

and install the Globus CLI (command line interface, requires Python):

pip install --upgrade --user globus-cli

The CLI must be linked to your account before most commands can be used, so log in with

globus login

and follow the given instructions. Using the CLI you can then create a local endpoint

$ globus endpoint create --personal my-linux-laptop
Message:     Endpoint created successfully
Endpoint ID: <endpoint-id>
Setup Key:   <setup-key>

and add it to your account in the web interface using the <endpoint-id>. Then set up and start your new endpoint with

./globusconnectpersonal -setup <setup-key>
./globusconnectpersonal -start &

Globus example: transfer files between Bielefeld and Jülich

Although judac is already a GridFTP server, it is easiest to connect it to Globus via a personal endpoint if you do not have the required Grid certificates.

To do so, download globusconnectpersonal to judac just as you did above for your local machine. You can then create an endpoint via the Globus web interface (recommended) or via the Globus CLI (discouraged); for the latter you would first need to install pip with

wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py --user
GLOBUS_CLI_INSTALL_DIR="$(python -c 'import site; print(site.USER_BASE)')/bin"
echo "GLOBUS_CLI_INSTALL_DIR=$GLOBUS_CLI_INSTALL_DIR"
export PATH="$GLOBUS_CLI_INSTALL_DIR:$PATH"
echo 'export PATH="'"$GLOBUS_CLI_INSTALL_DIR"':$PATH"' >> "$HOME/.bashrc"
pip install --upgrade --user setuptools
pip install --upgrade --user globus-cli
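The commands above append a new `export PATH` line to `.bashrc` every time they are run. A minimal sketch of a check that only extends `PATH` when the directory is missing is given below. The helper `path_contains` is hypothetical, and `$HOME/.local/bin` is assumed as the install directory (the usual `site.USER_BASE/bin` on Linux); use the value computed above if it differs.

```shell
# Hypothetical helper: succeed if directory $1 is already an entry of PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Assumed install location; adjust if site.USER_BASE points elsewhere.
GLOBUS_CLI_INSTALL_DIR="$HOME/.local/bin"

if path_contains "$GLOBUS_CLI_INSTALL_DIR"; then
    PATH_STATUS="already on PATH"
else
    # Extend PATH for the current shell only; .bashrc is left untouched here.
    export PATH="$GLOBUS_CLI_INSTALL_DIR:$PATH"
    PATH_STATUS="added to PATH"
fi
echo "$GLOBUS_CLI_INSTALL_DIR: $PATH_STATUS"
```

The same test can guard the `echo ... >> "$HOME/.bashrc"` line so that repeated runs do not accumulate duplicate entries.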

You will probably want to connect more directories than just your /home, which is the only one available by default. To do so, edit .globusonline/lta/config-paths in your home directory to list all the directories you want to connect. For example, for Jülich:

~/,0,1
/p/project/chbi18,0,1
/p/scratch/chbi18,0,1
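Each line of this file has three comma-separated fields. As far as the Globus Connect Personal documentation describes them, these are the path, a sharing flag (0 = not shareable) and a read/write flag (1 = writable, 0 = read-only). The sketch below generates the same three entries. It writes to a scratch directory rather than to the live file, and the helper `add_path` is hypothetical.

```shell
# Write to a scratch directory so that a live setup is not modified;
# the real file is ~/.globusonline/lta/config-paths.
CONFIG_DIR=$(mktemp -d)
CONFIG_FILE="$CONFIG_DIR/config-paths"

# Hypothetical helper: append one entry "path,sharing,writable".
add_path() {
    printf '%s,%s,%s\n' "$1" "$2" "$3" >> "$CONFIG_FILE"
}

add_path '~/' 0 1                 # home directory, not shared, read/write
add_path /p/project/chbi18 0 1    # project directory, not shared, read/write
add_path /p/scratch/chbi18 0 1    # scratch directory, not shared, read/write

cat "$CONFIG_FILE"
```

After checking the result, copy it over the real config-paths file. If the endpoint is already running, it most likely has to be stopped and started again before the new paths are picked up.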

You can then set up and start an endpoint just as above and submit a transfer via the web interface.

Further information can be found here: