Git Tutorial: Creating and Changing Repositories




Tutorial
Title: Git Tutorials
Provider: HPC.NRW

Contact: tutorials@hpc.nrw
Type: Online
Topic Area: Revision control
License: CC-BY-SA
Syllabus

1. Basic Git overview
2. Creating and Changing Repositories
3. Branching


Most users are mainly interested in revision control for their own code. We therefore start with the basics, which allow you to put your code under revision control in a few simple steps and to add later changes just as easily. All of this happens locally on your hard drive; using GitHub follows later. First, you need to create a repository for your code. To fully grasp what a repository is, you can read the corresponding Wikipedia page, but for now it is sufficient to think of it as a box into which you put everything that you want to have under revision control. In addition, the box contains a list that tracks everything put into the box, everything taken out of it, and any changes made to the objects inside.

This tutorial will consist of two blocks. The first one focuses on your local repository and the following contents (we will stick with the box analogy for now):

  1. Initializing your repository: You create the box
  2. Adding files to the repository: You note on the list what goes into the box and which changes to apply
  3. Committing your changes to the repository: You put the objects into the box and apply the changes exactly as noted on the list
  4. Excluding files from your repository: You decide what should never be noted for the box

The second block will teach you how to be ready for collaboration by using a remote repository - a box used by everyone. Concepts here are similar to those for the local repository, and the topics covered are:

  1. Pushing your latest commits to the remote repository: Changes applied to your local box are applied to the remote box, too
  2. Pulling the latest version from the remote repository: Changes from the remote box are applied to your local box

Keep in mind that push and pull often require additional merge work once several people contribute, which will be covered in a later tutorial. However, if you stick with this tutorial, you should be fine.


Many of Git's command line interactions can be easily reproduced through GitHub's GUI. In addition to working through this tutorial, we strongly recommend that you take a look at the interactive GitHub tutorials suggested in the Useful Links section. They were chosen to closely match this Wiki tutorial.


Creating a new Repository

Before we start, please make sure that you fulfill the following conditions:

  • You have some kind of Linux distribution installed
  • You have installed Git
  • You have a folder (we will call it MyFolder), that you want to put under revision control
  • You have set your identity and email through the two commands
user@HPC.NRW:~$ git config --global user.name "Your Name Comes Here"
user@HPC.NRW:~$ git config --global user.email you@yourdomain.example.com

Once those conditions are met, go into MyFolder and initialize the repository by entering

1user@HPC.NRW:~$ cd MyFolder
2user@HPC.NRW:~/MyFolder$
3user@HPC.NRW:~/MyFolder$ git init
4Initialized empty Git repository in /home/MyFolder/.git/

Your repository has been initialized (line 4). Your box has been created, but it is still empty. You need to note which content you want to put into the box/repository by using the git add command (this is called staging). You can note the whole directory with

user@HPC.NRW:~/MyFolder$ git add .

If you want to note the addition of one or more particular files, you can do this through

user@HPC.NRW:~/MyFolder$ git add someFiles* andMore

which in this case would note to add the file andMore and any files beginning with someFiles into the repository (e.g., someFiles2 or someFilesWithMoreTextAndNumb3rsAtTheEnd).
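To see what has actually been noted for the next commit, git status --short can be used. The following is a minimal sketch in a scratch directory; the file notStaged is a hypothetical extra file that is deliberately left out, and the other file names follow the example above.

```shell
set -eu

# Work in a scratch directory so that no real project is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Hypothetical files matching the names used in the text.
touch someFiles1 someFiles2 andMore notStaged

# The shell expands someFiles* to the matching file names before git runs.
git add someFiles* andMore

# In the short format, "A " marks newly staged files and "??" untracked ones.
status="$(git status --short)"
echo "$status"
```

Files listed with ?? are known to exist but have not been noted for the box yet.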

Once you are sure what goes in, it is time to put it in the box and apply the changes. This is done through the git commit command. It has to be accompanied by a message, which describes the changes you have made to the repository. Using the imperative in commit messages is a widely suggested convention. The message can be directly written with the commit through the option -m:

user@HPC.NRW:~/MyFolder$ git commit -m "Create repository/box and fill it with some initial files"

This command will be followed by information about your commit, such as the number of files and changes it contains. At this point, it will also inform you that you have committed to your master branch (or main, depending on your Git configuration). Branches will be covered later, so you do not need to worry about them.
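The whole sequence of init, add and commit can be tried out safely in a scratch directory. This is only a sketch: the identity is set for the scratch repository alone (the values are the placeholders from above), and the file content is made up for illustration.

```shell
set -eu

# Scratch repository, independent of MyFolder.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Repository-local identity (placeholder values); --global is not needed here.
git config user.name "Your Name Comes Here"
git config user.email you@yourdomain.example.com

# Create one file, note it, and commit it.
echo "hello" > andMore
git add .
git commit -q -m "Create repository/box and fill it with some initial files"

# Branch the commit landed on (master or main, depending on configuration).
branch="$(git symbolic-ref --short HEAD)"
# Number of commits reachable from the current state.
count="$(git rev-list --count HEAD)"
echo "branch: $branch, commits: $count"
```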

Modifying your Repository

You actually already know everything you need to apply further changes to your repository/box. Let's assume you want to add a new file newFile to the repository and that you have also made some adjustments to the file andMore, which is already inside the box. As the box is already created, you only need to note what you want to add and change:

user@HPC.NRW:~/MyFolder$ git add andMore newFile

Of course, you can also simply use git add . to note all the changes that have been made so far. Afterwards, put everything in the box and apply the changes through git commit -m "My commit message". If you have not added any new files, but only made changes to existing ones, then you can type

user@HPC.NRW:~/MyFolder$ git commit -a -m "My commit message with no new files added"

This will automatically perform the git add command on files which have changed since your last commit (again: No new files will be added this way!).
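The difference between changed and new files can be checked with a small experiment. The sketch below assumes a scratch repository with placeholder identity values; andMore and newFile are the file names from the text.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Your Name Comes Here"
git config user.email you@yourdomain.example.com

# First commit: andMore becomes a tracked file.
echo "v1" > andMore
git add andMore
git commit -q -m "Add andMore"

# Change the tracked file and create a new, untracked one.
echo "v2" >> andMore
touch newFile

# -a stages changes to tracked files only; newFile is not picked up.
git commit -q -a -m "Update andMore"

tracked="$(git ls-files)"
leftover="$(git status --short)"
echo "tracked: $tracked"
echo "left over: $leftover"
```

After the second commit, newFile still shows up as untracked and needs an explicit git add.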

Excluding files from the Repository

Usually you want to keep your repository tidy and free of unnecessary file bloat. Bloat can occur, for example, when you compile your repository's code and keep the binaries inside: entering git add . would add the binaries to the repository, which is usually not desirable because they increase the repository size. The easiest way to control what should not go into your repository is the .gitignore file. Simply create the file in your repository and list inside it what you want to exclude from revision control. If you do not want to track any .txt files and the logo.jpg file in particular, you could do the following:

user@HPC.NRW:~/MyFolder$ touch .gitignore
user@HPC.NRW:~/MyFolder$ echo -e "*.txt\nlogo.jpg" > .gitignore

Naturally, you can exclude anything, but below you find a .gitignore example which contains a comprehensive list of file types that can usually be ignored.

Example of a comprehensive .gitignore file

# SLURM output #
################
*.out
*.err

# Compiled source #
###################
*.com
*.class
*.dll
*.exe
*.o
*.so

# Packages #
############
*.7z
*.dmg
*.gz
*.iso
*.jar
*.rar
*.tar
*.zip

# Logs and databases #
######################
*.log
*.sql
*.sqlite

# OS generated files #
######################
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
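Whether an ignore rule behaves as intended can be checked with git check-ignore. The following sketch uses the two rules from the smaller example (*.txt and logo.jpg) in a scratch repository; notes.txt and main.c are hypothetical file names chosen for illustration.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q

# The two ignore rules from the text, one per line.
printf '%s\n' '*.txt' 'logo.jpg' > .gitignore

# Hypothetical files: two that should be ignored, one that should not.
touch notes.txt logo.jpg main.c

# -v prints the rule that matches each ignored path.
git check-ignore -v notes.txt logo.jpg

# check-ignore exits with status 1 for paths that are not ignored.
if git check-ignore -q main.c; then
    echo "main.c is ignored"
else
    echo "main.c is not ignored"
fi

# "git add ." now skips the ignored files.
git add .
staged="$(git ls-files)"
echo "$staged"
```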

Going back to a previous Version

One advantage of revision control is the possibility of easily going back to an older version of your repository. Because every git commit records the changes made to the repository's contents, Git can use these records to switch back to a previous state/version. In order to see all existing versions, use the git log command:

 1user@HPC.NRW:~/MyFolder$ git log
 2commit aab2d012c5a5965d14c440a6727191c19625e6e3 (HEAD -> master)
 3Author: user <tutorials@hpc.nrw>
 4Date:   Thu Jul 5 14:15:09 2000 +0200
 5
 6    add .gitignore file
 7
 8commit 30763897a2fc05b13f3ecb3197a9243b4a7941d8
 9Author: user <tutorials@hpc.nrw>
10Date:   Wed Jul 1 03:43:57 2000 +0200
11
12    Create repository/box and fill it with some initial files

This yields information about the authors who made changes to the repository, the dates, the related commit messages (lines 6 and 12) and the commit names/references (lines 2 and 8, the string after commit). These automatically generated references are the SHA-1 hashes of the commits. You can use git show <commit reference> for detailed information on a particular commit. The long reference is not actually required; the shortened version you get with git log --oneline can be used instead. It is also worth inspecting the different options you can pass to git log for extended or filtered output.
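A few of these commands can be tried on a scratch repository with two commits. This is a sketch only: the commit messages mirror the log shown above, while the file contents and identity values are placeholders.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Your Name Comes Here"
git config user.email you@yourdomain.example.com

# Two commits, mirroring the history shown in the log above.
echo "data" > andMore
git add andMore
git commit -q -m "Create repository/box and fill it with some initial files"
printf '%s\n' '*.txt' 'logo.jpg' > .gitignore
git add .gitignore
git commit -q -m "add .gitignore file"

# One line per commit: shortened reference followed by the message.
git log --oneline

# Shortened reference of the latest commit, usable wherever a reference is expected.
short="$(git rev-parse --short HEAD)"

# Details of that commit, including a summary of the files it touched.
git show --stat "$short"

latest="$(git log -1 --format=%s)"
total="$(git rev-list --count HEAD)"
```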

Once you know which version you want to go back to, use the git checkout <commit reference> . command (with . at the end):

user@HPC.NRW:~/MyFolder$ git checkout 3076389 .
Updated 1 path from 83ca202

Please keep in mind that you should attach an appropriate message to your next commit, so that everybody is aware that a checkout reverted files to an older version. In general, such reverts should be done with caution, even more so when cooperating on a project. It is preferable to use branches in such cases, a topic touched upon later.
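The effect of such a checkout, followed by a commit that documents the revert, can be sketched as follows. The scratch repository, file contents and commit messages are made up for illustration.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Your Name Comes Here"
git config user.email you@yourdomain.example.com

# Two versions of the same file.
echo "first version" > andMore
git add andMore
git commit -q -m "Add andMore"
old="$(git rev-parse --short HEAD)"

echo "second version" > andMore
git commit -q -a -m "Change andMore"

# Bring the files of the older commit back into the working directory.
# The current branch and the existing history stay untouched.
git checkout "$old" .

content="$(cat andMore)"
echo "andMore now contains: $content"

# Record the revert as a new commit with an explanatory message.
git commit -q -m "Restore andMore from $old"
count="$(git rev-list --count HEAD)"
```

The history now has three commits; the older state was restored by adding a new commit, not by deleting one.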

Using a remote Repository

Although Git is useful for keeping track of your own projects, it becomes most valuable when collaborating with others on the same code. In such cases, it is useful to set up a remote repository to which everyone has access and can contribute their changes. This tutorial will focus on how to set up your remote repository with the help of GitHub. At the end, we will also show you how to clone a remote repository from GitHub and start contributing to an existing project. Please note that the Git remote repository can be located anywhere, e.g., in your facility's own network, but for simplicity's sake we stick with GitHub only. Before we start, please make sure that you have

  • registered a GitHub account
  • an existing Git repository to upload to GitHub (you can use your local repository from the previous section)

Now, log into your account and do the following preparations:

  1. Start creating a new GitHub repository
  2. Give the repository an appropriate name; we choose "Wikipages"
  3. Set yourself as the owner
  4. Add a short description
  5. Set the repository to private
  6. Do not add a .gitignore or README file, as you are going to use your existing local repository, which should at least have a .gitignore file
  7. Finish creating the repository

Afterwards, you associate your local repository with your remote one and push your latest commit to GitHub's repository:

 1user@HPC.NRW:~/MyFolder$ git remote add origin https://github.com/HPC.NRW-User/Wikipages.git
 2user@HPC.NRW:~/MyFolder$ git branch -M main
 3user@HPC.NRW:~/MyFolder$ git push -u origin main
 4Username for 'https://github.com': HPC.NRW-User
 5Password for 'https://HPC.NRW-User@github.com': ********
 6Enumerating objects: 5, done.
 7Counting objects: 100% (5/5), done.
 8Delta compression using up to 12 threads
 9Compressing objects: 100% (3/3), done.
10Writing objects: 100% (3/3), 341 bytes | 341.00 KiB/s, done.
11Total 3 (delta 2), reused 0 (delta 0)
12remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
13To https://github.com/HPC.NRW-User/Wikipages.git
14   75895d2..b4e0116  main -> main

The origin in line 1 is an arbitrary name, chosen by convention, for your remote repository. In theory, you could have several remote repositories and always choose which one to use. For now, it is sufficient to stick with the convention and use only one remote repository named origin. Line 2 renames your current branch to main (the -M option forces the rename). Branches will be covered in the next tutorial, but it is beneficial to start using them early on, in particular when collaborating on a project. Line 3 is the actual push: afterwards, the commits in your local repository are also present in the remote one, and the -u option remembers origin/main as the default target for later pushes and pulls. When asked for a password (line 5), note that GitHub no longer accepts your account password for Git operations over HTTPS; enter a personal access token instead. Do not forget to commit your changes before pushing, because only committed changes are transferred. However, if you were following this tutorial step by step, no commit was necessary here, because no changes have been made since the last commit. GitHub provides an interactive tutorial for moving your local repository to GitHub. You can check it out in the Useful Links section.
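The same three commands can be practiced without a GitHub account by letting a local bare repository play the role of the remote. This is an assumption made purely for experimenting; the directory names below are stand-ins, and the path takes the place of the GitHub URL.

```shell
set -eu

work="$(mktemp -d)"

# A bare repository (no working files) acts as the stand-in remote.
remote="$work/Wikipages.git"
git init -q --bare "$remote"

# Local repository with one commit.
mkdir "$work/MyFolder"
cd "$work/MyFolder"
git init -q
git config user.name "Your Name Comes Here"
git config user.email you@yourdomain.example.com
echo "content" > andMore
git add .
git commit -q -m "Create repository/box and fill it with some initial files"

# The same three steps as above, with a local path in place of the URL.
git remote add origin "$remote"
git branch -M main
git push -q -u origin main

# After the push, both repositories point at the same commit.
local_head="$(git rev-parse HEAD)"
remote_head="$(git --git-dir="$remote" rev-parse refs/heads/main)"
echo "local:  $local_head"
echo "remote: $remote_head"
```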


The next step is to learn how you can join an existing remote repository. Usually, you have to make sure that others (or you yourself) are allowed to contribute to a remote repository and to download from it. In our case, you are the only user and therefore no further settings are required. For practice, we will continue using the current remote repository. The first thing you often want to do is to clone an existing repository, which basically means that you create an empty box and put into it exactly what is in the remote repository box:


user@HPC.NRW:~/MyFolder$ mkdir ~/myClonedFolder
user@HPC.NRW:~/MyFolder$ cd ~/myClonedFolder
user@HPC.NRW:~/myClonedFolder$ git clone https://github.com/HPC.NRW-User/Wikipages.git .
Cloning into '.'...
Username for 'https://github.com': HPC.NRW-User
Password for 'https://HPC.NRW-User@github.com': ********
remote: Enumerating objects: 44, done.
remote: Counting objects: 100% (44/44), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 44 (delta 23), reused 42 (delta 21), pack-reused 0
Unpacking objects: 100% (44/44), done.

The . at the end of the clone command places the repository directly in ~/myClonedFolder. git clone initializes the new repository itself, so no git init is needed beforehand.

Figure 1: Visualization of the workflow with different git operations acting on a remote and a local repository. Different colors represent different/modified content. As can be seen, it is suggested that a push is only performed after sufficient changes have been implemented in the local repository. However, for the local repository it is better to commit often.

That’s it. You have successfully cloned a remote repository and can start working on it. Figure 1 visualizes how clone, commit and push operations affect local and remote repositories.
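Cloning can also be tried offline, since git clone accepts a local path in place of a URL. In the sketch below, the directory source is a hypothetical stand-in for the GitHub repository, and the target directory is passed as a second argument so that no separate mkdir and cd are needed.

```shell
set -eu

work="$(mktemp -d)"

# Stand-in for the remote repository: a local repository with one commit.
git init -q "$work/source"
git -C "$work/source" config user.name "Your Name Comes Here"
git -C "$work/source" config user.email you@yourdomain.example.com
echo "content" > "$work/source/andMore"
git -C "$work/source" add andMore
git -C "$work/source" commit -q -m "Add andMore"

# Clone into a named target directory; git creates it if necessary.
git clone -q "$work/source" "$work/myClonedFolder"
cd "$work/myClonedFolder"

# The clone remembers where it came from under the name origin.
git remote -v

source_head="$(git -C "$work/source" rev-parse HEAD)"
cloned_head="$(git rev-parse HEAD)"
```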


The last thing you will learn in this tutorial is how to get pushed changes from a remote repository. The prerequisite is that you already have a local version of a remote repository (through either git clone or git remote add). If you have been following the tutorial up to this point, this is the case. What we will do now is:

  1. Go to the ~/myClonedFolder directory
  2. Change a file in your cloned repository in ~/myClonedFolder and add a new file clonedRepoFile to it
  3. Add (or note) the changes you want to make
  4. Commit the changes
  5. Push the last commit to the remote repository
  6. Go back to the initial local repository in ~/MyFolder
  7. Get the recently pushed changes from the remote repository into the local repository in ~/MyFolder

The above points correspond to the following lines:

user@HPC.NRW:~/MyFolder$ cd ~/myClonedFolder
user@HPC.NRW:~/myClonedFolder$ echo "a small change" >> someChangedFile
user@HPC.NRW:~/myClonedFolder$ touch clonedRepoFile
user@HPC.NRW:~/myClonedFolder$ git add clonedRepoFile someChangedFile
user@HPC.NRW:~/myClonedFolder$ git commit -m "add file clonedRepoFile and make some small changes"
user@HPC.NRW:~/myClonedFolder$ git push origin main
user@HPC.NRW:~/myClonedFolder$ cd ~/MyFolder
user@HPC.NRW:~/MyFolder$ git pull origin main

Afterwards, the contents of your folders MyFolder and myClonedFolder should be identical. pull is most commonly used to stay up-to-date with changes other contributors have pushed, not to go back to an older version. Going back to an older version this way will more often than not lead to conflicts with other versions, which need to be painstakingly resolved and which will propagate to future versions for your colleagues once you have committed and pushed your changes.
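The complete round trip can be rehearsed locally. The sketch below assumes a bare repository as a stand-in for GitHub and two working copies named after the folders in this tutorial; all paths live in a scratch directory and the file contents are made up.

```shell
set -eu

work="$(mktemp -d)"
remote="$work/Wikipages.git"
git init -q --bare "$remote"

# First working copy: create a commit and push it.
git init -q "$work/MyFolder"
cd "$work/MyFolder"
git config user.name "Your Name Comes Here"
git config user.email you@yourdomain.example.com
echo "initial" > someChangedFile
git add .
git commit -q -m "Add someChangedFile"
git branch -M main
git remote add origin "$remote"
git push -q -u origin main

# Let the stand-in remote use main as its default branch for clones.
git --git-dir="$remote" symbolic-ref HEAD refs/heads/main

# Second working copy: clone, change, commit, push.
git clone -q "$remote" "$work/myClonedFolder"
cd "$work/myClonedFolder"
git config user.name "Your Name Comes Here"
git config user.email you@yourdomain.example.com
echo "a small change" >> someChangedFile
touch clonedRepoFile
git add clonedRepoFile someChangedFile
git commit -q -m "Add clonedRepoFile and make some small changes"
git push -q origin main

# Back in the first working copy: fetch and integrate the pushed commit.
cd "$work/MyFolder"
git pull -q origin main

first="$(git -C "$work/MyFolder" rev-parse HEAD)"
second="$(git -C "$work/myClonedFolder" rev-parse HEAD)"
echo "MyFolder:       $first"
echo "myClonedFolder: $second"
```

Both working copies end up on the same commit, which is what identical folder contents means in Git terms.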

Best Practices

  • This cannot be overstated: WORK THROUGH THE NEXT TUTORIAL ON BRANCHES
  • Do not push every commit to the remote repository. Commit regularly to your local repository, but only push once a significant amount of changes has been made.
  • checkout and pull are not meant for jumping back and forth between different versions of code. checkout should be used sparingly, while pull is intended to bring your local repository up-to-date with others' contributions to the remote repository.
  • commit messages should be written in the imperative form ("add" instead of "added" or "Write new method for..." instead of "Wrote new method for..."). However, this is just a convention.
  • commit messages should convey the functional idea of the commit. A detailed description of new methods should therefore not be part of the message, because that is what the code itself is for. In general, the messages should be short (less than 50 characters).


Glossary

  • add: Notes (stages) what should be adjusted in the local repository with your next commit.
  • commit: Records the changes staged with add as a new version in the current branch of your local repository.
  • push: Takes your most recent commits and asks for your changes to be integrated into the most recent remote repository version of the desired branch. Usually, when collaborating on a project, this does not mean that the remote version will be identical to yours afterwards, because others have been pushing their changes, too.
  • checkout: Adjusts files in your current directory to a different version (or a different branch).
  • pull: Fetches the latest version of a branch from a remote repository and merges it into your current local branch. Usually requires some merge adjustments (which have not been covered yet).


Useful Links