Working with GitHub

This blog post is intended to be a reference not only for you but for me, because I always forget how it works. “It”, in this case, is Git, or more specifically, GitHub. GitHub is a web site that hosts public Git repositories. It allows anyone in the world to collaborate on any GitHub project without having to worry about repository commit privileges or patch files. If you want to work on a GitHub project, you simply fork the project’s GitHub repository, creating a new GitHub repository in your name. After you make changes to your GitHub repository, you can send pull requests to other GitHub users to notify them of the changes, and they can pull your changes into their own repositories.

While GitHub does make social coding convenient, there is still a significant learning curve. I won’t discuss the process of installing Git or signing up for a GitHub account; you can find documentation for those elsewhere. The most confusing aspect of working with GitHub, in my opinion, is repository management, and that’s what I’ll explain here. My explanation will give you the steps of the GitHub workflow using an example project. I’ve chosen ClickToFlash as my example, because people are familiar with it, and I’ve contributed code to the project.

Mystery surrounds the origin of ClickToFlash. The project first appeared on Google Code, posted by an anonymous donor. Not long thereafter, the project disappeared without a trace. We still don’t know the identity of ClickToFlash’s author. (I suspect Holtzman, Holtzmann, or Holzmann.) Fortunately, several developers including ‘Wolf’ Rentzsch preserved the source code, and Rentzsch’s GitHub repository has become ‘official’. Other GitHub repositories such as my own and Simone Manganelli’s are forked from Rentzsch’s. That’s the place to start.

If you haven’t already forked ClickToFlash, you’ll see a “Fork” button on http://github.com/rentzsch/clicktoflash. When I clicked that button, it created a fork of the project at http://github.com/lapcat/clicktoflash. The fork is my public repository, which GitHub users can pull from. The catch is that I can’t work directly on my public repository, because I don’t have shell access to GitHub’s servers where the repository resides. Besides, I wouldn’t be able to run Xcode anyway. Thus, I have to clone the repository on my own local Mac. The URL of my public repository is listed on http://github.com/lapcat/clicktoflash. Actually, there are multiple URLs, but you’ll want to make sure to clone the SSH version, which is read-write. If you clone the read-only version, then you won’t be able to push changes back to the public repository.

$ git clone git@github.com:lapcat/clicktoflash.git

Your private, local clone automatically has a master branch that matches the master branch of your public, remote repository.

$ cd clicktoflash
$ git branch
* master
$ git status
# On branch master
nothing to commit (working directory clean)

The remote repository that you cloned is given the special name origin by your local clone.

$ git remote
origin

The local master branch also tracks the remote repository, so that git fetch, git pull, and git push automatically apply to origin when run with master checked out.

$ git remote show origin
* remote origin
  Fetch URL: git@github.com:lapcat/clicktoflash.git
  Push  URL: git@github.com:lapcat/clicktoflash.git
  HEAD branch: master
  Remote branches:
    cutting-edge tracked
    master       tracked
  Local branch configured for 'git pull':
    master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (up to date)
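
If you only want to know what master tracks, without reading the whole git remote show report, you can ask Git directly. Here is a rough sketch that runs entirely on your own machine. The seed and fork.git directories are hypothetical local stand-ins for the fork on GitHub, made up for this illustration, and the Demo identity is likewise invented so the commit succeeds.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# fork.git is a hypothetical local stand-in for the fork on GitHub.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed fork.git
git clone -q fork.git local
cd local

# The tracking relationship is stored as two config entries per branch.
remote=$(git config branch.master.remote)
merge_ref=$(git config branch.master.merge)
# The same information, resolved to a remote-tracking branch name.
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')
echo "master tracks $upstream (remote: $remote, merge: $merge_ref)"
```

In a fresh clone like this one, the last line should report that master tracks origin/master.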

Despite the fact that the local repository is a clone of origin, and origin is a fork of rentzsch, the local repository knows nothing of rentzsch. [Expletives censored.] If you want to pull changes from rentzsch, you need to add it as a remote repository. In this case, you can use the read-only URL, because you can’t push changes to his repository.

$ git remote add rentzsch git://github.com/rentzsch/clicktoflash.git
$ git remote
origin
rentzsch

Some people suggest upstream for the name of the remote repository, but I find this needlessly confusing. The name rentzsch tells me exactly where the changes are coming from. Unlike upstream, it’s not abstract or easily confused with origin.

Note that unless you pass the -f option to git remote add, the rentzsch remote is not automatically fetched, so you’ll need to fetch it manually. I also find this needlessly confusing and wish the default behavior were to fetch rather than not fetch. You might find yourself perplexed, for example, if you try to create a new branch from the remote.

$ git branch rentzsch-master rentzsch/master
fatal: Not a valid object name: 'rentzsch/master'.
$ git branch -r
  origin/HEAD -> origin/master
  origin/cutting-edge
  origin/master
$ git fetch rentzsch
$ git branch -r
  origin/HEAD -> origin/master
  origin/cutting-edge
  origin/master
  rentzsch/1.4.2-64bit
  rentzsch/cutting-edge
  rentzsch/gh-pages
  rentzsch/master
$ git branch rentzsch-master rentzsch/master
Branch rentzsch-master set up to track remote branch master from rentzsch.
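
For comparison, here is a sketch of the same setup using the -f option, so that the remote is fetched at the moment it is added and the branch can be created right away. It runs against throwaway local repositories rather than GitHub: the upstream directory plays the part of rentzsch and fork.git plays the part of origin. Both names, and the Demo identity, are assumptions made for this illustration. I also pass --track, which as far as I know only spells out what Git does by default when the starting point is a remote branch.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical local stand-ins: upstream plays rentzsch, fork.git plays origin.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "initial commit"
git clone -q --bare upstream fork.git
git clone -q fork.git local
cd local

# -f fetches the new remote immediately, so rentzsch/master exists at once.
git remote add -f rentzsch "$work/upstream"
# --track makes the tracking relationship explicit.
git branch --track rentzsch-master rentzsch/master
# List the remote branches the local clone now knows about.
git branch -r
```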

I recommend that you create a branch specifically to track the repository you forked from, as I do in the last command above. Then no matter what changes you make, you can still look at the ‘official’ version of the project by checking out the rentzsch-master branch. If you use the remote branch rentzsch/master as the starting point for the local branch rentzsch-master, the local branch automatically tracks the remote repository rentzsch, just as the local master automatically tracks the remote origin.

$ git remote show rentzsch
* remote rentzsch
  Fetch URL: git://github.com/rentzsch/clicktoflash.git
  Push  URL: git://github.com/rentzsch/clicktoflash.git
  HEAD branch: master
  Remote branches:
    1.4.2-64bit  tracked
    cutting-edge tracked
    gh-pages     tracked
    master       tracked
  Local branch configured for 'git pull':
    rentzsch-master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (fast-forwardable)

When changes occur in the master branch of the remote rentzsch repository, here is the procedure for merging them:

$ git checkout rentzsch-master
$ git fetch
$ git merge rentzsch/master
$ git checkout master
$ git merge rentzsch-master
$ git push

You could use the single step git pull instead of the two steps git fetch and git merge rentzsch/master. However, I’ve heard it suggested that git pull sometimes causes problems, though that issue is beyond the scope of this blog post. Anyway, what you’re doing with these steps is first merging the remote rentzsch repository changes into the local rentzsch-master branch, then merging the local rentzsch-master branch into the local master branch, and finally pushing the local changes to the remote origin repository. The somewhat convoluted procedure is necessary because you cannot directly pull the remote rentzsch changes into origin; they have to go through the local repository.
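
If you want to rehearse the whole round trip without touching GitHub, something like the following sketch should do. It assumes throwaway local repositories: upstream stands in for rentzsch and fork.git for origin, and both names, like the Demo identity, are invented for the rehearsal. I name the remote and branch explicitly in the fetch and push steps so that the result does not depend on which defaults your version of Git uses.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical local stand-ins: upstream plays rentzsch, fork.git plays origin.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "initial commit"
git clone -q --bare upstream fork.git
git clone -q fork.git local
cd local
git remote add -f rentzsch "$work/upstream"
git branch --track rentzsch-master rentzsch/master

# A new commit lands in the 'official' repository.
git -C "$work/upstream" commit -q --allow-empty -m "upstream change"

# The procedure: rentzsch -> rentzsch-master -> master -> origin.
git checkout -q rentzsch-master
git fetch -q rentzsch
git merge -q rentzsch/master
git checkout -q master
git merge -q rentzsch-master
git push -q origin master
```

Because nothing was committed on master or rentzsch-master locally, both merges are plain fast-forwards, and afterwards master in fork.git points at the same commit as master in upstream.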

The key to successful repository management, I believe, is to never write code on the local master branch. I’ve learned this important lesson by trial and error. In particular, if you try to merge changes from a remote repository into master while you have local changes on master that haven’t yet been pushed to origin, everything can blow up. It’s best to keep master as pure as possible. In fact, it’s best to keep all your tracking branches as pure as possible. With Git, branches are cheap. When you want to make local changes, always create and check out a new branch, and then merge the changes back into the tracking branch when you want to push.
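
Here is roughly what that habit looks like in practice, again as a sketch against throwaway local repositories instead of GitHub. The branch name my-fix and the file fix.txt are hypothetical, as are the seed and fork.git stand-ins and the Demo identity.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# fork.git is a hypothetical local stand-in for origin.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed fork.git
git clone -q fork.git local
cd local

# Do the work on a separate branch; master stays untouched meanwhile.
master_before=$(git rev-parse master)
git checkout -q -b my-fix master
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Add fix"
master_during=$(git rev-parse master)

# Only when you are ready to publish: merge into master and push.
git checkout -q master
git merge -q my-fix
git push -q origin master
git branch -d my-fix
```

Until the final merge, master is identical to origin/master, so pulling in somebody else’s changes in the meantime cannot collide with your half-finished work.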

As far as I can tell, origin by default will contain the same branches that existed in the rentzsch repository at the time you forked it. Consequently, http://github.com/lapcat/clicktoflash only has 2 branches, whereas Rentzsch’s GitHub repository has 4. In any case, the local clone only has master by default. New branches created in the local repository with local starting points are not automatically pushed to origin. This means you can safely hack on local code changes in a branch without exposing your mess to the public. If you want to work on another public ClickToFlash branch, such as rentzsch/cutting-edge instead of rentzsch/master, you’ll need to create a new local branch.

$ git branch rentzsch-cutting-edge rentzsch/cutting-edge
Branch rentzsch-cutting-edge set up to track remote branch cutting-edge from rentzsch.
$ git branch cutting-edge rentzsch-cutting-edge
$ git checkout cutting-edge
Switched to branch 'cutting-edge'

Again, we have both a branch rentzsch-cutting-edge that is a duplicate of rentzsch/cutting-edge and a branch cutting-edge that includes your changes. This mirrors the arrangement of the branches rentzsch-master and master.
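
One payoff of keeping the two branches side by side is that you can ask Git exactly which commits are yours. The sketch below assumes throwaway local repositories, with upstream standing in for rentzsch and fork.git for origin; those names, the Demo identity, and the commit message are invented for the illustration.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Hypothetical local stand-ins: upstream plays rentzsch, fork.git plays origin.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "initial commit"
git -C upstream branch cutting-edge
git clone -q --bare upstream fork.git
git clone -q fork.git local
cd local
git remote add -f rentzsch "$work/upstream"

# The same pair of branches as in the text.
git branch --track rentzsch-cutting-edge rentzsch/cutting-edge
git branch cutting-edge rentzsch-cutting-edge
git checkout -q cutting-edge
git commit -q --allow-empty -m "my change"

# Commits reachable from cutting-edge but not from rentzsch-cutting-edge.
ahead=$(git rev-list --count rentzsch-cutting-edge..cutting-edge)
git log --oneline rentzsch-cutting-edge..cutting-edge
echo "cutting-edge is $ahead commit(s) ahead of rentzsch-cutting-edge"
```

The same two-dot range works for rentzsch-master..master.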

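If origin already contains a cutting-edge branch, then git push should be sufficient to push your local changes. (Beware: in another maddening default behavior, a bare git push will push every local branch that has a branch of the same name on origin, i.e., master and cutting-edge, not just the currently checked out branch. That is the matching default of Git versions before 2.0. Since Git 2.0 the default is simple, which pushes only the current branch and requires it to have an upstream, so there you would use git push origin cutting-edge in either case.) On the other hand, if origin does not yet contain a cutting-edge branch, you’ll need to use git push origin cutting-edge to create the branch on origin.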
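
To see that naming the remote and the branch really does limit the push to that one branch, you can try a sketch like this one on throwaway local repositories. The seed and fork.git stand-ins and the Demo identity are assumptions for the illustration. The unpushed commit on master is there only to show that it stays put; it is exactly what I told you not to do.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# fork.git is a hypothetical local stand-in for origin.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed fork.git
git clone -q fork.git local
cd local

# An unpushed commit on master, purely to show that it is not published.
origin_master_before=$(git rev-parse origin/master)
git commit -q --allow-empty -m "local-only work on master"

# A new branch that does not exist on origin yet.
git checkout -q --no-track -b cutting-edge origin/master
git commit -q --allow-empty -m "work on cutting-edge"

# Naming the remote and branch pushes only that branch, creating it on origin.
git push -q origin cutting-edge
git ls-remote --heads origin
```

After the push, origin has a cutting-edge branch, while master on origin still points where it did before.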

I hope this mini tutorial helps you to work with GitHub more efficiently and with fewer headaches (from banging your head against the wall). If you have further questions, feel free to ask … someone else, because I don’t know the answer.
