Git merge vs. rebase for the SVN initiated

Kirill Katsnelson edited this page Jul 2, 2015 · 11 revisions

Rebase is arguably one of the most mysterious concepts in Git. Here's the good news for those of you transitioning from Subversion to Git: Subversion also does a rebase under the hood, it just does not call it that! But first things first: what is a rebase, and why is it an alternative to a merge?

Let's think of a very abstract source control system (SCS). It differs from a mere heap of files in that it additionally keeps a complete history of changes to these files. While this sounds innocuous, keeping this history consistent is a tricky business. The problem is that there is not one single version of history; multiple people are working on the same code, and they see different histories. The SCS, like an Orwellian Ministry of Truth, is given the task of rewriting the recent history so that, by the time it becomes more distant history, it is consistent for all users.

Suppose you collaborate on a project with your friend. You both started changing the codebase in some starting state $S$. What is in $S$? First of all, $S$ refers to the cumulative contents $C$ of all files. But $S$ is also a reference to the history of all preceding changes in the repository: as we “go back in time” to when $S$ was the most recent code version, we also see the history the same way we saw it back then. This time travel is perfect. In the end, in our simplified linear history model, every state refers to a preceding state (and, recursively, to the complete history of states): $S_i = \{ C_i, S_{i-1} \}$.
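The definition of a state can be sketched as a toy model in Python. This is only an illustration of the abstract SCS above, not how Subversion or Git store anything; the names `State`, `state_id` and `history` are made up for this example.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    """One state S_i = {C_i, S_(i-1)}: file contents plus a link to the previous state."""

    contents: tuple                    # (filename, text) pairs standing in for C_i
    parent: Optional[State] = None     # S_(i-1); None for the very first state

    def state_id(self) -> str:
        # The identity covers the contents AND the whole preceding history.
        parent_id = self.parent.state_id() if self.parent else ""
        payload = repr(self.contents) + "|" + parent_id
        return hashlib.sha1(payload.encode()).hexdigest()[:10]

    def history(self) -> list:
        """Walk back in time: this state first, then every state before it."""
        out, node = [], self
        while node is not None:
            out.append(node)
            node = node.parent
        return out


s1 = State(contents=(("main.c", "v1"),))
s2 = State(contents=(("main.c", "v2"),), parent=s1)
s3 = State(contents=(("main.c", "v3"),), parent=s2)

# Same file contents as s3, but reached through a different history.
alt = State(contents=(("main.c", "v3"),), parent=s1)
same_code = alt.contents == s3.contents
same_state = alt.state_id() == s3.state_id()
```

Here `same_code` is true while `same_state` is false: identical files reached through a different history make a different state.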

Indeed, there is a previous history of states $\{ S_1, \ldots, S_n \}$ (and $n$ is the familiar Subversion revision number), but we won't be looking at them; we just focus on the latest state $S = S_n$. This is where you both started, and then the two of you created two different recent histories by changing some files. Your development is in the state $S_u$ (u because it was you who made the change), and your friend's is in the state $S_f$. This means that you have different histories of development, diverging at $S$: yours is $\{ \ldots, S, S_u \}$, while your friend's is $\{ \ldots, S, S_f \}$. This is pretty obvious, but what is often overlooked is that as long as we stay committed to a linear history of changes, one of the histories has to change to reconcile the differences. Now, neither of the two histories is better or more correct than the other. There must be a symmetry-breaker in our picture.

In centralized systems like Subversion, the tie-breaking decision is made by the one and only server, which hosts the authoritative copy of the repository. Either of you can commit the history that becomes a "true" piece of history, and (you snooze, you lose) the other needs their version of history corrected. Let's assume, without any loss of generality, that your friend won the race. It's official now: the authoritative history is $\{ \ldots, S, S_f \}$, and it will not change. The ball is in your court, and you must reconcile your work with that history. This is exactly what svn update does. A few steps happen when it is run. First, all your local sandbox changes are converted to a patch; then your sandbox is cleaned to a pristine base revision; then it is quickly brought to the state of the repository by downloading all changes from the server; and finally the saved patch is applied on top of the new state of the sandbox. Note that this changes both the code in your sandbox and its history: the previous history state is no longer $S$, it is $S_f$ now, one or more revision numbers later.
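The four steps of the update can be sketched with plain dictionaries standing in for file trees. This is a toy model of the description above, not the Subversion implementation: the helper names are invented, a "patch" is just a map of changed files, deletions are ignored, and a real tool would merge line by line and report conflicts.

```python
def make_patch(base: dict, changed: dict) -> dict:
    """Record the differences as {filename: new_text} relative to base."""
    return {name: text for name, text in changed.items() if base.get(name) != text}


def apply_patch(tree: dict, patch: dict) -> dict:
    """Return a copy of tree with the patched files replaced."""
    result = dict(tree)
    result.update(patch)
    return result


def toy_update(base: dict, working: dict, server_head: dict):
    patch = make_patch(base, working)                   # 1. save local changes as a patch
    pristine = dict(base)                               # 2. clean the sandbox to its base revision
    incoming = make_patch(pristine, server_head)        # 3. download the server's changes ...
    fast_forwarded = apply_patch(pristine, incoming)    #    ... and fast-forward the clean sandbox
    new_working = apply_patch(fast_forwarded, patch)    # 4. reapply the saved patch on top
    return fast_forwarded, new_working


s = {"a.txt": "1", "b.txt": "1"}            # starting state S
yours = {"a.txt": "you", "b.txt": "1"}      # your uncommitted sandbox
s_f = {"a.txt": "1", "b.txt": "friend"}     # friend's commit, now official

new_base, new_working = toy_update(s, yours, s_f)
```

After the call, `new_base` equals the friend's state and `new_working` holds both changes: your edit now sits on top of a different base than the one you started from.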

It must be pretty clear what svn just did. Now we can translate the names of the things it did into Gitese. When your clean sandbox was quickly brought up to date by appending to its history, svn fast-forwarded it. And when it saved the patch, then fast-forwarded your sandbox and then reapplied the patch, it rebased your changes on top of the repository state. A rebase is nothing other than replacing the base revision under some changes that happened later. Therefore, a rebase always changes the history.

Subversion update applies a fast-forward to a modified sandbox as part of a rebase.

Rebase rewrites history.
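To see why replacing the base rewrites history, here is a small sketch in which a state's identifier is derived from its parent's identifier plus the change, loosely in the spirit of Git's commit hashes. The functions `commit_id`, `build` and `rebase` are hypothetical names for this toy model only.

```python
import hashlib


def commit_id(parent_id: str, change: str) -> str:
    """Identify a state by its change together with everything before it."""
    return hashlib.sha1(f"{parent_id}|{change}".encode()).hexdigest()[:8]


def build(base_id: str, changes: list) -> list:
    """Stack changes on a base; return (state_id, change) pairs, oldest first."""
    chain, parent = [], base_id
    for change in changes:
        parent = commit_id(parent, change)
        chain.append((parent, change))
    return chain


def rebase(chain: list, new_base_id: str) -> list:
    """Replay the same changes, in order, on top of a different base."""
    return build(new_base_id, [change for _, change in chain])


s = commit_id("", "starting state S")
s_f = commit_id(s, "friend's change")

yours = build(s, ["your change 1", "your change 2"])
rebased = rebase(yours, s_f)

same_changes = [c for _, c in rebased] == [c for _, c in yours]
any_id_kept = any(new == old for (new, _), (old, _) in zip(rebased, yours))
```

The changes survive intact (`same_changes` is true), yet not a single state identifier does (`any_id_kept` is false): every state after the new base is a new one.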

Is this important? In Subversion it was not. Your sandbox is yours and yours alone, and no one gets to see its history or base their work on it. So the choice to rebase was made by the tool. In Git, there is no such luxury. Git was designed as a distributed source control system without a central server. In the early years there was no GitHub; kernel hackers were real kernel hackers and e-mailed each other their pull requests. This means that in Git one does not simply rewrite history whenever one wishes, since the tool cannot know whether someone has already based their work on a state that the rebase would rewrite (and hence eliminate and replace with another state; remember the definition of a state? Even if the code is the same but the history differs, it is a different state!). So, while rebase is certainly a way to keep your history tidy, it simply cannot be made the default in Git. I would be happy to write "and the rule of thumb is..." but there is no rule of thumb here. One thing is certain, though:

Do not rebase a branch if someone may have branched their work off it.
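A tiny sketch of what goes wrong otherwise, using a hand-written map from each state to its parent. The labels (`U1`, `C1` and so on) and the `lineage` helper are invented for this illustration.

```python
def lineage(parents: dict, tip: str) -> list:
    """Follow single-parent links from a tip back to the root."""
    chain, node = [], tip
    while node is not None:
        chain.append(node)
        node = parents[node]
    return chain


parents = {
    "S": None,    # common starting state
    "Sf": "S",    # friend's commit, already upstream
    "U1": "S",    # your commit, already shared with a colleague
    "C1": "U1",   # the colleague's work, branched off your commit
}

# You rebase your branch onto Sf: U1 is replaced by a new state U1'.
parents["U1'"] = "Sf"

your_branch = lineage(parents, "U1'")
colleague_branch = lineage(parents, "C1")
```

Now `your_branch` is `["U1'", "Sf", "S"]`, while `colleague_branch` is still `["C1", "U1", "S"]`: the colleague's work rests on a state that no longer exists in your history.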

This means that in a collaboration scenario, when a few people push into the same branch in a common repository, rebase is not an option. Should you merge from the trunk, then? In Subversion, you use the update command often. The rationale is that when you commit your changes upstream, all changed files must be at the same revision level as the trunk (we would now say they should be rebased on top of the trunk state, so that the server can fast-forward the trunk to your sandbox's state when you commit). If you do not update often, you may get conflicts in your precious sandbox, and there is no going back once you have encountered a merge conflict: it must be resolved. You have to be very careful resolving them, too. To Git, however, this common sense does not apply. This is where the philosophy is different, even more so than the technology. When you (singular or plural) begin working on a feature, a branch for this feature is forked off. Working on a branch is much safer than in a volatile Subversion sandbox, and therefore you care less about the divergence. Only after the whole feature is complete is it pulled into the master repository. Conflicts may be resolved at the time of the merge.

Of course, there may be cases when merging master into your working branch is necessary. One reason may be that you need a fix to one of the routines that was committed to master. Another is when your development took a long time and your pull request is rejected by upstream because it causes too many conflicts that the maintainer does not know how to resolve (or simply has no time and energy to). In this case you are looking at a merge. Here we are in more familiar waters: Git merges are similar to Subversion merges. In Gitese, the merge from the originating branch back into a feature branch is often termed a back-merge. The merge record is going to stay permanently in your branch, and it may make the feature merge more complicated. Git, however, is flexible enough to let you clean up a branch before merging it back to master. This does not rewrite the existing history; it creates a parallel one: fork another branch off the tip of your feature branch, and you are free to rebase that one however you want. There is also the option of cherry-picking the changes you need if only a small hotfix is required, and this has its own pros and cons. There are workflows (the famous Gitflow being the main example) where you would pull a hotfix from its own branch, as everything has its own branch. The topic of selecting the best workflow is something that I am specifically trying to avoid in this writing.
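A back-merge can be pictured as a state with two parents, which is where the linear model from the beginning stops being enough. The sketch below is a toy graph with invented labels (`fix`, `F1`, `M`, `F2`) and an invented `ancestors` helper, not Git's data model.

```python
def ancestors(commits: dict, tip: str) -> set:
    """Return every state reachable from tip, following all parent links."""
    seen, stack = set(), [tip]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(commits[node])
    return seen


commits = {
    "S": [],              # where the feature branch was forked off
    "fix": ["S"],         # a fix committed to master
    "F1": ["S"],          # your feature work
    "M": ["F1", "fix"],   # the back-merge: a merge record with two parents
    "F2": ["M"],          # feature work continues after the merge
}

feature_history = ancestors(commits, "F2")
master_history = ancestors(commits, "fix")
```

The feature branch now contains the fix, and neither `F1` nor `fix` had to be replaced by a new state: nothing was rewritten, but the record `M` is part of the branch history from here on.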

Here's a short summary table of the similarities and opposites in Subversion and Git that we touched on:

| Topic | Subversion | Git |
| --- | --- | --- |
| Main development arena | Sandbox: volatile. | Branch: resilient. A failed merge can be easily abandoned. |
| Integration of changes | Sandbox rebased on trunk (svn update), then trunk fast-forwarded to sandbox (svn commit). | You are at liberty to rebase or merge. |
| History rewrite | The history of your sandbox gets abandoned (it is volatile and exists only as a concept, not as an actual record). | Your choice: change either of the histories when merging changes, or change neither and create a merge record without breaking the symmetry, recording all states into a non-linear span of history. |
| Integration of upstream changes | Encouraged (svn update often). In the end you have only one sandbox; do not let it go stale. | Discouraged. You may have as many local, personal branches as you want without exposing them to any other repository. |