Merge svn branch from another Git repo that Git thinks is unrelated

Kirill Katsnelson edited this page Jul 22, 2015 · 13 revisions

Our main kaldi repository does not have sandbox/nnet3 imported as a branch, and now we want to import it. The safe way is to import it with git svn into a temporary Git repository, because tampering with an existing git svn configuration and all the caches it has created is risky. Chances are that the two repositories will either have no common commit, or the commit deemed common will be buried too deep in the history. Why? Remember that the commit author's e-mail is part of the data the SHA1 hash covers, as are the parent commit hash(es). So if a single e-mail address has changed (git svn takes the mapping from a non-versioned file pointed to by the config value svn.authorsfile), the whole history from that change onward will have different hashes. So, even if the histories do have a common commit, it is a good idea to verify that this is really the point where the branch was created in Subversion.
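
To make the point about hashes concrete, here is a minimal sketch in a throwaway repository. The names, e-mail addresses and timestamp are made up for illustration; the only thing that differs between the two root commits is the author e-mail.

```shell
# Toy demonstration; all identities and the timestamp below are assumptions.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Pin everything except the author e-mail so the commits are otherwise identical.
export GIT_AUTHOR_NAME=dev GIT_COMMITTER_NAME=dev
export GIT_COMMITTER_EMAIL=dev@example.org
export GIT_AUTHOR_DATE='1430502810 +0000' GIT_COMMITTER_DATE='1430502810 +0000'

tree=$(git mktree </dev/null)    # the empty tree is enough here

# Same tree, message and dates; only the author e-mail differs.
a=$(GIT_AUTHOR_EMAIL=dev@example.org git commit-tree "$tree" -m 'root')
b=$(GIT_AUTHOR_EMAIL=dev@svn-uuid.invalid git commit-tree "$tree" -m 'root')

# Otherwise identical child commits: the parent hash differs,
# so the children get different hashes as well.
export GIT_AUTHOR_EMAIL=dev@example.org
child_a=$(git commit-tree "$tree" -p "$a" -m 'child')
child_b=$(git commit-tree "$tree" -p "$b" -m 'child')

echo "roots:    $a $b"
echo "children: $child_a $child_b"
```

The two children carry identical metadata and content, yet their hashes differ because their parents do. This is why a single remapped address changes every hash downstream of it.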

In our case, since SourceForge is down for a few days, we'll take another interesting step. I have found a recently updated git svn import of Kaldi on GitHub, owned by @jpuigcerver. Fortunately, it was imported with all branches. But Git does not know that nnet3 is a branch: it was imported as a subdirectory sandbox/nnet3 on branch master. So we'll also convert it to a branch in the process.

Start with a fresh Kaldi clone, then add the new repository as a remote ("remote" is something of a misnomer, as a remote can just as well reside in your local filesystem, for example if you have just imported it from Subversion).

kkm@yupana:~/work/nnet3test$ git clone git@github.com:kaldi-asr/kaldi.git .
Cloning into '.'...
remote: Counting objects: 50694, done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 50694 (delta 8), reused 0 (delta 0), pack-reused 50669
Receiving objects: 100% (50694/50694), 67.23 MiB | 4.23 MiB/s, done.
Resolving deltas: 100% (40193/40193), done.
Checking connectivity... done.

kkm@yupana:~/work/nnet3test[master]$ git remote add nnet3 git@github.com:jpuigcerver/kaldi.git
kkm@yupana:~/work/nnet3test[master]$ git fetch nnet3
warning: no common commits
remote: Counting objects: 66937, done.
remote: Total 66937 (delta 0), reused 0 (delta 0), pack-reused 66937
Receiving objects: 100% (66937/66937), 78.94 MiB | 4.80 MiB/s, done.
Resolving deltas: 100% (52848/52848), done.
From github.com:jpuigcerver/kaldi
 * [new branch]      em         -> nnet3/em
 * [new branch]      master     -> nnet3/master
 * [new branch]      wip-decoder -> nnet3/wip-decoder

Note the warning: no common commits. In our case there are no common commits at all, because e-mail addresses were not mapped in the imported repo, so the hashes differ from the initial commit on. I'll keep the head nnet3orig temporarily for easy reference; I am just used to naming things.

kkm@yupana:~/work/nnet3test[master]$ git checkout nnet3/master -b nnet3orig --no-track
Checking out files: 100% (103354/103354), done.
Switched to a new branch 'nnet3orig'

Note that the checkout took a while: Git had to replace all files in the worktree. Now find the oldest commit that touched the "directory" sandbox/nnet3:

kkm@yupana:~/work/nnet3test[nnet3orig]$ git log --reverse -- sandbox/nnet3 | head
commit 3246800a23c51f4d5107371a1642f2a4af190d2e
Author: danielpovey <danielpovey@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8>
Date:   Fri May 1 17:53:30 2015 +0000

    Creating new sandbox as a copy of trunk: ^/sandbox/nnet3
    3

    git-svn-id: http://svn.code.sf.net/p/kaldi/code@5041 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8

Looks like the one (you could check it against the original Subversion history, if it were available). Note that this Subversion revision is not in our Git repo, as we did not import branches. See if its parent is:

kkm@yupana:~/work/nnet3test[nnet3orig]$ git log -1 3246800~1
commit b2869f7ea89dc188c8a58e67c3e9c6bca64d2d3b
Author: jtrmal <jtrmal@5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8>
Date:   Wed Apr 29 03:01:07 2015 +0000

    (trunk) Adding OpenFstWin specific patch -- includes all from openfst-1.3.4.patch plus OpenFstWin specific things to make Kaldi compile under MSVC

    git-svn-id: http://svn.code.sf.net/p/kaldi/code@5040 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8

Looks good, this is what we'll likely find in our master:

kkm@yupana:~/work/nnet3test[nnet3orig]$ git log master --grep '5040'
commit 428af53b5781472d8c98074e4562a67bb6175a46
Author: Jan Trmal <[email protected]>
Date:   Wed Apr 29 03:01:07 2015 +0000

    (trunk) Adding OpenFstWin specific patch -- includes all from openfst-1.3.4.patch plus OpenFstWin specific things to make Kaldi compile under MSVC

    git-svn-id: https://svn.code.sf.net/p/kaldi/code/trunk@5040 5e6a8d80-dfce-4ca6-a32a-6e07a63d50c8

Bingo! Now we know it is the same commit. Make note of its SHA1 428af53b. Time to build the nnet3 branch that we'll graft into master:

kkm@yupana:~/work/nnet3test[nnet3orig]$ git checkout -b nnet3
Switched to a new branch 'nnet3'
kkm@yupana:~/work/nnet3test[nnet3]$ time git filter-branch --subdirectory-filter sandbox/nnet3 nnet3
Rewrite c3c701be3f17df6a49e5f31202256c5254056acb (59/59)
Ref 'refs/heads/nnet3' was rewritten

real    0m3.802s

The above command simply made the content of the subdirectory sandbox/nnet3 the new root of my worktree and index, and rewrote the whole branch history, every single commit, to match this change. This is how we convert a branch in the Subversion sense into a native Git branch.

kkm@yupana:~/work/nnet3test[nnet3]$ ls -l
total 44
-rw-rw-r--  1 kkm kkm 16365 Jul 20 15:57 COPYING
-rw-rw-r--  1 kkm kkm   258 Jul 20 15:57 INSTALL
-rw-rw-r--  1 kkm kkm  1125 Jul 20 15:57 README.txt
drwxrwxr-x 36 kkm kkm  4096 Jul 20 15:57 egs
drwxrwxr-x  8 kkm kkm  4096 Jul 20 15:57 misc
drwxrwxr-x 45 kkm kkm  4096 Jul 20 15:57 src
drwxrwxr-x  5 kkm kkm  4096 Jul 20 15:57 tools
drwxrwxr-x  2 kkm kkm  4096 Jul 20 15:57 windows
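
To try the subdirectory filter without touching a real repository, here is a minimal sketch on a toy repository. The layout (trunk/ and sandbox/feature/) and all names are made up for illustration; FILTER_BRANCH_SQUELCH_WARNING only silences the notice that newer Git versions print before filter-branch runs.

```shell
# Toy repository; directory and branch names are assumptions for illustration.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name dev
git config user.email dev@example.org

mkdir trunk
echo base > trunk/README
git add . && git commit -q -m 'trunk: initial'

# The "SVN branch": a copy of trunk living in a subdirectory.
mkdir -p sandbox
cp -r trunk sandbox/feature
git add . && git commit -q -m 'create sandbox/feature as a copy of trunk'

echo work >> sandbox/feature/README
git add . && git commit -q -m 'sandbox/feature: some work'

orig=$(git symbolic-ref --short HEAD)   # default branch name varies

# Rewrite a new branch so that sandbox/feature becomes the repository root.
git checkout -q -b feature
git filter-branch --subdirectory-filter sandbox/feature -- feature >/dev/null

git log --oneline                       # only commits that touched sandbox/feature
ls                                      # README now sits at the top level
count=$(git rev-list --count feature)
```

The original branch keeps all three commits; the rewritten one keeps only the two that touched the subdirectory. filter-branch also leaves a backup ref under refs/original/, which matters for any later filter-branch run on the same branch.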

Check the history:

kkm@yupana:~/work/nnet3test[nnet3]$ git log --oneline | head -3
ba05a4f sandbox/nnet3: committing some more progress (code analysis and checking code, as precursor to optimization code).
d14e45c sandbox/nnet3: adding functionality for pretty-printing compiled computations.
0a9a6db sandbox/nnet3: various bug fixes and refactoring.

And see how the history abruptly ends at the creation point. Since the parent did not contain the subdirectory sandbox/nnet3, it was cut off: all previous commits would be empty anyway.

kkm@yupana:~/work/nnet3test[nnet3]$ git log --oneline | tail -3
70321f6 sandbox/nnet3: adding some more (sketches of) code
a5938d8 sandbox/nnet3: committing some early drafts of code.  This is mostly just to share with others to get comments.
4f71fee Creating new sandbox as a copy of trunk: ^/sandbox/nnet3 3

Note the historically first commit 4f71fee here. It will become the child of the graft point we found a few steps above. Make sure the diff between the two is empty:

kkm@yupana:~/work/nnet3test[nnet3]$ git diff 4f71fee 428af53b
kkm@yupana:~/work/nnet3test[nnet3]$
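
An empty diff means that both commits point at the same tree object. In a script the same check can be made without reading output, either through the exit status of git diff --quiet or by comparing tree IDs. A minimal sketch on a toy repository, where the file and commit messages are made up for illustration:

```shell
# Toy repository; content and messages are assumptions for illustration.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name dev
git config user.email dev@example.org

echo base > README
git add README && git commit -q -m 'trunk: graft point candidate'
parent=$(git rev-parse HEAD)

# A parentless commit with exactly the same content, standing in for the
# first commit of the filtered branch.
child=$(git commit-tree "$parent^{tree}" -m 'branch: copy of trunk')

# Check 1: git diff --quiet exits with status 0 when there are no differences.
if git diff --quiet "$child" "$parent"; then
    echo "diff is empty"
fi

# Check 2: both commits reference the same tree object.
tree_child=$(git rev-parse "$child^{tree}")
tree_parent=$(git rev-parse "$parent^{tree}")
[ "$tree_child" = "$tree_parent" ] && echo "trees are identical"
```

The two commits have different hashes and no shared history, but identical trees, which is exactly the condition a graft point should satisfy here.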

Now make Git think that this oldest commit branched off our master (commit 428af53b). Note that Git wants full SHA1 hashes in the grafts file, so we expand them first using git rev-parse:

kkm@yupana:~/work/nnet3test[nnet3]$ echo `git rev-parse 4f71fee 428af53b`
4f71fee22f1354d833d7a02ce9123734b15f2735 428af53b5781472d8c98074e4562a67bb6175a46
kkm@yupana:~/work/nnet3test[nnet3]$ echo `git rev-parse 4f71fee 428af53b` >> .git/info/grafts

Grafts are a nice feature. Git now knows that 4f71fee has 428af53b as its parent. This means we can look at a diff that shows "what will be merged" (note the three-dot syntax):

kkm@yupana:~/work/nnet3test[nnet3]$ git diff master...nnet3
 . . . .

Looks good now, so we can make the graft permanent. The grafts file is local to your repo, so others would not see the grafted history! (There is another Git feature, git replace, that creates essentially permanent, fetchable grafts, but we'll do it the old-school way.)

kkm@yupana:~/work/nnet3test[nnet3]$ git filter-branch nnet3
Cannot create a new backup.
A previous backup already exists in refs/original/
Force overwriting the backup with -f

Oops. There is a backup head left from our first filter-branch. Remove the backup head and redo (TAB completion is your friend). You may as well add the -f switch instead. Just keep in mind it might force more things than you expect...

kkm@yupana:~/work/nnet3test[nnet3]$ rm .git/refs/original/refs/heads/nnet3
kkm@yupana:~/work/nnet3test[nnet3]$ time git filter-branch master...nnet3
Rewrite af37d842d816fbf4403a6d63e83928c76f1db202 (178/178)
Ref 'refs/heads/nnet3' was rewritten
WARNING: Ref 'refs/heads/master' is unchanged

real    0m5.834s

Why did I use the master...nnet3 syntax? Well, filter-branch does not change commits that need no change (a SHA1 is deterministic, after all, so rewriting them is the same as not rewriting). I could simply specify nnet3 and let Git figure it out, but that would walk the complete branch history of 5000 commits. Too long. The syntax I used is quick to type, and has filter-branch examine both the master and nnet3 histories only back to their common point. 5 seconds and under 200 commits examined, not bad. The absolute minimum set would be specified by $(git merge-base nnet3 master)..nnet3, but typing this long command would not save me any more time. In a script, though, I'd rather use just that.
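
The difference between the two ranges is easy to see with git rev-list on a toy repository. This is a minimal sketch; the branch name topic and the empty commits are made up for illustration.

```shell
# Toy repository; branch names and commit messages are assumptions.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name dev
git config user.email dev@example.org

commit() { git commit -q --allow-empty -m "$1"; }

commit 'base 1'; commit 'base 2'
trunk=$(git symbolic-ref --short HEAD)   # default branch name varies

git checkout -q -b topic
commit 'topic 1'; commit 'topic 2'; commit 'topic 3'

git checkout -q "$trunk"
commit 'trunk 3'; commit 'trunk 4'

# Three dots: commits reachable from either side, but not from both.
both=$(git rev-list --count "$trunk...topic")

# Minimal set: only the commits on topic after the fork point.
base=$(git merge-base topic "$trunk")
only_topic=$(git rev-list --count "$base..topic")

echo "three-dot range: $both commits, merge-base range: $only_topic commits"

# In a script, the rewrite from the text could be limited the same way
# (left as a comment, since it only makes sense with the graft in place):
#   git filter-branch -- "$(git merge-base nnet3 master)"..nnet3
```

Here the three-dot range covers five commits (three on topic plus two on the trunk side), while the merge-base range covers only the three on topic.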

Now examine the result:

kkm@yupana:~/work/nnet3test[nnet3]$ git log --oneline
 . . . .
a5798ab Creating new sandbox as a copy of trunk: ^/sandbox/nnet3 3
428af53 (trunk) Adding OpenFstWin specific patch -- includes all from openfst-1.3.4.patch plus OpenFstWin specific things to make Kaldi compile under MSVC
 . . . 

The earliest commit on the branch has changed, but its parent on master has not, obviously. We rewrote only the isolated branch. Now remove the graft (we rewrote the branch, so we do not need it anymore), the backup head, the backup branch and the remote:

kkm@yupana:~/work/nnet3test[nnet3]$ rm .git/info/grafts .git/refs/original/refs/heads/nnet3
kkm@yupana:~/work/nnet3test[nnet3]$ git branch -D nnet3orig
Deleted branch nnet3orig (was c3c701b).
kkm@yupana:~/work/nnet3test[nnet3]$ git remote remove nnet3

If you want to test the merge now, you can try it on a test branch, which is easier to roll back:

kkm@yupana:~/work/nnet3test[nnet3]$ git checkout master -b testmerge
kkm@yupana:~/work/nnet3test[testmerge]$ git merge nnet3
Auto-merging src/nnetbin/paste-post.cc
CONFLICT (content): Merge conflict in src/nnetbin/paste-post.cc
Auto-merging src/nnet2/nnet-component.h
Auto-merging src/nnet2/nnet-component.cc
Auto-merging src/nnet/nnet-loss.cc
CONFLICT (content): Merge conflict in src/nnet/nnet-loss.cc
Auto-merging src/matrix/matrix-lib-test.cc
Auto-merging src/cudamatrix/cu-matrix.h
Auto-merging src/cudamatrix/cu-matrix.cc
Auto-merging src/cudamatrix/cu-matrix-test.cc
Auto-merging src/cudamatrix/cu-matrix-speed-test.cc
Auto-merging src/cudamatrix/cu-kernels.h
Auto-merging src/cudamatrix/cu-kernels.cu
Auto-merging src/cudamatrix/cu-kernels-ansi.h
Auto-merging src/cudamatrix/cu-device.cc
CONFLICT (content): Merge conflict in src/cudamatrix/cu-device.cc
Auto-merging src/Makefile
Auto-merging egs/wsj/s5/steps/rnnlmrescore.sh
CONFLICT (content): Merge conflict in egs/wsj/s5/steps/rnnlmrescore.sh
Auto-merging egs/wsj/s5/steps/nnet/make_priors.sh
CONFLICT (content): Merge conflict in egs/wsj/s5/steps/nnet/make_priors.sh
Auto-merging egs/wsj/s5/steps/cleanup/decode_segmentation.sh
CONFLICT (content): Merge conflict in egs/wsj/s5/steps/cleanup/decode_segmentation.sh
Auto-merging egs/wsj/s5/steps/cleanup/create_segments_from_ctm.pl
CONFLICT (content): Merge conflict in egs/wsj/s5/steps/cleanup/create_segments_from_ctm.pl
Auto-merging egs/rm/s5/local/nnet/run_blocksoftmax.sh
CONFLICT (content): Merge conflict in egs/rm/s5/local/nnet/run_blocksoftmax.sh
CONFLICT (modify/delete): egs/ami/s5/local/run_dnn.sh deleted in HEAD and modified in nnet3. Version nnet3 of egs/ami/s5/local/run_dnn.sh left in tree.
Automatic merge failed; fix conflicts and then commit the result.

Examining the conflicts reveals that they are small and look genuine. We'll fix them later during a real merge when the feature is complete. Now it is ok to abort the merge, go back to the nnet3 branch, drop testmerge, and push our new nnet3 branch to make it publicly available.

kkm@yupana:~/work/nnet3test[testmerge *+|MERGING]$ git merge --abort
kkm@yupana:~/work/nnet3test[testmerge]$ git checkout nnet3
Switched to branch 'nnet3'
kkm@yupana:~/work/nnet3test[nnet3]$ git branch -d testmerge
Deleted branch testmerge (was af37d84).
kkm@yupana:~/work/nnet3test[nnet3]$ git push origin HEAD
. . . .
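
For comparison, the git replace route mentioned earlier could look roughly like this, assuming a Git version that supports git replace --graft. This is a minimal sketch on a toy repository, not what was done above; the branch name topic and the commit messages are made up for illustration.

```shell
# Toy repository; names and messages are assumptions for illustration.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name dev
git config user.email dev@example.org

echo base > README
git add README && git commit -q -m 'trunk: graft point'
parent=$(git rev-parse HEAD)

# A parentless commit with the same content, standing in for the first
# commit of the imported branch.
orphan=$(git commit-tree "$parent^{tree}" -m 'branch: copy of trunk')
git branch topic "$orphan"

# Record the graft as a replacement ref under refs/replace/.
git replace --graft "$orphan" "$parent"

git log --oneline topic                  # the trunk commit now shows up as an ancestor
grafted_parent=$(git rev-parse "topic^")
git replace -l                           # lists the replaced commit
```

The branch ref still names the original commit; Git substitutes the replacement object whenever it reads it. Replacement refs are shared only when refs/replace/ is pushed or fetched explicitly, so a filter-branch run is still the way to bake the parent into the commits themselves.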