Git Migration Notes

Rick Pernak edited this page May 11, 2020 · 10 revisions

Initial Migration

There are a couple of ways to migrate from SVN to Git version control:

  1. git svn clone
  2. svn2git

We use item 2 because it is a little more straightforward. However, svn2git is not part of a standard install on our Linux systems (e.g., CentOS 6 or 7), so item 1 may be preferable if one does not have access to svn2git. We have not provided all of the arguments for item 1.
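For reference, item 1 could be sketched roughly as follows. This is not a command we ran for this migration: the function name `migrate_with_git_svn` and its three arguments are hypothetical, and the sketch assumes the SVN repository uses the standard trunk/branches/tags layout and that an authors file like the one built in the steps below already exists.

```shell
# Hypothetical wrapper around `git svn clone` (item 1); not used in our migration.
# Usage: migrate_with_git_svn <svn_url> <authors_file> <target_dir>
migrate_with_git_svn() {
    svn_url="$1"
    authors_file="$2"
    target_dir="$3"

    # --stdlayout assumes trunk/, branches/, and tags/ sit directly under the URL;
    # --authors-file maps SVN usernames to Git "Name <email>" identities.
    git svn clone --stdlayout --authors-file="$authors_file" \
        "$svn_url" "$target_dir"
}
```

The svn2git steps that we actually used follow.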

# first check out the SVN repo
% svn co https://svn.aer.com/svn/aer/project/RD/LBLRTM/trunk LBLRTM
% cd LBLRTM/

# grab list of authors, save it for Git repo
% svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > authors-rrtmg-lw.txt
% mv authors-rrtmg-lw.txt ..
% cd ..
% mkdir Git_LBLRTM
% cd Git_LBLRTM/
% mv ../authors-rrtmg-lw.txt .

# add (no author), otherwise migration does not complete
% cat authors-rrtmg-lw.txt
 =  <>
miacono = miacono <miacono>
mike = mike <mike>
(no author) = no_author <no_author@no_author>

# merge two repositories, keeping trunk, branches, and tags and inhibit `svn2git` from automated URL determination
% svn2git https://svn.aer.com/svn/aer/project/RD/LBLRTM/ --authors authors-rrtmg-lw.txt --no-minimize-url
% git remote add origin [email protected]:RC/LBLRTM.git
% git push --all origin
% git push --tags origin
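Since the migration does not complete without the `(no author)` entry, it can help to check for it before running `svn2git`. Below is a minimal sketch that appends the entry only when it is missing. The function name `ensure_no_author` is hypothetical; the mapping itself is the one shown in the listing above.

```shell
# Hypothetical helper: make sure the authors file maps SVN's "(no author)".
# Usage: ensure_no_author <authors_file>
ensure_no_author() {
    authors_file="$1"
    # -F treats the pattern as a fixed string, -q suppresses output
    if ! grep -qF '(no author) = ' "$authors_file"; then
        printf '%s\n' '(no author) = no_author <no_author@no_author>' >> "$authors_file"
    fi
}
```

Calling `ensure_no_author authors-rrtmg-lw.txt` more than once is harmless, because the line is only added when it is absent.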

Repository Refinement

There was a lot of bloat in the SVN repository, mostly because this code has been maintained for decades. The repo had to be pared down so that we are not close to the GitHub size limitation on free public projects. Here are some notes on this refinement process (a lot of which first occurred in my "testbed" on the AER Gitlab server -- LBLRTM2):

Project Tags

I thought the run_examples were taking up too much disk space for a code repository, but it turns out that the tags were the majority of the repository space. Without run_examples or tags:

% git clone [email protected]:RC/LBLRTM2.git
Cloning into 'LBLRTM2'...
% cd LBLRTM2/
% ls
build docs  src
% du -hs
 16M	.

That is probably acceptable. With tags:

% ls
build docs  src
% du -hs
458M	.
% du -hs */
 36K	build/
4.8M	docs/
5.3M	src/
% du -hs .git/
448M	.git/
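The comparison above comes down to how much of a checkout is Git history rather than working files. Below is a small sketch for pulling out that one number, assuming a POSIX shell with `du` and `awk`. The function name `git_dir_kb` is hypothetical.

```shell
# Hypothetical helper: print the size of a checkout's .git directory in kilobytes.
# Usage: git_dir_kb [repo_dir]   (defaults to the current directory)
git_dir_kb() {
    # du -sk reports a single total in 1024-byte units; keep only the number
    du -sk "${1:-.}/.git" | awk '{print $1}'
}
```

Running `git_dir_kb LBLRTM2` before and after dropping the tags gives a direct measure of what they cost. `git count-objects -vH` reports a similar figure from Git's side.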

Our R&C repositories -- LNFL, RRTMG_SW, and RRTMG_LW -- do not contain releases before the push to Git, anyway, so there is no need to include all of the tags for LBLRTM. Because of this plan, I did not include tags or branches in the migration:

% cat authors-lblrtm.txt
 =  <>
bobk = bobk <bobk>
cadyp = cadyp <cadyp>
clough = clough <clough>
dgombos = dgombos <dgombos>
dweisens = dweisens <dweisens>
gallery = gallery <gallery>
jdelamer = jdelamer <jdelamer>
kcadyper = kcadyper <kcadyper>
malvarad = malvarad <malvarad>
miacono = miacono <miacono>
mshep = mshep <mshep>
patbrown = patbrown <patbrown>
pbrown = pbrown <pbrown>
rpernak = rpernak <rpernak>
vpayne = vpayne <vpayne>
(no author) = no_author <no_author@no_author>

% cat .gitignore
run_examples

% svn2git https://svn.aer.com/svn/aer/project/RD/LBLRTM/trunk --authors authors-lblrtm.txt --no-minimize-url
% git filter-branch --tree-filter "rm -rf run_examples" --prune-empty HEAD
% git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
% git gc
% foreach TAG ( `git tag` )
foreach> git tag -d $TAG
foreach> end
% git push --all origin
% git push --tags origin
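The `foreach` loop above uses csh-style syntax (as accepted by tcsh and zsh). In a POSIX shell such as bash, the same local tag cleanup could be sketched as below. The function name `delete_local_tags` is hypothetical.

```shell
# Hypothetical helper: delete every tag in a local clone (POSIX sh version of the loop).
# Usage: delete_local_tags [repo_dir]   (defaults to the current directory)
delete_local_tags() {
    repo="${1:-.}"
    # `git tag -l` prints one tag per line; delete them one at a time
    git -C "$repo" tag -l | while IFS= read -r tag; do
        git -C "$repo" tag -d "$tag"
    done
}
```

This only touches the local clone. Tags that already exist on a remote have to be removed there separately.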

Run Examples

But run_examples is still pretty big:

% svn2git https://svn.aer.com/svn/aer/project/RD/LBLRTM/ --authors authors-lblrtm.txt --no-minimize-url
% du -hs */ .git
40K	build/
4.9M	docs/
157M	run_examples/
5.3M	src/
542M	.git

So run_examples are currently not available, but they will be eventually, perhaps in the form of a Jupyter Notebook to help end users with running the model. The directory has been removed from the repository history in a similar fashion, using a StackOverflow solution as guidance.
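One way to confirm that a path really is gone from history after such a rewrite is to ask `git log` for commits that touch it. A minimal sketch follows. `path_in_history` is a hypothetical helper name, and it only inspects refs that exist in the local clone.

```shell
# Hypothetical helper: succeed if any commit reachable from any local ref touches <path>.
# Usage: path_in_history <repo_dir> <path>
path_in_history() {
    repo="$1"
    path="$2"
    # --all walks every ref; the pathspec after `--` limits output to commits
    # that touch the path, so empty output means the path is not in history
    [ -n "$(git -C "$repo" log --all --format=%H -- "$path")" ]
}
```

For example, `path_in_history . run_examples && echo "still in history"` should print nothing once the removal has worked.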

ABSCO Release

LBLRTM and LNFL were both in GitHub before our "official" migration, but they were pushed in an ad hoc fashion for a specific project -- see the ReFRACtor ABSCO repo. Just to be safe, I saved the ABSCO tag as a release and a branch (locally):

% git remote -v
origin	[email protected]:AER-RC/LBLRTM.git (fetch)
origin	[email protected]:AER-RC/LBLRTM.git (push)
% pwd
/Users/rpernak/Work/LBLRTM
% git branch ABSCO
% git checkout ABSCO
Switched to branch 'ABSCO'
% git push -u origin ABSCO

FAQ Word Document

I've removed docs/FAQ_LBLRTM.doc from version control. It has been kept as a PDF, which shows up nicely in the web interface, and (almost) all of its documentation has been placed in the top-level README, so the Word document has been removed from this repository's history:

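# --tree-filter runs from the repository root, so give the full path; -f skips commits that lack the file
% git filter-branch --tree-filter "rm -f docs/FAQ_LBLRTM.doc" --prune-empty HEAD
# no-op if the rewrite above already removed the file from HEAD
% git rm --ignore-unmatch docs/FAQ_LBLRTM.doc
% git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
% echo "*.doc" >> .gitignore
% git commit -a -m 'removed FAQ MS Word doc from history'
% git gc
% git push origin master --force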

Submodules

  • First migrated aer_rt_utils to GitHub, then added it as a submodule (see README notes). Then pushed LBLRTM from AER Gitlab to GitHub.
    • This required a --force because the ABSCO release had its own aer_rt_utils and included the FAQ_LBLRTM.doc that I removed.
    • To check out the ABSCO branch or release, one has to rm -rf aer_rt_utils, then git checkout absco or git checkout tags/absco, respectively. This is because aer_rt_utils was not a submodule in this release.
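
The two-step checkout described in the last bullet can be wrapped in a small function. This is a sketch only: the name `checkout_pre_submodule` is hypothetical, and the comment about why the directory has to go is our reading of Git's behavior, not something verified against this repository.

```shell
# Hypothetical helper: switch a checkout to a ref that predates the submodule.
# Usage: checkout_pre_submodule <repo_dir> <branch_or_tag>
checkout_pre_submodule() {
    repo="$1"
    ref="$2"
    # The older ref tracks aer_rt_utils as ordinary files, so the existing
    # directory presumably blocks the checkout; remove it first.
    rm -rf "$repo/aer_rt_utils"
    git -C "$repo" checkout "$ref"
}
```

For example, `checkout_pre_submodule . tags/absco` would cover the release case. Switching back to the current branch presumably needs `git submodule update --init` afterwards to restore aer_rt_utils.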