Michael Crosby


SVN to GIT and Extracting Individual Projects

This post is still not formatted properly. I switched from WordPress, and because this is a long post I have not had time to update it.

So here is the issue. You first started using version control when you were coding solo, and because of its low learning curve you picked Subversion (SVN). It worked great: you had your own repository, you added all your projects to it, and it was easy. Then you started working with other people on projects, and the very first commit you made to your svn repo ended in conflicts. Not a good way to start. So you moved to a more powerful version control system like Git, and now you wonder why you ever used svn in the first place.

However, you have a year's worth of revision history sitting in your svn repository that you want out and placed into your shiny new git repo. You have already been working in your git repo for a few months: you checked out the latest copy from your svn repo and added it to a new git repo, and now you have branches and commits of all your recent work. It's time to convert your svn to git and unite your past revision history with the present. Let's get to work.

Video is at the bottom if this is confusing.

Problems:

SVN repo has many different projects in the one large repository

SVN has a year's worth of revision history with commits from many different projects

You checked out the latest copy from svn and made git repositories for each of your projects

These individual project git repos now have many commits of their own and branches of your recent work

Goal:

Extract individual projects with only their complete revision history from svn and merge that with your current work and revision history in your individual git repos.

Let's look at our starting point: the SVN revision history and the Git revision history.

Last svn file change and git's last change. You can see that the files are different. SVN on left and GIT on right.

1:

First we need to pull our svn repo down to a machine where we have shell access. I had a large 4GB svn repo with 25+ projects in it. It had 280 commits from various projects, e.g. commit 278 from Encoder Assistant changes, commit 279 from HTTP Assistant changes, and commit 280 from another project. So the svn repo was tracking overall revision history for the entire repo rather than for individual projects. This was a mistake on my part: I was new to svn and my hosting provider offered free private svn repos. It was a free way for me to have revision history and an offsite backup of my code.

To grab your svn repo, first make a new folder and cd into it. Yes, you will have to use the terminal. The repo can live in various locations, and the clone command takes its url:

git svn clone url-to-repo

(not the working copy, the actual svn repo)

Local file:///Users/michael/svnrepo

Http http://yoursite.com/svnrepo

SSH ssh://username@yoursite.com/svnrepo

So to clone I will do this:

[highlight]git svn clone file:///Users/michael/Development/Video/WIP/SVNTOGIT/[/highlight]

If you get an error because of a wrong url to the svn repo, you will have to delete any folders the failed clone created in your current folder before you can try again.

Once the clone succeeds, the terminal output and an ls show that it received 3 revisions (commits) and created a new folder named after the svn repo. Now cd into it and you can see that there are two projects, laid out exactly the same as in the svn repo. If we want to check that git cloned the full revision history we can type:
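If you expect to retry a few times, the cleanup can be folded into the clone itself. This is only a sketch: `clone_svn` is a made-up helper name, and it assumes you pass an explicit target folder (which `git svn clone` accepts as a second argument) so there is exactly one folder to remove when the clone fails.

```shell
#!/bin/sh
# clone_svn URL TARGET
# Hypothetical helper: clone an svn repo into TARGET and remove the
# half-created TARGET folder again if the clone fails.
clone_svn() {
    svn_url=$1
    target=$2

    # Refuse to touch a folder that already exists, so we never
    # delete something the failed clone did not create.
    if [ -e "$target" ]; then
        echo "target '$target' already exists, pick another name" >&2
        return 1
    fi

    if git svn clone "$svn_url" "$target"; then
        echo "cloned $svn_url into $target"
    else
        echo "clone failed, removing $target" >&2
        rm -rf "$target"
        return 1
    fi
}

# Example call, using the repo url from this post and an assumed folder name:
# clone_svn file:///Users/michael/Development/Video/WIP/SVNTOGIT/ SVNTOGIT
```

After a failed call the current folder is left as it was, so you can fix the url and run it again.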

[highlight]git log[/highlight]

See it's all there.

If you only had one project in your svn repo then you can skip the next step.

2:

Now we need to extract one project out of this new git repo. We only want one project per repo in our new setup. To do this we need to filter the repo's history.

Let's extract the contents of ProjectOne to the root dir of this new git repo. (If you have more than one project you will want to clone this git repo first, so that you can do this over and over again for each project.)

[highlight]git filter-branch --subdirectory-filter ProjectOne/ -- --all[/highlight]

Now you can see that it rewrote the branch, and by doing an ls you can see the contents of ProjectOne are now in the root of your repo. In this example that is a single code.txt file.

Let's do a git log again and see that only the ProjectOne revision history is extracted with the ProjectOne files.

You can see only two commits, "init" and "Added more code". These are the only commits that affected ProjectOne. Nothing for ProjectTwo.

Let's make a new dir to keep the same folder structure as the CURRENT git repo that we have been working in.
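To try the extraction without risking the real conversion, here is a rough sketch on a throwaway repo. Everything in it is made up for the demo (the temp folder, the commit identity, the third commit), and it works on a clone with the origin remote removed so the combined repo stays untouched. The `FILTER_BRANCH_SQUELCH_WARNING` variable only silences the notice that newer Git versions print before running filter-branch.

```shell
#!/bin/sh
# Throwaway demo of the subdirectory filter. Run it as a script, not
# pasted into your login shell, because of the "set -e".
set -e

# Demo-only identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)

# Stand-in for the git repo created from svn: two projects, mixed history.
git init -q "$demo/combined"
cd "$demo/combined"
mkdir ProjectOne ProjectTwo
echo "one" > ProjectOne/code.txt
echo "two" > ProjectTwo/code.txt
git add .
git commit -q -m "init"
echo "more" >> ProjectOne/code.txt
git commit -q -am "Added more code"
echo "more" >> ProjectTwo/code.txt
git commit -q -am "ProjectTwo change"

# Work on a clone, one clone per project you want to extract.
git clone -q "$demo/combined" "$demo/ProjectOne-only"
cd "$demo/ProjectOne-only"
git remote remove origin

# Same command as in the post.
git filter-branch --subdirectory-filter ProjectOne/ -- --all

ls                  # contents of ProjectOne are now at the repo root
git log --oneline   # only the commits that touched ProjectOne remain
```

The "ProjectTwo change" commit disappears from the clone's history, while the combined repo still has all three commits.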

[highlight]mkdir ProjectOne[/highlight]

Now we need to move the files back into this ProjectOne folder.

[highlight]mv * ProjectOne[/highlight]

This will give you an error about moving ProjectOne into itself, but it works and moves every other file and folder into our new ProjectOne folder. You can cd into it and check for yourself.
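If the error message bothers you, a small loop that skips the target folder does the same move quietly. This is a sketch on a scratch folder with made-up file names; in the real repo you would only need the `mkdir` and the `for` loop.

```shell
#!/bin/sh
set -e

# Scratch folder with a few demo entries (names are made up).
demo=$(mktemp -d)
cd "$demo"
touch code.txt notes.txt
mkdir docs

mkdir ProjectOne

# Move everything except the target folder itself. The * glob does not
# match dotfiles, so a .git folder would be left alone as well.
for entry in *; do
    if [ "$entry" != "ProjectOne" ]; then
        mv "$entry" ProjectOne/
    fi
done

ls ProjectOne
```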

Now we need to commit the changes that we have just done to the repo.

[highlight]git add .[/highlight]

This stages all the changes that we have made, both the added and the removed files.

[highlight]git commit -a[/highlight]

This will commit all the changes we made. Give it a message for the commit.
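If you prefer to skip the editor, the move can be staged and committed in one go. The sketch below runs on a throwaway repo with a made-up identity and commit message; it uses `git add -A`, which stages additions and removals across the whole repo, and `git commit -m` to pass the message on the command line.

```shell
#!/bin/sh
set -e

# Demo-only identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Throwaway repo standing in for the filtered ProjectOne repo.
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
echo "one" > code.txt
git add code.txt
git commit -q -m "init"

# Recreate the folder layout, then stage and commit the move.
mkdir ProjectOne
mv code.txt ProjectOne/
git add -A
git status --short
git commit -q -m "Move files into ProjectOne/ to match the current repo layout"
```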

3:

Now we need to merge the revision history, branches, etc from our CURRENT repo with the new git repo's svn revisions. We are still in the same folder of the new git repo that we have created.

Let's create a new branch to add our current work into. From here on we will refer to the git repo with the svn history as our OLD repo and the git repo with the recent changes as the CURRENT repo. This will get confusing, so pay attention.

To create a new branch and check it out at the same time we do:

[highlight]git checkout -b current-repo[/highlight]

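Now we need to pull the changes from the CURRENT repo into the current-repo branch of the OLD repo (the one with the svn revision history and the extracted ProjectOne). Note that newer versions of Git (2.9 and later) refuse to merge two repos that share no common history unless you add --allow-unrelated-histories to the pull.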

git pull pathtorepo

So in my example

[highlight]git pull ~/Development/Video/WIP/GitCopy/svn-repo/[/highlight]

As you can see, it pulled the CURRENT repo into the OLD repo, and when it tried to merge the changes into the branch it hit a conflict. Don't worry, resolving merge conflicts in git is easy. If you are on OSX like me, or have a merge tool set up (e.g. FileMerge), all you have to do when you get a conflict on a merge is type:

[highlight]git mergetool[/highlight]

It will tell you what file you are going to be working with and when you hit enter, it will launch the tool so you can work on the conflict in the file.

Now we can see that the code on the left is from the last commit in our svn repo and the code on the right is from our CURRENT git repo with all the changes. Go down through the differences and select the left or right side to keep for each one. I am just going to keep all the right side changes. When you are finished, save the file and close your merge tool.
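If you have no merge tool set up, or you already know you want the CURRENT side for a whole file, the same result can be reached from the terminal. The sketch below replays the pull on two throwaway repos (folder names, file contents and identity are all made up). It assumes a Git version new enough to need `--allow-unrelated-histories`, and passes `--no-rebase` so the pull merges rather than rebases. In a merge, "theirs" is the side being pulled in, which here is the CURRENT repo.

```shell
#!/bin/sh
set -e

# Demo-only identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

demo=$(mktemp -d)

# OLD repo: stands in for the repo carrying the svn history.
git init -q "$demo/old"
cd "$demo/old"
mkdir ProjectOne
echo "code from svn" > ProjectOne/code.txt
git add .
git commit -q -m "Added more code"

# CURRENT repo: stands in for the repo with the recent work.
git init -q "$demo/current"
cd "$demo/current"
mkdir ProjectOne
echo "code from current git work" > ProjectOne/code.txt
git add .
git commit -q -m "Recent work"

# Pull CURRENT into a new branch of OLD. The pull is expected to stop
# on a conflict, so do not let "set -e" abort the script here.
cd "$demo/old"
git checkout -q -b current-repo
git pull --no-rebase --allow-unrelated-histories "$demo/current" \
    || echo "merge stopped on a conflict, resolving by hand"

# Keep the CURRENT side of the conflicted file and mark it resolved.
git checkout --theirs -- ProjectOne/code.txt
git add ProjectOne/code.txt
git status --short
```

The merge is still open at this point; committing concludes it, exactly as it does after using the merge tool.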

Now if we type:

[highlight]git status[/highlight]

We can see that we have the modified file to commit. If you have untracked files, leftover conflicted files, or files with a .orig extension, add or remove those before committing.

[highlight]rm ProjectOne/code.txt.orig[/highlight]
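With more than a handful of conflicts, hunting down the .orig backups one by one gets tedious. Here is a sketch that lists and then deletes all of them below the current folder while leaving the .git folder alone. It runs on a scratch folder with made-up files; in a real repo only the two `find` lines matter. Git also has a `mergetool.keepBackup` setting that you can set to false to stop the backups from being kept in the first place.

```shell
#!/bin/sh
set -e

# Scratch folder with one merged file and its leftover backup.
demo=$(mktemp -d)
cd "$demo"
mkdir ProjectOne
echo "merged" > ProjectOne/code.txt
echo "backup" > ProjectOne/code.txt.orig

# First list the backups so you can check what is about to go...
find . -path ./.git -prune -o -type f -name '*.orig' -print

# ...then delete them.
find . -path ./.git -prune -o -type f -name '*.orig' -exec rm -f {} +
```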

Now commit

[highlight]git commit -a[/highlight]

You can see that git has already filled in a commit message for you, stating that this was a merge from another repo and including the repo path or url. Save and quit your editor and it's done. If we type in:

[highlight]gitk[/highlight]

This will bring up a GUI showing our git repo's history.

Now you can see that we not only have the "init" and "Added more code" commits that originally came from the svn repo, but also all the commits from the git project, added to our new branch. If you want, you can do one more merge to bring the changes from the current-repo branch back into the master branch for the project. Now we can finally call this repository our CURRENT, full revision history, repository of awesomeness.

If you were able to follow along, you're welcome.
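That last merge back into master could look roughly like this. The sketch uses a throwaway repo with made-up commits and identity, and it looks up the name of the main branch instead of hard-coding it, since a fresh `git init` may call it master or main depending on your Git setup.

```shell
#!/bin/sh
set -e

# Demo-only identity so commits work on a machine without git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Throwaway repo standing in for the OLD repo.
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
echo "svn history" > code.txt
git add code.txt
git commit -q -m "init"
main_branch=$(git symbolic-ref --short HEAD)   # "master" or "main"

# Stand-in for the branch that received the CURRENT repo's work.
git checkout -q -b current-repo
echo "current work" >> code.txt
git commit -q -am "Merged current work"

# Switch back and merge the branch in. Nothing new happened on the
# main branch in the meantime, so this is a simple fast-forward.
git checkout -q "$main_branch"
git merge -q current-repo
git log --oneline
```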
