Moving Files and Directories to a New Repository in Git


I’ve needed to move files or directories (along with their histories) from one Git repository into a new repository often enough now that I’m annoyed with myself each time I can’t remember how to do it. Hence, here are my notes on how to accomplish this.

I don’t take any credit for the actual commands mentioned here; everything has been gleaned from the amazing knowledge resource that is Stack Overflow. In particular, several answers there were used when working out the solution presented below.

When one project is really two

Imagine you have a repository which has been growing and growing, and at some point you realise that a part of it is really a project in its own right. How do you take this part (be it a file, a set of files, or an entire subdirectory) and create a new repository containing only these files and their respective histories (and nothing else)?

The trick is to think of the new repository as being the old repository, but with the files (and their histories) that you don’t want to keep removed from it.

This is the process to use:

  • clone the original repository locally
  • enter the clone and remove all files from git that aren’t wanted

Moving files and directories

First, clone the original repo:

$ git clone file:///path/to/original/repository repo_clone

Now enter the clone and remove the origin remote reference (we want to detach the new repository from the original one):

$ cd repo_clone
$ git remote rm origin

Then it’s a simple matter of uttering the following incantation:

$ git filter-branch --prune-empty --index-filter \
      'git ls-tree -z -r --name-only --full-tree $GIT_COMMIT | \
       grep -z -v "file1" | \
       grep -z -v "file2" | \
       grep -z -v "dir1" | \
       xargs -0 -r git rm --cached -r' \
  -- --all

This goes through all commits in the clone of the original repository, looks for files which don’t match the ones you want to keep, and removes their entries from the index. Afterwards you’re left with only the commits that touch the files you’re interested in.
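One thing to watch for with this filter: grep treats file1, file2 and dir1 as patterns that may match anywhere in a path, so something like docs/file10.txt would be kept as well. The sketch below assumes GNU grep (for -z and -E) and anchors the patterns to the whole path; the helper name paths_to_remove is hypothetical and only there to make the example easy to try out. Inside the incantation, the single grep -E call could stand in for the three grep -v calls.

```shell
# Hypothetical helper: reads a NUL-separated list of paths on stdin and
# prints (NUL-separated) the ones that should be REMOVED, i.e. everything
# except file1, file2 and the contents of dir1/ at the top of the repository.
paths_to_remove() {
    grep -z -v -E '^(file1|file2|dir1/.*)$'
}

# Example: show which of four paths would be removed (one per line).
printf 'file1\0docs/file10.txt\0dir1/a.txt\0other.txt\0' |
    paths_to_remove | tr '\0' '\n'
```

Here docs/file10.txt and other.txt are printed, whereas the unanchored grep -v "file1" would have let docs/file10.txt through into the new repository.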

To make sure that everything is cleaned up, you can also run Git’s garbage collector explicitly so that everything that isn’t required really has been purged:

$ git gc --aggressive
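Note that git filter-branch keeps a backup of the pre-rewrite refs under refs/original/, and the reflog also still points at the old commits, so the garbage collector on its own may not discard anything. The following is a minimal sketch of a fuller cleanup, based on the checklist for shrinking a repository in the git filter-branch documentation. The function name purge_old_history is hypothetical, and it is meant to be run inside the filtered clone.

```shell
# Hypothetical helper; run from inside the filtered clone.
purge_old_history() {
    # Delete the backup refs that git filter-branch stores under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done
    # Drop reflog entries that still reference the old commits.
    git reflog expire --expire=now --all
    # Repack and prune everything that is now unreachable.
    git gc --quiet --aggressive --prune=now
}
```

Only do this once you are happy with the result of the filtering, because afterwards the old commits can no longer be recovered from the clone.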

Now rename the directory to something more appropriate for the subproject that has been created and reassign the origin remote pointer (assuming, of course, that the remote bare repository has already been created):

$ git remote add origin git@git.server.example.com:new_repo_name.git
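With the new origin in place, the rewritten history still has to be pushed. A minimal sketch, assuming the remote is called origin as above and that all local branches and tags should be published (the function name publish_all is only illustrative):

```shell
# Hypothetical helper; run from inside the renamed clone.
publish_all() {
    # Push every local branch and set up tracking for each of them.
    git push --quiet -u origin --all &&
    # Tags are not included by --all, so push them separately.
    git push --quiet origin --tags
}
```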

Of course, the moved files still need to be removed from the original repository, and a commit message indicating where they ended up will be very helpful for possible repository archaeology in the future.
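That last step might look like the following sketch, to be run inside the original repository. The function name remove_moved_paths and the wording of the commit message are only suggestions; the paths in the usage comment are the placeholder names used earlier.

```shell
# Hypothetical helper; run from inside the ORIGINAL repository.
# Usage: remove_moved_paths <new-repo-url> <path>...
remove_moved_paths() {
    new_home="$1"
    shift
    # Remove the paths from the working tree and the index.
    git rm -r --quiet -- "$@" &&
    # Record where the files went for future repository archaeology.
    git commit --quiet -m "Move $* to separate repository" \
                       -m "The files and their history now live in $new_home"
}

# Example (not run here):
#   remove_moved_paths git@git.server.example.com:new_repo_name.git file1 file2 dir1
```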

Update: prefer git filter-repo over git filter-branch

A reader made me aware of git filter-repo, which is a more powerful tool for history rewriting than git filter-branch. In fact, the documentation for git filter-branch explicitly warns against its use and recommends git filter-repo instead. The reason for the warning is that git filter-branch has several safety and performance pitfalls which make it potentially dangerous for the casual user. Some people won’t be able to use git filter-repo yet because it requires Git 2.22.0 or later. If you’re in that situation, you’ll have to fall back to the git filter-branch solution.
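A quick way to check whether the installed Git is new enough is to compare version strings. The sketch below assumes a sort that supports -V (GNU coreutils does); version_at_least is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if version $1 is greater than or equal to $2.
version_at_least() {
    # sort -V orders version strings; if $2 sorts first (or is equal), $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# "git --version" prints e.g. "git version 2.30.2"; the third field is the number.
installed="$(git --version | awk '{print $3}')"
if version_at_least "$installed" 2.22.0; then
    echo "git $installed can run git filter-repo"
else
    echo "git $installed is too old for git filter-repo"
fi
```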

The command git filter-repo isn’t part of the standard suite of Git tools, so it’s necessary to install it before you can use it. At the time of writing, it hadn’t yet been packaged for Debian (although it was packaged by some other Linux distributions), so Debian users had to install it from source. Since it comes as a single Python script, the installation is very simple: just put the file somewhere in your $PATH.

To install the script, grab and unpack the latest tarball:

$ wget https://github.com/newren/git-filter-repo/releases/download/v2.29.0/git-filter-repo-2.29.0.tar.xz
$ tar -xvJf git-filter-repo-2.29.0.tar.xz

and copy the git-filter-repo script into a directory in your $PATH such as $HOME/bin:

$ cp git-filter-repo-2.29.0/git-filter-repo $HOME/bin/

Now the filter-repo git subcommand will be available; in other words, you can run the command as git filter-repo.

So how do we use filter-repo to filter files as described in the filter-branch example above? Again, clone the repo, enter the clone and remove the origin:

$ git clone file:///path/to/original/repository repo_clone
$ cd repo_clone
$ git remote rm origin

then run

$ git filter-repo --path "path/to/file1" --path "path/to/file2" --path "dir1"

which, as you can see, is significantly easier to use than the previous solution. Note that you’ll need to specify the full paths (relative to the top of the repository) of the files you want to keep; the filter-branch solution used grep, so any path containing the given string was matched.
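If you are not sure which full path a file had at some point in its history, Git can list every path that any commit has touched. A minimal sketch, to be run inside the clone before filtering (paths_in_history is a hypothetical helper name):

```shell
# Hypothetical helper: print every path touched by any commit on any ref,
# one per line, sorted and without duplicates.
paths_in_history() {
    git log --all --pretty=format: --name-only | sed '/^$/d' | sort -u
}

# Example (not run here): find the full path(s) of anything named file1.
#   paths_in_history | grep file1
```

A file that was renamed along the way shows up under each of its names, which is a useful reminder to pass every one of them to --path if its full history should be kept.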

Much more information, including several examples, is available in the extensive git filter-repo documentation.

Moving just a directory

The definitive guide to moving a subdirectory is in the answer to this question on Stack Overflow: Detach (move) subdirectory into separate Git repository

To paraphrase that answer, here is how to extract just the given directory, pulling in all branches and tags.

$ git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter <dirname> -- --all

If you don’t want all tags and branches you can just rewrite the current HEAD by using this version of the command:

$ git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter <dirname> HEAD

Making the complex simple: git subtree

It turns out that splitting a subdirectory of a project out into a new project is sufficiently common that there is also a Git command especially for it:

$ git subtree split

Again, clone the original repo. Strictly speaking, git subtree split doesn’t touch your existing branches (it builds the extracted history on a new branch), but working in a clone effectively gives you a backup of your repository, which is always a good idea when experimenting with history. As I saw on a T-shirt recently: “No backup? No pity!”.

$ git clone file:///path/to/original/repository repo_clone
$ cd repo_clone

Now we split a subdirectory of the repository (called the “prefix” in git subtree terminology) into its own “project” and create a new branch with just this subdirectory and its history.

$ git subtree split --prefix <dirname> --branch <new-project-name>

If you check out the new branch

$ git checkout <new-project-name>

you’ll find only the files from the subdirectory that you just split from the original project. Assuming that you’ve already made a bare repository for the new project, you can now add the bare repository as a remote and push this branch to the new project’s master branch:

$ git remote add <new-project-origin> git@git.server.example.com:new_repo_name.git
$ git push -u <new-project-origin> <new-project-name>:master
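As a sanity check, the contents of the split branch can also be inspected without checking it out. A small sketch (files_on_branch is a hypothetical helper name; pass it the branch created by git subtree split):

```shell
# Hypothetical helper: list every file on the given branch, one per line.
files_on_branch() {
    git ls-tree -r --name-only "$1"
}

# Example (not run here):
#   files_on_branch <new-project-name>
```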

The nice, clean, shiny project repository can now be cloned from upstream:

$ git clone git@git.server.example.com:new_repo_name.git

And that’s it! I hope that helped someone and that it helps my forgetful future self :-)

Is there anything that I’ve missed? Was this post helpful? How could I make it better? Let me know by dropping me a line via email or pinging me on Mastodon.


Update (2022-04-01)

Since publishing this post, a reader pointed out a tool he is working on called Git X-Modules to solve the problem outlined above. He describes it like so:

The tool is designed to help with migration to monorepo and as a replacement to Git submodules/subtree. Unlike one-time conversion described in your article, it continuously syncs old and new repository, this makes the migration smooth.

Note that I haven’t used Git X-Modules and hence can’t endorse it, however it might be of interest to people looking for a higher-level solution to the problem of moving files to a new Git repository.
