Moving Files and Directories to a New Repository in Git
I’ve needed to move files or directories (along with their histories) from one Git repository into a new repository often enough now that I’m annoyed with myself each time I can’t remember how to do it. Hence, here are my notes on how to accomplish this.
I don’t take any credit for the actual commands mentioned here; everything has been gleaned from the amazing knowledge resource that is Stack Overflow. In particular, these answers were used when working out the solution presented below:
- How to split a git repository while preserving subdirectories?
- Splitting a set of files within a git repo into their own repository, preserving relevant history
- How to move a file from one git repository to another while preserving history
When one project is really two
Imagine you have a repository which has been growing and growing, and at some point you realise that a part of the repository is really a project on its own. How do you take this part (be it a file, a set of files, or an entire subdirectory) and create a new repository containing only these files and their respective histories (and no others)?
The trick is to think of the new repository as being the old repository, but with the files (and their histories) that you don’t want to keep removed from it.
This is the process to use:
- clone the original repository locally
- enter the clone and remove all files from git that aren’t wanted
Moving files and directories
First, clone the original repo:
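A minimal sketch of the clone step. The names original-repo and new-project are hypothetical, and a throwaway local repository stands in for the real one so that the commands run as written; in practice you would pass the real repository's URL to git clone.

```shell
set -eu
# Identity used only for the demo commit (hypothetical values).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway stand-in for the original repository.
work=$(mktemp -d)
cd "$work"
git init -q original-repo
git -C original-repo commit -q --allow-empty -m "Initial commit"

# The actual step: clone the original into a directory that is safe to rewrite.
git clone -q original-repo new-project
```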
Now remove the origin remote reference (we want to detach the new repository from the history of the original one):
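Removing the remote could look like the sketch below. The directory name and URL are hypothetical; a freshly initialised repository with an origin remote added by hand stands in for the clone.

```shell
set -eu
# Stand-in for the fresh clone: a repository that has an "origin" remote.
work=$(mktemp -d)
cd "$work"
git init -q new-project
cd new-project
git remote add origin https://example.com/original-repo.git  # hypothetical URL

# The actual step: drop the reference to the original repository.
git remote rm origin

# Lists the configured remotes; there should be none left.
git remote -v
```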
Then it’s a simple matter of uttering the following incantation:
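One plausible form of such an incantation is sketched here; it is an assumption rather than a definitive command. The file names wanted.txt and unwanted.txt are hypothetical, the scratch repository exists only to make the example runnable, and the -r option to xargs (skip the command when there is nothing to remove) is assumed to be available, as it is with GNU xargs. File names containing whitespace would need more careful handling.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1  # skip filter-branch's warning pause

# Scratch repository with one file to keep and one to drop.
work=$(mktemp -d)
cd "$work"
git init -q new-project
cd new-project
echo keep > wanted.txt
echo drop > unwanted.txt
git add .
git commit -q -m "Add both files"

# For every commit: list its files, select those that do NOT match the
# file to keep, and remove them from the index. Commits that end up
# empty are pruned.
git filter-branch --prune-empty --index-filter '
    git ls-tree -r --name-only --full-tree "$GIT_COMMIT" |
    grep -vxF wanted.txt |
    xargs -r git rm --cached -q --ignore-unmatch --
' -- --all

# Lists the files tracked after the rewrite.
git ls-files
```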
This goes through all commits in the clone of the original repository, looks for files which don’t match the files you want to keep, and removes their entries from the index. Afterwards you’re left with only the commits that touch the files you’re interested in.
To make sure that everything is cleaned up, you can also run Git’s garbage collector explicitly so that everything that isn’t required really has been purged:
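A sketch of such a clean-up, under the assumption that git filter-branch left its backup references under refs/original/. The scratch repository and the backup reference created in it are stand-ins so that the commands can be run as written.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Scratch repository with a simulated filter-branch backup reference.
work=$(mktemp -d)
cd "$work"
git init -q new-project
cd new-project
git commit -q --allow-empty -m "Initial commit"
git update-ref refs/original/refs/heads/backup HEAD

# Delete the backup references, which would otherwise keep old objects alive.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
    git update-ref -d "$ref"
done

# Expire the reflog and collect everything that is now unreachable.
git reflog expire --expire=now --all
git gc --quiet --aggressive --prune=now
```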
Now rename the directory to something more appropriate for the subproject that has been created and reassign the origin remote pointer (assuming, of course, that the remote bare repository has already been created):
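The rename and the new remote might be set up as in this sketch. All names are hypothetical, and a local bare repository stands in for the real remote.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare new-project.git  # stand-in for the remote bare repository
git init -q original-repo-clone     # stand-in for the filtered clone

# Give the directory a name that suits the subproject.
mv original-repo-clone new-project
cd new-project

# Point origin at the new project's repository.
git remote add origin "$work/new-project.git"

# Shows where origin now points.
git config --get remote.origin.url
```

Once the remote is in place, the rewritten branches can be published with git push.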
Of course, the moved files need to be removed from the original repository and a commit message indicating where they ended up would be very helpful for possible repository archaeology in the future.
Update: prefer git filter-repo over git filter-branch
A reader made me aware of git filter-repo, which is a more powerful tool for history rewriting than git filter-branch. In fact, the documentation for git filter-branch explicitly warns against its use and recommends that users prefer git filter-repo instead.
The reason for the warning is that git filter-branch has several safety and performance pitfalls which make using it potentially dangerous for the casual user. Some people won’t be able to use git filter-repo yet because it requires git >= 2.22.0 in order to work. If you’re in that situation, you’ll have to fall back to the git filter-branch solution.
The command git filter-repo isn’t part of the standard suite of Git tools, hence it’s necessary to install it before you can use it. At the time of writing, it hadn’t yet been packaged for Debian (although it was packaged by some other Linux distributions), so Debian users needed to install it from source. Since it comes as a single Python script, the installation is very simple: just put the file somewhere in your $PATH.
To install the script, grab and unpack the latest tarball and copy the git-filter-repo script into a directory in your $PATH, such as $HOME/bin.
Now the filter-repo git subcommand will be available; in other words, you can run the command as git filter-repo.
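The reason this works is that Git runs any executable named git-something that it finds on the $PATH as the subcommand git something. The sketch below illustrates the mechanism with a hypothetical stub script called git-hello; it is not part of git filter-repo.

```shell
set -eu
# Temporary directory playing the role of $HOME/bin.
bindir=$(mktemp -d)

# A stub script whose name follows the git-<subcommand> convention.
cat > "$bindir/git-hello" <<'EOF'
#!/bin/sh
echo "hello from a custom git subcommand"
EOF
chmod +x "$bindir/git-hello"

# With the directory on the PATH, Git dispatches "git hello" to the script.
PATH="$bindir:$PATH" git hello
```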
So how do we use filter-repo to filter files as described in the filter-branch example above? Again, clone the repo and remove the origin, then run git filter-repo with a --path option for each file or directory you want to keep. This is significantly easier to use than the previous solution. Note that you’ll need to specify full paths to the files you want to keep; the filter-branch solution used a grep, hence the paths weren’t as relevant in that case.
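A sketch of the filter-repo variant, assuming git filter-repo is installed. The file names are hypothetical, and --force is passed only because the scratch repository used here is not a fresh clone; on a real fresh clone it should not be needed.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Scratch repository with one file to keep and one to drop.
work=$(mktemp -d)
cd "$work"
git init -q new-project
cd new-project
mkdir docs
echo keep > docs/wanted.txt
echo drop > unwanted.txt
git add .
git commit -q -m "Add both files"

if command -v git-filter-repo >/dev/null 2>&1; then
    # Paths are given in full, relative to the repository root.
    # Repeat --path for every file or directory to keep.
    git filter-repo --force --path docs/wanted.txt
    git ls-files
else
    echo "git filter-repo is not installed; skipping"
fi
```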
Much more information, including several examples, is available in the
extensive git filter-repo
documentation.
Moving just a directory
The definitive guide to moving a subdirectory is in the answer to this question on Stack Overflow: Detach (move) subdirectory into separate Git repository
To paraphrase that answer, here is how to extract just the given directory, pulling in all branches and tags.
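A sketch of what that command might look like, using git filter-branch with its --subdirectory-filter option. The directory name subproject, the file names and the tag are hypothetical; the scratch repository is only there to make the example runnable.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1  # skip filter-branch's warning pause

# Scratch repository with a subdirectory worth extracting.
work=$(mktemp -d)
cd "$work"
git init -q new-project
cd new-project
mkdir subproject
echo lib > subproject/lib.txt
echo app > app.txt
git add .
git commit -q -m "Add app and subproject"
git tag v1.0

# Make the subdirectory the new repository root on all branches (--all),
# keep tag names as they are (--tag-name-filter cat) and drop commits
# that no longer change anything (--prune-empty).
git filter-branch --prune-empty --tag-name-filter cat \
    --subdirectory-filter subproject -- --all

# Lists the tracked files, now relative to the old subdirectory.
git ls-files
```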
If you don’t want all tags and branches you can just rewrite the current HEAD by using this version of the command:
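A sketch of the HEAD-only variant, again with hypothetical names. The extra branch called other is created only to show that branches other than the current one keep their original content.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1  # skip filter-branch's warning pause

work=$(mktemp -d)
cd "$work"
git init -q new-project
cd new-project
mkdir subproject
echo lib > subproject/lib.txt
echo app > app.txt
git add .
git commit -q -m "Add app and subproject"
git branch other  # a second branch that should stay as it is

# Rewrite only the history reachable from the current HEAD.
git filter-branch --prune-empty --subdirectory-filter subproject HEAD

git ls-files                          # files on the rewritten branch
git ls-tree -r --name-only other      # files on the untouched branch
```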
Making the complex simple: git subtree
It turns out that splitting a subdirectory of a project out into a new project is sufficiently common that there is also a Git command especially for it: git subtree.
Again, clone the original repo. This is effectively a backup of your repository, which is a good idea whenever you run commands that manipulate history (although git subtree split itself only adds new commits and leaves your existing branches untouched). As I saw on a T-shirt recently: “No backup? No pity!”.
Now we split a subdirectory of the repository (called the “prefix” in git subtree terminology) into its own “project” and create a new branch with just this subdirectory and its history.
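A sketch of the split, assuming the git subtree command is installed (some distributions package it separately). The directory name subproject and the branch name subproject-only are hypothetical.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Scratch repository with a subdirectory worth extracting.
work=$(mktemp -d)
cd "$work"
git init -q original-clone
cd original-clone
mkdir subproject
echo lib > subproject/lib.txt
echo app > app.txt
git add .
git commit -q -m "Add app and subproject"

# --prefix names the subdirectory to extract; -b names the branch that
# receives the extracted history. Run this from the top of the work tree.
git subtree split --prefix=subproject -b subproject-only

# Lists the files on the new branch.
git ls-tree -r --name-only subproject-only
```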
If you check out the new branch you’ll find only the files from the subdirectory that you just split from the original project. Assuming that you’ve already made a bare repository for the new project, you can now add the bare repository as an upstream reference and push this branch to the new project’s master branch:
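A sketch of the push, with hypothetical names throughout. A local bare repository stands in for the new project's repository, and an ordinary branch called subproject-only stands in for the branch produced by the split.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q --bare new-project.git  # stand-in for the new bare repository
git init -q original-clone
cd original-clone
git commit -q --allow-empty -m "Stand-in for the split history"
git branch subproject-only

# Register the bare repository under the name "upstream" ...
git remote add upstream "$work/new-project.git"

# ... and push the split branch to its master branch.
git push -q upstream subproject-only:master
```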
The nice, clean, shiny project repository can now be cloned from upstream:
And that’s it! I hope that helped someone and that it helps my forgetful future self :-)
Is there anything that I’ve missed? Was this post helpful? How could I make it better? Let me know by dropping me a line via email or pinging me on Mastodon.
Update (2022-04-01)
Since publishing this post, a reader pointed out a tool he is working on called Git X-Modules to solve the problem outlined above. He describes it like so:
The tool is designed to help with migration to monorepo and as a replacement to Git submodules/subtree. Unlike one-time conversion described in your article, it continuously syncs old and new repository, this makes the migration smooth.
Note that I haven’t used Git X-Modules and hence can’t endorse it; however, it might be of interest to people looking for a higher-level solution to the problem of moving files to a new Git repository.