Filter branch¶
This chapter describes some uses of git filter-branch.
Before using it, you should be aware that this command is destructive:
even commits that are themselves untouched get new object names once one
of their ancestors is rewritten, so your new branch is separate from the
original one. If your repository was ever shared, everyone downstream is
forced to fix their history manually, by rebasing all their topic
branches onto the new HEAD.
More details are in
git-rebase(1) - recovering from upstream rebase
When filtering branches, the original refs are stored in the namespace
refs/original/
so you can always recover your work from there. If
you want to delete the previous state, after checking that the new one is
coherent, you need to delete these refs; otherwise the original objects
will not be garbage collected.
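As an illustration of recovering from refs/original/, here is a minimal sketch run in a throwaway repository. The file names, commit messages and identity are made up for the example, and FILTER_BRANCH_SQUELCH_WARNING is set on the assumption that your Git is recent enough to print a deprecation notice (older versions simply ignore the variable).

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice and its delay

# Throwaway repository with two commits (all names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
echo keep > goodfile && git add goodfile && git commit -q -m "add goodfile"
echo secret > badfile && git add badfile && git commit -q -m "add badfile"
branch=$(git symbolic-ref --short HEAD)
before=$(git rev-parse HEAD)

# Rewrite the branch; the old tip is saved under refs/original/.
git filter-branch --index-filter \
    'git rm --cached --ignore-unmatch badfile' HEAD >/dev/null 2>&1
saved=$(git rev-parse "refs/original/refs/heads/$branch")
git for-each-ref --format='%(refname)' refs/original

# Changed your mind: move the branch back to the saved state.
git reset -q --hard "refs/original/refs/heads/$branch"
restored=$(git rev-parse HEAD)
```

The saved ref has the same object name as the branch tip before the rewrite, so resetting to it brings back the old history, badfile included.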
If you want to experiment without the trouble of recovering from
refs/original
you should work on a copy of your repository, made
with:
git clone path_of_origin path_of_copy
cd path_of_copy
git branch --unset-upstream
git reset --hard
References¶
- The main reference is git documentation: filter-branch
- It is introduced in S. Chacon's Pro Git, in the Rewriting History chapter and in Maintenance and Data Recovery - Removing Objects.
Removing an object or a directory¶
This can be done with --tree-filter
or --index-filter.
As the
second one does not check out the tree, it is a lot quicker.
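For comparison with the --index-filter commands used in the rest of this section, here is a sketch of the --tree-filter form in a throwaway repository. With a tree filter the command is an ordinary shell command run in the checked-out tree of each commit, so a plain rm is enough. File names and identity are illustrative, and FILTER_BRANCH_SQUELCH_WARNING is assumed to be honoured by your Git version.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice and its delay

# Throwaway repository with three commits (all names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
echo keep > goodfile && git add goodfile && git commit -q -m "add goodfile"
echo secret > badfile && git add badfile && git commit -q -m "add badfile"
echo more >> goodfile && git commit -q -a -m "edit goodfile"

# Each commit is checked out, then the command runs in the work tree.
# -f keeps rm quiet for commits where badfile does not exist yet.
git filter-branch --tree-filter 'rm -f badfile' HEAD >/dev/null 2>&1

# No commit reachable from HEAD touches badfile any more.
remaining=$(git log --oneline -- badfile)
commits=$(git rev-list --count HEAD)
```

Without --prune-empty the number of commits is unchanged; the commit that only added badfile is still there, but it is now empty.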
When filtering branches you may remove all the changes introduced by some commit and end up with an empty commit. Some of these empty commits are useful because they have several parents, i.e. they record a merge.
To drop the useless ones you can use --prune-empty
which removes empty commits that have at most one parent and leaves merge commits intact (but it is
incompatible with --commit-filter
).
Your command will be:
git filter-branch --prune-empty --index-filter \
'git rm --cached --ignore-unmatch badfile' HEAD
Here the git rm
command has the option --cached
since we are
working on the index, and --ignore-unmatch
because the file can be
absent from the index for some commits, like those preceding the first
occurrence of the file.
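The effect of these options can be checked in a throwaway repository. The sketch below uses made-up file names and an illustrative identity; the first commit does not contain badfile, which is the case --ignore-unmatch is there for, and the second commit only adds badfile, so it becomes empty and is pruned. FILTER_BRANCH_SQUELCH_WARNING is assumed to be honoured by your Git version.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice and its delay

# Throwaway repository with three commits (all names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
echo keep > goodfile && git add goodfile && git commit -q -m "add goodfile"
echo secret > badfile && git add badfile && git commit -q -m "add badfile"
echo more >> goodfile && git commit -q -a -m "edit goodfile"
before=$(git rev-list --count HEAD)

# Remove badfile from the index of every commit and drop emptied commits.
git filter-branch --prune-empty --index-filter \
    'git rm --cached --ignore-unmatch badfile' HEAD >/dev/null 2>&1

after=$(git rev-list --count HEAD)
remaining=$(git log --oneline -- badfile)
```

Here the history goes from three commits to two, and git log no longer finds any commit touching badfile on the rewritten branch.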
If you would rather delete the full content of a directory, add
the -r
option to make the removal recursive:
git filter-branch --prune-empty --index-filter \
'git rm -r --cached --ignore-unmatch baddir' HEAD
If your object or directory is in many branches, cleaning HEAD will not get rid of it. In this case you should clean all refs and filter all tags with:
git filter-branch --prune-empty --index-filter \
'git rm --cached --ignore-unmatch badfile' \
--tag-name-filter cat -- --all
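A small sketch of why HEAD alone is not enough: in the throwaway repository below, badfile only exists on a side branch and under a tag, so it is never seen when filtering HEAD. The branch name topic, the tag v1, the file names and the identity are all illustrative, and FILTER_BRANCH_SQUELCH_WARNING is assumed to be honoured by your Git version.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice and its delay

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
echo keep > goodfile && git add goodfile && git commit -q -m "add goodfile"
branch=$(git symbolic-ref --short HEAD)

# badfile lives only on a side branch, which is also tagged.
git checkout -q -b topic
echo secret > badfile && git add badfile && git commit -q -m "add badfile"
git tag v1
git checkout -q "$branch"
echo more >> goodfile && git commit -q -a -m "edit goodfile"

# Rewrite every ref, and let the tags follow the rewritten commits.
git filter-branch --prune-empty --index-filter \
    'git rm --cached --ignore-unmatch badfile' \
    --tag-name-filter cat -- --all >/dev/null 2>&1

# Look at branches and tags only: refs/original/ still has the old history.
remaining=$(git log --branches --tags --oneline -- badfile)
```

The check deliberately uses --branches --tags rather than --all, because the backup refs under refs/original/ still point at the unfiltered commits.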
If your unwanted blob was renamed along the history, it is still kept under its older names. You can find these names with:
git log --name-only --follow --all -- badfile
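Once the older names are known, one possible way to get rid of all of them is to list every name in the same git rm command. The sketch below assumes a file first committed as oldname and later renamed to badfile; both names, the identity and the repository are made up for the example, and FILTER_BRANCH_SQUELCH_WARNING is assumed to be honoured by your Git version.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice and its delay

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
echo secret > oldname && git add oldname && git commit -q -m "add oldname"
echo keep > goodfile && git add goodfile && git commit -q -m "add goodfile"
git mv oldname badfile && git commit -q -m "rename oldname to badfile"

# Collect every path the file has had (empty format, blank lines dropped).
names=$(git log --pretty=format: --name-only --follow -- badfile | grep . | sort -u)
echo "$names"

# Remove all the names in one pass.
git filter-branch --prune-empty --index-filter \
    'git rm --cached --ignore-unmatch oldname badfile' HEAD >/dev/null 2>&1

remaining=$(git log --oneline -- oldname badfile)
```

Since --ignore-unmatch is given, it does not matter that each commit contains at most one of the two names.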
After the rewrite, your history no longer contains a reference to badfile
but the refs under refs/original/
and the reflog still do. You have
two options. If you have no backup, you should do:
git clone file:///path/to/cleanrepo
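With a file:// URL the clone goes through the regular Git transport instead of hardlinking the object store, so the clone will not have the removed objects. Cloning with a plain path would just hardlink everything, removed objects included.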
If you have already made a backup as proposed above,
you can clean up the repository in place before repacking.
After a filter-branch, git keeps the original refs, which prevents the
previously referenced objects from becoming unreachable and being cleaned
by garbage collection. If you want to keep them longer, you had better
rename them to prevent them from being overwritten by a later operation
(you can also control the backup namespace with the --original
option). If you want to get rid of them, delete these refs:
git for-each-ref --format='%(refname)' refs/original | \
xargs -n 1 git update-ref -d
Then expire your reflogs:
git reflog expire --expire=now --all
And you garbage collect all unreferenced objects with:
git gc --prune=now
More details in the section garbage collection.
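To convince yourself that the object is really gone, you can record its name before the rewrite and ask git cat-file for it afterwards. The following is a sketch in a throwaway repository, with illustrative file names and identity; FILTER_BRANCH_SQUELCH_WARNING is assumed to be honoured by your Git version.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice and its delay

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"
echo keep > goodfile && git add goodfile && git commit -q -m "add goodfile"
echo secret > badfile && git add badfile && git commit -q -m "add badfile"
blob=$(git rev-parse HEAD:badfile)       # object name of the unwanted content

git filter-branch --prune-empty --index-filter \
    'git rm --cached --ignore-unmatch badfile' HEAD >/dev/null 2>&1

# The blob is still stored: refs/original/ and the reflog keep it reachable.
if git cat-file -e "$blob" 2>/dev/null; then kept=yes; else kept=no; fi

# Drop the backup refs, expire the reflogs, then collect garbage.
git for-each-ref --format='%(refname)' refs/original | \
    xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$blob" 2>/dev/null; then gone=no; else gone=yes; fi
echo "kept after filter-branch: $kept, gone after gc: $gone"
```

git cat-file -e only reports, through its exit status, whether the object exists, so it is a convenient check that does not print the content.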
Note: Many collaborative hosting services, like GitHub, Bitbucket and others, will not let you push back your deletions, so if you really want to be sure nobody can get your old file, you will have to delete these repositories and push new ones.