Completely removing accidentally committed files from the Git history

You've accidentally added and committed a large binary file or several large files to a Git repository? And to make things even worse, you've already pushed your changes to the remote?

The most important thing is to understand that simply deleting the files with a subsequent commit is not going to help you: The files will still be part of the repository's history, blowing up its size. So you'll need to remove them from the history as well.
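To see why a follow-up deletion is not enough, here is a minimal sketch in a throwaway repository. The repository location, the file name big.bin, and the 1 KiB size are made up for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository so nothing real is touched (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a binary file "by accident" ...
head -c 1024 /dev/zero > big.bin
git add big.bin
git commit -q -m "Add big.bin by accident"

# ... and try to fix it with a subsequent commit.
git rm -q big.bin
git commit -q -m "Remove big.bin"

# Both commits still show up when asking for the file's history ...
git log --oneline -- big.bin

# ... and the old content is still stored: this prints the blob size in bytes.
git cat-file -s HEAD^:big.bin
```

The file is gone from the working tree, but the first commit still references its content, so every clone keeps carrying it around.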

If you search around, you'll find lots of different solutions, ranging from plain Git commands and custom scripts to full-fledged tools. However, as long as you're on a branch whose history you can modify without causing trouble for others (i.e., you can force-push without creating a mess), it's not that dire. If you haven't pushed anything yet to a remote repository, even better.

Please note that I wrote up this post as a personal reference and to share with students, collaborators, and colleagues. It assumes that you are reasonably familiar with Git. If you're not certain that you understand the previous two paragraphs, please don't proceed blindly, as you'll likely lose data or irreversibly mess up the entire Git repository. (If I sent you this link personally, reach out and we'll go through the steps together.)

The problematic file(s) are in the most recent commit

Congratulations, that's the best-case scenario!

First, do a soft reset to the previous commit:

git reset --soft HEAD^

Then, inspect and verify the repository's state:

git status

Now you can unstage the problematic files:

git restore --staged PATH/OF/PROBLEMATIC/FILE(S)

Verify that all problematic files have been unstaged using git status.

At this point, you most likely want to edit your .gitignore file (or create one) to prevent the problem from re-occurring. Don't forget to add this file to the staging area!

Finally, create a new commit. You can just pull up the original git commit call and execute it again.

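All that's left to do is a git push -f (the force is only needed if the original commit had already been pushed; otherwise a plain git push will do) and you're good to go.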
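The local part of these steps can be tried end to end in a throwaway repository. This is only a sketch: the file names (script.py, big.bin) and commit messages are hypothetical, there is no remote involved, and git restore requires Git 2.23 or newer.

```shell
#!/bin/sh
set -eu

# Throwaway repository so nothing real is touched (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

# The accidental commit: useful work plus a binary that should not be tracked.
echo "print('hi')" > script.py
head -c 1024 /dev/zero > big.bin
git add script.py big.bin
git commit -q -m "Add script"

# 1. Undo the commit but keep all of its changes staged.
git reset --soft HEAD^

# 2. Unstage the problematic file; it stays in the working tree.
git restore --staged big.bin

# 3. Ignore it from now on and stage the .gitignore.
echo "big.bin" >> .gitignore
git add .gitignore

# 4. Recreate the commit, this time without the binary.
git commit -q -m "Add script"

# List every path that appears anywhere in the branch's history.
git log --pretty=format: --name-only | sort -u
```

The final listing should no longer contain big.bin, while the file itself is still sitting untouched in the working directory.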

The commits with the problematic file(s) are buried in the branch's history

In this case, you could follow the same procedure as above: Do a soft reset to the commit right before the one that introduced the first of the problematic files, deal with them, and then recreate your commits. But you'll likely either lose your mind trying to assemble meaningful commits using git add -p or give up and simply add everything in one giant commit.

There is a much more convenient solution using an interactive rebase. I learned this from the Microsoft Azure Repos documentation.

Here's the short version:

First, figure out the most recent "good" commit (the one before the commit that added the first problematic file) using git log or gitk. You'll need the first handful of characters of the commit hash.

Second, inspect the history of your branch (again, using git log or gitk, or by browsing it on GitHub or GitLab) and note down which commits introduced problematic files. (Alternatively, you can also mark all commits as edit during the interactive rebase in the next step and inspect the history as you're rewriting it. However, I strongly recommend you go in with a clear plan and complete understanding of the entire history you're about to modify.)
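If you already know the path of a problematic file, git log can do part of this detective work for you. The following sketch builds a small throwaway history (file names and commit messages are hypothetical) and then looks up the commits that added the file as well as the "good" commit right before the first of them.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a short history (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"
expected_good=$(git rev-parse HEAD)

head -c 1024 /dev/zero > big.bin
git add big.bin
git commit -q -m "Add big.bin by accident"

echo "more" >> README.md
git commit -q -am "Update README"

# Commits that added the file (--diff-filter=A keeps only additions).
git log --oneline --diff-filter=A -- big.bin

# git log lists newest first, so the last line is the oldest such commit.
first_bad=$(git log --diff-filter=A --format=%H -- big.bin | tail -n 1)

# Its parent is the most recent "good" commit.
good=$(git rev-parse "$first_bad^")
git log --oneline -1 "$good"
```

With several problematic files, you can pass all of their paths after the double dash. Treat the result as a starting point and still look over the history yourself before rewriting it.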

Third, start an interactive rebase of your branch onto this commit:

git rebase -i HASH_OF_THE_MOST_RECENT_GOOD_COMMIT

A text editor will come up that shows all commits after the "good" commit, up to and including the one you're currently on. Note that they are listed oldest first, i.e., in the opposite order of git log. By editing the text file presented to you, you can select which commits to keep (pick), to omit entirely (drop), or to amend (edit). The full list of commands available is included in the file Git presents to you.

Saving the file and closing the text editor starts the rebase process. From here on out, it works like a standard rebase. The only exception is that Git stops to allow you to modify the commits you marked as edit before the rebase continues. Note that even though you're "rebasing the branch on itself", conflicts might still occur if you've dropped intermediate commits.
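What to do when Git stops at a commit marked as edit can be sketched as follows. To keep the example non-interactive, the GIT_SEQUENCE_EDITOR variable is set to a sed call that stands in for editing the todo list by hand; in real use you would simply change pick to edit in your editor. The repository, file names, and commit messages are hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway repository (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"       # the most recent "good" commit
good=$(git rev-parse HEAD)

echo "v1" > notes.txt
head -c 1024 /dev/zero > big.bin
git add notes.txt big.bin
git commit -q -m "Add notes"            # also sneaks in big.bin

echo "v2" >> notes.txt
git commit -q -am "Update notes"        # unrelated later work

# Stand-in for the manual step: mark the first (oldest) commit as edit.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -i "$good"

# Git has now stopped right after "Add notes". Remove the file from that
# commit only; --cached keeps the copy in the working tree.
git rm -q --cached big.bin
git commit -q --amend --no-edit

# Let Git replay the remaining commits.
GIT_EDITOR=true git rebase --continue

git log --oneline --stat
```

After the rebase, both commits are still there with their original messages, but "Add notes" now only contains notes.txt. If a later commit had modified big.bin as well, the rebase would stop there with a conflict for you to resolve.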

Don't forget to edit your .gitignore file (or create one) to prevent the problem from re-occurring. (And don't forget to commit it at some point!)

Once the rebase has been completed and you've verified that the history no longer includes the problematic files, git push -f and you're good to go.
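The final check and the force-push can be rehearsed safely against a local bare repository standing in for the remote. This is a sketch under made-up names (big.bin, the temporary directories, the commit messages); the history rewrite in the middle is the simple last-commit case, but the check and the push work the same way after an interactive rebase.

```shell
#!/bin/sh
set -eu

# Bare repository standing in for the remote (hypothetical local path).
remote=$(mktemp -d)
git init -q --bare "$remote"

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git remote add origin "$remote"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"

head -c 1024 /dev/zero > big.bin
git add big.bin
git commit -q -m "Add big.bin by accident"

# The mistake gets pushed.
branch=$(git symbolic-ref --short HEAD)
git push -q -u origin "$branch"

# Rewrite the history: replace the bad commit with one that ignores the file.
git reset -q --soft HEAD^
git restore --staged big.bin
echo "big.bin" > .gitignore
git add .gitignore
git commit -q -m "Ignore big.bin"

# Verify: no commit on the branch may touch the path any more.
if [ -n "$(git log --oneline -- big.bin)" ]; then
    echo "big.bin is still part of the history" >&2
    exit 1
fi

# The local and remote histories have diverged, so the push must be forced.
git push -q -f origin "$branch"
```

If other people might have pushed to the branch in the meantime, git push --force-with-lease is a more cautious alternative to -f: it refuses to overwrite remote commits you haven't fetched yet.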