Tutorial: Remove sensitive files and their commits from Git history



Question:

I would like to put a Git project on GitHub, but it contains certain files with sensitive data (usernames and passwords, like /config/deploy.rb for Capistrano).

I know I can add these filenames to .gitignore, but this would not remove their history within Git.

I also don't want to start over again by deleting the /.git directory.

Is there a way to remove all traces of a particular file in your Git history?


Solution:1

For all practical purposes, the first thing you should be worried about is CHANGING YOUR PASSWORDS! It's not clear from your question whether your Git repository is entirely local or whether you already have a remote repository elsewhere; if it is remote and not secured from others, you have a problem. If anyone has cloned that repository before you fix this, they'll have a copy of your passwords on their local machine, and there's no way you can force them to update to your "fixed" version with it gone from history. The only safe thing you can do is change your password to something else everywhere you've used it.


With that out of the way, here's how to fix it. GitHub answered exactly that question as an FAQ:

Note for Windows users: use double quotes (") instead of single quotes in this command.

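git filter-branch --index-filter \
  'git update-index --remove filename' <introduction-revision-sha1>^..HEAD
git push --force --verbose --dry-run
git push --force

The ^ after the SHA-1 makes the range start at the parent of the commit that introduced the file. Without it, that commit is excluded from the rewrite and still contains the file.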
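You may want to rehearse the command on something disposable first. Here is a minimal sketch, assuming a POSIX shell and a reasonably recent Git. It builds a throwaway repository in a temporary directory and commits a hypothetical config/deploy.rb holding a fake password. It then finds the commit that added the file with git log --diff-filter=A and runs the same --index-filter command, starting at that commit's parent (^). The identity variables and FILTER_BRANCH_SQUELCH_WARNING (which only skips filter-branch's warning pause in newer Git) are there so the script runs unattended. Nothing is pushed.

```shell
#!/bin/sh
# Sketch: rehearse the --index-filter removal in a throwaway repository.
# The file name and the fake password are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the warning pause in newer Git

REPO=$(mktemp -d)
cd "$REPO"
git init -q

echo "readme" > README
git add README
git commit -q -m "initial"

mkdir -p config
echo "password: hunter2" > config/deploy.rb   # fake secret
git add config/deploy.rb
git commit -q -m "add deploy config"

echo "more" >> README
git commit -q -a -m "later work"

# Oldest commit that added the file (git log lists newest first).
INTRO=$(git log --diff-filter=A --format=%H -- config/deploy.rb | tail -n 1)

# Start at the parent (^) so the introducing commit is rewritten too.
git filter-branch --index-filter \
  'git update-index --remove config/deploy.rb' "${INTRO}^..HEAD"

# No output means the current branch no longer touches the file.
git log --oneline -- config/deploy.rb
```

An empty result from the final git log means the current branch is clean. The original commits are still kept under refs/original/ until you clean them up.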

Keep in mind that once you've pushed this code to a remote repository like GitHub and others have cloned that remote repository, you're now in a situation where you're rewriting history. When others try to pull down your latest changes after this, they'll get a message indicating that the changes can't be applied because it's not a fast-forward.

To fix this, they'll have to either delete their existing repository and re-clone it, or follow the instructions under "RECOVERING FROM UPSTREAM REBASE" in the git-rebase manpage.


In the future, if you accidentally commit some changes with sensitive information but you notice before pushing to a remote repository, there are some easier fixes. If your last commit is the one that added the sensitive information, you can simply remove the sensitive information, then run:

git commit -a --amend  

That will amend the previous commit with any new changes you've made, including entire file removals done with a git rm. If the changes are further back in history but still not pushed to a remote repository, you can do an interactive rebase:

git rebase -i origin/master  

That opens an editor with the commits you've made since your last common ancestor with the remote repository. Change "pick" to "edit" on any lines representing a commit with sensitive information, and save and quit. Git will walk through the changes, and leave you at a spot where you can:

$EDITOR file-to-fix
git commit -a --amend
git rebase --continue

Repeat this for each commit with sensitive information. Eventually, you'll end up back on your branch, and you can safely push the new changes.
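Both fixes (amending and the interactive rebase) can be rehearsed non-interactively. This is a sketch under some assumptions:

- The repository is a throwaway one with a hypothetical config/deploy.rb.
- HEAD~2 stands in for origin/master, because the sandbox has no remote.
- A small helper script, passed as GIT_SEQUENCE_EDITOR, turns the first todo line into "edit" where you would normally edit by hand.
- --no-edit keeps a commit-message editor from opening.

```shell
#!/bin/sh
# Sketch: rehearse "rebase -i, mark a commit as edit, amend, continue"
# in a throwaway repository. File names and contents are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_EDITOR=true   # never wait for an interactive editor

WORK=$(mktemp -d)
REPO="$WORK/repo"
git init -q "$REPO"
cd "$REPO"

echo "readme" > README
git add README
git commit -q -m "base"

mkdir -p config
printf 'server: example.com\npassword: hunter2\n' > config/deploy.rb
git add config/deploy.rb
git commit -q -m "add deploy config"      # the commit we want to fix

echo "more" >> README
git commit -q -a -m "later work"

# Stand-in for editing the todo list by hand: replace the first line's
# command (normally "pick") with "edit".
cat > "$WORK/mark-edit.sh" <<'EOF'
#!/bin/sh
sed '1s/^[a-z]* /edit /' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
EOF
chmod +x "$WORK/mark-edit.sh"

# HEAD~2 plays the role of origin/master: the last two commits are unpushed.
GIT_SEQUENCE_EDITOR="$WORK/mark-edit.sh" git rebase -q -i HEAD~2

# Rebase stopped at "add deploy config": scrub the secret and amend.
printf 'server: example.com\n' > config/deploy.rb
git commit -q -a --amend --no-edit
git rebase --continue

git log --oneline
```

If the sensitive commit is the most recent one, editing the file and amending are the whole fix, and no rebase is needed. Either way, the pre-rewrite commits linger locally (for example in git reflog) until they expire. That is harmless as long as they were never pushed.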


Solution:2

Changing your passwords is a good idea, but for the process of removing passwords from your repo's history, I recommend the BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch explicitly designed for removing private data from Git repos.

Create a private.txt file listing the passwords, etc., that you want to remove (one entry per line), and then run this command:

$ java -jar bfg.jar --replace-text private.txt my-repo.git

All files under a threshold size (1MB by default) in your repo's history will be scanned, and any matching string (that isn't in your latest commit) will be replaced with the string "***REMOVED***". You can then use git gc to clean away the dead data:

$ git gc --prune=now --aggressive  
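Before and after running the BFG, it can help to check which commits still contain an entry from private.txt. The sketch below is not part of the BFG. It uses git log's pickaxe option (-S), which lists commits that add or remove a given string. It treats every line of the list file as a literal string and does not understand any pattern syntax the BFG may accept in that file. The helper name find_secrets, the demo repository and the fake password are hypothetical.

```shell
#!/bin/sh
# Sketch: report commits (on any ref) that add or remove each literal string
# listed in a BFG-style list file, using git log's pickaxe option (-S).
set -eu

# find_secrets LIST: print "<string>: <short-sha>" per hit; fail if any hit.
find_secrets() {
  rc=0
  while IFS= read -r secret || [ -n "$secret" ]; do
    [ -n "$secret" ] || continue                # skip blank lines
    for sha in $(git log --all --format=%h -S"$secret"); do
      echo "$secret: $sha"
      rc=1
    done
  done < "$1"
  return "$rc"
}

# Sandbox demo (hypothetical repository and fake password).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
REPO=$(mktemp -d)
LIST=$(mktemp)
cd "$REPO"
git init -q
echo "readme" > README
git add README
git commit -q -m "base"
echo "password: hunter2" > deploy.rb
git add deploy.rb
git commit -q -m "add config"
echo "password: changed" > deploy.rb
git commit -q -a -m "rotate password"

printf 'hunter2\n' > "$LIST"
find_secrets "$LIST" || echo "secrets still present in history"
```

The BFG leaves your latest commit untouched. A string that is still in HEAD will therefore keep showing up here until you remove it in a normal commit.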

The BFG is typically 10-50x faster than running git-filter-branch, and the options are simplified and tailored around these two common use cases:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

Full disclosure: I'm the author of the BFG Repo-Cleaner.


Solution:3

I recommend this script by David Underhill; it worked like a charm for me.

In addition to natacado's filter-branch command, it runs these commands to clean up the mess filter-branch leaves behind:

rm -rf .git/refs/original/
git reflog expire --all
git gc --aggressive --prune

Full script (all credit to David Underhill)

#!/bin/bash
set -o errexit

# Author: David Underhill
# Script to permanently delete files/folders from your git repository.  To use
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-delete-history path1 path2

if [ $# -eq 0 ]; then
    exit 0
fi

# make sure we're at the root of git repo
if [ ! -d .git ]; then
    echo "Error: must run this script from the root of a git repository"
    exit 1
fi

# remove all paths passed as arguments from the history of the repo
files=$@
git filter-branch --index-filter \
"git rm -rf --cached --ignore-unmatch $files" HEAD

# remove the temporary history git-filter-branch
# otherwise leaves behind for a long time
rm -rf .git/refs/original/ && \
git reflog expire --all && \
git gc --aggressive --prune

The last two commands may work better if changed to the following:

git reflog expire --expire=now --all && \
git gc --aggressive --prune=now
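The sketch below shows why the cleanup commands matter. It uses a throwaway repository and a hypothetical secret.txt. It records the object ID of the sensitive blob, rewrites history with filter-branch, and uses git cat-file -e to check whether the blob is still stored. The check runs before and after the cleanup with the --expire=now and --prune=now variants. It assumes the default ref storage, where refs/original/ is a plain directory under .git/refs.

```shell
#!/bin/sh
# Sketch: filter-branch keeps old data alive (refs/original, reflog) until
# the cleanup commands run. Throwaway repository, hypothetical file names.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

REPO=$(mktemp -d)
cd "$REPO"
git init -q
echo "readme" > README
echo "password: hunter2" > secret.txt
git add README secret.txt
git commit -q -m "initial"

BLOB=$(git rev-parse HEAD:secret.txt)   # object ID of the sensitive blob

git filter-branch --index-filter \
  "git rm -q --cached --ignore-unmatch secret.txt" HEAD

# The blob is still stored (and reachable via refs/original and the reflog).
if git cat-file -e "$BLOB"; then BEFORE=present; else BEFORE=gone; fi

# Cleanup, using the --expire=now / --prune=now variants.
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc -q --aggressive --prune=now

if git cat-file -e "$BLOB" 2>/dev/null; then AFTER=present; else AFTER=gone; fi
echo "before cleanup: $BEFORE, after cleanup: $AFTER"
```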


Solution:4

To be clear: the accepted answer is correct. Try it first. However, it may be unnecessarily complex for some use cases, particularly if you encounter obnoxious errors such as 'fatal: bad revision --prune-empty', or if you really don't care about the history of your repo.

An alternative would be:

  1. cd to your project's base directory
  2. Remove the sensitive code / file
  3. rm -rf .git/ # Remove all git info from your code
  4. Go to GitHub and delete your repository
  5. Follow this guide to push your code to a new repository as you normally would - https://help.github.com/articles/adding-an-existing-project-to-github-using-the-command-line/

This will of course remove all commit history, branches, and issues from both your GitHub repo and your local Git repo. If this is unacceptable, you will have to use an alternate approach.

Call this the nuclear option.
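The steps above can be rehearsed locally. In this sketch a local bare repository stands in for the new GitHub repository; in practice you would add the GitHub URL as origin instead. The file names and the branch name main are hypothetical. Step 4, deleting the old GitHub repository, has to be done on the website and is not shown.

```shell
#!/bin/sh
# Sketch of the "nuclear option": drop .git, start a fresh history, and push
# it to a brand-new remote. A local bare repo stands in for GitHub.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
PROJECT="$WORK/project"
git init -q "$PROJECT"
cd "$PROJECT"
echo "app code" > app.rb
echo "password: hunter2" > deploy.rb          # fake secret
git add .
git commit -q -m "first"
echo "more code" >> app.rb
git commit -q -a -m "second"

# Steps 2 and 3: remove the sensitive file, then all Git metadata.
rm deploy.rb
rm -rf .git

# Step 5: a fresh history with one commit, pushed to an empty remote.
git init -q --bare "$WORK/new-remote.git"
git init -q
git add .
git commit -q -m "Initial commit"
git remote add origin "$WORK/new-remote.git"
git push -q origin HEAD:refs/heads/main
```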


Solution:5

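If you have already pushed to GitHub, the data is compromised even if you force push it away one second later. Anyone may have cloned or fetched it in the meantime, and GitHub keeps the commits you force-pushed away accessible by their SHA-1, for example through its API.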

To test this out, I have created a repo: https://github.com/cirosantilli/test-dangling and done:

git init
git remote add origin git@github.com:cirosantilli/test-dangling.git

touch a
git add .
git commit -m 0
git push

touch b
git add .
git commit -m 1
git push

touch c
git rm b
git add .
git commit --amend --no-edit
git push -f

If you delete the repository, however, the commits disappear immediately, even from the API, which returns a 404 (e.g. https://api.github.com/repos/cirosantilli/test-dangling-delete/commits/8c08448b5fbf0f891696819f3b2b2d653f7a3824). This works even if you recreate another repository with the same name.
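Here is a local analogue of that experiment, as a sketch. A bare repository on disk plays the role of GitHub, the commands mirror the ones above, and the branch name demo is hypothetical. It only shows that plain Git on the receiving side still stores the force-pushed-away commit until garbage collection. How long a hosting service keeps such objects reachable is up to that service.

```shell
#!/bin/sh
# Sketch: after a force push, the replaced commit is still stored in the
# receiving repository. A local bare repo stands in for GitHub.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
git init -q --bare "$WORK/remote.git"
git init -q "$WORK/local"
cd "$WORK/local"
git remote add origin "$WORK/remote.git"

touch a
git add .
git commit -q -m 0
git push -q origin HEAD:refs/heads/demo

touch b
git add .
git commit -q -m 1
git push -q origin HEAD:refs/heads/demo
OLD=$(git rev-parse HEAD)        # commit "1", about to be replaced

touch c
git rm -q b
git add .
git commit -q --amend --no-edit
git push -q -f origin HEAD:refs/heads/demo

# The branch no longer contains OLD, but the object is still in remote.git.
git --git-dir="$WORK/remote.git" cat-file -t "$OLD"
```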

So my recommended course of action is:

  • change your credentials

  • if that is not enough (e.g. naked pics):

    • delete the repository
    • contact support


Solution:6

Here is my solution on Windows:

git filter-branch --tree-filter "rm -f 'filedir/filename'" HEAD

git push --force

Make sure the path is correct; otherwise it won't work (rm -f silently does nothing if the path doesn't exist).

I hope it helps
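A wrong path makes the tree-filter command a silent no-op, so it may be worth checking that the path actually appears in history before rewriting anything. This sketch defines a hypothetical helper, path_in_history, that lists the commits on any ref that touched a path and fails if there are none. Run it from the repository root, so the path means the same thing it does inside the tree filter.

```shell
#!/bin/sh
# Sketch: confirm a path appears in history before running
# git filter-branch --tree-filter "rm -f '<path>'" on it.
set -eu

# path_in_history PATH: list commits (any ref) that touched PATH; fail if none.
path_in_history() {
  commits=$(git log --all --format=%h -- "$1")
  if [ -z "$commits" ]; then
    echo "no commit touches '$1'; check the path (run from the repo root)" >&2
    return 1
  fi
  printf '%s\n' "$commits"
}

# Sandbox demo with hypothetical paths.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
REPO=$(mktemp -d)
cd "$REPO"
git init -q
mkdir filedir
echo "secret" > filedir/filename
git add filedir/filename
git commit -q -m "add file"

path_in_history filedir/filename              # lists one commit
path_in_history filedir/wrongname || true     # prints the warning instead
```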


Solution:7

Use filter-branch:

git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch <file_path_relative_to_git_repo>' --prune-empty --tag-name-filter cat -- --all

git push origin <branch_name> -f
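What --prune-empty and --tag-name-filter cat add is easiest to see on a tiny repository. This sketch uses a throwaway repository with a hypothetical secret.txt and tag v1. The commit that only added the secret becomes empty and is dropped. The tag that pointed at it is moved into the rewritten history instead of keeping the old commit alive.

```shell
#!/bin/sh
# Sketch: run the command above on a throwaway repository.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

REPO=$(mktemp -d)
cd "$REPO"
git init -q
echo "readme" > README
git add README
git commit -q -m "base"

echo "password: hunter2" > secret.txt        # hypothetical sensitive file
git add secret.txt
git commit -q -m "add secret only"           # becomes empty after filtering
git tag v1                                   # lightweight tag on that commit

echo "more" >> README
git commit -q -a -m "later work"

git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch secret.txt' \
  --prune-empty --tag-name-filter cat -- --all

git log --oneline --branches --tags
```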


Solution:8

You can use git forget-blob.

Usage is simple: git forget-blob file-to-forget. You can get more info here:

https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/

The file will disappear from all the commits in your history, reflog, tags and so on.

I run into the same problem every now and then, and every time I have to come back to this post and others; that's why I automated the process.

Credits to the contributors on Stack Overflow who allowed me to put this together.


Solution:9

I've had to do this a few times to date. Note that this only works on one file at a time.

  1. Get a list of all commits that modified a file. The one at the bottom will be the first commit:

    git log --pretty=oneline --branches -- pathToFile

  2. To remove the file from history, take the SHA-1 of that first commit and the path to the file from the previous command, and fill them into this command:

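    git filter-branch --index-filter 'git rm --cached --ignore-unmatch <path-to-file>' -- <sha1-where-the-file-was-first-added>^..

    The ^ makes the range start at the parent of that commit, so the commit that added the file is rewritten as well. If the file was added in the very first (root) commit, which has no parent, use HEAD instead of the range.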
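Steps 1 and 2 can be combined into a small helper. This is a sketch with a hypothetical function name, filter_range. It finds the oldest commit on the current branch that added the file and prints <sha1>^..HEAD, so that commit is rewritten too. When that commit is the root commit, which has no parent, it prints just HEAD. Unlike step 1, it looks only at the current branch (not --branches), because the resulting range ends at HEAD.

```shell
#!/bin/sh
# Sketch: compute a git filter-branch revision range for one file.
set -eu

# filter_range PATH: print "<sha1>^..HEAD", or "HEAD" if PATH was added in
# the root commit; fail if no commit on the current branch adds PATH.
filter_range() {
  first=$(git log --diff-filter=A --format=%H -- "$1" | tail -n 1)
  if [ -z "$first" ]; then
    echo "no commit on the current branch adds '$1'" >&2
    return 1
  fi
  if git rev-parse -q --verify "${first}^" >/dev/null; then
    echo "${first}^..HEAD"
  else
    echo "HEAD"      # root commit: rewrite everything reachable from HEAD
  fi
}

# Sandbox demo with hypothetical files.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
REPO=$(mktemp -d)
cd "$REPO"
git init -q
echo "in root" > root.txt
git add root.txt
git commit -q -m "root"
echo "password: hunter2" > secret.txt
git add secret.txt
git commit -q -m "add secret"

filter_range secret.txt
filter_range root.txt
```

Because the printed range contains no spaces, it can be substituted directly, e.g. git filter-branch --index-filter 'git rm --cached --ignore-unmatch secret.txt' -- $(filter_range secret.txt).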


Solution:10

So, it looks something like this:

git rm --cached config/deploy.rb
echo /config/deploy.rb >> .gitignore

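This removes the file from Git's index (the copy on disk is kept) and adds it to the .gitignore list. Note that it only stops tracking the file from now on. Earlier commits still contain it, so on its own this does not remove it from history.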
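The sketch below shows what these two commands do and do not achieve, using a throwaway repository and a fake password. The file stays on disk and becomes ignored, but the earlier commit still contains it.

```shell
#!/bin/sh
# Sketch: effect of "git rm --cached" plus .gitignore (hypothetical data).
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

REPO=$(mktemp -d)
cd "$REPO"
git init -q
mkdir config
echo "password: hunter2" > config/deploy.rb
git add config/deploy.rb
git commit -q -m "add deploy config"

git rm -q --cached config/deploy.rb     # untrack, keep the file on disk
echo /config/deploy.rb >> .gitignore     # leading "/" anchors to the repo root
git add .gitignore
git commit -q -m "stop tracking deploy config"

ls config/deploy.rb                      # still present locally
git check-ignore config/deploy.rb        # now ignored
git show HEAD~1:config/deploy.rb         # but still readable from history
```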

