Ubuntu: Search file with known sha1 sum


I have to find a specific file with a known sha1 sum. I know which folder the file should be in, but there are sub-folders (up to max-depth 4). I know roughly what the filename looks like (it contains the words "project" and "screenshoot"), but there are various possible file formats (.ods, .docx, .pdf ...). And of course I know its sha1 sum. How can I find it?

I have to do this for about 15 files.


find + grep

Use the find command:

find /that/directory -type f -exec sha1sum {} \; | grep 'known sha1 sum'  

The way this works is as follows:

  • find operates recursively on /that/directory
  • -type f restricts the results to regular files
  • -exec sha1sum {} \; runs the sha1sum command once per file, with {} replaced by that file's path
  • grep 'known sha1 sum' filters the output of the find command, leaving only the line with the sha1 hash that we need

Bash's globstar

Another option is to use bash's globstar to enable recursive globbing, and iterate over the matches. Here's how I would search for a file with a known sha1sum:

bash-4.3$ shopt -s globstar
bash-4.3$ known_sha1sum="4b1e65aab01f76b8863707eda5215af09633d275"
bash-4.3$ for f in ./**/* ; do [ -f "$f" ] || continue; shasum=$(sha1sum "$f" | awk '{print $1}'); [ "$shasum" = "$known_sha1sum" ] && echo "$f"; done
./golang/hello_world

Instead of iterating with a for loop, we can make this even shorter:

bash-4.3$ shopt -s globstar
bash-4.3$ sha1sum ./**/* 2>/dev/null | grep '4b1e65aab01f76b8863707eda5215af09633d275'
4b1e65aab01f76b8863707eda5215af09633d275  ./golang/hello_world

While this method is short, I would be wary of using it on a directory with a large number of files, where the glob might expand beyond the maximum size of the argument list allowed for a single command. Caveat emptor.
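One way to sidestep that argument-list limit is to have a program walk the glob itself instead of letting the shell expand it. Below is a minimal sketch in Python, assuming that glob.iglob in recursive mode is an acceptable stand-in for globstar. The names sha1_of and find_by_sha1 are made up for illustration. As with globstar without dotglob, hidden files and directories are skipped.

```python
import glob
import hashlib
import os


def sha1_of(path, chunk_size=65536):
    """Return the hex SHA-1 digest of the file at path, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fd:
        for chunk in iter(lambda: fd.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_by_sha1(root, known_sha1sum):
    """Yield regular files under root whose SHA-1 equals known_sha1sum."""
    wanted = known_sha1sum.lower()
    # Equivalent of the shell pattern root/**/* ; escape() guards against
    # glob metacharacters that happen to appear in root itself.
    pattern = os.path.join(glob.escape(root), "**", "*")
    # iglob yields matches one at a time, so no argument list is ever built.
    for path in glob.iglob(pattern, recursive=True):
        if not os.path.isfile(path):
            continue  # skip directories and other non-regular entries
        try:
            if sha1_of(path) == wanted:
                yield path
        except OSError:
            # Unreadable file: ignore it, much like 2>/dev/null does above.
            continue
```

Calling list(find_by_sha1(".", "4b1e65aab01f76b8863707eda5215af09633d275")) from the top directory would then play the role of the sha1sum | grep pipeline.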

Python 3

Of course, being a Python aficionado, I couldn't leave without providing a Python script for this task. The script takes multiple arguments, so you can specify several sha1sums to look for, which fits the question's requirement of doing this for multiple files.

Note that the script searches from the current working directory down through its subdirectories, so make sure you cd to the desired top directory first.

#!/usr/bin/env python3
import os
import sys
from hashlib import sha1


def get_sha1sum(file_path):
    """Return the hex sha1 digest of a file, reading it in 1 KiB chunks."""
    sha1sum = sha1()
    with open(file_path, 'rb') as fd:
        data_chunk = fd.read(1024)
        while data_chunk:
            sha1sum.update(data_chunk)
            data_chunk = fd.read(1024)
    return sha1sum.hexdigest()


def find_files(treeroot):
    """Print every file under treeroot whose sha1 sum was given as an argument."""
    for dirpath, subdirs, files in os.walk(treeroot):
        for f in files:
            full_path = os.path.join(dirpath, f)
            path_sha1sum = get_sha1sum(full_path)
            if path_sha1sum in sys.argv[1:]:
                print(path_sha1sum, full_path)


def main():
    find_files('.')


if __name__ == '__main__':
    main()

Test run:

$ ./find_with_sha1.py '4b1e65aab01f76b8863707eda5215af09633d275' '38ab29bdda161da8082cbbc97d33747dff6fb848'
4b1e65aab01f76b8863707eda5215af09633d275 ./golang/hello_world
38ab29bdda161da8082cbbc97d33747dff6fb848 ./golang/hello_world.go

This script is also available on my personal GitHub repository, where any further changes to it will be made.
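Since the question also gives a depth limit and parts of the filename, hashing can be restricted to plausible candidates. The following is only a sketch, separate from the script above. The names candidates and find_known_files are hypothetical, and it assumes that "max-depth 4" means at most four sub-folder levels below the starting directory, which is not quite how find's -maxdepth counts.

```python
import hashlib
import os


def sha1_of(path, chunk_size=65536):
    """Return the hex SHA-1 digest of the file at path, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fd:
        for chunk in iter(lambda: fd.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def candidates(root, words, max_depth):
    """Yield files whose name contains every word (case-insensitively).

    Assumption: depth 0 is root itself, and directories deeper than
    max_depth sub-folder levels below root are never visited.
    """
    for dirpath, subdirs, files in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            subdirs[:] = []  # prune in place: os.walk will not descend further
        for name in files:
            lowered = name.lower()
            if all(word.lower() in lowered for word in words):
                yield os.path.join(dirpath, name)


def find_known_files(root, wanted_sha1sums,
                     words=("project", "screenshoot"), max_depth=4):
    """Map each wanted SHA-1 that was found to the list of matching paths."""
    wanted = {s.lower() for s in wanted_sha1sums}
    found = {}
    for path in candidates(root, words, max_depth):
        digest = sha1_of(path)
        if digest in wanted:
            found.setdefault(digest, []).append(path)
    return found
```

Any sha1 sum that is missing from the returned dictionary had no match among the candidates, which is handy when checking about 15 files in one pass.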


How about a combination of find, sha1sum and grep:

find . -maxdepth 4 -type f | xargs -IF sha1sum "F" | grep 83976c8060222298565fd434c64ee09d19733559  
