Ubuntu: Recursively search a pattern/text only in the specified file name of a directory?



Question:

I have a directory (e.g., abc/def/efg) with many sub-directories (e.g., abc/def/efg/(1..300)). All of these sub-directories have a common file (e.g., file.txt). I want to search for a string only in this file.txt, excluding all other files. How can I do this?

I used grep -arin "pattern" *, but it is very slow if we have many sub-directories and files.


Solution:1

In the parent directory, you could use find and then run grep on only those files:

find . -type f -iname "file.txt" -exec grep -Hi "pattern" '{}' +  


Solution:2

You could also use globstar.

Building grep commands with find, as in Zanna's answer, is a highly robust, versatile, and portable way to do this (see also sudodus's answer). And muru has posted an excellent approach of using grep's --include option. But if you want to use just the grep command and your shell, there is another way to do it -- you can make the shell itself perform the necessary recursion:

shopt -s globstar   # you can skip this if you already have globstar turned on
grep -H 'pattern' **/file.txt

The -H flag makes grep show the filename even if only one matching file is found. You can pass the -a, -i, and -n flags (from your example) to grep as well, if that's what you need. But don't pass -r or -R when using this method. It is the shell that recurses directories in expanding the glob pattern containing **, and not grep.
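To try this on a scratch tree before running it on real data, here is a minimal sketch. The directory names, file contents, and the search word needle are made up for illustration, and bash is started with -O globstar so that the option is on no matter how your current shell is configured.

```shell
# Build a throwaway tree (all names and contents here are hypothetical).
tmp=$(mktemp -d)
mkdir -p "$tmp/1" "$tmp/2/deeper"
printf 'a needle here\n' > "$tmp/1/file.txt"
printf 'Needle again\n'  > "$tmp/2/deeper/file.txt"
printf 'needle in a file we do not want searched\n' > "$tmp/1/other.txt"
cd "$tmp" || exit 1

# -O globstar enables the option for this one bash invocation.
# No -r here: the shell expands **/file.txt, and grep only sees the file list.
matches=$(bash -O globstar -c "grep -Hin 'needle' **/file.txt")
printf '%s\n' "$matches"

# Clean up the scratch tree.
cd / && rm -rf "$tmp"
```

Only the two file.txt files should be reported; other.txt is never handed to grep, because the glob does not match it.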

These instructions are specific to the Bash shell. Bash is the default user shell in Ubuntu (and most other GNU/Linux operating systems), so if you're on Ubuntu and don't know what your shell is, it's almost certainly Bash. Although popular shells usually support directory-traversing ** globs, they don't always work the same way. For more information, see Stéphane Chazelas's excellent answer to The result of ls * , ls ** and ls *** on Unix.SE.

How It Works

Turning on the globstar bash shell option makes ** match paths containing the directory separator (/). It is thus a directory-recursing glob. Specifically, as man bash explains:

When the globstar shell option is enabled, and * is used in a pathname expansion context, two adjacent *s used as a single pattern will match all files and zero or more directories and subdirectories. If followed by a /, two adjacent *s will match only directories and subdirectories.

You should be careful with this, since you can run commands that modify or delete far more files than you intend, especially if you write ** when you meant to write *. (It's safe in this command, which doesn't change any files.) shopt -u globstar turns the globstar shell option back off.
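A harmless way to check what a ** glob will expand to, before using it with a command that changes files, is to hand it to printf. This is a sketch on a made-up tree (the names a and b are hypothetical); bash is started with -O globstar or +O globstar to switch the option on or off for a single invocation.

```shell
# Throwaway tree: a/file.txt and a/b/file.txt (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
: > "$tmp/a/file.txt"
: > "$tmp/a/b/file.txt"
cd "$tmp" || exit 1

# printf only prints what the glob expanded to; nothing is modified.
everything=$(bash -O globstar -c 'printf "%s\n" **')           # files and directories, every depth
dirs_only=$(bash -O globstar -c 'printf "%s\n" **/')           # trailing slash: directories only
globstar_off=$(bash +O globstar -c 'printf "%s\n" **/file.txt') # option off: ** acts like *

printf '[**]\n%s\n[**/]\n%s\n[globstar off, **/file.txt]\n%s\n' \
    "$everything" "$dirs_only" "$globstar_off"

# Clean up the scratch tree.
cd / && rm -rf "$tmp"
```

With the option off, **/file.txt reaches only one level down, so a/b/file.txt is missed silently. That is worth remembering if a script relies on globstar being enabled.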

There are a few practical differences between globstar and find.

find is far more versatile than globstar. Anything you can do with globstar, you can do with the find command too. I like globstar, and sometimes it's more convenient, but globstar is not a general alternative to find.

The method above does not look inside directories whose names start with a dot (.). Sometimes you don't want to recurse into such folders, but sometimes you do.
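If you do want hidden directories searched, one option is Bash's dotglob shell option, which makes globs match names that begin with a dot. The sketch below is not part of the original answer; it assumes a made-up tree with one visible and one hidden directory, and enables the options per invocation with -O.

```shell
# Throwaway tree with one visible and one hidden directory (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/visible" "$tmp/.hidden"
printf 'pattern\n' > "$tmp/visible/file.txt"
printf 'pattern\n' > "$tmp/.hidden/file.txt"
cd "$tmp" || exit 1

# globstar alone: .hidden is skipped.
default_hits=$(bash -O globstar -c "grep -l 'pattern' **/file.txt")
# globstar plus dotglob: ** also descends into dot-directories.
dotglob_hits=$(bash -O globstar -O dotglob -c "grep -l 'pattern' **/file.txt")

printf 'globstar only:\n%s\n' "$default_hits"
printf 'globstar and dotglob:\n%s\n' "$dotglob_hits"

# Clean up the scratch tree.
cd / && rm -rf "$tmp"
```

In an interactive shell the equivalent is shopt -s dotglob, and shopt -u dotglob turns it back off.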

As with an ordinary glob, the shell builds a list of all matching paths and passes them as arguments to your command (grep) in place of the glob itself. If you have so many files called file.txt that the resulting command would be too long for the system to execute, then the method above will fail. In practice you'd need (at least) thousands of such files, but it could happen.
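To get a feel for how close you are to that limit, you can compare the system's ARG_MAX value with the space the expanded glob would need. This is only a rough estimate under stated assumptions: it counts each path plus one terminating byte, ignores the environment and other per-argument overhead, and uses a tiny made-up tree.

```shell
# Throwaway tree with two matching files (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/1" "$tmp/2"
: > "$tmp/1/file.txt"
: > "$tmp/2/file.txt"
cd "$tmp" || exit 1

# ARG_MAX: upper bound, in bytes, on arguments plus environment for a new program.
arg_max=$(getconf ARG_MAX)

# Rough size of the expanded glob: every path followed by one NUL byte.
needed=$(bash -O globstar -c 'printf "%s\0" **/file.txt' | wc -c)

printf 'ARG_MAX: %s bytes, glob expansion needs roughly %s bytes\n' "$arg_max" "$needed"

# Clean up the scratch tree.
cd / && rm -rf "$tmp"
```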

The methods that use find are not subject to this restriction, because:

  • Zanna's way builds and runs a grep command with potentially many path arguments. But if more files are found than can be listed in a single command line, the +-terminated -exec action runs the command with some of the paths, then runs it again with some more paths, and so forth. In the case of grepping for a string in multiple files, this produces the correct behavior.

    Like the globstar method covered here, this prints all matching lines, with paths prepended to each.

  • sudodus's way runs grep separately for each file.txt found. If there are many files, it might be slower than some other methods, but it works.

    That method finds files and prints their paths, followed by matching lines if any. This is a different output format from the format produced by my method, Zanna's, and muru's.
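The difference between the two -exec terminators can be made visible by replacing grep with a tiny sh command that reports how many paths it was given. This is an illustrative sketch on a made-up tree of three files; the reporting message is invented for the demo.

```shell
# Throwaway tree with three matching files (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/1" "$tmp/2" "$tmp/3"
for d in 1 2 3; do
    printf 'pattern\n' > "$tmp/$d/file.txt"
done
cd "$tmp" || exit 1

# Each run of the inner sh prints one line saying how many paths it received.
# The second "sh" fills $0, so $# counts only the paths.
batched=$(find . -name file.txt -type f -exec sh -c 'echo "received $# path(s)"' sh {} +)
one_each=$(find . -name file.txt -type f -exec sh -c 'echo "received $# path(s)"' sh {} \;)

printf 'with +:\n%s\n' "$batched"
printf 'with \\;:\n%s\n' "$one_each"

# Clean up the scratch tree.
cd / && rm -rf "$tmp"
```

With + the command runs once for the whole batch here, while \; starts one process per file, which is why the per-file approach can be slower on large trees.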

Getting color with find

One of the immediate benefits of using globstar is that, by default on Ubuntu, grep will produce colorized output. But you can easily get this with find, too.

User accounts in Ubuntu are created with an alias that makes grep really run grep --color=auto (run alias grep to see). It's a good thing that aliases are pretty much only expanded when you issue them interactively, but it means that if you want find to invoke grep with the --color flag, you'll have to write it explicitly. For example:

find . -name file.txt -exec grep --color=auto -H 'pattern' {} +
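Keep in mind that --color=auto only colors output when grep is writing to a terminal. The sketch below, which assumes GNU grep and uses a made-up input line, captures the output through a pipe and checks for the ESC byte that begins every ANSI color sequence.

```shell
esc=$(printf '\033')   # the ESC byte that starts every ANSI color sequence

# Inside $(...), grep's stdout is a pipe, not a terminal.
auto_out=$(printf 'a pattern\n' | grep --color=auto 'pattern')
always_out=$(printf 'a pattern\n' | grep --color=always 'pattern')

case $auto_out in *"$esc"*) auto_colored=yes ;; *) auto_colored=no ;; esac
case $always_out in *"$esc"*) always_colored=yes ;; *) always_colored=no ;; esac

printf 'auto colored: %s, always colored: %s\n' "$auto_colored" "$always_colored"
```

So if you pipe the find command into a pager, you would need --color=always together with a pager that passes the escapes through, such as less -R.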


Solution:3

You don't need find for this; grep can handle this perfectly fine on its own:

grep -airn --include="file.txt" "pattern" .

From man grep:

--exclude=GLOB
       Skip files whose base name matches GLOB (using wildcard matching). A file-name glob can use *, ?, and [...] as wildcards, and \ to quote a wildcard or backslash character literally.

--exclude-from=FILE
       Skip files whose base name matches any of the file-name globs read from FILE (using wildcard matching as described under --exclude).

--exclude-dir=DIR
       Exclude directories matching the pattern DIR from recursive searches.

--include=GLOB
       Search only files whose base name matches GLOB (using wildcard matching as described under --exclude).
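These options can be combined. As a small sketch, assuming GNU grep and a made-up tree (the directory name skipme is hypothetical), the following searches only files named file.txt and additionally prunes one directory from the recursion:

```shell
# Throwaway tree (hypothetical names and contents).
tmp=$(mktemp -d)
mkdir -p "$tmp/1" "$tmp/2/skipme"
printf 'pattern\n' > "$tmp/1/file.txt"
printf 'pattern\n' > "$tmp/1/notes.txt"
printf 'pattern\n' > "$tmp/2/skipme/file.txt"
cd "$tmp" || exit 1

# -r recurses, -l lists matching files; --include limits the search to file.txt.
only_named=$(grep -rl --include='file.txt' 'pattern' . | sort)
# --exclude-dir additionally keeps grep out of any directory called skipme.
pruned=$(grep -rl --include='file.txt' --exclude-dir='skipme' 'pattern' .)

printf 'with --include:\n%s\n' "$only_named"
printf 'with --include and --exclude-dir:\n%s\n' "$pruned"

# Clean up the scratch tree.
cd / && rm -rf "$tmp"
```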


Solution:4

The method given in muru's answer, of running grep with the --include flag to specify a filename, is often the best choice. However, this can also be done with find.

The approach in this answer uses find to run grep separately for each file found, and prints the path to each file exactly once, above the matching lines found in each file. (Methods that print the path in front of every matching line are covered in other answers.)


You can change directory to the top of the directory tree where you have those files. Then run:

find . -name "file.txt" -type f -exec echo "##### {}:" \; -exec grep -i "pattern" {} \;  

That prints the path (relative to the current directory, ., and including the filename itself) of each file named file.txt, followed by all matching lines in the file. This works because {} is a placeholder for the file found. Each file's path is set apart from its contents by being prefixed with #####, and is printed only once, before the matching lines from that file. (Files called file.txt that contain no matches still have their paths printed.) You might find this output less cluttered than what you get from methods that print a path at the beginning of every matching line.
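If you would rather not see headings for files that contain no matches, one possible variation (not from the original answer) is to put a quiet grep first and use it as a test: find only continues to the later actions for a file when that -exec succeeds. The sketch assumes a made-up tree of two files and prints the heading with printf rather than echo.

```shell
# Throwaway tree: one file matches, one does not (hypothetical contents).
tmp=$(mktemp -d)
mkdir -p "$tmp/1" "$tmp/2"
printf 'Pattern\n' > "$tmp/1/file.txt"
printf 'nothing relevant\n' > "$tmp/2/file.txt"
cd "$tmp" || exit 1

# Heading for every file.txt, as in the command above: counts two headings.
all_headings=$(find . -name "file.txt" -type f \
    -exec printf '##### %s:\n' {} \; -exec grep -i "pattern" {} \; | grep -c '^#####')

# grep -q prints nothing and succeeds only on a match, so it acts as a filter.
filtered=$(find . -name "file.txt" -type f \
    -exec grep -qi "pattern" {} \; \
    -exec printf '##### %s:\n' {} \; -exec grep -i "pattern" {} \;)

printf 'headings without the filter: %s\n' "$all_headings"
printf '%s\n' "$filtered"

# Clean up the scratch tree.
cd / && rm -rf "$tmp"
```

The cost is that each matching file is read twice, once by the quiet test and once to print its lines.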

Using find like this will almost always be faster than running grep on every file (grep -arin "pattern" *), because find searches for the files with the correct name and skips all other files.

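Ubuntu uses GNU find, which always expands {} even when it appears in a larger string, like ##### {}:. If you would rather not rely on that behavior, or you prefer to use the -exec action only when absolutely necessary, you can print the heading with the -printf action instead. (Note that -printf is itself not specified by POSIX and is missing from some other find implementations, so this does not make the command more portable.)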

find . -name "file.txt" -type f -printf '##### %p:\n' -exec grep -i "pattern" {} \;  

To make the output easier to read, you can use ANSI escape sequences to get coloured file names. This makes each file's path heading stand out better from the matching lines that get printed under it:

find . -name file.txt -printf $'\e[32m%p:\e[0m\n' -exec grep -i "pattern" {} \;  

That causes your shell to turn the escape code for green into the actual escape sequence that produces green in a terminal, and to do the same thing with the escape code for normal colour. These escapes are passed to find, which uses them when it prints a filename. ($' ' quotation is necessary here because find's -printf action doesn't recognize \e for interpreting ANSI escape codes.)

If you prefer, you could instead use -exec with the system's printf command (which does support \e). So another way to do the same thing is:

find . -name file.txt -exec printf '\e[32m%s:\e[0m\n' {} \; -exec grep -i "pattern" {} \;  
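A further option, sketched here under the assumption that you are using GNU find, is to write the escape character in octal as \033. GNU find's -printf accepts \NNN octal escapes, so this works with ordinary single quotes and does not depend on the shell's $' ' quoting. The tree and contents are made up for the demo.

```shell
# Throwaway tree with a single matching file (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/1"
printf 'pattern\n' > "$tmp/1/file.txt"
cd "$tmp" || exit 1

# \033 is ESC in octal; find itself turns it into the escape byte.
heading=$(find . -name file.txt -type f -printf '\033[32m%p:\033[0m\n')

# The same heading produced by the printf command, for comparison.
expected=$(printf '\033[32m%s:\033[0m\n' ./1/file.txt)

printf '%s\n' "$heading"

# Clean up the scratch tree.
cd / && rm -rf "$tmp"
```

Appending -exec grep -i "pattern" {} \; to the find command gives the same green headings as the versions above.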
