Ubuntu: Extract a number in a txt file by using regular expressions



Question:

I am saving the terminal output to a .txt file with 2>&1 | tee ./results.txt. The file contains the following text:

executing: ./home/images/image-001-041.png
0,33, /results/image-001-041.png
1.7828,32, /results/image-001-040.png
1.86051,34, /results/image-001-042.png
1.90462,31, /results/image-001-039.png
1.90954,30, /results/image-001-038.png
1.91953,35, /results/image-001-043.png
1.92677,28, /results/image-001-036.png
1.92723,3160, /results/image-037-035.png
1.93353,7450, /results/image-086-035.png
1.93375,1600, /results/image-019-044.png

I need to take the second number on each line (the one after the first comma, i.e. 33, 32, 34, ...) and save these numbers in a list in Python. What is the bash command, or the regular expression in Python? Thanks


Solution:1

Using cut:

cut -sd',' -f2 < results.txt

from man cut:

-d, --delimiter=DELIM
       use DELIM instead of TAB for field delimiter
-s, --only-delimited
       do not print lines not containing delimiters
-f, --fields=LIST
       select only these fields;  also print any line that contains no delimiter character, unless the -s option is specified
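
Since the goal is a Python list, one way to bridge the two is to run cut from Python and parse what it prints. The following is only a minimal sketch, assuming the cut utility is available on the PATH; the helper names parse_ids and ids_via_cut are made up for illustration.

```python
import subprocess


def parse_ids(output):
    """Convert newline-separated numbers (as printed by cut) to a list of ints."""
    return [int(field) for field in output.split()]


def ids_via_cut(path):
    """Run `cut -s -d , -f 2 path` and return the second fields as ints.

    Assumes `cut` is installed; check=True raises CalledProcessError
    if the command exits with a non-zero status.
    """
    completed = subprocess.run(
        ["cut", "-s", "-d", ",", "-f", "2", path],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        check=True,
    )
    return parse_ids(completed.stdout)
```

Calling ids_via_cut('results.txt') would then return the second column as a list of integers, provided every second field really is an integer.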


Solution:2

You could use awk

awk -F ',' '{print $2}' results.txt  

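This defines a comma as the field separator and prints the second column. Note that a line without any comma (such as the "executing:" line) produces an empty output line.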


Solution:3

Example with sed

$ sed -rn 's/[^,]+,([^,]+),.*/\1/p' results.txt
33
32
34
31
30
35
28
3160
7450
1600

Notes

  • -n don't print anything until we ask for it (removes non-matching lines)
  • -r use extended regular expressions, ERE (so we don't need backslashes for the + and ( ) metacharacters)
  • [^,]+, some non-commas followed by a comma (the first field)
  • ([^,]+), some non-commas, saved for later, followed by a comma (only the part inside the parentheses is saved; this is the part we want)
  • .* any number of any characters (gets rid of the rest of the line)
  • \1 the text we saved
  • p print the lines we changed (needed with -n)
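
The question also asks for a regular expression in Python. The same pattern can be reused with the re module; what follows is a minimal sketch, not a definitive implementation. The names SECOND_FIELD, second_fields and sample are made up for illustration, and the sample lines are copied from the question.

```python
import re

# Same idea as the sed expression: skip the first field, capture the second.
SECOND_FIELD = re.compile(r"[^,]+,([^,]+),")


def second_fields(lines):
    """Return the second comma-separated field of each matching line as an int."""
    ids = []
    for line in lines:
        match = SECOND_FIELD.match(line)  # anchored at the start of the line
        if match:  # lines without two commas are skipped, like sed -n with p
            ids.append(int(match.group(1)))
    return ids


sample = [
    "executing: ./home/images/image-001-041.png",
    "0,33, /results/image-001-041.png",
    "1.7828,32, /results/image-001-040.png",
]
print(second_fields(sample))  # [33, 32]
```

To process the real file, an open file object can be passed instead of the sample list, for example second_fields(open('results.txt')).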


Solution:4

Since you mention Python:

with open('results.txt') as results:
    ids = [int(line.split(',')[1]) for line in results if ',' in line]
    print(ids)

It creates a list of integers as ids, and displays it:

[33, 32, 34, 31, 30, 35, 28, 3160, 7450, 1600]  


Solution:5

You can use Perl; the approach is similar to the awk and sed solutions posted.

-a enables automatic splitting of each line.

-F specifies the delimiter used to split each line. It defaults to ' '. The result is stored in @F, hence $F[1] gives us the second column.

-l strips the trailing newline from each input line and makes print add a newline after each output line.

-e specifies the command to execute on each line, which here is print.

$ perl -F, -ale 'print $F[1]' results.txt
33
32
34
31
30
35
28
3160
7450
1600

The above expands to the program below:

$ perl -MO=Deparse -F, -ale 'print $F[1]' results.txt
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = readline ARGV)) {
    chomp $_;
    our @F = split(/,/, $_, 0);
    print $F[1];
}
-e syntax OK
