Ubuntu: Comparison between two tab separated files in shell script using awk



Question:

I've written this code but I am facing the problem as mentioned below.

My Code is:

paste 1.txt   2.txt|  awk ' { FS = "\t " } ; NR == 1 { n = NF/2 }                {for(i=1;i<=n;i++)                   if($i!=$(i+n))                     {c = c s i; s = "," }                 if(c)                   {print "Line No. " NR-1 " COLUMN NO " c;                    c = "" ; s = "" } } '  

Expected Output:

Line No. 2 COLUMN NO 2,3  Line No. 4 COLUMN NO 1,2,3,4  Line No. 6 COLUMN NO 2,3,4,5  Line No. 7 COLUMN NO 1,2,3,4,5  

Output that is getting generated:

Line No. 2 COLUMN NO 2,3  Line No. 4 COLUMN NO 1,2,3,4  Line No. 6 COLUMN NO 2,3,4,5  Line No. 7 COLUMN NO 1,2,3,4  

Below specified file is space separated. To understand it better I have formatted it this way.

File1:

ID_ID   First_name  Last_name Address                      Contact_Number  ID1     John        Rock      32, Park Lake, California    2222200000  ID2     Tommy       Hill      5322 Otter Lane Middleberge  3333300000  ID3     Leonardo    Test      Half-Way Pond, Georgetown    4444400000  ID8     Rhyan       Bigsh     6762,33 Ave N,St. Petersburg 5555500000  ID50    Steve       Goldberg  6762,33 Ave N,St. Petersburg 6666600000  ID60    Steve       Goldberg                               6666600000  

File2:

ID_ID   First_name  Last_name   Address                      Contact_Number  ID1     John        Rock        32, Park Lake, California    2222200000  ID2     Tommy1      Hill1       5322 Otter Lane Middleberge  3333300000  ID3     Leonardo    Test        Half-Way Pond, Georgetown    4444400000  ID80    Sylvester   Stallone                                 5555500000  ID50    Steve       Goldberg    6762,33 Ave N,St. Petersburg 6666600000  ID60    Mark        Waugh       St. Petersburg               7777700000  ID70    John        Smith                                    8888800000  


Solution:1

Here is a hint. The output of:

paste 1.txt 2.txt | awk '  { FS = "\t" }  NR == 1 { n = NF/2 } {    for(i=1;i<=n;i++) print "\"" $i "\" " ($i == $(i+n) ? "==":"!=") " \"" $(i+n) "\""    print "###############"  }'  

which compares and prints every field in every record between files, ends with:

"ID60" == "ID60"  "Steve" != "Mark"  "Goldberg" != "Waugh"  "" != "St. Petersburg"  "6666600000" != "7777700000"  ###############  "" != "8888800000"  "ID70" != ""  "John" != ""  "Smith" != ""  "" == ""  ###############  

There are two errors:

  1. There is an hidden off-by-one error in the case of a line only existing in the second file. This is because a missing record has one field, an empty string before the tab added by paste. Therefore, in this case, you are effectively comparing the fields in the order 5,1,2,3,4.

  2. Field 4 is empty for this line in both files (albeit in different ways), so I would have expected the output:

    Line No. 7 COLUMN NO 1,2,3,5

To get the exact output you want, the crude fix below will report that all fields don't match if a line is only present in one file. This can be detected by the NF == n+1 added to the if, because there should be only n+1 and not 2*n fields for a line only present in one file, whichever file that is.

paste 1.txt 2.txt | awk '  { FS = "\t" }  NR == 1 { n = NF/2 } {    for(i=1;i<=n;i++) if(NF == n+1 || $i!=$(i+n)) {c = c s i; s = "," }    if(c){print "Line No. " NR-1 " COLUMN NO " c; c = "" ; s = "" }  }'  

This assumes that all records contain the correct number of tabs, n-1.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »