Ubuntu: Join two files, adding values in certain columns



Question:

How can I add another 2 values after matching the values from 2 columns in another file just like VLOOKUP?

Sample below. Value in column 6 & 7 from file1, when matched with column 1 & 2 from file2, will add column 9 & 10 in file1 with the value of column 3 & 4 from file2.

file1.txt

1 1 1 1 1 5 9 1    2 2 2 2 2 7 8 2    3 3 3 3 3 7 7 3    4 4 4 4 4 8 6 4  

file2.txt

5 9 A B    8 6 E F    7 7 G H    7 8 C D  

output.txt

1 1 1 1 1 5 9 1 A B    2 2 2 2 2 7 8 2 C D    3 3 3 3 3 7 7 3 G H    4 4 4 4 4 8 6 4 E F  

Thanks,


Solution:1

Use awk

awk 'NR==FNR{ seen[$1FS$2]=$3FS$4; next } { print $0, seen[$6FS$7] }' file2 file1  

and to delete empty lines from output:

awk 'NR==FNR{ seen[$1FS$2]=$3FS$4; next } NF{ print $0, seen[$6FS$7] }' file2 file1  

or a little whitespace and sensible variables names go a long way toward readability. Also, take advantage of using a comma in the array key

awk '      NR == FNR {value[$1,$2] = $3 OFS $4; next}       {print $0, value[$6,$7]}  ' file2.txt file1.txt  

  • NR is set to 1 when the first record read by awk and incrementing for each next records reading either in single or multiple input files until all read finished.
  • FNR is set to 1 when the first record read by awk and incrementing for each next records reading in current file and reset back to 1 for the next input file if multiple input files.
  • so NR == FNR is always a true condition and the block followed by this will perform actions on first file only.

  • The seen is an associated awk array with the key combination of column$1 and column$2 with the value of column$3 and column$4.

  • The next token skips to executing rest of the commands and those will only execute actually for next file(s) except first.

  • NF; presetting Number of Fields in a record where fields are known and separated with a Field Separator FS; so FS between columns there is used to intact the fields separator or you could use comma , within array instead.

  • so this NF{ print $0, seen[$6FS$7] }, print the current record $0 in file1 and the value matched with column$6 and column$7 present in the array seen when that was not an empty line.


Solution:2

I know you didn't ask for a database solution, but if you happen to have a MySQL server around, here is how to do it:

create table file1 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int, c8 int);  create table file2 (c1 int, c2 int, c3 char, c4 char);  load data infile 'file1' into table file1 fields terminated by ' ';  load data infile 'file2' into table file2 fields terminated by ' ';  select f1.*, f2.c3, f2.c4 from file1 as f1       join file2 as f2           on f1.c6 = f2.c1 and f1.c7 = f2.c2       order by f1.c1;  

(I had to strip the blank lines as well)

Result:

+------+------+------+------+------+------+------+------+------+------+  | c1   | c2   | c3   | c4   | c5   | c6   | c7   | c8   | c3   | c4   |  +------+------+------+------+------+------+------+------+------+------+  |    1 |    1 |    1 |    1 |    1 |    5 |    9 |    1 | A    | B    |  |    2 |    2 |    2 |    2 |    2 |    7 |    8 |    2 | C    | D    |  |    3 |    3 |    3 |    3 |    3 |    7 |    7 |    3 | G    | H    |  |    4 |    4 |    4 |    4 |    4 |    8 |    6 |    4 | E    | F    |  +------+------+------+------+------+------+------+------+------+------+  4 rows in set (0,00 sec)  


Solution:3

Responding to @Jos's answer: sqlite

db=$(mktemp)  sqlite3 "$db" <<'END'  create table f1 (v1 text,v2 text,v3 text,v4 text,v5 text,v6 text,v7 text,v8 text);  create table f2 (v1 text,v2 text,v3 text,v4 text);  .separator " "  .import file1.txt f1  .import file2.txt f2  select f1.*, f2.v3, f2.v4 from f1,f2 where f1.v6=f2.v1 and f1.v7=f2.v2;  END  rm "$db"  


Solution:4

bash: I took the liberty of removing blank lines from the files.

declare -A keys  while read -r k1 k2 value; do       keys[$k1,$k2]=$value  done < file2.txt  while read -ra fields; do       key="${fields[5]},${fields[6]}";       echo "${fields[*]} ${keys[$key]}"  done < file1.txt  
1 1 1 1 1 5 9 1 A B  2 2 2 2 2 7 8 2 C D  3 3 3 3 3 7 7 3 G H  4 4 4 4 4 8 6 4 E F  


Solution:5

This will work although I'm pretty sure someone will come up with a much better one-liner awk solution.

cp file1.txt output.txt &&  while read -r file2_line; do      # Empty line --> continue      [[ -z "$file2_line" ]] && continue      # Find matching line      file1_matching_line=$(grep -n "$(echo "$file2_line" | cut -d' ' -f 1,2)" <(cut -d' ' -f6,7 output.txt) | grep -Po "^[0-9]+");      # no find? continue!      [[ ! $? -eq 0 ]] && continue      # Add the fields 3 and 4 of file2 to the end of the matching line of output.txt      echo "$file1_matching_line" | while read -r ml; do          sed -i "${ml}s/$/ $(echo "$file2_line" | cut -d' ' -f 3,4)/" output.txt      done  done < file2.txt && cat output.txt  

The magic happens in the line:

file1_matching_line=[...]  

Find the line number (-n) of all occurrences of field 1 and 2 of File 2

$(echo "$file2_line" | cut -d' ' -f 1,2)  

within output.txt which is a copy of file1.txt

<(cut -d' ' -f6,7 output.txt)  

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »