Ubuntu: Concatenate multiple files without header



Question:

I have several directories ("amazon", "niger", ...), each containing several subdirectories ("gfdl", "hadgem", ...), each of which in turn contains several subdirectories ("rcp8p5", "rcp4p5", ...). Each of these last subdirectories always has two folders ("historical", "projected") that contain thousands of tables with the same layout. I would like to concatenate those tables (from the two folders of each last subdirectory) into one big table with a single header, rather than a repeated header every time a table is appended. Does anyone know how to do that?

I am currently using the following loop structure:

#!/bin/bash
# usage: cat_dat dirname

data_dir=/scratch/01/stevens/climate_scenario/river

for river in tagus
do
    for gcm in gfdl-esm2m hadgem2-es
    do
        for scenario in rcp8p5 rcp4p5 rcp6p0 rcp2p6
        do
            find "${data_dir}/${river}/${gcm}/${scenario}" -name \*.dat -exec cat {} + >> "${data_dir}/${river}/${gcm}/${scenario}.dat"
        done
    done
done

but I can't get rid of the header with that! Any help is greatly appreciated! Thanks!


Solution:1

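Using awk in a single folder (on the first line of every file after the first, getline skips ahead to the next line, so the repeated header is never printed):

awk 'FNR==1 && NR!=1 { getline } { print }' *.dat > out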

Use find and awk if you need all files in the current folder and its subfolders. Replace . with your desired folder.

find . -type f -name "*.dat" -print0 | \
    xargs -0 awk 'FNR==1 && NR!=1 { getline } { print }' > out

Or, since getline is best avoided (thanks @fedorqui):

find . -type f -name "*.dat" -exec awk 'NR==1 || FNR!=1' {} + > out

Example

% cat foo1.dat
a   b   c
1   2   3

% cat foo2.dat
a   b   c
4   5   6

% awk 'FNR==1 && NR!=1 { getline } { print }' *.dat > out

% cat out
a   b   c
1   2   3
4   5   6
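
With thousands of files, find ... -exec ... + (and xargs) may have to start awk more than once, and NR restarts at 1 in each run, so an extra header could slip through. A possible way around this, sketched under the assumption that every table really has the same one-line header, is to write the header once and then drop the first line of every file. The function name concat_tables is hypothetical, not something from the answer above:

```shell
# concat_tables is a hypothetical helper name, not part of the original answer.
# Usage: concat_tables SRC_DIR OUT_FILE
# OUT_FILE should live outside SRC_DIR so that find does not pick it up.
concat_tables() {
    # Assumption: all tables share the same one-line header,
    # so the header of any one file will do.
    first_file=$(find "$1" -type f -name '*.dat' | head -n 1)
    [ -n "$first_file" ] || return 1
    head -n 1 "$first_file" > "$2"
    # FNR>1 drops line 1 of every file, so the result is the same
    # even if find has to start awk several times.
    find "$1" -type f -name '*.dat' -exec awk 'FNR>1' {} + >> "$2"
}
```

In the loop from the question, the find line would then become concat_tables "${data_dir}/${river}/${gcm}/${scenario}" "${data_dir}/${river}/${gcm}/${scenario}.dat". Note that find returns files in no particular order, so the rows from "historical" and "projected" are not guaranteed to come out in sequence.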


Solution:2

You can use a while loop that gets fed by a find through process substitution:

d=0
while IFS= read -r file
do
    [ "$d" -ge 1 ] && tail -n +2 "$file" || cat "$file"
    (( d++ ))
done < <(find "/dir/folder" -name "*.dat")

So it will perform a cat on the first match and tail -n +2 on the rest.
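
Two details of that loop may matter in practice: find returns files in no guaranteed order, and the A && B || C shortcut also runs C when B fails. A variant that sorts the file list and uses an explicit if/else could look like the sketch below. The function name cat_skip_headers is hypothetical, and the sketch assumes file names contain no newlines:

```shell
# cat_skip_headers is a hypothetical helper name.
# Usage: cat_skip_headers DIR > merged_table
cat_skip_headers() {
    # Sort the list so the files are always processed in the same order.
    find "$1" -type f -name '*.dat' | sort | {
        first=1
        while IFS= read -r file; do
            if [ "$first" -eq 1 ]; then
                cat "$file"          # first file: keep its header
                first=0
            else
                tail -n +2 "$file"   # later files: start at line 2
            fi
        done
    }
}
```

With paths sorted alphabetically, everything under "historical" is written before everything under "projected".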


Alternatively, if you have all the files in the same dir you can say:

awk 'FNR>1 || NR==1' files*  

This matches every line except those where FNR==1 and NR>1, that is, everything but the headers of the files after the first one. Why? Because NR holds the number of the line being read overall, whereas FNR holds the line number within the file currently being read.
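
To see the two counters side by side, a small demonstration can print them next to each line. The file names and contents below are made up for illustration and live in a throwaway directory:

```shell
# Create two small demo tables in a temporary directory.
tmp=$(mktemp -d)
printf 'a b c\n1 2 3\n' > "$tmp/foo1.dat"
printf 'a b c\n4 5 6\n' > "$tmp/foo2.dat"

# Print the overall counter (NR) and the per-file counter (FNR) with each line.
nr_fnr=$(awk '{ print "NR=" NR, "FNR=" FNR, $0 }' "$tmp/foo1.dat" "$tmp/foo2.dat")
printf '%s\n' "$nr_fnr"
# NR=1 FNR=1 a b c
# NR=2 FNR=2 1 2 3
# NR=3 FNR=1 a b c   <- the only line that 'FNR>1 || NR==1' rejects
# NR=4 FNR=2 4 5 6
```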

