Ubuntu: SED copy specific section to different files


I'm trying to write a script to save myself some time. This is what I want to do: I have one big .xml file with sections; let's say it looks like this:

    some text
    .....
    HEAD
    context A
    TAIL
    some text
    .....
    HEAD
    context B
    TAIL
    ....
    some text
    ....
    HEAD
    context C
    TAIL
    ....
    some text

I need to cut out the section with context A and move it to a file named contextA, then context B to a file named contextB, and so on. (Each file should contain the full HEAD context x TAIL block.) The problem is that all sections start and end with the same patterns (HEAD and TAIL). I can cut all the sections into one file, but that's not enough.

Can you help me?

A small update, because maybe I didn't make it clear enough. Let's say my file looks like this:

    some text 1
    <config>
    1
    2
    3
    </config>
    some text 2
    <config>
    4
    5
    6
    </config>
    some text 3
    <config>
    7
    8
    9
    </config>
    some text 4

and I want a file named "first", which contains:

    <config>
    1
    2
    3
    </config>

a file named "second", which contains:

    <config>
    4
    5
    6
    </config>

and so on.


It is easier to do this with awk:

awk -v RS="HEAD\n" -v FS="\n" 'NR>1{print "HEAD\n" $0 > $1".txt"}' ex  


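  • RS="HEAD\n": records are separated by "HEAD\n" (a multi-character RS needs an awk that supports it, such as gawk or mawk)
  • FS="\n": each field is a line
  • NR>1{print "HEAD\n" $0 > $1".txt"}: for every record except the first, write it to a file named after field 1 (the first line after HEAD) plus ".txt"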

UPDATE: for the new question:

awk -v RS="<config>\n" -F"</config>" 'NR>1{print RS $1 FS > "conf-"NR-1}' ex  

The extracted sections are stored in files named "conf-1", "conf-2", and so on.
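If the files should be called first, second, and so on, as in the question, one option is to look the name up in a word list. This is only a minimal sketch: it assumes plain POSIX awk and scans line by line instead of using the RS trick. The sample input, the word list, and the conf-N fallback are illustrative assumptions, and the files are written to the current directory, so try it in a scratch directory.

```shell
#!/bin/sh
# Sample input (assumed for illustration), modelled on the question.
cat > ex <<'EOF'
some text 1
<config>
1
2
3
</config>
some text 2
<config>
4
5
6
</config>
some text 3
EOF

awk '
BEGIN { count = split("first second third fourth fifth", names, " ") }
/<config>/ {
    n++
    # Use the n-th word; fall back to conf-N once the word list runs out.
    out = (n <= count) ? names[n] : ("conf-" n)
    inside = 1
}
inside { print > out }                      # copy lines while inside a block
/<\/config>/ { inside = 0; close(out) }     # block finished, release the file
' ex
```

Extending the word list in the BEGIN block is enough to cover more sections.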


My script using awk:

    #!/bin/bash
    for i in $(seq -w $(<"$1" grep -cx "$2")); do
      <"$1" >$i awk -va=$i -vb="$2" -vc="$3" '$0~b{d++;e=1}d==a&&e==1;$0~c{e=0}'
    done

Save it as e.g. myscript.sh, make it executable, navigate to your onebig.xml and call it like this:

/path/to/myscript.sh onebig.xml HEAD TAIL  

It will cut out every section of onebig.xml that begins with HEAD and ends with TAIL and save each one to its own numbered file: 1, 2, … if there are fewer than 10 sections; 01, 02, … if there are 10 to 99 sections; 001, 002, … if there are 100 to 999 sections; and so forth.
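The padding comes from seq -w, which equalizes the width of the numbers it prints. A quick check, assuming a seq that supports -w (as GNU seq on Ubuntu does); the variable names are only for illustration:

```shell
# With 3 sections no padding is needed; with 10, every number gets two digits.
names_for_3=$(seq -w 3 | tr '\n' ' ')
names_for_10=$(seq -w 10 | tr '\n' ' ')
echo "$names_for_3"
echo "$names_for_10"
```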

Short explanations

  • <"$1" grep -cx "$2" - count the lines in onebig.xml that consist of exactly HEAD; let's say that's 3
  • for i in $(seq -w 3); do …; done - loop over every occurrence from 1 to 3; seq's -w option adds leading zeroes if necessary
  • <"$1" >$i - read from onebig.xml and write to a file named after the current count
  • awk -va=$i -vb="$2" -vc="$3" - start awk and assign three variables, a being the count, b being HEAD and c being TAIL
  • $0~b{d++;e=1} - if the current line matches the content of b (= HEAD), increase d by one and set e=1
  • d==a&&e==1 - if d equals a (= the current count) and e equals 1, print the current line (print is the implied action; essentially: if we're past the a-th occurrence of HEAD and still between HEAD and TAIL, print)
  • $0~c{e=0} - if the current line matches the content of c (= TAIL), set e=0
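To see the counters at work, here is a small sketch that runs the same awk program once with a=2, so that only the second section is printed. The sample file contents are an assumption based on the question, and the file is created in the current directory.

```shell
#!/bin/sh
# Sample input (assumed for illustration): two HEAD ... TAIL sections.
printf '%s\n' 'some text' HEAD 'context A' TAIL \
              'some text' HEAD 'context B' TAIL 'some text' > onebig.xml

# d counts HEAD lines seen so far; e is 1 only between HEAD and TAIL.
# Lines are printed only while d equals a and e is 1.
second=$(awk -v a=2 -v b=HEAD -v c=TAIL \
    '$0~b{d++;e=1} d==a&&e==1; $0~c{e=0}' onebig.xml)

printf '%s\n' "$second"
```

The TAIL line is still printed because the e=0 rule comes after the printing rule.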


If you really can't use a proper XML parser for this, then I'd suggest awk, e.g.

awk '/^HEAD/ {p=1; ++n} p {print > "context"n} /^TAIL/ {p=0}' file.xml  

will output the HEAD ... TAIL sections into numerically increasing filenames context1, context2 etc.

For easier sorting, you may want to improve it a bit by constructing a fixed-width numeric suffix, e.g.

    $ awk '/^HEAD/ {p=1; outfile = sprintf("context%03d", ++n)} p {print > outfile} /^TAIL/ {p=0}' file.xml

    $ head context*
    ==> context001 <==
    HEAD
    context A
    TAIL

    ==> context002 <==
    HEAD
    context B
    TAIL

    ==> context003 <==
    HEAD
    context C
    TAIL


Please check whether the script below helps:

    #!/bin/bash
    for x in {A..Z}; do
        # check if the pattern exists in the file
        if grep -qF "context $x" file.txt; then
            # store the lines between the two patterns, including the matching lines, in a text file
            awk '/context '$x'/,/TAIL/' file.txt > context$x.txt
        else
            echo "Sorry, this pattern does not exist in the file"
        fi
    done
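Note that a range starting at "context X" begins on that line, so the HEAD line above it is not copied. If the HEAD line should be included, one possible variation is to buffer each section and write it out only when it contains the wanted text. This is a sketch, not the script above: the sample input and the shortened A B C loop are assumptions for illustration, the pattern is passed with -v instead of being spliced into the awk program, and the output files land in the current directory.

```shell
#!/bin/sh
# Sample input (assumed for illustration).
printf '%s\n' 'some text' HEAD 'context A' TAIL \
              'some text' HEAD 'context B' TAIL > file.txt

for x in A B C; do
    # Only create an output file when the section exists.
    if grep -qF "context $x" file.txt; then
        awk -v pat="context $x" '
            /^HEAD/ { buf = ""; inside = 1; hit = 0 }    # start a new section
            inside  { buf = buf $0 "\n"; if (index($0, pat)) hit = 1 }
            /^TAIL/ { if (inside && hit) printf "%s", buf; inside = 0 }
        ' file.txt > "context$x.txt"
    fi
done
```

index() does a literal substring search, so characters that are special in regular expressions need no escaping in pat.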
