Ubuntu: Using if in an AWK command



Question:

I have a file which contains many sub-sections, each starting with [begin] and ending with [end]. A sample is shown below:

[begin li1_1378184738754_91]  header=7075|lime|0|0|109582|0|1|2700073||0|0|0|[355]|1|0|ssb-li1-1378184738754-90||0||LIME |0|saved=true|0.002406508312038836|0|[ser=zu1:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=uzu6:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=xzs5:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=sv-stda-zu3:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=hzu8:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=lzu3:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=yzu2:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=xzu7:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer]|0|null|false|40||false|  attrs=0|0|0||0|  ptitle=690751404|1|1|1|Rest:1998636||||||2700401|175619|900.5636134725806|0.985486|39.166666666666664|$9.99|100.0|1|||  seller=1998636|1|9.99|1|-1||0|||||true||4.7937584|10412|false|  ptitle=5543369186|2|1|1|Rest:1533891||||||2700211|19615|886.8211044369053|0.776121|34.0|$119.99|100.0|1|||  seller=1533891|1|119.99|3|-1|1.0:text,In+size+6.0%2C7.0%2C8.0%2C8.5%2C9.0%2C9.5%2C10.0%2C...,0.0,,,,0,0,|2|||||true||2.95|20|true|  ptitle=622529158|3|1|1|||||||2700408|67402|796.5289827432475|0.893899|63.0|$5.27|100.0|1|||  seller=4281413|1|5.27|1|-1||0|||||true||4.695052|1769|true|  ptitle=5507199621|4|1|1|||||||2700220|56412|706.9031281251306|0.791171|45.0|$99.99|100.0|1|||  
seller=4806107|1|-1.0|1|-1|1.0:sale,$,30.000000000000014,0.0,,,,0,0,:text,In+size+6.0%2C6.5%2C7.0%2C7.5%2C8.0%2C8.5%2C9.0%2C9...,0.0,,,,0,0,|2||||$130 $30.00 off|false||5.0|1|false|  ptitle=5502728013|5|1|1|||||||900000|0|698.7772340643119|0.836740|75.0|$40.95|100.0|1|||  seller=955448|1|40.95|1|-1||0|||||false||4.142857|7|false|  ptitle=840662011|6|1|1|Rest:265238||||||300233|62718|683.2927820751431|0.995513|52.0|$22.95|100.0|1|||  seller=265238|1|22.95|1|-1||0|||||false||4.478261|23|false|  ptitle=848084980|8|1|1|||||||2700073|145653|670.4809846773688|0.880587|60.0|$24.99|100.0|1|||  seller=5267046|1|24.99|1|-1||0|||||true||0.0|0|false|  ptitle=891200492|9|1|1|Rest:1030132||||||2701003|17215|668.8437575254773|0.825491|32.0|$519.99|100.0|1|||  seller=1030132|1|519.99|1|-1||0|||||false||4.7391305|23|false|  ptitle=641974054|10|1|1|||||||900000|69433|667.6678790058678|0.752129|57.0|$4.19|100.0|1|||  seller=3365158|1|4.19|1|-1||0|||||true||4.70907|4410|true|  ptitle=517591869|12|1|1|Rest:4802895||||||2700408|127644|643.0972570735605|0.893899|17.25|$23.95|100.0|1|||  seller=4318776|1|-1.0|3|-1||0|||||false||0.0|0|false|  ptitle=541549480|13|1|1|Rest:1180414||||||2702000|105832|597.4904572011968|0.752129|24.666666666666664|$8.27|100.0|1|||  seller=4636561|1|8.27|1|-1||0|||||false||4.8283377|734|true|  ptitle=1020561900|14|1|1|||||||2700063|159813|594.4717491579845|0.934869|75.0|$5.39|100.0|1|||  seller=4722645|1|5.39|1|-1|1.0:sale,$,0.6000000000000005,0.0,,,,0,0,:text,Free+Shipping+on+All+Orders%21,0.0,201301010000/,,,0,0,|2||||$5.99 $0.60 off|true||4.3942246|1593|true|  ptitle=507792308|15|1|1|Rest:4683455||||||2702000|105832|591.7739184402442|0.768311|22.5|$9.48|100.0|1|||  seller=4910651|1|-1.0|2|-1||0|||||false||5.0|1|false|  ptitle=1090571346|16|1|1|Rest:4452919||||||2700211|20824|776.4814913363535|0.776121|35.0|$59.99|100.0|1|||  
seller=1533891|1|59.99|1|-1|1.0:sale,$,49.99999999999999,0.0,,,,0,0,:text,In+size+7.5%2C8.0%2C8.5%2C9.0%2C9.5%2C10.0%2C10.5...,0.0,,,,0,0,|2||||$110 $50.00 off|true||2.95|20|true|  ptitle=573017390|17|1|1|||||||2700073|91937|679.695660577044|0.880587|33.5|$14.85|100.0|1|||  seller=4281413|1|14.85|1|-1||0|||||true||4.695052|1769|true|  ptitle=5502723300|18|1|1|||||||900000|0|639.3095640940136|0.836740|75.0|$50.95|100.0|1|||  seller=955448|1|50.95|1|-1||0|||||false||4.142857|7|false|  ptitle=940022974|20|1|1|||||||2700600|58701|569.9503499778303|0.875839|59.0|$14.40|100.0|1|||  seller=4825227|1|14.4|1|12||0|||||true||4.0289855|276|true|  ptitle=5513277553|21|1|1|||||||2700220|56412|565.2712749001105|0.776121|44.33333333333333|$129.95|100.0|1|||  seller=4825252|1|129.95|1|23||0|||||true||4.0289855|276|true|  ptitle=890329961|22|1|1|||||||2700408|133796|564.7642425785796|0.837916|34.75|$61.95|100.0|1|||  seller=4825235|1|61.95|4|19||0|||||true||4.0289855|276|true|  ptitle=753852910|24|1|1|||||||2700073|146738|557.7419123688652|0.934869|47.69230769230769|$26.99|100.0|1|||  seller=4722645|1|26.99|10|-1|1.0:sale,$,3.0,0.0,,,,0,0,:text,Free+Shipping+on+All+Orders%21,0.0,201301010000/,,,0,0,|2||||$29.99 $3.00 off|true||4.3942246|1593|true|  ptitle=654738989|26|1|1|||||||900000|84012|554.7756559595525|0.752129|57.0|$3.19|100.0|1|||  seller=3365158|1|3.19|1|-1||0|||||true||4.70907|4410|true|  ptitle=707747307|27|1|1|Rest:4736009||||||2700063|76249|552.234395428327|0.889614|19.857142857142854|$6.39|100.0|1|||  seller=4736009|1|6.39|1|-1||0|||||false||4.8071113|15356|true|  ptitle=63531001|28|1|1|||||||2700408|82712|625.0421885589608|0.893899|47.166666666666664|$7.69|100.0|1|||  seller=4281413|1|7.69|3|-1||0|||||true||4.695052|1769|true|  ptitle=5502728016|29|1|1|||||||900000|0|605.9895507237038|0.836740|75.0|$503.00|100.0|1|||  seller=955448|1|503.0|1|-1||0|||||false||4.142857|7|false|  
ptitle=507792308|31|1|1|Rest:4683455||||||2702000|105832|559.6902659046442|0.752129|22.5|$8.99|100.0|1|||  seller=5105812|1|-1.0|1|-1||0|||||false||0.0|0|false|  ptitle=753852910|32|1|1|||||||2700073|146738|545.9987095658629|0.870929|47.69230769230769|$22.49|100.0|1|||  seller=4143386|1|22.49|6|-1|1.0:sale,$,7.5,0.0,,,,0,0,:text,Free+Shipping+on+Orders+Over+%24100,0.0,201109010000/201409302359,,,0,0,|2||||$29.99 $7.50 off|false||4.7316346|2355|true|  ptitle=5513277553|33|1|1|Rest:1533891||||||2700220|56412|653.3133907916089|0.825491|44.33333333333333|$149.99|100.0|1|||  seller=1533891|1|149.99|3|-1|1.0:text,In+size+5.0%2C5.5%2C6.0%2C6.5%2C7.0%2C7.5%2C8.0%2C8...,0.0,,,,0,0,|2|||||true||2.95|20|true|  ptitle=63531001|34|1|1|||||||2700408|82712|541.8233547780552|0.893899|47.166666666666664|$7.72|100.0|1|||  seller=2370155|1|7.72|4|-1||0|||||false||4.85|40|false|  ptitle=1018957017|35|1|1|||||||2700073|145653|540.6093714604533|0.860614|56.0|$25.95|100.0|1|||  seller=5036683|1|25.95|1|-1||0|||||false||4.8405056|366|false|  ptitle=743682867|36|1|1|||||||2700073|63437|539.5985846455641|0.870929|58.0|$46.99|100.0|1|||  seller=193176|1|46.99|1|-1||0|||||true||4.8511987|1418|true|  ptitle=679858288|37|1|1|||||||2700063|188669|535.1360632897284|0.902031|30.0|$12.41|100.0|1|||  seller=4143386|1|12.41|2|-1|1.0:sale,$,1.379999999999999,0.0,,,,0,0,:text,Free+Shipping+on+Orders+Over+%24100,0.0,201109010000/201409302359,,,0,0,|2||||$13.79 $1.38 off|false||4.7316346|2355|true|  ptitle=994328713|38|1|1|||||||2700073|71463|534.7715925279717|0.870929|58.0|$1.29|100.0|1|||  seller=1787388|1|1.29|1|-1||0|||||false||4.680464|3624|false|  ptitle=886915818|40|1|1|||||||2700444|201835|529.7519801432289|0.934869|65.5|$44.99|100.0|1|||  seller=4561883|1|44.99|2|-1||0|||||true||4.7913384|508|false|  seller_hidden=227502|990765963|1147436601|-1  seller_hidden=5310958|622529158|5645627277|-1  seller_hidden=4825254|5543369186|5651114316|23  seller_hidden=5289138|5548930281|5653769481|-1
[end li1_1378184738754_91]

I am trying to run this command:

cat /home/nextag/logs/OutpdirImpressions.log.2013-09-02 | awk -F "$begin" '{print $0}' | awk '$0 ~ "header=7075" {print $0}'

With this command I want to split the entire file into sub-sections beginning with the word 'begin', and then keep only the sub-sections which contain 'header=7075'.

I expected it to print each entire sub-section (those which contain that string), but I am getting only this portion as output:

header=7075|lime|0|0|109582|0|1|2700073||0|0|0|[355]|1|0|ssb-li1-1378184738754-90||0||LIME |0|saved=true|0.002406508312038836|0|[ser=zu1:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=uzu6:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=xzs5:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=sv-stda-zu3:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=hzu8:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=lzu3:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=yzu2:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer][ser=xzu7:mtu=model_other_20120806calibex.csv:mu=model_other_20120806calibex.csv:scorerClassUsed=LinearPersonalizedProductSearchScorer]|0|null|false|40||false|

I have tried using if in various ways, but it doesn't work. Can somebody please help me? For example, I tried:

awk -F "$begin" '{if($0 ~ "header=7075") {print $0}}' /home/nextag/logs/OutpdirImpressions.log.2013-09-02

It gave the same result.

Can somebody please suggest how I can get the complete sub-section in the result?
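A likely reason for that output: awk splits its input into records using the record separator RS, which is a newline by default, so $0 is always a single line. -F only sets the field separator FS used within a record, and inside double quotes the shell expands $begin itself (most likely to an empty string) before awk sees it. The test $0 ~ "header=7075" can therefore only ever match the one line that contains the string. The Python sketch below contrasts line-wise and section-wise matching; SAMPLE, matching_lines and matching_sections are hypothetical names, and the sample assumes each field sits on its own line, as the answers below also assume.

```python
# Hypothetical miniature of the log: one field per line, sections
# delimited by "[begin ...]" and "[end ...]" lines.
SAMPLE = """\
[begin li1_a]
header=7075|lime|0
ptitle=1|1
seller=1|1
[end li1_a]
[begin li1_b]
header=1234|lime|0
ptitle=2|1
[end li1_b]
"""


def matching_lines(text, needle):
    """What the original pipeline effectively does: each line is a record."""
    return [line for line in text.splitlines() if needle in line]


def matching_sections(text, needle):
    """What the question asks for: each [begin ... [end] section is a record."""
    sections, current = [], None
    for line in text.splitlines():
        if line.startswith("[begin"):
            current = [line]                  # start collecting a new section
        elif current is not None:
            current.append(line)
            if line.startswith("[end"):       # section complete: test it as a whole
                if any(needle in ln for ln in current):
                    sections.append("\n".join(current))
                current = None
    return sections


print(matching_lines(SAMPLE, "header=7075"))          # ['header=7075|lime|0']
print(len(matching_sections(SAMPLE, "header=7075")))  # 1
```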


Solution:1

Awk script - section delimiters not passed + comments

This variant does not pass the [begin ...] and [end ...] delimiters.

#!/usr/bin/awk -f
BEGIN {
    insect=0    # we are out of a section
}
/^\[begin [a-z0-9_]+\]/ {
    insect=1    # section opening
    next
}
insect == 1 {
    if ($0 ~ /^header=7075\|/) {
        insect=2    # we are inside the right section
    }
    else {
        insect=0    # we are in a different section
        next
    }
}
/^\[end [a-z0-9_]+\]/ && (insect == 2 || insect == 1) {
    insect=0    # end of the right section -> wait for the next matching one
    next
}
insect == 2 {
    print       # we are inside the right section -> pass all lines
}
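The script is a small state machine: insect is 0 outside any section, 1 right after a [begin ...] line, and 2 once the next line has been confirmed to start with header=7075|. Only the line immediately after [begin ...] is checked. Below is a Python sketch of the same three states, assuming (as the script does) that the header always directly follows [begin ...]; filter_sections and the state names are illustrative, not part of the original answer. This sketch returns to state 0 at [end ...] and keeps scanning, so it collects every matching section.

```python
import re

BEGIN = re.compile(r"^\[begin [a-z0-9_]+\]")
END = re.compile(r"^\[end [a-z0-9_]+\]")
HEADER = re.compile(r"^header=7075\|")

OUTSIDE, OPENED, INSIDE = 0, 1, 2   # the three values of `insect`


def filter_sections(lines):
    """Yield the body lines (delimiters excluded) of every matching section."""
    state = OUTSIDE
    for line in lines:
        if BEGIN.match(line):
            state = OPENED            # section opening: inspect the next line
            continue
        if state == OPENED:
            if HEADER.match(line):
                state = INSIDE        # the right section
            else:
                state = OUTSIDE       # a different section: skip it
                continue
        if END.match(line) and state == INSIDE:
            state = OUTSIDE           # end of the right section
            continue
        if state == INSIDE:
            yield line
```

Note that the trailing \| in HEADER keeps a line such as header=70751|... from matching.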

Awk script - section delimiters passed + contracted code

#!/usr/bin/awk -f
BEGIN {ins=0}
/^\[begin [a-z0-9_]+\]/ {beg=$0; getline; if($0 ~ /^header=7075\|/) {print beg; ins=1}}
/^\[end [a-z0-9_]+\]/ && ins {print; ins=0}
ins

As a one-liner :) I do not understand the demand for one-liners, but here it is:

awk 'BEGIN {ins=0} /^\[begin [a-z0-9_]+\]/ {beg=$0; getline; if($0 ~ /^header=7075\|/) {print beg; ins=1}} /^\[end [a-z0-9_]+\]/ && ins {print; ins=0} ins'
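The contracted version drops the explicit "right after [begin" state by using getline, which reads the next input line into $0 immediately, so the header test happens inside the same rule that saw [begin ...]. The same one-line lookahead can be sketched in Python with next() on the line iterator; like the awk version, this sketch also emits the delimiters and assumes the header directly follows [begin ...]. The name sections_with_delimiters is hypothetical.

```python
import re

BEGIN = re.compile(r"^\[begin [a-z0-9_]+\]")
END = re.compile(r"^\[end [a-z0-9_]+\]")
HEADER = re.compile(r"^header=7075\|")


def sections_with_delimiters(lines):
    """Yield every line of each matching section, including [begin and [end."""
    it = iter(lines)
    inside = False                          # plays the role of awk's `ins`
    for line in it:
        if BEGIN.match(line):
            begin_line = line
            line = next(it, "")             # like getline: consume the next line now
            if HEADER.match(line):
                yield begin_line
                inside = True
        if END.match(line) and inside:
            yield line
            inside = False                  # reset and keep scanning for more sections
            continue
        if inside:
            yield line
```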

The advantage of the programs in this answer is that they process the input line by line, as is usual for Unix utilities. This lets them handle very long log sections without extreme memory demands, and lets them run in parallel (on multi-core CPUs) with other programs in a pipe.
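To illustrate the memory point, here is a Python sketch of the same streaming idea, assuming the input is any iterable of lines such as an open file; iter_sections and grep_sections are hypothetical names. Only the section currently being read is buffered, whereas reading the whole file into one string first (as the regex one-liners in Solution 3 do) keeps the entire log in memory.

```python
import io


def iter_sections(lines):
    """Yield each complete [begin ... [end section as one string.

    Only the section currently being read is kept in memory.
    """
    buf = None
    for line in lines:
        if line.startswith("[begin"):
            buf = [line]
        elif buf is not None:
            buf.append(line)
            if line.startswith("[end"):
                yield "".join(buf)    # lines from a file keep their "\n"
                buf = None


def grep_sections(lines, needle):
    """Lazily filter the sections that contain `needle`."""
    return (sec for sec in iter_sections(lines) if needle in sec)


# In-memory stand-in for the log file (hypothetical data).
log = io.StringIO(
    "[begin li1_a]\nheader=7075|x\n[end li1_a]\n"
    "[begin li1_b]\nheader=42|y\n[end li1_b]\n"
)
for sec in grep_sections(log, "header=7075"):
    print(sec, end="")
```

With the real log, the StringIO would be replaced by a file opened with open('OutpdirImpressions.log.2013-09-02') inside a with statement.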


Solution:2

Bash script

#!/bin/bash
# Collect every [begin ... [end section and print it if it contains header=7075.
# IFS= read -r keeps each line exactly as written; quoting and [[ ]] avoid
# glob expansion of the [...] groups in the data.
section=""
insect=0
while IFS= read -r line; do
    if [ "$insect" -eq 1 ] || [[ $line == *'[begin'* ]]; then
        insect=1
        section+="$line"$'\n'
    else
        continue
    fi
    if [[ $line == *'[end'* ]]; then
        if [[ $section == *'header=7075'* ]]; then
            printf '%s\n' "$section"
        fi
        section=""
        insect=0
    fi
done < OutpdirImpressions.log

Python script

(better performance than bash)

#!/usr/bin/env python
section = ''
insect = False
with open('OutpdirImpressions.log', 'r') as f:
    for line in f:
        if insect or line.startswith('[begin'):
            insect = True
            section += line
        else:
            continue
        if line.startswith('[end'):
            if 'header=7075' in section:
                print(section)
            section = ''
            insect = False


Solution:3

I am not sure whether this answers your question at all, i.e. whether you are interested in 'whatever works' or rather in an awk-specific answer, but it seemed to me you wanted a one-liner (not that these examples are very wieldy):

python3 -c "import re; print(*[rec for rec in re.findall(r'(?ms)\[begin.*?(?=\[begin|\Z)', open('OutpdirImpressions.log.2013-09-02').read()) if 'header=7075' in rec], sep='')"

and for Python 2.6 or 2.7 (where print is a statement, so the star unpacking above does not work):

python -c "import re; print(''.join([rec for rec in re.findall(r'(?ms)\[begin.*?(?=\[begin|\Z)', open('OutpdirImpressions.log.2013-09-02').read()) if 'header=7075' in rec]))"
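How the pattern splits the file: the s flag lets . match newlines, .*? is non-greedy so each match stops at the earliest point where the lookahead (?=\[begin|\Z) succeeds, i.e. just before the next [begin or at the end of the input. Because the lookahead does not consume the next [begin, it stays available as the start of the following match. A small demonstration on hypothetical text:

```python
import re

# (?ms) as in the one-liners; only the s (DOTALL) flag matters for this pattern.
SECTION = re.compile(r"(?ms)\[begin.*?(?=\[begin|\Z)")

text = (
    "[begin li1_a]\nheader=7075|x\n[end li1_a]\n"
    "[begin li1_b]\nheader=42|y\n[end li1_b]\n"
)

sections = SECTION.findall(text)
print(len(sections))  # 2
print([s for s in sections if "header=7075" in s])
```

One consequence of this design: anything between an [end ...] line and the next [begin is attached to the preceding section.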


Solution:4

I found a one-liner solution for the same problem:

awk '$1=="[end"{p=0}/^header=7075/{p=1}p' file  

or: sed '/^\[end/G' input.txt | awk -v RS='' '/header=7075/'

Both serve the purpose.
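Both one-liners rely on simple mechanisms. The first sets a print flag p at a line starting with header=7075 and clears it at a line whose first whitespace-separated field is [end, so it prints from the header line up to, but not including, the [end ...] line; the [begin ...] line is not printed either. The second uses sed's G command to append an empty line after every [end line, and awk's paragraph mode (RS set to the empty string) then treats each blank-line-separated block as one record; this assumes the log contains no other blank lines. A Python sketch of both ideas, with hypothetical function names and sample data:

```python
import re


def flag_filter(lines, header="header=7075"):
    # Mimics: awk '$1=="[end"{p=0}/^header=7075/{p=1}p'
    out, p = [], False
    for line in lines:
        fields = line.split()             # awk's default whitespace field splitting
        if fields and fields[0] == "[end":
            p = False                     # clear the flag, so [end itself is not printed
        if line.startswith(header):
            p = True                      # set the flag at the header line
        if p:
            out.append(line)
    return out


def paragraph_filter(text, needle="header=7075"):
    # Mimics: sed '/^\[end/G' | awk -v RS='' '/header=7075/'
    spaced = []
    for line in text.splitlines():
        spaced.append(line)
        if line.startswith("[end"):
            spaced.append("")             # sed's G: an empty line after each [end line
    # RS='' (paragraph mode): records are separated by runs of blank lines
    records = re.split(r"\n{2,}", "\n".join(spaced))
    return [r.strip("\n") for r in records if r.strip() and needle in r]


LOG = "[begin li1_a]\nheader=7075|x\nptitle=1\n[end li1_a]\n[begin li1_b]\nheader=42|y\n[end li1_b]\n"
print(flag_filter(LOG.splitlines()))  # ['header=7075|x', 'ptitle=1']
print(paragraph_filter(LOG))
```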

