Tutorial: Hex dump parsing in Perl



Question:

I have a hex dump of a message in a file, and I want to get it into an array so I can perform the decoding logic on it.
I was wondering if there is an easier way to parse a message that looks like this:

37 39 30 35 32 34 35 34 3B 32 31 36 39 33 34 35
3B 32 31 36 39 33 34 36 00 00 01 08 40 00 00 15
6C 71 34 34 73 69 6D 31 5F 33 30 33 31 00 00 00
00 00 01 28 40 00 00 15 74 65 6C 63 6F 72 64 69
74 65 6C 63 6F 72 64 69

Note that each row holds at most 16 bytes, but a row can also contain fewer (minimum: 1).
Is there a nice, elegant way to do this in Perl, rather than reading 2 characters at a time?


Solution:1

Perl has a built-in hex operator that converts hex strings to numbers for you.

hex EXPR

hex

Interprets EXPR as a hex string and returns the corresponding value. (To convert strings that might start with either 0, 0x, or 0b, see oct.) If EXPR is omitted, uses $_.

    print hex '0xAf'; # prints '175'
    print hex 'aF';   # same
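The parenthetical about oct is easier to see side by side. Here is a small sketch that converts 0x3B (the ';' byte that appears in the dump) with both functions. The variable names are only for illustration:

```perl
use strict;
use warnings;

# hex() always treats its argument as hexadecimal; a leading "0x" is optional.
my $from_hex     = hex '3B';        # 59
my $from_hex_0x  = hex '0x3B';      # 59

# oct() chooses the base from the prefix: 0x (hex), 0b (binary), otherwise octal.
my $from_oct_hex = oct '0x3B';      # 59
my $from_oct_bin = oct '0b111011';  # 59
my $from_oct_oct = oct '73';        # 59 (read as octal)

print join(' ', $from_hex, $from_hex_0x, $from_oct_hex,
                $from_oct_bin, $from_oct_oct), "\n";   # 59 59 59 59 59
```

Every token in a hex dump is plain hex with no prefix, so hex is the natural fit here. oct only becomes necessary when a prefix signals the base.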

Remember that by default (with no arguments) split breaks $_ apart at whitespace, so for example:

    $ perl -le '$_ = "a b c"; print for split'
    a
    b
    c
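The "no arguments" detail matters for dump rows that begin with indentation. The sketch below uses an illustrative row with leading spaces and contrasts bare split with an explicit /\s+/ pattern:

```perl
use strict;
use warnings;

# A dump row with leading spaces and a trailing newline (illustrative input).
$_ = "   74 65 6C 63 6F 72 64 69\n";

# With no arguments, split works on $_, splits on runs of whitespace,
# and discards leading whitespace, so no empty first field appears.
my @tokens = split;

# An explicit /\s+/ pattern keeps a leading empty field instead.
my @with_empty = split /\s+/, $_;

print scalar(@tokens), " tokens\n";       # 8 tokens
print scalar(@with_empty), " fields\n";   # 9 fields (the first is empty)
```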

For every line of the input, separate it into hex values, convert the values to numbers, and push them onto an array for later processing.

    #! /usr/bin/perl

    use warnings;
    use strict;

    my @values;
    while (<>) {
        push @values => map hex($_), split;
    }

    # for example
    my $sum = 0;
    $sum += $_ for @values;
    print $sum, "\n";

Sample run:

    $ ./sumhex mtanish-input
    4196
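If you want to try this without a separate input file, here is a self-contained variant. It embeds the question's sample in a heredoc (the name $dump is just for illustration) and loops over its lines instead of <>. It should report 72 bytes and the same sum of 4196:

```perl
use strict;
use warnings;

# The sample dump from the question, embedded so the sketch needs no input file.
my $dump = <<'END';
37 39 30 35 32 34 35 34 3B 32 31 36 39 33 34 35
3B 32 31 36 39 33 34 36 00 00 01 08 40 00 00 15
6C 71 34 34 73 69 6D 31 5F 33 30 33 31 00 00 00
00 00 01 28 40 00 00 15 74 65 6C 63 6F 72 64 69
74 65 6C 63 6F 72 64 69
END

my @values;
# Same loop body as the script above, but over the lines of $dump instead of <>.
for (split /\n/, $dump) {
    push @values => map hex($_), split;
}

my $sum = 0;
$sum += $_ for @values;
print scalar(@values), " bytes, sum $sum\n";   # 72 bytes, sum 4196
```

Rows of different lengths need no special handling: split returns however many tokens a row has, from 1 to 16.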


Solution:2

I would read a line at a time, strip the whitespace, and use pack 'H*' to convert it. It's hard to be more specific without knowing what kind of "decoding logic" you're trying to apply. For example, here's a version that converts each byte to decimal:

    while (<>) {
        s/\s+//g;
        my @bytes = unpack('C*', pack('H*', $_));
        print "@bytes\n";
    }

Output from your sample file:

    55 57 48 53 50 52 53 52 59 50 49 54 57 51 52 53
    59 50 49 54 57 51 52 54 0 0 1 8 64 0 0 21
    108 113 52 52 115 105 109 49 95 51 48 51 49 0 0 0
    0 0 1 40 64 0 0 21 116 101 108 99 111 114 100 105
    116 101 108 99 111 114 100 105
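Because fields in a message can span rows, another option is to strip the whitespace from the whole dump at once and pack it into a single binary string. You can then use substr and unpack at any offset. The sketch below embeds the sample as a string. The 'N' read at offset 24 is purely hypothetical, since the actual message format isn't known; it only shows how a 32-bit big-endian field would be read if one were there:

```perl
use strict;
use warnings;

# Illustrative input: the question's dump as one string. A real script might
# slurp the file instead, e.g. { local $/; $dump = <$fh>; }
my $dump = <<'END';
37 39 30 35 32 34 35 34 3B 32 31 36 39 33 34 35
3B 32 31 36 39 33 34 36 00 00 01 08 40 00 00 15
6C 71 34 34 73 69 6D 31 5F 33 30 33 31 00 00 00
00 00 01 28 40 00 00 15 74 65 6C 63 6F 72 64 69
74 65 6C 63 6F 72 64 69
END

# Remove all whitespace (spaces and newlines) so row lengths no longer matter,
# then pack the hex digits into one binary string.
(my $hex = $dump) =~ s/\s+//g;
my $msg = pack 'H*', $hex;
print length($msg), " bytes\n";             # 72 bytes

# The first 24 bytes of the sample happen to be printable ASCII.
my $prefix = substr $msg, 0, 24;
print "$prefix\n";                          # 79052454;2169345;2169346

# Hypothetical layout: read the next 4 bytes as a 32-bit big-endian integer.
my $field = unpack 'N', substr($msg, 24, 4);
print "$field\n";                           # 264 (0x00000108)
```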


Solution:3

I think reading in two characters at a time is the appropriate way to parse a stream whose logical tokens are two-character units.
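Reading two characters at a time doesn't require manual substr bookkeeping. Here is a minimal sketch, assuming every byte is written as exactly two hex digits. A global match in list context returns each two-digit token and skips the whitespace between them:

```perl
use strict;
use warnings;

# One row of the dump (illustrative input).
my $row = "6C 71 34 34 73 69 6D 31\n";

# List-context global match: each capture is one two-digit hex token.
my @pairs = $row =~ /([0-9A-Fa-f]{2})/g;
my @bytes = map { hex($_) } @pairs;

print "@pairs\n";   # 6C 71 34 34 73 69 6D 31
print "@bytes\n";   # 108 113 52 52 115 105 109 49
```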

Is there some reason you think that's ugly?

If you're trying to extract a particular sequence, you could do that with whitespace-insensitive regular expressions.
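A whitespace-insensitive search can be built by joining the wanted pairs with \s*, which lets a sequence match even when it crosses a line break. This is a sketch that assumes bytes are written as two-digit tokens separated by whitespace. The target sequence 69 74 65 6C is just an example, chosen because it straddles the last two rows of the sample:

```perl
use strict;
use warnings;

# The question's dump (illustrative input).
my $dump = <<'END';
37 39 30 35 32 34 35 34 3B 32 31 36 39 33 34 35
3B 32 31 36 39 33 34 36 00 00 01 08 40 00 00 15
6C 71 34 34 73 69 6D 31 5F 33 30 33 31 00 00 00
00 00 01 28 40 00 00 15 74 65 6C 63 6F 72 64 69
74 65 6C 63 6F 72 64 69
END

# Bytes to look for, written the same way as in the dump.
my @wanted = qw(69 74 65 6C);

# Allow any whitespace, including newlines, between consecutive pairs.
my $pattern = join '\s*', @wanted;   # 69\s*74\s*65\s*6C

my $offset;
if ($dump =~ /$pattern/i) {
    $offset = $-[0];   # character offset of the match start within $dump
    print "found at character offset $offset\n";
}
```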

