Tutorial: Looping through files with Perl



Question:

Okay, I have two files. One contains data that is updated every 10 minutes, and the other contains data that was previously used. What I am trying to do is take each line from the new file, loop through every line of the second file, and see whether it matches one of them. If it does, I don't want to use it, but if there is no match, then I want to add it to a string. With what I have done so far, the check never seems to find a match even though there is one. Here is my code and a sample of the data I have been using from both files. CHECKHAIL and USEDHAIL are the filehandles for the two files.

    while(my $toBeChecked = <CHECKHAIL>){
        my $found = 0;
        seek USEDHAIL, 0, 0 or die "$0: seek: $!";
        while(my $hailCheck = <USEDHAIL>){
            if( $toBeChecked == $hailCheck){
                $found += 1;
            }
        }
        print USEDHAIL $toBeChecked;
        if ($found == 0){
            $toEmail .= $toBeChecked;
        }
    }
    print $toEmail;
    return;
    }

CHECKHAIL sample data

    2226  175   2 NE      LAWRENCE           DEADWOOD         SD    44.4    -103.7  (UNR)
    2305  200   2 S       SISKIYOU           GREENVIEW        CA    41.52   -122.9  2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)
    2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
    2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)

USEDHAIL sample data

    2226  175   2 NE      LAWRENCE           DEADWOOD         SD    44.4    -103.7  (UNR)
    2305  200   2 S       SISKIYOU           GREENVIEW        CA    41.52   -122.9  2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)


Solution:1

Why not just build a hash from the lines of the used (older) file?

    use strict;
    use warnings;

    my %fromUsedFile;
    open my $usedFile, '<', '/the/data/file/that/is/10minutesold'
        or die "$0: open: $!";
    $fromUsedFile{$_}++ while <$usedFile>;
    close $usedFile;

    my $toBeEmailed = '';
    while (my $toBeChecked = <CHECKHAIL>) {
        if (defined $fromUsedFile{$toBeChecked}) {
            # ... line is in both the new and old file
        } else {
            # ... line is only in the new file
            $toBeEmailed .= $toBeChecked;
        }
    }
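A self-contained sketch of the same hash-lookup idea, runnable without the real files. As an assumption made only so the example runs on its own, the sample records from the question are read through in-memory filehandles (open on a scalar reference) that stand in for the used and new files; the variable names follow the snippet above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample records from the question. In-memory filehandles
# (open on a scalar reference) stand in for the two real files.
my @used_lines = (
    "2226  175   2 NE      LAWRENCE           DEADWOOD         SD    44.4    -103.7  (UNR)\n",
    "2305  200   2 S       SISKIYOU           GREENVIEW        CA    41.52   -122.9  2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)\n",
);
my @new_lines = (
    @used_lines,
    "2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)\n",
    "2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)\n",
);
my $used_text  = join '', @used_lines;
my $check_text = join '', @new_lines;

open my $used_fh,  '<', \$used_text  or die "open used: $!";
open my $check_fh, '<', \$check_text or die "open check: $!";

# One pass over the used file builds the lookup table:
# each whole line (newline included) becomes a hash key.
my %fromUsedFile;
$fromUsedFile{$_}++ while <$used_fh>;
close $used_fh;

# One pass over the new file: each line costs a single hash lookup,
# so the total work grows linearly instead of quadratically.
my $toBeEmailed = '';
while (my $toBeChecked = <$check_fh>) {
    next if exists $fromUsedFile{$toBeChecked};   # already used: skip it
    $toBeEmailed .= $toBeChecked;                 # only in the new file
}
close $check_fh;

print $toBeEmailed;
```

Run as is, it prints only the two DANIELS records. Because the keys are whole lines, two records that differ only in spacing would not match; Solution 2 below normalizes whitespace to handle that.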


Solution:2

The check never has an opportunity to succeed because of this loop:

    while(<USEDHAIL>){
        my $hailCheck = $_;
        if( $toBeChecked eq $hailCheck){
            $found += 1;
        }else{
            return;  ### XXX
        }
    }

On the first mismatch, the sub returns to its caller. You may have meant next instead, but it is simpler to remove the whole else clause. Remove the other else { return; } (the one that runs when $found is true) for the same reason.

Note that your algorithm has quadratic complexity and will be slow for large inputs. It'd be better to read the used records into a hash and then for each line of CHECKHAIL probe the %used hash to see whether it's been processed.

With those lines removed, I get

    $ ./prog.pl
    2305  200   2 S       SISKIYOU           GREENVIEW        CA    41.52   -122.9  2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)
    2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
    2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)

As you can see, that still has a bug. You need to rewind USEDHAIL for each line of CHECKHAIL:

    seek USEDHAIL, 0, 0 or die "$0: seek: $!";
    while(<USEDHAIL>){
    ...

This produces

    $ ./prog.pl
    2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
    2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)
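Why the rewind matters: once an inner loop has read a filehandle to the end, the handle stays at end-of-file, so every later pass of that loop sees no lines at all. The minimal sketch below uses a two-line in-memory file with placeholder contents and a hypothetical helper, count_lines, to count how many lines each pass sees with and without seek.

```perl
use strict;
use warnings;

# Placeholder two-line "file" read through an in-memory filehandle.
my $used_text = "line A\nline B\n";
open my $used_fh, '<', \$used_text or die "open: $!";

# Hypothetical helper: read the handle to the end and count the lines.
sub count_lines {
    my ($fh) = @_;
    my $n = 0;
    while (defined(my $line = <$fh>)) { $n++ }
    return $n;
}

# Without a rewind, the handle stays at end-of-file after the first pass.
my @without_seek;
for my $pass (1 .. 3) {
    push @without_seek, count_lines($used_fh);
}

# Rewinding to byte 0 before each pass lets every pass see the whole file.
my @with_seek;
for my $pass (1 .. 3) {
    seek $used_fh, 0, 0 or die "$0: seek: $!";
    push @with_seek, count_lines($used_fh);
}

print "without seek: @without_seek\n";   # without seek: 2 0 0
print "with seek:    @with_seek\n";      # with seek:    2 2 2
```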

For an example of a better way to do it, consider

    #! /usr/bin/perl

    use warnings;
    use strict;

    sub read_used_hail {
      my($path) = @_;

      my %used;

      open my $fh, "<", $path or die "$0: open $path: $!";

      local $" = " ";  # separator used when interpolating "@f"
      while (<$fh>) {
        chomp;
        my @f = split " ", $_, 10;
        next unless @f;
        ++$used{"@f"};
      }

      wantarray ? %used : \%used;
    }

    my %used = read_used_hail "used-hail";

    open my $check, "<", "check-hail" or die "$0: open: $!";

    while (<$check>) {
      chomp;
      my @f = split " ", $_, 10;
      next if !@f || $used{join " " => @f};
      print $_, "\n";
    }

Sample run:

    $ ./prog.pl
    2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
    2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)
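The split " ", $_, 10 step in this script is what makes the comparison tolerant of spacing differences: the special " " pattern splits on runs of whitespace (ignoring leading whitespace), and the limit of 10 means the first nine columns are rejoined with single spaces while the tenth field, the free-text remarks, keeps its internal spacing. A small sketch of that key-building step, pulled out into a hypothetical helper named hail_key:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the key built in read_used_hail:
# split on whitespace into at most 10 fields, then rejoin with single
# spaces. The 10th field (the remarks) keeps its own internal spacing.
sub hail_key {
    my ($line) = @_;
    chomp $line;
    my @f = split " ", $line, 10;
    return @f ? join(" ", @f) : undef;   # blank lines produce no key
}

# The same record with the original column padding and with single spaces.
my $spaced   = "2226  175   2 NE      LAWRENCE           DEADWOOD         SD    44.4    -103.7  (UNR)\n";
my $squeezed = "2226 175 2 NE LAWRENCE DEADWOOD SD 44.4 -103.7 (UNR)";

my $key_spaced   = hail_key($spaced);
my $key_squeezed = hail_key($squeezed);
my $verdict = $key_spaced eq $key_squeezed ? 'same key' : 'different keys';

print "$key_spaced\n$verdict\n";
# 2226 175 2 NE LAWRENCE DEADWOOD SD 44.4 -103.7 (UNR)
# same key
```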


Solution:3

Using $_ in both the outer and the inner loop can cause problems, because both loops assign to the same global variable. Try naming your lines first, like so:

    while(my $toBeChecked = <CHECKHAIL>){
        my $found = 0;
        while( my $hailCheck = <USEDHAIL>){
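To see the problem concretely: while (<FH>) assigns each line to the global $_, so when an inner while (<FH>) runs to completion it leaves $_ undefined and the outer loop's current line is lost. A sketch with two small in-memory files (placeholder contents) that records what the outer loop can still see after the inner loop finishes, first with the shared $_ and then with named variables:

```perl
use strict;
use warnings;

# Placeholder contents for the "new" and "used" files.
my $outer_text = "new 1\nnew 2\n";
my $inner_text = "used 1\n";
open my $outer, '<', \$outer_text or die "open: $!";
open my $inner, '<', \$inner_text or die "open: $!";

# Both loops assign to the global $_: once the inner loop hits
# end-of-file, $_ is undef and the outer line is gone.
my @seen_after_inner;
while (<$outer>) {
    seek $inner, 0, 0 or die "seek: $!";
    while (<$inner>) { }                        # inner loop reuses $_
    push @seen_after_inner, defined $_ ? $_ : 'undef';
}

# With separate lexical variables, the outer line survives.
seek $outer, 0, 0 or die "seek: $!";
my @seen_with_names;
while (my $toBeChecked = <$outer>) {
    seek $inner, 0, 0 or die "seek: $!";
    while (my $hailCheck = <$inner>) { }        # separate variable
    chomp(my $copy = $toBeChecked);
    push @seen_with_names, $copy;
}

print "shared \$_: @seen_after_inner\n";   # shared $_: undef undef
print "named:     @seen_with_names\n";     # named:     new 1 new 2
```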

Also, Perl treats numeric comparison and string comparison differently. Here you're using string comparison where you want numeric comparison:

 if ($found eq 0){  

Change to:

 if ($found == 0){  
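The same distinction explains the line comparison in the question, $toBeChecked == $hailCheck. Numeric == converts each line to a number using only its leading digits, so any two records with the same time compare equal, while eq compares the whole strings. A sketch using two of the DANIELS records from the sample data:

```perl
use strict;
use warnings;

my $flaxville = "2350  200             DANIELS            E FLAXVILLE      MT    48.8    -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)";
my $richland  = "2350  175   5 N       DANIELS            RICHLAND         MT    48.89   -106.05 DESTROYED CROPS (GGW)";

my $numeric_same;
{
    # == numifies each string, which keeps only the leading "2350".
    # With warnings enabled Perl would also report that the arguments
    # "isn't numeric"; that warning is silenced just for this block.
    no warnings 'numeric';
    $numeric_same = ($flaxville == $richland) ? 1 : 0;
}

# eq compares the complete strings.
my $string_same = ($flaxville eq $richland) ? 1 : 0;

print "== says same: $numeric_same\n";   # == says same: 1
print "eq says same: $string_same\n";    # eq says same: 0

# For a counter such as $found, the numeric test is the right one.
my $found = 0;
print "no match found\n" if $found == 0;
```

In short: compare lines with eq, and test the $found counter with ==.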


Solution:4

This line sticks out for me:

if ($found eq 0){  

Since $found is a boolean, perform boolean tests on it:

if (not $found) {  

It also looks like your logic is a bit reversed: in the first if, you return if the lines do not match, and then in the second if, you return if there was a match. Did you perhaps intend next; (to skip to the next iteration of the innermost loop) instead, or last; to leave that loop entirely?
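Putting these suggestions together, the inner loop needs no else branch at all: a mismatch simply moves on, and a match can end the scan early with last. A sketch using arrays of shortened placeholder lines in place of the two filehandles (an assumption to keep it self-contained):

```perl
use strict;
use warnings;

# Shortened placeholder stand-ins for the used and new records.
my @used  = ("2226 DEADWOOD\n", "2305 GREENVIEW\n");
my @check = ("2226 DEADWOOD\n", "2350 RICHLAND\n");

my $toEmail = '';
for my $toBeChecked (@check) {
    my $found = 0;
    for my $hailCheck (@used) {
        if ($toBeChecked eq $hailCheck) {   # string comparison
            $found = 1;
            last;    # match found: stop scanning the used lines
        }
        # no else branch: a mismatch just moves on to the next used line
    }
    $toEmail .= $toBeChecked if not $found;   # boolean test on $found
}

print $toEmail;   # 2350 RICHLAND
```

This is still quadratic overall; for large files, the hash lookup from the earlier answers scales better.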

