
Question:
Okay, I have two files. One file holds data that is updated every 10 minutes, while the second holds data that was previously used. What I am trying to do is take one line from the new file, loop through each line of the second file, and see whether it matches any of them. If it does, I don't want to use it, but if there is no match, then I want to add it to a string. In what I have done so far, the check never seems to find a match even though there is one. Here is what I have, along with a sample of the data I have been using from both files. CHECKHAIL and USEDHAIL are the two files.
    while(my $toBeChecked = <CHECKHAIL>){
        my $found = 0;
        seek USEDHAIL, 0, 0 or die "$0: seek: $!";
        while(my $hailCheck = <USEDHAIL>){
            if( $toBeChecked == $hailCheck){
                $found += 1;
            }
        }
        print USEDHAIL $toBeChecked;
        if ($found == 0){
            $toEmail .= $toBeChecked;
        }
    }
    print $toEmail;
    return;
    }
CHECKHAIL sample data
    2226 175 2 NE LAWRENCE DEADWOOD SD 44.4 -103.7 (UNR)
    2305 200 2 S SISKIYOU GREENVIEW CA 41.52 -122.9 2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)
    2350 200 DANIELS E FLAXVILLE MT 48.8 -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
    2350 175 5 N DANIELS RICHLAND MT 48.89 -106.05 DESTROYED CROPS (GGW)
USEDHAIL sample data
    2226 175 2 NE LAWRENCE DEADWOOD SD 44.4 -103.7 (UNR)
    2305 200 2 S SISKIYOU GREENVIEW CA 41.52 -122.9 2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)
Solution:1
Why wouldn't you just create a hash for the first (used) file?
    use strict;
    use warnings;

    # Remember every line of the old (used) file as a hash key.
    my %fromUsedFile;
    open USEDFILE, '<', '/the/data/file/that/is/10minutesold'
        or die "$0: open: $!";
    $fromUsedFile{$_}++ while <USEDFILE>;
    close USEDFILE;

    my $toBeEmailed = '';
    while (my $toBeChecked = <CHECKHAIL>) {
        if (defined $fromUsedFile{$toBeChecked}) {
            # ... line is in both the new and old file
        } else {
            # ... line is only in the new file
            $toBeEmailed .= $toBeChecked;
        }
    }
Solution:2
It never has an opportunity to succeed because of
    while(<USEDHAIL>){
        my $hailCheck = $_;
        if( $toBeChecked eq $hailCheck){
            $found += 1;
        }else{
            return; ### XXX
        }
    }
On the first mismatch, the sub returns to its caller. You may have meant next instead, but for conciseness, you should remove the whole else clause. Remove the other else { return; } (corresponding to when $found is true) for the same reason.
Note that your algorithm has quadratic complexity and will be slow for large inputs. It'd be better to read the used records into a hash and then, for each line of CHECKHAIL, probe the %used hash to see whether it's been processed.
With those lines removed, I get
    $ ./prog.pl
    2305 200 2 S SISKIYOU GREENVIEW CA 41.52 -122.9 2 INCH HAIL REPORTED WITH STORM JUST SOUTH OF GREENVIEW. (MFR)
    2350 200 DANIELS E FLAXVILLE MT 48.8 -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
    2350 175 5 N DANIELS RICHLAND MT 48.89 -106.05 DESTROYED CROPS (GGW)
As you can see, that still has a bug. You need to rewind USEDHAIL for each line of CHECKHAIL:
    seek USEDHAIL, 0, 0 or die "$0: seek: $!";
    while(<USEDHAIL>){
        ...
This produces
    $ ./prog.pl
    2350 200 DANIELS E FLAXVILLE MT 48.8 -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
    2350 175 5 N DANIELS RICHLAND MT 48.89 -106.05 DESTROYED CROPS (GGW)
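To see why the rewind matters, here is a minimal sketch using an in-memory filehandle. The helper name count_passes and the sample string are made up for illustration and are not part of the original program. Once a handle reaches end-of-file, further reads return nothing until seek moves the position back to the start.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line from an in-memory handle twice and
# report how many lines each pass saw. When $rewind is true, seek back to
# the start of the data after each pass.
sub count_passes {
    my ($data, $rewind) = @_;
    open my $fh, '<', \$data or die "$0: open: $!";

    my @counts;
    for my $pass (1 .. 2) {
        my $n = 0;
        while (my $line = <$fh>) {
            $n++;
        }
        push @counts, $n;
        if ($rewind) {
            seek $fh, 0, 0 or die "$0: seek: $!";
        }
    }
    close $fh;
    return @counts;
}

my $used = "line one\nline two\n";
print join(",", count_passes($used, 0)), "\n";   # prints 2,0 (second pass sees nothing)
print join(",", count_passes($used, 1)), "\n";   # prints 2,2 (rewound between passes)
```

Without the seek, only the first line of CHECKHAIL is ever compared against the used records; every later line sees an exhausted handle and looks new.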
For an example of a better way to do it, consider
    #! /usr/bin/perl

    use warnings;
    use strict;

    sub read_used_hail {
        my($path) = @_;

        my %used;
        open my $fh, "<", $path or die "$0: open $path: $!";

        local $" = " ";  # " fix Stack Overflow highlighting
        while (<$fh>) {
            chomp;
            my @f = split " ", $_, 10;
            next unless @f;
            ++$used{"@f"};
        }

        wantarray ? %used : \%used;
    }

    my %used = read_used_hail "used-hail";

    open my $check, "<", "check-hail" or die "$0: open: $!";
    while (<$check>) {
        chomp;
        my @f = split " ", $_, 10;
        next if !@f || $used{join " " => @f};
        print $_, "\n";
    }
Sample run:
    $ ./prog.pl
    2350 200 DANIELS E FLAXVILLE MT 48.8 -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)
    2350 175 5 N DANIELS RICHLAND MT 48.89 -106.05 DESTROYED CROPS (GGW)
Solution:3
Using $_ within an inner loop can cause problems. Try naming your lines first like so:
    while(my $toBeChecked = <CHECKHAIL>){
        my $found = 0;
        while( my $hailCheck = <USEDHAIL>){
Also, Perl treats numeric comparison and string comparison differently. You're using string comparison (eq) where numeric comparison (==) is intended:
if ($found eq 0){
Change to:
if ($found == 0){
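The distinction cuts both ways: a counter such as $found should be tested with ==, while whole lines of text need eq. As a minimal sketch (the helper names same_numeric and same_string are hypothetical), the following compares two different records from the sample data that happen to start with the same number. Under ==, Perl converts each string to its leading number, so the two records look equal.

```perl
use strict;
use warnings;

# Two different records from the sample data that share the leading number 2350.
my $first  = "2350 200 DANIELS E FLAXVILLE MT 48.8 -105.17 GOLF BALL TO HEN EGG SIZED HAIL (GGW)\n";
my $second = "2350 175 5 N DANIELS RICHLAND MT 48.89 -106.05 DESTROYED CROPS (GGW)\n";

# Hypothetical helper: numeric comparison, which only looks at the leading number.
sub same_numeric {
    my ($x, $y) = @_;
    no warnings 'numeric';   # silence "isn't numeric" warnings for this demonstration
    return $x == $y ? 1 : 0;
}

# Hypothetical helper: string comparison, which looks at every character.
sub same_string {
    my ($x, $y) = @_;
    return $x eq $y ? 1 : 0;
}

print "==: ", same_numeric($first, $second), "\n";   # prints "==: 1"
print "eq: ", same_string($first, $second), "\n";    # prints "eq: 0"
```

With warnings enabled and no local override, the == comparison would also report that the arguments are not numeric, which is a useful hint that the wrong operator is in use.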
Solution:4
This line sticks out for me:
if ($found eq 0){
Since $found is a boolean, perform boolean tests on it:
if (not $found) {
It also looks like your logic is a bit reversed -- in the first if, you return if the lines do not match, and then in the second if, you return if there was a match. Do you perhaps intend to say next; to skip out of the innermost loop, instead?
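The difference between next and return can be sketched with a small search routine. The helper name is_used and the sample records are invented for illustration. A mismatch only moves the loop on to the next candidate, and the sub reports "not found" only after every candidate has been examined.

```perl
use strict;
use warnings;

# Hypothetical helper: does $needle appear in the list referenced by $haystack?
sub is_used {
    my ($needle, $haystack) = @_;
    for my $candidate (@$haystack) {
        next if $candidate ne $needle;   # mismatch: keep looking
        return 1;                        # match: stop early
    }
    return 0;                            # examined everything, no match
}

my @used = ("first record", "second record");
print is_used("second record", \@used) ? "used\n" : "new\n";   # prints "used"
print is_used("third record",  \@used) ? "used\n" : "new\n";   # prints "new"
```

Replacing the next with a return would make the sub give up after comparing against "first record" alone, which matches the symptom described in the question.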