Tutorial: Why don't my Perl regexes correctly extract a filename from a path?



Question:

I am trying to parse the filename from paths. I have this:

my $filepath = "/Users/Eric/Documents/foldername/filename.pdf";
$filepath =~ m/^.*\\(.*[.].*)$/;
print "Linux path:";
print $1 . "\n\n";
print "-------\n";

my $filepath = "c:\\Windows\eric\filename.pdf";
$filepath =~ m/^.*\\(.*[.].*)$/;
print "Windows path:";
print $1 . "\n\n";
print "-------\n";

my $filepath = "filename.pdf";
$filepath =~ m/^.*\\(.*[.].*)$/;
print "Without path:";
print $1 . "\n\n";
print "-------\n";

But that returns:

Linux path:

-------
Windows path:Windowsic
                        ilename.pdf

-------
Without path:Windowsic
                        ilename.pdf

-------

I am expecting this:

Linux path:
filename.pdf
-------
Windows path:
filename.pdf
-------
Without path:
filename.pdf
-------

Can somebody please point out what I am doing wrong?

Thanks! :)


Solution:1

Well, the answer to what is happening would be: various errors.

my $filepath = "/Users/Eric/Documents/foldername/filename.pdf";
$filepath =~ m/^.*\\(.*[.].*)$/;
print "Linux path:";
print $1 . "\n\n";
print "-------\n";

$filepath doesn't contain any backslashes, so the pattern, which requires a literal \, won't match and $1 is never set. The path uses forward slashes instead. Your expression would have to be:

# regular expression matches return their captures in a list context.
my ( $path ) = $filepath =~ m|/([^/.]*\.[^/.]*)$|;
print "Linux path:$path\n\n-------\n"; # little need to . a " string

my $filepath = "c:\\Windows\eric\filename.pdf";
$filepath =~ m/^.*\\(.*[.].*)$/;
print "Windows path:";
print $1 . "\n\n";
print "-------\n";

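You're using double quotes, which, taking their cue from UNIX shells, are more active than single-quoted strings: they interpolate variables and process escape sequences. Here \e becomes an escape character and \f a form feed. Thus, you need to escape all your backslashes, like this: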

my $filepath = "c:\\Windows\\eric\\filename.pdf";  

or just use single quotes:

my $filepath = 'c:\Windows\eric\filename.pdf';  

Actually, since Perl accepts '/' as a path separator on Windows, this works too (but the regex would then have to match / rather than \):

my $filepath = "c:/Windows/eric/filename.pdf";  

As long as you fix it before handing it back to Windows.
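Converting the separators back can be sketched in a few lines. The helper name to_windows_separators is made up for illustration, and the sketch assumes every forward slash in the string is a path separator:

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): swap every / for a \
# so the path can be handed to something that insists on backslashes.
sub to_windows_separators {
    my ($path) = @_;
    # Copy first, then transliterate the copy; the original stays untouched.
    (my $native = $path) =~ tr{/}{\\};
    return $native;
}

print to_windows_separators('c:/Windows/eric/filename.pdf'), "\n";
```

Doing the conversion once, at the boundary where the path leaves the script, keeps the rest of the code free of backslash escaping.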

my $filepath = "filename.pdf";
$filepath =~ m/^.*\\(.*[.].*)$/;
print "Without path:";
print $1 . "\n\n";
print "-------\n";

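This didn't match, so $1 still holds the capture from the last successful match. That's why the previous output is repeated. It also shows the value of assigning the captures in list context and checking the result, instead of referring to $1 after a match that may have failed.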
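As a rough sketch of that idea, the capture can be assigned in list context so that a failed match yields undef rather than a stale $1. The helper name filename_from_path is an assumption for illustration. Unlike the original pattern, it accepts either separator and does not require a dot in the name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the last path component, or undef
# when the path ends in a separator (or is empty) and nothing matches.
sub filename_from_path {
    my ($path) = @_;
    # [^\\/]+ is a run of characters that are neither \ nor /;
    # \z anchors it to the very end of the string.
    my ($name) = $path =~ m{([^\\/]+)\z};
    return $name;
}

for my $path ('/Users/Eric/Documents/foldername/filename.pdf',
              'c:\Windows\eric\filename.pdf',
              'filename.pdf') {
    my $name = filename_from_path($path);
    print defined $name ? "$name\n" : "no match for $path\n";
}
```

Because each result lives in its own lexical variable, one path's result cannot leak into the next iteration.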


Solution:2

In this case, as others have said, the mistake is to do it by hand.

In addition to File::Basename, you should take a look at File::Spec and Path::Class. They offer well-tested, cross-platform methods for handling files and directories. Path::Class in particular provides helper methods for dealing with file and directory names that are foreign to the system the script lives on. It looks like that might come in handy here.

#!/usr/bin/env perl
use strict;
use warnings;
use Path::Class qw/file foreign_file/;

my $nix = "/Users/Eric/Documents/foldername/filename.pdf";
my $win = 'c:\\Windows\eric\filename.pdf'; # single quote to avoid escape issues

print file($nix)->basename(), "\n";
print foreign_file('Win32', $win)->basename(), "\n";
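If installing Path::Class is not an option, a similar result can probably be had with the core File::Spec family. This sketch calls the platform-specific subclasses File::Spec::Unix and File::Spec::Win32 directly, on the assumption that you already know which convention each path follows:

```perl
use strict;
use warnings;
use File::Spec::Unix;
use File::Spec::Win32;

my $nix = '/Users/Eric/Documents/foldername/filename.pdf';
my $win = 'c:\Windows\eric\filename.pdf';

# splitpath returns (volume, directories, file); only the file part is kept.
my (undef, undef, $nix_file) = File::Spec::Unix->splitpath($nix);
my (undef, undef, $win_file) = File::Spec::Win32->splitpath($win);

print "$nix_file\n";
print "$win_file\n";
```

Calling the subclass by name, rather than plain File::Spec, is what lets a script on one system parse paths written for another.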


Solution:3

Why not use File::Basename?

use File::Basename;
my $name = basename($filepath);
print $name;

The regex

m/^.*\\(.*[.].*)$/
#    ^^

assumes a separator \, so cases 1 and 3 will never match. In case 2,

"c:\\Windows\eric\filename.pdf";  

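\e and \f are both escape sequences in a double-quoted Perl string (the escape character and form feed). So the code "correctly" returns everything after the only real backslash: Windows, an escape character, ric, a form feed, and ilename.pdf. Remember to use \\!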
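To see what the double-quoted string really contains, one option is to print it with the non-printable characters made visible. This is only a diagnostic sketch, and the <0x..> notation is an arbitrary choice:

```perl
use strict;
use warnings;

my $double = "c:\\Windows\eric\filename.pdf";  # \e and \f are interpreted
my $single = 'c:\Windows\eric\filename.pdf';   # backslashes stay literal

printf "double-quoted length: %d\n", length $double;
printf "single-quoted length: %d\n", length $single;

# Replace every character outside printable ASCII with its hex code.
(my $visible = $double) =~ s/([^\x20-\x7e])/sprintf('<0x%02X>', ord $1)/ge;
print "$visible\n";   # prints c:\Windows<0x1B>ric<0x0C>ilename.pdf
```

The double-quoted version is two characters shorter because each of \e and \f collapses into a single control character.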


Solution:4

Perl provides this capability: http://perldoc.perl.org/File/Basename.html
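A minimal sketch of how that module might be applied to the paths from the question follows. By default File::Basename parses according to the current operating system, so the sketch uses fileparse_set_fstype to switch conventions explicitly, assuming the style of each path is known in advance:

```perl
use strict;
use warnings;
use File::Basename qw(basename fileparse_set_fstype);

my $nix = '/Users/Eric/Documents/foldername/filename.pdf';
my $win = 'c:\Windows\eric\filename.pdf';

# fileparse_set_fstype returns the previous setting, saved for restoring later.
my $previous = fileparse_set_fstype('Unix');
my $nix_name = basename($nix);

fileparse_set_fstype('MSWin32');   # now \ and / both count as separators
my $win_name = basename($win);

fileparse_set_fstype($previous);   # put the original setting back

print "$nix_name\n$win_name\n";
```

The setting is global to the module, which is why the sketch restores it when done.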

You also need to be wary of string escapes: in your double-quoted Windows path, '\\', '\e' and '\f' are all treated as escape sequences. \e is a single escape character, so Perl itself does not swallow the 'r' after it; the missing 'r' in the output is most likely the terminal consuming it as part of an escape sequence. This explains the unexpected output.

