Tutorial :How can I sort an array so that certain file extensions sort to the top?



Question:

I have an array containing a list of files. I want to sort it in a way that it will let me have .txt files in the beginning of the array and the rest of files after that.

This is what I'm doing now, which works fine.

@files = (grep(/\.txt$/,@files),grep(!/\.txt$/,@files));  

Is there a better way to do it though?


Solution:1

Sort takes an optional block as first argument, though in this case a Schwartzian transform would be quicker.

@files = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, !/\.txt$/ ] } @files;  


Solution:2

You asked a follow-up comment about doing this for more than one file extension. In that case, I'd build off the Schwartzian Transform. If you're new to the ST, I recommend Joseph Hall's explanation in Effective Perl Programming. Although the Second Edition is coming out very soon, we basically left his explanation as is so the first edition is just as good. Google Books seems to only show one inch of each page for the first edition, so you're out of luck there.

In this answer, I use a weighting function to decide which extensions should move to the top. If an extension doesn't have an explicit weight, I just sort it lexigraphically. You can fool around with the sort to get exactly the order that you want:

@files = qw(      buster.pdf      mimi.xls      roscoe.doc      buster.txt      mimi.txt      roscoe.txt      buster.rpm      mimi.rpm      );    my %weights = qw(      txt 10      rpm  9      );    my @sorted =       map { $_->{name} }      sort {           $b->{weight} <=> $a->{weight}           ||          $a->{ext}    cmp $b->{ext}           ||          $a cmp $b          }      map {          my( $ext ) = /\.([^.]+)\z/;              { # anonymous hash constructor              name => $_,              ext => $ext,              weight => $weights{$ext} || 0,              }          }      @files;    $" = "\n";  print "@sorted\n";  


Solution:3

 

@sorted = sort { $b=~/\.txt$/ <=> $a=~/\.txt$/  ||  $a cmp $b } @files  

will put .txt files first and otherwise sort lexicographically (alphabetically).

@sorted = sort { $b=~/\.txt$/ <=> $a=~/\.txt$/ } @files  

will put .txt files first and otherwise preserve the original order (sort is stable since Perl 5.8)


Solution:4

You just need to add a sort in front of each of your greps:

 my @sorted =     (     sort( grep /\.txt\z/,   @files ),     sort( grep ! /\.txt\z/, @files )     );  

The trick here is that you are partitioning the list then sorting each partition independently. Depending on what you are doing, this might be a lot better than trying to do everything in one sort operation. Conversely, it might not always be better.

There are various other ways to get this done, but they aren't this simple. :)

Here's a quick benchmark on my MacBook Air with vanilla Perl 5.10.1:

There are 600 files to sort       brian:  3 wallclock secs @ 369.75/s (n=1161)     control:  3 wallclock secs @ 1811.99/s (n=5744)        leon:  4 wallclock secs @ 146.98/s (n=463)     mobrule:  3 wallclock secs @ 101.57/s (n=324)        sort:  4 wallclock secs @ 559.62/s (n=1746)  

Here's the script:

use Benchmark;    use vars qw(@files);    @files = qw(      buster.pdf      mimi.xls      roscoe.doc      buster.txt      mimi.txt      roscoe.txt      ) x 100;      printf "There are %d files to sort\n", scalar @files;    sub leon {        my @sorted =           map { $_->[0] }           sort { $a->[1] <=> $b->[1] }           map { [ $_, !/\.txt$/ ]           } @files;      }    sub brian {       my @sorted =         (         sort( grep /\.txt\z/,   @files ),         sort( grep ! /\.txt\z/, @files )         );      }    sub mobrule {      my @sorted =           sort { ($b=~/\.txt\z/) <=> ($a=~/\.txt\z/)  ||  $a cmp $b }           @files;      }    sub plain_sort {      my @sorted = sort @files;      }    sub control {      my @sorted = @files;      }    timethese( -3,       {       brian   => \&brian,       leon    => \&leon,       mobrule => \&mobrule,       control => \&control,       sort    => \&plain_sort,       }       );  


Solution:5

To handle multiple extensions efficiently, you could modify brian d foy's sorted greps by partitioning your array in one pass, and then sort each partition independently.

use strict;  use warnings;    use List::MoreUtils qw(part);    my @files = qw(      bar        Bar.pm       bar.txt      bar.jpeg   foo          foo.pm      foo.jpeg   zebra.txt    zebra.pm      foo.bat    foo.c        foo.pl      Foo.pm     foo.png      foo.tt      orange     apple        zebra.stripe  );      my @parts = part { get_extension_priority($_) } @files;    my @sorted = map { sort( @{ $_ || [] } ) } @parts;     print map "$_\n", @sorted;    BEGIN {        # Set extension priority order      my @priority = qw( stripe txt nomatch pl jpeg  );        # make a hash to look up priority by extension      my %p = map { $priority[$_], $_ } 0..$#priority;        sub get_extension_priority {          my $file = shift;            return scalar @priority               unless /[.](\w*)$/;            return scalar @priority               unless exists $p{$1};            return $p{$1};      }  }  


Solution:6

Code golf? This will not produce nasty warnings:

@files = map { $_->[0] } sort { @$b <=> @$a } map { [$_, /\.txt$/] } @files  

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »