How do I replace an arbitrary number of backreferences in sed or Perl? (for obfuscating mailto)


I'm looking for a way to obfuscate mailtos in the source code of a web site. I'd like to go from this:

    href="mailto:president@whitehouse.gov"

To this:

    href="" onmouseover="this.href='mai'+'lto:'+'pre'+'sid'+'ent'+'@wh'+'ite'+'hou'+'se.'+'gov'"

I'm probably going to go with a PHP solution instead, like this (that way I only have to globally replace the entire mailto, and the source on my end will look better), but I spent too much time looking at sed and Perl and now I can't stop thinking about how this could be done! Any ideas?

Update: Based heavily on eclark's solution, I eventually came up with this:

    #!/usr/bin/env perl -pi
    if (/href="mailto/i) {
        my $start = (length $`) + 6;
        my $len = index($_, '"', $start) - $start;
        substr($_, $start, $len, '" onmouseover="this.href=' .
            join('+', map qq{'$_'}, substr($_, $start, $len) =~ /(.{1,3})/g));
    }


Building on Sinan's idea, here's a short perl script that will process a file line by line.

    #!/usr/bin/env perl -p

    my $start = index($_, 'href="') + 6;
    my $len = index($_, '"', $start) - $start;
    # Replace only the address ($len characters) so the original closing quote survives.
    substr($_, $start, $len, '" onmouseover="this.href=' .
        join('+', map qq{'$_'}, substr($_, $start, $len) =~ /(.{1,3})/g)
    );

If you're going to use it, make sure you have your old files committed to source control and change the -p option to -pi, which will rewrite the files in place (-i on its own does nothing without the -p loop).


    #!/usr/bin/perl

    use strict; use warnings;

    my $s = 'mailto:president@whitehouse.gov';

    my $obfuscated = join('+' => map qq{'$_'}, $s =~ /(.{1,3})/g );

    print $obfuscated, "\n";



Note that 'lto:' is four characters, whereas it looks like you want three-character groups.


Is this close enough?

    use strict;
    use warnings;

    my $old = 'href="mailto:president@whitehouse.gov"';
    $old =~ s/href="(.*)"/$1/;
    my $new = join '+', map { qq('$_') } grep { length $_ } split /(.{3})/, $old;
    $new = qq(href=""\nonmouseover="this.href=$new\n");
    print "$new\n";

    __END__

    href=""
    onmouseover="this.href='mai'+'lto'+':pr'+'esi'+'den'+'t@w'+'hit'+'eho'+'use'+'.go'+'v'
    "


Just an example.

    $ echo $s
    href="mailto:president@whitehouse.gov"

    $ echo $s | sed 's|\(...\)|\1+|g' | sed 's/hre+f=\"/href="" onmouseover="this.href=/'
    href="" onmouseover="this.href=+mai+lto+:pr+esi+den+t@w+hit+eho+use+.go+v"
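The pipeline above leaves the chunks unquoted and a stray + at the front. A sed-only sketch that also adds the quotes might look like the following. It assumes each input line is a lone `href="mailto:..."` attribute, as in the example, and `obfuscate_attr` is a made-up name.

```shell
# Hypothetical helper: reads one href="..." attribute per line on stdin.
obfuscate_attr() {
  # 1. Strip the href=" prefix and the closing quote, leaving the address.
  # 2. Wrap every run of up to three characters in single quotes, followed by +.
  # 3. Drop the trailing + left after the last chunk.
  # 4. Put the empty href and the onmouseover wrapper around the result.
  sed -e 's/^href="\(.*\)"$/\1/' \
      -e "s/.\{1,3\}/'&'+/g" \
      -e 's/+$//' \
      -e 's/.*/href="" onmouseover="this.href=&"/'
}
```

With the `$s` from the example, `echo "$s" | obfuscate_attr` should give the same grouping as the Perl answers: `'mai'+'lto'+':pr'+...+'.go'+'v'`.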


Ack! Thppfft! I offer you this hairball:

    s='href="mailto:president@whitehouse.gov"'
    echo "$s" | sed -n 's/=/=\x22\x22\n/;
    h;
    s/\n.*//;
    x;
    s/[^\n]*\n//;
    s/"//g;
    s/\(...\)/\x27&\x27+/g;
    s/.*/onmouseover=\x22this.href=&\x22/;
    x;
    G;
    s/\n//2;
    s/+\([^\x22]\{1,2\}\)\x22$/+\x27\1\x27\x22/;
    s/+\x22$/\x22/;
    p'

