Tutorial: Does this violate the 'leftmost longest' principle?



Question:

I am trying to write a regex to recognize a single line of text, with underscore ( _ ) recognized as a line continuation character. For example, "foo_\nbar" should be considered a single line, because the "foo" ends in an underscore. I am trying:

$txt = "foo_\nbar";
print "$&\n" if $txt =~ /.*(_\n.*)*/;

However, this prints only:

foo_  

This seems to violate the "leftmost longest" rule for Perl regexes!

Interestingly, if I remove the last star (*) in the regex, i.e.:

$txt = "foo_\nbar";
print "$&\n" if $txt =~ /.*(_\n.*)/;

it does print:

foo_
bar

But I need the star to recognize "0 or more" continuations!

What am I doing wrong?


Solution:1

Why this happens was explained by @ysth. To fix it, you can use the following regex:

/([^_\n]|_.)*/s  
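
As a rough check of how this pattern behaves, the sketch below runs it against a few sample strings. The helper `matched_text` and the sample inputs are made up for illustration; only the regex itself comes from the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by $re in $text, or undef.
sub matched_text {
    my ($re, $text) = @_;
    return $text =~ $re ? $& : undef;
}

# The pattern from the answer, unchanged; /s lets "." match a newline.
my $line_re = qr/([^_\n]|_.)*/s;

my $continued = matched_text($line_re, "foo_\nbar");  # "foo_\nbar"
my $plain     = matched_text($line_re, "foo\nbar");   # "foo"
my $mid_line  = matched_text($line_re, "a_b\nc");     # "a_b"

print "[$_]\n" for $continued, $plain, $mid_line;
```

Note that `_.` consumes the underscore together with whatever character follows it, so the underscore acts like an escape character. One consequence is that a doubled underscore before a newline, as in "foo__\nbar", is not treated as a continuation.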


Solution:2

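Perl doesn't do "leftmost longest"; instead, each regex feature has a well-defined way of acting. Your initial .* will match as many characters as possible, so long as the rest of the regex can still match. Here the rest, (_\n.*)*, is allowed to match zero times, so the .* keeps the _ and the overall match ends at the newline. To prevent the .* from swallowing the _, do something like: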

/(.*(?!(?<=_)\n)_\n)*.*/  
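
To see the difference side by side, here is a small sketch that runs the pattern from the question and the pattern above on the same input. The helper `matched_text` and the three-line sample are illustrative assumptions, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by $re in $text, or undef.
sub matched_text {
    my ($re, $text) = @_;
    return $text =~ $re ? $& : undef;
}

# Pattern from the question: the leading .* grabs "foo_", and the starred
# group is then free to match zero times.
my $original = qr/.*(_\n.*)*/;

# Pattern from this answer: the .* sits inside the repeated group, so it
# has to give back the "_" for each "_\n" to match.
my $fixed = qr/(.*(?!(?<=_)\n)_\n)*.*/;

my $txt = "foo_\nbar";
my $got_original = matched_text($original, $txt);   # "foo_"
my $got_fixed    = matched_text($fixed, $txt);      # "foo_\nbar"

# Two continuations followed by an ordinary line break.
my $got_multi = matched_text($fixed, "a_\nb_\nc\nd");   # "a_\nb_\nc"

print "[$_]\n" for $got_original, $got_fixed, $got_multi;
```

In the fixed pattern each pass through the group must end in `_\n`, which forces the greedy .* to backtrack by one character on a continued line; the trailing .* then picks up the final, uncontinued part.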


Solution:3

There are two basic flavors of regular expression designs:

POSIX defines the leftmost-longest flavor. For example: changing any "a|b" to "b|a" does nothing to the full match.

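Perl defines the left-biased flavor. Each "a|b" checks the left branch "a" first, and if that leads to an overall match then "b" is never checked. Thus "a|b" is rarely the same as "b|a". Here a greedy a* is like ...|aaaa|aaa|aa|a|(): longer repetitions are tried first, and shorter ones only on backtracking.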
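
The left bias is easy to observe directly in Perl. The following is a minimal sketch with made-up sample strings; a POSIX leftmost-longest engine would be expected to return "ab" for both alternation orders, but that is not exercised here.

```perl
use strict;
use warnings;

my $text = "ab";

# Left-biased alternation: the first branch that allows a match wins,
# even when a later branch would match more text.
my ($short_first) = $text =~ /(a|ab)/;   # "a"
my ($long_first)  = $text =~ /(ab|a)/;   # "ab"

# Quantifiers have a fixed preference too: a greedy star tries more
# repetitions first, a lazy star tries fewer first.
my ($greedy) = "aaa" =~ /(a*)/;    # "aaa"
my ($lazy)   = "aaa" =~ /(a*?)/;   # ""

print "a|ab -> [$short_first]\n";
print "ab|a -> [$long_first]\n";
print "a*   -> [$greedy]\n";
print "a*?  -> [$lazy]\n";
```

Swapping the branches changes the result, which is the sense in which "a|b" and "b|a" differ under the left-biased flavor.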

