Tutorial: Matching on repeated substrings in a regex


Is it possible for a regex to match based on other parts of the same regex?

For example, how would I match lines that begin and end with the same sequence of 3 characters, regardless of what the characters are?


Matches:

abcabc
xyz abc xyz

Doesn't Match:


Undefined (can match or not, whichever is easiest):

ababa
a

Ideally, I'd like something in the Perl regex flavor. If that's not possible, I'd be interested to know whether any flavors can do it.


Use capture groups and backreferences.


The \1 refers back to whatever was matched by the first capture group (the part inside the first pair of parentheses). Most regex flavors, Perl's included, support backreferences like this.
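A small Perl sketch of the point above. The pattern name $doubled and the two sample strings are made up for illustration. It shows that \1 matches the exact text group 1 captured, not "any three characters" again. It also shows that after the match the same capture is available as $1:

```perl
use strict;
use warnings;

# (.{3}) matches any three characters, but \1 only matches the exact
# text that group 1 captured (hypothetical example pattern).
my $doubled = qr/^(.{3})\1$/;

for my $s ('abcabc', 'abcxyz') {
    if ($s =~ $doubled) {
        # Outside the pattern, the captured text is available as $1.
        print "$s: match, \$1 = '$1'\n";
    } else {
        print "$s: no match\n";
    }
}
```

Both strings consist of two three-character chunks, but only 'abcabc' matches, because its second chunk repeats the first.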


You need backreferences. The idea is to use a capturing group for the first bit, and then refer back to it when you're trying to match the last bit. Here's an example of matching a pair of HTML start and end tags (from the link given earlier):


This regex contains only one pair of parentheses, which captures the string matched by [A-Z][A-Z0-9]* into the first backreference. This backreference is reused with \1 (backslash one). The / before it is simply the forward slash in the closing HTML tag that we are trying to match.
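Here is a rough Perl sketch of the tag-matching idea. The exact pattern is an assumption modeled on the description above, not necessarily the one from the original link. The /i flag lets [A-Z] match lowercase tag names, and the lazy .*? stops at the first matching close tag. It is a toy for showing backreferences, not an HTML parser:

```perl
use strict;
use warnings;

# Group 1 captures the tag name; </\1> requires the closing tag to repeat it.
# Group 2 captures the content between the tags (assumed pattern).
my $tag_pair = qr{<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>}i;

for my $html ('<b>bold</b>', '<em class="x">text</em>', '<b>mismatched</i>') {
    if ($html =~ $tag_pair) {
        print "$html: tag '$1', content '$2'\n";
    } else {
        print "$html: no matching close tag\n";
    }
}
```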

Applying this to your case:

^(.{3}).*\1$

(Yes, that's the regex that Brian Carper posted. There just aren't that many ways to do this.)

A detailed explanation for posterity's sake (please don't be insulted if it's beneath you):

  • ^ matches the start of the line.
  • (.{3}) grabs any three characters (other than newlines) and saves them in a group for later reference.
  • .* matches anything for as long as possible, backtracking as needed so the rest of the pattern can still match. (You don't care what's in the middle of the line.)
  • \1 matches the same text that was captured in step 2.
  • $ matches the end of the line.
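A minimal Perl sketch that runs the pattern from the breakdown above over the example lines from the question. The variable names are hypothetical. It also includes the named-group form that Perl 5.10 and later support, where \k<start> refers back to whatever (?<start>...) captured:

```perl
use strict;
use warnings;

# The pattern from the breakdown above.
my $numbered = qr/^(.{3}).*\1$/;
# Equivalent form with a named group (Perl 5.10+).
my $named = qr/^(?<start>.{3}).*\k<start>$/;

# Lines from the question: the first two should match; 'ababa' and 'a'
# were left undefined, and this pattern rejects both.
my @lines = ('abcabc', 'xyz abc xyz', 'ababa', 'a');

my %result;
for my $line (@lines) {
    my $n = $line =~ $numbered ? 1 : 0;
    my $k = $line =~ $named    ? 1 : 0;
    $result{$line} = $n;
    printf "%-12s %-8s (named form agrees: %s)\n",
        $line, $n ? 'match' : 'no match', $n == $k ? 'yes' : 'no';
}
```

The named form does the same thing. It may be easier to read if more groups are added later, since renumbering \1, \2 is no longer a concern.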


For the same characters at the beginning and end:


This is a backreference.


This works:

my $test = 'abcabc';
print $test =~ m/^([a-z]{3}).*(\1)$/;  # list context: prints "abcabc" ($1 and $2 are both "abc")

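The ^ and $ anchors are what tie the match to the beginning and end of the string; without them, the pattern could match a repeated three-letter sequence anywhere in the line.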
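A short Perl sketch of what the anchors change, keeping the [a-z] character class from the snippet above. The test string and variable names are made up for illustration:

```perl
use strict;
use warnings;

# Without anchors, the repeated triple may appear anywhere in the string.
my $anywhere = qr/([a-z]{3}).*\1/;
# With ^ and $, the repeat must be the first and last three characters.
my $at_ends  = qr/^([a-z]{3}).*\1$/;

my $text = 'xxabcyyabczz';
print "anywhere: ", ($text =~ $anywhere ? "match ($1)" : 'no match'), "\n";
print "at ends:  ", ($text =~ $at_ends  ? "match ($1)" : 'no match'), "\n";
```

The unanchored pattern finds the inner "abc ... abc". The anchored one rejects the string because it starts with "xxa" and ends with "czz".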
