Tutorial: Get the start location of a capturing group within a regex pattern


Basically, I want to find the index of the first occurrence of any of the substrings "ABC", "DEF", or "GHI", as long as it starts at an index that is a multiple of three. The regex that I wrote to match this pattern is:

regex = compile("(?:[a-zA-Z]{3})*?(ABC|DEF|GHI)")

The *? ensures that I get the first match, since it's non-greedy. I'm using a capturing group because I assume that is the only way to get the index of the substring I'm actually looking for. I don't care where the match itself starts, just where the capturing group starts. The ...{3}... requires that the substring occur at an interval of 3, i.e.:

example_1 = "BNDABCDJML"
example_2 = "JKMJABCKME"

example_1 would match since "ABC" occurs at position 3 but example_2 would not match since "ABC" occurs at position 4.
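A quick way to check the two examples, as a minimal sketch. It assumes Python 3 and uses the compiled pattern's match() method, which only tries the pattern at index 0 of the string:

```python
import re

# Pattern from the question: lazily skip whole 3-letter blocks, then capture the target.
regex = re.compile(r"(?:[a-zA-Z]{3})*?(ABC|DEF|GHI)")

example_1 = "BNDABCDJML"
example_2 = "JKMJABCKME"

# match() anchors at the start, so the group can only begin at index 0, 3, 6, ...
print(regex.match(example_1) is not None)  # True
print(regex.match(example_2) is not None)  # False
```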

Ideally, given the string:

text = "STCABCFFC"  

this matches, but if I simply get the start of the match, it gives me 0, since that is the index where the whole match begins. What I want is 3, the index where the capturing group begins.

I'd like to do this:

print match(regex, text).group(1).start()  

but, of course, this doesn't work: group(1) returns a plain string, strings have no start() method, and the returned string no longer knows where it came from in text. I also can't simply search text for the captured substring, because that wouldn't guarantee that the occurrence I find follows the regex pattern (occurring only at intervals of 3). Perhaps I'm overlooking something. I don't write much Python, so forgive me if this is a trivial question.


You were on the right track. start is a method for the MatchObject. Here's the example they give in the docs:

>>> email = "tony@tiremove_thisger.net"
>>> m = re.search("remove_this", email)
>>> email[:m.start()] + email[m.end():]
'tony@tiger.net'

Basically, instead of match(regex, text).group(1).start() you should do match(regex, text).start(1).
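As a minimal sketch of the difference, assuming Python 3: start() with no argument reports where the whole match begins, while start(1) reports where group 1 begins in the original string. The string `tricky` is made up for illustration; it shows why a plain substring search is not a substitute.

```python
import re

regex = re.compile(r"(?:[a-zA-Z]{3})*?(ABC|DEF|GHI)")

text = "STCABCFFC"
m = regex.match(text)
print(m.start())   # 0, start of the whole match
print(m.start(1))  # 3, start of capturing group 1

# Hypothetical input: the first "ABC" is misaligned, the second one is aligned.
tricky = "JABCXXABC"
print(tricky.find("ABC"))            # 1, ignores the interval-of-3 rule
print(regex.match(tricky).start(1))  # 6, the first aligned occurrence
```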


You can get the start and end index from the match object - re.MatchObject.start(group), re.MatchObject.end(group):

import re

regex = re.compile(r"(?:[a-zA-Z]{3})*?(ABC|DEF|GHI)")
for m in re.finditer(regex, "STCABCFFC"):
    print(m.start(1), m.end(1))  # start and end index of group 1
    print(m.span(1))  # Prints 2-element tuple `(start, end)`
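One caveat worth checking with finditer() and search(): unlike match(), they may begin a match at any index, so the three-letter blocks are counted from wherever the match starts rather than from index 0. The sketch below, assuming Python 3, shows this on example_2 from the question; the `\A` anchor is one possible fix, not the only one, and the names `unanchored` and `anchored` are illustrative.

```python
import re

unanchored = re.compile(r"(?:[a-zA-Z]{3})*?(ABC|DEF|GHI)")
anchored = re.compile(r"\A(?:[a-zA-Z]{3})*?(ABC|DEF|GHI)")  # \A pins the match to index 0

text = "JKMJABCKME"  # example_2 from the question, which should not match

found = unanchored.search(text)
print(found.span(), found.start(1))  # (1, 7) 4, a match that starts at index 1
print(unanchored.match(text))        # None, match() only tries index 0
print(anchored.search(text))         # None, the anchor restores the alignment
```

With the anchored pattern, finditer() yields at most one match per string, which fits the goal of finding only the first aligned occurrence.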
