Tutorial: Find string between two substrings [duplicate]


How do I find a string between two substrings ('123STRINGabc' -> 'STRING')?

My current method is like this:

    >>> start = 'asdf=5;'
    >>> end = '123jasd'
    >>> s = 'asdf=5;iwantthis123jasd'
    >>> print((s.split(start))[1].split(end)[0])
    iwantthis

However, this seems very inefficient and un-pythonic. What is a better way to do something like this?

Forgot to mention: The string might not start and end with start and end. They may have more characters before and after.


    import re

    s = 'asdf=5;iwantthis123jasd'
    result = re.search('asdf=5;(.*)123jasd', s)
    print(result.group(1))


    s = "123123STRINGabcabc"

    def find_between(s, first, last):
        # first occurrence of `first`, then the next `last` after it
        try:
            start = s.index(first) + len(first)
            end = s.index(last, start)
            return s[start:end]
        except ValueError:
            return ""

    def find_between_r(s, first, last):
        # last occurrence of `first`, then the last `last` after it
        try:
            start = s.rindex(first) + len(first)
            end = s.rindex(last, start)
            return s[start:end]
        except ValueError:
            return ""

    print(find_between(s, "123", "abc"))
    print(find_between_r(s, "123", "abc"))



It should be noted that, depending on what behavior you need, you can mix index and rindex calls or go with one of the versions above (the choice is the equivalent of the regex (.*) and (.*?) groups).
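
The correspondence with greedy and non-greedy regex groups can be checked with a small sketch. The sample string is the one used above; the helper names `between_first` and `between_widest` are made up for this illustration and are not part of the answer.

```python
import re

s = "123123STRINGabcabc"

def between_first(s, first, last):
    # index + index: earliest `first`, then the nearest `last` after it
    start = s.index(first) + len(first)
    return s[start:s.index(last, start)]

def between_widest(s, first, last):
    # index + rindex: earliest `first`, then the final `last` in the string
    start = s.index(first) + len(first)
    return s[start:s.rindex(last, start)]

# Non-greedy group: stops at the first possible "abc"
lazy = re.search(r"123(.*?)abc", s).group(1)
# Greedy group: runs to the last possible "abc"
greedy = re.search(r"123(.*)abc", s).group(1)

print(between_first(s, "123", "abc"), lazy)
print(between_widest(s, "123", "abc"), greedy)
```

Each printed line should show the same value twice, once from the string methods and once from the regex.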


    start = 'asdf=5;'
    end = '123jasd'
    s = 'asdf=5;iwantthis123jasd'
    print(s[s.find(start)+len(start):s.rfind(end)])






String formatting adds some flexibility to what Nikolaus Gradwohl suggested. start and end can now be amended as desired.

    import re

    s = 'asdf=5;iwantthis123jasd'
    start = 'asdf=5;'
    end = '123jasd'

    result = re.search('%s(.*)%s' % (start, end), s).group(1)
    print(result)
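
One caveat when interpolating start and end straight into the pattern: any regex metacharacters they contain (such as `*`, `(` or `.`) are treated as regex syntax, and `re.search` returns `None` when nothing matches. Here is a minimal sketch that guards against both. The function name `between_regex` and the second and third sample strings are illustrative assumptions.

```python
import re

def between_regex(s, start, end):
    # Escape both delimiters so they are matched literally.
    pattern = re.escape(start) + "(.*)" + re.escape(end)
    match = re.search(pattern, s)
    # re.search returns None when there is no match.
    return match.group(1) if match else None

print(between_regex("asdf=5;iwantthis123jasd", "asdf=5;", "123jasd"))
print(between_regex("price (in $) is 5*", "(", ")"))
print(between_regex("no delimiters here", "[", "]"))
```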


Just converting the OP's own solution into an answer:

    def find_between(s, start, end):
        return (s.split(start))[1].split(end)[0]


Here is one way to do it

    _, _, rest = s.partition(start)
    result, _, _ = rest.partition(end)
    print(result)

Another way using regexp

    import re
    print(re.findall(re.escape(start) + "(.*)" + re.escape(end), s)[0])


    print(re.search(re.escape(start) + "(.*)" + re.escape(end), s).group(1))


    source = 'your token _here0@df and maybe _here1@df or maybe _here2@df'
    start_sep = '_'
    end_sep = '@df'
    result = []
    tmp = source.split(start_sep)
    for par in tmp:
        # only pieces that contain the end separator hold a match
        if end_sep in par:
            result.append(par.split(end_sep)[0])

    print(result)

This should print: ['here0', 'here1', 'here2']

A regex is better, but it requires importing the re module, and you may prefer to stick to plain string methods.
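
For comparison, a regex version of the same multi-match extraction might look like the sketch below. It uses `re` from the standard library, so nothing has to be installed. The non-greedy group and the use of `re.escape` are choices made for this illustration.

```python
import re

source = 'your token _here0@df and maybe _here1@df or maybe _here2@df'
start_sep = '_'
end_sep = '@df'

# (.*?) is non-greedy, so each match stops at the nearest end_sep.
pattern = re.escape(start_sep) + '(.*?)' + re.escape(end_sep)
result = re.findall(pattern, source)
print(result)
```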


If you don't want to import anything, try the string method .index():

    text = 'I want to find a string between two substrings'
    left = 'find a '
    right = 'between two'

    # Output: 'string '
    print(text[text.index(left)+len(left):text.index(right)])


To extract STRING, try:

    myString = '123STRINGabc'
    startString = '123'
    endString = 'abc'

    mySubString = myString[myString.find(startString)+len(startString):myString.find(endString)]


My method will be to do something like,

    find index of start string in s => i
    find index of end string in s => j

    substring = substring(i+len(start) to j-1)
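
In Python the steps above could be sketched as follows. Note that a Python slice excludes its end index, so the slice ends at `j` rather than `j-1`. The function name `substring_between` and the choice to return `None` when a delimiter is missing are assumptions made for this sketch.

```python
def substring_between(s, start, end):
    i = s.find(start)                  # index of start string, or -1
    if i == -1:
        return None
    j = s.find(end, i + len(start))    # look for end only after start
    if j == -1:
        return None
    return s[i + len(start):j]         # slice end is exclusive

print(substring_between('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))
```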


This is essentially cji's answer - Jul 30 '10 at 5:58. I changed the try except structure for a little more clarity on what was causing the exception.

    import sys

    def find_between(inputStr, firstSubstr, lastSubstr):
        '''
        find between firstSubstr and lastSubstr in inputStr
        STARTING FROM THE LEFT
            http://stackoverflow.com/questions/3368969/find-string-between-two-substrings
            above also has a func that does this FROM THE RIGHT
        '''
        start, end = (-1, -1)
        try:
            start = inputStr.index(firstSubstr) + len(firstSubstr)
        except ValueError:
            # report which substring was missing
            print('    ValueError: firstSubstr=%s  -  %s' % (firstSubstr, sys.exc_info()[1]))

        try:
            end = inputStr.index(lastSubstr, start)
        except ValueError:
            print('    ValueError: lastSubstr=%s  -  %s' % (lastSubstr, sys.exc_info()[1]))

        return inputStr[start:end]


These solutions assume the start string and final string are different. Here is a solution I use for an entire file when the initial and final indicators are the same, assuming the entire file is read using readlines():

    def extractstring(line, flag='$'):
        string = ''  # returned as-is when the flag is not in the line
        if flag in line:  # $ is the flag
            dex1 = line.index(flag)
            subline = line[dex1+1:-1]  # leave out flag (+1) to end of line
            dex2 = subline.index(flag)
            string = subline[0:dex2].strip()  # does not include last flag, strip whitespace
        return string


    lines = ['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
             'afafoaltat $I GOT BETTER!$ derpity derp derp']
    for line in lines:
        string = extractstring(line, flag='$')
        print(string)
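
When the opening and closing flags are identical, `str.split` offers another way to get at the flagged text: splitting on the flag leaves the text between each pair of flags at the odd indexes of the result. This is a rough sketch that assumes flags always come in pairs; `extract_flagged` is a made-up name.

```python
def extract_flagged(line, flag='$'):
    # Pieces alternate: outside, inside, outside, inside, ...
    pieces = line.split(flag)
    return [piece.strip() for piece in pieces[1::2]]

lines = ['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
         'afafoaltat $I GOT BETTER!$ derpity derp derp']
for line in lines:
    print(extract_flagged(line))
```

Unlike the function above, this returns a list, so a line with several flagged sections yields all of them.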




You can simply use this code or copy the function below. All neatly in one line.

    def substring(whole, sub1, sub2):
        return whole[whole.index(sub1) : whole.index(sub2)]

If you run the function as follows.

    print(substring("5+(5*2)+2", "(", ")"))

You will probably be left with the output:

    (5*2

rather than

    (5*2)
If you want the closing substring included at the end of the output, the code must look like this:

    return whole[whole.index(sub1) : whole.index(sub2) + 1]

But if you don't want either substring in the output, the +1 must be on the first value instead:

    return whole[whole.index(sub1) + 1 : whole.index(sub2)]
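
The `+ 1` adjustments only work when both substrings are a single character long. Here is a sketch that generalizes them with `len()`. The function names are hypothetical, and the end marker is searched for only after the start marker, which is an added assumption about the intended behavior.

```python
def substring_exclusive(whole, sub1, sub2):
    # Skip past sub1 and stop just before sub2.
    start = whole.index(sub1) + len(sub1)
    return whole[start:whole.index(sub2, start)]

def substring_inclusive(whole, sub1, sub2):
    # Keep both markers in the result.
    start = whole.index(sub1)
    end = whole.index(sub2, start + len(sub1)) + len(sub2)
    return whole[start:end]

print(substring_exclusive("5+(5*2)+2", "(", ")"))
print(substring_inclusive("x = [[1, 2]] + y", "[[", "]]"))
```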


Here is a function I wrote that returns a list of the string(s) found between string1 and string2.

    def GetListOfSubstrings(stringSubject, string1, string2):
        MyList = []
        intstart = 0
        strlength = len(stringSubject)
        continueloop = 1

        while (intstart < strlength and continueloop == 1):
            intindex1 = stringSubject.find(string1, intstart)
            if (intindex1 != -1):  # The substring was found, lets proceed
                intindex1 = intindex1 + len(string1)
                intindex2 = stringSubject.find(string2, intindex1)
                if (intindex2 != -1):
                    subsequence = stringSubject[intindex1:intindex2]
                    MyList.append(subsequence)
                    intstart = intindex2 + len(string2)
                else:
                    continueloop = 0
            else:
                continueloop = 0
        return MyList

    # Usage Example
    mystring = "s123y123o123pp123y6"
    List = GetListOfSubstrings(mystring, "1", "y68")
    for x in range(0, len(List)):
        print(List[x])
    # output:
    # (nothing is printed, the list is empty)

    mystring = "s123y123o123pp123y6"
    List = GetListOfSubstrings(mystring, "1", "3")
    for x in range(0, len(List)):
        print(List[x])
    # output:
    #     2
    #     2
    #     2
    #     2

    mystring = "s123y123o123pp123y6"
    List = GetListOfSubstrings(mystring, "1", "y")
    for x in range(0, len(List)):
        print(List[x])
    # output:
    #     23
    #     23o123pp123


This I posted before as code snippet in Daniweb:

    # picking up piece of string between separators
    # function using partition, like partition, but drops the separators
    def between(left, right, s):
        before, _, a = s.partition(left)
        a, _, after = a.partition(right)
        return before, a, after

    s = "bla bla blaa <a>data</a> lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa"
    print(between('<a>', '</a>', s))
    print(between('(', ')', s))
    print(between("'", "'", s))

    """ Output (Python 3):
    ('bla bla blaa ', 'data', " lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa")
    ('bla bla blaa <a>data</a> lsdjfasdjöf ', 'important notice', " 'Daniweb forum' tcha tcha tchaa")
    ('bla bla blaa <a>data</a> lsdjfasdjöf (important notice) ', 'Daniweb forum', ' tcha tcha tchaa')
    """


    from timeit import timeit
    from re import search, DOTALL


    def partition_find(string, start, end):
        return string.partition(start)[2].rpartition(end)[0]


    def re_find(string, start, end):
        # applying re.escape to start and end would be safer
        return search(start + '(.*)' + end, string, DOTALL).group(1)


    def index_find(string, start, end):
        return string[string.find(start) + len(start):string.rfind(end)]


    # The wikitext of the "Alan Turing law" article from English Wikipedia
    # https://en.wikipedia.org/w/index.php?title=Alan_Turing_law&action=edit&oldid=763725886
    string = """..."""
    start = '==Proposals=='
    end = '==Rival bills=='

    assert index_find(string, start, end) \
           == partition_find(string, start, end) \
           == re_find(string, start, end)

    print('index_find', timeit(
        'index_find(string, start, end)',
        globals=globals(),
        number=100_000,
    ))

    print('partition_find', timeit(
        'partition_find(string, start, end)',
        globals=globals(),
        number=100_000,
    ))

    print('re_find', timeit(
        're_find(string, start, end)',
        globals=globals(),
        number=100_000,
    ))


    index_find 0.35047444528454114
    partition_find 0.5327825636197754
    re_find 7.552149639286381

re_find was more than 20 times slower than index_find in this example.


Parsing text with delimiters from different email platforms posed a larger version of this problem. The delimiters generally have a START and a STOP. Because the delimiter characters double as regex wildcards, they kept choking regex. The problem with split() is mentioned here and elsewhere: the delimiter itself is consumed. It occurred to me to use replace() to give split() something else to consume. Chunk of code:

    import logging

    nuke = '~~~'
    start = '|*'
    stop = '*|'
    # textIn is the raw text to parse
    julien = textIn.replace(start, nuke + start).replace(stop, stop + nuke).split(nuke)
    keep = [chunk for chunk in julien if start in chunk and stop in chunk]
    logging.info('keep: %s', keep)
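
To see what the intermediate list looks like, here is the same idea run on a made-up input. The sample text and the tag names `FNAME` and `CODE` are invented for illustration, and the sketch assumes the sentinel `~~~` never occurs in the input.

```python
nuke = '~~~'   # sentinel; assumed not to appear in the text
start = '|*'
stop = '*|'
text_in = 'Hi |*FNAME*|, your code is |*CODE*|.'

# Surround every delimited chunk with the sentinel, then split on it.
pieces = text_in.replace(start, nuke + start).replace(stop, stop + nuke).split(nuke)
keep = [chunk for chunk in pieces if start in chunk and stop in chunk]

print(pieces)
print(keep)
```

The delimiters survive in the kept chunks because split() consumes only the sentinel.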


Following on from Nikolaus Gradwohl's answer, I needed to get the version number (i.e., 0.0.2) between 'ui:' and '-' from the file content below (filename: docker-compose.yml):

    version: '3.1'
    services:
      ui:
        image: repo-pkg.dev.io:21/website/ui:0.0.2-QA1
        #network_mode: host
        ports:
          - 443:9999
        ulimits:
          nofile:test

and this is how it worked for me (python script):

    import re

    # read the whole file, then search it in one go
    with open('docker-compose.yml', 'r') as f:
        lines = f.read()
    result = re.search('ui:(.*)-', lines)
    print(result.group(1))

Result:

    0.0.2


This seems much more straightforward to me:

    import re

    s = 'asdf=5;iwantthis123jasd'
    x = re.search('iwantthis', s)
    print(s[x.start():x.end()])
