Tutorial: Find string between two substrings [duplicate]



Question:

How do I find a string between two substrings ('123STRINGabc' -> 'STRING')?

My current method is like this:

    >>> start = 'asdf=5;'
    >>> end = '123jasd'
    >>> s = 'asdf=5;iwantthis123jasd'
    >>> print((s.split(start))[1].split(end)[0])
    iwantthis

However, this seems very inefficient and un-pythonic. What is a better way to do something like this?

Forgot to mention: the string might not begin with start or finish with end. There may be more characters before and after them.


Solution:1

    import re

    s = 'asdf=5;iwantthis123jasd'
    result = re.search('asdf=5;(.*)123jasd', s)
    print(result.group(1))


Solution:2

    s = "123123STRINGabcabc"

    def find_between( s, first, last ):
        try:
            start = s.index( first ) + len( first )
            end = s.index( last, start )
            return s[start:end]
        except ValueError:
            return ""

    def find_between_r( s, first, last ):
        try:
            start = s.rindex( first ) + len( first )
            end = s.rindex( last, start )
            return s[start:end]
        except ValueError:
            return ""

    print(find_between( s, "123", "abc" ))
    print(find_between_r( s, "123", "abc" ))

gives:

    123STRING
    STRINGabc

I thought it should be noted: depending on the behavior you need, you can mix index and rindex calls or go with one of the versions above. The choice is analogous to the greedy (.*) and non-greedy (.*?) regex groups.
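The analogy with the regex groups can be checked directly. The following is a minimal sketch rather than part of the original answer; it reuses the sample string from this answer, and the names shortest, longest, lazy and greedy are only illustrative.

```python
import re

s = "123123STRINGabcabc"
first, last = "123", "abc"

# Both variants begin right after the first occurrence of the opening delimiter.
start = s.index(first) + len(first)

# index + index: stop at the nearest closing delimiter (like the non-greedy group).
shortest = s[start:s.index(last, start)]

# index + rindex: stop at the last closing delimiter (like the greedy group).
longest = s[start:s.rindex(last)]

# re.escape keeps the delimiters from being read as regex syntax.
lazy = re.search(re.escape(first) + "(.*?)" + re.escape(last), s).group(1)
greedy = re.search(re.escape(first) + "(.*)" + re.escape(last), s).group(1)

print(shortest, lazy)    # 123STRING 123STRING
print(longest, greedy)   # 123STRINGabc 123STRINGabc
```

Note that the rindex/rindex pairing of find_between_r ("STRINGabc") matches neither pattern as written, because a regex search always begins at the leftmost possible start.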


Solution:3

    start = 'asdf=5;'
    end = '123jasd'
    s = 'asdf=5;iwantthis123jasd'
    print(s[s.find(start)+len(start):s.rfind(end)])

gives

iwantthis  


Solution:4

    # only valid when s begins with start and ends with end
    s[len(start):-len(end)]


Solution:5

String formatting adds some flexibility to what Nikolaus Gradwohl suggested. start and end can now be amended as desired.

    import re

    s = 'asdf=5;iwantthis123jasd'
    start = 'asdf=5;'
    end = '123jasd'

    result = re.search('%s(.*)%s' % (start, end), s).group(1)
    print(result)
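Formatting start and end straight into the pattern works for the sample values, but delimiters containing regex metacharacters such as parentheses or dots would be interpreted as pattern syntax, and .group(1) raises AttributeError when nothing matches. A minimal sketch of a more defensive variant follows; the function name find_between_re and the second sample string are hypothetical, and the non-greedy group is a design choice, not something the answer prescribes.

```python
import re

def find_between_re(s, start, end):
    """Return the text between start and end, or None if there is no match."""
    # re.escape makes the delimiters match literally.
    pattern = re.escape(start) + "(.*?)" + re.escape(end)
    match = re.search(pattern, s)
    return match.group(1) if match else None

print(find_between_re('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))  # iwantthis
print(find_between_re('total (usd): 42 [end]', '(usd): ', ' [end]'))     # 42
print(find_between_re('no delimiters here', 'asdf=5;', '123jasd'))       # None
```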


Solution:6

Just converting the OP's own solution into an answer:

    def find_between(s, start, end):
        return (s.split(start))[1].split(end)[0]


Solution:7

Here is one way to do it

    _,_,rest = s.partition(start)
    result,_,_ = rest.partition(end)
    print(result)

Another way using regexp

    import re
    print(re.findall(re.escape(start)+"(.*)"+re.escape(end),s)[0])

or

    print(re.search(re.escape(start)+"(.*)"+re.escape(end),s).group(1))
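One detail of the partition variant in this answer: str.partition returns the separator as the middle element, and that element is an empty string when the separator is not found. A minimal sketch that uses this to report missing delimiters (the function name between_partition is hypothetical):

```python
def between_partition(s, start, end):
    """Return the text between start and end, or None if either is missing."""
    _, found_start, rest = s.partition(start)
    if not found_start:          # start delimiter not present
        return None
    result, found_end, _ = rest.partition(end)
    if not found_end:            # end delimiter not present after start
        return None
    return result

print(between_partition('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))  # iwantthis
print(between_partition('asdf=5;iwantthis', 'asdf=5;', '123jasd'))         # None
```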


Solution:8

    source='your token _here0@df and maybe _here1@df or maybe _here2@df'
    start_sep='_'
    end_sep='@df'
    result=[]
    tmp=source.split(start_sep)
    for par in tmp:
      if end_sep in par:
        result.append(par.split(end_sep)[0])

    print(result)

This should print ['here0', 'here1', 'here2'].

A regex is better, but it requires importing an additional module (re), and you may prefer to use plain string methods only.
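For comparison, the regex version mentioned in this answer could look roughly like the sketch below. It assumes the same source and separators as the code in this answer; the non-greedy group (.*?) is what keeps each match from running on to the last end_sep.

```python
import re

source = 'your token _here0@df and maybe _here1@df or maybe _here2@df'
start_sep = '_'
end_sep = '@df'

# re.findall returns the captured group of every non-overlapping match.
pattern = re.escape(start_sep) + '(.*?)' + re.escape(end_sep)
result = re.findall(pattern, source)

print(result)  # ['here0', 'here1', 'here2']
```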


Solution:9

If you don't want to import anything, try the string method .index():

    text = 'I want to find a string between two substrings'
    left = 'find a '
    right = ' between two'  # leading space, so the result has no trailing space

    # Output: 'string'
    print(text[text.index(left)+len(left):text.index(right)])


Solution:10

To extract STRING, try:

    myString = '123STRINGabc'
    startString = '123'
    endString = 'abc'

    mySubString=myString[myString.find(startString)+len(startString):myString.find(endString)]


Solution:11

My method will be to do something like,

    find index of start string in s => i
    find index of end string in s => j

    substring = substring(i+len(start) to j-1)
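The steps above might translate to Python roughly as follows. This is a sketch under two assumptions that the pseudocode does not state: the end string is searched for only after the start string, and None is returned when either one is missing. The function name substring_between is hypothetical.

```python
def substring_between(s, start, end):
    """Return the text between start and end in s, or None if not found."""
    i = s.find(start)            # index of start string, -1 if absent
    if i == -1:
        return None
    begin = i + len(start)
    j = s.find(end, begin)       # index of end string, searched after start
    if j == -1:
        return None
    # A Python slice excludes index j, which matches "to j-1" in the pseudocode.
    return s[begin:j]

print(substring_between('asdf=5;iwantthis123jasd', 'asdf=5;', '123jasd'))  # iwantthis
```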


Solution:12

This is essentially cji's answer - Jul 30 '10 at 5:58. I changed the try except structure for a little more clarity on what was causing the exception.

    import sys

    def find_between( inputStr, firstSubstr, lastSubstr ):
        '''
        find between firstSubstr and lastSubstr in inputStr  STARTING FROM THE LEFT
            http://stackoverflow.com/questions/3368969/find-string-between-two-substrings
                above also has a func that does this FROM THE RIGHT
        '''
        start, end = (-1,-1)
        try:
            start = inputStr.index( firstSubstr ) + len( firstSubstr )
        except ValueError:
            # report which substring was missing
            print('    ValueError: ', end='')
            print("firstSubstr=%s  -  "%( firstSubstr ), end='')
            print(sys.exc_info()[1])

        try:
            end = inputStr.index( lastSubstr, start )
        except ValueError:
            print('    ValueError: ', end='')
            print("lastSubstr=%s  -  "%( lastSubstr ), end='')
            print(sys.exc_info()[1])

        return inputStr[start:end]


Solution:13

These solutions assume the start string and final string are different. Here is a solution I use for an entire file when the initial and final indicators are the same, assuming the entire file is read using readlines():

    def extractstring(line,flag='$'):
        string='' #returned as-is when the line has no flag
        if flag in line: # $ is the flag
            dex1=line.index(flag)
            subline=line[dex1+1:-1] #leave out flag (+1) to end of line
            dex2=subline.index(flag)
            string=subline[0:dex2].strip() #does not include last flag, strip whitespace
        return(string)

Example:

    lines=['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
        'afafoaltat $I GOT BETTER!$ derpity derp derp']
    for line in lines:
        string=extractstring(line,flag='$')
        print(string)

Gives:

    A NEWT?
    I GOT BETTER!
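When the opening and closing flag are the same, str.split offers a shorter route: splitting on the flag puts the text between the first two flags at index 1. The sketch below is an alternative rather than the answer's own method; the name extract_flagged and the choice to return None for lines with fewer than two flags are assumptions.

```python
def extract_flagged(line, flag='$'):
    """Return the text between the first two flags, or None if there are fewer than two."""
    parts = line.split(flag)
    # Two flags split the line into at least three parts.
    if len(parts) < 3:
        return None
    return parts[1].strip()

lines = ['asdf 1qr3 qtqay 45q at $A NEWT?$ asdfa afeasd',
         'afafoaltat $I GOT BETTER!$ derpity derp derp',
         'no flags on this line']
for line in lines:
    print(extract_flagged(line))
```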


Solution:14

You can simply use this code or copy the function below. All neatly in one line.

    def substring(whole, sub1, sub2):
        return whole[whole.index(sub1) : whole.index(sub2)]

If you run the function as follows:

    print(substring("5+(5*2)+2", "(", ")"))

you will be left with the output:

(5*2  

rather than

5*2  

If you want both sub-strings included at the ends of the output, add the length of sub2 to the end index (that is +1 for a single-character delimiter such as ")"):

    return whole[whole.index(sub1) : whole.index(sub2) + len(sub2)]

But if you don't want the sub-strings in the output, add the length of sub1 to the start index instead:

    return whole[whole.index(sub1) + len(sub1) : whole.index(sub2)]


Solution:15

Here is a function I wrote that returns a list of every string found between string1 and string2.

    def GetListOfSubstrings(stringSubject,string1,string2):
        MyList = []
        intstart=0
        strlength=len(stringSubject)
        continueloop = 1

        while(intstart < strlength and continueloop == 1):
            intindex1=stringSubject.find(string1,intstart)
            if(intindex1 != -1): #The substring was found, lets proceed
                intindex1 = intindex1+len(string1)
                intindex2 = stringSubject.find(string2,intindex1)
                if(intindex2 != -1):
                    subsequence=stringSubject[intindex1:intindex2]
                    MyList.append(subsequence)
                    intstart=intindex2+len(string2)
                else:
                    continueloop=0
            else:
                continueloop=0
        return MyList

    #Usage Example
    mystring="s123y123o123pp123y6"
    List = GetListOfSubstrings(mystring,"1","y68")
    for x in range(0, len(List)):
        print(List[x])
    # output: (nothing)

    mystring="s123y123o123pp123y6"
    List = GetListOfSubstrings(mystring,"1","3")
    for x in range(0, len(List)):
        print(List[x])
    # output:
    #     2
    #     2
    #     2
    #     2

    mystring="s123y123o123pp123y6"
    List = GetListOfSubstrings(mystring,"1","y")
    for x in range(0, len(List)):
        print(List[x])
    # output:
    #     23
    #     23o123pp123


Solution:16

This I posted before as code snippet in Daniweb:

    # picking up piece of string between separators
    # function using partition, like partition, but drops the separators
    def between(left,right,s):
        before,_,a = s.partition(left)
        a,_,after = a.partition(right)
        return before,a,after

    s = "bla bla blaa <a>data</a> lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa"
    print(between('<a>','</a>',s))
    print(between('(',')',s))
    print(between("'","'",s))

    """ Output:
    ('bla bla blaa ', 'data', " lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa")
    ('bla bla blaa <a>data</a> lsdjfasdjöf ', 'important notice', " 'Daniweb forum' tcha tcha tchaa")
    ('bla bla blaa <a>data</a> lsdjfasdjöf (important notice) ', 'Daniweb forum', ' tcha tcha tchaa')
    """


Solution:17

    from timeit import timeit
    from re import search, DOTALL


    def partition_find(string, start, end):
        return string.partition(start)[2].rpartition(end)[0]


    def re_find(string, start, end):
        # applying re.escape to start and end would be safer
        return search(start + '(.*)' + end, string, DOTALL).group(1)


    def index_find(string, start, end):
        return string[string.find(start) + len(start):string.rfind(end)]


    # The wikitext of the "Alan Turing law" article from English Wikipedia
    # https://en.wikipedia.org/w/index.php?title=Alan_Turing_law&action=edit&oldid=763725886
    string = """..."""
    start = '==Proposals=='
    end = '==Rival bills=='

    assert index_find(string, start, end) \
           == partition_find(string, start, end) \
           == re_find(string, start, end)

    print('index_find', timeit(
        'index_find(string, start, end)',
        globals=globals(),
        number=100_000,
    ))

    print('partition_find', timeit(
        'partition_find(string, start, end)',
        globals=globals(),
        number=100_000,
    ))

    print('re_find', timeit(
        're_find(string, start, end)',
        globals=globals(),
        number=100_000,
    ))

Result:

    index_find 0.35047444528454114
    partition_find 0.5327825636197754
    re_find 7.552149639286381

re_find was about 21 times slower than index_find in this example.


Solution:18

Parsing text with delimiters from different email platforms posed a larger version of this problem. The delimited fields generally have a START and a STOP marker. The delimiter characters kept choking regex because they double as regex wildcards. The problem with split is mentioned here and elsewhere: the delimiter characters are consumed and gone. It occurred to me to use replace() to give split() something else to consume. Chunk of code:

    import logging

    # textIn holds the raw text to parse
    nuke = '~~~'
    start = '|*'
    stop = '*|'
    julien = (textIn.replace(start,nuke + start).replace(stop,stop + nuke).split(nuke))
    keep = [chunk for chunk in julien if start in chunk and stop in chunk]
    logging.info('keep: %s',keep)
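To see what the replace/split trick does, here is a small worked example. The input string text_in is invented for illustration, and the sketch assumes the placeholder '~~~' never occurs in the real text.

```python
nuke = '~~~'   # placeholder, assumed not to occur in the input
start = '|*'
stop = '*|'

# Hypothetical input with two delimited fields.
text_in = 'Hi |*FNAME*|, your code is |*CODE*|.'

# Put the placeholder before every start marker and after every stop marker,
# then split on the placeholder so the markers themselves survive.
marked = text_in.replace(start, nuke + start).replace(stop, stop + nuke)
chunks = marked.split(nuke)
keep = [chunk for chunk in chunks if start in chunk and stop in chunk]

print(chunks)  # ['Hi ', '|*FNAME*|', ', your code is ', '|*CODE*|', '.']
print(keep)    # ['|*FNAME*|', '|*CODE*|']
```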


Solution:19

Following on from Nikolaus Gradwohl's answer, I needed to get the version number (i.e., 0.0.2) between 'ui:' and '-' from the file content below (filename: docker-compose.yml):

    version: '3.1'
    services:
      ui:
        image: repo-pkg.dev.io:21/website/ui:0.0.2-QA1
        #network_mode: host
        ports:
          - 443:9999
        ulimits:
          nofile:test

and this is how it worked for me (python script):

    import re

    # read the whole file; the with block closes it afterwards
    with open('docker-compose.yml', 'r') as f:
        lines = f.read()
    result = re.search('ui:(.*)-', lines)
    print(result.group(1))

Result:

    0.0.2
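The pattern 'ui:(.*)-' works on this file because only one line contains 'ui:' followed later by a '-'. If the file might contain other dashes on that line, a narrower group is one option. The sketch below assumes the version consists of dot-separated digits and uses an inline copy of the file content instead of reading docker-compose.yml; the variable names are illustrative.

```python
import re

# Inline copy of the relevant part of the file, for a self-contained example.
content = """version: '3.1'
services:
  ui:
    image: repo-pkg.dev.io:21/website/ui:0.0.2-QA1
    ports:
      - 443:9999
"""

# Capture only digits separated by dots, directly after 'ui:' and before '-'.
match = re.search(r'ui:(\d+(?:\.\d+)*)-', content)
version = match.group(1) if match else None

print(version)  # 0.0.2
```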


Solution:20

This seems much more straightforward to me:

    import re

    s = 'asdf=5;iwantthis123jasd'
    x = re.search('iwantthis',s)
    print(s[x.start():x.end()])
