Tutorial: Iterate over a string 2 (or n) characters at a time in Python


Earlier today I needed to iterate over a string 2 characters at a time for parsing a string formatted like "+c-R+D-E" (there are a few extra letters).

I ended up with this, which works, but it looks ugly. I ended up commenting what it was doing because it felt non-obvious. It almost seems pythonic, but not quite.

# Might not be exact, but you get the idea: use the step
# parameter of range() and slicing to grab 2 chars at a time
s = "+c-R+D-e"
for op, code in (s[i:i+2] for i in range(0, len(s), 2)):
    print op, code

Are there some better/cleaner ways to do this?


Dunno about cleaner, but there's another alternative:

for (op, code) in zip(s[0::2], s[1::2]):
    print op, code

A no-copy version:

from itertools import izip, islice

for (op, code) in izip(islice(s, 0, None, 2), islice(s, 1, None, 2)):
    print op, code
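On Python 3 there is no `izip`, because the built-in `zip` is already lazy. A minimal sketch of the same no-copy idea, assuming Python 3 (the helper name `pairs` is made up for illustration):

```python
from itertools import islice

def pairs(s):
    """Lazily yield (even-index item, odd-index item) pairs from s."""
    # zip() is lazy on Python 3, so no intermediate strings are built.
    return zip(islice(s, 0, None, 2), islice(s, 1, None, 2))

for op, code in pairs("+c-R+D-e"):
    print(op, code)
```

As with the original, a trailing unpaired character is silently dropped, since `zip` stops at the shorter input.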


Maybe this would be cleaner?

s = "+c-R+D-e"
for i in xrange(0, len(s), 2):
    op, code = s[i:i+2]
    print op, code

You could perhaps write a generator to do what you want; maybe that would be more pythonic :)


Triptych inspired this more general solution:

def slicen(s, n, truncate=False):
    assert n > 0
    while len(s) >= n:
        yield s[:n]
        s = s[n:]
    if len(s) and not truncate:
        yield s

for op, code in slicen("+c-R+D-e", 2):
    print op, code


from itertools import izip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return izip_longest(*args, fillvalue=fillvalue)

def main():
    s = "+c-R+D-e"
    for item in grouper(s, 2):
        print ' '.join(item)

if __name__ == "__main__":
    main()

##output
##+ c
##- R
##+ D
##- e

izip_longest requires Python 2.6 (or higher). If on Python 2.4 or 2.5, use the definition for izip_longest from the documentation or change the grouper function to:

from itertools import izip, chain, repeat

def grouper(iterable, n, padvalue=None):
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)
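For anyone on Python 3, where `izip_longest` was renamed to `zip_longest`, the grouper idea could look roughly like this. It is only a sketch of the same recipe under that assumption, not a replacement for the versions above:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect items into fixed-length tuples, padding the last one."""
    # The same iterator object is repeated n times, so each output
    # tuple pulls n consecutive items from it.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

for op, code in grouper("+c-R+D-e", 2):
    print(op, code)
```

If the input length is not a multiple of `n`, the final tuple is padded with `fillvalue` rather than shortened.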


The other answers work well for n = 2, but for the general case you could try this:

def slicen(s, n, truncate=False):
    nslices = len(s) / n
    if not truncate and (len(s) % n):
        nslices += 1
    return (s[i*n:n*(i+1)] for i in range(nslices))

>>> s = '+c-R+D-e'
>>> for op, code in slicen(s, 2):
...     print op, code
...
+ c
- R
+ D
- e

>>> for a, b, c in slicen(s, 3):
...     print a, b, c
...
+ c -
R + D
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: need more than 2 values to unpack

>>> for a, b, c in slicen(s,3,True):
...     print a, b, c
...
+ c -
R + D
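One thing to watch when porting this: on Python 3, `len(s) / n` is float division, and `range()` rejects a float. A hedged Python 3 sketch of the same function, using `divmod` to get the slice count and the leftover length in one step:

```python
def slicen(s, n, truncate=False):
    """Return a generator of n-character slices of s."""
    # divmod gives the number of full slices and the leftover length.
    nslices, remainder = divmod(len(s), n)
    if remainder and not truncate:
        nslices += 1  # include the short final slice
    return (s[i * n:(i + 1) * n] for i in range(nslices))

for piece in slicen("+c-R+D-e", 3):
    print(piece)
```

With `truncate=True` the short final slice is left out, so unpacking each slice into exactly `n` names is safe.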


Great opportunity for a generator. For larger lists, this will be much more efficient than zipping every other element. Note that this version also handles strings with dangling ops:

def opcodes(s):
    while True:
        try:
            op   = s[0]
            code = s[1]
            s    = s[2:]
        except IndexError:
            return
        yield op,code

for op,code in opcodes("+c-R+D-e"):
    print op,code

edit: minor rewrite to avoid ValueError exceptions.
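Note that each `s = s[2:]` builds a new string, so on long inputs the reslicing adds up. A possible variant that walks indices instead, sketched for Python 3 and assuming a dangling op should be dropped as in the version above:

```python
def opcodes(s):
    """Yield (op, code) pairs; a trailing unpaired op is dropped."""
    # Stop at len(s) - 1 so that s[i + 1] is always a valid index.
    for i in range(0, len(s) - 1, 2):
        yield s[i], s[i + 1]

for op, code in opcodes("+c-R+D-e"):
    print(op, code)
```

This version needs no exception handling, because the loop bound already guarantees both characters exist.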


This approach supports an arbitrary number of elements per result, evaluates lazily, and the input iterable can be a generator (no indexing is attempted):

import itertools

def groups_of_n(n, iterable):
    c = itertools.count()
    for _, gen in itertools.groupby(iterable, lambda x: c.next() / n):
        yield gen

Any left-over elements are returned in a shorter final group.

Example usage:

for g in groups_of_n(4, xrange(21)):
    print list(g)

[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14, 15]
[16, 17, 18, 19]
[20]
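On Python 3 the `c.next() / n` key needs two changes: `next(c)` instead of the `.next()` method, and `//` for integer division. A rough sketch under those assumptions:

```python
import itertools

def groups_of_n(n, iterable):
    """Lazily yield groups of up to n consecutive items."""
    c = itertools.count()
    # The key is the running item index floor-divided by n, so it
    # changes every n items and groupby starts a new group there.
    for _, group in itertools.groupby(iterable, lambda x: next(c) // n):
        yield group

for g in groups_of_n(4, range(10)):
    print(list(g))
```

Each group shares the underlying iterator with `groupby`, so consume a group (for example with `list(g)`) before moving on to the next one.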


>>> s = "+c-R+D-e"
>>> s
'+c-R+D-e'
>>> s[::2]
'+-+-'
>>>


Maybe not the most efficient, but if you like regexes...

import re
s = "+c-R+D-e"
for op, code in re.findall('(.)(.)', s):
    print op, code
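Two caveats with `'(.)(.)'`: it skips a trailing unpaired character, and `.` does not match a newline by default. If the regex route appeals, a sketch that generalizes to n characters might look like this (the name `regex_chunks` is hypothetical, and n is assumed to be at least 1):

```python
import re

def regex_chunks(s, n):
    """Split s into n-character pieces; the last piece may be shorter."""
    # re.DOTALL lets '.' match newlines too, so no character is skipped.
    pattern = re.compile(".{1,%d}" % n, re.DOTALL)
    return pattern.findall(s)

for piece in regex_chunks("+c-R+D-e", 2):
    print(piece)
```

Unlike the two-group pattern, this returns whole strings rather than tuples, so a short final piece cannot be unpacked into n names.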


I ran into a similar problem. Ended up doing something like this:

ops = iter("+c-R+D-e")
for op in ops:
    code = ops.next()
    print op, code

I felt it was the most readable.
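With an odd-length string, the `next` call inside the loop runs off the end of the iterator and raises StopIteration. One way to guard against that, sketched for Python 3 (the function name `op_code_pairs` and the choice of `None` as the marker for a missing code are assumptions for illustration):

```python
def op_code_pairs(s):
    """Yield (op, code) pairs, with code=None for a dangling op."""
    ops = iter(s)
    for op in ops:
        # The default argument keeps next() from raising StopIteration
        # when the string ends on an unpaired op.
        code = next(ops, None)
        yield op, code

for op, code in op_code_pairs("+c-R+D-e"):
    print(op, code)
```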


Here's my answer, a little bit cleaner to my eyes:

for i in range(0, len(string) - 1):
    if i % 2 == 0:
        print string[i:i+2]


Consider pip installing more_itertools, which already ships with a chunked implementation along with other helpful tools:

import more_itertools

for op, code in more_itertools.chunked(s, 2):
    print(op, code)


+ c
- R
+ D
- e
