Tutorial: Regex Substitution in Python


I have a CSV file with several entries, and each entry has 2 unix timestamp formatted dates.

I have a method called convert(), which takes in the timestamp and converts it to YYYYMMDD.

Now, since I have 2 timestamps in each line, how would I replace each one with the new value?

EDIT: Just to clarify, I would like to convert each occurrence of the timestamp into the YYYYMMDD format. This is what is bugging me, as re.findall() returns a list.


I assume that by "unix timestamp formatted date" you mean a number of seconds since the epoch. The code below assumes that every number in the file is a UNIX timestamp. If that isn't the case, you'll need to adjust the regex:

    import re, sys

    # your convert function goes here

    regex = re.compile(r'(\d+)')

    for line in sys.stdin:
        sys.stdout.write(regex.sub(lambda m: convert(int(m.group(1))), line))

This reads from stdin and calls convert on each number found.

The "trick" here is that re.sub can take a function that transforms from a match object into a string. I'm assuming your convert function expects an int and returns a string, so I've used a lambda as an adapter function to grab the first group of the match, convert it to an int, and then pass that resulting int to convert.
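A minimal, self-contained sketch of the callable-replacement idea described above. The `convert` shown here is only a stand-in (the question does not show the real one) and is assumed to format the timestamp in UTC; `replace_timestamps` is a helper name introduced for illustration:

```python
import re
import time

def convert(unixtime):
    # Hypothetical stand-in for the asker's convert(): int -> 'YYYYMMDD' (UTC).
    return time.strftime("%Y%m%d", time.gmtime(unixtime))

regex = re.compile(r'(\d+)')

def replace_timestamps(line):
    # re.sub calls the lambda once per match and splices in the string it returns.
    return regex.sub(lambda m: convert(int(m.group(1))), line)

print(replace_timestamps("foo,1234567890,bar,1243310263"))
# foo,20090213,bar,20090526
```

Because the replacement is computed per match, there is no need to collect the matches with re.findall() first.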


If you know the replacement:

    import re

    p = re.compile(r',\d{8},')
    csvstring = p.sub(',' + someval + ',', csvstring)  # sub() returns a new string

If it's a format change:

    p = re.compile(r',(\d{4})(\d\d)(\d\d),')
    csvstring = p.sub(r',\3-\2-\1,', csvstring)  # YYYYMMDD -> DD-MM-YYYY
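One caveat with the patterns above: because the regex consumes the comma on both sides, two date fields in a row share a comma and only the first is rewritten, and a field at the very start or end of a line is skipped. A possible variant, offered as a sketch rather than as the answerer's method, uses lookarounds so the delimiters are checked but not consumed:

```python
import re

# Lookarounds test for a field boundary (comma, newline, or string edge)
# without consuming it, so neighbouring date fields can both match.
p = re.compile(r'(?<![^,\n])(\d{4})(\d\d)(\d\d)(?![^,\n])')

csvstring = "a,20090213,20090526,b"
print(p.sub(r'\3-\2-\1', csvstring))  # a,13-02-2009,26-05-2009,b
```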

EDIT: sorry, just realised you said python, modified above


I'm not able to comment on your question, but have you taken a look at Python's csv module? http://docs.python.org/library/csv.html#module-csv
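A rough sketch of what the csv-module route could look like. The column positions (1 and 3) and the `convert` body are assumptions made for illustration, since the question shows neither the file layout nor the real function; `convert_rows` is a hypothetical helper name:

```python
import csv
import io
import time

def convert(unixtime):
    # Hypothetical stand-in for the asker's convert(): int -> 'YYYYMMDD' (UTC).
    return time.strftime("%Y%m%d", time.gmtime(unixtime))

TIMESTAMP_COLUMNS = (1, 3)  # assumed positions of the two timestamp fields

def convert_rows(infile, outfile):
    writer = csv.writer(outfile, lineterminator="\n")
    for row in csv.reader(infile):
        for i in TIMESTAMP_COLUMNS:
            row[i] = convert(int(row[i]))
        writer.writerow(row)

# In-memory demo; real files should be opened with newline="".
src = io.StringIO("foo,1234567890,bar,1243310263\n")
dst = io.StringIO()
convert_rows(src, dst)
print(dst.getvalue(), end="")  # foo,20090213,bar,20090526
```

Addressing fields by position avoids converting other numeric columns by accident, at the cost of having to know the layout up front.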


I'd use something along these lines. It's a lot like Laurence's response, but it includes the timestamp conversion that you requested and takes the filename as a parameter. This code assumes you are working with recent dates (after 9/9/2001): the regex only matches numbers with 10 or more digits, and the first 10-digit timestamp falls on that date. If you need earlier dates, lower the 10 to 9 or less.

    import re, sys, time

    regex = re.compile(r'(\d{10,})')

    def convert(unixtime):
        return time.strftime("%Y%m%d", time.gmtime(unixtime))

    for line in open(sys.argv[1]):
        sys.stdout.write(regex.sub(lambda m: convert(int(m.group(0))), line))

EDIT: Cleaned up the code.

Sample Input

    foo,1234567890,bar,1243310263
    cat,1243310263,pants,1234567890
    baz,987654321,raz,1

Sample Output

    foo,20090213,bar,20090526
    cat,20090526,pants,20090213
    baz,987654321,raz,1 # not converted (too short to be a recent timestamp)
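The sample can be checked without a file by applying the same substitution to strings directly. This sketch reuses the regex and convert from the code above; `convert_line` is a helper name introduced here for illustration. The last line shows where the 10-digit cutoff comes from:

```python
import re
import time

regex = re.compile(r'(\d{10,})')

def convert(unixtime):
    return time.strftime("%Y%m%d", time.gmtime(unixtime))

def convert_line(line):
    # Only runs of 10 or more digits are treated as timestamps.
    return regex.sub(lambda m: convert(int(m.group(0))), line)

print(convert_line("foo,1234567890,bar,1243310263"))  # foo,20090213,bar,20090526
print(convert_line("baz,987654321,raz,1"))            # unchanged: fewer than 10 digits
print(convert(10**9))                                 # 20010909, the first 10-digit timestamp
```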
