Tutorial: Regex Substitution in Python



Question:

I have a CSV file with several entries, and each entry has two dates formatted as Unix timestamps.

I have a method called convert(), which takes in the timestamp and converts it to YYYYMMDD.

Now, since I have 2 timestamps in each line, how would I replace each one with the new value?

EDIT: Just to clarify, I would like to convert each occurrence of the timestamp into the YYYYMMDD format. This is what is bugging me, as re.findall() returns a list.


Solution:1

I assume that by "unix timestamp formatted date" you mean a number of seconds since the epoch. The code below assumes that every number in the file is a UNIX timestamp. If that isn't the case, you'll need to adjust the regex:

    import re, sys

    # your convert function goes here

    regex = re.compile(r'(\d+)')

    for line in sys.stdin:
        sys.stdout.write(regex.sub(lambda m: convert(int(m.group(1))), line))

This reads from stdin and calls convert on each number found.

The "trick" here is that re.sub can take a function that transforms from a match object into a string. I'm assuming your convert function expects an int and returns a string, so I've used a lambda as an adapter function to grab the first group of the match, convert it to an int, and then pass that resulting int to convert.
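To see the callback form of re.sub in isolation, here is a minimal sketch. The convert function is a stand-in, since the question does not show its body; it assumes the timestamps are seconds since the epoch and formats them in UTC. The names replace_match and replace_timestamps are made up for this illustration.

```python
import re
import time

def convert(unixtime):
    # Stand-in for the asker's convert(); assumes seconds since the epoch, UTC.
    return time.strftime("%Y%m%d", time.gmtime(unixtime))

def replace_match(match):
    # re.sub passes a match object; the returned string replaces the matched text.
    return convert(int(match.group(0)))

def replace_timestamps(line):
    # Every run of digits is treated as a timestamp, as in the answer above.
    return re.sub(r"\d+", replace_match, line)

print(replace_timestamps("foo,1234567890,bar,1243310263"))
# prints: foo,20090213,bar,20090526
```

Because re.sub calls the function once per match, both timestamps on a line are handled in a single pass, and there is no need to loop over the list returned by re.findall().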


Solution:2

If you know the replacement:

    import re

    p = re.compile(r',\d{8},')
    p.sub(',' + someval + ',', csvstring)

If it's a format change:

    p = re.compile(r',(\d{4})(\d\d)(\d\d),')
    p.sub(r',\3-\2-\1,', csvstring)
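One caveat with patterns that include the surrounding commas: each match consumes its trailing comma, so a date field that immediately follows another one is skipped. A possible workaround, sketched here on a made-up sample row, is to check for the commas with lookarounds instead of matching them. The name date_field is hypothetical.

```python
import re

row = "foo,20090213,20090526,bar"  # made-up row with two adjacent date fields

# Commas are part of the match, so the second field loses its leading comma.
consuming = re.compile(r",(\d{4})(\d\d)(\d\d),")
print(consuming.sub(r",\3-\2-\1,", row))
# prints: foo,13-02-2009,20090526,bar

# Lookbehind and lookahead require the commas without consuming them.
date_field = re.compile(r"(?<=,)(\d{4})(\d\d)(\d\d)(?=,)")
print(date_field.sub(r"\3-\2-\1", row))
# prints: foo,13-02-2009,26-05-2009,bar
```

Like the original pattern, this version only matches fields that have a comma on both sides, so the first and last fields of a line are left alone.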

EDIT: Sorry, just realised you said Python; modified the above.


Solution:3

I can't comment on your question, but have you taken a look at Python's csv module? http://docs.python.org/library/csv.html#module-csv
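As a rough sketch of that approach, the csv module can split each row into fields so that only the timestamp columns are converted. The column positions (1 and 3), the convert stand-in, and the function name convert_columns are assumptions for illustration; the question does not say where the timestamps sit or how convert() is written.

```python
import csv
import io
import time

def convert(unixtime):
    # Stand-in for the asker's convert(); assumes seconds since the epoch, UTC.
    return time.strftime("%Y%m%d", time.gmtime(unixtime))

def convert_columns(infile, outfile, timestamp_columns=(1, 3)):
    # timestamp_columns holds zero-based field indexes (a hypothetical layout).
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        for i in timestamp_columns:
            row[i] = convert(int(row[i]))
        writer.writerow(row)

# In-memory files keep the example self-contained.
source = io.StringIO("foo,1234567890,bar,1243310263\n")
result = io.StringIO()
convert_columns(source, result)
print(result.getvalue(), end="")
# prints: foo,20090213,bar,20090526
```

Compared with a regex over the raw line, this leaves other numeric fields untouched and copes with quoted fields that contain commas.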


Solution:4

I'd use something along these lines. It's a lot like Laurence's response, but it includes the timestamp conversion you asked for and takes the filename as a command-line argument. The code assumes you are working with recent dates (after 9/9/2001), whose timestamps have at least 10 digits. If you need earlier dates, lower the 10 in the regex to 9 or fewer.

    import re, sys, time

    regex = re.compile(r'(\d{10,})')

    def convert(unixtime):
        return time.strftime("%Y%m%d", time.gmtime(unixtime))

    for line in open(sys.argv[1]):
        sys.stdout.write(regex.sub(lambda m: convert(int(m.group(0))), line))

EDIT: Cleaned up the code.

Sample Input

    foo,1234567890,bar,1243310263
    cat,1243310263,pants,1234567890
    baz,987654321,raz,1

Output

    foo,20090213,bar,20090526
    cat,20090526,pants,20090213
    baz,987654321,raz,1 # not converted (too short to be a recent timestamp)
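The 10-digit cutoff can be checked directly: the smallest 10-digit timestamp falls on 9 September 2001, while 9-digit values reach back to 1973. The helper name to_date is made up for this quick check, which formats in UTC.

```python
import time

def to_date(unixtime):
    # Format a Unix timestamp (seconds since the epoch) as a UTC date.
    return time.strftime("%Y-%m-%d", time.gmtime(unixtime))

print(to_date(10**9))  # smallest 10-digit timestamp, prints 2001-09-09
print(to_date(10**8))  # smallest 9-digit timestamp, prints 1973-03-03
```

Lowering the minimum digit count widens the date range, but it also makes it more likely that an ordinary number in the file is mistaken for a timestamp.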
