Tutorial :Producing an view of a text's revision history in Python



Question:

I have two versions of a piece of text and I want to produce an HTML view of its revision similar to what Google Docs or Stack Overflow displays. I need to do this in Python. I don't know what this technique is called but I assume that it has a name and hopefully there is a Python library that can do it.

Version 1:

William Henry "Bill" Gates III (born October 28, 1955)[2] is an American business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.

Version 2:

William Henry "Bill" Gates III (born October 28, 1955)[2] is a business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen. He is American.

The desired output:

William Henry "Bill" Gates III (born October 28, 1955)[2] is an American business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen. He is American.

Using the diff command doesn't work because it tells me which lines are different but not which columns/words are different.

$ echo 'William Henry "Bill" Gates III (born October 28, 1955)[2] is an American business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.' > oldfile  $ echo 'William Henry "Bill" Gates III (born October 28, 1955)[2] is a business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.  He is American.' > newfile  $ diff -u oldfile newfile  --- oldfile 2010-04-30 13:32:43.000000000 -0700  +++ newfile 2010-04-30 13:33:09.000000000 -0700  @@ -1 +1 @@  -William Henry "Bill" Gates III (born October 28, 1955)[2] is an American business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.  +William Henry "Bill" Gates III (born October 28, 1955)[2] is a business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.  He is American.' > oldfile  


Solution:1

Google Diff Merge Patch has a pretty good diff implementation in pure python.


Solution:2

You can use wdiff. I don't know if there's a Python implementation:

$ wdiff oldfile newfile  William Henry "Bill" Gates III (born October 28, 1955)[2] is [-an American-] {+a+} business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.  {+He is American.+}  


Solution:3

The difflib module might be of assistance with this problem.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »