Ubuntu: Convert a (txt|srt) document from Western(ISO-8859-15) to UTF-8



Question:

I am having problems with subtitles in my language. They are encoded as Western (ISO-8859-15), so some characters are not displayed correctly. I am tired of fixing them by hand in gedit with Ctrl+H and then saving as UTF-8. How can I automate this process?


Solution:1

You can use iconv:

If the file is named chapter1.srt, then run:

iconv -f ISO-8859-15 -t UTF-8 chapter1.srt > outputfile.srt

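This writes the converted text to a new file with a different name. Do not redirect the output to the input file itself: the shell truncates that file before iconv reads it. Write to a new name or to a separate directory, then move the result back over the original.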
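The charset passed to -f matters, because ISO-8859-15 and ISO-8859-1 disagree on a few byte values. As a small illustration (the helper name decode_a4 is made up for this sketch), byte 0xA4 decodes to the euro sign under ISO-8859-15 but to the generic currency sign under ISO-8859-1:

```shell
# Decode the single byte 0xA4 (octal 244) using the charset given in $1.
# decode_a4 is a hypothetical helper name, not part of iconv.
decode_a4() {
    printf '\244' | iconv -f "$1" -t UTF-8
}

decode_a4 ISO-8859-15; echo    # euro sign
decode_a4 ISO-8859-1; echo     # generic currency sign
```

If converted subtitles show a currency sign where a euro sign was expected, the wrong source charset was probably used.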


Solution:2

Another option is konwert:

konwert isolatin1-utf8 inputfile.srt > outputfile.srt  

Besides conversion, konwert can also be used as an encoding detector:

konwert any/en-test inputfile.srt  

This is useful for finding the input encoding, since both konwert and iconv require it as an argument. You must, however, supply a language code: the en in any/en-test means English.
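If konwert is not installed, a rougher check is possible with iconv alone. This is a different technique, not what konwert does: it only tests whether a file already decodes cleanly as UTF-8, so that a file is not converted twice. The function names is_utf8 and convert_if_needed are made up for this sketch, and the ISO-8859-15 default is an assumption taken from the question.

```shell
# Succeeds (exit status 0) if the file decodes cleanly as UTF-8.
# Plain ASCII files also pass, since ASCII is a subset of UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Write a UTF-8 version of $1 to $2. The source charset in $3 is
# optional; ISO-8859-15 is assumed when it is omitted.
convert_if_needed() {
    src=$1
    dst=$2
    from=${3:-ISO-8859-15}
    if is_utf8 "$src"; then
        cp -- "$src" "$dst"          # already UTF-8, copy unchanged
    else
        iconv -f "$from" -t UTF-8 "$src" > "$dst"
    fi
}
```

Usage would look like: convert_if_needed inputfile.srt outputfile.srt. This only answers whether a file is already UTF-8; to guess the actual source encoding, a detector such as konwert is still needed.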

It also has an option for in-place conversion, so you don't have to move and rename files afterwards:

konwert isolatin1-utf8 -O inputfile.srt  
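A similar in-place effect can be had with iconv by going through a temporary file. The following is a minimal sketch, not a feature of iconv itself: convert_in_place is a made-up function name, and the ISO-8859-15 default is an assumption taken from the question.

```shell
# Convert one file to UTF-8 in place. The output goes to a temporary file
# first, and the original is overwritten only if iconv succeeded.
# $1 = file, $2 = source charset (optional, ISO-8859-15 assumed).
convert_in_place() {
    file=$1
    from=${2:-ISO-8859-15}
    tmp=$(mktemp) || return 1
    if iconv -f "$from" -t UTF-8 "$file" > "$tmp"; then
        cat "$tmp" > "$file"         # keeps the original file's permissions
        rm -f -- "$tmp"
    else
        rm -f -- "$tmp"
        return 1
    fi
}

# Example: convert every .srt file in the current directory.
# for f in ./*.srt; do convert_in_place "$f"; done
```

Running this twice on the same file would encode it twice and garble the accented characters, so it should only be applied to files that are still in the old encoding.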


Solution:3

Also, since you're dealing with .srt files, you should really check out pysrt. It has many features for manipulating subtitles, such as shifting and rescaling times.

Installing:

sudo pip install pysrt  

Converting to UTF-8 (the input file encoding is auto-detected using either chardet or charade):

srt -i --encoding 'utf-8' shift 0s mysubtitle.srt  
