Ubuntu: filename encoding issue



Question:

I am receiving a file with a Faroese name and trying to save it from a PHP script:

2010_08_Útflutningur.xls  

Ubuntu 10.04 LTS saves it as:

2010_08_�tflutningur.xls (invalid encoding)  

I've installed and run utf8-migration-tool, but with no effect.

Is this an Ubuntu error that I can fix, or do I just have to give up and modify the name in PHP?

Is there a document that states which charset is acceptable for filenames in Ubuntu, or what the encoding specs are?

Thanks


Solution:1

This looks like an encoding issue. Unfortunately, PHP needs a bit of hand-holding when it comes to encodings, because its strings are plain byte sequences with no encoding attached. If you are creating the filename within PHP, utf8_encode() should be helpful; note, however, that it assumes the input is ISO-8859-1, and that it is deprecated as of PHP 8.2.
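A minimal sketch of that conversion, assuming the mbstring extension is available and that any name which is not already valid UTF-8 arrived as ISO-8859-1 (the same assumption utf8_encode() makes). The helper name ensureUtf8Filename is hypothetical, not part of PHP:

```php
<?php
// Hypothetical helper: return $name as UTF-8.
// ASSUMPTION: input that is not valid UTF-8 is encoded as $assumedSource.
function ensureUtf8Filename(string $name, string $assumedSource = 'ISO-8859-1'): string
{
    // Already valid UTF-8: leave it alone to avoid double-encoding.
    if (mb_check_encoding($name, 'UTF-8')) {
        return $name;
    }
    // Otherwise reinterpret the bytes as the assumed source encoding.
    return mb_convert_encoding($name, 'UTF-8', $assumedSource);
}

// "\xDA" is the single ISO-8859-1 byte for "Ú"; in UTF-8 it becomes "\xC3\x9A".
$latin1Name = "2010_08_\xDAtflutningur.xls";
$utf8Name = ensureUtf8Filename($latin1Name);
echo bin2hex($utf8Name), "\n";
```

Checking first matters because converting a name that is already UTF-8 would mangle it a second time.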

On the other hand, if you are using the filename submitted by a client, perhaps you can request that the client do the encoding for you. That is done with the accept-charset attribute of the <form> tag, and/or by setting the charset of the page that the form is on. Some clients honor one and some the other, so for best results use UTF-8 for both.
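As an illustration only, a page that sets both could look like the following. The target script upload.php and the field name report are placeholders, not names from the question:

```php
<?php
// Declare the page itself as UTF-8 so browsers submit form data in UTF-8.
header('Content-Type: text/html; charset=UTF-8');
?>
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Upload</title>
</head>
<body>
  <!-- accept-charset asks the client to encode the submission as UTF-8 -->
  <form action="upload.php" method="post"
        enctype="multipart/form-data" accept-charset="UTF-8">
    <input type="file" name="report">
    <input type="submit" value="Upload">
  </form>
</body>
</html>
```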


Solution:2

By default Ubuntu uses UTF-8 for filenames. Most modern Linux distros and many other operating systems do so (Windows/NTFS, which uses UTF-16, is the best-known exception).

To fix files that have names in the wrong encoding, like the one you show, you can try nautilus-filename-repairer:

sudo apt-get install nautilus-filename-repairer  

You can use the PHP iconv functions to convert strings (filenames) from one encoding to another. Of course, that requires that you know what encoding they are in to begin with.
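A rough sketch of the iconv approach, assuming the iconv extension is enabled and that you already know the source encoding. Both function names are hypothetical helpers:

```php
<?php
// Hypothetical helper: convert a filename from a KNOWN encoding to UTF-8.
function toUtf8Name(string $name, string $from): string
{
    $converted = iconv($from, 'UTF-8', $name);
    if ($converted === false) {
        // iconv() returns false when the input is not valid in $from.
        throw new RuntimeException("Cannot convert filename from $from");
    }
    return $converted;
}

// Hypothetical helper: rename an existing file so its name is UTF-8.
function renameToUtf8(string $dir, string $badName, string $from): string
{
    $fixed = toUtf8Name($badName, $from);
    if ($fixed !== $badName && !rename("$dir/$badName", "$dir/$fixed")) {
        throw new RuntimeException("Could not rename $badName");
    }
    return $fixed;
}

// ASSUMPTION: the original name was ISO-8859-1 ("\xDA" is "Ú" in that charset).
echo bin2hex(toUtf8Name("\xDAtflutningur.xls", 'ISO-8859-1')), "\n";
```

If the guess about the source encoding is wrong, the conversion may still succeed but produce the wrong characters, so it is worth checking one converted name by eye before renaming in bulk.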

To get correctly encoded filenames from the client, you can try the technique explained by eswald.

