Tutorial :best character set and collation for European based website



Question:

I am going to be building a application which will be used by people all over Europe. I need to know which collation and character set would be best suited for user inputted data. Or should I make a separate table for each language. A article to something explaining this would be great.

Thanks :)


Solution:1

Unicode is a very large character set including nearly all characters from nearly all languages.

There are a number of ways to store Unicode text as a sequence of bytes - these ways are called encodings. All Unicode encodings (well, all complete Unicode encodings) can store all Unicode text as a sequence of bytes, in some format - but the number of bytes that any given piece of text takes will depend on the encoding used.

UTF-8 is a Unicode encoding that is optimized for English and other languages which use very few characters outside the Latin alphabet. UTF-16 is a Unicode encoding which is possibly more appropriate for text in a variety of European languages. Java and .NET store all text in-memory (the String class) as UTF-16 encoded Unicode.


Solution:2

Character set, without doubt, UTF-8. Collation, I am not sure there is a good answer to that, but you might want to read this report.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »