Tutorial: Convert UTF-16 to UTF-8 under Windows and Linux, in C



Question:

I was wondering if there is a recommended cross-platform (Windows and Linux) method for converting strings from UTF-16LE to UTF-8, or should one use a different method for each environment?

I've managed to google a few references to 'iconv', but for some reason I can't find samples of basic conversions, such as converting a wchar_t UTF-16 string to UTF-8.

Can anybody recommend a method that would be cross-platform? If you know of references or a guide with samples, I would very much appreciate it.

Thanks, Doori Bar


Solution:1

If you don't want to use ICU, use the native facility on each platform:

  1. Windows: WideCharToMultiByte
  2. Linux: iconv (Glibc)
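
To make the two options above concrete, here is a minimal sketch of how they might be wrapped behind one function. The name utf16le_to_utf8 and its signature are my own invention for illustration, not part of either API. It assumes a little-endian host (so a uint16_t array really holds UTF-16LE bytes) and that the caller supplies a large enough output buffer; treat it as a starting point rather than a finished implementation.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <iconv.h>
#endif

/* Hypothetical helper (not from either API): converts `units` UTF-16LE code
 * units in `src` to UTF-8 in `dst`, which has room for `dstcap` bytes.
 * Returns the number of bytes written, or (size_t)-1 on failure.
 * The output is not NUL-terminated. Assumes a little-endian host. */
size_t utf16le_to_utf8(const uint16_t *src, size_t units,
                       char *dst, size_t dstcap)
{
    if (units == 0)
        return 0;
#ifdef _WIN32
    /* wchar_t is 16 bits wide on Windows, so the buffer is passed as-is.
     * For CP_UTF8 the last two arguments must be NULL. */
    int n = WideCharToMultiByte(CP_UTF8, 0, (const wchar_t *)src, (int)units,
                                dst, (int)dstcap, NULL, NULL);
    return n > 0 ? (size_t)n : (size_t)-1;
#else
    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");   /* (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;          /* iconv takes a non-const char ** */
    size_t inleft = units * sizeof(uint16_t);       /* lengths are in bytes */
    char *out = dst;
    size_t outleft = dstcap;

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);
    return rc == (size_t)-1 ? (size_t)-1 : dstcap - outleft;
#endif
}
```

A caller would pass the code-unit count (not the byte count) and check for (size_t)-1 before using the result. Since each UTF-16 code unit needs at most 3 bytes of UTF-8, a destination of 3 * units bytes is always enough.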


Solution:2

If the text is in a file, you can change its encoding to UTF-8 with PowerShell (PowerShell's 'Unicode' encoding means UTF-16LE):

powershell -Command "Get-Content PATH\temp.txt -Encoding Unicode | Set-Content -Encoding UTF8 PATH2\temp.txt"  


Solution:3

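The open-source ICU library is very commonly used, and it builds on both Windows and Linux. Its u_strToUTF8 function converts a UTF-16 (UChar) buffer to UTF-8.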


Solution:4

I ran into this problem too, and solved it by using the Boost.Locale library (C++):

    try {
        // Treat the buffer as 16-bit code units and convert UTF-16 to UTF-8.
        std::string utf8 = boost::locale::conv::utf_to_utf<char, short>(
            (const short*)wcontent.c_str(),
            (const short*)(wcontent.c_str() + wcontent.length()));
        // Then convert the UTF-8 string to the target 8-bit encoding.
        content = boost::locale::conv::from_utf(utf8, "ISO-8859-1");
    } catch (const boost::locale::conv::conversion_error& e) {
        std::cout << "Failed to convert from UTF-8 to " << toEncoding << "!" << std::endl;
        break;
    }

The boost::locale::conv::utf_to_utf function converts the UTF-16LE buffer to UTF-8, and the boost::locale::conv::from_utf function then converts the UTF-8 buffer to an ANSI (8-bit) encoding. Make sure the target encoding is right (here I use Latin-1, ISO-8859-1). If UTF-8 is all you need, the from_utf step can be skipped.

Another reminder: on Linux wchar_t (the element type of std::wstring) is 4 bytes, but on Windows it is 2 bytes, so you had better not use std::wstring to hold a UTF-16LE buffer.
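
To illustrate the point about element size in plain C: if the UTF-16 data is kept in a fixed-width uint16_t buffer instead of wchar_t, the same code behaves identically on both platforms. The sketch below is a hand-written converter with no library at all. It is not taken from any answer here, the function name is made up, and it assumes the code units are already in host byte order (so UTF-16LE bytes read on a big-endian machine would need swapping first). Unpaired surrogates are replaced with U+FFFD, which is one common policy among several.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes UTF-16 code units held in a fixed-width
 * uint16_t buffer as UTF-8. Unpaired surrogates become U+FFFD.
 * Returns the number of bytes written, or (size_t)-1 if dst is too small.
 * The output is not NUL-terminated. */
size_t utf16_units_to_utf8(const uint16_t *src, size_t units,
                           unsigned char *dst, size_t dstcap)
{
    size_t o = 0;
    for (size_t i = 0; i < units; i++) {
        uint32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < units &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            /* Combine a high/low surrogate pair into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;                 /* unpaired surrogate */
        }

        /* Number of UTF-8 bytes this code point needs. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need > dstcap)
            return (size_t)-1;

        if (need == 1) {
            dst[o++] = (unsigned char)cp;
        } else if (need == 2) {
            dst[o++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[o++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            dst[o++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

For real projects a maintained library (iconv, ICU) is probably the safer choice. The main point here is that the buffer type is uint16_t, so nothing depends on sizeof(wchar_t).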


Solution:5

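    #include <iconv.h>
    #include <stdio.h>

    char *src = ...;       /* UTF-16LE input, addressed as bytes */
    size_t srclen = ...;   /* input length in bytes, not characters */
    char *dst = ...;       /* output buffer for UTF-8 */
    size_t dstlen = ...;   /* output buffer size in bytes */

    /* iconv_open takes (to, from); name the byte order explicitly. */
    iconv_t conv = iconv_open("UTF-8", "UTF-16LE");
    if (conv != (iconv_t)-1) {
        /* iconv advances src/dst and decrements srclen/dstlen as it goes. */
        if (iconv(conv, &src, &srclen, &dst, &dstlen) == (size_t)-1)
            perror("iconv");
        iconv_close(conv);
    }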


Solution:6

There's also utfcpp, which is a header-only C++ library; its utf8::utf16to8 function performs this conversion.


Solution:7

Thanks guys, this is how I managed to solve the cross-platform (Windows and Linux) requirement:

  1. Downloaded and installed MinGW and MSYS
  2. Downloaded the libiconv source package
  3. Compiled libiconv via MSYS.

That's about it.

