PHP: file_get_contents encoding problem



Question:

My task is simple: make a POST request to translate.google.com and get the translation. In the following example I'm translating the word "hello" into Russian.

header('Content-Type: text/plain; charset=utf-8');  // optional
error_reporting(E_ALL | E_STRICT);

$context = stream_context_create(array(
    'http' => array(
        'method' => 'POST',
        'header' => implode("\r\n", array(
            'Content-type: application/x-www-form-urlencoded',
            'Accept-Language: en-us,en;q=0.5', // optional
            'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7' // optional
        )),
        'content' => http_build_query(array(
            'prev'  =>  '_t',
            'hl'    =>  'en',
            'ie'    =>  'UTF-8',
            'text'  =>  'hello',
            'sl'    =>  'en',
            'tl'    =>  'ru'
        ))
    )
));

$page = file_get_contents('http://translate.google.com/translate_t', false, $context);

require '../simplehtmldom/simple_html_dom.php';
$dom = str_get_html($page);
$translation = $dom->find('#result_box', 0)->plaintext;
echo $translation;

Lines marked as optional are those without which the output is the same. But I'm getting weird characters...

������  

I tried

echo mb_convert_encoding($translation, 'UTF-8');  

But I get

ÐÃ'É×ÅÃ"  

Does anybody know how to solve this problem?

UPDATE:

  1. I forgot to mention that all my PHP files are encoded in UTF-8 without BOM.
  2. When I change the "to" language to "en", that is, translate from English to English, it works fine.
  3. I do not think the library I'm using is messing it up, because I also tried outputting the whole $page without passing it to the library functions.
  4. I'm using PHP 5.


Solution:1

First off, is your browser set to UTF-8? In Firefox you can set your text encoding in View->Character Encoding. Make sure you have "Unicode (UTF-8)" selected. I would also set View->Character Encoding->Auto-Detect to "Universal."

Secondly, you could try passing the FILE_TEXT flag, like so:

$page = file_get_contents('http://translate.google.com/translate_t', FILE_TEXT, $context);  


Solution:2

See whether this post helps: "CURL import character encoding problem".

Also you can try this snippet (taken from php.net)

<?php
function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    return mb_convert_encoding($content, 'UTF-8',
        mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}
?>
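One caveat about the snippet above: it only considers UTF-8 and ISO-8859-1, so a response in a single-byte Cyrillic encoding would most likely be reported as ISO-8859-1 and converted into the wrong characters. If the real encoding is known, it can be named directly as the third argument of mb_convert_encoding. The sketch below assumes the bytes are KOI8-R, which is only a guess based on the garbled output in the question; convert_to_utf8 is a made-up helper name.

```php
<?php
// Hypothetical helper: convert bytes from a known source encoding to UTF-8.
function convert_to_utf8($bytes, $sourceEncoding)
{
    // Naming the source encoding explicitly avoids relying on mbstring's
    // internal encoding or on detection from a short candidate list.
    return mb_convert_encoding($bytes, 'UTF-8', $sourceEncoding);
}

// The Russian word for "hello" encoded as KOI8-R, written as raw bytes.
// Assumption: KOI8-R stands in for whatever the server actually sent.
$koi8 = "\xD0\xD2\xC9\xD7\xC5\xD4";

$utf8 = convert_to_utf8($koi8, 'KOI8-R');
echo $utf8, "\n";
```

This also suggests why mb_convert_encoding($translation, 'UTF-8') in the question did not help: with no third argument the source is taken to be mbstring's internal encoding, not the encoding the server used.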


Solution:3

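Accept-Charset is not really that optional. You should specify UTF-8 there. Russian (Cyrillic) characters cannot be represented in ISO-8859-1, so a server that honors the header as written has to pick some other encoding for them.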
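Besides sending 'Accept-Charset: utf-8', it may be worth checking which charset the server actually declares and converting from that. The following is a minimal sketch, not a definitive implementation: charset_from_headers and body_to_utf8 are hypothetical helper names, and the header list and KOI8-R body are made-up sample data so the example runs without a network request.

```php
<?php
// Hypothetical helper: return the charset named in a Content-Type
// response header, or null if none is declared.
function charset_from_headers(array $headers)
{
    $charset = null;
    foreach ($headers as $header) {
        if (preg_match('/^Content-Type:.*?charset=["\']?([\w.:-]+)/i', $header, $m)) {
            // Keep the last match: after redirects the list holds the
            // headers of every response, the final one last.
            $charset = $m[1];
        }
    }
    return $charset;
}

// Hypothetical helper: convert a response body to UTF-8 using the declared
// charset, falling back to $fallback when the server declares none.
// Note: mb_convert_encoding rejects encoding names it does not know.
function body_to_utf8($body, array $headers, $fallback = 'UTF-8')
{
    $charset = charset_from_headers($headers);
    if ($charset === null) {
        $charset = $fallback;
    }
    return mb_convert_encoding($body, 'UTF-8', $charset);
}

// Offline demonstration with sample data (assumed, not captured from Google).
$headers = array('HTTP/1.1 200 OK', 'Content-Type: text/html; charset=KOI8-R');
$body    = "\xD0\xD2\xC9\xD7\xC5\xD4"; // the Russian word for "hello" in KOI8-R
$text    = body_to_utf8($body, $headers);
echo $text, "\n";

// With a live request, the header list comes from $http_response_header,
// which PHP fills in the calling scope after file_get_contents() on an HTTP URL:
//   $page = file_get_contents($url, false, $context);
//   $text = body_to_utf8($page, $http_response_header);
```

Converting before handing the page to the HTML parser keeps everything downstream in UTF-8, which matches the Content-Type header the script itself sends.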

