
Question:
When I run this code (and similar code) on some Chinese text, the ni (你) character, and maybe others, gets chopped off and broken.
$sample = "你不喜欢 香蕉 吗";
$parts = preg_split("/[\s,]+/", $sample);
var_dump($parts);

// outputs
array(4) {
  [0]=> string(2) "�"
  [1]=> string(9) "不喜欢"
  [2]=> string(6) "香蕉"
  [3]=> string(3) "吗"
}

// in
我觉得 你很 麻烦
// out
array(4) {
  [0]=> string(9) "我觉得"
  [1]=> string(2) "�"
  [2]=> string(3) "很"
  [3]=> string(6) "麻烦"
}
Is my regex wrong?
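A quick way to see what is happening is to look at the raw bytes. 你 is U+4F60, which UTF-8 encodes as the three bytes E4 BD A0, and 0xA0 is the no-break space in ISO-8859-1. A reasonable guess is that the byte-oriented \s matched that trailing byte, which would leave exactly the string(2) fragment in the output above. The sketch below only inspects the bytes; it does not reproduce the split itself, because whether \s matches 0xA0 without the u modifier depends on the PCRE character tables and locale in use.

```php
<?php
// Inspect the raw UTF-8 bytes of 你 (U+4F60).
$ni = "你";

echo bin2hex($ni), "\n"; // e4bda0: three bytes, the last one is 0xA0
echo strlen($ni), "\n";  // 3: strlen() counts bytes, not characters

// The broken first element in the var_dump output is string(2),
// which is what remains if the string is cut at the trailing 0xA0 byte.
$firstTwoBytes = substr($ni, 0, 2);
echo bin2hex($firstTwoBytes), "\n"; // e4bd
```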
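Solution 1:
If your string is in UTF-8, you must use the u modifier. Without it, PCRE treats the pattern and the subject as raw bytes, so the character class can match a single byte in the middle of a multi-byte character: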
$sample = "你不喜欢 香蕉 吗";
$parts = preg_split("/[\\s,]+/u", $sample);
var_dump($parts);
If it's in another encoding, see unicornaddict's answer.
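A minimal sketch of the u fix wrapped in a small helper. The name splitWords is hypothetical, and PREG_SPLIT_NO_EMPTY is our own addition so that leading or trailing separators do not produce empty strings. The helper also checks for false, which is what preg_split() returns when the subject is not valid UTF-8 under the u modifier.

```php
<?php
/**
 * Split a UTF-8 string on runs of whitespace and commas.
 * Hypothetical helper; the pattern is the one from the question plus /u.
 */
function splitWords(string $text): array
{
    // /u makes PCRE work on UTF-8 characters instead of bytes,
    // so a multi-byte character such as 你 is never cut in half.
    $parts = preg_split('/[\s,]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // With /u, invalid UTF-8 input makes preg_split() return false
    // instead of silently producing broken pieces.
    if ($parts === false) {
        throw new InvalidArgumentException(
            'preg_split failed, error code ' . preg_last_error()
        );
    }
    return $parts;
}

var_dump(splitWords("你不喜欢 香蕉 吗"));
var_dump(splitWords("我觉得 你很 麻烦"));
```

With u, the second sample splits into three pieces (我觉得, 你很, 麻烦), because there is no separator between 你 and 很 in the input.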
Solution 2:
Since the input string is multi-byte, I guess you'll have to use mb_split in place of preg_split.
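A minimal mb_split() sketch, assuming the mbstring extension is installed with multibyte regex support (not every PHP build has it). mb_split() takes the pattern without delimiters or modifiers and interprets it in the encoding set by mb_regex_encoding(), which is set explicitly here rather than relying on the default.

```php
<?php
// mb_split() uses the multibyte regex encoding, so set it to UTF-8.
mb_regex_encoding('UTF-8');

$sample = "你不喜欢 香蕉 吗";

// Note: no /.../ delimiters and no u modifier for mb_split().
$parts = mb_split('[\s,]+', $sample);

var_dump($parts);
```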