After a colleague scraped some content and wrote it into SQLite, the stored source looked something like this:

trituraci&#243;n

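This displays fine on the front end (or at least it looks fine), but back-end processing is a different story: the numeric entities have to be decoded before you get a normal string. A quick search turned up a complete set of PHP functions, copied verbatim below: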

<?php
function uni($str){
    $ret = '';
    for ($i = 0; $i < mb_strlen($str, 'utf-8'); $i = $i + 1) {
        $ret .= "&#" . uniord(mb_substr($str, $i, 1, 'utf-8')) . ";";
    }
    return $ret;
}
function uniord($u){
    $c = unpack("N", mb_convert_encoding($u, 'UCS-4BE', 'UTF-8'));
    return $c[1];
}
function unichr($u){
    return mb_convert_encoding(pack("N", $u), mb_internal_encoding(), 'UCS-4BE');
}
function u2utf8($c){
    $str = "";
    if ($c < 0x80) {
        $str .= chr($c);
    } else if ($c < 0x800) {
        $str .= chr(0xC0 | $c >> 6);
        $str .= chr(0x80 | $c & 0x3F);
    } else if ($c < 0x10000) {
        $str .= chr(0xE0 | $c >> 12);
        $str .= chr(0x80 | $c >> 6 & 0x3F);
        $str .= chr(0x80 | $c & 0x3F);
    } else if ($c < 0x200000) {
        $str .= chr(0xF0 | $c >> 18);
        $str .= chr(0x80 | $c >> 12 & 0x3F);
        $str .= chr(0x80 | $c >> 6 & 0x3F);
        $str .= chr(0x80 | $c & 0x3F);
    }
    return $str;
}
$source = '好好学习,天天向上';
$source = uni($source);
var_dump($source);
// string(69) "&#22909;&#22909;&#23398;&#20064;&#44;&#22825;&#22825;&#21521;&#19978;" //
preg_match_all("/&#([0-9]+);/", $source, $regs);
var_dump($regs);
foreach ($regs[1] as $v) {
    $source = str_replace("&#$v;", u2utf8($v), $source);
}
var_dump($source);
// string(25) "好好学习,天天向上" //
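To make the bit shuffling in u2utf8() easier to follow, here is a small worked example of its three-byte branch for a single code point, 22909 (U+597D, the first character of the test string). The variable names $b1, $b2 and $b3 are mine, added only for illustration, and the parentheses just spell out PHP's operator precedence.

```php
<?php
// Worked example: the three-byte branch of u2utf8() for code point 22909.
$c = 22909; // 0x597D, which falls in the range 0x800..0xFFFF

$b1 = 0xE0 | ($c >> 12);         // lead byte 1110xxxx carries the top 4 bits
$b2 = 0x80 | (($c >> 6) & 0x3F); // continuation byte 10xxxxxx, middle 6 bits
$b3 = 0x80 | ($c & 0x3F);        // continuation byte 10xxxxxx, low 6 bits

$bytes = chr($b1) . chr($b2) . chr($b3);

echo strtoupper(bin2hex($bytes)), "\n"; // E5A5BD
```

Those three bytes, E5 A5 BD, are the UTF-8 encoding of U+597D, which is why the final var_dump() above reports three bytes per Chinese character.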

Code copied in full from: http://blog.chinaunix.net/uid-20410459-id-443189.html
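For what it's worth, PHP's built-ins can probably do the same round trip without the hand-written helpers. The following is only a sketch, assuming the mbstring extension is loaded, the input is valid UTF-8, and the script file itself is saved as UTF-8; the $convmap value is my own choice, not something from the original article.

```php
<?php
// Decoding: html_entity_decode() understands decimal numeric entities.
// Unlike the regex loop above, it also decodes hex and named entities (&amp; etc.).
$decoded = html_entity_decode('trituraci&#243;n', ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump($decoded === 'trituración'); // bool(true)

// convmap is [first code point, last code point, offset, mask].
// Starting the range at 0x0 means ASCII characters are encoded too, as uni() does.
$convmap = [0x0, 0x10FFFF, 0, 0x1FFFFF];

$source   = '好好学习,天天向上';
$encoded  = mb_encode_numericentity($source, $convmap, 'UTF-8');
$restored = mb_decode_numericentity($encoded, $convmap, 'UTF-8');

var_dump($encoded);
var_dump($restored === $source);
```

If only decimal entities should be touched and things like &amp; must stay as they are, mb_decode_numericentity() is the closer match to the original functions; html_entity_decode() is the more aggressive option.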
