-
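Unicode help needed. I'm fixing a bug. The situation: the user enters data on a page and clicks Save; the information on the page is converted to XML and stored in the database, and the XML is parsed on load. The problem is that the character '\u0001' showed up in the user's text. Saving works fine, but the parse on load fails. I don't know how the user produced this character; my first guess is copy & paste. My boss wants me to find out exactly where this '\u0001' came from. Any ideas? Thanks.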
-aaronding(流浪的八毛儿);
2007-12-4
{316}
(#4098742@0)
-
Have you bought the '08 CTS yet?
-j30(人傻、钱多、速来!);
2007-12-4
(#4098773@0)
-
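Damn, they won't cut the price and raised the interest rate instead. On top of that I no longer drive to work, so I'm holding on to my cash and waiting.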
-aaronding(流浪的八毛儿);
2007-12-4
(#4099263@0)
-
native to ascii
-denegation(denegation);
2007-12-4
(#4098873@0)
-
(R/MT)
-aaronding(流浪的八毛儿);
2007-12-4
(#4099327@0)
-
What is the encoding from client to server?
What is the encoding of your programming language?
What is the encoding of your database?
What is the encoding for your XML parser?
-stevensun2000(小胖子);
2007-12-5
(#4101333@0)
-
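Problem solved. After a day of work I ruled out the character set as the cause. The String is UTF-8 and the database table is cp1252, which does cause some minor problems: some characters turn into ?. But 0x1 is not a charset artifact: it is a valid character that XML 1.0 simply does not allow. I let my imagination run and kept digging, and finally googled up a PDF file that, when opened with Adobe Reader on Linux, yields 0x1 on copy. On Windows, and with other software on Linux, the same copy yields 0x20 (a space). The truth is finally out.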
-aaronding(流浪的八毛儿);
2007-12-6
{272}
(#4102997@0)
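
The two symptoms described above can be told apart with a small experiment. This is only a sketch in Python, not the poster's code: the sample text and the `parses` helper are made up for illustration, and `errors="replace"` is assumed to mimic the lossy UTF-8 to cp1252 conversion on the way into the database.

```python
import xml.etree.ElementTree as ET

# Hypothetical user input: Latin-1 text, CJK text, and a stray U+0001.
text = "caf\u00e9 \u4e2d\u6587 a\u0001b"

# Character-set symptom: cp1252 has no CJK characters, so a lossy
# conversion substitutes '?' for each of them.
stored = text.encode("cp1252", errors="replace").decode("cp1252")
print(repr(stored))

# U+0001 itself encodes without complaint in both UTF-8 and cp1252,
# so the conversion neither creates nor removes it.
assert "\u0001".encode("utf-8") == b"\x01"
assert "\u0001".encode("cp1252") == b"\x01"


def parses(xml_text):
    """Return True if xml_text is well-formed according to the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# XML symptom: the document fails to parse only while U+0001 is present.
print(parses("<note>" + stored + "</note>"))
print(parses("<note>" + stored.replace("\u0001", " ") + "</note>"))
```

The '?' characters and the parse failure come from different steps, which is why fixing the table encoding alone would not make the XML loadable.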
-
Still an encoding problem.
U+0001 is a Unicode control code: SOH (Start of Heading).
You just found another way to reproduce the error, but not why it happens.
-stevensun2000(小胖子);
2007-12-6
(#4103121@0)
-
0x1 is ASCII SOH. The fix has nothing to do with character sets: just filter out every character that is illegal in XML.
-aaronding(流浪的八毛儿);
2007-12-7
(#4105420@0)
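
A filter of the kind described in the post above might look like the following sketch. It assumes XML 1.0 and is written in Python for illustration; the name `strip_xml_illegal` and the `replacement` parameter are hypothetical, not taken from the application in question. The character class is the complement of the `Char` production in the XML 1.0 specification.

```python
import re

# XML 1.0 allows: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
# | [#x10000-#x10FFFF]. Everything outside those ranges is illegal.
_XML_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml_illegal(text, replacement=""):
    """Remove (or replace) every character XML 1.0 does not allow."""
    return _XML_ILLEGAL.sub(replacement, text)


print(repr(strip_xml_illegal("a\u0001b")))
print(repr(strip_xml_illegal("a\u0001b", " ")))
```

Passing a space as `replacement` reproduces what the Windows copy gave (a space where the 0x1 would be). Tab, line feed and carriage return are legal in XML and pass through untouched.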
-
Where do those invalid xml characters come from?
If all the encodings are right, you should not get those invalid characters at all.
I still think your root problem is an encoding problem.
-stevensun2000(小胖子);
2007-12-7
(#4107001@0)
-
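As I said, it was copied out of a PDF with Acrobat Reader on Linux. I copied a passage of text, and 0x1 should never have ended up on the clipboard. The Windows version of Acrobat Reader doesn't have this problem; what it copies out is a space. So it is a bug in Acrobat Reader.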
-aaronding(流浪的八毛儿);
2007-12-7
(#4107014@0)
-
OK, so you think the U+0001 actually came from user input, and the user entered it by copying and pasting from somewhere?
You also said there were some ? characters in your database. Where did those ? come from?
-stevensun2000(小胖子);
2007-12-7
(#4107043@0)
-
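As I also said, the ? characters appear in the conversion from UTF-8 to cp1252. That is a character-set problem, but the bosses don't care about it. What they care about is that the generated XML can't be parsed, and that is the bug. The ? is a problem too, but the users haven't raised it.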
-aaronding(流浪的八毛儿);
2007-12-7
(#4107117@0)
-
This is what PDF uses to enforce that a PDF document is treated in binary mode.
-frankwoo(柳五随风);
2007-12-6
(#4105262@0)
-
But Adobe Reader still shouldn't let 0x1 get onto the clipboard. Adobe Reader on Windows and other software on Linux don't have this problem.
-aaronding(流浪的八毛儿);
2007-12-7
(#4105415@0)
-
That is what we call a bug :)
-frankwoo(柳五随风);
2007-12-7
(#4106440@0)
-
You may need to filter out the unprintable characters, or translate them to blanks or the like, when you need to print them on screen.
-frankwoo(柳五随风);
2007-12-6
(#4105266@0)
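
The display-time variant suggested above, translating unprintable characters to blanks without touching the stored data, could be sketched like this in Python. The function name `to_displayable` is hypothetical, and treating tab and newline as worth keeping is an assumption about what the screen should show.

```python
def to_displayable(text, blank=" "):
    """Swap unprintable characters for a blank, keeping tabs and newlines."""
    return "".join(
        ch if ch.isprintable() or ch in "\t\n" else blank
        for ch in text
    )


print(repr(to_displayable("Start\u0001of\u0001heading")))
```

Because one blank replaces each control character, the text keeps its length, which can matter if anything indexes into it by position.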
-
No matter how the 0x1 got in, for the robustness of the application, characters that XML does not accept need to be escaped and unescaped.
-wangqingshui(忘情水);
2007-12-6
(#4105290@0)
-
I believe you already have some escape mechanism, since < and & must be escaped.
-wangqingshui(忘情水);
2007-12-7
(#4105293@0)
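
One caveat with the escaping idea: the standard XML escapes cover markup characters such as < and &, but XML 1.0 rejects U+0001 even when it is written as the character reference `&#1;`. Keeping such characters therefore takes an application-level scheme layered on top. The Python sketch below shows one possible scheme; `app_escape`, `app_unescape` and the backslash-u notation are invented for illustration and are not part of any XML standard or of the application discussed here.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Standard XML escaping handles markup characters only.
assert escape("a < b & c") == "a &lt; b &amp; c"

# A character reference to U+0001 is still not well-formed in XML 1.0.
try:
    ET.fromstring("<note>&#1;</note>")
    char_ref_accepted = True
except ET.ParseError:
    char_ref_accepted = False
print(char_ref_accepted)

# Control characters XML 1.0 forbids (tab, LF and CR are allowed).
_CTRL = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")
# Matches a doubled backslash, or a backslash-u escape with 4 hex digits.
_ESCAPED = re.compile(r"\\(\\|u([0-9a-f]{4}))")


def app_escape(text):
    """Hypothetical scheme: write forbidden control characters as \\uXXXX."""
    text = text.replace("\\", "\\\\")  # double backslashes first
    return _CTRL.sub(lambda m: "\\u%04x" % ord(m.group()), text)


def app_unescape(text):
    """Reverse app_escape."""
    return _ESCAPED.sub(
        lambda m: "\\" if m.group(2) is None else chr(int(m.group(2), 16)),
        text,
    )


original = "x\u0001 < y"
xml_text = "<note>" + escape(app_escape(original)) + "</note>"
restored = app_unescape(ET.fromstring(xml_text).text)
print(restored == original)
```

Whether a round trip like this is worth it depends on whether the control characters carry any meaning. For text pasted from a PDF they presumably do not, in which case simply filtering them out is less work.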