forward at hongyu dot org #1
#40762 [NEW]: xml_parser failed to parse mixed coding file
From: forward at hongyu dot org
Operating system: Linux and Windows
PHP version: 5.2.1
PHP Bug Type: *XML functions
Bug description: xml_parser failed to parse mixed coding file
Description:
------------
My RSS parser failed after I upgraded PHP on my server from 4.x
to 5.2. When I debugged the code, I found that the error was caused by
xml_parse() failing to parse the UTF-8 encoded RSS message,
which was converted from a GB18030 string.
The error message looks like:
"Warning: xml_parse() [function.xml-parse]: input conversion failed due to
input error, bytes 0x9B 0xE6 ..."
The original GB-encoded string consists of Chinese characters, but I
converted it to UTF-8 using iconv(). The converted string displays
correctly in web browsers, which suggests that the conversion itself
is fine. So I believe the failure comes from xml_parse() alone.
For testing purposes, an example of the original GB18030 string can be
downloaded at [url]http://www.la-chinese.com/forum/rss.php?f=19[/url]
Thanks!
Reproduce code:
---------------
// variable $gb contains the GB encoded string, e.g., from
// web address [url]http://www.la-chinese.com/forum/rss.php?f=19[/url]
// variable $utf contains the UTF-8 string converted from
// the original GB encoded string
$utf = iconv('GB18030', 'UTF-8', $gb);
// functions feed_start_element, feed_end_element, etc. are from
// the package Magpierss [url]http://magpierss.sourceforge.net/[/url]
xml_set_object( $parser, $this );
xml_set_element_handler($parser,
'feed_start_element', 'feed_end_element' );
xml_set_character_data_handler( $parser, 'feed_cdata' );
$status = xml_parse( $parser, $utf );
Expected result:
----------------
No error message.
Actual result:
--------------
Warning: xml_parse() [function.xml-parse]: input conversion failed due to
input error, bytes 0x9B 0xE6 ...
-
chregu@php.net #2
#40762 [Opn->Bgs]: xml_parser failed to parse mixed coding file
ID: 40762
Updated by: [email]chregu@php.net[/email]
Reported By: forward at hongyu dot org
-Status: Open
+Status: Bogus
Bug Type: *XML functions
Operating System: Linux and Windows
PHP Version: 5.2.1
New Comment:
You have to change this line in the XML, too, so that the declared
encoding matches the converted UTF-8 data:
<?xml version="1.0" encoding="gb2312" ?>
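For anyone hitting the same warning, here is a minimal sketch of that fix. It assumes the feed starts with an XML declaration naming a GB encoding. The helper name gb_xml_to_utf8() and the sample feed string are made up for illustration and are not part of PHP or MagpieRSS. The idea is that converting the bytes with iconv() is not enough on its own: the declaration has to be rewritten as well, otherwise the parser apparently still tries to decode the UTF-8 bytes as gb2312.

```php
<?php
// Hypothetical helper (not part of PHP or MagpieRSS): converts a GB-encoded
// XML document to UTF-8 and rewrites the encoding named in a leading XML
// declaration so that it matches the converted bytes.
function gb_xml_to_utf8(string $gb): string
{
    $utf = iconv('GB18030', 'UTF-8', $gb);
    if ($utf === false) {
        throw new RuntimeException('iconv() could not convert the input');
    }

    // Replace only the value of encoding="..." in the declaration at the
    // very start of the document; everything else is left untouched.
    $fixed = preg_replace(
        '/^(<\?xml\b[^>]*?\bencoding\s*=\s*)(["\'])[^"\']*\2/i',
        '${1}${2}UTF-8${2}',
        $utf,
        1
    );

    // preg_replace() returns null on a regex error; fall back to the
    // converted but unmodified string in that case.
    return $fixed ?? $utf;
}

// Usage with a made-up sample feed (assumption: real data would come from
// the RSS URL instead of being built with iconv() here).
$gb = iconv(
    'UTF-8',
    'GB18030',
    '<?xml version="1.0" encoding="gb2312" ?><rss><title>中文</title></rss>'
);
$utf = gb_xml_to_utf8($gb);

$parser = xml_parser_create('UTF-8');
$status = xml_parse($parser, $utf, true);
```

A document without an encoding declaration passes through with only the iconv() conversion applied, since the pattern then matches nothing.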
-
forward at hongyu dot org #3
#40762 [Bgs]: xml_parser failed to parse mixed coding file
ID: 40762
User updated by: forward at hongyu dot org
Reported By: forward at hongyu dot org
Status: Bogus
Bug Type: *XML functions
Operating System: Linux and Windows
PHP Version: 5.2.1
New Comment:
Exactly what you said. Thanks a lot!