#40762 [NEW]: xml_parser failed to parse mixed coding file


  1. #1

    #40762 [NEW]: xml_parser failed to parse mixed coding file

    From: forward at hongyu dot org
    Operating system: Linux and Windows
    PHP version: 5.2.1
    PHP Bug Type: *XML functions
    Bug description: xml_parser failed to parse mixed coding file

    Description:
    ------------
    My RSS parser failed after I upgraded PHP on my server from 4.x
    to 5.2. When I debugged the code, I found that the error was caused by
    xml_parse() failing to parse the UTF-8 encoded RSS message,
    which was originally converted from a GB18030 string.

    The error message looks like:
    "Warning: xml_parse() [function.xml-parse]: input conversion failed due to
    input error, bytes 0x9B 0xE6 ..."

    The original GB-encoded string consists of Chinese characters, but I
    converted it to UTF-8 using iconv(). The converted string displays
    correctly in web browsers, which means that there is no
    conversion error. So I believe the failure comes only from xml_parse().

    For testing purposes, an example of the original GB18030 string can be
    downloaded at http://www.la-chinese.com/forum/rss.php?f=19

    Thanks!



    Reproduce code:
    ---------------
    // $gb contains the GB-encoded string, e.g., fetched from
    // http://www.la-chinese.com/forum/rss.php?f=19

    // $utf contains the UTF-8 string converted from
    // the original GB-encoded string
    $utf = iconv('GB18030', 'UTF-8', $gb);

    // feed_start_element, feed_end_element, etc. are from
    // the MagpieRSS package: http://magpierss.sourceforge.net/

    // create the parser (omitted in the original snippet)
    $parser = xml_parser_create();

    xml_set_object( $parser, $this );
    xml_set_element_handler( $parser,
        'feed_start_element', 'feed_end_element' );

    xml_set_character_data_handler( $parser, 'feed_cdata' );

    $status = xml_parse( $parser, $utf );


    Expected result:
    ----------------
    No error message.

    Actual result:
    --------------
    Warning: xml_parse() [function.xml-parse]: input conversion failed due to
    input error, bytes 0x9B 0xE6 ...


    forward at hongyu dot org Guest

  3. #2

    #40762 [Opn->Bgs]: xml_parser failed to parse mixed coding file

    ID: 40762
    Updated by: chregu@php.net
    Reported By: forward at hongyu dot org
    -Status: Open
    +Status: Bogus
    Bug Type: *XML functions
    Operating System: Linux and Windows
    PHP Version: 5.2.1
    New Comment:

    You have to change this line in the XML, too

    <?xml version="1.0" encoding="gb2312" ?>
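A minimal sketch of what that reply implies, offered as an illustration rather than the reporter's actual fix: after iconv() the bytes are UTF-8, but the XML declaration still announces gb2312, so the parser (which in PHP 5 is most likely libxml2-based and honors the declaration) tries to decode UTF-8 bytes as GB and fails. The helper name gb_xml_to_utf8() is hypothetical and is not part of MagpieRSS; the sample document is made up for the demo.

```php
<?php
// Hypothetical helper (not from MagpieRSS or the original report):
// convert a GB-encoded XML document to UTF-8 and make the XML
// declaration agree with the new byte encoding.
function gb_xml_to_utf8($gb)
{
    // Convert the bytes; iconv() returns false on failure.
    $utf = iconv('GB18030', 'UTF-8', $gb);
    if ($utf === false) {
        return false;
    }

    // Rewrite only the encoding pseudo-attribute of the leading
    // XML declaration, keeping whichever quote style it used.
    return preg_replace(
        '/^(<\?xml\b[^>]*?\bencoding\s*=\s*)(["\'])[^"\']*\2/i',
        '${1}${2}UTF-8${2}',
        $utf,
        1
    );
}

// Demo with a made-up feed: two Chinese characters, given as UTF-8
// byte escapes and converted to GB18030 to imitate the remote feed.
$sample = '<?xml version="1.0" encoding="gb2312" ?>'
        . '<rss><title>' . "\xE4\xB8\xAD\xE6\x96\x87" . '</title></rss>';
$gb = iconv('UTF-8', 'GB18030', $sample);

$utf    = gb_xml_to_utf8($gb);
$parser = xml_parser_create();
$status = xml_parse($parser, $utf, true);

if (!$status) {
    echo 'XML error: ', xml_error_string(xml_get_error_code($parser)), "\n";
}
```

The sketch assumes the declaration is the very first thing in the document (no byte order mark or leading whitespace); a feed that lacks a declaration is left unchanged, since XML without one is already treated as UTF-8 by default.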




    chregu@php.net Guest

  4. #3

    #40762 [Bgs]: xml_parser failed to parse mixed coding file

    ID: 40762
    User updated by: forward at hongyu dot org
    Reported By: forward at hongyu dot org
    Status: Bogus
    Bug Type: *XML functions
    Operating System: Linux and Windows
    PHP Version: 5.2.1
    New Comment:

    Exactly what you said. Thanks a lot!


    forward at hongyu dot org Guest
