[XML::Simple-2.12] problems parsing non ASCII strings - PERL Modules

  1. #1

    Re: [XML::Simple-2.12] problems parsing non ASCII strings

    Jul wrote:
    > module: XML::Simple-2.12 (also tried 2.14)
    > perl version: 5.00503
    Wow! Do you know how old this is? 5 or 6 years old?
    > I need to parse and write an XML configuration file which contains
    > non-ASCII characters (like 'é', in French).
    > I've chosen XML::Simple with XML::Parser for these tasks, but everything
    > works fine if and only if I do not include any special character in the
    > file; otherwise the HASH returned by XMLin() is totally messed up.
    What is the encoding of your file? My guess is that it is either
    ISO-8859-1 (or -15) or one of the windows-12nn code pages.

    What happens is that the data is read, probably by expat, and converted
    to UTF-8. The "totally messed up" characters are in fact perfectly valid
    UTF-8 characters that your terminal (or whatever you use to display
    them) is not set to display.

    If XML::Simple can read it then the encoding must be declared in the XML
    declaration, at the beginning of the XML file.

    Your choices are either to convert those characters back to the original
    encoding (look at the Unicode::* modules on CPAN), or to bite the Unicode
    bullet and learn how to work with UTF-8 data. In the long run the second
    option makes more sense, but YMMV.

    But really, processing XML with perl 5.00503 seems like a bad idea to me.

    --
    mirod
    Michel Rodriguez Guest
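
The behaviour described in this reply can be reproduced with a small sketch. It assumes a Perl recent enough to ship the core Encode module (5.8 or later), which the perl 5.005 in this thread is not; on that version one of the Unicode::* modules from CPAN would have to stand in. The sample string and the helper `hex_dump` are made up for illustration. The sketch shows the same text as ISO-8859-1 bytes, as the UTF-8 bytes the parser hands back, and the trip back to the original encoding.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show a byte string as space-separated hex values.
sub hex_dump { return join ' ', map { sprintf '%02X', ord } split //, shift }

# 'é' stored as a single ISO-8859-1 byte (E9), as it would sit in the file.
my $latin1_bytes = "g\xE9n\xE9rales";

# Roughly what happens during parsing: the bytes are interpreted using the
# declared encoding, and the text comes back encoded as UTF-8.
my $text       = decode('ISO-8859-1', $latin1_bytes);
my $utf8_bytes = encode('UTF-8', $text);

# Each 'é' is now two bytes (C3 A9); a Latin-1 terminal shows them as "Ã©".
print 'latin1: ', hex_dump($latin1_bytes), "\n";
print 'utf8:   ', hex_dump($utf8_bytes),   "\n";

# Converting back to the original encoding is the reverse trip.
my $round_trip = encode('ISO-8859-1', decode('UTF-8', $utf8_bytes));
print "round trip ok\n" if $round_trip eq $latin1_bytes;
```

Nothing is lost in the UTF-8 form: the bytes only look wrong when whatever displays them assumes a single-byte encoding.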

  2. #2

    [XML::Simple-2.12] problems parsing non ASCII strings

    module: XML::Simple-2.12 (also tried 2.14)
    perl version: 5.00503


    Hello,

    I need to parse and write an XML configuration file which contains
    non-ASCII characters (like 'é', in French).
    I've chosen XML::Simple with XML::Parser for these tasks, but everything
    works fine if and only if I do not include any special character in the
    file; otherwise the HASH returned by XMLin() is totally messed up.
    Below are the configuration file 'website.xml' and a Data::Dump of the
    HASH ref returned for it, for comparison.

    Thank you for any help you can provide.


    Julien


    # website.xml

    <opt>
      <contact>
        <email>person@foo.ext</email>
        <label>Informations générales</label>
      </contact>
      <contact>
        <email>person@foo.ext</email>
        <label>Directeur de collection</label>
      </contact>
      <contact>
        <email>person@foo.ext</email>
        <label>Webmestre</label>
      </contact>
      <sitename>Full name</sitename>
    </opt>


    # Data::Dump of the returned HASH ref

    {
      contact => {
        email => "person\@foo.ext",
        label => "Informations g\n \n \n person\@foo.ext\n Directeur de collection\n \n \n person\@foo.ext\n Webmestre\n \n Full name\n",
      },
    }
    Jul Guest

  3. #3

    Re: [XML::Simple-2.12] problems parsing non ASCII strings

    On Tue, 12 Jul 2005 19:16:53 +0200, Michel Rodriguez wrote:
    > Jul wrote:
    >> module: XML::Simple-2.12 (also tried 2.14)
    >> perl version: 5.00503
    >
    > Wow! Do you know how old this is? 5 or 6 years old?
    I know it's very, very old; that's why I mentioned it. I'm looking for a
    way to work around it, as I did for the other perl 5.6 modules I use :o)
    I guess we can sometimes rename "hosting solutions" to "hosting problems",
    but it would be less attractive to the customer ;-)
    >> I need to parse and write an XML configuration file which contains
    >> non-ASCII characters (like 'é', in French). I've chosen XML::Simple
    >> with XML::Parser for these tasks, but everything works fine if and only
    >> if I do not include any special character in the file; otherwise the HASH
    >> returned by XMLin() is totally messed up.
    >
    > What is the encoding of your file? My guess is that it is in either
    > ISO-8859-1 (or -15) or some kind of windows-12nn
    >
    > What happens is that the data is read, probably by expat, and converted
    > to UTF-8. The "totally messed up" characters are in fact perfectly valid
    > UTF-8 characters that your terminal (or whatever you use to display
    > them) is not set to display.
    >
    > If XML::Simple can read it then the encoding must be declared in the XML
    > declaration, at the beginning of the XML file.
    I assumed the default encoding would be ISO-8859-1 or -15, which is why I
    expected to get the same encoding back.
    With the encoding attribute set in the declaration it works better, you're
    right, and I was surprised to see that UTF-8 is also supported, even
    with perl 5.005 :-)
    > Your choices are either to convert those characters back to the original
    > encoding (look at the Unicode::* modules on CPAN), or to bite the Unicode
    > bullet and learn how to work with UTF-8 data. In the long run the second
    > option makes more sense, but YMMV.
    Now the original character is encoded as UTF-8 but displayed as if it
    were ISO-8859-15. You're right again! lol
    At this point, I wonder whether UTF-8 is the default charset or whether
    there is an option available for XML::Simple or XML::Parser. I took a look
    at the documentation for those modules but didn't find much.
    Otherwise, I'll try to convert the data outside XML::Simple.
    > But really, processing XML with perl 5.00503 seems like a bad idea to me.
    I agree with you, but I have no choice right now. I've got perl 5.005 in
    one hand and a project to get off the ground in the other. That is what I
    have to deal with.
    Maybe another way to parse a configuration file would be easier, but I
    like having a reason to play with XML, and I didn't really find
    what I wanted in the modules I tested previously.


    Thank you very much for your help, it's been really useful to me.


    Julien
    Jul Guest

  4. #4

    Re: [XML::Simple-2.12] problems parsing non ASCII strings

    Jul wrote:
    > Now the original character is encoded as UTF-8 but displayed as if it
    > were ISO-8859-15. You're right again! lol
    > At this point, I wonder whether UTF-8 is the default charset or whether
    > there is an option available for XML::Simple or XML::Parser. I took a look
    > at the documentation for those modules but didn't find much.
    > Otherwise, I'll try to convert the data outside XML::Simple.
    There is no easy way to get back to the original encoding in
    XML::Simple. To get the file written back as ISO-8859-15 you can pipe
    the output through iconv.

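    <plug mode="shameless">You could also use XML::Twig:
    use XML::Twig;
    my $options= { ...}; # XML::Simple options
    # parse the file without converting it to UTF-8, then build the
    # XMLin-style data structure from the root element
    my $data= XML::Twig->new( keep_encoding => 1)
                       ->parsefile( $file)
                       ->root
                       ->simplify( %$options)
                       ;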

    This will do exactly the same thing as XMLin, except for the bit where
    it keeps the original encoding.
    </plug>

    Does it help?

    --
    mirod
    Michel Rodriguez Guest
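
For the iconv route mentioned in this reply, the same conversion can also be sketched inside Perl. This assumes Perl 5.8 or later for the core Encode module (not available on the perl 5.005 discussed here) and assumes the string to be written holds UTF-8 octets; the sample XML is a made-up stand-in for whatever XMLout() returns.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Stand-in for the UTF-8 output to be written back (hypothetical data).
my $xml = "<opt><label>Informations g\xC3\xA9n\xC3\xA9rales</label></opt>\n";

# Convert the octets in place, much as `iconv -f UTF-8 -t ISO-8859-15` would.
# from_to() returns undef if the conversion fails.
defined from_to($xml, 'UTF-8', 'ISO-8859-15')
    or die "conversion to ISO-8859-15 failed";

# The XML declaration has to name the encoding the bytes are really in,
# otherwise the next parser to read the file will misinterpret them.
my $document = qq{<?xml version="1.0" encoding="ISO-8859-15"?>\n} . $xml;

print length($document), " bytes ready to write\n";
```

Converting the bytes without also writing a matching encoding declaration would bring back the original problem the next time the file is parsed.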

  5. #5

    Re: [XML::Simple-2.12] problems parsing non ASCII strings

    Michel Rodriguez put forward the following idea:
    > Jul wrote:
    >
    >> Now the original character is encoded as UTF-8 but displayed as if it
    >> were ISO-8859-15. You're right again! lol
    >> At this point, I wonder whether UTF-8 is the default charset or whether
    >> there is an option available for XML::Simple or XML::Parser. I took a look
    >> at the documentation for those modules but didn't find much.
    >> Otherwise, I'll try to convert the data outside XML::Simple.
    >
    > There is no easy way to get back to the original encoding in XML::Simple. To
    > get the file written back as ISO-8859-15 you can pipe the output through
    > iconv.
    >
    > <plug mode="shameless">You could also use XML::Twig:
    > use XML::Twig;
    > my $options= { ...}; # XML::Simple options
    > my $data= XML::Twig->new( keep_encoding => 1)
    >                    ->parsefile( $file)
    >                    ->root
    >                    ->simplify( %$options)
    >                    ;
    >
    > This will do exactly the same thing as XMLin, except for the bit where it
    > keeps the original encoding.
    > </plug>
    >
    > Does it help?

    Hello Michel,

    I've dropped ISO-8859-15 in favour of UTF-8.
    Since the web browser submits text field strings in the page's
    encoding, I've set the page encoding to UTF-8.

    Thank you again,


    Julien

    --
    Jul... reappeared as if by magic

    Jul Guest
