XML (WML in particular)

  1. #1

    XML (WML in particular)

    Hi,

    I have a script that generates some WML, and I would like this to be written
    to a file using the UTF-8 character set. Currently it is just written
    normally, with the effect that my phone won't recognise it.

    How is this done in Perl?

    Thanks


    Brian

  2. #2

    Re: XML (WML in particular)

    "Brian" <b.dara@tester.com> wrote in message news:<xW3ab.1961$DM5.18462@newsfep4-glfd.server.ntli.net>...
    > Hi,
    >
    > I have a script that generates some WML, and I would like this to be written
    > to a file using the UTF-8 character set. Currently it is just written
    > normally, with the effect that my phone won't recognise it.
    >
    > How is this done in Perl?
    >
    > Thanks
    open (FILE, ">:utf8", "/somepath/somefile.wml")

    This will write your file as UTF-8; change "/somepath/somefile.wml" to
    your target path and file name.
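    A slightly fuller version of that one-liner, offered as a minimal sketch
    and not as the poster's own code. It assumes Perl 5.8 or later (where
    PerlIO layers exist), uses a lexical filehandle, checks that open
    succeeded, and writes a tiny placeholder WML deck to a hypothetical file
    hello.wml. It uses the :encoding(UTF-8) layer rather than :utf8; for
    output both produce UTF-8, and :encoding(UTF-8) is the commonly
    recommended spelling. The DOCTYPE shown is the usual WML 1.1 one.

```perl
use strict;
use warnings;
use 5.008;    # PerlIO layers such as :utf8 first appeared in Perl 5.8.0

# Hypothetical output path; substitute your own.
my $path = 'hello.wml';

# A minimal WML 1.1 deck with one non-ASCII character (U+00E9, e-acute).
my $deck = qq{<?xml version="1.0" encoding="UTF-8"?>\n}
         . qq{<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"\n}
         . qq{  "http://www.wapforum.org/DTD/wml_1.1.xml">\n}
         . qq{<wml><card id="c1"><p>Caf\x{E9}</p></card></wml>\n};

# The :encoding(UTF-8) layer turns Perl characters into UTF-8 bytes on output.
open(my $fh, '>:encoding(UTF-8)', $path)
    or die "Could not open $path for writing: $!";
print {$fh} $deck;
close($fh) or die "Could not close $path: $!";
```

    Reading the file back in raw mode should show the two bytes C3 A9 where
    the e-acute was.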

    (Slightly OT to Perl) WML is very unforgiving with its range of
    displayable characters. Others here may disagree with this method, but
    I've found it to be very helpful.

    s/([\x{80}-\x{FFFF}])/'&#' . ord($1) . ';'/gse;

    That line will change characters above the first 128 ASCII characters
    to their numeric equivalent, which WML should be able to display. I
    have not personally encountered very high range characters in my WML,
    such as "?" and other ligature type characters, but letters with
    accents and umlauts display fine in WML when they are converted to
    their numeric equivalent.
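    The substitution above, wrapped in a small function so it can be tried
    on its own. This is a sketch; the name to_ncr is hypothetical. The /s
    modifier is dropped because the pattern contains no "." for it to
    affect, and, like the original, it leaves anything above U+FFFF alone.

```perl
use strict;
use warnings;

# Replace each character in U+0080..U+FFFF with a decimal numeric
# character reference, exactly as the one-liner in the post does.
sub to_ncr {
    my ($text) = @_;
    $text =~ s/([\x{80}-\x{FFFF}])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print to_ncr("Caf\x{E9} \x{FC}ber"), "\n";   # prints "Caf&#233; &#252;ber"
```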

    Also (off-topic to Perl), make sure you keep your file sizes under
    1400 bytes. Older phone browsers don't support larger bytestreams.
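    If you want to enforce that limit from Perl, measure the deck in bytes
    after encoding, not in characters. A minimal sketch using the core
    Encode module; the 1400 figure is simply the rule of thumb given above,
    the helper name utf8_size is hypothetical, and what a particular
    gateway or phone actually counts may differ from the size on disk.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Limit taken from the advice above; treat it as a rule of thumb.
my $MAX_BYTES = 1400;

# Size of a deck in bytes once encoded as UTF-8 (not its length in characters).
sub utf8_size {
    my ($deck) = @_;
    return length(encode('UTF-8', $deck));
}

my $deck = "<wml><card><p>Caf\x{E9}</p></card></wml>";   # 35 characters, 36 bytes
my $size = utf8_size($deck);
warn "Deck is $size bytes, over the $MAX_BYTES-byte limit\n"
    if $size > $MAX_BYTES;
```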
    Shambo Guest

  3. #3

    Re: XML (WML in particular)


    [administrivia at end]

    On Thu, Sep 18, Shambo inscribed on the eternal scroll:
    > (Slightly OT to Perl)
    but since you insist on raising it, don't be too surprised to get a
    response ( - and now both of us should not be too surprised to be told
    to get ourselves hence to a more appropriate newsgroup...)
    > WML is very unforgiving with its range of displayable characters.
    Could you be more specific? I didn't have time to read the license
    terms and consult with my employer and their lawyers whether I was OK
    to accept them, so I wasn't able to get to the actual WML
    specification at wapforum, but I gathered from what was written here
    that utf-8 was wanted, and I answered on that basis.
    > Others here may disagree with this method, but
    > I've found it to be very helpful.
    >
    > s/([\x{80}-\x{FFFF}])/'&#' . ord($1) . ';'/gse;
    But what benefit does this bring relative to utf-8 coding, if utf-8
    coding is part of the specification?

    By the way, where does your data come from, what encoding was it in,
    did you remember (obPerl) to tell perl what encoding it was in when
    you read or otherwise retrieved it?
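    To make the point about telling perl the input encoding concrete, a
    minimal sketch, assuming the data arrives as UTF-8 bytes (substitute
    whatever your source really uses). For a file, opening it with
    '<:encoding(UTF-8)' does the same decoding at read time.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed input: the word "Café" as raw bytes from a UTF-8 source.
# Perl cannot know these bytes are UTF-8 unless it is told.
my $raw = "Caf\xC3\xA9";
my $undecoded_length = length($raw);    # 5: each byte counts as a character

# Decoding turns the bytes into characters; U+00E9 is now one character,
# so a substitution like the one above emits &#233; instead of &#195;&#169;.
my $text = decode('UTF-8', $raw);
my $decoded_length = length($text);     # 4
```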

    I see two things wrong already. The Unicode characters from x80 to
    x9f are obscure control characters, not displayable glyphs; if you get
    any of those as display characters then something is wrong (probably
    Bill Gates got a bit too close to your WML) and presumably you should
    reject them, if WML is meant to be as unforgiving as you say.
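    A sketch of that check. A likely source of such characters, and
    presumably what the Bill Gates remark alludes to, is Windows-1252 text
    wrongly treated as ISO-8859-1: its curly quotes 0x93 and 0x94 then show
    up as C1 controls. The helper name c1_positions is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Return the offsets of any C1 control characters (U+0080..U+009F)
# in an already-decoded string; these are not printable text.
sub c1_positions {
    my ($text) = @_;
    my @pos;
    while ($text =~ /[\x{80}-\x{9F}]/g) {
        push @pos, pos($text) - 1;
    }
    return @pos;
}

# Bytes from a Windows application, wrongly decoded as ISO-8859-1:
# the curly quotes 0x93/0x94 become C1 controls at offsets 0 and 3.
my $bytes = "\x93hi\x94";
my @bad   = c1_positions(decode('ISO-8859-1', $bytes));

# Decoding with the right code page (cp1252) yields real quotation marks.
my $good  = decode('cp1252', $bytes);    # "\x{201C}hi\x{201D}"
```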

    And then Unicode no longer stops at xffff; there are characters beyond
    there. I don't know WML well enough, as you can see, but if it allows
    them, then why wouldn't you support them; and if it doesn't, then why
    wouldn't you want to report them as an error?
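    For illustration, a variant of the substitution whose character class
    covers every non-ASCII code point, compared with the original
    [\x{80}-\x{FFFF}] class on a character outside the Basic Multilingual
    Plane. This is a sketch; to_ncr_all is a hypothetical name, and whether
    a given WML browser can display such characters is a separate question
    the thread leaves open.

```perl
use strict;
use warnings;

# Convert every non-ASCII code point, including those above U+FFFF,
# to a decimal numeric character reference.
sub to_ncr_all {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

my $s = "A\x{10348}";    # U+10348 lies outside the BMP

# The original class silently skips it ...
(my $narrow = $s) =~ s/([\x{80}-\x{FFFF}])/'&#' . ord($1) . ';'/ge;

# ... while the wider class converts it.
my $wide = to_ncr_all($s);    # "A&#66376;"
```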
    > That line will change characters above the first 128 ASCII characters
    Ahem, *bogosity alert*. ASCII is a 7-bit code, the "first 128 ASCII
    characters" is all that there are.
    > to their numeric equivalent, which WML should be able to display.
    So it should, but if utf-8 encoding is expected, why not use it? It's
    not as if the max document size is generous, so I'd say use the most
    compact representation that you are able to. What am I missing?
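    A quick way to compare the two representations, as a sketch using the
    core Encode module (the helper name sizes is hypothetical): for each
    character it reports the UTF-8 byte count and the length of the
    equivalent decimal numeric reference.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return (UTF-8 byte count, numeric-reference byte count) for one character.
sub sizes {
    my ($ch) = @_;
    return (length(encode('UTF-8', $ch)), length('&#' . ord($ch) . ';'));
}

for my $ch ("\x{E9}", "\x{4E2D}") {
    my ($utf8, $ncr) = sizes($ch);
    printf "U+%04X: UTF-8 %d bytes, NCR %d bytes\n", ord($ch), $utf8, $ncr;
}
# U+00E9: UTF-8 2 bytes, NCR 6 bytes
# U+4E2D: UTF-8 3 bytes, NCR 8 bytes
```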
    > have not personally encountered very high range characters in my WML,
    > such as "?" and other ligature type characters,
    ^^^

    eh?? Quick check of your headers shows

    | Content-Type: text/plain; charset=ISO-8859-1

    Whatever it was that you were trying to type, it seems iso-8859-1
    doesn't have it on offer.
    > Also (off-topic to Perl), make sure you keep your file sizes under
    > 1400 bytes.
    Yup, that was my point above.

    cheers

    --

    administrivia: my usual news posting server seems to be suffering some
    kind of constipation - my attempt to post yesterday is still nowhere
    to be seen. My apologies if it turns up later.
    Alan J. Flavell

  4. #4

    Re: XML (WML in particular)

    > > WML is very unforgiving with its range of displayable characters.
    >
    > Could you be more specific?
    I should have been more clear. WML only allows 7 NAMED references to
    be used: &quot; &amp; &lt; &gt; &apos; &nbsp; &shy;. Anything else,
    such as &cent; will cause the file not to display. But, as you
    indicated, it shouldn't matter since WML does support UTF-8 encoding
    (I did not know that before today).
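    Given that restriction, one approach is to escape only the characters
    that are significant in markup, using named references from that short
    list, and let the UTF-8 output layer handle everything else (so a cent
    sign is written as the character itself, not as &cent;). A minimal
    sketch; wml_escape and %entity are hypothetical names.

```perl
use strict;
use warnings;

# Named references for the characters that are special in XML/WML markup;
# all of these are on the short list of names WML accepts.
my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
              '"' => '&quot;', "'" => '&apos;');

# Escape markup characters; leave every other character untouched so that
# the UTF-8 output layer can encode it.
sub wml_escape {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

print wml_escape(q{Fish & "chips" <2 each>}), "\n";
# prints: Fish &amp; &quot;chips&quot; &lt;2 each&gt;
```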
    > > That line will change characters above the first 128 ASCII characters
    >
    > Ahem, *bogosity alert*. ASCII is a 7-bit code, the "first 128 ASCII
    > characters" is all that there are.
    D'oh!

    Obviously, the encoding concept is still taking a while to seep into
    my brain.

    I probably should have just stopped at "open (FILE, ">:utf8",
    "/somepath/somefile.wml")", but you're right to say Brian should know
    what his source encoding is (see, Alan, I do listen to you ;-).
    Shambo

  5. #5

    Re: XML (WML in particular)

    On Fri, 19 Sep 2003, Shambo wrote:
    > I should have been more clear. WML only allows 7 NAMED references to
    > be used: &quot; &amp; &lt; &gt; &apos; &nbsp; &shy;. Anything else,
    > such as &cent; will cause the file not to display.
    Thanks ;-)
    > But, as you
    > indicated, it shouldn't matter since WML does support UTF-8 encoding
    > (I did not know that before today).
    Well, neither did I, to be honest; I was basing my reply on what I'd
    been told on this thread. (I already made my protest against
    specifications that refuse to be read without taking legal opinion
    first...)
    > D'oh!
    >
    > Obviously, the encoding concept is still taking a while to seep into
    > my brain.
    Oh, there's a lot of it about, don't take it personally if I
    occasionally get a bit sharp...
    > I probably should have just stopped at "open (FILE, ">:utf8",
    well, as long as you mention that older Perls don't have this...
    > "/somepath/somefile.wml")",
    ... or die "could not open output file $!";
    > but you're right to say Brian should know
    > what his source encoding is (see, Alan, I do listen to you ;-).
    It's OK, I'm just trying to share what I know. Character coding isn't
    hard if one can come to it with a clear head and get a clear mental
    picture. I must admit that I had the early discipline (in the 1970s)
    of a machine which used a character coding that was used almost
    nowhere else, and so it was taken for granted that all input would
    have to be translated from its external coding, whatever it might be,
    and all output would have to be translated to a user-defined output
    coding. That sets a challenge which somehow never went away. RFC2070
    was nice ;-) if you care for that sort of thing.

    It's so frustrating when folks come to it with some misunderstood
    picture of custom fonts and DOS codepages and I don't know what else
    all jumbled up together. But don't mind me...

    cheers
    Alan J. Flavell
