-
Brian #1
XML ( WML in particular)
Hi,
I have a script that generates some WML, and I would like this to be written
to a file using the UTF-8 character set. Currently it is just written
normally, with the effect that my phone won't recognise it.
How is this done in Perl?
Thanks
Brian Guest
-
Shambo #2
Re: XML ( WML in particular)
"Brian" <b.dara@tester.com> wrote in message news:<xW3ab.1961$DM5.18462@newsfep4-glfd.server.ntli.net>...
> Hi,
>
> I have a script that generates some WML, and I would like this to be written
> to a file using the UTF-8 character set. Currently it is just written
> normally, with the effect that my phone won't recognise it.
>
> How is this done in Perl?
>
> Thanks
open (FILE, ">:utf8", "/somepath/somefile.wml")
This will write your file as UTF-8, change "somepath" to your target
path and file name.
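As a rough illustration of the `open` call above, here is a minimal sketch that writes a short string through the `:utf8` layer and reads it back as raw bytes. The temporary file, the sample markup, and the variable names are assumptions made for the example, not part of the original post, and the layer syntax needs Perl 5.8 or later.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical target file and content, for illustration only.
my (undef, $path) = tempfile(SUFFIX => '.wml', UNLINK => 1);
my $wml = "<p>caf\x{E9}</p>\n";    # one non-ASCII character, U+00E9

# Three-argument open with the :utf8 layer, plus an error check.
open(my $out, '>:utf8', $path)
    or die "could not open output file $path: $!";
print {$out} $wml;
close($out) or die "could not close $path: $!";

# Read the file back as raw bytes to see how many bytes were written.
open(my $in, '<:raw', $path) or die "could not reopen $path: $!";
my $bytes = do { local $/; <$in> };
close($in);
printf "%d characters became %d bytes\n", length($wml), length($bytes);
```

The byte count comes out one higher than the character count because U+00E9 takes two bytes in UTF-8.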
(Slightly OT to Perl) WML is very unforgiving with its range of
displayable characters. Others here may disagree with this method, but
I've found it to be very helpful.
s/([\x{80}-\x{FFFF}])/'&#' . ord($1) . ';'/gse;
That line will change characters above the first 128 ASCII characters
to their numeric equivalent, which WML should be able to display. I
have not personally encountered very high range characters in my WML,
such as "?" and other ligature type characters, but letters with
accents and umlauts display fine in WML when they are converted to
their numeric equivalent.
Also (off-topic to Perl), make sure you keep your file sizes under
1400 bytes. Older phone browsers don't support larger bytestreams.
Shambo Guest
-
Alan J. Flavell #3
Re: XML ( WML in particular)
[administrivia at end]
On Thu, Sep 18, Shambo inscribed on the eternal scroll:
> (Slightly OT to Perl)
but since you insist on raising it, don't be too surprised to get a
response ( - and now both of us should not be too surprised to be told
to get ourselves hence to a more appropriate newsgroup...)
> WML is very unforgiving with its range of displayable characters.
Could you be more specific? I didn't have time to read the license
terms and consult with my employer and their lawyers whether I was OK
to accept them, so I wasn't able to get to the actual WML
specification at wapforum, but I gathered from what was written here
that utf-8 was wanted, and I answered on that basis.
> Others here may disagree with this method, but
> I've found it to be very helpful.
>
> s/([\x{80}-\x{FFFF}])/'&#' . ord($1) . ';'/gse;
But what benefit does this bring relative to utf-8 coding, if utf-8
coding is part of the specification?
By the way, where does your data come from, what encoding was it in,
did you remember (obPerl) to tell perl what encoding it was in when
you read or otherwise retrieved it?
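To make the point about source encoding concrete, here is a small sketch using the core `Encode` module. It assumes, purely for illustration, that the incoming data is ISO-8859-1; the thread never says what Brian's source encoding actually is.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: the source data arrives as ISO-8859-1 bytes.
my $latin1_bytes = "caf\xE9";    # 4 bytes; 0xE9 is e-acute in ISO-8859-1

# Tell Perl what the bytes mean, giving a string of characters...
my $text = decode('ISO-8859-1', $latin1_bytes);

# ...and only then encode those characters for the UTF-8 output.
my $utf8 = encode('UTF-8', $text);

printf "characters: %d, Latin-1 bytes: %d, UTF-8 bytes: %d\n",
    length($text), length($latin1_bytes), length($utf8);
# prints "characters: 4, Latin-1 bytes: 4, UTF-8 bytes: 5"

# When reading from a file, an input layer does the same decoding step:
#   open(my $in, '<:encoding(ISO-8859-1)', $file) or die "...: $!";
```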
I see two things wrong already. The Unicode characters from x80 to
x9f are obscure control characters, not displayable glyphs; if you get
any of those as display characters then something is wrong (probably
Bill Gates got a bit too close to your WML) and presumably you should
reject them, if WML is meant to be as unforgiving as you say.
And then Unicode no longer stops at xffff, there are characters beyond
there. I don't know WML well enough, as you can see, but if it allows
them then why wouldn't you support them; but if they aren't, then why
wouldn't you want to report them as an error?
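The two objections above could be addressed along these lines. This is only a sketch: the helper name `to_numeric_refs` is hypothetical, and whether WML accepts references beyond U+FFFF, or should reject the x80 to x9f range outright, is an assumption here rather than something the thread settles.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every non-ASCII character with a numeric
# character reference, refusing the control characters U+0080..U+009F.
sub to_numeric_refs {
    my ($text) = @_;
    if ($text =~ /([\x{80}-\x{9F}])/) {
        die sprintf "unexpected control character U+%04X\n", ord $1;
    }
    # [^\x00-\x7F] has no upper bound, so characters beyond U+FFFF match too.
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print to_numeric_refs("na\x{EF}ve \x{20AC} \x{1D11E}"), "\n";
# prints "na&#239;ve &#8364; &#119070;"
```

The input is assumed to be a decoded character string, not raw bytes; on undecoded UTF-8 bytes the substitution would emit one reference per byte.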
> That line will change characters above the first 128 ASCII characters
Ahem, *bogosity alert*. ASCII is a 7-bit code, the "first 128 ASCII
characters" is all that there are.
> to their numeric equivalent, which WML should be able to display.
So it should, but if utf-8 encoding is expected, why not use it? It's
not as if the max document size is generous, so I'd say use the most
compact representation that you are able to. What am I missing?
> have not personally encountered very high range characters in my WML,
> such as "?" and other ligature type characters,
          ^^^
eh?? Quick check of your headers shows
| Content-Type: text/plain; charset=ISO-8859-1
Whatever it was that you were trying to type, it seems iso-8859-1
doesn't have it on offer.
> Also (off-topic to Perl), make sure you keep your file sizes under
> 1400 bytes.
Yup, that was my point above.
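The size argument can be checked with a quick comparison. The sketch below uses the core `Encode` module; the sample character and the tiny deck are made up for the example, and the 1400-byte figure is simply the one quoted in this thread.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $char    = "\x{E9}";                   # one character, e-acute
my $as_utf8 = encode('UTF-8', $char);     # bytes C3 A9
my $as_ref  = '&#' . ord($char) . ';';    # the string "&#233;"

printf "UTF-8: %d bytes, numeric reference: %d bytes\n",
    length($as_utf8), length($as_ref);
# prints "UTF-8: 2 bytes, numeric reference: 6 bytes"

# Hypothetical size check against the 1400-byte figure from the thread.
my $limit = 1400;
my $deck  = encode('UTF-8', "<p>caf\x{E9}</p>\n");
warn "deck is over $limit bytes\n" if length($deck) > $limit;
```

Measure the length of the encoded bytes, not of the character string, since the limit applies to what is actually sent to the phone.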
cheers
--
administrivia: my usual news posting server seems to be suffering some
kind of constipation - my attempt to post yesterday is still nowhere
to be seen. My apologies if it turns up later.
Alan J. Flavell Guest
-
Shambo #4
Re: XML ( WML in particular)
> > WML is very unforgiving with its range of displayable characters.
>
> Could you be more specific?
I should have been more clear. WML only allows 7 NAMED references to
be used: &quot; &amp; &lt; &gt; &apos; &nbsp; &shy; Anything else,
such as &cent; will cause the file not to display. But, as you
indicated, it shouldn't matter since WML does support UTF-8 encoding
(I did not know that before today).
> > That line will change characters above the first 128 ASCII characters
>
> Ahem, *bogosity alert*. ASCII is a 7-bit code, the "first 128 ASCII
> characters" is all that there are.
D'oh!
Obviously, the encoding concept is still taking a while to seep into
my brain.
I probably should have just stopped at "open (FILE, ">:utf8",
"/somepath/somefile.wml")", but you're right to say Brian should know
what his source encoding is (see, Alan, I do listen to you ;-).
Shambo Guest
-
Alan J. Flavell #5
Re: XML ( WML in particular)
On Fri, 19 Sep 2003, Shambo wrote:
> I should have been more clear. WML only allows 7 NAMED references to
> be used: &quot; &amp; &lt; &gt; &apos; &nbsp; &shy; Anything else,
> such as &cent; will cause the file not to display.
Thanks ;-)
> But, as you
> indicated, it shouldn't matter since WML does support UTF-8 encoding
> (I did not know that before today).
Well, neither did I, to be honest; I was basing my reply on what I'd
been told on this thread. (I already made my protest against
specifications that refuse to be read without taking legal opinion
first...)
> D'oh!
>
> Obviously, the encoding concept is still taking a while to seep into
> my brain.
Oh, there's a lot of it about, don't take it personally if I
occasionally get a bit sharp...
> I probably should have just stopped at "open (FILE, ">:utf8",
well, as long as you mention that older Perls don't have this...
> "/somepath/somefile.wml")",
... or die "could not open output file $!";
> but you're right to say Brian should know
> what his source encoding is (see, Alan, I do listen to you ;-).
It's OK, I'm just trying to share what I know. Character coding isn't
hard if one can come to it with a clear head and get a clear mental
picture. I must admit that I had the early discipline (in the 1970's)
of a machine which used a character coding that was used almost
nowhere else, and so it was taken for granted that all input would
have to be translated from its external coding, whatever it might be,
and all output would have to be translated to a user-defined output
coding. That sets a challenge which somehow never went away. RFC2070
was nice ;-) if you care for that sort of thing.
It's so frustrating when folks come to it with some misunderstood
picture of custom fonts and DOS codepages and I don't know what else
all jumbled up together. But don't mind me...
cheers
Alan J. Flavell Guest




