Ben Bullock #1
CGI.pm: encoding problems
I have a problem with inputting UTF-8 via a text window using CGI.pm. This
problem concerns UTF-8, so apologies for posting something with Chinese
characters in it.
The following code is a minimal working example of the problem with a lot of
extraneous material removed. It needs to be run under a web server to see
the problem. When the text is submitted using the form, the default text of
Chinese characters (the numbers from one to four) is munged into
gibberish, and the test of the input, which checks whether the
input is valid Chinese numerals, fails:
Input text:
一二三四
Output of program:
Input ä¸äºä¸å was not a valid number
Thank you very much for any assistance, suggestions or advice about this
problem.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Begin script (to end of message)
#!/usr/bin/perl
use warnings;
use strict;
use CGI;
use utf8;
binmode (STDOUT, ":utf8");
my $query = CGI->new();
$query->charset('UTF-8');
print $query->header();
my $kanji;
if ($query->param('kanji')) {
    my $inputnumber = $query->param('kanji');
    # Accept only the kanji numerals one to ten.
    if ($inputnumber =~ /^([一二三四五六七八九十]+)$/) {
        $kanji = $1;
    } else {
        print "<p>Input $inputnumber was not a valid number</p>";
        $kanji = "";
    }
} else {
    $kanji = "一二三四";
}
print $query->start_form(-method => 'POST', -action => $query->url());
print $query->textarea(-name    => 'kanji',
                       -default => $kanji);
print $query->submit();
print $query->endform();
print "<table><tr>\n<th>Value</th><td>",
    $kanji, "</td></tr>\n", "</table>\n</form>\n<p>\n";
print $query->end_html();
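The symptom can be reproduced without a web server. The browser sends the four numerals as twelve UTF-8 bytes; if those bytes reach the script undecoded, Perl sees twelve Latin-1 characters instead of four kanji, so a regex written with `use utf8` literals cannot match, and printing them through a `:utf8` layer encodes them a second time into the gibberish shown above. A minimal sketch using only the core Encode module; the variable names are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # literals below are characters, not bytes
use Encode qw(encode);

my $typed = "一二三四";                 # what the user sees in the textarea
my $raw   = encode('UTF-8', $typed);   # what arrives in the POST body

printf "characters typed: %d\n", length $typed;
printf "bytes received:   %d\n", length $raw;

# The validation regex from the script, applied to the undecoded bytes.
my $ok = $raw =~ /^([一二三四五六七八九十]+)$/ ? 1 : 0;
print "raw input matches kanji class: $ok\n";

# Show the bytes in hex: E4 B8 80 is the UTF-8 encoding of U+4E00, and so on.
print join(' ', map { sprintf('%02X', ord($_)) } split //, $raw), "\n";
```

Each byte above 0x7F, read as a Latin-1 character, becomes one of the accented letters or invisible control characters seen in the error message.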
-
Dr.Ruud #2
Re: CGI.pm: encoding problems
Ben Bullock wrote:
> use warnings;
> use strict;
> use CGI;
> use utf8;
> binmode (STDOUT, ":utf8");

Try to replace those 5 lines with these (reordered) 4:

use strict;
use warnings;
use encoding 'utf8' ;
use CGI;
This would also set the PerlIO layer of STDIN to ':utf8'.
See perldoc encoding.
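The `encoding` pragma has since been deprecated. A sketch of a present-day prologue with the same intent, assuming Perl 5.8 or later: `use utf8` for the source literals, and the core `open` pragma's `:std` option to push a UTF-8 layer onto STDIN, STDOUT and STDERR. Whether CGI.pm honours that STDIN layer when it reads a POST body is a separate question (later posts suggest older versions read raw bytes), so this covers only the script's own I/O:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # source literals are UTF-8 characters
use open qw(:std :encoding(UTF-8));    # UTF-8 layer on STDIN/STDOUT/STDERR

# Inspect the layers now in effect on the standard handles.
my @in_layers  = PerlIO::get_layers(*STDIN);
my @out_layers = PerlIO::get_layers(*STDOUT);

print "STDIN layers:  @in_layers\n";
print "STDOUT layers: @out_layers\n";
print "一二三四\n";                     # encoded to UTF-8 by the STDOUT layer
```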
-
Mumia W. #3
Re: CGI.pm: encoding problems
Ben Bullock wrote:
> I have a problem with inputting UTF-8 via a text window using CGI.pm.
> [...]

I made a few changes to your program. I don't know exactly what the
problem is, but I hope that this sheds some light on it:
#!/usr/bin/perl
use warnings;
use strict;
use CGI;
use utf8;
use Encode (); # changed
binmode (STDOUT, ":utf8");
my $query = CGI->new();
$query->charset('UTF-8');
print $query->header('-cache-control' => 'no-cache'); # changed
my $kanji;
if ($query->param('kanji')) {
    my $inputnumber = $query->param('kanji');
    print <<EOF;
<p> Interesting decodings of
"$inputnumber" <br>
UTF-8: @{[ Encode::decode('utf8', $inputnumber) ]} <br>
</p>
<hr>
EOF
    # Add this to decode the number:
    $inputnumber = Encode::decode('utf8', $inputnumber);
    if ($inputnumber =~ /^([一二三四五六七八九十]+)$/) {
        $kanji = $1;
    } else {
        print "<p>Input $inputnumber was not a valid number</p>";
        $kanji = "";
    }
} else {
    $kanji = "一二三四";
}
print <<EOF;
<p> The value of \$kanji is: $kanji
</p>
EOF
print $query->start_form(
    -method => 'POST',
    -action => $query->url()
);
print $query->textarea(-name    => 'kanji',
                       -default => $kanji);
print <<EOF;
<textarea name=alternate>
DATA = $kanji
</textarea>
EOF
print $query->submit();
print $query->endform();
print "<table><tr>\n<th>Value</th><td>",
    $kanji, "</td></tr>\n", "</table>\n</form>\n<p>\n";
print $query->end_html();
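The decode step added above can be checked in isolation. A sketch, assuming the form value arrives as raw UTF-8 bytes; the byte string is built by hand with `\x` escapes so the example needs no web server. It spells the encoding `'UTF-8'` (Encode's strict variant) rather than the lax `'utf8'` used in the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Encode qw(decode);

# Raw bytes for 一二三四 as a browser would POST them (no decoding yet).
my $raw = "\xE4\xB8\x80\xE4\xBA\x8C\xE4\xB8\x89\xE5\x9B\x9B";

# Decode once, at the input boundary, before any validation.
my $inputnumber = decode('UTF-8', $raw);

my $kanji = '';
if ($inputnumber =~ /^([一二三四五六七八九十]+)$/) {
    $kanji = $1;
}

binmode STDOUT, ':encoding(UTF-8)';    # encode again only on output
print "decoded length: ", length($inputnumber), "\n";
print "validated value: $kanji\n";
```

The pattern is decode on input, work with characters, encode on output; the failures in this thread come from skipping or repeating one of those steps.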
-
Mumia W. #4
Re: CGI.pm: encoding problems
Dr.Ruud wrote:
> Ben Bullock wrote:
>> use warnings;
>> use strict;
>> use CGI;
>> use utf8;
>> binmode (STDOUT, ":utf8");
>
> Try to replace those 5 lines with these (reordered) 4:
>
> use strict;
> use warnings;
> use encoding 'utf8' ;
> use CGI;
>
> This would also set the PerlIO layer of STDIN to ':utf8'.
>
> See perldoc encoding.

I still get the problem when running Ben's program. The problem is that
using the CGI module to initialize the textarea works the first time and
not the second; however, bypassing CGI.pm and writing the textarea
directly using print seems to work consistently.
The bug might be logic-related, but it's more likely CGI.pm-related.
There is a "hint" that the CGI.pm on my Sarge system is not UTF-8 ready.
This appears at the top of every page of output:
<?xml version="1.0" encoding="iso-8859-1"?>
This happens even when the HTTP header says utf8.
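Writing the textarea by hand, as described above, sidesteps CGI.pm's handling of the previously posted value, but the script then has to escape the value itself. A minimal sketch; `escape_html` and `textarea_html` are hypothetical helpers written for this example, not CGI.pm functions (CGI.pm ships its own `escapeHTML`):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: escape characters that are special in HTML text
# and attribute values. The ampersand must be replaced first.
sub escape_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: build a textarea whose content is always $value,
# regardless of what was posted on the previous request.
sub textarea_html {
    my ($name, $value) = @_;
    return sprintf '<textarea name="%s">%s</textarea>',
        escape_html($name), escape_html($value);
}

binmode STDOUT, ':encoding(UTF-8)';
print textarea_html('kanji', '一二三四'), "\n";
print textarea_html('kanji', '1 < 2 & "x"'), "\n";
```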
-
Ben Bullock #5
Re: CGI.pm: encoding problems
Thanks to Dr. Ruud and Mumia W. for their replies. Thanks to Dr. Ruud, I was
able to get this working, but I also noticed a couple of interesting
phenomena while debugging this program. As Mumia W. says, the text in the box
is filled in incorrectly. Also, if I use my own "<input" box, the input is mangled,
and if I use the "straight" function calls of CGI.pm rather than the
object-oriented ones, things stop working again, so it does look rather like
there is something wrong inside CGI.pm. If anyone is interested, let me know
and I'll post example code.
Thanks again.
-
Mumia W. #6
Re: CGI.pm: encoding problems
Ben Bullock wrote:
> Thanks to Dr. Ruud and Mumia W. for their replies. Thanks to Dr. Ruud I
> was able to get this working, [...]

How were you able to get it working? Re-ordering the prologue and using
"use encoding 'utf8'" didn't work for me.
-
Mumia W. #7
Re: CGI.pm: encoding problems
Ben Bullock wrote:
> [...] and if I use the "straight" function calls of CGI.pm rather
> than the object-oriented ones, things stop working again, so it does
> look rather like there is something wrong inside CGI.pm. [...]

It's not a bug; it's a feature ;)
For whatever reason, on my system, CGI.pm always interprets the STDIN
data in raw mode, regardless of the script encoding, so form elements
have to be explicitly decoded.
And CGI.pm has a nifty feature ("sticky" fields) that allows the programmer
to automatically create forms with the same values that were in the posted
data.
These two behaviors combine to create the problems you had. The
workarounds are to explicitly decode the form elements and to delete the
old form element before creating another one with the same name.
This program should demonstrate the issue and workarounds:
#!/usr/bin/perl
# kanji-2.cgi
use strict;
use warnings;
use encoding 'utf8';
use Encode ();    # explicit, since Encode::decode is called below
use CGI ();
use CGI::Carp 'fatalsToBrowser';

$\ = "\n";

# Invoke this script without a query string to
# get the default (broken) behavior.
#
# Invoke this script with a query string of 'recode'
# to get the 'kanji' form element recoded into
# utf8. Example:
#
# http://server.com/kanji-2.cgi?recode
#
# Or, if you want the old textarea data deleted
# upon successive invocations of the form, add
# a query string of 'delete' like so:
#
# http://server.com/kanji-2.cgi?delete
my $RECODE_QUERY = 0;
my $DELETE_QUERY = 0;
$RECODE_QUERY = 1 if $ENV{QUERY_STRING} =~ m/recode/;
$DELETE_QUERY = 1 if $ENV{QUERY_STRING} =~ m/delete/;

my $kanji;
my $text;
my $query = new CGI;

print $query->header(
    -type    => 'text/html',
    -charset => 'utf8',
);

print $query->start_html(
    -title => 'Kanji Test',
    -head  => CGI::meta ({-http_equiv => 'Content-Type',
                          -content    => 'text/html; charset=utf8' ,
                         }),
    ),
    $query->h1('Kanji Test');

print <<EOF;
<p> Let's see if it's possible to send
and receive kanji numeric characters.
</p>
EOF

if (! defined $query->param('kanji')) {

    $kanji = "一二三四";

} else {

    $kanji = $query->param('kanji');
    $kanji = Encode::decode('utf8', $kanji);
    my $old_kanji = $query->param('kanji');

    if ($RECODE_QUERY) {
        $query->param('kanji', $kanji);
    }

    if ($DELETE_QUERY) {
        $query->delete('kanji');
    }

    ($text = <<EOF) =~ s/^\s*//mg;
<pre> The data received was:
ORIGINAL: $old_kanji
DECODED: $kanji
</pre>
EOF

    print $text;
}

my $qs = '' eq $ENV{QUERY_STRING} ? '' :
    "?$ENV{QUERY_STRING}" ;

print $query->start_form(
    -method => 'POST',
    -action => $query->url() . $qs );

print $query->textarea(
    -name    => 'kanji',
    -default => $kanji,
);

print $query->submit();

print $query->end_form();

print $query->end_html;
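What "interprets the STDIN data in raw mode" means can be seen by decoding a form body by hand. A sketch, assuming an `application/x-www-form-urlencoded` POST body; `parse_form` is a hypothetical helper that mimics the two steps (percent-decoding to bytes, then UTF-8 decoding to characters) and is not CGI.pm's actual code. The second step is the one the older CGI.pm in this thread left to the script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: parse "name=value&name=value" into a hash of
# decoded character strings. Step 1 yields bytes; step 2 yields characters.
sub parse_form {
    my ($body) = @_;
    my %param;
    for my $pair (split /[&;]/, $body) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                                # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # step 1: %XX -> byte
            $_ = decode('UTF-8', $_);               # step 2: bytes -> characters
        }
        $param{$name} = $value;
    }
    return \%param;
}

my $body  = 'kanji=%E4%B8%80%E4%BA%8C%E4%B8%89%E5%9B%9B';
my $param = parse_form($body);
printf "kanji has %d characters: U+%s\n",
    length $param->{kanji},
    join(' U+', map { sprintf('%04X', ord($_)) } split //, $param->{kanji});
```

Stopping after step 1 reproduces the original symptom: twelve byte-sized characters instead of four kanji.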
-
harryfmudd [AT] comcast [DOT] net #8
Re: CGI.pm: encoding problems
Mumia W. wrote:
> Ben Bullock wrote:
>> [...] so it does look rather like there is something wrong inside
>> CGI.pm. [...]
> It's not a bug; it's a feature ;)
> [...]
> This program should demonstrate the issue and workarounds:

Interesting. I found that the following program blew up on the
Encode::decode, but that $kanji_orig appeared to display correctly.
Also, the 'kanji' element displayed correctly even if I did not specify
a query string. Do we have a version problem? I'm using:
Perl 5.8.6
CGI.pm 3.20
OS: Darwin 7.9.0 (a.k.a. Mac OS X)
Server: Apache 1.3.33
Browser: Firefox 1.5.0.4 (though I doubt this has anything to do with it).
#!/usr/local/bin/perl
# kanji-2.cgi
use strict;
use warnings;
use encoding 'utf8';
use CGI ();
use CGI::Carp 'fatalsToBrowser';

$\ = "\n";

# Invoke this script without a query string to
# get the default (broken) behavior.
#
# Invoke this script with a query string of 'recode'
# to get the 'kanji' form element recoded into
# utf8. Example:
#
# http://server.com/kanji-2.cgi?recode
#
# Or, if you want the old textarea data deleted
# upon successive invocations of the form, add
# a query string of 'delete' like so:
#
# http://server.com/kanji-2.cgi?delete
my $RECODE_QUERY = 0;
my $DELETE_QUERY = 0;
$RECODE_QUERY = 1 if $ENV{QUERY_STRING} =~ m/recode/;
$DELETE_QUERY = 1 if $ENV{QUERY_STRING} =~ m/delete/;

my $kanji;
my $text;
my $query = new CGI;

print $query->header(
    -type    => 'text/html',
    -charset => 'utf8',
);

# I found I got redundant meta headers with the original
# script, so:
print $query->start_html(
    -title => 'Kanji Test',
##  -head  => CGI::meta ({-http_equiv => 'Content-Type',
##                        -content    => 'text/html; charset=utf8' ,
##                       }),
    ),
    $query->h1('Kanji Test');

print <<EOF;
<p> Let's see if it's possible to send
and receive kanji numeric characters.
</p>
EOF

if (! defined $query->param('kanji')) {

    $kanji = "一二三四";

} else {

    $kanji = $query->param('kanji');
    eval {$kanji = Encode::decode('utf8', $kanji)};
    $@ and $kanji = $@;
    my $old_kanji = $query->param('kanji');

    if ($RECODE_QUERY) {
        $query->param('kanji', $kanji);
    }

    if ($DELETE_QUERY) {
        $query->delete('kanji');
    }

    ($text = <<EOF) =~ s/^\s*//mg;
<pre> The data received was:
ORIGINAL: $old_kanji
DECODED: $kanji
</pre>
EOF

    print $text;
}

my $qs = '' eq $ENV{QUERY_STRING} ? '' :
    "?$ENV{QUERY_STRING}" ;

print $query->start_form(
    -method => 'POST',
    -action => $query->url() . $qs );

print $query->textarea(
    -name    => 'kanji',
    -default => $kanji,
);

print $query->submit();

print $query->end_form();

print $query->end_html;

Tom Wyant
-
Mumia W. #9
Re: CGI.pm: encoding problems
harryfmudd [AT] comcast [DOT] net wrote:
> Mumia W. wrote:
>> [...]
>> This program should demonstrate the issue and workarounds:
> Interesting. I found that the following program blew up on the
> Encode::decode, but that $kanji_orig appeared to display correctly.
> Also, the 'kanji' element displayed correctly even if I did not specify
> a query string. Do we have a version problem? [...]

Quite likely. I have perl 5.8.4 and CGI.pm 3.04 (old). That's probably
why Dr. Ruud's advice of moving the "use" statements around didn't work
for me.
So it seems that re-decoding the data is a bad idea with newer versions
of the module. As you were, everybody.
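Tom Wyant's `eval` wrapper can be generalised into a guard that decodes only when the value still looks like undecoded UTF-8 bytes, so the same script can run against CGI.pm versions that do and do not decode for you. A minimal sketch; `decode_once` is a hypothetical helper. It is a heuristic: an already-decoded string whose characters also happen to form valid UTF-8 bytes (for example the two characters "Ã©") would still be decoded a second time:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: decode $value from UTF-8 unless that fails.
# Decoding fails (and the value is returned unchanged) when the string
# already holds characters above U+00FF, or when its bytes are not
# well-formed UTF-8. LEAVE_SRC stops Encode from modifying $value.
sub decode_once {
    my ($value) = @_;
    return $value unless defined $value;
    my $decoded = eval { decode('UTF-8', $value, FB_CROAK | LEAVE_SRC) };
    return defined $decoded ? $decoded : $value;
}

my $bytes = "\xE4\xB8\x80\xE4\xBA\x8C";    # UTF-8 bytes for 一二
my $chars = "一二";                        # already decoded

print length(decode_once($bytes)), "\n";   # decoded from 6 bytes
print length(decode_once($chars)), "\n";   # left alone
```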
-
Ben Bullock #10
Re: CGI.pm: encoding problems
If anyone cares, the original program is on the web as follows:
http://www.sljfaq.org/cgi/numbers.cgi
http://www.sljfaq.org/cgi/kanjinumbers.cgi
The bottom one was the one with the problems.
Ordering the statements correctly solved the problem with the encoding, but
some problems remained.
Thanks for the help.
-
Mumia W. #11
Re: CGI.pm: encoding problems
Ben Bullock wrote:
> If anyone cares, the original program is on the web as follows:
> [...]
> http://www.sljfaq.org/cgi/kanjinumbers.cgi
>
> [...]

I'm not having any problems with it. Am I supposed to?
-
Ben Bullock #12
Re: CGI.pm: encoding problems
"Mumia W." <mumia.w.18.spam+nospam.usenet@earthlink.net> wrote in message
news:Pr%jg.13048$921.9261@newsread4.news.pas.earthlink.net...
> Ben Bullock wrote:
>> If anyone cares, the original program is on the web as follows:
>> [...]
>> http://www.sljfaq.org/cgi/kanjinumbers.cgi
>>
>> [...]
> I'm not having any problems with it. Am I supposed to?

No, not really. But one interesting problem occurs if you type in numbers
like this:
一ニ三四五xyz
then the xyz is preserved after you convert. If you go the other way round,
12345xyz
then the xyz disappears. The code is exactly the same going either way, so
you tell me why that should be.
-
Mumia W. #13
Re: CGI.pm: encoding problems
Ben Bullock wrote:
> "Mumia W." <mumia.w.18.spam+nospam.usenet@earthlink.net> wrote in
> message news:Pr%jg.13048$921.9261@newsread4.news.pas.earthlink.net...
>> I'm not having any problems with it. Am I supposed to?
> No, not really. But one interesting problem occurs if you type in
> numbers like this:
>
> 一ニ三四五xyz
>
> then the xyz is preserved after you convert. If you go the other way round,
>
> 12345xyz
>
> then the xyz disappears. The code is exactly the same going either way,
> so you tell me why that should be.

I don't know, but perhaps you can create your own character class that
matches only numbers from the various languages you're using.
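A sketch of that suggestion, assuming the input has already been decoded to characters. The class lists ASCII digits and the kanji numerals explicitly; plain `\d` is avoided because on decoded text it also matches other Unicode decimal digits (fullwidth ０-９, for instance) while never matching 一二三. Anchoring with `\A` and `\z` rejects trailing junk such as `xyz` the same way whichever script it follows. `$NUMERAL` and `is_number` are hypothetical names:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

# Hypothetical character class: ASCII digits plus the kanji numerals
# used in this thread, listed explicitly rather than using \d.
my $NUMERAL = qr/[0-9一二三四五六七八九十]/;

# Accept only strings made entirely of numerals; \A and \z anchor the
# whole string, so "12345xyz" and "一二三四五xyz" are both rejected.
sub is_number {
    my ($s) = @_;
    return defined $s && $s =~ /\A$NUMERAL+\z/ ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $input ('12345', '一二三四五', '12345xyz', '一二三四五xyz', '１２３') {
    printf "%-12s %s\n", $input, is_number($input) ? 'ok' : 'rejected';
}
```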




