Eric-Roger Bruecklmeier #1
A question about Charsets
Hello Rubyists,
in a new application I have to read dBase III files which were generated
in a DOS environment. How can I convert the data into Windows codepages
from Ruby?
Thanks for any hints.
Eric.
-
Josef 'Jupp' SCHUGT #2
Re: A question about Charsets
Hi!
* Eric-Roger Bruecklmeier; 2003-11-20, 20:30 UTC:
> in a new application I have to read dBase III files which were generated
> in a DOS environment. How can I convert the data into Windows codepages
> from Ruby?

Map each byte to the corresponding one using a hash. You need the
codepages.
DOS codepages are listed here:
[url]http://dwd.da.ru/charsets/index.html#dos-specific[/url]
Windows codepages are listed here:
[url]http://dwd.da.ru/charsets/index.html#windows-specific[/url]
The mapping is troublesome for two reasons: first, not all DOS
characters have counterparts in the standard Windows codepage (Greek
letters, for example), and second, the codes 0..31 can be either
control chars or pictograms.
So the best you can do is use the above tables and create Arrays or
hashes that do the mapping.
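
A minimal sketch of such a table in Ruby, under some assumptions: the source is cp437 or cp850, the target is Windows-1252, and only the German umlauts and sharp s are listed (a real table needs every non-ASCII code the data can contain). The names DOS_TO_WIN, DOS_TO_WIN_TABLE and dos_to_win are made up for illustration.

```ruby
# Partial table: DOS cp437/cp850 byte => Windows-1252 byte.
# Assumption: only these seven characters occur in the data.
DOS_TO_WIN = {
  0x81 => 0xFC, # u-umlaut
  0x84 => 0xE4, # a-umlaut
  0x94 => 0xF6, # o-umlaut
  0x8E => 0xC4, # A-umlaut
  0x99 => 0xD6, # O-umlaut
  0x9A => 0xDC, # U-umlaut
  0xE1 => 0xDF  # sharp s
}.freeze

# Expand the hash once into a 256-entry array so the per-byte lookup
# is a plain array index. Unlisted bytes map to themselves.
DOS_TO_WIN_TABLE = (0..255).map { |b| DOS_TO_WIN.fetch(b, b) }.freeze

# Hypothetical helper: takes a string of DOS bytes and returns a
# binary string of Windows-1252 bytes.
def dos_to_win(bytes)
  bytes.unpack("C*").map { |b| DOS_TO_WIN_TABLE[b] }.pack("C*")
end
```

Building the array once keeps the hash out of the inner loop; the codes 0..31 are passed through unchanged here, which treats them as control chars.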
For cp850 and cp866 you can use iconv, otherwise you can use recode.
This can be done from Ruby but it requires the appropriate software
being in place. Bad if you want software to be portable.
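
For comparison, a sketch that needs no external tool, assuming Ruby 1.9 or later (where strings carry an encoding and the interpreter ships its own transcoding tables, including cp850 and Windows-1252). This is a different technique from calling iconv or recode; cp850_to_cp1252 is a made-up helper name, and replacing unmappable characters with "?" is just one possible policy.

```ruby
# Assumes Ruby 1.9+ with built-in CP850 and Windows-1252 transcoders.
def cp850_to_cp1252(bytes)
  # Label the raw bytes as cp850, then transcode to Windows-1252.
  # Characters without a Windows-1252 counterpart are replaced by "?".
  labeled = bytes.dup.force_encoding("CP850")
  labeled.encode("Windows-1252", undef: :replace, replace: "?")
end
```

Whether "?" is acceptable for characters that have no counterpart depends on the data; raising an error instead (the default of String#encode) may be safer for database fields.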
Josef 'Jupp' Schugt
--
.-------.
message > 100 kB? / | |
sender = spammer? / | R.I.P.|
text = spam? / ___| |___
-
Eric-Roger Bruecklmeier #3
Re: A question about Charsets
Josef 'Jupp' SCHUGT wrote:
>> in a new application I have to read dBase III files which were generated
>> in a DOS environment. How can I convert the data into Windows codepages
>> from Ruby?
>
> Map each byte to the corresponding one using a hash. You need the
> codepages.

That's the way I do it now, but it's slow :-(

> For cp850 and cp866 you can use iconv, otherwise you can use recode.
> This can be done from Ruby but it requires the appropriate software
> being in place. Bad if you want software to be portable.

That's exactly the problem, the software has to be portable :-(
Thanks anyhow!
C YA
Eric.
-
Josef 'Jupp' SCHUGT #4
Re: A question about Charsets
Hi!
* Eric-Roger Bruecklmeier; 2003-11-21, 13:01 UTC:
> Josef 'Jupp' SCHUGT wrote:
>>> in a new application I have to read dBase III files which were
>>> generated in a DOS environment. How can I convert the data into
>>> Windows codepages from Ruby?
>> Map each byte to the corresponding one using a hash. You need the
>> codepages.
> That's the way I do it now, but it's slow :-(

When I find my code in tons of trouble, friends and colleagues come to
me, speaking words of wisdom: write in C. (Sung to: 'Let it be' by
the Beatles).
Speedup calls for a C extension. I'll skip the 'intro to C
extensions' stuff (Thomas and Hunt have that) and directly go to the
implementation of the mapping algorithm.
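Suppose s points to an array of unsigned char holding the len bytes to
be converted. Suppose you simply need to map code 0 to 1 and vice
versa. In that case use this:

    unsigned char *p;
    /* loop over len bytes; testing *p would stop at the first 0 byte */
    for (p = s; p < s + len; p++) {
        switch (*p) {
        case 0: *p = 1; break;
        case 1: *p = 0; break;
        }
    }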
You don't need to map any char in the ASCII printable range which
saves a lot of coding. The resulting code is extremely fast.

> Exactly that's the problem, the software has to be portable :-(

The above code is extremely portable. An additional advantage: You
can give the codes in decimal or hexadecimal values.
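
If a C extension is more than you want to maintain, a byte mapping can also be pushed into C without leaving Ruby: String#tr is implemented in C inside the interpreter. A sketch, assuming Ruby 2.0 or later (for String#b) and a partial cp437/cp850 to Windows-1252 table covering only the German umlauts and sharp s; DOS_BYTES, WIN_BYTES and dos_to_win_tr are made-up names.

```ruby
# Source and target bytes in matching order. Assumption: only the
# German umlauts and sharp s need converting; extend both strings
# in step for a full table.
DOS_BYTES = "\x81\x84\x94\x8E\x99\x9A\xE1".b.freeze
WIN_BYTES = "\xFC\xE4\xF6\xC4\xD6\xDC\xDF".b.freeze

# Hypothetical helper: translates all listed bytes in one call and
# leaves every other byte untouched.
def dos_to_win_tr(bytes)
  bytes.b.tr(DOS_BYTES, WIN_BYTES)
end
```

Note that tr gives special meaning to "-", "^" and "\" in its arguments, so those bytes would need escaping if they ever appear in a table. I have not measured how this compares with a hand-written extension.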
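For 16 Bit codes things are more complicated. With p an unsigned char
pointer and len the number of bytes in the buffer, you then need

    for (p = s; p + 1 < s + len; p += 2) {
        /* high byte first; or the other way round, depends on byte order */
        switch ((p[0] << 8) | p[1]) {  /* parentheses needed: + binds tighter than << */
        case 0: p[1] = 1; break;       /* 0x0000 -> 0x0001 */
        case 1: p[1] = 0; break;       /* 0x0001 -> 0x0000 */
        }
    }

and lots of additional cases.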
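
The 16 Bit idea can be tried out in Ruby before writing any C. A sketch with the same toy mapping (0 to 1 and vice versa); MAP16 and swap_codes16 are made-up names, and the big_endian switch covers the "other way round" case.

```ruby
# Toy table: 16-bit code => 16-bit code. Unlisted codes stay as they are.
MAP16 = { 0x0000 => 0x0001, 0x0001 => 0x0000 }.freeze

# Hypothetical helper: splits the byte string into 16-bit codes,
# maps each one and packs the result back into bytes.
# "n*" reads big-endian codes, "v*" little-endian ones.
# A trailing odd byte is dropped by unpack.
def swap_codes16(bytes, big_endian: true)
  fmt = big_endian ? "n*" : "v*"
  bytes.unpack(fmt).map { |code| MAP16.fetch(code, code) }.pack(fmt)
end
```

Once the table is right, the same pairs can be copied into the case labels of the C version.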
Good luck,
Josef 'Jupp' Schugt