
  1. #1

    utf8 -> latin2



    Hi,

    How can I convert utf-8 encoded strings to latin-2?
    I have tried it using libuconv with little success:

    require 'uconv'

    class String
      def un_utf8
        Uconv.u8tou16(self).gsub(/\000/, '')
      end

      def to_utf8
        tmp = ""
        self.each_byte { |b|
          tmp += b.chr + "\000"
        }
        Uconv.u16tou8(tmp)
      end
    end

    This program is ugly and does not do exactly what I want.
    u8tou16 generates a string of 16-bit characters, for example
    Uconv.u8tou16("test") == "t\000e\000s\000t\000".
    gsub then strips the unnecessary "\000" bytes from
    the string. But there are Hungarian characters
    that have a non-zero second byte in the output of
    u8tou16, so they fail to convert. Anyway, this is an
    ugly hack.
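
    To illustrate the failure described above, here is a minimal sketch.
    It assumes Ruby 1.9 or later and uses the built-in String#encode
    rather than uconv, on the assumption that u8tou16 emits little-endian
    UTF-16 (as the "t\000" example suggests). utf16le_bytes is a
    hypothetical helper name, not part of any library.

```ruby
# Sketch (assumes Ruby 1.9+): list the UTF-16LE bytes of a string.
# utf16le_bytes is a hypothetical helper used only for illustration.
def utf16le_bytes(str)
  str.encode("UTF-16LE").bytes.to_a
end

p utf16le_bytes("t")        # => [116, 0]   second byte is zero
p utf16le_bytes("\u00E9")   # => [233, 0]   e with acute, second byte is zero
p utf16le_bytes("\u0151")   # => [81, 1]    o with double acute, second byte is 1
```

    Stripping the zero bytes happens to work for U+00E9 because its code
    point equals its Latin-2 byte (0xE9). U+0151 is 0xF5 in Latin-2, but
    the stripped UTF-16 output is the two bytes 0x51 0x01, so the result
    is wrong.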

    How is it done nicely?

    --
    bSanyI

    Bedo Sandor Guest

  2. #2

    Re: utf8 -> latin2

    On Friday 14 November 2003 7:46 am, Bedo Sandor wrote:
    > Hi,
    >
    > How can I convert utf-8 encoded strings to latin-2?
    > I have tried it using libuconv with little success:
    >
    > require 'uconv'
    >
    > class String
    >   def un_utf8
    >     Uconv.u8tou16(self).gsub(/\000/, '')
    >   end
    >
    >   def to_utf8
    >     tmp = ""
    >     self.each_byte { |b|
    >       tmp += b.chr + "\000"
    >     }
    >     Uconv.u16tou8(tmp)
    >   end
    > end
    >
    > This program is ugly and does not do exactly what I want.
    > u8tou16 generates a string of 16-bit characters, for example
    > Uconv.u8tou16("test") == "t\000e\000s\000t\000".
    > gsub then strips the unnecessary "\000" bytes from
    > the string. But there are Hungarian characters
    > that have a non-zero second byte in the output of
    > u8tou16, so they fail to convert. Anyway, this is an
    > ugly hack.
    >
    > How is it done nicely?
    I think the iconv module handles this nicely:

    require 'iconv'
    # Iconv.conv takes the target encoding first, then the source encoding.
    Iconv.conv("latin2", "utf-8", "this is a test")
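
    For readers on newer Rubies: iconv was dropped from the standard
    library in Ruby 2.0, and String#encode (built in since 1.9) covers
    the same job. The sketch below is one possible way to do it, not
    the method used in this thread. The helper names utf8_to_latin2 and
    latin2_to_utf8 are hypothetical, and replacing unconvertible
    characters with "?" is an assumption about the desired behaviour.

```ruby
# Sketch (assumes Ruby 1.9+). Helper names are hypothetical.

# Convert a UTF-8 string to ISO-8859-2 (Latin-2). Characters with no
# Latin-2 equivalent are replaced with "?" instead of raising.
def utf8_to_latin2(str)
  str.encode("ISO-8859-2", "UTF-8", undef: :replace, replace: "?")
end

# Convert an ISO-8859-2 (Latin-2) string back to UTF-8.
def latin2_to_utf8(str)
  str.encode("UTF-8", "ISO-8859-2")
end

# A Hungarian test word written with \u escapes so the example does not
# depend on the encoding of the source file.
hungarian = "\u00E1rv\u00EDzt\u0171r\u0151"

latin2 = utf8_to_latin2(hungarian)
p latin2.bytes.to_a                    # => [225, 114, 118, 237, 122, 116, 251, 114, 245]
p latin2_to_utf8(latin2) == hungarian  # => true
```

    Each character becomes a single byte, including the double-acute
    letters (0xFB and 0xF5) that the zero-stripping approach cannot
    handle. Leaving out the undef/replace options makes encode raise
    Encoding::UndefinedConversionError for characters that Latin-2
    cannot represent, which may be preferable when silent data loss is
    not acceptable.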

    --
    Wesley J. Landaker - wjl@icecavern.net
    OpenPGP FP: 4135 2A3B 4726 ACC5 9094 0097 F0A9 8A4C 4CD6 E3D2


    Wesley J Landaker Guest
