Message catalogs (I18N) overnight hack...

  1. #1

    Re: Message catalogs (I18N) overnight hack...

    I would side with Josef. Do not change print, puts, etc. It may have some
    weird side effects.

    > - For "literal" (untranslated) output, I'm exposing the aliases
    > lputs, lprint, and lprintf
    I would do it the other way around. Keep print, etc. unchanged and add
    tprint, etc. for translated printing.
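
The "other way around" could be sketched as follows. This is only an illustration: the `CATALOG` hash and the `tputs` helper are hypothetical names, not part of Hal's library.

```ruby
# Hypothetical catalog: source string => translation (illustrative only).
CATALOG = { "Hello world" => "Hallo Welt" }

# Translated counterpart of puts; falls back to the original text
# when the catalog has no entry.
def tputs(*strings, out: $stdout)
  strings.each { |s| out.puts(CATALOG.fetch(s, s)) }
  nil
end

tputs "Hello world"   # prints the catalog entry
puts  "Hello world"   # the built-in puts is left untouched
```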

    Guillaume.

    On Sun, 2003-06-29 at 18:46, Hal E. Fulton wrote:
    > ----- Original Message -----
    > From: "Josef 'Jupp' Schugt" <jupp@gmx.de>
    > To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
    > Sent: Sunday, June 29, 2003 2:30 PM
    > Subject: Re: Message catalogs (I18N) overnight hack
    >
    > [snip]
    >
    > Thanks for these notes. But my feeble knowledge of
    > German is not the point here.
    >
    > Your mention of the decimal point and similar issues
    > are things I had not thought of, however.
    >
    > > Technical side: In my opinion, redefining 'print' is an unacceptable
    > > violation of the principle of least surprise. The command reads
    > > 'print' as in 'print that book' not as in 'publish a translation of
    > > that book'.
    >
    > I see what you mean. But if one can turn this on/off dynamically,
    > it may not be so bad.
    >
    > > What about this?
    > >
    > > String.locale('C', 'de_DE')
    > > puts 'Alternatives exist'.l10n
    > >
    > > The first argument of String.locale is the language the strings are
    > > expected to be in ('C' is the default); the second is the target language.
    >
    > I see the advantages of this. But one of my goals
    > was to be able to internationalize without too many
    > changes to the original program. And another is to
    > preserve readability and simplicity -- I dislike
    > having to call a method every time I use a string.
    >
    > It remains to be seen whether this will be released,
    > anyway. There are issues to resolve.
    >
    > Thanks,
    > Hal
    >
    > --
    > Hal Fulton
    > hal9000@hypermetrics.com
    >
    >


    Guillaume Marcais Guest

  3. #2

    Re: Message catalogs (I18N) overnight hack...

    Saluton!
    > > Technical side: In my opinion, redefining 'print' is an unacceptable
    > > violation of the principle of least surprise. The command reads
    > > 'print' as in 'print that book' not as in 'publish a translation of
    > > that book'.
    >
    > I see what you mean. But if one can turn this on/off dynamically,
    > it may not be so bad.
    Suggestion:
    Do not have the require statement redefine 'puts' but use a 'locale'
    statement that has this calling convention:

    locale('de_AT', 'fr_FR')
    locale('pt_BR')
    locale

    Version 1: Source is German as used in Austria (Austrians do not use
    the same vocabulary as Germans), target is French as used
    in France (differs from French as used in Canada)

    Version 2: Source is Portuguese as used in Brazil and so is target.
    This is to deal with l10n issues like numbers and the
    like.

    Version 3: Source is in 'C' locale and so is target.
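
The three call forms could be expressed with default arguments. The sketch below assumes a hypothetical `L10n` module to hold the state; it only records the two locales and does no translation.

```ruby
# Hypothetical sketch of the proposed calling convention; not an existing API.
module L10n
  @source = @target = "C"

  class << self
    attr_reader :source, :target

    # locale("de_AT", "fr_FR") -> source de_AT, target fr_FR
    # locale("pt_BR")          -> source and target both pt_BR
    # locale                   -> source and target both "C"
    def locale(source = "C", target = source)
      @source = source
      @target = target
      [@source, @target]
    end
  end
end

p L10n.locale("de_AT", "fr_FR")  # ["de_AT", "fr_FR"]
p L10n.locale("pt_BR")           # ["pt_BR", "pt_BR"]
p L10n.locale                    # ["C", "C"]
```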

    > And another is to preserve readability and simplicity -- I dislike
    > having to call a method every time I use a string.
    IMHO redefining a built-in command's effect on a built-in data type
    makes code unreadable.

    A linear algebra teacher once proved that humans have severe
    problems if their expectations are not met.

    Usually the problems read 'Sei K ein Koerper, sei V ein
    Vektorraum...', which means 'Let K be a commutative field, let V be a
    vector space'. The mathematical details matter only insofar as a
    field and a vector space are *completely* different objects and the
    use of the letters was mnemonic.

    One day he wrote 'Sei K ein Vektorraum, sei V ein Koerper'. You
    think only a handful of students managed to solve the problems? You are
    right.

    Gis,

    Josef 'Jupp' Schugt
    --
    Someone even submitted a fingerprint for Debian Linux running on the
    Microsoft Xbox. You have to love that irony :).
    -- Fyodor on nmap-hackers@insecure.org

    Josef 'Jupp' Schugt Guest

  4. #3

    Re: Message catalogs (I18N) overnight hack...

    ----- Original Message -----
    From: "Josef 'Jupp' Schugt" <jupp@gmx.de>
    To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
    Sent: Monday, June 30, 2003 3:40 PM
    Subject: Re: Message catalogs (I18N) overnight hack...

    > > And another is to preserve readability and simplicity -- I dislike
    > > having to call a method every time I use a string.
    >
    > IMHO redefining a built-in command's effect on a built-in data type
    > makes code unreadable.
    I see your point, and someone else expressed the same opinion.

    But I stand by this principle. I like it personally, whether
    others do or not.

    Personally I don't want to sprinkle 'tputs' and such throughout
    my code. I want to modify my code as little as possible.

    Besides, a name like tputs will lead the reader subconsciously to
    think that something unusual is happening, when it is really (in
    effect) just a puts. Sure, it's doing a translation, but I want
    the programmer NOT to think about translation as he looks at
    the code.

    In other words, I want translation to be as transparent and
    behind-the-scenes as possible.

    It's not true AOP, but it has an AOP-like flavor to me.

    At any rate, the library may not ever be finished, as no one
    has expressed real interest in it (other than in changing the
    design). I don't want to make *fundamental* changes in my
    design without a reason that I agree with; if I did, it wouldn't
    be my project anymore. :) If someone likes the overall idea, but
    dislikes my concept, then they can write their own code.

    In fact, I have discovered there are similar libraries already
    in the RAA -- they differ from mine only in having a larger,
    more intrusive footprint in the code (and obviously they have
    more features).

    Cheers,
    Hal


    Hal E. Fulton Guest

  5. #4

    Re: Message catalogs (I18N) overnight hack...

    Saluton!
    > Besides, a name like tputs will lead the reader subconsciously to
    > think that something unusual is happening, when it is really (in
    > effect) just a puts.
    I can only speak for myself, but my concept of localization is 'translate
    the text and *then* output the result', where translating is the hard
    task that needs extra care and the output simply works. To reflect
    this concept there has to be a function that translates the text.

    Besides that intellectual problem there is also a practical one: If
    outputting localized and unlocalized text looks exactly the same in
    the code, this can result in errors that are very hard to find. If you
    use a function call for translation, a forgotten 'require' automatically
    results in an undefined method error.
    > In other words, I want translation to be as transparent and
    > behind-the-scenes as possible.
    Maybe some day I will understand why 'opaque' and 'transparent' are
    synonyms in the field of programming.

    Gis,

    Josef 'Jupp' Schugt
    --
    Someone even submitted a fingerprint for Debian Linux running on the
    Microsoft Xbox. You have to love that irony :).
    -- Fyodor on nmap-hackers@insecure.org

    Josef 'Jupp' Schugt Guest

  6. #5

    Re: Message catalogs (I18N) overnight hack...

    On Tue, Jul 01, 2003 at 06:10:51AM +0900, Hal E. Fulton wrote:
    > I see your point, and someone else expressed the same opinion.
    >
    > But I stand by this principle. I like it personally, whether
    > others do or not.
    >
    > Personally I don't want to sprinkle 'tputs' and such throughout
    > my code. I want to modify my code as little as possible.
    >
    > Besides, a name like tputs will lead the reader subconsciously to
    > think that something unusual is happening, when it is really (in
    > effect) just a puts. Sure, it's doing a translation, but I want
    > the programmer NOT to think about translation as he looks at
    > the code.
    >
    > In other words, I want translation to be as transparent and
    > behind-the-scenes as possible.
    Maybe such a low-level transparent translation belongs either in String, or
    in the IO class, rather than overriding all the various Kernel#puts-type
    methods.

    Then you could make it explicit:

    $defout = TranslatingIO.new("en","de",STDOUT)
    puts "Hello world" # >> "Hallo Welt" or whatever

    It might then be more general - for example it could be used on StringIO
    objects - but less intrusive.
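
A minimal sketch of such a wrapper, under stated assumptions: the class below takes a plain Hash as its catalog instead of the two language codes shown above, translates whole lines only, and is demonstrated on a StringIO rather than on the global output stream. None of this is an existing library.

```ruby
require "stringio"

# Hypothetical wrapper: translates whole lines via a Hash, then delegates
# to the wrapped IO-like object.
class TranslatingIO
  def initialize(catalog, io)
    @catalog = catalog
    @io = io
  end

  # Look each line up in the catalog; untranslated lines pass through.
  def puts(*lines)
    lines.each { |line| @io.puts(@catalog.fetch(line.to_s, line.to_s)) }
    nil
  end

  # Raw writes are forwarded unchanged.
  def write(string)
    @io.write(string.to_s)
  end
end

buffer = StringIO.new
out = TranslatingIO.new({ "Hello world" => "Hallo Welt" }, buffer)
out.puts "Hello world"
out.puts "No translation"
print buffer.string
```

Because the object responds to `write`, it could also be assigned to `$stdout`, which is what would make a plain `puts` go through it.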

    But I take your point that it's your project so it's up to you to design it
    how you like :-)

    I don't think I'd use the proposed style of library. Firstly it would be
    very difficult to ensure complete coverage of all strings having
    translations [unless there was a utility to parse the Ruby source to extract
    all strings, and tie them up against all translations, and highlight any
    missing ones]

    Also I'd be a bit concerned about phrases where the word order might need to
    be different in different languages:

    printf("I gave the %s to %s", thing, recipient)

    It could perhaps use tags which were automatically stripped out in
    translation:

    printf("I gave the <THING:%s> to <RECIP:%s>", thing, recipient)

    Also, do we worry about languages where word endings change dependent on
    function? Languages which require noun.capitalize ?

    Regards,

    Brian.

    Brian Candler Guest

  7. #6

    Re: Message catalogs (I18N) overnight hack...

    Hi,

    At Tue, 1 Jul 2003 19:01:06 +0900,
    Brian Candler wrote:
    > Also I'd be a bit concerned about phrases where the word order might need to
    > be different in different languages:
    >
    > printf("I gave the %s to %s", thing, recipient)
    >
    > It could perhaps use tags which were automatically stripped out in
    > translation:
    >
    > printf("I gave the <THING:%s> to <RECIP:%s>", thing, recipient)
    printf("I gave the %1$s to %2$s", thing, recipient)
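
To make the point concrete: with numbered arguments the format string alone decides the order, so a translated format string can reorder them without touching the call. The second format string below is an illustrative English reordering, not a real translation.

```ruby
# Same arguments, different word order chosen by the format string alone.
english = 'I gave the %1$s to %2$s'
swapped = 'To %2$s I gave the %1$s'

puts format(english, "book", "Anna")  # I gave the book to Anna
puts format(swapped, "book", "Anna")  # To Anna I gave the book
```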

    --
    Nobu Nakada

    nobu.nokada@softhome.net Guest

  8. #7

    Re: Message catalogs (I18N) overnight hack...

    ----- Original Message -----
    From: "Brian Candler" <B.Candler@pobox.com>
    To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
    Sent: Tuesday, July 01, 2003 5:01 AM
    Subject: Re: Message catalogs (I18N) overnight hack...

    > Maybe such a low-level transparent translation belongs either in String, or
    > in the IO class, rather than overriding all the various Kernel#puts-type
    > methods.
    Well, that's a thought. I'm not sure I see all of the implications
    at this hour of the morning. Too much blood in my caffeine stream.
    > But I take your point that it's your project so it's up to you to design it
    > how you like :-)
    Ha... well, not all design changes are created equal.

    If I'm designing a horse, and someone says, "If you added a horn, you
    could have a unicorn" -- well, that is interesting. But if someone
    says, "Drop the hooves and hair, skip the mammal bit, change the legs
    and add four more, and make it ocean-living -- you could have an
    octopus!" -- well, that is different. :)
    > I don't think I'd use the proposed style of library. Firstly it would be
    > very difficult to ensure complete coverage of all strings having
    > translations [unless there was a utility to parse the Ruby source to extract
    > all strings, and tie them up against all translations, and highlight any
    > missing ones]
    What would partially address this would be the warning and logging
    features I mentioned (not implemented).

    Logging would capture all strings as they were output, for later
    translation. Warning would print an explicit warning when an untranslated
    string was found. These would of course have to be turned on explicitly.
    Then you would just need good code coverage, as from a set of testcases.
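
Since these features are described as not implemented, the following is only a guess at their shape. The `Catalog` class, its `missing` set, and the `warn_on_miss` switch are hypothetical names.

```ruby
require "set"

# Hypothetical lookup that records every string it could not translate.
class Catalog
  attr_reader :missing

  def initialize(entries, warn_on_miss: false)
    @entries = entries
    @warn_on_miss = warn_on_miss
    @missing = Set.new
  end

  # Return the translation, or log the miss and return the text unchanged.
  def lookup(text)
    @entries.fetch(text) do
      @missing << text
      warn "untranslated: #{text.inspect}" if @warn_on_miss
      text
    end
  end
end

catalog = Catalog.new({ "Yes" => "Ja" })
catalog.lookup("Yes")
catalog.lookup("No")
p catalog.missing.to_a  # ["No"]
```

Running a test suite against such a catalog and dumping `missing` afterwards would give the list of strings still awaiting translation.
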
    > Also I'd be a bit concerned about phrases where the word order might need to
    > be different in different languages:
    >
    > printf("I gave the %s to %s", thing, recipient)
    I address this issue. The prepared message can contain %n markers like
    %1, %2, %3, and so on; in matching, these basically become (.*?) patterns.

    Some of my contrived examples dealt with this issue, like the "User foo..."
    example.
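
One way to read "markers become (.*?) patterns" is sketched below. The catalog entry and the `compile` and `translate` helpers are hypothetical, written from the description only and not taken from the actual library.

```ruby
# Hypothetical catalog entry: source template => translated template.
ENTRIES = { "Copied %1 to %2." => "Nach %2 wurde %1 kopiert." }

# Turn "Copied %1 to %2." into a regexp with one lazy group per marker,
# remembering which marker number each group belongs to.
def compile(template)
  order = template.scan(/%(\d+)/).flatten.map(&:to_i)
  source = Regexp.escape(template).gsub(/%\d+/, "(.*?)")
  [/\A#{source}\z/, order]
end

# Match the finished output string against each template and refill the
# translated template with the captured values.
def translate(text)
  ENTRIES.each do |template, translated|
    regexp, order = compile(template)
    match = regexp.match(text)
    next unless match
    values = order.zip(match.captures).to_h
    return translated.gsub(/%(\d+)/) { values[$1.to_i] }
  end
  text # no entry matched: pass the text through unchanged
end

puts translate("Copied a.txt to /tmp.")  # Nach /tmp wurde a.txt kopiert.
```
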
    > Also, do we worry about languages where word endings change dependent on
    > function? Languages which require noun.capitalize ?
    Word endings are an issue. Sometimes people store plurals separately, e.g.,
    file=>Datei, files=>Dateien and so on. This is an area which pushes the
    limits of my knowledge both of I18N programming and languages in general.

    As for capitalizing... hmm. I don't offhand see where anything would ever
    have to be capitalized that was not hardcoded in the translated message.

    But there are several little issues like that, that I'm not addressing yet.
    Someone mentioned the decimal point issue. I hate to think about that.

    Hal


    Hal E. Fulton Guest

  9. #8

    Re: Message catalogs (I18N) overnight hack...

    ----- Original Message -----
    From: <nobu.nokada@softhome.net>
    To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
    Sent: Tuesday, July 01, 2003 7:50 AM
    Subject: Re: Message catalogs (I18N) overnight hack...

    > At Tue, 1 Jul 2003 19:01:06 +0900,
    > Brian Candler wrote:
    > > Also I'd be a bit concerned about phrases where the word order might need to
    > > be different in different languages:
    > >
    > > printf("I gave the %s to %s", thing, recipient)
    > >
    > > It could perhaps use tags which were automatically stripped out in
    > > translation:
    > >
    > > printf("I gave the <THING:%s> to <RECIP:%s>", thing, recipient)
    >
    > printf("I gave the %1$s to %2$s", thing, recipient)
    >
    > --
    > Nobu Nakada
    Yes, that is how I have done it in C on AIX. :)

    But I simplified it in my code -- it does not require
    any change to printf or to its format string. I use
    markers in the translated strings themselves:

    "I gave the %1 to %2." -->
    "Ich habe den %2 zu %1 gegeben." # Word ending issues!

    Apparently you know something about I18N. How are issues
    like word endings usually handled? With different messages??

    For example, "I saw the %1" in English -- in German,
    "Ich sah den Wagen" (I saw the car) but "Ich sah die Bruecke"
    (I saw the bridge).

    My German is flawed, but you see what I am asking about
    den/die I think.

    Hal

    --
    Hal Fulton
    hal9000@hypermetrics.com




    Hal E. Fulton Guest

  10. #9

    Re: Message catalogs (I18N) overnight hack...

    On Tue, Jul 01, 2003 at 10:25:51PM +0900, Hal E. Fulton wrote:
    > But there are several little issues like that, that I'm not addressing yet.
    > Someone mentioned the decimal point issue. I hate to think about that.
    And then you get into date formats (US middle-endian), three-letter
    abbreviations for month names, ... ugh.
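
As a small illustration of the date-format problem, the same date rendered with three per-locale patterns. The `DATE_FORMATS` table is hypothetical and far from exhaustive.

```ruby
# Hypothetical per-locale strftime patterns (illustrative only).
DATE_FORMATS = {
  "en_US" => "%m/%d/%Y",  # middle-endian
  "de_DE" => "%d.%m.%Y",  # little-endian
  "ja_JP" => "%Y/%m/%d"   # big-endian
}

date = Time.utc(2003, 7, 1)
DATE_FORMATS.each do |locale, pattern|
  puts "#{locale}: #{date.strftime(pattern)}"
end
```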

    I'm sure these problems must have been worked through before. I noted a while
    ago under FreeBSD that gmake had some strange dependencies:

    # pkg_add gmake-3.79.1_1.tgz
    pkg_add: could not find package libiconv-1.7_5 !
    pkg_add: could not find package expat-1.95.2 !
    pkg_add: could not find package gettext-0.11.1_3 !

    It seemed strange to me that a 'make' utility would have a dependency on an
    XML parser. It turns out that gettext is the GNU way to deal with this:

    [man gettext]

    DESCRIPTION
    The gettext program translates a natural language message
    into the user's language, by looking up the translation in
    a message catalog.

    I imagine that the source format for these message catalogues is XML, and
    hence the requirement on an XML parser (although IMO gettext should be split
    into two: a client side which just reads the message catalogues, which
    appear to be in a binary format, and a -devel package which includes the
    XML-to-message-catalogue tools. But I digress).

    I note there's a ruby-gettext library already. A quick browse and it seems
    to require indexing by message-ID rather than the original "untranslated"
    text.

    That approach makes logical sense to me - decouple *all* language text from
    the source, rather than have language A in the source and languages A,B,C,D
    in the translation database. (Otherwise, whenever you change a message in
    the source you'd have to update the corresponding language A entry - a
    violation of the DRY principle)
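
The message-ID style can be sketched as below. The `CATALOGS` table and the `message` helper are hypothetical and do not reflect the ruby-gettext API; the point is only that the source carries IDs while every language, including the original one, lives in the catalog.

```ruby
# Hypothetical catalogs keyed by message ID; no language lives in the source.
CATALOGS = {
  "en" => { greeting: "Hello world", farewell: "Goodbye" },
  "de" => { greeting: "Hallo Welt" }
}

# Look up an ID, falling back to English, then to the ID itself.
def message(id, lang)
  CATALOGS.fetch(lang, {}).fetch(id) { CATALOGS["en"].fetch(id, id.to_s) }
end

puts message(:greeting, "de")  # Hallo Welt
puts message(:farewell, "de")  # Goodbye (falls back to English)
```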

    Cheers,

    Brian.

    Brian Candler Guest

  11. #10

    Re: Message catalogs (I18N) overnight hack...

    ----- Original Message -----
    From: "Brian Candler" <B.Candler@pobox.com>
    To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
    Sent: Tuesday, July 01, 2003 8:48 AM
    Subject: Re: Message catalogs (I18N) overnight hack...

    > It seemed strange to me that a 'make' utility would have a dependency on an
    > XML parser. It turns out that gettext is the GNU way to deal with this:
    >
    > [man gettext]
    I know about this, but I still like the low-impact
    approach. Obviously the message catalog prep takes
    some time/effort; but I like the fact that most/many
    apps will only require *two* extra lines of code,
    and no other changes in the code itself.
    > That approach makes logical sense to me - decouple *all* language text from
    > the source, rather than have language A in the source and languages A,B,C,D
    > in the translation database. (Otherwise, whenever you change a message in
    > the source you'd have to update the corresponding language A entry - a
    > violation of the DRY principle).
    Now THAT is a good point.

    Ooh, a violation of DRY. Don't tell Dave!!

    On the other hand, does gettext have the notion of "default"
    text? I think it must. (What is used if *no* message catalog
    can be found?) That would still have to be kept synchronized
    between the source and the catalogs.

    Hal


    Hal E. Fulton Guest

  12. #11

    Re: Message catalogs (I18N) overnight hack...

    ----- Original Message -----
    From: <nobu.nokada@softhome.net>
    To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
    Sent: Tuesday, July 01, 2003 10:35 AM
    Subject: Re: Message catalogs (I18N) overnight hack...

    > Hi,
    >
    > At Tue, 1 Jul 2003 22:42:13 +0900,
    > Hal E. Fulton wrote:
    > > > printf("I gave the %1$s to %2$s", thing, recipient)
    > >
    > > Yes, that is how I have done it in C on AIX. :)
    > >
    > > But I simplified it in my code -- it does not require
    > > any change to printf or to its format string. I use
    > > markers in the translated strings themselves:
    >
    > Change? Although I may not correctly understand what you mean,
    > it's already even in 1.6.
    Forgive my ignorance. I didn't know that printf
    supported numbered parameters.
    > I know only that gettext can handle plural forms, but nothing about
    > this case. How do Germans solve this issue in gettext?
    I have no idea. Perhaps just by careful wording.

    Hal


    Hal E. Fulton Guest

  13. #12

    Re: Message catalogs (I18N) overnight hack...

    Hi,

    On Tue, 1 Jul 2003 22:48:21 +0900
    Brian Candler <B.Candler@pobox.com> wrote:
    > # pkg_add gmake-3.79.1_1.tgz
    > pkg_add: could not find package libiconv-1.7_5 !
    > pkg_add: could not find package expat-1.95.2 !
    > pkg_add: could not find package gettext-0.11.1_3 !
    >
    > It seemed strange to me that a 'make' utility would have a dependency on an
    > XML parser. It turns out that gettext is the GNU way to deal with this:
    >
    > [man gettext]
    >
    > DESCRIPTION
    > The gettext program translates a natural language message
    > into the user's language, by looking up the translation in
    > a message catalog.
    >
    > I imagine that the source format for these message catalogues is XML, and
    > hence the requirement on an XML parser (although IMO gettext should be split
    > into two: a client side which just reads the message catalogues, which
    > appear to be in a binary format, and a -devel package which includes the
    > XML-to-message-catalogue tools. But I digress).
    expat is used by xgettext only for Glade.
    xgettext is the tool that extracts translatable strings from the given source code.

    --
    .:% Masao Mutoh<mutoh@highway.ne.jp>

    Masao Mutoh Guest

  14. #13

    Re: Message catalogs (I18N) overnight hack...

    Hi,

    On Wed, 2 Jul 2003 02:39:59 +0900
    "Hal E. Fulton" <hal9000@hypermetrics.com> wrote:
    > ----- Original Message -----
    > From: "Brian Candler" <B.Candler@pobox.com>
    > To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
    > Sent: Tuesday, July 01, 2003 8:48 AM
    > Subject: Re: Message catalogs (I18N) overnight hack...
    >
    > On the other hand, does gettext have the notion of "default"
    > text? I think it must. (What is used if *no* message catalog
    > can be found?) That would still have to be kept synchronized
    > between the source and the catalogs.
    >
    > Hal
    Usually, gettext uses an English message as the msgid,
    and the msgid is written in the source code.

    An example using Ruby-GetText-Package:

    puts _("Hello World") # "Hello World" is a msgid

    If the localized message can't be found,
    the msgid is used as the message.

    puts _("Hello World") #=> "KONNICHIWA SEKAI" # found (Japanese)
    puts _("Hello World") #=> "Hello World"      # not found
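
The fallback rule can be imitated in a few lines. This is a stand-in for illustration only: the `MESSAGES` hash is hypothetical, and a real catalog would come from files prepared with the gettext tools rather than from a literal Hash.

```ruby
# Hypothetical stand-in for a loaded message catalog (msgid => translation).
MESSAGES = { "Hello World" => "KONNICHIWA SEKAI" }

# Minimal imitation of the lookup rule: return the translation if one
# exists, otherwise return the msgid itself.
def _(msgid)
  MESSAGES.fetch(msgid, msgid)
end

puts _("Hello World")  # KONNICHIWA SEKAI
puts _("Goodbye")      # Goodbye (no entry, so the msgid is used)
```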


    For synchronizing the sources and catalogs,
    GNU GetText provides tools such as msgmerge.

    BTW,
    The manual of GNU GetText may help you.
    http://www.gnu.org/manual/gettext/html_chapter/gettext_toc.html

    --
    .:% Masao Mutoh<mutoh@highway.ne.jp>

    Masao Mutoh Guest

  15. #14

    Re: Message catalogs (I18N) overnight hack...

    Saluton!

    * Brian Candler; 2003-07-01, 12:05 UTC:
    > Also I'd be a bit concerned about phrases where the word order
    > might need to be different in different languages:
    >
    > printf("I gave the %s to %s", thing, recipient)
    >
    > It could perhaps use tags which were automatically stripped out in
    > translation:
    >
    > printf("I gave the <THING:%s> to <RECIP:%s>", thing, recipient)
    >
    > Also, do we worry about languages where word endings change
    > dependent on function? Languages which require noun.capitalize ?
    You forgot something: Word order

    printf("%s of %s", 'the house', 'a friend')
    tomodachi no ie

    I used a Latin transcription of the Japanese in order to make it
    readable. This involves *two* changes:

    a) dropping articles 'the' and 'a'
    b) reversing word order

    'tomodachi' means 'friend' while 'ie' means 'house'.

    This problem is the reason why Microsoft introduced numbered
    placeholders (%1, %2, ... in Windows message strings; {0}, {1}, ... in
    C#). Ruby's "#{}" is even better.
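
One caveat with "#{}" is that it is interpolated where the string literal is evaluated, so it cannot sit unevaluated in a catalog. For templates stored as data, named references in `format` are a closer fit. A sketch, with a hypothetical `TEMPLATES` table:

```ruby
# Hypothetical per-language templates; each one picks its own word order
# and decides for itself whether articles appear at all.
TEMPLATES = {
  "en" => "the %{thing} of a %{owner}",
  "ja" => "%{owner} no %{thing}"
}

puts format(TEMPLATES["en"], thing: "house", owner: "friend")
puts format(TEMPLATES["ja"], thing: "ie", owner: "tomodachi")
```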

    German does use capitalization of nouns and there are quite a number
    of native speakers of that language.

    Gis,

    Josef 'Jupp' Schugt
    --
    Someone even submitted a fingerprint for Debian Linux running on the
    Microsoft Xbox. You have to love that irony :).
    -- Fyodor on nmap-hackers@insecure.org

    Josef 'Jupp' Schugt Guest

  16. #15

    Re: Message catalogs (I18N) overnight hack...

    ----- Original Message -----
    From: "Josef 'Jupp' Schugt" <jupp@gmx.de>
    To: "ruby-talk ML" <ruby-talk@ruby-lang.org>
    Sent: Wednesday, July 02, 2003 6:03 AM
    Subject: Re: Message catalogs (I18N) overnight hack...

    > > It could perhaps use tags which were automatically stripped out in
    > > translation:
    > >
    > > printf("I gave the <THING:%s> to <RECIP:%s>", thing, recipient)
    > >
    > > Also, do we worry about languages where word endings change
    > > dependent on function? Languages which require noun.capitalize ?
    >
    > You forgot something: Word order
    No, I think you misunderstood. Word order was
    the reason he suggested the labels in the first
    place.

    Hal

    Hal E. Fulton Guest
