[newbie] upper to lower first letter of a word

  1. #1

    [newbie] upper to lower first letter of a word

    Recently I got a list of wines (more than 500 items) with inconsistent
    capitalization. For example, I have:

    Côte de beaune-villages

    instead of:

    Côte de Beaune-Villages

    Crémant d'alsace

    instead of:

    Crémant d'Alsace

    I'm wondering how to change lower case to upper case, and also
    whether a regex could do the trick.

    Something like:

    every letter following a " ", "-" or "'" should be upper case unless
    the word belongs to a blacklist:

    black_list = %w{d de du la le sec sur entre etc...}

    --
    Yvon
    Yvon Thoraval Guest

  2. Similar Questions and Discussions

    1. No Pcase? How do I capitalize the first letter of a word, with the rest of the word lower case?
      I know I can go one way or another, Ucase(var) all uppercase, or Lcase(var) all lowercase, but how can I get a Propercase? I've seen this in VB,...
    2. Line breaks, word and letter spacing change in the pdf
      Hi Everyone, I have a 1,000 page (black and white, mostly text and line drawings) publication originally formatted in WordPerfect, and have...
    3. Extra Letter in Running Head when converted from Word
      I am converting a document from Word to PDF. The document has running headers, i.e. like a dictionary -- it picks up the section number from the page...
    4. String formatting function - First char Upper, rest lower
      Hi, Newbie question. Does anyone know of a function or script that will capitalize the first char and lowercase the remaining chars of each...
    5. Regexp help - upper to lower ONLY matching this pattern
      Hi, I have a bunch of variables that I need changed. Basically, anywhere I see something like this: $test{'HI_THERE'} I need translated to this:...
  3. #2

    Re: [newbie] upper to lower first letter of a word

    On Tue, Sep 23, 2003 at 06:29:58PM +0200, Yvon Thoraval wrote:
    > Recently I got a list of wines (more than 500 items) with inconsistent
    > capitalization. For example, I have:
    >
    > Côte de beaune-villages
    >
    > instead of:
    >
    > Côte de Beaune-Villages
    >
    > Crémant d'alsace
    >
    > instead of:
    >
    > Crémant d'Alsace
    >
    > I'm wondering how to change lower case to upper case, and also
    > whether a regex could do the trick.
    >
    > Something like:
    >
    > every letter following a " ", "-" or "'" should be upper case unless
    > the word belongs to a blacklist:
    >
    > black_list = %w{d de du la le sec sur entre etc...}
    string.gsub!(/\b[a-z]+/) { |w| black_list.include?(w) ? w : w.capitalize }

    -Mark
    Mark J. Reed Guest
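
    A minimal sketch of Mark's one-liner as a runnable method on a current Ruby. The method name capitalize_words and the sample strings are hypothetical, the blacklist keeps only the words Yvon listed explicitly, and the input is deliberately ASCII-only, because \b and [a-z] are exactly where accented names run into trouble later in the thread.

```ruby
# Mark J. Reed's \b-based one-liner wrapped in a (hypothetical) method.
# Only ASCII letters are handled: [a-z] and \b know nothing about accents here.
BLACK_LIST = %w[d de du la le sec sur entre].freeze

def capitalize_words(line)
  # \b[a-z]+ : a run of lower-case ASCII letters starting at a word boundary;
  # blacklisted words are kept, everything else goes through String#capitalize
  line.gsub(/\b[a-z]+/) { |w| BLACK_LIST.include?(w) ? w : w.capitalize }
end

puts capitalize_words("Cotes du rhone-villages")  # Cotes du Rhone-Villages
puts capitalize_words("Cremant d'alsace")         # Cremant d'Alsace
```

    Note that gsub (not gsub!) is used so the method returns the new string and leaves its argument alone.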

  4. #3

    Re: [newbie] upper to lower first letter of a word

    Mark J. Reed <markjreed@mail.com> wrote:
    > string.gsub!(/\b[a-z]+/) { |w| black_list.include?(w) ? w : w.capitalize }
    Thanks a lot °;)
    --
    Yvon
    Yvon Thoraval Guest

  5. #4

    Re: [newbie] upper to lower first letter of a word

    Yvon Thoraval <yvon.thoravallist@-SUPPRIMEZ-free.fr.invalid> wrote:
    >
    > > string.gsub!(/\b[a-z]+/) { |w| black_list.include?(w) ? w : w.capitalize }
    >
    > Thanks a lot °;)
    It seems it's a little trickier, because accented characters are
    treated as word boundaries (\b). For example:

    Vosne-romanée
    becomes:
    Vosne-RomanéE

    So instead of \b I would have to exclude a list of characters:
    [à|ä|â|é|è|ê|î|ö|ô|ü|ù]
    --
    Yvon
    Yvon Thoraval Guest

  6. #5

    Re: [newbie] upper to lower first letter of a word

    On Tue, Sep 23, 2003 at 07:23:52PM +0200, Yvon Thoraval wrote:
    > It seems it's a little trickier, because accented characters are
    > treated as word boundaries (\b)
    Really? That's arguably a bug. What character encoding are you using?

    Accented letters should be in \w, not \W, and therefore the
    space between one and an adjacent letter should not match \b.
    But Ruby regexes may be ASCII-only, and even if not, they're probably
    Latin-1-only. So, for instance, they wouldn't work on UTF-8 strings.
    > Vosne-romanée
    > becomes:
    > Vosne-RomanéE
    >
    > So instead of \b I would have to exclude a list of characters:
    > [à|ä|â|é|è|ê|î|ö|ô|ü|ù]
    First, you don't need the pipes (|'s) there. Pipes are for
    alternation outside of [...]; basically, [abc] is short for
    (a|b|c). Inside [...] a pipe is just a literal "|", so that class
    would also match a pipe character. The pipe form is most useful
    when the alternatives are not all single characters, for instance
    (alfa|bravo|charlie).

    I'm not sure whether the exclude-list or the include-list would
    be shorter. You could do (^|[- ']) to match "beginning of string or
    dash or space or apostrophe", but then that character would be included
    in the resulting string. Which means that it would be, for instance,
    " d" or "-d" or "'d" instead of "d", and therefore won't be in the
    blacklist and won't capitalize properly (since String#capitalize operates
    on the first character, which will be the space or dash or apostrophe).
    The block has to compensate for that. Something like this:

    string.gsub!(/(^|[- '])([a-z]+)/) { $1 + $2.capitalize }

    Except that [a-z] won't match accented characters, so it's more like this:

    string.gsub!(/(^|[- '])([a-záàâçéèêíìîóòôúùû]+)/) { $1 + $2.capitalize }

    And if the names aren't limited to French, then even more special characters
    creep in . . .

    -Mark
    Mark J. Reed Guest
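
    A small check of Mark's point about pipes, assuming Ruby 2.4 or later (for Regexp#match?): inside [...] the | is an ordinary character, so a class written with pipes also accepts a literal pipe, while (à|é) outside brackets is true alternation.

```ruby
# Inside [...] the pipe has no special meaning: it is one more literal character.
with_pipes    = /[à|é]/   # matches "à", "é" ... or "|"
without_pipes = /[àé]/    # matches only "à" or "é"
alternation   = /(à|é)/   # outside [...], | means "or"

p with_pipes.match?("|")     # true
p without_pipes.match?("|")  # false
p alternation.match?("é")    # true
p without_pipes.match?("é")  # true
```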

  7. #6

    Re: [newbie] upper to lower first letter of a word

    On Tue, Sep 23, 2003 at 05:49:24PM +0000, Mark J. Reed wrote:
    > string.gsub!(/(^|[- '])([a-záàâçéèêíìîóòôúùû]+)/) { $1 + $2.capitalize }
    Left off the blacklist check, which should be applied to $2:

    string.gsub!(/(^|[- '])([a-záàâçéèêíìîóòôúùû]+)/) {
    black_list.include?($2) ? $1 + $2 : $1 + $2.capitalize
    }

    -Mark
    Mark J. Reed Guest
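
    A sketch of the same (^|[- ']) approach for a modern Ruby, assuming Ruby 2.4 or later (where String#capitalize does full Unicode case mapping) and UTF-8 strings. It swaps Mark's hand-written accent list for \p{L}, which matches any Unicode letter; the method name title_case_appellation is hypothetical.

```ruby
# Assumes Ruby 2.4+ (Unicode-aware String#capitalize) and UTF-8 strings.
BLACK_LIST = %w[d de du la le sec sur entre].freeze

# Hypothetical helper: capitalize each word that starts the name or follows
# a dash, space or apostrophe, unless the word is blacklisted.
def title_case_appellation(name)
  # $1: start of line or one separator character
  # $2: a run of letters; \p{L} matches any Unicode letter, accented or not
  name.gsub(/(^|[- '])(\p{L}+)/) do
    sep, word = $1, $2
    BLACK_LIST.include?(word) ? sep + word : sep + word.capitalize
  end
end

[
  "Côte de beaune-villages",
  "Crémant d'alsace",
  "Vosne-romanée",
  "Mâcon supérieur"
].each { |name| puts title_case_appellation(name) }
```

    This should print Côte de Beaune-Villages, Crémant d'Alsace, Vosne-Romanée and Mâcon Supérieur. The blacklist check is case-sensitive, and a blacklisted word at the very start of a name stays lower case; both are choices to revisit against the real list.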

  8. #7

    Re: [newbie] upper to lower first letter of a word

    Mark J. Reed <markjreed@mail.com> wrote:
    > Really? That's arguably a bug. What character encoding are you using?
    I'm (more or less) sure about that, because even if I use:
    l.gsub!(/\b[a-záàâçéèêíìîóòöôúùüû]+/) { |w| black_list.include?(w) ? w : w.capitalize }

    I get:
    MâCon SupéRieur
    when the input was:
    Mâcon supérieur
    > Accented letters should be in \w, not \W, and therefore the
    > space between one and an adjacent letter should not match \b.
    > But Ruby regexes may be ASCII-only, and even if not, they're probably
    > Latin-1-only. So, for instance, they wouldn't work on UTF-8 strings.
    As it happens, I'm using UTF-8 °;)
    However, I can try ISO-8859-1; my text editor (Pepper on Mac OS X)
    can transcode from UTF-8 to ISO-8859-1 in two clicks plus one
    cut-and-paste...
    It seems strange to me, since Ruby comes from Japan, where "special"
    characters are everyday characters???

    [snip]
    > The block has to compensate for that. Something like this:
    >
    > string.gsub!(/(^|[- '])([a-z]+)/) { $1 + $2.capitalize }
    >
    > Except that [a-z] won't match accented characters, so it's more like this:
    >
    > string.gsub!(/(^|[- '])([a-záàâçéèêíìîóòôúùû]+)/) { $1 + $2.capitalize }
    >
    > And if the names aren't limited to French, then even more special characters
    > creep in . . .
    Yes, right, I know. For the time being it's only French and German
    accented characters...

    However, because wines are classified by region, I might have to
    change the regex depending on the region...
    --
    Yvon
    Yvon Thoraval Guest

  9. #8

    Re: [newbie] upper to lower first letter of a word


    On Tuesday, September 23, 2003, at 12:34 PM, Yvon Thoraval wrote:
    > Recently I got a list of wines (more than 500 items) with inconsistent
    > capitalization. For example, I have:
    >
    > Côte de beaune-villages
    >
    > instead of:
    >
    > Côte de Beaune-Villages
    >
    > Crémant d'alsace
    >
    > instead of:
    >
    > Crémant d'Alsace
    >
    > I'm wondering how to change lower case to upper case, and also
    > whether a regex could do the trick.
    >
    > Something like:
    >
    > every letter following a " ", "-" or "'" should be upper case unless
    > the word belongs to a blacklist:
    >
    > black_list = %w{d de du la le sec sur entre etc...}
    You might adapt the English-language 'titlecase' program, which can be
    found here:

    http://zem.novylen.net/ruby/titlecase.rb

    Regards,

    Mark


    Mark Wilson Guest

  10. #9

    Re: [newbie] upper to lower first letter of a word

    Hi --

    On Wed, 24 Sep 2003, Yvon Thoraval wrote:
    > Yvon Thoraval <yvon.thoravallist@-SUPPRIMEZ-free.fr.invalid> wrote:
    >
    > >
    > > > string.gsub!(/\b[a-z]+/) { |w| black_list.include?(w) ? w : w.capitalize }
    > >
    > > Thanks a lot °;)
    >
    > It seems it's a little trickier, because accented characters are
    > treated as word boundaries (\b). For example:
    >
    > Vosne-romanée
    > becomes:
    > Vosne-RomanéE
    I believe the /s modifier to the regex will help you here by changing
    the encoding, though I'm having character-rendering issues which make
    it hard for me to test.... But try this, in the hope that I'm right
    even though I can't see the characters:

    str.gsub!(/\b[a-z]+/s) {|w| black_list.include?(w) ? w : w.capitalize}


    David

    --
    David Alan Black
    home: dblack@superlink.net
    work: blackdav@shu.edu
    Web: http://pirate.shu.edu/~blackdav


    dblack@superlink.net Guest

  11. #10

    Re: [newbie] upper to lower first letter of a word

    Mark Wilson <mwilson13@cox.net> wrote:
    > You might adapt the English language 'titlecase' program, which can be
    > found here:
    >
    > http://zem.novylen.net/ruby/titlecase.rb
    Yes, thanks; that way I could more easily change the rules depending
    on the wine region...
    --
    Yvon
    Yvon Thoraval Guest

  12. #11

    Re: [newbie] upper to lower first letter of a word

    <dblack@superlink.net> wrote:
    > I believe the /s modifier to the regex will help you here by changing
    > the encoding, though I'm having character-rendering issues which make
    > it hard for me to test.... But try this, in the hope that I'm right
    > even though I can't see the characters:
    >
    > str.gsub!(/\b[a-z]+/s) {|w| black_list.include?(w) ? w : w.capitalize}
    Thanks. I don't remember (from Perl): what does this "s" mean?


    --
    Yvon
    Yvon Thoraval Guest

  13. #12

    Re: [newbie] upper to lower first letter of a word

    Hi --

    On Wed, 24 Sep 2003, Yvon Thoraval wrote:
    > <dblack@superlink.net> wrote:
    >
    > > I believe the /s modifier to the regex will help you here by changing
    > > the encoding, though I'm having character-rendering issues which make
    > > it hard for me to test.... But try this, in the hope that I'm right
    > > even though I can't see the characters:
    > >
    > > str.gsub!(/\b[a-z]+/s) {|w| black_list.include?(w) ? w : w.capitalize}
    >
    > Thanks. I don't remember (from Perl): what does this "s" mean?
    It's different in Perl and Ruby. In Perl, it means: treat the string
    as a single line, so that '.' matches newline. In Ruby, it affects
    the encoding.... I wish I could give a more knowledgeable account,
    but I've never actually used it myself and can't seem to dig up
    documentation.


    David

    --
    David Alan Black
    home: dblack@superlink.net
    work: blackdav@shu.edu
    Web: http://pirate.shu.edu/~blackdav


    dblack@superlink.net Guest
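
    For comparison with David's description of Perl's /s: in Ruby, the flag that lets "." match a newline is /m, as this quick check (assuming Ruby 2.4+ for Regexp#match?) shows.

```ruby
text = "line one\nline two"

# By default "." does not match a newline in Ruby ...
p(/one.line/.match?(text))   # false
# ... the /m flag changes that, so Ruby's /m does the job of Perl's /s.
p(/one.line/m.match?(text))  # true
```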

  14. #13

    Re: [newbie] upper to lower first letter of a word

    On Wed, Sep 24, 2003 at 06:07:08AM +0900, dblack@superlink.net wrote:
    > On Wed, 24 Sep 2003, Yvon Thoraval wrote:
    > > Thanks. I don't remember (from Perl): what does this "s" mean?
    > It's different in Perl and Ruby. In Perl, it means: treat the string
    > as a single line, so that '.' matches newline. In Ruby, it affects
    > the encoding.... I wish I could give a more knowledgeable account,
    > but I've never actually used it myself and can't seem to dig up
    > documentation.
    >
    According to the Pickaxe, or at least the online version thereof
    (my dead-tree version is at home), /s means to use the SJIS
    (Shift_JIS, "Shift Japanese Industrial Standards") multibyte
    text encoding. Similarly, /e means to use EUC, and /u means to use
    UTF-8. So /u is probably a better bet than /s for Yvon.


    http://www.rubycentral.com/book/ref_c_regexp.html#Regexp.new

    -Mark
    Mark J. Reed Guest
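
    In Ruby 1.9 and later, $KCODE no longer has any effect and these one-letter flags fix the encoding of the regexp object instead. A quick way to see it, assuming Ruby 1.9+ (the comments show what each line should print):

```ruby
# Ruby 1.9+: the one-letter flags fix the encoding of the regexp itself.
puts(/abc/.encoding)    # US-ASCII (no flag, ASCII-only pattern)
puts(/abc/u.encoding)   # UTF-8
puts(/abc/e.encoding)   # EUC-JP
puts(/abc/s.encoding)   # Windows-31J (Microsoft's variant of Shift_JIS)
```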

  15. #14

    Re: [newbie] upper to lower first letter of a word

    "Yvon Thoraval" <yvon.thoravallist@-SUPPRIMEZ-free.fr.invalid> wrote in message
    news:1g1r8u8.1hzv3mvupjeizN%yvon.thoravallist@-SUPPRIMEZ-free.fr.invalid...
    > Mark J. Reed <markjreed@mail.com> wrote:
    >
    > > Really? That's arguably a bug. What character encoding are you using?
    >
    > I'm (more or less) sure about that, because even if I use:
    > l.gsub!(/\b[a-záàâçéèêíìîóòöôúùüû]+/) { |w| black_list.include?(w) ? w : w.capitalize }
    I'd omit the "\b" at the beginning, since a word boundary is still
    detected next to "é":

    l.gsub!(/[a-záàâçéèêíìîóòöôúùüû]+/) { |w| black_list.include?(w) ? w : w.capitalize }

    Alternatively:

    l.gsub!(/[^\s!?.;:-]+/) {|w| black_list.include?(w) ? w : w.capitalize }

    Regards

    robert

    Robert Klemme Guest
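
    A sketch comparing Robert's negated-class pattern with a variant that also treats the apostrophe as a separator, which matters for names like "Crémant d'alsace". It assumes Ruby 2.4+ (Unicode-aware capitalize); the constant and method names are hypothetical.

```ruby
BLACK_LIST = %w[d de du la le sec sur entre].freeze

# Robert Klemme's pattern: a "word" is any run of characters that are not
# whitespace or one of the listed punctuation marks ("-" last, so it is literal).
ROBERT_PATTERN = /[^\s!?.;:-]+/
# Same idea with the apostrophe added to the separators, so "d'alsace" splits.
WITH_APOSTROPHE = /[^\s!?.;:'-]+/

def fix_case(line, pattern)
  line.gsub(pattern) { |w| BLACK_LIST.include?(w) ? w : w.capitalize }
end

puts fix_case("Crémant d'alsace", ROBERT_PATTERN)   # Crémant D'alsace
puts fix_case("Crémant d'alsace", WITH_APOSTROPHE)  # Crémant d'Alsace
puts fix_case("Mâcon supérieur", WITH_APOSTROPHE)   # Mâcon Supérieur
```

    With the original pattern "d'alsace" is one token, so capitalize touches only the "d"; adding "'" to the excluded characters lets the blacklist see "d" on its own.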

  16. #15

    Re: [newbie] upper to lower first letter of a word

    Robert Klemme <bob.news@gmx.net> wrote:
    > I'd omit the "\b" at the beginning, since a word boundary is still
    > detected next to "é":
    >
    > l.gsub!(/[a-záàâçéèêíìîóòöôúùüû]+/) { |w| black_list.include?(w) ? w : w.capitalize }
    Yes, fine. I also discovered that capitalization doesn't work on
    accented characters (such as é).

    So I added another step for words that begin with one of those
    "special" characters.
    > Alternatively:
    >
    > l.gsub!(/[^\s!?.;:-]+/) {|w| black_list.include?(w) ? w : w.capitalize }
    OK, but my list has no punctuation such as ?!;:, only " " and "-"
    --
    Yvon
    Yvon Thoraval Guest

  17. #16

    Re: [newbie] upper to lower first letter of a word

    > Yes, fine. I also discovered that capitalization doesn't work on
    > accented characters (such as é).
    You can use an old library named unicode:

    irb(main):001:0> $KCODE="u"
    => "u"
    irb(main):002:0> require "unicode"
    => true
    irb(main):003:0> Unicode.capitalize("àëíôů")
    => "Àëíôů"

    http://raa.ruby-lang.org/list.rhtml?name=unicode


    Carlos Guest
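
    On Ruby 2.4 and later the external unicode library is no longer needed for this: String#capitalize, #upcase and #downcase use Unicode case mapping by default. A quick check (the :ascii option restricts the mapping to ASCII letters, which reproduces the old behaviour Yvon ran into):

```ruby
# Ruby 2.4+: capitalize/upcase/downcase apply Unicode case mapping,
# so accented first letters are handled without an extra library.
p "élevage".capitalize          # "Élevage"
p "àëíôů".capitalize            # "Àëíôů"
p "ÉTÉ".downcase                # "été"
# The :ascii option limits case mapping to A-Z / a-z.
p "élevage".capitalize(:ascii)  # "élevage"
```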

  18. #17

    Re: [newbie] upper to lower first letter of a word

    Carlos <angus@quovadis.com.ar> wrote:
    >
    > You can use an old library named unicode:
    >
    > irb(main):001:0> $KCODE="u"
    > => "u"
    > irb(main):002:0> require "unicode"
    > => true
    > irb(main):003:0> Unicode.capitalize("àëíôů")
    > => "Àëíôů"
    >
    > http://raa.ruby-lang.org/list.rhtml?name=unicode
    Thanks for everything!
    --
    Yvon
    Yvon Thoraval Guest

  19. #18

    Default Re: regexp needed to split letters from numbers

    I would appreciate it if someone could give me a regexp that would
    split the following, for example:

    "clonidine300 mg" into "clonidine 300 mg"

    I have a bunch of drug data where the dose has been typed together
    with the drug name.

    Thanks


    Thomas A. Reilly Guest

  20. #19

    Re: regexp needed to split letters from numbers

    On Saturday, 27 September 2003 at 12:17:19 +0900, Thomas A. Reilly wrote:
    > I would appreciate it if someone could give me a regexp that would
    > split the following, for example:
    >
    > "clonidine300 mg" into "clonidine 300 mg"
    >
    > I have a bunch of drug data where the dose has been typed together
    > with the drug name.
    There's probably more than one way to do this. Here's one way:

    irb(main):001:0> s="clonidine300 mg"
    => "clonidine300 mg"
    irb(main):005:0> s.scan(/[a-zA-Z]+|\d+/) { |i| p i }
    "clonidine"
    "300"
    "mg"



    --
    Jim Freeze
    ----------
    Anybody can win, unless there happens to be a second entry.

    Jim Freeze Guest
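
    Jim's scan returns the pieces but not the joined string Thomas asked for. A minimal sketch of that last step, plus a swapped-in alternative (a zero-width lookbehind/lookahead gsub, not something from the thread), assuming Ruby 1.9+ for lookbehind; "warfarin2.5 mg" is a made-up entry to show how decimal doses behave.

```ruby
s = "clonidine300 mg"

# Jim's scan, followed by a join to rebuild the string with single spaces.
p s.scan(/[a-zA-Z]+|\d+/).join(" ")                 # "clonidine 300 mg"

# Alternative: insert a space only where a letter is directly followed by a
# digit, using a lookbehind/lookahead pair that matches the empty gap.
split_gap = /(?<=[a-zA-Z])(?=\d)/
p s.gsub(split_gap, " ")                            # "clonidine 300 mg"
p "warfarin2.5 mg".gsub(split_gap, " ")             # "warfarin 2.5 mg"
p "warfarin2.5 mg".scan(/[a-zA-Z]+|\d+/).join(" ")  # "warfarin 2 5 mg"
```

    The last line shows the trade-off: scan drops anything it does not match (here the decimal point), while the gap-based gsub only adds a space and leaves the rest of the entry untouched.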

  21. #20

    Re: regexp needed to split letters from numbers

    On Saturday, 27 September 2003 at 12:44:11 +0900, Jim Freeze wrote:
    > On Saturday, 27 September 2003 at 12:17:19 +0900, Thomas A. Reilly wrote:
    > > I would appreciate it if someone could give me a regexp that would
    > > split the following, for example:
    > >
    > > "clonidine300 mg" into "clonidine 300 mg"
    > >
    > > I have a bunch of drug data where the dose has been typed together
    > > with the drug name.
    >
    > There's probably more than one way to do this. Here's one way:
    >
    And yet another way:


    irb(main):018:0> m = /(\w+?)(\d+)\s+(\w+)/.match(s)
    => #<MatchData:0x81c2200>
    irb(main):019:0> m[1]
    => "clonidine"
    irb(main):020:0> m[2]
    => "300"
    irb(main):021:0> m[3]
    => "mg"

    --
    Jim Freeze
    ----------
    "There is no reason for any individual to have a computer in their
    home."
    -- Ken Olsen, President of DEC, World Future Society
    Convention, 1977

    Jim Freeze Guest
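
    A variant of Jim's MatchData approach using named captures (Ruby 1.9+), anchored to the whole entry and allowing a decimal dose and an optional space before the unit. The constant DOSE and the method normalize_dose are hypothetical; entries that do not fit the pattern, such as multi-word drug names, are returned unchanged rather than guessed at.

```ruby
# Hypothetical pattern: letters, then a number (optionally with a decimal part),
# optional whitespace, then a unit made of letters. \A and \z anchor the whole entry.
DOSE = /\A(?<drug>[a-zA-Z]+)(?<amount>\d+(?:\.\d+)?)\s*(?<unit>[a-zA-Z]+)\z/

def normalize_dose(entry)
  m = DOSE.match(entry)
  return entry unless m  # leave entries that don't fit the pattern alone
  "#{m[:drug]} #{m[:amount]} #{m[:unit]}"
end

puts normalize_dose("clonidine300 mg")  # clonidine 300 mg
puts normalize_dose("warfarin2.5mg")    # warfarin 2.5 mg
puts normalize_dose("aspirin")          # aspirin (no match, unchanged)
```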
