Yvon Thoraval #1
[newbie] upper to lower first letter of a word
Recently I got a vintage list (more than 500 items) with poor typography; for
example, I have:
Côte de beaune-villages
instead of:
Côte de Beaune-Villages
Crémant d'alsace
instead of:
Crémant d'Alsace
I wonder how to change lower case to upper case, and whether
a regex could do the trick.
Something like:
every letter following a " ", "-" or "'" should be upper case unless the word
belongs to a black list:
black_list = %w{d de du la le sec sur entre etc...}
--
Yvon
Yvon Thoraval Guest
-
Mark J. Reed #2
Re: [newbie] upper to lower first letter of a word
On Tue, Sep 23, 2003 at 06:29:58PM +0200, Yvon Thoraval wrote:
> Recently, i get a vintage list (more than 500 items) with poor typo, for
> example, i've :
>
> Côte de beaune-villages
>
> instead of :
>
> Côte de Beaune-Villages
>
> Crémant d'alsace
>
> instead of :
>
> Crémant d'Alsace
>
> i wonder of the way to change lower to upper case and also of
>
> a regex able to do the trick.
>
> something like :
>
> every letter following a " ", "-" or "'" should be upper if not
> belonging to a black list of words :
>
> black_list = %w{d de du la le sec sur entre etc...}
string.gsub!(/\b[a-z]+/) { |w| black_list.include?(w) ? w : w.capitalize }
-Mark
Mark J. Reed Guest
-
Yvon Thoraval #3
Re: [newbie] upper to lower first letter of a word
Mark J. Reed <markjreed@mail.com> wrote:
> string.gsub!(/\b[a-z]+/) { |w| black_list.include?(w) ? w : w.capitalize }
Thanks a lot °;)
--
Yvon
Yvon Thoraval Guest
-
Yvon Thoraval #4
Re: [newbie] upper to lower first letter of a word
Yvon Thoraval <yvon.thoravallist@-SUPPRIMEZ-free.fr.invalid> wrote:
> > string.gsub!(/\b[a-z]+/) { |w| black_list.include?(w) ? w : w.capitalize }
> a lot of tanxs °;)
It seems it's a little bit trickier, because accented characters are
treated as word boundaries (\b). For example:
Vosne-romanée
becomes:
Vosne-RomanéE
So instead of \b I would have to exclude a list of chars:
[à|ä|â|é|è|ê|î|ö|ô|ü|ù]
--
Yvon
Yvon Thoraval Guest
-
Mark J. Reed #5
Re: [newbie] upper to lower first letter of a word
On Tue, Sep 23, 2003 at 07:23:52PM +0200, Yvon Thoraval wrote:
> it seems, it's a little bit trickier because accentuated characters are
> taken as \b
Really? That's arguably a bug. What character encoding are you using?
Accented letters should be in \w, not \W, and therefore the
space between one and an adjacent letter should not match \b.
But Ruby regexes may be ASCII-only, and even if not, they're probably
Latin-1-only. So, for instance, they wouldn't work on UTF-8 strings.
> Vosne-romanée
> becomes :
> Vosne-RomanéE
>
> then instead of \b i would have to exclude a list of chars :
> [à|ä|â|é|è|ê|î|ö|ô|ü|ù]
First, you don't need the pipes (|'s) there. Pipes are for
alternation without the [...]; basically, [abc] is short for
(a|b|c). The pipe form is most useful when the alternatives are
not all single characters, for instance (alfa|bravo|charlie).
I'm not sure whether the exclude-list or the include-list would
be shorter. You could do (^|[- ']) to match "beginning of string or
dash or space or apostrophe", but then that character would be included
in the resulting string. Which means that it would be, for instance,
" d" or "-d" or "'d" instead of "d", and therefore won't be in the
blacklist and won't capitalize properly (since String#capitalize operates
on the first character, which will be the space or dash or apostrophe).
The block has to compensate for that. Something like this:
string.gsub!(/(^|[- '])([a-z]+)/) { $1 + $2.capitalize }
Except that [a-z] won't match accented characters, so it's more like this:
string.gsub!(/(^|[- '])([a-záàâçéèêíìîóòôúùû]+)/) { $1 + $2.capitalize }
And if the names aren't limited to French, then even more special characters
creep in . . .
-Mark
Mark J. Reed Guest
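The point about pipes inside brackets is easy to check on a current Ruby. A minimal sketch, not code from the thread: the constant names are made up for illustration, and Regexp#match? assumes Ruby 2.4 or later.

```ruby
# Inside [...], "|" is a literal character, not alternation.
WITH_PIPES    = /[à|é]/  # matches "à", "é", or a literal "|"
WITHOUT_PIPES = /[àé]/   # matches "à" or "é" only
ALTERNATION   = /à|é/    # the same two letters, written as alternation

p WITH_PIPES.match?("|")     # the pipe itself is accepted by the class
p WITHOUT_PIPES.match?("|")  # without pipes in the class, it is not
p ALTERNATION.match?("é")    # alternation needs no brackets at all
```

So a class written with pipes still works for the listed letters, but it also accepts "|" in the input, which is rarely what was meant.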
-
Mark J. Reed #6
Re: [newbie] upper to lower first letter of a word
On Tue, Sep 23, 2003 at 05:49:24PM +0000, Mark J. Reed wrote:
> string.gsub!(/(^|[- '])([a-záàâçéèêíìîóòôúùû]+)/) { $1 + $2.capitalize }
Left off the blacklist check, which should be applied to $2:
string.gsub!(/(^|[- '])([a-záàâçéèêíìîóòôúùû]+)/) {
black_list.include?($2) ? $1 + $2 : $1 + $2.capitalize
}
-Mark
Mark J. Reed Guest
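On a current Ruby (2.4 or later, with UTF-8 strings) the same separator-plus-blacklist idea can be written without listing every accented letter. A minimal sketch, not code from the thread: titlecase_wine and BLACK_LIST are hypothetical names, \p{L} stands in for the hand-written letter list, and the black list is only the partial one given in the first post.

```ruby
# Hypothetical helper for illustration. Assumes Ruby 2.4+ and UTF-8 strings.
BLACK_LIST = %w[d de du la le sec sur entre].freeze

def titlecase_wine(name, black_list = BLACK_LIST)
  # $1: start of line or one separator (dash, space, apostrophe)
  # $2: a run of letters; \p{L} also covers accented letters
  name.gsub(/(^|[- '])(\p{L}+)/) do
    sep, word = $1, $2
    black_list.include?(word) ? sep + word : sep + word.capitalize
  end
end

puts titlecase_wine("Côte de beaune-villages")
puts titlecase_wine("Crémant d'alsace")
puts titlecase_wine("Vosne-romanée")
```

A black-listed word at the very start of a name stays lower case with this sketch, so a real list may need a special case for the first word.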
-
Yvon Thoraval #7
Re: [newbie] upper to lower first letter of a word
Mark J. Reed <markjreed@mail.com> wrote:
> Really? That's arguably a bug. What character encoding are you using?
I'm (more-or-less) sure about that, because even if I put:
l.gsub!(/\b[a-záàâçéèêíìîóòöôúùüû]+/) { |w| black_list.include?(w) ? w
: w.capitalize }
I get:
MâCon SupéRieur
when the input was:
Mâcon supérieur
> Accented letters should be in \w, not \W, and therefore the
> space between one and an adjacent letter should not match \b.
> But Ruby regexes may be ASCII-only, and even if not, they're probably
> Latin-1-only. So, for instance, they wouldn't work on UTF-8 strings.
Precisely, I'm using utf-8 °;)
However, I'm able to try it with iso-8859-1: my text editor (Pepper
on MacOS X) can transcode from utf to iso within 2 clicks plus one
cut'n paste...
This sounds strange to me, because Ruby comes from Japan, where "special"
chars are every-day chars ???
[snip]
> The block has to compensate for that. Something like this:
>
> string.gsub!(/(^|[- '])([a-z]+)/) { $1 + $2.capitalize }
>
> Except that [a-z] won't match accented characters, so it's more like this:
>
> string.gsub!(/(^|[- '])([a-záàâçéèêíìîóòôúùû]+)/) { $1 + $2.capitalize }
>
> And if the names aren't limited to French, then even more special characters
> creep in . . .
Yes, right, I know. For the time being, it's only about French and German
accented chars...
However, because vintages are classified by area, I might have to change
the regex depending on the region...
--
Yvon
Yvon Thoraval Guest
-
Mark Wilson #8
Re: [newbie] upper to lower first letter of a word
On Tuesday, September 23, 2003, at 12:34 PM, Yvon Thoraval wrote:
> Recently, i get a vintage list (more than 500 items) with poor typo,
> for
> example, i've :
>
> Côte de beaune-villages
>
> instead of :
>
> Côte de Beaune-Villages
>
> Crémant d'alsace
>
> instead of :
>
> Crémant d'Alsace
>
> i wonder of the way to change lower to upper case and also of
>
> a regex able to do the trick.
>
> something like :
>
> every letter following a " ", "-" or "'" should be upper if not
> belonging to a black list of words :
>
> black_list = %w{d de du la le sec sur entre etc...}
You might adapt the English language 'titlecase' program, which can be
found here:
http://zem.novylen.net/ruby/titlecase.rb
Regards,
Mark
Mark Wilson Guest
-
dblack@superlink.net #9
Re: [newbie] upper to lower first letter of a word
Hi --
On Wed, 24 Sep 2003, Yvon Thoraval wrote:
> Yvon Thoraval <yvon.thoravallist@-SUPPRIMEZ-free.fr.invalid> wrote:
> > > string.gsub!(/\b[a-z]+/) { |w| black_list.include?(w) ? w : w.capitalize }
> > a lot of tanxs °;)
> it seems, it's a little bit trickier because accentuated characters are
> taken as \b for example :
>
> Vosne-romanée
> becomes :
> Vosne-RomanéE
I believe the /s modifier to the regex will help you here by changing
the encoding, though I'm having character-rendering issues which make
it hard for me to test.... But try this, in the hope that I'm right
even though I can't see the characters:
str.gsub!(/\b[a-z]+/s) {|w| black_list.include?(w) ? w : w.capitalize}
David
--
David Alan Black
home: dblack@superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav
dblack@superlink.net Guest
-
Yvon Thoraval #10
Re: [newbie] upper to lower first letter of a word
Mark Wilson <mwilson13@cox.net> wrote:
> You might adapt the English language 'titlecase' program, which can be
> found here:
>
> http://zem.novylen.net/ruby/titlecase.rb
Yes, thanks, that way I could more easily change the rules depending on the
area of the vintage...
--
Yvon
Yvon Thoraval Guest
-
Yvon Thoraval #11
Re: [newbie] upper to lower first letter of a word
<dblack@superlink.net> wrote:
> I believe the /s modifier to the regex will help you here by changing
> the encoding, though I'm having character-rendering issues which make
> it hard for me to test.... But try this, in the hope that I'm right
> even though I can't see the characters:
>
> str.gsub!(/\b[a-z]+/s) {|w| black_list.include?(w) ? w : w.capitalize}
Thanks. I don't remember (from Perl): what is the meaning of this "s"?
--
Yvon
Yvon Thoraval Guest
-
dblack@superlink.net #12
Re: [newbie] upper to lower first letter of a word
Hi --
On Wed, 24 Sep 2003, Yvon Thoraval wrote:
> <dblack@superlink.net> wrote:
> > I believe the /s modifier to the regex will help you here by changing
> > the encoding, though I'm having character-rendering issues which make
> > it hard for me to test.... But try this, in the hope that I'm right
> > even though I can't see the characters:
> >
> > str.gsub!(/\b[a-z]+/s) {|w| black_list.include?(w) ? w : w.capitalize}
> tanxs, i don't remember (from Perl) what's the meaning of this "s" ?
It's different in Perl and Ruby. In Perl, it means: treat the string
as a single line, so that '.' matches newline. In Ruby, it affects
the encoding.... I wish I could give a more knowledgeable account,
but I've never actually used it myself and can't seem to dig up
documentation.
David
--
David Alan Black
home: dblack@superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav
dblack@superlink.net Guest
-
Mark J. Reed #13
Re: [newbie] upper to lower first letter of a word
On Wed, Sep 24, 2003 at 06:07:08AM +0900, dblack@superlink.net wrote:
> On Wed, 24 Sep 2003, Yvon Thoraval wrote:
> > tanxs, i don't remember (from Perl) what's the meaning of this "s" ?
> It's different in Perl and Ruby. In Perl, it means: treat the string
> as a single line, so that '.' matches newline. In Ruby, it affects
> the encoding.... I wish I could give a more knowledgeable account,
> but I've never actually used it myself and can't seem to dig up
> documentation.
According to the Pickaxe, or at least the online version thereof
(my dead-trees version is at home), /s means to use the SJIS
(Shift JIS, short for Shift Japanese Industrial Standards) multibyte
text encoding. Similarly, /e means to use EUC, and /u means to use
UTF-8. So /u is probably a better bet than /s for Yvon.
http://www.rubycentral.com/book/ref_c_regexp.html#Regexp.new
-Mark
Mark J. Reed Guest
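The thread predates Ruby 1.9. On Ruby 1.9 and later, every string carries its own encoding, and the ASCII-only behaviour discussed here can be seen directly: [a-z] and \w stay ASCII-only even on UTF-8 text, while \p{L} and [[:alpha:]] match letters from any script. A minimal sketch under that assumption; WORD is just an illustrative constant.

```ruby
# Assumes Ruby 1.9 or later. The \u escape makes this literal a UTF-8 string.
WORD = "roman\u00E9e"   # "romanée" with a precomposed é

p WORD.encoding                # the string's own encoding (UTF-8 here)
p WORD.scan(/[a-z]+/)          # ASCII range only: the é splits the word in two
p WORD.scan(/\p{L}+/)          # Unicode letter property: one run
p WORD.scan(/[[:alpha:]]+/)    # POSIX bracket, also Unicode-aware in Ruby
```

With such strings the encoding flag on the regexp is rarely needed; choosing \p{L} or [[:alpha:]] over [a-z] does most of the work.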
-
Robert Klemme #14
Re: [newbie] upper to lower first letter of a word
"Yvon Thoraval" <yvon.thoravallist@-SUPPRIMEZ-free.fr.invalid> schrieb im
Newsbeitrag
news:1g1r8u8.1hzv3mvupjeizN%yvon.thoravallist@-SUPPRIMEZ-free.fr.invalid...
> Mark J. Reed <markjreed@mail.com> wrote:
> > Really? That's arguably a bug. What character encoding are you using?
>
> I'm (more-or-less) sure about that because even if i put :
> l.gsub!(/\b[a-záàâçéèêíìîóòöôúùüû]+/) { |w| black_list.include?(w) ? w
> : w.capitalize }
I'd omit the "\b" at the beginning since "é" then still matches a word
boundary:
l.gsub!(/[a-záàâçéèêíìîóòöôúùüû]+/) { |w| black_list.include?(w) ? w
: w.capitalize }
Alternatively:
l.gsub!(/[^\s!?.;:-]+/) {|w| black_list.include?(w) ? w : w.capitalize }
Regards
robert
Robert Klemme Guest
-
Yvon Thoraval #15
Re: [newbie] upper to lower first letter of a word
Robert Klemme <bob.news@gmx.net> wrote:
> I'd omit the "\b" at the beginning since "é" then still matches a word
> boundry:
>
> l.gsub!(/[a-záàâçéèêíìîóòöôúùüû]+/) { |w| black_list.include?(w) ? w
> : w.capitalize }
Yes, fine. I also discovered that capitalization doesn't work on
accented chars (such as é),
so I've added another step for those "special" chars when they are the first
letter of a word.
> Alternatively:
>
> l.gsub!(/[^\s!?.;:-]+/) {|w| black_list.include?(w) ? w : w.capitalize }
OK; however, my list has no punctuation such as ?!;:... only " " and "-".
--
Yvon
Yvon Thoraval Guest
-
Carlos #16
Re: [newbie] upper to lower first letter of a word
> yes, fine, i discovered also that capitalization don't work on
> accentuated chars (as é)
You can use an old library named unicode:
irb(main):001:0> $KCODE="u"
=> "u"
irb(main):002:0> require "unicode"
=> true
irb(main):003:0> Unicode.capitalize("àëíôů")
=> "Àëíôů"
http://raa.ruby-lang.org/list.rhtml?name=unicode
Carlos Guest
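On Ruby 2.4 and later, the built-in String#capitalize applies Unicode case mapping itself, so the extra library should no longer be needed for this. A minimal sketch assuming such a Ruby version and UTF-8 strings; SAMPLE is an illustrative constant.

```ruby
# Assumes Ruby 2.4 or later, where case mapping is Unicode-aware by default.
SAMPLE = "àëíôů"

puts SAMPLE.capitalize     # first letter upper-cased, the rest lower-cased
puts "élevage".capitalize  # also works for an accented first letter
```

Neither $KCODE nor a require line is involved here; the string's own encoding is what matters.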
-
Yvon Thoraval #17
Re: [newbie] upper to lower first letter of a word
Carlos <angus@quovadis.com.ar> wrote:
>
> You can use an old library named unicode:
>
> irb(main):001:0> $KCODE="u"
> => "u"
> irb(main):002:0> require "unicode"
> => true
> irb(main):003:0> Unicode.capitalize("àëíôů")
> => "Àëíôů"
>
> http://raa.ruby-lang.org/list.rhtml?name=unicode
Thanks for everything!
--
Yvon
Yvon Thoraval Guest
-
Thomas A. Reilly #18
Re: regexp needed to split letters from numbers
I would appreciate it if someone could give me a regexp that would
split the following:
for example -
"clonidine300 mg" into "clonidine 300 mg"
I have a bunch of drug data where the dose had been typed together.
Thanks
Thomas A. Reilly Guest
-
Jim Freeze #19
Re: regexp needed to split letters from numbers
On Saturday, 27 September 2003 at 12:17:19 +0900, Thomas A. Reilly wrote:
> I would appreciate it if someone could give me the regexp that it would
> split the following:
> for example -
> "clonidine300 mg" into "clonidine 300 mg"
>
> I have a bunch of drug data where the dose had been typed together.
There's probably more than one way to do this. Here's one way:
irb(main):001:0> s="clonidine300 mg"
=> "clonidine300 mg"
irb(main):005:0> s.scan(/[a-zA-Z]+|\d+/) { |i| p i }
"clonidine"
"300"
"mg"
--
Jim Freeze
----------
Anybody can win, unless there happens to be a second entry.
Jim Freeze Guest
-
Jim Freeze #20
Re: regexp needed to split letters from numbers
On Saturday, 27 September 2003 at 12:44:11 +0900, Jim Freeze wrote:
> On Saturday, 27 September 2003 at 12:17:19 +0900, Thomas A. Reilly wrote:
> > I would appreciate it if someone could give me the regexp that it would
> > split the following:
> > for example -
> > "clonidine300 mg" into "clonidine 300 mg"
> >
> > I have a bunch of drug data where the dose had been typed together.
> There's probably more than one way to do this. Here's one way:
>
And yet another way:
irb(main):018:0> m = /(\w+?)(\d+)\s+(\w+)/.match(s)
=> #<MatchData:0x81c2200>
irb(main):019:0> m[1]
=> "clonidine"
irb(main):020:0> m[2]
=> "300"
irb(main):021:0> m[3]
=> "mg"
--
Jim Freeze
----------
"There is no reason for any individual to have a computer in their
home."
-- Ken Olsen, President of DEC, World Future Society
Convention, 1977
Jim Freeze Guest
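Both answers above pull the pieces apart; to get the string "clonidine 300 mg" back in one step, a zero-width substitution is another option. A minimal sketch, not code from the thread: split_dose is a hypothetical helper name, and the lookbehind assumes Ruby 1.9 or later.

```ruby
# Hypothetical helper for illustration.
def split_dose(text)
  # Zero-width match: a letter just behind, a digit just ahead.
  # Nothing is consumed, so gsub only inserts the space.
  text.gsub(/(?<=[[:alpha:]])(?=\d)/, " ")
end

puts split_dose("clonidine300 mg")
puts split_dose("aspirin 81 mg")   # already spaced, left unchanged
```

This only handles a letter followed by a digit; a dose typed as "300mg" would need the mirror-image pattern as well.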


