Stuart Clemons #1
Need help with a regex
This newbie needs help with a regex. Here's what the data from a text
file looks like. There's no delimiter and the fields aren't evenly spaced
apart.
apples San Antonio Fruit
oranges Sacramento Fruit
pineapples Honolulu Fruit
lemons Corona del Rey Fruit
Basically, I want to put the city names into an array. The first field,
the fruit name, is always one word with no spaces.
So, I would guess that the regex needs to grab everything after the first
word and before the beginning of Fruit. Then strip out all the spaces.
Or grab the beginning of the second word until the beginning of Fruit.
Then strip out the spaces after the city name.
Anyone know how to do that?
I did recently buy the Mastering Regular Expressions, 2nd Edition book.
I've only read a little, but I've found the book to be very readable. If
I only had the time to really spend with it! So much to learn, so little
time.
Thanks in advance for any help.
Stuart Clemons Guest
-
Kenton Brede #2
Re: Need help with a regex
On Fri, Jan 23, 2004 at 12:01:13AM -0500, stuart_clemons@us.ibm.com wrote:
> This newbie needs help with a regex. Here's what the data from a text
> file looks like. There's no delimiter and the fields aren't evenly spaced
> apart.
>
> apples San Antonio Fruit
> oranges Sacramento Fruit
> pineapples Honolulu Fruit
> lemons Corona del Rey Fruit
>
> Basically, I want to put the city names into an array. The first field,
> the fruit name, is always one word with no spaces.
>
> So, I would guess that the regex needs to grab everything after the first
> word and before the beginning of Fruit. Then strip out all the spaces.
>
> Or grab the beginning of the second word until the beginning of Fruit.
> Then strip out the spaces after the city name.
>
> Anyone know how to do that ?
I'm not that experienced with Perl but here is my stab at it.
#!/usr/bin/perl
use warnings;
use strict;
my @array;    # declared once, outside the loop, so it collects every city
while (<DATA>) {
    if ($_ =~ /^\w+\s+(.+)\s+F\w+$/) {
        push @array, $1;    # $1 holds whatever (.+) captured
    }
}
print "$_\n" for @array;
__DATA__
apples San Antonio Fruit
oranges Sacramento Fruit
pineapples Honolulu Fruit
lemons Corona del Rey Fruit
Not the greatest regex but it works. I'm sure you will get better
solutions.
/^    = beginning of line
\w+   = one or more word characters (the fruit name)
\s+   = one or more whitespace characters
(.+)  = one or more of any character; the parentheses capture the city into $1
\s+F  = one or more whitespace characters followed by a literal "F"
\w+$/ = one or more word characters, then the end of the line
push adds $1, the text captured by (.+), to @array.
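One detail worth knowing if the columns really are unevenly spaced, as the question says: because (.+) is greedy, it keeps all but one of the spaces in front of "Fruit". That is where the OP's second idea (grab the city, then strip the spaces after it) comes in. A minimal sketch, assuming hypothetical sample lines padded with extra spaces:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines with uneven spacing between the fields
my @lines = (
    "apples   San Antonio    Fruit\n",
    "lemons Corona del Rey  Fruit\n",
);

my @array;
for my $line (@lines) {
    if ($line =~ /^\w+\s+(.+)\s+F\w+$/) {
        my $city = $1;
        $city =~ s/\s+$//;    # strip the trailing spaces the greedy (.+) kept
        push @array, $city;
    }
}
print "[$_]\n" for @array;    # brackets make any leftover spaces visible
```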
hth,
Kent
Kenton Brede Guest
-
Stuart Clemons #3
RE: Need help with a regex
Thanks very much Tim. I just did a quick test on my real file and it
worked perfectly.
I definitely still have a lot to learn with both Perl and regexes, so I
really appreciate the explanation as well. Though your script is very
compact, I learned a lot from it, such as how you initialized the array.
I have a couple of scripts where I get warnings about either improper or
uninitialized arrays, or something to that effect. I tried to fix those,
but was unsuccessful. Those scripts produced the output I wanted, but the
warnings are bothersome. I'll take another look at those scripts to see if
initializing using "my @arrayname = ( );" will help.
Also, the "push" structure for adding elements to the array was very
helpful. I have a way to do it, and while my way works and is somewhat
creative, my way is actually really embarrassingly bad and inefficient
coding. So, I learned from that too.
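On the warnings: `my @arrayname;` already gives you an empty array, so adding `= ( )` is unlikely to silence anything by itself. "Use of uninitialized value" warnings come from undefined scalars, and one common source in scripts like this is pushing $1 without checking that the match succeeded. A minimal sketch, assuming hypothetical input where one line does not fit the pattern:

```perl
use strict;
use warnings;

# Hypothetical input: the last line does not have the expected shape
my @lines = ("apples San Antonio Fruit", "oranges Sacramento Fruit", "garbage");

my @cities;    # starts out empty; "= ()" would be redundant here
for my $line (@lines) {
    # Push only when the match succeeds. An unguarded "push @cities, $1"
    # would add whatever $1 happens to hold when the match fails.
    push @cities, $1 if $line =~ /^\S+\s+(.+)\s+\S+$/;
}
print scalar(@cities), " cities: @cities\n";
```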
It's funny how all this stuff is in the Perl books that I've been reading,
but once I need to solve a problem, the exact right way to do it doesn't
come to me. I can spend hours trying to do some pretty simple stuff. I
can usually come up with a solution, but I know that it's usually neither
efficient nor really close to the right way to do it. But the
good news is, if I think about where my Perl skills are today compared to
a month ago, I'm making progress!
Anyway, sorry for being so looong-winded. The bottom line is that I
really appreciate your help.
"Tim Johnson" <tjohnson@sandisk.com>, 01/23/2004 01:32 AM
To: "Tim Johnson" <tjohnson@sandisk.com>, <stuart_clemons@us.ibm.com>, <beginners@perl.org>
Subject: RE: Need help with a regex
Ooh. That's embarrassing. I didn't pay close enough attention to the OP.
Some of the city names contain spaces. My regex should have been:
/^\S+\s+(.+)\s+/
which would match:
* the beginning of the line (^)
* followed by one or more non-whitespace characters (\S+)
* followed by one or more whitespace characters (\s+)
* followed by one or more of any characters, including whitespace, captured into $1 (.+)
* followed by one or more whitespace characters (\s+)
Because .+ is greedy, Perl first lets it grab as much as possible and then
backs off only until the final \s+ can match, so $1 ends just before the last
whitespace character on the line. Two caveats: the line must be chomped first
(otherwise the final \s+ can match the trailing newline and $1 becomes
"San Antonio Fruit"), and if several spaces precede "Fruit", all but one of
them end up in $1.
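Whether the greedy (.+) stops where you want depends on what the final \s+ is allowed to match. A minimal sketch, assuming the line is read with its newline intact, as `while(<INFILE>)` does when there is no chomp:

```perl
use strict;
use warnings;

my $line = "apples San Antonio Fruit\n";    # newline included, as read from a file

# Without chomp, the final \s+ can match the trailing "\n",
# so the greedy (.+) swallows "Fruit" as well.
my ($raw) = $line =~ /^\S+\s+(.+)\s+/;

chomp(my $clean = $line);
# After chomp, (.+) has to back off to the last space before "Fruit".
my ($city) = $clean =~ /^\S+\s+(.+)\s+/;

print "without chomp: [$raw]\n";
print "with chomp:    [$city]\n";
```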
-----Original Message-----
From: Tim Johnson
Sent: Thu 1/22/2004 9:31 PM
To: stuart_clemons@us.ibm.com; beginners@perl.org
Subject: RE: Need help with a regex
Try this on for size:
#####################
use strict;
use warnings;
my @cities = ();
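open(INFILE, "myfile.txt") || die "Couldn't open myfile.txt for reading: $!\n";
while (<INFILE>) {
    # push only when the match succeeds, so a line that doesn't fit can't add a bogus $1
    push @cities, $1 if /^\S+\s+(\S+)/;
}
close(INFILE);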
#do something to @cities
#####################
which basically means to match:
* the start of the line (^)
* followed by one or more non-whitespace characters (\S+)
* followed by one or more whitespace characters (\s+)
* followed by one or more non-whitespace characters (\S+)
The parentheses around the last non-whitespace match capture it into $1.
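Run against the four sample lines from the question, this pattern shows the limitation that comes up later in the thread: (\S+) cannot cross a space, so multi-word cities are cut short. A quick check:

```perl
use strict;
use warnings;

# The sample lines from the original question
my @lines = (
    "apples San Antonio Fruit",
    "oranges Sacramento Fruit",
    "pineapples Honolulu Fruit",
    "lemons Corona del Rey Fruit",
);

my @cities;
for (@lines) {
    # (\S+) stops at the first whitespace, so only one word is captured
    push @cities, $1 if /^\S+\s+(\S+)/;
}
print join(", ", @cities), "\n";
```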
Note: Check out "perldoc perlre" for the man pages. It
might be worth looking over real quick before you dig into the book.
Or, for the quick and easy way without a regex, how about:
#############################
use strict;
use warnings;
my @cities;
open(INFILE, "myfile.txt") || die "Could not open myfile.txt for reading: $!\n";
while(<INFILE>){
push @cities,(split /\s+/,$_)[1];
}
#############################
which splits the line on whitespace, takes the second element of the
resulting list, and pushes it onto @cities. Like the first regex, it keeps
only the first word of a multi-word city such as "San Antonio".
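One caveat with `split /\s+/`: if a line ever starts with whitespace, the first field comes back as an empty string and every index shifts by one. Perl's special `split ' '` form skips leading whitespace, which is also what `split` does when called with no pattern. A minimal sketch, assuming a hypothetical line with leading spaces:

```perl
use strict;
use warnings;

# Hypothetical line with leading whitespace
my $line = "  apples San Antonio Fruit\n";

my @regex_split = split /\s+/, $line;    # leading whitespace yields an empty first field
my @awk_split   = split ' ', $line;      # special case: leading whitespace is skipped

print "split /\\s+/: second field is [$regex_split[1]]\n";
print "split ' ':   second field is [$awk_split[1]]\n";
```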
-----Original Message-----
From: stuart_clemons@us.ibm.com
Sent: Thu 1/22/2004 9:01 PM
To: beginners@perl.org
Subject: Need help with a regex
This newbie needs help with a regex. Here's what the data from a text
file looks like. There's no delimiter and the fields aren't evenly spaced
apart.
apples San Antonio Fruit
oranges Sacramento Fruit
pineapples Honolulu Fruit
lemons Corona del Rey Fruit
Basically, I want to put the city names into an array. The first field,
the fruit name, is always one word with no spaces.
Stuart Clemons Guest
-
Jeff 'Japhy' Pinyan #4
Re: Need help with a regex
On Jan 23, stuart_clemons@us.ibm.com said:
>This newbie needs help with a regex. Here's what the data from a text
>file looks like. There's no delimiter and the fields aren't evenly spaced
>apart.
>
>apples San Antonio Fruit
>oranges Sacramento Fruit
>pineapples Honolulu Fruit
>lemons Corona del Rey Fruit
>
>Basically, I want to put the city names into an array. The first field,
>the fruit name, is always one word with no spaces.
>
>Anyone know how to do that ?
Well, there are many ways. You could split the string on whitespace,
remove the first and last elements, and join the others with spaces:
for (@data) {
my @fields = split;
shift @fields;
pop @fields;
push @cities, "@fields"; # "@array" = join(" ", @array)
}
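Filled out into a self-contained script, assuming `@data` holds lines like those in the question, the split approach looks like this. The array slice is just an alternative way to drop the first and last fields:

```perl
use strict;
use warnings;

# Hypothetical @data, read from a file with newlines intact
my @data = (
    "apples San Antonio Fruit\n",
    "lemons Corona del Rey Fruit\n",
);

my (@cities, @cities_by_slice);
for (@data) {
    my @fields = split;    # split $_ on whitespace, like split ' ', $_
    # Same result as shift + pop: keep every field except the first and last
    push @cities_by_slice, join(' ', @fields[1 .. $#fields - 1]);
    shift @fields;
    pop @fields;
    push @cities, "@fields";    # interpolation joins with $" (a space by default)
}
print "$_\n" for @cities;
```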
Or, you could use a regex that gets SPECIFICALLY what you want:
for (@data) {
push @cities, $1 if /^\S+\s+(\S+(?:\s+\S+)*)\s+\S+$/;
}
That regex might need a bit of explanation:
m{
^ # the beginning of the string
\S+ # one or more non-spaces
\s+ # one or more spaces
( # capture to $1:
\S+ # first word of the city name
(?: \s+ \S+ )* # *ALL* remaining words
)
\s+ # one or more spaces
\S+ # one or more non-spaces
$ # the end of the string
}x;
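Because the capture group has to begin and end with \S+, it never picks up the padding around the city name, which makes this version robust to the uneven spacing mentioned in the question. A small check, written in the /x layout and run on hypothetical lines (including the "peach" example):

```perl
use strict;
use warnings;

# Hypothetical lines with uneven spacing
my @data = (
    "peach Georgia fruit\n",
    "apples   San Antonio    Fruit\n",
    "lemons Corona del Rey Fruit\n",
);

my @cities;
for (@data) {
    push @cities, $1 if m{
        ^ \S+ \s+                # the fruit name and the space after it
        ( \S+ (?: \s+ \S+ )* )   # the city: starts and ends with a non-space
        \s+ \S+ $                # the last word on the line
    }x;
}
print "[$_]\n" for @cities;
```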
What this does on a string like "peach Georgia fruit" is this: the first
\S+\s+ matches "peach ". Then we capture "Georgia fruit" to $1. However,
the REST of the regex still has to match, but it can't, so the (?:\s+\S+)*
backtracks -- it gives up one of the chunks it matched, so $1 is only
"Georgia". Then the last \s+\S+ can match " fruit".
--
Jeff "japhy" Pinyan japhy@pobox.com http://www.pobox.com/~japhy/
RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
<stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.
[ I'm looking for programming work. If you like my work, let me know. ]
Jeff 'Japhy' Pinyan Guest
-
Stuart Clemons #5
Re: Need help with a regex
Thanks Jeff. I hope to try this out later today. I thought I had the
solution earlier this morning, but I ran into a problem. I hope this will
solve it! Thanks again.
"Jeff 'japhy' Pinyan" <japhy@perlmonk.org>, 01/23/2004 10:34 AM
Please respond to japhy@pobox.com
To: stuart_clemons@us.ibm.com
cc: beginners@perl.org
Subject: Re: Need help with a regex
Stuart Clemons Guest
-
R. Joseph Newton #6
Re: Need help with a regex
stuart_clemons@us.ibm.com wrote:
> Thanks Jeff. I hope to try this out later today. I thought I had the
> solution earlier this morning, but I ran into a problem. I hope this will
> solve it ! Thanks again.
> >apples San Antonio Fruit
> >oranges Sacramento Fruit
> >pineapples Honolulu Fruit
> >lemons Corona del Rey Fruit
> >
> >Basically, I want to put the city names into an array. The first field,
> >the fruit name, is always one word with no spaces.
> >
> >Anyone know how to do that ?
> Well, there are many ways. You could split the string on whitespace,
> remove the first and last elements, and join the others with spaces:
>
> for (@data) {
> my @fields = split;
> shift @fields;
> pop @fields;
> push @cities, "@fields"; # "@array" = join(" ", @array)
> }
I'd vote for this one (almost). It does the right thing with positions,
presuming that Stuart can count on the fruit type and class always being
contained in a single token. The one thing I would do is to give the parts
meaningful names. Unless he totally wants to discard the significant fruit
name as well as the non-informative class designation "Fruit", he might as
well preserve the information that he has available:
foreach (@data) {
    my @fields = split;
    pop @fields;    # only use void statements to get rid of garbage
    my $growing_location = {
        'fruit type'       => shift @fields,
        'growing location' => join(' ', @fields),    # join takes the separator first
    };
    push @cities, $growing_location;
}
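A self-contained sketch of the record-building idea, assuming `@data` holds the sample lines. The hash keys follow the code above; the fruit name is shifted off in its own statement so the order of the two operations is explicit:

```perl
use strict;
use warnings;

# Hypothetical @data holding two of the sample lines
my @data = (
    "apples San Antonio Fruit",
    "lemons Corona del Rey Fruit",
);

my @cities;
foreach (@data) {
    my @fields = split;
    pop @fields;                     # discard the trailing "Fruit" class field
    my $fruit = shift @fields;       # first field: the fruit name
    my $growing_location = {
        'fruit type'       => $fruit,
        'growing location' => join(' ', @fields),
    };
    push @cities, $growing_location;
}

for my $record (@cities) {
    print "$record->{'fruit type'}: $record->{'growing location'}\n";
}
```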
Okay, I don't know whether these really indicate growing locations, but I am
assuming sanity here: that there is some articulable meaning to the
juxtaposition. The identifiers in the code should communicate that meaning.
Otherwise you are throwing information away, the antithesis of the
programmer's purpose. Besides, clearly named variables are much easier to
debug.
Joseph
R. Joseph Newton Guest




