mooseshoes #1
Regex Head Scratcher
All:
I've run into a dark alley with a regex and could use some assistance.
I have a list of keywords as keys in a hash (eg. %keywords) and I am looking
at each key to see if it can be matched in a string.
For example, let's say the current $keyword is "bush" and the $phrase is:
"George Bush arrived in San Francisco today from a trip to Russia."
Quite simply, the expression $phrase =~ /$keyword/is; would be a positive
match on "Bush". If $keyword becomes "U.S." the result, not too
surprisingly, becomes positive as well ("us" in Russia). Well, this is
undesirable so I change the expression to $phrase =~ /\Q$keyword/is; and
now there is no longer a match on "U.S." which is the desired result.
This is still insufficient, however, because if the $keyword becomes "Cisco"
there will be a positive match on "Francisco". Therefore, I'll make some
accommodation for leading and trailing characters in the expression in
order to isolate the keywords, thus:
$phrase =~ /(?:\s|'|"|\()$keyword(?:,|'|"|!|\.|\s|\))/is;
OK, so this works just fine but now I'm back to the "U.S." problem again and
I'd like to stick the \Q back in but clearly this will render the new parts
of the expression useless.
The question, therefore, is how can I treat $keyword as quoted in this
context?
One thing I did try was to assign qq($keyword) to a new variable and search
on that variable, but this didn't seem to have an impact.
Thank you for your time and thoughts.
Moose
mooseshoes Guest
-
mooseshoes #2
Re: Regex Head Scratcher
Please use this as the expression in question in the previous post:
$phrase =~ /(?:\s|'|"|\()?$keyword(?:,|'|"|!|\.|\s|\))?/is;
mooseshoes wrote:
[quote of entire previous post snipped]
mooseshoes Guest
-
John W. Krahn #3
Re: Regex Head Scratcher
mooseshoes wrote:
>
> I've run into a dark alley with a regex and could use some assistance.
>
> I have a list of keywords as keys in a hash (eg. %keywords) and I am looking
> at each key to see if it can be matched in a string.
>
> For example, let's say the current $keyword is "bush" and the $phrase is:
>
> "George Bush arrived in San Francisco today from a trip to Russia."
>
> Quite simply, the expression $phrase =~ /$keyword/is; would be a positive
> match on "Bush". If $keyword becomes "U.S." the result not too
> surprisingly become positive as well ("us" in Russia). Well, this is
> undesirable so I change the expression to $phrase =~ /\Q$keyword/is; and
> now there is no longer a match on "U.S." which is the desired result.
>
> This is still insufficient, however, because if the $keyword becomes "Cisco"
> there will be a positive match on "Francisco". Therefore, I'll make some
> accommodation for leading and trailing characters in the expression in
> order to isolate the keywords, thus:
>
> $phrase =~ /(?:\s|'|"|\()$keyword(?:,|'|"|!|\.|\s|\))/is;
You probably want to use the \b word boundary zero width assertion.
$phrase =~ /\b\Q$keyword\E\b/is;
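For readers who want to try the suggested pattern, here is a minimal sketch that runs it against the phrase from the original post. The helper names has_keyword and has_keyword_lookaround are hypothetical and not from the thread. The second helper is an assumption of this note, not something any poster proposed: it swaps \b for lookarounds, because \b needs a word character on one side and so cannot match right after a keyword that ends in punctuation (such as "U.S." followed by a space).

```perl
use strict;
use warnings;

my $phrase = 'George Bush arrived in San Francisco today from a trip to Russia.';

# Hypothetical helper: the pattern suggested in the thread.
# \Q...\E quotes metacharacters in $keyword only; \b stays active.
sub has_keyword {
    my ($text, $keyword) = @_;
    return $text =~ /\b\Q$keyword\E\b/i ? 1 : 0;
}

# Hypothetical variation (an assumption, not from the thread):
# require "no word character" on each side instead of a word boundary.
sub has_keyword_lookaround {
    my ($text, $keyword) = @_;
    return $text =~ /(?<!\w)\Q$keyword\E(?!\w)/i ? 1 : 0;
}

for my $keyword ('bush', 'george', 'Cisco', 'U.S.') {
    printf "%-8s %s\n", $keyword,
        has_keyword($phrase, $keyword) ? 'match' : 'no match';
}
```

With this phrase, "bush" and "george" match while "Cisco" and "U.S." do not. On a text such as 'He lives in the U.S. now', has_keyword returns 0 for "U.S." (there is no word boundary between "." and the space), whereas has_keyword_lookaround returns 1.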
John
--
use Perl;
program
fulfillment
John W. Krahn Guest
-
Sam Holden #4
Re: Regex Head Scratcher
On 10 Aug 2003 21:53:26 GMT, mooseshoes <mooseshoes@gmx.net> wrote:
> All:
>
> I've run into a dark alley with a regex and could use some assistance.
>
> I have a list of keywords as keys in a hash (eg. %keywords) and I am looking
> at each key to see if it can be matched in a string.
>
> For example, let's say the current $keyword is "bush" and the $phrase is:
>
> "George Bush arrived in San Francisco today from a trip to Russia."
>
> Quite simply, the expression $phrase =~ /$keyword/is; would be a positive
> match on "Bush". If $keyword becomes "U.S." the result not too
> surprisingly become positive as well ("us" in Russia). Well, this is
> undesirable so I change the expression to $phrase =~ /\Q$keyword/is; and
> now there is no longer a match on "U.S." which is the desired result.
Small nitpick: "U.S." does not match the "us" in Russia, it matches
the "ussi" in Russia (if it was going to match "us" it'd grab the one
in Bush :).
>
> This is still insufficient, however, because if the $keyword becomes "Cisco"
> there will be a positive match on "Francisco". Therefore, I'll make some
> accommodation for leading and trailing characters in the expression in
> order to isolate the keywords, thus:
>
> $phrase =~ /(?:\s|'|"|\()$keyword(?:,|'|"|!|\.|\s|\))/is;
That isn't going to work, take for example your $phrase above and
the keyword "george".
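The "george" case can be checked directly. The sketch below is only an illustration (the variable names $delimited and $bounded are made up for it): the hand-rolled pattern insists on an actual delimiter character before the keyword, so it cannot match at the very start of the string, while \b is zero-width and can.

```perl
use strict;
use warnings;

my $phrase  = 'George Bush arrived in San Francisco today from a trip to Russia.';
my $keyword = 'george';

# The delimiter pattern from the question: a character must precede the keyword.
my $delimited = $phrase =~ /(?:\s|'|"|\()$keyword(?:,|'|"|!|\.|\s|\))/i ? 1 : 0;

# \b consumes nothing, so it also succeeds at the start of the string.
my $bounded = $phrase =~ /\b\Q$keyword\E\b/i ? 1 : 0;

print "delimiter pattern: $delimited\n";   # prints 0
print "word boundary:     $bounded\n";     # prints 1
```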
Use \b, it in all likelihood does what you actually want.
See "perldoc perlre" for details on \b.
Also, the /s modifier is useless, since all it does is change "." to
match all characters (instead of all characters bar "\n"). Since you
don't have any (unescaped) "."s in your regex /s just serves to confuse
the reader of the regex (who will look for a dot).
Again, "perldoc perlre" for details on /s.
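A small sketch of what /s does and does not change, assuming a two-line string invented for the illustration. Only the meaning of "." is affected; \s matches a newline with or without the modifier, which is relevant when the text contains line breaks.

```perl
use strict;
use warnings;

my $text = "line one\nline two";

# Without /s, "." refuses to match the newline between the two lines.
my $dot_plain = $text =~ /one.line/ ? 1 : 0;

# With /s, "." matches any character, including "\n".
my $dot_s = $text =~ /one.line/s ? 1 : 0;

# \s matches the newline regardless of /s.
my $space_plain = $text =~ /one\sline/ ? 1 : 0;

print "dot without /s: $dot_plain\n";    # prints 0
print "dot with /s:    $dot_s\n";        # prints 1
print "\\s without /s:  $space_plain\n"; # prints 1
```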
>
> OK, so this works just fine but now I'm back to the "U.S." problem again and
> I'd like to stick the \Q back in but clearly this will render the new parts
> of the expression useless.
There is \E as well as \Q.
See "perldoc perlre" again, for details.
> The question, therefore is how can I treat $keyword as quoted in this
> context?
$phrase=~/\b\Q$keyword\E\b/i;
You could also use:
$keyword = quotemeta $keyword;
$phrase=~/\b$keyword\b/i;
Though you might want to use a different name if you need the original
later.
> One thing I did try was to assign qq($keyword) to a new variable and search
> on that variable, but this didn't seem to have an impact.
Why would it?
"$foo = qq($bar)" is the same as "$foo = $bar" if $bar is a string already.
Programming by guess is not efficient...
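To make the difference concrete, here is a short sketch comparing qq() with quotemeta. The variable names are invented for the illustration. qq() is just double-quote interpolation, so the copy is identical to the original; quotemeta returns a new string with the regex metacharacters backslashed.

```perl
use strict;
use warnings;

my $keyword = 'U.S.';

my $copy   = qq($keyword);         # plain interpolation, still 'U.S.'
my $quoted = quotemeta $keyword;   # dots are escaped: 'U\.S\.'

print "copy:   $copy\n";
print "quoted: $quoted\n";

# The unquoted copy still treats "." as "any character" and hits "ussi".
my $copy_hits   = 'Russia' =~ /$copy/i   ? 1 : 0;
my $quoted_hits = 'Russia' =~ /$quoted/i ? 1 : 0;

print "copy matches Russia:   $copy_hits\n";    # prints 1
print "quoted matches Russia: $quoted_hits\n";  # prints 0
```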
--
Sam Holden
Sam Holden Guest
-
Sam Holden #5
Re: Regex Head Scratcher
On 10 Aug 2003 22:05:12 GMT, mooseshoes <mooseshoes@gmx.net> wrote:
> Please use this as the expression in question in the previous post:
>
> $phrase =~ /(?:\s|'|"|\()?$keyword(?:,|'|"|!|\.|\s|\))?/is;
That matches exactly the same set of strings as
$phrase =~ /$keyword/i;
does. Putting elements which can match the empty string at either
end of a regex will not change the strings it matches (it may cause it
to capture different parts of the string, but you aren't doing
any capturing).
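This equivalence is easy to spot-check. The sketch below uses a few sample phrases invented for the purpose and counts how often the pattern with optional groups disagrees with the bare pattern. It is a spot check on sample data, not a proof.

```perl
use strict;
use warnings;

my $keyword = 'cisco';
my @phrases = ('San Francisco', 'Cisco Systems', '(cisco)', 'no keyword here');

my $differences = 0;
for my $phrase (@phrases) {
    # Pattern from the follow-up post: both delimiter groups are optional.
    my $optional = $phrase =~ /(?:\s|'|"|\()?$keyword(?:,|'|"|!|\.|\s|\))?/i ? 1 : 0;

    # The bare keyword pattern.
    my $bare = $phrase =~ /$keyword/i ? 1 : 0;

    $differences++ if $optional != $bare;
    printf "%-16s optional=%d bare=%d\n", $phrase, $optional, $bare;
}

print "differences: $differences\n";   # prints 0
```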
[snip "previous post"]
--
Sam Holden
Sam Holden Guest
-
mooseshoes #6
Re: Regex Head Scratcher
Sam (and John if you're listening):
Thank you for your helpful remarks.
You both came up with the same solution (great minds think alike?) and I now
fully understand both the errors of my ways and why the proposed solution
is the best approach. Despite the fact that perlretut and Wall's bible go
to bed with me each night, discovering perlre will be a very helpful
resource as perlretut is light on both \b and \E.
Regarding /s, I didn't mention earlier that the phrases were actually
sub-phrases of HTML pages converted to text and having had trouble with
line breaks in previous experiences with these strings I had left in the
/s, but potentially I can remove it at this point.
And yes, I am occasionally guilty of what I call programming "flailing"
which is a bad practice of inserting code with only a vague notion of what
the result may be. I think I can attribute this to spending many years in
the marketing departments of large companies. ;) I generally do catch
myself, however, as I do prefer to know what is going on.
Cheers,
Moose
Sam Holden wrote:
[quote of entire article snipped]
mooseshoes Guest
-
Sam Holden #7
Re: Regex Head Scratcher
On 11 Aug 2003 00:14:20 GMT, mooseshoes <mooseshoes@gmx.net> wrote:
> Sam (and John if you're listening):
>
> Thank you for your helpful remarks.
>
> You both came up with the same solution (great minds think alike?) and I now
> fully understand both the errors of my ways and why the proposed solution
> is the best approach. Despite the fact that perlretut and Wall's bible go
> to bed with me each night, discovering perlre will be a very helpful
> resource as perlretut is light on both \b and \E.
More the normal way of performing such a match than great minds...
Of course if you haven't come across \b and \E, you aren't going to
know the "normal" way.
>
> And yes, I am occasionally guilty of what I call programming "flailing"
> which is a bad practice of inserting code with only a vague notion of what
> the result may be. I think I can attribute this to spending many years in
> the marketing departments of large companies. ;) I generally do catch
> myself, however, as I do prefer to know what is going on.
I think everyone "flails" at times, though with perl the documentation is
of an amazingly high quality and hence there is little need to. Trial and
error can on occasion be a useful learning method - as long as you
take the time to learn why the things which failed failed, and why the
things which worked worked.
[snip quote of entire article]
You really shouldn't do that. Many of the most experienced and helpful
people here don't like it, and ignore posts from people who keep doing
it. Taking the time to trim the quoted text to only what is necessary
to give context to the reader will make your life easier later.
See the Posting Guidelines which are posted here frequently or on the web
at http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html for
some other useful tips to making the most out of this newsgroup.
--
Sam Holden
Sam Holden Guest
-
Steve May #8
Re: Regex Head Scratcher
mooseshoes wrote:
> All:
>
> I've run into a dark alley with a regex and could use some assistance.
>
> I have a list of keywords as keys in a hash (eg. %keywords) and I am looking
> at each key to see if it can be matched in a string.
<SNIP>
Other posts dealt with the issues you were having with your regular
expression.
However, one thing to keep in mind is that as your list of key words
gets longer, using a regex to match against them in the string
gets less and less efficient. In other words, the regex approach
does not seem to scale well IF you are searching for multiple words.
It usually only takes a limited number of key words (two in some cases)
to iterate through before it becomes more efficient to split the string
and then check the resulting list against the keywords hash.
Something like:
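# Split on runs of whitespace; ' ' also skips leading whitespace.
my @list = split ' ', $string;
for (@list) {
    # Skip words that are not keys in %keywords (the lookup is case-sensitive).
    $keywords{$_} or next;
    print "found one ( $_ )\n";
}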
You might want to benchmark whatever you come up with and see for
yourself.
What I've found is that variations of the above will benchmark about
the same and regex time increases dramatically as the number of keys
increases.
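For anyone who wants to run that comparison, here is a rough sketch using the core Benchmark module. The keyword list and the sub names find_by_regex and find_by_split are made up for the illustration, and the numbers will depend on the machine and on how many keywords are in the hash. It assumes lowercase keys and splits on non-word characters so that trailing punctuation does not block a hash hit.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $string = 'George Bush arrived in San Francisco today from a trip to Russia.';

# Hypothetical keyword list; keys are stored in lowercase.
my %keywords = map { $_ => 1 } qw(bush cisco russia perl regex moose);

# One regex match per keyword, as discussed earlier in the thread.
sub find_by_regex {
    my @found;
    for my $keyword (sort keys %keywords) {
        push @found, $keyword if $string =~ /\b\Q$keyword\E\b/i;
    }
    return @found;
}

# Split the string once, then do one hash lookup per word.
sub find_by_split {
    my %seen;
    for my $word (split /\W+/, lc $string) {
        $seen{$word} = 1 if exists $keywords{$word};
    }
    return sort keys %seen;
}

print "regex: @{[ find_by_regex() ]}\n";
print "split: @{[ find_by_split() ]}\n";

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { regex => \&find_by_regex, split => \&find_by_split });
```

Note that splitting on \W+ breaks a keyword such as "U.S." into "u" and "s", so keywords containing punctuation or spaces would need different handling with the split approach.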
Just a thought....
s.
Steve May Guest




