Geoff Cox #1
regex help!
Hello,
I am trying to extract email addresses from about 1000 htm files.
So far I am trying

    if ($line =~ /Mailto:(.*)"/) {
        print OUT ("$1 \n");
    }

where the line is

    <a href="mailto:fred@aol.com"

The problem is the " after the email address: the "greedy" regex
matches up to other " marks further along the line ...
Can I stop at the first " mark?
Cheers
Geoff
-
Andreas Kahari #2
Re: regex help!
In article <lle5mvgf13133a618g0s3h560bpi1c535e@4ax.com>, Geoff Cox wrote:
> Hello,
>
> I am trying to extract email addresses from about 1000 htm files.

E-mail address harvesting in your spare time, are you?

[cut]
> if ($line =~ /Mailto:(.*)"/) {
> print OUT ("$1 \n");
>
> problem is with the " after the email address and the "greedy" regex
> characteristic which finds other " further along the line ...

Read the perlre manual about changing the "greediness" of a
quantifier with "?".
--
Andreas Kähäri
-
Michael Budash #3
Re: regex help!
In article <lle5mvgf13133a618g0s3h560bpi1c535e@4ax.com>,
Geoff Cox <geoff.cox@blueyonder.co.uk> wrote:
> Hello,
>
> I am trying to extract email addresses from about 1000 htm files.
>
> So far am trying
>
> if ($line =~ /Mailto:(.*)"/) {
> print OUT ("$1 \n");
>
> where the line is
>
> <a href="mailto:fred@aol.com"
>
> problem is with the " after the email address and the "greedy" regex
> characteristic which finds other " further along the line ...
>
> can I stop at the first " mark?

    /Mailto:(.*?)"/

you know that won't match your example, don't you? Not unless you add
the 'i' flag (for 'i'gnore case):

    /Mailto:(.*?)"/i

hth-
--
Michael Budash
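
The difference between the greedy and non-greedy quantifier can be seen in a minimal sketch. The input line here is hypothetical (the `class` attribute is an assumption added so that a second pair of double quotes follows the address); the two patterns are the ones from the posts above, with the `i` flag.

```perl
use strict;
use warnings;

# Hypothetical input line: a second attribute adds more double quotes.
my $line = '<a href="mailto:fred@aol.com" class="contact">Fred</a>';

# Greedy: (.*) runs on to the LAST double quote on the line.
my ($greedy) = $line =~ /Mailto:(.*)"/i;

# Non-greedy: (.*?) stops at the FIRST double quote after the address.
my ($lazy) = $line =~ /Mailto:(.*?)"/i;

print "greedy: $greedy\n";   # greedy: fred@aol.com" class="contact
print "lazy:   $lazy\n";     # lazy:   fred@aol.com
```

Without the `i` flag neither pattern matches this line at all, because the text contains `mailto:` in lower case.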
-
Geoff Cox #4
Re: regex help!
On Sat, 13 Sep 2003 07:33:31 GMT, Michael Budash <mbudash@sonic.net>
wrote:
>/Mailto:(.*?)"/
>
>you know that won't match your example don't you? unless you add the 'i'
>flag (for 'i'gnore case):
>
>/Mailto:(.*?)"/i
>
>hth-

Michael,

Thanks for the help - the following code works now, but I get the
warning "uninitialized value in string ne at ..." for the line marked
with ** below - do you know why?

Cheers

Geoff

    use warnings;
    use strict;
    use File::Find;

    open (OUT, ">>out");
    my $dir = 'c:/atemp1/directory';

    find ( sub {
        open (IN, "$_");
        my $line = <IN>;
    **  while ($line ne "") {
            if ($line =~ /Mailto:(.*?)"/i) {
                print OUT ("$1 \n");
            }
            $line = <IN>;
        }
    }, $dir);

    close (OUT);
-
Andreas Kahari #5
Re: regex help!
In article <sfk5mvo202ccnl1b8tjv634fut1qvdo1nf@4ax.com>, Geoff Cox wrote:
[cut]
> Thanks for the help - following code works now but I get the error
> message "uninitialized value in string ne at ... the line with a **
> below - do you know why?
[cut]
> open (IN, "$_");
> my $line = <IN>;
> ** while ($line ne "") {
> if ($line =~ /Mailto:(.*?)"/i) {
> print OUT ("$1 \n");
[cut]

What happens at the end of a file? Well, <IN> will give you an
undefined value. This will also happen if the open() call failed.
--
Andreas Kähäri
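
A small sketch of what happens at end of file. It reads from an in-memory string rather than a real file, and the variable names are illustrative assumptions; the `$SIG{__WARN__}` handler is only there to capture the warning so it can be shown.

```perl
use strict;
use warnings;

# Read from an in-memory string so the sketch needs no file on disk.
my $text = "only line\n";
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!";

my $first  = <$in>;   # "only line\n"
my $second = <$in>;   # undef: nothing left to read

# Capture warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Comparing undef with ne treats it as "" (so the test is false and a loop
# like the one above ends), but it also triggers the warning.
my $keep_going = ($second ne "");

print "second read is ", (defined $second ? "defined" : "undef"), "\n";
print "warning: $warnings[0]" if @warnings;
close($in);
```

This is why the loop appears to work: the comparison still ends the loop, but only by way of an undefined value.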
-
Geoff Cox #6
Re: regex help!
On Sat, 13 Sep 2003 08:21:39 +0000 (UTC), Andreas Kahari
<ak+usenet@freeshell.org> wrote:
>In article <sfk5mvo202ccnl1b8tjv634fut1qvdo1nf@4ax.com>, Geoff Cox wrote:
>[cut]
>> Thanks for the help - following code works now but I get the error
>> message "uninitialized value in string ne at ... the line with a **
>> below - do you know why?
>[cut]
>> open (IN, "$_");
>> my $line = <IN>;
>> ** while ($line ne "") {
>> if ($line =~ /Mailto:(.*?)"/i) {
>> print OUT ("$1 \n");
>[cut]
>
>What happens at the end of a file? Well, <IN> will give you an
>undefined value. This will also happen if the open() call failed.

Andreas,

Ah! Well, the open call works, so it must be the end-of-file part. Is
there a better way than using while ($line ne "")? eof?

Geoff
-
Andreas Kahari #7
Re: regex help!
In article <28l5mvovmurpbjk33goktp83ee2iv9e06f@4ax.com>, Geoff Cox wrote:
> On Sat, 13 Sep 2003 08:21:39 +0000 (UTC), Andreas Kahari
><ak+usenet@freeshell.org> wrote:
>>In article <sfk5mvo202ccnl1b8tjv634fut1qvdo1nf@4ax.com>, Geoff Cox wrote:
>>[cut]
>>> open (IN, "$_");
>>> my $line = <IN>;
>>> ** while ($line ne "") {
>>> if ($line =~ /Mailto:(.*?)"/i) {
>>> print OUT ("$1 \n");
>>
>>
>>What happens at the end of a file? Well, <IN> will give you an
>>undefined value. This will also happen if the open() call failed.
[cut]
> Andreas,
>
> ah! well the open call works so must be the end of file part - is
> there a better way than using while ($line ne "" ) ? eof?

Yes, a much much better way:

    while(defined($line = <IN>)) {
        ... code ...
    }

And personally I would say

    open(IN, $_) or die "Failed in open(): $!";

Cheers,
Andreas
--
Andreas Kähäri
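
The two suggestions combine into a loop like the following sketch. The in-memory string stands in for one .htm file and is an assumption for illustration, as are the lexical filehandle `$in` and the `@found` array; the thread itself uses the bareword handle `IN` and prints to `OUT`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the contents of one .htm file.
my $html = qq{<p>intro</p>\n<a href="mailto:fred\@aol.com">Fred</a>\n<p>end</p>\n};

# Check the result of open() and report the reason on failure.
open(my $in, '<', \$html) or die "Failed in open(): $!";

my @found;
my $line;
while (defined($line = <$in>)) {      # ends cleanly at end of file, no warning
    if ($line =~ /Mailto:(.*?)"/i) {
        push @found, $1;
    }
}
close($in);

print "$_\n" for @found;              # fred@aol.com
```

Because the loop condition tests definedness rather than comparing against an empty string, the undefined value returned at end of file is never used in a string comparison.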
-
Geoff Cox #8
Re: regex help!
On Sat, 13 Sep 2003 08:39:03 +0000 (UTC), Andreas Kahari
<ak+usenet@freeshell.org> wrote:
>Yes, a much much better way:
>
> while(defined($line = <IN>)) {
> ... code ...
> }
>
>And personally I would say
>
> open(IN, $_) or die "Failed in open(): $!";
>
>
>Cheers,
>Andreas

will use both - thanks!

Geoff
-
Tad McClellan #9
Re: regex help!
Geoff Cox <geoff.cox@blueyonder.co.uk> wrote:
> On Sat, 13 Sep 2003 08:39:03 +0000 (UTC), Andreas Kahari
><ak+usenet@freeshell.org> wrote:

>> while(defined($line = <IN>)) {

I like this better:

    while ( my $line = <IN> ) {

>> ... code ...
>> }
>>
>>And personally I would say
>>
>> open(IN, $_) or die "Failed in open(): $!";

> will use both - thanks!

If you read the docs for the function that you used, then you
would have already known to check open()'s return value.
(there is a general-purpose lesson there...)

    perldoc -f open

    Open returns nonzero upon success, the undefined value otherwise.
    ...
    When opening a file, it's usually a bad idea to continue normal execution
    if the request failed, so C<open> is frequently used in connection with
    C<die>. Even if C<die> won't do what you want (say, in a CGI script,
    where you want to make a nicely formatted error message (but there are
    modules that can help with that problem)) you should always check
    the return value from opening a file.

--
    Tad McClellan                          SGML consulting
    tadmc@augustmail.com                   Perl programming
    Fort Worth, Texas
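
A short sketch of both points, under stated assumptions: the missing file name is made up and lives in a temporary directory, and the data is read from an in-memory string. Perl treats `while (my $line = <$fh>)` as if it were written with `defined(...)`, so a final line consisting of just `0` with no newline is still processed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A path that does not exist, inside a fresh temporary directory.
my $missing = tempdir(CLEANUP => 1) . '/no-such-file.htm';

# open() returns a false value on failure; $! holds the reason.
my $opened = open(my $fh, '<', $missing);
print "Skipping file: $!\n" unless $opened;

# In-memory data whose last line is "0" with no trailing newline.
my $data = "first\n0";
open(my $in, '<', \$data) or die "Failed in open(): $!";

my $count = 0;
while ( my $line = <$in> ) {   # implicit defined() test, so "0" is not skipped
    $count++;
}
close($in);

print "lines read: $count\n";  # lines read: 2
```

When a failed open should not stop the whole run, testing the return value and skipping the file, as above, is one alternative to `die`.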
-
Eric J. Roode #10
Re: regex help!
Geoff Cox <geoff.cox@blueyonder.co.uk> wrote in
news:lle5mvgf13133a618g0s3h560bpi1c535e@4ax.com:
> I am trying to extract email addresses from about 1000 htm files.
>
> So far am trying
>
> if ($line =~ /Mailto:(.*)"/) {
> print OUT ("$1 \n");
>
> where the line is
>
> <a href="mailto:fred@aol.com"
>
> problem is with the " after the email address and the "greedy" regex
> characteristic which finds other " further along the line ...
>
> can I stop at the first " mark?

Change your thinking a bit. Instead of matching "Mailto:" followed by as
many characters as possible followed by a quote, match "Mailto:" followed
by as many non-quote characters as possible followed by a quote:

    if ($line =~ /Mailto:([^"]*)"/)

Also consider making it case-insensitive with the i modifier.
--
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print
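
A minimal sketch of the negated character class with the `i` modifier applied. The input line is hypothetical; the upper-case `MAILTO:` and the `title` attribute are assumptions chosen to exercise the case-insensitive match and the extra quotes.

```perl
use strict;
use warnings;

# Hypothetical input line with mixed case and a second quoted attribute.
my $line = '<a href="MAILTO:fred@aol.com" title="Mail Fred">Fred</a>';

# [^"]* can never run past a double quote, so the capture ends at the
# first one even though the quantifier is greedy.
my ($address) = $line =~ /Mailto:([^"]*)"/i;

print "$address\n";   # fred@aol.com
```

Here the stopping point is stated in the pattern itself, so there is no need to rely on a non-greedy quantifier.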
-
Geoff Cox #11
Re: regex help!
On Sat, 13 Sep 2003 09:22:06 -0500, "Eric J. Roode"
<REMOVEsdnCAPS@comcast.net> wrote:
>Change your thinking a bit. Instead of matching "Mailto:" followed by as
>many characters as possible followed by a quote, match "Mailto:" followed
>by as many non-quote characters as possible followed by a quote:
>
> if ($line =~ /Mailto:([^"]*)"/)
>
>Also consider making it case-insensitive with the i modifier.

Thanks Eric - will give it a try...

Cheers

Geoff




