Janek Schleicher #1
Re: regexp help
sjaak wrote at Thu, 26 Jun 2003 12:35:08 +0200:
> Can anybody set me a little in the right direction on how to replace
> route.htm"></a> to generate
> route.htm">route</a>
>
> so
> .htm"></a>
> .html"></a>
>
> must be replaced with
>
> .htm">route</a>
> .html">route</a>

Simple way:

s{(\.html?">)(</a>)}{$1route$2}g;
If you do a lot of this kind of thing,
an HTML parser would be more useful.
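A minimal runnable sketch of the substitution above. The helper name fill_empty_links is hypothetical. Braces are used as delimiters so the / in </a> needs no escaping (with the default // delimiters it would end the pattern early), and ${1} makes the boundary between the capture variable and the literal text explicit.

```perl
use strict;
use warnings;

# Hypothetical helper: puts the link text "route" into empty links whose
# target ends in .htm or .html (the "l?" makes the final l optional).
sub fill_empty_links {
    my ($html) = @_;
    $html =~ s{(\.html?">)(</a>)}{${1}route$2}g;
    return $html;
}

print fill_empty_links('<a href="route.htm"></a> <a href="route.html"></a>'), "\n";
# prints: <a href="route.htm">route</a> <a href="route.html">route</a>
```

Links to other file types are left untouched, since the pattern only accepts .htm or .html right before the closing quote.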
> I don't know where to begin this with regexp.

Have you read perldoc perlre?
> To avoid further questions on this: does anyone know a good howto with many
> examples?

You might also read the excellent
"Mastering Regular Expressions" book by Jeffrey Friedl
(see perldoc -q book for all the details).
> Like this problem is a combination of regexps and I don't understand them
> all, so it's hard to do.

What combination?
> I just need them once a year; that's why I don't stay in touch with regexps.

IMHO, regular expressions only need to be understood deeply once. The
exact syntax is easy once the principle is understood. In fact, the exact
syntax differs between programs (vim, sed, grep, egrep, Java, Perl).
Also, I believe that using Perl implies using regexps more often
than once a year.
Greetings,
Janek
Janek Schleicher Guest
-
fatted #2
Regexp help
I'd like to try and combine what I currently have in 4 regexps into
(maybe) one regexp.
I'm trying to parse data from a string, where the string is in the
format:
s_left *or* s_left = s_right.
rules:
s_right may contain one or more equals signs.
In
s_left = s_right
white space between the '=' char and the adjacent non-white-space chars
(the last one of s_left, the first one of s_right) should be ignored (it is
not part of s_left or s_right).
If s_left contains an equals sign, s_left must be wrapped in double
quotes ("s_le=ft" = s_right).
Otherwise s_left is taken as beginning at the start of the line and
continuing up to the '=' char (excluding white space between the last
non-white-space char of s_left and the '=').
So, I have:
use warnings;
use strict;
# some test values
my @array = (
'"this = that"',
'log.log',
'start = "1"',
'sum=5+3=21',
'sql=SELECT * FROM table WHERE col17 = 2',
'"eq=als"=5=7',
'my name is = fatted',
'"1=1" = chicken'
);
foreach (@array)
{
    unless( /=/ )
    {
        print "[zero] ".$_."\n";
    }
    elsif( /^"(.+)"$/ )
    {
        print "[one] ".$1."\n";
    }
    elsif( /^"([^"]*)[^=]*=\s*(.+)/ )
    {
        print "[two] ".$1."[]".$2."\n";
    }
    elsif( /^([^=]*)=\s*(.+)/ )
    {
        my $one = $1;
        my $two = $2;
        $one =~ s/\s+$//;
        print "[three] ".$one."[]".$two."\n";
    }
}
which gives:
[one] this = that
[zero] log.log
[three] start[]"1"
[three] sum[]5+3=21
[three] sql[]SELECT * FROM table WHERE col17 = 2
[two] eq=als[]5=7
[three] my name is[]fatted
[two] 1=1[]chicken
Which is what I require. But I'd now like to see if I could write the
regexp in one go. I'm not having much luck; for instance, here is
one attempt to combine the last 2 regexps:
elsif( /^("|)([^"]*|[^=]*)[^=]*=\s*(.+)/)
{
# same processing as in last elsif block above
}
Doesn't work :( (not the same results as above)
fatted Guest
-
Brian McCauley #3
Re: Regexp help
obeseted@yahoo.com (fatted) writes:
> I'd like to try and combine what I currently have in 4 regexps into
> (maybe) one regexp.
> [snip]
> Which is what I require. But I'd now like to see if I could write the
> regexp in one go,

foreach (@array)
{
if ( my ( $left,$right) = /^(\"[^\"]+\"|[^\"][^=]+?)\s*(?:=\s*(.*)\s*)?$/ ) {
$left =~ s/^"(.*)"$/$1/;
if ( defined $right ) {
print "[two] $left\[]$right\n";
} else {
print "[one] $left\n";
}
}else {
# Invalid input!
print "[zero] $_\n";
}
}
Produces:
[one] this = that
[one] log.log
[two] start[]"1"
[two] sum[]5+3=21
[two] sql[]SELECT * FROM table WHERE col17 = 2
[two] eq=als[]5=7
[two] my name is[]fatted
[two] 1=1[]chicken
Note: double quote does not need to be backslashed in // but it's a
courtesy to do so for the benefit of people using broken
auto-indenters.
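The single regex above is dense, so here is a sketch of the same pattern spread out with the /x modifier so each alternative can carry a comment. parse_pair is a hypothetical wrapper. The trailing \s* inside the optional group is dropped because the greedy (.*) already consumes trailing blanks, which leaves the captures unchanged.

```perl
use strict;
use warnings;

# Hypothetical parse_pair(): the single regex from the post, spread out with
# /x. Returns (left) for a bare value, (left, right) for "left = right",
# or an empty list for input the pattern rejects.
sub parse_pair {
    my ($line) = @_;
    my ($left, $right) = $line =~ m{
        ^
        (                     # capture the left-hand side, either
            "[^"]+"           #   a double-quoted string, which may contain '=',
          | [^"][^=]+?        #   or unquoted text, matched lazily so trailing
        )                     #   blanks are left for the next item
        \s*                   # blanks before '=' are not part of the left side
        (?: = \s* (.*) )?     # optional '=' plus the right-hand side
        $
    }x or return;
    $left =~ s/^"(.*)"$/$1/;  # drop the surrounding quotes, if any
    return defined $right ? ($left, $right) : ($left);
}

for my $test ('"this = that"', 'log.log', 'start = "1"', '"eq=als"=5=7') {
    my @parts = parse_pair($test);
    print @parts ? join('[]', @parts) : "invalid: $test", "\n";
}
```

As in the original, the unquoted branch needs at least two characters, so a line such as a=b is reported as invalid; writing [^=]*? instead of [^=]+? would relax that.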
--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
Brian McCauley Guest
-
Anno Siegel #4
Re: Regexp help
Brian McCauley <nobull@mail.com> wrote in comp.lang.perl.misc:
> Note: double quote does not need to be backslashed in // but it's a
> courtesy to do so for the benefit of people using broken
> auto-indenters.

I object.
All auto-indenters, syntax highlighters and their ilk are broken in some
respect ("Only perl can parse Perl"). If we start accommodating their
glitches, we'll end up with coding conventions that are completely
irrational from a Perl point of view.
Anno
Anno Siegel Guest
-
raven #5
Regexp Help
The output of lpstat -p on hpux returns this:
printer sacprn05 is idle. enabled since Jul 30 12:23
fence priority : 0
I am attempting to grab the printer name and whether or not it is
idle. The code to do this is:
if (/^printer (\w+).*(enabled|disabled)/)
Is there a more efficient way to obtain the desired information using
a Perl regexp?
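The question asks whether the printer is idle, but the pattern captures enabled/disabled. A minimal sketch that reports both, assuming every printer line has the shape of the sample above (other lpstat wordings are not handled); parse_printer_line is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical parse_printer_line(): assumes lines shaped like the sample
#   printer sacprn05 is idle. enabled since Jul 30 12:23
# Returns (name, is_idle, status) or an empty list for other lines.
sub parse_printer_line {
    my ($line) = @_;
    my ($name, $rest) = $line =~ /^printer\s+(\S+)\s+(.*)/ or return;
    my $is_idle  = $rest =~ /\bis idle\b/ ? 1 : 0;
    my ($status) = $rest =~ /\b(enabled|disabled)\b/;
    return ($name, $is_idle, $status);
}

my ($name, $idle, $status) =
    parse_printer_line('printer sacprn05 is idle. enabled since Jul 30 12:23');
print "$name idle=$idle status=$status\n";    # sacprn05 idle=1 status=enabled
```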
raven Guest
-
John Bokma #6
Re: Regexp Help
raven wrote:
> The output of lpstat -p on hpux returns this:
>
> printer sacprn05 is idle. enabled since Jul 30 12:23
> fence priority : 0
>
> I am attempting to grab the printer name and whether or not it is
> idle. The code to do this is:
  ^^^^
idle...

> if (/^printer (\w+).*(enabled|disabled)/)

enabled/disabled... I am confused.

> Is there a more efficient way to obtain the desired information using
> a Perl regexp?

Don't know if changing .* to .*? speeds things up.
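One way to find out is to measure. A sketch using the core Benchmark module's cmpthese(), with the sample line from the question; the pattern labels are arbitrary, and timings vary with machine and perl version, so no results are shown here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The sample line from the original post.
my $line = 'printer sacprn05 is idle. enabled since Jul 30 12:23';

# The patterns under discussion; the hash keys are arbitrary labels.
my %patterns = (
    greedy => qr/^printer (\w+).*(enabled|disabled)/,
    lazy   => qr/^printer (\w+).*?(enabled|disabled)/,
    short  => qr/^printer\s(\S+)/,
);

# Sanity check first: every pattern must capture the printer name.
for my $label (sort keys %patterns) {
    my ($printer) = $line =~ $patterns{$label};
    die "$label failed\n" unless defined $printer && $printer eq 'sacprn05';
}

# One closure per pattern, each doing a list-context match like the original.
my %subs = map { my $re = $patterns{$_}; ($_ => sub { my @m = $line =~ $re }) }
           keys %patterns;

# A negative count means "run each for at least that many CPU seconds".
cmpthese(-1, \%subs);
```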
--
Kind regards, feel free to mail: mail(at)johnbokma.com (or reply)
virtual home: http://johnbokma.com/ ICQ: 218175426
John web site hints: http://johnbokma.com/websitedesign/
John Bokma Guest
-
John Bokma #7
Re: Regexp Help
David Oswald wrote:

> "raven" <raven_riverwind@yahoo.com> wrote in message
>>> The output of lpstat -p on hpux returns this:
>>> printer sacprn05 is idle. enabled since Jul 30 12:23
>>> fence priority : 0
[snip]
> be too broad. For example, you might actually mean [a-zA-Z], or you might

Nope, the 05 suggests that at least digits can appear in the name :-)

Thanks Dave, this was very clear and educational!
John Bokma Guest
-
Lao Coon #8
Re: Regexp Help
raven_riverwind@yahoo.com (raven) wrote in
news:7270d1f8.0309050952.3154538e@posting.google.com:

> [snip]
> if (/^printer (\w+).*(enabled|disabled)/)
>
> Is there a more efficient way to obtain the desired information using
> a Perl regexp?

E.g. /^printer\s(\S+)/. Shorter is usually better, if you have little or
no variation in the input.
perl -MBenchmark -e "$a = 'printer sacprn05 is idle. enabled since Jul 30 12:23'; timethese(500000, { 'First' => '$a =~ /^printer (\w+).*(enabled|disabled)/;', 'Second' => '$a =~ /^printer\s(\S+)/;' });"

Benchmark: timing 500000 iterations of First, Second...
First: 8 wallclock secs ( 7.19 usr + 0.02 sys = 7.21 CPU) @ 69338.51/s (n=500000)
Second: 1 wallclock secs ( 1.76 usr + 0.00 sys = 1.76 CPU) @ 283607.49/s (n=500000)
Lao Coon Guest
-
Jay Tilton #9
Re: Regexp Help
raven_riverwind@yahoo.com (raven) wrote:
: The output of lpstat -p on hpux returns this:
:
: printer sacprn05 is idle. enabled since Jul 30 12:23
: fence priority : 0
:
: I am attempting to grab the printer name and whether or not it is
: idle. The code to do this is:
:
: if (/^printer (\w+).*(enabled|disabled)/)
:
: Is there a more efficient way to obtain the desired information using
: a Perl regexp?
What is your gauge of efficiency? I have to guess you want to
minimize runtime.
Why are you worrying about a match operation that takes microseconds
to perform? Unless it's going to happen a quadrillion times a day,
fuggetaboutit.
Anyway, the real time-killer will be invoking the external 'lpstat'
command.
Jay Tilton Guest
-
raven #10
Re: Regexp Help
Thanks for the enlightening reply. Sorry for being vague in my
specification. I am simply tracking the printers, their IPs and the
current disabled/enabled status. In the future I'm sure I'll add more
functionality. Below is the output of the various commands and the
hack.
lpstat -vsacprn05
device for sacprn05: /dev/null
remote to: RNP6C8BF9 on 192.168.25.73
#!/usr/contrib/bin/perl -w
# Printer interfaces directory
my $interface = "/etc/lp/interface";
# Printer Array
my @printers = ();
# Get a list of printers
open PRINTER, "lpstat -p |";
while (<PRINTER>) {
if (/^printer (\w+).*?(enabled|disabled)/) {
# Push an anonymous hash REFERENCE onto the array
push @printers, { "name" => "$1", "ip" => "none", "status" =>
"$2" };
}
}
close PRINTER;
# Gather the IP address of each printer
foreach $printer (@printers) {
open PRINTER, "lpstat -v$$printer{name} |";
while (<PRINTER>) {
next if /^device.*/;
if (/^\s+remote.*?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})/) {
$$printer{ip} = $1;
last;
}
}
close PRINTER;
# We hope that the printers with 'none' for IP are JetDirect
if ($$printer{ip} =~ /none/) {
open INTERFACE, "$interfaces/$$printer{name}" || die
"Cannot open file!\n";
while (<INTERFACE>) {
if (/^PERIPH=(.*)/) {
$$printer{ip} = $1;
last;
}
}
close INTERFACE;
}
}
# Display the results
foreach $printer (@printers) {
print "$$printer{name} / $$printer{ip} / $$printer{status}\n";
}
raven Guest
-
John W. Krahn #11
Re: Regexp Help
raven wrote:
> [snip]
> #!/usr/contrib/bin/perl -w
>
> # Printer interfaces directory
> my $interface = "/etc/lp/interface";
> # Printer Array
> my @printers = ();
>
> # Get a list of printers
> open PRINTER, "lpstat -p |";

You should _ALWAYS_ verify that the pipe opened correctly.
> while (<PRINTER>) {
> if (/^printer (\w+).*?(enabled|disabled)/) {
> # Push an anonymous hash REFERENCE onto the array
> push @printers, { "name" => "$1", "ip" => "none", "status" =>
> "$2" };

Excessive quoting. The only thing that needs to be quoted is the string
'none'.
> }
> }
> close PRINTER;

You should _ALWAYS_ verify that the pipe closed correctly.
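Why checking close matters for a pipe: close waits for the child and returns false when the command exits with a non-zero status, which is then available in $?. A sketch that uses a small perl child ($^X, exiting with status 3) as a stand-in for a failing lpstat; the list form of a '-|' open assumes a Unix-like system.

```perl
use strict;
use warnings;

# $^X is the running perl binary; the child prints one line and exits with
# status 3, standing in for a command such as lpstat that fails.
open my $pipe, '-|', $^X, '-e', 'print "some output\n"; exit 3'
    or die "Cannot start child: $!";
my @lines = <$pipe>;

# close on a pipe waits for the child; it returns false if the child exited
# non-zero, and the exit status is then in the high byte of $?.
my $closed = close $pipe;
my $status = $? >> 8;
print $closed ? "close ok\n" : "close failed, child exit status $status\n";
```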
> # Gather the IP address of each printer
> foreach $printer (@printers) {
> open PRINTER, "lpstat -v$$printer{name} |";

You should _ALWAYS_ verify that the pipe opened correctly.
> while (<PRINTER>) {
> next if /^device.*/;
> if (/^\s+remote.*?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})/) {
                            ^       ^       ^
Dots are special in regular expressions. You need to escape them to
match a literal dot character.
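A sketch of the difference, using the lpstat -v output from raven's post. The leading-whitespace part is written as \s* rather than \s+ (an assumption, since the sample output shown has no indentation).

```perl
use strict;
use warnings;

# Sample output from the post.
my $lpstat_v = "device for sacprn05: /dev/null\nremote to: RNP6C8BF9 on 192.168.25.73\n";

my $unescaped = qr/(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})/;     # "." matches any character
my $escaped   = qr/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/;  # "\." matches only a dot

# /m lets ^ match at the start of every line, not just the first.
my ($ip) = $lpstat_v =~ /^\s*remote.*?$escaped/m;
print "ip: $ip\n";    # prints "ip: 192.168.25.73"

# Only the unescaped version also accepts something that is not an address.
print "unescaped matches 1x2y3z4\n" if '1x2y3z4' =~ $unescaped;
print "escaped matches 1x2y3z4\n"   if '1x2y3z4' =~ $escaped;
```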
> $$printer{ip} = $1;
> last;
> }
> }
> close PRINTER;

You should _ALWAYS_ verify that the pipe closed correctly.
> # We hope that the printers with 'none' for IP are JetDirect
> if ($$printer{ip} =~ /none/) {
> open INTERFACE, "$interfaces/$$printer{name}" || die
> "Cannot open file!\n";

Using the || operator there won't do what you expect. You need to
either use the 'or' operator or parenthesize the open function.
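A sketch showing the precedence problem. The path is hypothetical and assumed not to exist on the machine running it.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist.
my $missing = '/no/such/dir/printer-interface';

# || binds tighter than the list operator open, so this parses as
#   open(my $fh, '<', ($missing || die ...))
# and die is never evaluated because $missing is a true string.
my $survived = eval { open my $fh, '<', $missing || die "not reached\n"; 1 };

# or has the lowest precedence, so die runs when open itself returns false.
my $caught = eval { open my $fh, '<', $missing or die "Cannot open $missing: $!\n"; 1 };
my $error  = $@;

print $survived ? "|| version: open failed silently\n" : "|| version died\n";
print defined $caught ? "or version did not die\n" : "or version died: $error";
```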
> while (<INTERFACE>) {
> if (/^PERIPH=(.*)/) {
> $$printer{ip} = $1;
> last;
> }
> }
> close INTERFACE;
> }
> }
>
> # Display the results
> foreach $printer (@printers) {
> print "$$printer{name} / $$printer{ip} / $$printer{status}\n";
> }
It looks like you just need one loop for all that:
#!/usr/contrib/bin/perl -w
use strict;

# Printer interfaces directory
my $interfaces = '/etc/lp/interface';
# Printer Array
my @printers;

# Get a list of printers
open PRINTER, 'lpstat -p |' or die "Cannot open pipe from lpstat: $!";
while ( <PRINTER> ) {
    next unless /^printer (\w+).*?((?:dis|en)abled)/;
    # Push an anonymous hash REFERENCE onto the array
    push @printers, { name => $1, ip => 'none', status => $2 };
    if ( `lpstat -v $printers[-1]{name}` =~
            /^\s+remote.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/m ) {
        $printers[ -1 ]{ ip } = $1;
    }
    else {
        # Slurp the whole interface file so /m can search every line
        local $/;
        open INTERFACE, "$interfaces/$printers[-1]{name}"
            or die "Cannot open $interfaces/$printers[-1]{name}: $!";
        $printers[ -1 ]{ ip } = $1 if <INTERFACE> =~ /^PERIPH=(.*)/m;
        close INTERFACE;
    }
}
close PRINTER or die "Cannot close pipe from lpstat: $!";

# Display the results
foreach my $printer ( @printers ) {
    print join( ' / ', @{$printer}{ qw/name ip status/ } ), "\n";
}
John
--
use Perl;
program
fulfillment
John W. Krahn Guest
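The display loop in the rewrite uses a hash slice through a reference, @{$printer}{qw/name ip status/}, which returns the listed values in the listed order. A sketch with one made-up record, using values from the sample lpstat output in this thread.

```perl
use strict;
use warnings;

# One record shaped like those pushed onto @printers.
my $printer = { name => 'sacprn05', ip => '192.168.25.73', status => 'enabled' };

# Hash slice through a reference: three values, in the order of the keys given.
my @fields = @{$printer}{qw/name ip status/};
print join(' / ', @fields), "\n";    # sacprn05 / 192.168.25.73 / enabled
```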
-
Marcelo #12
Regexp help
Which regular expression would you use to remove the <title> and </title> from a line like this one:
<title>Here goes a webpage's title</title>
Thanks a lot in advance.
Marcelo Guest
-
Andrew Gaffney #13
Re: Regexp help
Marcelo wrote:
> Which regular expression would you use to remove the <title> and </title> from a line like this one:
>
> <title>Here goes a webpage's title</title>
>
> Thanks a lot in advance.

Try something like:

s/<\/?title>//g

Although, I can't remember if < and > are special regex characters, so you might need to
escape them (\< and \>).
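In Perl patterns < and > are ordinary characters, so they need no escaping. A sketch showing why the /g modifier matters here: without it only the first tag is removed. Braces are used as delimiters so the / needs no backslash either.

```perl
use strict;
use warnings;

my $line = q{<title>Here goes a webpage's title</title>};

# Without /g only the first match (<title>) is removed.
(my $once = $line) =~ s{</?title>}{};
# With /g every opening and closing title tag is removed.
(my $all = $line) =~ s{</?title>}{}g;

print "$once\n";   # Here goes a webpage's title</title>
print "$all\n";    # Here goes a webpage's title
```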
--
Andrew Gaffney
System Administrator
Skyline Aeronautics, LLC.
776 North Bell Avenue
Chesterfield, MO 63005
636-357-1548
Andrew Gaffney Guest
-
John McKown #14
Re: Regexp help
On Sat, 24 Jan 2004, Marcelo wrote:
> Which regular expression would you use to remove the <title> and
> </title> from a line like this one:
>
> <title>Here goes a webpage's title</title>
>
> Thanks a lot in advance.

Do you mean that _exact_ input, i.e. always <title>...</title>? If so,
that's rather easy.
$line =~ s/<title>(.*)<\/title>/$1/
Now, if you want the more general form of <any_tag>...</any_tag>, that is,
removing paired HTML tags, that's more difficult. Luckily, there is an
example in "Programming Perl, 3rd Edition" on page 184 which is close.

$line =~ s/<(.*?)>(.*?)<\/\1>/$2/

In sort-of English, this says:
Match a < and everything up to the next >, calling what's in between $1
(or \1). Then match as little as possible, calling it $2, up to a <
followed by a /, followed by what you matched first (in $1 or \1),
followed by a >. Now, replace all of that with $2.

A problem with this pattern is that each substitution removes only one
pair of tags. With input such as:

<title><B>Title</B></title>

you'd end up removing the <title> and </title>, but leaving the <B> and
</B>. Of course, if your desire is to remove all paired HTML tags,
then put this in a loop until it no longer matches.
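A sketch of that loop using the idiom 1 while s///. Here the tag name is restricted to \w+ (an assumption: plain tags without attributes), and strip_paired_tags is a hypothetical name. Tags with attributes, such as <a href="...">, are left alone by this pattern.

```perl
use strict;
use warnings;

# Hypothetical strip_paired_tags(): repeats the backreference substitution
# until it no longer matches. Assumes simple tags without attributes.
sub strip_paired_tags {
    my ($html) = @_;
    # Each pass removes one <tag>...</tag> pair; "1 while" loops until s/// fails.
    1 while $html =~ s{<(\w+)>(.*?)</\1>}{$2};
    return $html;
}

print strip_paired_tags('<title><B>Title</B></title>'), "\n";   # prints "Title"
```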
HTH,
--
Maranatha!
John McKown
John McKown Guest
-
Jan Eden #15
Re: Regexp help
John McKown wrote:
>Now, if you want the more general form of <any_tag>...</any_tag>, that is
>removing paired HTML tags, that's more difficult.
>[snip]

I remember reading that using regexes to parse HTML is not reliable. You should use HTML::Parse from CPAN.
HTH,
Jan
--
Either this man is dead or my watch has stopped. - Groucho Marx
Jan Eden Guest




