
  1. #1

    Default Re: regexp help

    sjaak wrote at Thu, 26 Jun 2003 12:35:08 +0200:
    > Can anybody point me in the right direction on how to replace
    > route.htm"></a> to generate
    > route.htm">route</a>
    >
    > so
    > .htm"></a>
    > .html"></a>
    >
    > must be replaced with
    >
    > .htm">route</a>
    > .html">route</a>
    Simple way:

    s/(\.html?">)(<\/a>)/$1route$2/g;

    If you do a lot of such stuff,
    an HTML parser would be more useful.
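
For a one-off job, the substitution can also be written so that the link text is taken from the file name instead of being hard-coded. This is only a sketch: it assumes the wanted link text is the base name of the target file (route for route.htm), which is a guess from the example in the question, and `$html` is made-up sample input.

```perl
use strict;
use warnings;

# Hypothetical sample input: two links with empty link text.
my $html = '<a href="route.htm"></a> <a href="map.html"></a>';

# s{...}{...} delimiters mean the slash in </a> needs no escaping.
# $1 = base name, $2 = extension plus the end of the attribute, $3 = </a>.
(my $fixed = $html) =~ s{(\w+)(\.html?">)(</a>)}{$1$2$1$3}g;

print "$fixed\n";
# prints: <a href="route.htm">route</a> <a href="map.html">map</a>
```

The copy-then-substitute idiom `(my $fixed = $html) =~ s{...}{...}g` leaves the original string untouched.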
    > I don't know where to begin this with regexp.
    Have you read
    perldoc perlre
    ?
    > To avoid further questions on this, does anyone know a good howto with many
    > examples?
    You might also read the excellent
    "Mastering Regular Expressions" book by J. Friedl
    (read perldoc -q book for all details)
    > This problem is a combination of regexps and I don't understand them
    > all, so it's hard to do.
    What combination?
    > I just need them once a year, that's why I don't stay in touch with regexps.
    IMHO, regular expressions only need to be understood deeply once. The
    exact syntax is easy if the principle is understood. In fact, the exact
    syntax differs between programs (vim, sed, grep, egrep, Java, Perl).

    Also, I believe using Perl would also indicate using regexps more often
    than once a year.


    Greetings,
    Janek
    Janek Schleicher Guest

  3. #2

    Default Regexp help

    I'd like to try and combine what I currently have in 4 regexps into
    (maybe) one regexp.

    I'm trying to parse data from a string, where the string is in the
    format:
    s_left *or* s_left = s_right.
    rules:
    s_right may have one or more equals in it.
    In a
    s_left = s_right
    white space between the last/first non-white space char and the '='
    char should be ignored (not part of s_left or s_right).
    If s_left has an equals sign in it, s_left should be wrapped in double
    quotes ("s_le=ft" = s_right)
    Otherwise s_left is taken as beginning at the start of line, and
    continuing up to the '=' char. (excluding white space between the last
    non-white space char of s_left and the '=')

    So, I have:

    use warnings;
    use strict;

    # some test values
    my @array = (
        '"this = that"',
        'log.log',
        'start = "1"',
        'sum=5+3=21',
        'sql=SELECT * FROM table WHERE col17 = 2',
        '"eq=als"=5=7',
        'my name is = fatted',
        '"1=1" = chicken'
    );

    foreach (@array)
    {
        unless( /=/ )
        {
            print "[zero] ".$_."\n";
        }
        elsif( /^"(.+)"$/ )
        {
            print "[one] ".$1."\n";
        }
        elsif( /^"([^"]*)[^=]*=\s*(.+)/ )
        {
            print "[two] ".$1."[]".$2."\n";
        }
        elsif( /^([^=]*)=\s*(.+)/ )
        {
            my $one = $1;
            my $two = $2;
            $one =~ s/\s+$//;
            print "[three] ".$one."[]".$two."\n";
        }
    }

    which gives:
    [one] this = that
    [zero] log.log
    [three] start[]"1"
    [three] sum[]5+3=21
    [three] sql[]SELECT * FROM table WHERE col17 = 2
    [two] eq=als[]5=7
    [three] my name is[]fatted
    [two] 1=1[]chicken

    Which is what I require. But I'd now like to see if I could write the
    regexp in one go, but I'm not having much luck, for instance:

    one attempt to combine the last 2 regexps:

    elsif( /^("|)([^"]*|[^=]*)[^=]*=\s*(.+)/)
    {
    # same processing as in last elsif block above
    }

    Doesn't work :( (not same results as above)
    fatted Guest

  4. #3

    Default Re: Regexp help

    obeseted@yahoo.com (fatted) writes:
    > I'd like to try and combine what I currently have in 4 regexps into
    > (maybe) one regexp.
    >
    > I'm trying to parse data from a string, where the string is in the
    > format:
    > s_left *or* s_left = s_right.
    > rules:
    > s_right may have one or more equals in it.
    > In a
    > s_left = s_right
    > white space between the last/first non-white space char and the '='
    > char should be ignored (not part of s_left or s_right).
    > If s_left has a equals in it, s_left should be wrapped in double
    > quotes ("s_le=ft" = s_right)
    > Otherwise s_left is taken as beginning at the start of line, and
    > continuing up to the '=' char. (excluding white space between the last
    > non-white space from the sentence_left and the '=')
    >
    > So, I have:
    >
    > use warnings;
    > use strict;
    >
    > # some test values
    > my @array = (
    > '"this = that"',
    > 'log.log',
    > 'start = "1"',
    > 'sum=5+3=21',
    > 'sql=SELECT * FROM table WHERE col17 = 2',
    > '"eq=als"=5=7',
    > 'my name is = fatted',
    > '"1=1" = chicken'
    > );
    >
    > foreach (@array)
    > {
    > unless( /=/ )
    > {
    > print "[zero] ".$_."\n";
    > }
    > elsif( /^"(.+)"$/ )
    > {
    > print "[one] ".$1."\n";
    > }
    > elsif( /^"([^"]*)[^=]*=\s*(.+)/ )
    > {
    > print "[two] ".$1."[]".$2."\n";
    > }
    > elsif( /^([^=]*)=\s*(.+)/ )
    > {
    > my $one = $1;
    > my $two = $2;
    > $one =~ s/\s+$//;
    > print "[three] ".$one."[]".$two."\n";
    > }
    > }
    >
    > which gives:
    > [one] this = that
    > [zero] log.log
    > [three] start[]"1"
    > [three] sum[]5+3=21
    > [three] sql[]SELECT * FROM table WHERE col17 = 2
    > [two] eq=als[]5=7
    > [three] my name is[]fatted
    > [two] 1=1[]chicken
    >
    > Which is what I require. But I'd now like to see if I could write the
    > regexp in one go,
    foreach (@array)
    {
        if ( my ( $left,$right) = /^(\"[^\"]+\"|[^\"][^=]+?)\s*(?:=\s*(.*)\s*)?$/ ) {
            $left =~ s/^"(.*)"$/$1/;
            if ( defined $right ) {
                print "[two] $left\[]$right\n";
            } else {
                print "[one] $left\n";
            }
        } else {
            # Invalid input!
            print "[zero] $_\n";
        }
    }

    Produces:

    [one] this = that
    [one] log.log
    [two] start[]"1"
    [two] sum[]5+3=21
    [two] sql[]SELECT * FROM table WHERE col17 = 2
    [two] eq=als[]5=7
    [two] my name is[]fatted
    [two] 1=1[]chicken

    Note: double quote does not need to be backslashed in // but it's a
    courtesy to do so for the benefit of people using broken
    auto-indenters.

    --
    \\ ( )
    . _\\__[oo
    .__/ \\ /\@
    . l___\\
    # ll l\\
    ###LL LL\\
    Brian McCauley Guest

  5. #4

    Default Re: Regexp help

    Brian McCauley <nobull@mail.com> wrote in comp.lang.perl.misc:
    > Note: double quote does not need to be backslashed in // but it's a
    > courtesy to do so for the benefit of people using broken
    > auto-indenters.
    I object.

    All auto-indenters, syntax-highlighters and their ilk are broken in some
    respect. ("Only perl can parse Perl") If we start accommodating their
    glitches, we'll end up with coding conventions that are completely
    irrational from a Perl point of view.

    Anno
    Anno Siegel Guest

  6. #5

    Default Regexp Help

    The output of lpstat -p on hpux returns this:

    printer sacprn05 is idle. enabled since Jul 30 12:23
    fence priority : 0

    I am attempting to grab the printer name and whether or not it is
    idle. The code to do this is:

    if (/^printer (\w+).*(enabled|disabled)/)

    Is there a more efficient way to obtain the desired information using
    a Perl regexp?
    raven Guest

  7. #6

    Default Re: Regexp Help

    raven wrote:
    > The output of lpstat -p on hpux returns this:
    >
    > printer sacprn05 is idle. enabled since Jul 30 12:23
    > fence priority : 0
    >
    > I am attempting to grab the printer name and whether or not it is
    > idle. The code to do this is:
    ^^^^

    idle...
    > if (/^printer (\w+).*(enabled|disabled)/)
    enabled/disabled... I am confused
    > Is there a more efficient way to obtain the desired information using
    > a Perl regexp?
    Don't know if changing .* to .*? speeds things up.
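
One way to find out is to measure it. The following is a rough sketch using the core Benchmark module; the sample line is the lpstat output quoted above, and the timings depend entirely on the machine and perl version, so no figures are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample line taken from the lpstat -p output quoted in the thread.
my $line = 'printer sacprn05 is idle. enabled since Jul 30 12:23';

# Both patterns capture the same two fields on this input.
my @greedy = $line =~ /^printer (\w+).*(enabled|disabled)/;
my @lazy   = $line =~ /^printer (\w+).*?(enabled|disabled)/;
print "greedy: @greedy\nlazy:   @lazy\n";

# Run each variant for at least one CPU second and compare the rates.
cmpthese(-1, {
    greedy => sub { $line =~ /^printer (\w+).*(enabled|disabled)/ },
    lazy   => sub { $line =~ /^printer (\w+).*?(enabled|disabled)/ },
});
```

The greedy `.*` first runs to the end of the line and backtracks to the last "enabled" or "disabled", while `.*?` stops at the first one; on a line with only one status word the captures are identical.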

    --
    Kind regards, feel free to mail: mail(at)johnbokma.com (or reply)
    virtual home: http://johnbokma.com/ ICQ: 218175426
    John web site hints: http://johnbokma.com/websitedesign/

    John Bokma Guest

  8. #7

    Default Re: Regexp Help

    David Oswald wrote:
    > "raven" <raven_riverwind@yahoo.com> wrote in message
    >
    >>The output of lpstat -p on hpux returns this:
    >
    >
    >>printer sacprn05 is idle. enabled since Jul 30 12:23
    >> fence priority : 0
    [snip]
    > be too broad. For example, you might actually mean [a-zA-Z], or you might
    nope, 05 seems to suggest at least digits can appear in the name :-)

    Thanks Dave, this was very clear and educational!

    --
    Kind regards, feel free to mail: mail(at)johnbokma.com (or reply)
    virtual home: http://johnbokma.com/ ICQ: 218175426
    John web site hints: http://johnbokma.com/websitedesign/

    John Bokma Guest

  9. #8

    Default Re: Regexp Help

    raven_riverwind@yahoo.com (raven) wrote in
    news:7270d1f8.0309050952.3154538e@posting.google.com:
    > The output of lpstat -p on hpux returns this:
    >
    > printer sacprn05 is idle. enabled since Jul 30 12:23
    > fence priority : 0
    >
    > I am attempting to grab the printer name and whether or not it is
    > idle. The code to do this is:
    >
    > if (/^printer (\w+).*(enabled|disabled)/)
    >
    > Is there a more efficient way to obtain the desired information using
    > a Perl regexp?
    E.g. /^printer\s(\S+)/. Shorter is usually better, if you have little or
    no variation in the input.

    perl -MBenchmark -e "$a = 'printer sacprn05 is idle. enabled since Jul 30 12:23'; timethese(500000, {
    'First' => '$a =~ /^printer (\w+).*(enabled|disabled)/;',
    'Second' => '$a =~ /^printer\s(\S+)/;'
    });"

    Benchmark: timing 500000 iterations of First, Second...
    First: 8 wallclock secs ( 7.19 usr + 0.02 sys = 7.21 CPU) @ 69338.51/s (n=500000)
    Second: 1 wallclock secs ( 1.76 usr + 0.00 sys = 1.76 CPU) @ 283607.49/s (n=500000)
    Lao Coon Guest

  10. #9

    Default Re: Regexp Help

    raven_riverwind@yahoo.com (raven) wrote:

    : The output of lpstat -p on hpux returns this:
    :
    : printer sacprn05 is idle. enabled since Jul 30 12:23
    : fence priority : 0
    :
    : I am attempting to grab the printer name and whether or not it is
    : idle. The code to do this is:
    :
    : if (/^printer (\w+).*(enabled|disabled)/)
    :
    : Is there a more efficient way to obtain the desired information using
    : a Perl regexp?

    What is your gauge of efficiency? I have to guess you want to
    minimize runtime.

    Why are you worrying about a match operation that takes microseconds
    to perform? Unless it's going to happen a quadrillion times a day,
    fuggetaboutit.

    Anyway, the real time-killer will be invoking the external 'lpstat'
    command.

    Jay Tilton Guest

  11. #10

    Default Re: Regexp Help

    Thanks for the enlightening reply. Sorry for being vague in my
    specification. I am simply tracking the printers, their IPs and the
    current disabled/enabled status. In the future I'm sure I'll add more
    functionality. Below is the output of the various commands and the
    hack.

    lpstat -vsacprn05
    device for sacprn05: /dev/null
    remote to: RNP6C8BF9 on 192.168.25.73


    #!/usr/contrib/bin/perl -w

    # Printer interfaces directory
    my $interface = "/etc/lp/interface";
    # Printer Array
    my @printers = ();

    # Get a list of printers
    open PRINTER, "lpstat -p |";
    while (<PRINTER>) {
    if (/^printer (\w+).*?(enabled|disabled)/) {
    # Push an anonymous hash REFERENCE onto the array
    push @printers, { "name" => "$1", "ip" => "none", "status" =>
    "$2" };
    }
    }
    close PRINTER;

    # Gather the IP address of each printer
    foreach $printer (@printers) {
    open PRINTER, "lpstat -v$$printer{name} |";
    while (<PRINTER>) {
    next if /^device.*/;
    if (/^\s+remote.*?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})/) {
    $$printer{ip} = $1;
    last;
    }
    }
    close PRINTER;
    # We hope that the printers with 'none' for IP are JetDirect
    if ($$printer{ip} =~ /none/) {
    open INTERFACE, "$interfaces/$$printer{name}" || die
    "Cannot open file!\n";
    while (<INTERFACE>) {
    if (/^PERIPH=(.*)/) {
    $$printer{ip} = $1;
    last;
    }
    }
    close INTERFACE;
    }
    }

    # Display the results
    foreach $printer (@printers) {
    print "$$printer{name} / $$printer{ip} / $$printer{status}\n";
    }
    raven Guest

  12. #11

    Default Re: Regexp Help

    raven wrote:
    >
    > Thanks for the enlightening reply. Sorry for being vague in my
    > specification. I am simply tracking the printers, their IPs and the
    > current disabled/enabled status. In the future I'm sure I'll add more
    > functionality. Below is the output of the various commands and the
    > hack.
    >
    > lpstat -vsacprn05
    > device for sacprn05: /dev/null
    > remote to: RNP6C8BF9 on 192.168.25.73
    >
    > #!/usr/contrib/bin/perl -w
    >
    > # Printer interfaces directory
    > my $interface = "/etc/lp/interface";
    > # Printer Array
    > my @printers = ();
    >
    > # Get a list of printers
    > open PRINTER, "lpstat -p |";
    You should _ALWAYS_ verify that the pipe opened correctly.

    > while (<PRINTER>) {
    > if (/^printer (\w+).*?(enabled|disabled)/) {
    > # Push an anonymous hash REFERENCE onto the array
    > push @printers, { "name" => "$1", "ip" => "none", "status" =>
    > "$2" };
    Excessive quoting. The only thing that needs to be quoted is the string
    'none'.

    > }
    > }
    > close PRINTER;
    You should _ALWAYS_ verify that the pipe closed correctly.

    > # Gather the IP address of each printer
    > foreach $printer (@printers) {
    > open PRINTER, "lpstat -v$$printer{name} |";
    You should _ALWAYS_ verify that the pipe opened correctly.

    > while (<PRINTER>) {
    > next if /^device.*/;
    > if (/^\s+remote.*?(\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3})/) {
    ^ ^ ^
    ^ ^ ^
    Dots are special in regular expressions. You need to escape them to
    match a literal dot character.
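
A small sketch of the difference, with made-up test strings: the unescaped pattern accepts any character between the digit groups, while the escaped one insists on literal dots.

```perl
use strict;
use warnings;

my $unescaped = qr/^\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}$/;
my $escaped   = qr/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/;

# The second string is not an IP address, but "." matches x, y and z.
for my $s ('192.168.25.73', '192x168y25z73') {
    printf "%-15s unescaped=%s escaped=%s\n", $s,
        ($s =~ $unescaped ? 'match' : 'no match'),
        ($s =~ $escaped   ? 'match' : 'no match');
}
```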

    > $$printer{ip} = $1;
    > last;
    > }
    > }
    > close PRINTER;
    You should _ALWAYS_ verify that the pipe closed correctly.

    > # We hope that the printers with 'none' for IP are JetDirect
    > if ($$printer{ip} =~ /none/) {
    > open INTERFACE, "$interfaces/$$printer{name}" || die
    > "Cannot open file!\n";
    Using the || operator there won't do what you expect. You need to
    either use the 'or' operator or parenthesize the open function.
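
The precedence difference can be seen in a short sketch. The path is hypothetical and only chosen so that open fails; the three-argument form of open with a lexical filehandle is used here instead of the bareword form in the quoted code.

```perl
use strict;
use warnings;

# Hypothetical path that should not exist, so that open fails.
my $missing = '/no/such/dir/no-such-file';

# "||" binds tighter than the list operator open, so this parses as
#   open(my $fh1, '<', ($missing || die ...))
# and the die never fires because $missing is a true string.
my $with_pipes = eval { open my $fh1, '<', $missing || die "Cannot open file!\n"; 1 };
print "with ||: die ", ($with_pipes ? "was skipped" : "fired"), "\n";

# "or" has very low precedence, so it tests the result of open itself.
my $with_or = eval { open my $fh2, '<', $missing or die "Cannot open $missing: $!\n"; 1 };
print "with or: die ", ($with_or ? "was skipped" : "fired"), "\n";
```

With `||` the failed open goes unnoticed; with `or` (or with `open(...) || die`) the failure is reported.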

    > while (<INTERFACE>) {
    > if (/^PERIPH=(.*)/) {
    > $$printer{ip} = $1;
    > last;
    > }
    > }
    > close INTERFACE;
    > }
    > }
    >
    > # Display the results
    > foreach $printer (@printers) {
    > print "$$printer{name} / $$printer{ip} / $$printer{status}\n";
    > }

    It looks like you just need one loop for all that:

    #!/usr/contrib/bin/perl -w
    use strict;

    # Printer interfaces directory
    my $interface = '/etc/lp/interface';
    # Printer Array
    my @printers;

    # Get a list of printers
    open PRINTER, 'lpstat -p |' or die "Cannot open pipe from lpstat: $!";
    while ( <PRINTER> ) {

        next unless /^printer (\w+).*?((?:dis|en)abled)/;
        # Push an anonymous hash REFERENCE onto the array
        push @printers, { name => $1, ip => 'none', status => $2 };

        if ( `lpstat -v $printers[-1]{name}` =~
                /^\s+remote.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/m ) {
            $printers[ -1 ]{ ip } = $1;
        }
        else {
            # Slurp the whole interface file in one read
            local $/;
            # $interface is the directory declared above
            open INTERFACE, "$interface/$printers[-1]{name}"
                or die "Cannot open $interface/$printers[-1]{name}: $!";
            $printers[ -1 ]{ ip } = $1 if <INTERFACE> =~ /^PERIPH=(.*)/m;
            close INTERFACE;
        }
    }
    close PRINTER or die "Cannot close pipe from lpstat: $!";

    # Display the results
    foreach my $printer ( @printers ) {
        print join( ' / ', @{$printer}{ qw/name ip status/ } ), "\n";
    }



    John
    --
    use Perl;
    program
    fulfillment
    John W. Krahn Guest

  13. #12

    Default Regexp help

    Which regular expression would you use to remove the <title> and </title> from a line like this one:

    <title>Here goes a webpage's title</title>

    Thanks a lot in advance.

    Marcelo Guest

  14. #13

    Default Re: Regexp help

    Marcelo wrote:
    > Which regular expression would you use to remove the <title> and </title> from a line like this one:
    >
    > <title>Here goes a webpage's title</title>
    >
    > Thanks a lot in advance.
    Try something like:

    s/<\/?title>//g

    Although, I can't remember if < and > are special regex characters so you might need to
    escape them (\< and \>).

    --
    Andrew Gaffney
    System Administrator
    Skyline Aeronautics, LLC.
    776 North Bell Avenue
    Chesterfield, MO 63005
    636-357-1548

    Andrew Gaffney Guest

  15. #14

    Default Re: Regexp help

    On Sat, 24 Jan 2004, Marcelo wrote:
    > Which regular expression would you use to remove the <title> and
    > </title> from a line like this one:
    >
    > <title>Here goes a webpage's title</title>
    >
    > Thanks a lot in advance.
    >
    Did you want that _exact_ input? I.e. always <title>...</title>? If so,
    that's rather easy.

    $line =~ s/<title>(.*)<\/title>/$1/

    Now, if you want the more general form of <any_tag>...</any_tag>, that is
    removing paired HTML tags, that's more difficult. Luckily, there is an
    example in "Programming Perl, 3rd Edition" on page 184 which is close.

    $line =~ s/<(.*?)>(.*?)<\/\1>/$2/

    In sort-of English, this says:

    Match a <, then everything up to the next >, calling that tag name $1 (or \1).
    Now, match as little as possible and call it $2, up to a < followed by a /,
    followed by what you matched first (in $1 or \1), followed by a >.
    Now, replace all of that with $2.

    A limitation of this pattern is that each substitution removes only one
    pair of tags. With input such as:

    <title><B>Title</B></title>

    a single pass removes the <title> and </title>, but leaves the <B> and
    </B>. Of course, if your desire is to remove all paired HTML tags,
    then put this in a loop until it no longer matches.
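
Such a loop might be sketched as follows. It uses a form of the pattern in which \1 holds only the tag name, and it assumes simple tags without attributes (with attributes, the closing tag would not match what \1 captured).

```perl
use strict;
use warnings;

my $line = '<title><B>Title</B></title>';

# Each successful substitution strips one <tag>...</tag> pair;
# the empty loop body repeats it until nothing matches any more.
1 while $line =~ s/<(.*?)>(.*?)<\/\1>/$2/;

print "$line\n";   # prints "Title"
```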

    HTH,

    --
    Maranatha!
    John McKown


    John McKown Guest

  16. #15

    Default Re: Regexp help

    John McKown wrote:
    >On Sat, 24 Jan 2004, Marcelo wrote:
    >
    >> Which regular expression would you use to remove the <title> and
    >> </title> from a line like this one:
    >>
    >> <title>Here goes a webpage's title</title>
    >>
    >> Thanks a lot in advance.
    >>
    >
    >Did you want that _exact_ input? I.e. always <title>...</title>? If so,
    >that's rather easy.
    >
    >$line =~ s/<title>(.*)<\/title>/$1/
    >
    >Now, if you want the more general form of <any_tag>...</any_tag>, that is
    >removing paired HTML tags, that's more difficult. Luckily, there is an
    >example in "Programming Perl, 3rd Edition" on page 184 which is close.
    >
    >$line =~ s/<(.*?)>(.*?)<\/\1>/$2/
    I remember reading that using a regex to parse HTML is not reliable. You should use HTML::Parser from CPAN.

    HTH,

    Jan
    --
    Either this man is dead or my watch has stopped. - Groucho Marx
    Jan Eden Guest
