starting position of RE match - PERL Beginners

  1. #1

    starting position of RE match


    Is there an equivalent for "index" that uses regular expressions
    instead of exact string?

    I've been looking at index, pos, m//, and the corresponding "$"
    variables but nothing I've found so far does what I'm looking for.
    Specifically, what I'm trying to do is find all the starting locations
    of a RE match. For example, using an exact string match:

    $ perl -e '$foo="baaaab"; $re="aa";
        for ($bar=index($foo, $re); $bar >= 0; $bar=index($foo, $re, $bar+1))
            { print $bar, "\t" }
        print "\n";'
    1 2 3

    I'd like to do the same except use a regular expression. BTW, notice
    that matches can overlap.

    Any thoughts or ideas?

    Regards,
    - Robert
    http://www.cwelug.org

    Robert Guest

  2. #2

    Re: starting position of RE match


    How about this:

    $s = "sasas dfgfgh asasas asedsase";

    while ($s =~ /\G.*?(?=sas)./g) {
        print "pos=",pos($s)-1, " = '",substr($s,pos($s)-1,3),"'\n";
    }

    the "sas" is the regexp being matched.

    The \G matches where the last match left off, the .*? skips as few
    characters as possible, the (?=) makes sure the regexp matches at
    that place but doesn't move the position in the string, and the . at
    the end moves the position so that the next round doesn't find the
    same occurrence. That's also why I have to subtract 1 from pos($s).

    You could also do this:

    while ($s =~ /\G.+?(?=sas)/g) {
        print "pos=",pos($s), " = '",substr($s,pos($s),3),"'\n";
    }

    Which looks a bit nicer, but it would miss the match at the very
    beginning of the string.
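
    The difference between the two loops above can be checked side by
    side. The following is only a sketch: the helper name
    starts_with_pattern and its $adjust parameter are made up for this
    illustration and are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the offsets reported by a /g loop.
# $adjust is subtracted from pos() (1 for the variant ending in ".").
sub starts_with_pattern {
    my ($s, $re, $adjust) = @_;
    my @starts;
    while ($s =~ /$re/g) {
        push @starts, pos($s) - $adjust;
    }
    return @starts;
}

my $s = "sasas dfgfgh asasas asedsase";
my @with_dot  = starts_with_pattern($s, qr/\G.*?(?=sas)./, 1);
my @with_plus = starts_with_pattern($s, qr/\G.+?(?=sas)/,  0);

print "with .*? and trailing dot: @with_dot\n";   # 0 2 14 16 24
print "with .+?:                  @with_plus\n";  # 2 14 16 24
```

    The .+? variant has to consume at least one character before the
    lookahead is tried, which is why offset 0 never shows up in its list.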

    HTH, Jenda
    ===== cz === http://Jenda.Krynicky.cz =====
    When it comes to wine, women and song, wizards are allowed
    to get drunk and croon as much as they like.
    -- Terry Pratchett in Sourcery

    Jenda Guest

  3. #3

    Re: starting position of RE match


    On Tuesday, Sep 21, 2004, at 17:17 US/Central, Jenda Krynicky wrote: 

    Thanks. Seems to work, although I'm still trying to grok it. I'll
    probably have questions later.

    Regards,
    - Robert
    http://www.cwelug.org

    Robert Guest

  4. #4

    Re: starting position of RE match

    From: Robert Citek
    > Thanks. Seems to work, although I'm still trying to grok it. I'll
    > probably have questions later.

    Let me try again then :-)

    If you use the /g option with a match evaluated in scalar context,
    each evaluation finds just one match: the first one on the first
    round, the next one the next time it's evaluated, and so forth:

    $s = "foo brkshr frt ty fgh fss";
    while ($s =~ /f(..)/g) {
        print "$1\n";    # the two characters after each f: oo, rt, gh, ss
    }

    Each time it starts looking for the next match where the last one
    left off:

    $s = "foo brkshr fftr ty fgh fss";
    while ($s =~ /f(..)/g) {
        print "$1\n";
    }

    As you can see, it found foo, fft, fgh and fss, but skipped ftr
    because that one starts before the end of the previous match.
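
    To see where each match begins and ends, the loop can record the
    offsets as it goes. This sketch uses the built-in @- array ($-[0]
    is the start of the last successful match) next to pos(); the
    variable names are chosen only for this illustration.

```perl
use strict;
use warnings;

my $text = "foo brkshr fftr ty fgh fss";
my @found;
while ($text =~ /f(..)/g) {
    # $-[0]: offset where the whole match began; pos(): offset just past it
    push @found, [ "f$1", $-[0], pos($text) ];
}

printf "%s starts at %d, next search resumes at %d\n", @$_ for @found;
# foo starts at 0, next search resumes at 3
# fft starts at 11, next search resumes at 14
# fgh starts at 19, next search resumes at 22
# fss starts at 23, next search resumes at 26
```

    The ftr at offset 12 is never reported because the search resumes at
    offset 14, after the fft match has already consumed it.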

    That's why I need the (?=). It instructs the regexp engine to check
    that the regexp inside the parentheses matches at that point, but to
    keep the pointer in the same place:

    $s = "faaf bar fbbfccf";
    while ($s =~ /f(..)(?=f)/g) {
        print "$1\n";    # aa, bb, cc: the closing f is not consumed
    }
    vs.
    $s = "faaf bar fbbfccf";
    while ($s =~ /f(..)f/g) {
        print "$1\n";    # aa, bb only: the f after bb is already used up
    }
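
    The same comparison can be written without a loop. As a sketch
    (this list-context form is a variation, not what the post uses),
    m//g in list context returns the captures of all matches at once:

```perl
use strict;
use warnings;

my $s = "faaf bar fbbfccf";

# In list context, m//g returns the captures of every match in one go.
my @with_lookahead = $s =~ /f(..)(?=f)/g;
my @consuming      = $s =~ /f(..)f/g;

print "lookahead: @with_lookahead\n";   # lookahead: aa bb cc
print "consuming: @consuming\n";        # consuming: aa bb
```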

    The regexp I gave you was unnecessarily complex. With /g the regexp
    automatically starts where it left off the last time, so I do not
    need the \G.*? and can write it as:

    while ($s =~ /(?=sas)./g) {
        print "pos=",pos($s)-1, " = '",substr($s,pos($s)-1,3),"'\n";
    }

    and it will report exactly the same positions.

    And it seems the . at the end of the regexp and the -1 subtracted
    from pos($s) are not needed either.

    Which means it's actually much easier than I had you believe:

    $s = "sasas dfgfgh asasas asedsase";
    while ($s =~ /(?=sas)/g) {
        print "pos=",pos($s), " = '",substr($s,pos($s),3),"'\n";
    }
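
    The substr(..., 3) above hard-codes the length of "sas". For a
    regexp whose matches vary in length, one option is to capture inside
    the lookahead. This is a sketch only; the sub name
    overlapping_matches is invented for the example, and it assumes the
    regexp passed in cannot match the empty string.

```perl
use strict;
use warnings;

# Hypothetical helper: return [offset, matched text] pairs for every
# position where $re matches, including overlapping matches.
sub overlapping_matches {
    my ($string, $re) = @_;
    my @hits;
    # The lookahead consumes nothing, so pos() is the match start,
    # while the capture group inside it still records the matched text.
    while ($string =~ /(?=($re))/g) {
        push @hits, [ pos($string), $1 ];
    }
    return @hits;
}

my @hits = overlapping_matches("baaaab", qr/a+/);
print join(" ", map { "$_->[0]:$_->[1]" } @hits), "\n";
# 1:aaaa 2:aaa 3:aa 4:a
```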

    With the \G.*? I had to put the . at the end of the regexp to make
    sure the pointer ends up just past the first character matched by
    the regexp. Without the \G.*?, Perl steps past the empty match
    automatically.

    Try

    $s = "sasas dfgfgh asasas asedsase";
    while ($s =~ /\G.*?(?=sas)/g) {
        print "pos=",pos($s), " = '",substr($s,pos($s),3),"'\n";
    }

    As you can see, it returns most matches twice. The reason is that
    after a zero-length match Perl refuses another zero-length match at
    the same position, but after a match that consumed characters an
    empty match right at its end is allowed again (keep in mind that the
    stuff in (?=) doesn't count towards the length of the match!).

    So in the string the first match was "" at the very beginning of the
    string, which leaves the pointer at 0:
    ^sasas dfgfgh asasas asedsase
    the next match may not be empty, so it was the "sa" preceding the
    second "sas" and the pointer was moved to
    sa^sas dfgfgh asasas asedsase
    the next match was empty again, so the pointer stayed where it was
    and the same position got printed a second time:
    sa^sas dfgfgh asasas asedsase
    the next match was "sas dfgfgh a" and the pointer was moved to:
    sasas dfgfgh a^sasas asedsase
    the next match is again empty and the pointer stays at:
    sasas dfgfgh a^sasas asedsase
    and so forth.
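
    The alternation between empty and non-empty matches can be made
    visible by recording how many characters each match consumed. This
    is a sketch: the capture group around .*? is added here purely for
    the measurement and is not in the original regexp.

```perl
use strict;
use warnings;

my $s = "sasas dfgfgh asasas asedsase";
my @trace;
while ($s =~ /\G(.*?)(?=sas)/g) {
    # pos() after the match, and the number of characters .*? consumed
    push @trace, pos($s) . ":" . length($1);
}
print "@trace\n";
# 0:0 2:2 2:0 14:12 14:0 16:2 16:0 24:8 24:0
```

    Every offset except 0 appears twice: once at the end of a match that
    consumed text, and once more for the empty match Perl then allows at
    the same spot.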

    If we do not include the \G.*?, we never match the text in between,
    so every match is empty and sits just before the stuff we looked
    for. Since Perl will not accept a second empty match at the same
    position, the next search effectively starts one character further
    on.

    Humpf, not sure I'm still making sense.

    HTH, Jenda


    Jenda Guest

  5. #5

    Re: starting position of RE match


    On Wednesday, Sep 22, 2004, at 08:05 US/Central, Jenda Krynicky wrote: 

    Based on your example, I was able to transform this:

    $ perl -e '$foo="baaaab"; $re="aa";
        for ($bar=index($foo, $re); $bar >= 0; $bar=index($foo, $re, $bar+1))
            { print $bar, "\t" }
        print "\n";'

    into this:

    $ perl -e '$foo="baaaab"; $re=qr/(?=aa)/ ;
    while($foo =~ /$re/g) { print pos($foo), "\t" }
    print "\n" ; '
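
    As a quick check that the two one-liners agree, both loops can be
    wrapped in subs and compared. This is a sketch under assumptions:
    the sub names are invented here, the needle is assumed to be
    non-empty, and \Q...\E is added so that a literal string containing
    regexp metacharacters is still matched verbatim.

```perl
use strict;
use warnings;

# Overlapping start offsets of a literal substring, via index().
sub starts_by_index {
    my ($string, $needle) = @_;
    my @starts;
    for (my $i = index($string, $needle); $i >= 0;
         $i = index($string, $needle, $i + 1)) {
        push @starts, $i;
    }
    return @starts;
}

# The same offsets via a zero-width lookahead and pos().
sub starts_by_regex {
    my ($string, $needle) = @_;
    my @starts;
    while ($string =~ /(?=\Q$needle\E)/g) {
        push @starts, pos($string);
    }
    return @starts;
}

my @by_index = starts_by_index("baaaab", "aa");
my @by_regex = starts_by_regex("baaaab", "aa");
print "index: @by_index\n";   # index: 1 2 3
print "regex: @by_regex\n";   # regex: 1 2 3
```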

    Works exactly as I had hoped, and I understand this one.

    Will study your other examples with \G. I still don't understand
    those. Will probably just take a little time and experimenting.

    Thanks for your help.

    Regards,
    - Robert
    OpenSource for Windows, Linux, and Mac OS/X
    http://www.cwelug.org/downloads

    Robert Guest
