  1. #1

    emulating @+ and @-

    Hi,

    I am currently working on something that should be backward-compatible
    at least up to 5.00503. Unfortunately, this relies on @+ and @- which
    weren't there by that time. So I have to find a way to create and
    populate these two arrays with 5.00503. Consider the original code that
    assumes that @(+|-) exist:

    sub bla {
        my ($string, $pat, $code) = @_;
        while ($string =~ /$pat/g) {
            # with subpatterns pass $1, $2, ...; otherwise pass the whole match
            $code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
                    map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);
        }
    }

    This is essentially what Ruby's scan() does: Scan a string for a pattern
    and call the code via the reference. If the pattern contains
    subpatterns, call it like

    $code->($1, $2, ...);

    otherwise (that means, no captured subpatterns) do

    $code->($&);

    This means I need the whole of @- and @+, not just the first
    element of each. My question is specifically about generating
    elements 1 to $#-. My current solution:

    while ($string =~ /($pat)/g) {
        @- = @+ = (); # clear previous match offsets
        # populates $-[0] and $+[0]
        push @-, index($string, $1);
        push @+, pos($string);

        # fill @-[1..#@-] and @+[1..#@+]
        my $digit = 2; # $1 is the whole match
        while () {
            no strict 'refs';
            if (defined $$digit) {
                # extract offsets of $$digit
                ...
                $digit++;
            } else {
                last;
            }
        }

        $code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
                map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);
    }
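
    A side note on the offsets, as a hedged sketch rather than part of the original code: index($string, $1) returns the first occurrence of the matched text, so it may report the wrong start when the same text matches again later in the string. Subtracting length($1) from pos($string) avoids that. The sample string below is only an illustration.

```perl
use strict;

my $string = "ab ab";
my (@by_index, @by_pos);

while ($string =~ /(ab)/g) {
    # index() always finds the FIRST "ab", even on the second match
    push @by_index, index($string, $1);
    # pos() is the end of the current match, so this is its real start
    push @by_pos, pos($string) - length($1);
}

print "index: @by_index\n";   # index: 0 0
print "pos:   @by_pos\n";     # pos:   0 3
```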

    The part I'm uneasy about is

    if (defined $$digit) {
    ...
    }

    Specifically, can $2 be undefined but $3 still contain a submatch? I
    vaguely remember that I had such cases but I can't reproduce them now.
    If they exist, I can't use the above code and need something better. If
    so, what would be a correct solution?

    Thanks in advance for any pointers,
    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus}) !JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexi ixesixeseg;y~\n~~dddd;eval
    Tassilo v. Parseval Guest

  3. #2

    Re: emulating @+ and @-

    [A complimentary Cc of this posting was sent to
    Tassilo v. Parseval
    <tassilo.parseval@post.rwth-aachen.de>], who wrote in article <bjhh5k$3ib$1@nets3.rz.RWTH-Aachen.DE>:
    > push @-, index($string, $1);
    > push @+, pos($string);
    > $code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
    > map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);
    ??? What is the point of finding indices, then calling substr? Why
    not use $$_ directly?
    > Specifically, can $2 be undefined but $3 still contain a submatch?
    Of course:

    (a)?(b)
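
    A minimal check of this, as a sketch (the sample string "b" is only an illustration): the optional first group does not participate in the match, so $1 stays undefined while $2 holds a submatch.

```perl
use strict;

my @caps;
if ("b" =~ /(a)?(b)/) {
    # copy the capture variables before any other match can reset them
    @caps = ($1, $2);
}

print defined $caps[0] ? "\$1 is defined\n" : "\$1 is undefined\n";
print "\$2 is '$caps[1]'\n";
```

    So a loop that stops at the first undefined $<digit> would miss $2 here.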

    Hope this helps,
    Ilya
    Ilya Zakharevich Guest

  4. #3

    Re: emulating @+ and @-

    Tassilo v. Parseval <tassilo.parseval@post.rwth-aachen.de> wrote in comp.lang.perl.misc:
    > Hi,
    >
    > I am currently working on something that should be backward-compatible
    > at least up to 5.00503. Unfortunately, this relies on @+ and @- which
    > weren't there by that time. So I have to find a way to create and
    > populate these two arrays with 5.00503. Consider the original code that
    [$string =~ /($pat)/g]
    > Specifically, can $2 be undefined but $3 still contain a submatch? I
    > vaguely remember that I had such cases but I can't reproduce them now.
    > If they exist, I can't use the above code and need something better. If
    > so, what would be a correct solution?
    I have vaguely asked myself that too.

    It is my impression that after a successful match $1 ... $n are
    always defined where n is the number of capturing parentheses in $pat.
    Even if a submatch doesn't apply (as in an alternation), the corresponding
    $i is empty, but defined.

    The hard part is finding where this is documented. I can't.

    Anno
    Anno Siegel Guest

  5. #4

    Re: emulating @+ and @-

    Also sprach Ilya Zakharevich:
    ><tassilo.parseval@post.rwth-aachen.de>], who wrote in article <bjhh5k$3ib$1@nets3.rz.RWTH-Aachen.DE>:
    >> push @-, index($string, $1);
    >> push @+, pos($string);
    >
    >> $code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
    >> map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);
    >
    > ??? What is the point of finding indices, then calling substr? Why
    > not use $$_ directly?
    Copy and paste. I have the ordinary function scan() and the backward
    compatible one scan500503() and I do a

    if ($] < 5.006) {
    *scan = \&scan500503;
    }

    to replace the version not suitable for this perl. So the
    backward-compatible ones are just a copy of the ordinary functions plus
    the population of @- and @+.

    Maybe I am going to reimplement it later by directly using $$digit, but
    that's quite a bit of work since I have around five or six of these
    duplicate functions. But I am not yet sure whether I should do that
    because then a user cannot use @- and @+ in his code references (as he
    could now).
    >> Specifically, can $2 be undefined but $3 still contain a submatch?
    >
    > Of course:
    >
    > (a)?(b)
    >
    > Hope this helps,
    It does, thank you. It means I have to find a different way to
    figure out how many submatches exist. :-(
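
    One pure-Perl possibility, as a sketch and not what the thread settled on: a non-/g match in list context returns one element per capturing group, including undef for groups that did not participate, so matching /($pat)|/ against the empty string yields the group count plus one. The names count_groups and scan_compat are hypothetical, and like the /($pat)/ wrapper above this shifts any backreferences inside $pat.

```perl
use strict;

# Hypothetical helper: number of capturing groups in $pat.
# The empty alternative always matches, and the list-context result
# has one slot per group (undef for groups that did not participate).
sub count_groups {
    my ($pat) = @_;
    my @probe = ("" =~ /($pat)|/);
    return @probe - 1;    # discount the wrapping group
}

# Hypothetical scan() variant that needs neither @- nor @+.
sub scan_compat {
    my ($string, $pat, $code) = @_;
    my $ngroups = count_groups($pat);
    while ($string =~ /($pat)/g) {
        my @args;
        if ($ngroups) {
            no strict 'refs';
            # $1 is the wrapper, so the groups of $pat are $2 .. $(n+1)
            for my $n (2 .. $ngroups + 1) {
                push @args, ${$n};
            }
        } else {
            @args = ($1);    # no subpatterns: pass the whole match
        }
        $code->(@args);
    }
}

# Usage: collect the arguments of every callback invocation.
my @seen;
scan_compat("xbab", '(a)?(b)', sub { push @seen, [@_] });
```

    Counting the groups once, before the loop, sidesteps the question of whether a particular $<digit> happens to be defined on a given match.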

    What about perl_get_sv? My code is 90% XS anyway, so I can just as
    easily add more of it. Would this be reliable?

    int
    num_submatches ()
        PREINIT:
            int i = 0;
            char *digit;
            int len = 1;
        CODE:
            New(0, digit, 2, char);
            strcpy(digit, "2");    /* fill the buffer; assigning the literal would leak it */
            while (1) {
                if (!perl_get_sv(digit, FALSE))
                    break;
                /* next() already exists elsewhere and
                 * increments/grows the string accordingly */
                next(&digit, &len);
                i++;
            }
            Safefree(digit);
            RETVAL = i;
        OUTPUT:
            RETVAL

    It assumes that $4 does not exist when $3 was the last successful submatch.
    But I don't know whether perl really destroys all the digit variables
    that are larger than the highest submatch.

    Tassilo
    Tassilo v. Parseval Guest

  6. #5

    Re: emulating @+ and @-

    [A complimentary Cc of this posting was sent to
    Tassilo v. Parseval
    <tassilo.parseval@post.rwth-aachen.de>], who wrote in article <bjhmfj$d9l$1@nets3.rz.RWTH-Aachen.DE>:
    > What about perl_get_sv? My code is 90% XS anyway, so I can just as
    > easily add more of it. Would this be reliable?
    > int
    > num_submatches ()
    Just copy the code used for access to @-/@+. Lemme see... mg.c:

    case '+':
        if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
            paren = rx->lastparen;
            if (paren)
                goto getparen;
        }
        sv_setsv(sv,&PL_sv_undef);
        break;

    Check with older Perl sources, but I think that PM_GETRE should be
    something like identity macro on older perls...

    Yours,
    Ilya
    Ilya Zakharevich Guest

  7. #6

    Re: emulating @+ and @-

    Also sprach Ilya Zakharevich:
    ><tassilo.parseval@post.rwth-aachen.de>], who wrote in article <bjhmfj$d9l$1@nets3.rz.RWTH-Aachen.DE>:
    >> What about perl_get_sv? My code is 90% XS anyway, so I can just as
    >> easily add more of it. Would this be reliable?
    >
    >> int
    >> num_submatches ()
    >
    > Just copy the code used for access to @-/@+. Lemme see... mg.c:
    >
    > case '+':
    > if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
    > paren = rx->lastparen;
    > if (paren)
    > goto getparen;
    > }
    > sv_setsv(sv,&PL_sv_undef);
    > break;
    >
    > Check with older Perl sources, but I think that PM_GETRE should be
    > something like identity macro on older perls...
    That's smart. I think that this will make it easy to solve my problem.
    'struct regexp' (among others) holds

    U32 nparens; /* number of parentheses */
    U32 lastparen; /* last paren matched */
    U32 lastcloseparen; /* last paren matched */

    One of those (or even all of them) is probably what I am looking for.

    Thanks for your help!
    Tassilo
    Tassilo v. Parseval Guest

  8. #7

    Re: emulating @+ and @-

    [A complimentary Cc of this posting was sent to
    Tassilo v. Parseval
    <tassilo.parseval@post.rwth-aachen.de>], who wrote in article <bjhua3$kkv$1@nets3.rz.RWTH-Aachen.DE>:
    > That's smart. I think that this will make it easy to solve my problem.
    > 'struct regexp' (among others) holds
    >
    > U32 nparens; /* number of parentheses */
    > U32 lastparen; /* last paren matched */
    > U32 lastcloseparen; /* last paren matched */
    >
    > One of those (or even all of them) is probably what I am looking for.
    Keep in mind that struct regexp is used for *two* purposes (I did not
    have time to clean this when working with REx engine): it keeps a part
    of the state during the match process, and it keeps the info used for
    $<digit> etc *after* the match. Be sure to use only the entries
    mentioned in mg.c.

    Yours,
    Ilya

    Ilya Zakharevich Guest

  9. #8

    Re: emulating @+ and @-

    Also sprach Ilya Zakharevich:
    ><tassilo.parseval@post.rwth-aachen.de>], who wrote in article <bjhua3$kkv$1@nets3.rz.RWTH-Aachen.DE>:
    >> That's smart. I think that this will make it easy to solve my problem.
    >> 'struct regexp' (among others) holds
    >>
    >> U32 nparens; /* number of parentheses */
    >> U32 lastparen; /* last paren matched */
    >> U32 lastcloseparen; /* last paren matched */
    >>
    >> One of those (or even all of them) is probably what I am looking for.
    >
    > Keep in mind that struct regexp is used for *two* purposes (I did not
    > have time to clean this when working with REx engine): it keeps a part
    > of the state during the match process, and it keeps the info used for
    > $<digit> etc *after* the match. Be sure to use only the entries
    > mentioned in mg.c.
    I eventually went for lastparen which seems to work very well. PM_GETRE
    had to be added, and that was all. My little XSUB now reads as

    int
    num_submatches ()
        CODE:
    #ifndef PM_GETRE
    # define PM_GETRE(o) ((o)->op_pmregexp)
    #endif
            RETVAL = PM_GETRE(PL_curpm)->lastparen;
        OUTPUT:
            RETVAL

    After a wild and ambitious attempt at adding the whole of @+ and @- to
    5.00503 failed (naturally, it didn't yet know about 'D' magic), I can
    still implement them as tied arrays. That way I wouldn't need to change
    my Perl code.

    Ilya, thanks for the invaluable pointers to the relevant bits of the
    Perl source. Normally, regexes on the source level really scare me, but
    here it was surprisingly not hard at all.

    Tassilo
    Tassilo v. Parseval Guest
