Tassilo v. Parseval #1
emulating @+ and @-
Hi,
I am currently working on something that should be backward-compatible
at least up to 5.00503. Unfortunately, this relies on @+ and @- which
weren't there by that time. So I have to find a way to create and
populate these two arrays with 5.00503. Consider the original code that
assumes that @(+|-) exist:
sub bla {
    my ($string, $pat, $code) = @_;
    while ($string =~ /$pat/g) {
        $code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
                map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);
    }
}
This is essentially what Ruby's scan() does: Scan a string for a pattern
and call the code via the reference. If the pattern contains
subpatterns, call it like
$code->($1, $2, ...);
otherwise (that means, no captured subpatterns) do
$code->($&);
This means, I need the whole of @- and @+, and not just the first
element of each two. My question is specifically about generating
elements 1 to $#-. My current solution:
while ($string =~ /($pat)/g) {
    @- = @+ = (); # clear previous match offsets
    # populates $-[0] and $+[0]
    push @-, index($string, $1);
    push @+, pos($string);
    # fill @-[1..$#-] and @+[1..$#+]
    my $digit = 2; # $1 is the whole match
    while () {
        no strict 'refs';
        if (defined $$digit) {
            # extract offsets of $$digit
            ...
            $digit++;
        } else {
            last;
        }
    }
    $code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
            map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);
}
The part I'm uneasy about is
if (defined $$digit) {
...
}
Specifically, can $2 be undefined but $3 still contain a submatch? I
vaguely remember that I had such cases but I can't reproduce them now.
If they exist, I can't use the above code and need something better. If
so, what would be a correct solution?
Thanks in advance for any pointers,
Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus}) !JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexi ixesixeseg;y~\n~~dddd;eval
Tassilo v. Parseval Guest
-
Ilya Zakharevich #2
Re: emulating @+ and @-
[A complimentary Cc of this posting was sent to
Tassilo v. Parseval
<tassilo.parseval@post.rwth-aachen.de>], who wrote in article <bjhh5k$3ib$1@nets3.rz.RWTH-Aachen.DE>:
> push @-, index($string, $1);
> push @+, pos($string);
>
> $code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
> map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);

??? What is the point of finding indices, then calling substr? Why
not use $$_ directly?

> Specifically, can $2 be undefined but $3 still contain a submatch?

Of course:

(a)?(b)

Hope this helps,
Ilya
Ilya Zakharevich Guest
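Ilya's counter-example is easy to check. The following is a minimal sketch, not code from the thread; the helper name capture_flags is made up for illustration. It matches (a)?(b) and reports whether $1 and $2 are defined. When the optional group does not take part in the match, $1 stays undefined while $2 holds a submatch, so a loop that stops at the first undefined digit variable would give up too early.

```perl
use strict;

# Hypothetical helper (not from the thread): match Ilya's pattern (a)?(b)
# and report, as 1 or 0, whether $1 and $2 are defined afterwards.
sub capture_flags {
    my ($string) = @_;
    return unless $string =~ /(a)?(b)/;
    return (defined $1 ? 1 : 0, defined $2 ? 1 : 0);
}

for my $s ("ab", "b") {
    my @flags = capture_flags($s);
    print "$s: \$1 defined=$flags[0], \$2 defined=$flags[1]\n";
}
```

For the input "b" the optional group never matches, which is exactly the case the defined-check loop cannot handle.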
-
Anno Siegel #3
Re: emulating @+ and @-
Tassilo v. Parseval <tassilo.parseval@post.rwth-aachen.de> wrote in comp.lang.perl.misc:
> Hi,
>
> I am currently working on something that should be backward-compatible
> at least up to 5.00503. Unfortunately, this relies on @+ and @- which
> weren't there by that time. So I have to find a way to create and
> populate these two arrays with 5.00503. Consider the original code that

[$string =~ /($pat)/g]

> Specifically, can $2 be undefined but $3 still contain a submatch? I
> vaguely remember that I had such cases but I can't reproduce them now.
> If they exist, I can't use the above code and need something better. If
> so, what would be a correct solution?

I have vaguely asked myself that too.

It is my impression that after a successful match $1 ... $n are
always defined where n is the number of capturing parentheses in $pat.
Even if a submatch doesn't apply (as in an alternation), the corresponding
$i is empty, but defined.
The hard part is finding where this is documented. I can't.
Anno
Anno Siegel Guest
-
Tassilo v. Parseval #4
Re: emulating @+ and @-
Also sprach Ilya Zakharevich:
> <tassilo.parseval@post.rwth-aachen.de>], who wrote in article <bjhh5k$3ib$1@nets3.rz.RWTH-Aachen.DE>:
>> push @-, index($string, $1);
>> push @+, pos($string);
>
>> $code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
>> map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);
>
> ??? What is the point of finding indices, then calling substr? Why
> not use $$_ directly?

Copy and paste. I have the ordinary function scan() and the backward
compatible one scan500503() and I do a
if ($] < 5.006) {
    *scan = \&scan500503;
}
to replace the version not suitable for this perl. So the
backward-compatible ones are just a copy of the ordinary functions plus
the population of @- and @+.
Maybe I am going to reimplement it later by directly using $$digit, but
that's quite a bit of work since I have around five or six of these
duplicate functions. But I am not yet sure whether I should do that
because then a user cannot use @- and @+ in his code references (as he
could now).

>> Specifically, can $2 be undefined but $3 still contain a submatch?
>
> Of course:
>
> (a)?(b)
>
> Hope this helps,

It does, thank you. It means I have to find a different way to
figure out how many submatches exist. :-(
What about perl_get_sv? My code is 90% XS anyway, so I can just as
easily add more of it. Would this be reliable?
int
num_submatches ()
    PREINIT:
        int i = 0;
        char *digit;
        int len = 1;
    CODE:
        New(0, digit, 2, char);
        strcpy(digit, "2");  /* fill the buffer instead of repointing it */
        while (1) {
            if (!perl_get_sv(digit, FALSE))
                break;
            /* next() already exists elsewhere and
             * increments/grows the string accordingly */
            next(&digit, &len);
            i++;
        }
        Safefree(digit);
        RETVAL = i;
    OUTPUT:
        RETVAL
It assumes that $4 does not exist when $3 was the last successful submatch.
But I don't know whether perl really destroys all the digit variables
that are larger than the highest submatch.
Tassilo
Tassilo v. Parseval Guest
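For what it's worth, the number of capturing groups can also be found in plain Perl, without @-, @+ or XS. The sketch below is not from the thread and the names count_groups and scan_compat are invented for illustration. It relies on the fact that a successful match in list context returns one element per group, defined or not: the pattern is wrapped as (?:$pat)|() and matched against the empty string, which always succeeds and yields one element more than $pat has groups. It assumes $pat has no trailing /x comment and no embedded code blocks.

```perl
use strict;

# Hypothetical helper: count the capturing groups in $pat. The extra
# empty group guarantees a successful match and a non-empty result list.
sub count_groups {
    my ($pat) = @_;
    my $n = () = '' =~ /(?:$pat)|()/;
    return $n - 1;
}

# Sketch of a scan() that uses the digit variables directly: pass
# $1 .. $n to the callback if $pat has groups, otherwise pass $&.
sub scan_compat {
    my ($string, $pat, $code) = @_;
    my $n = count_groups($pat);
    while ($string =~ /$pat/g) {
        my @args;
        if ($n) {
            no strict 'refs';
            # ${$i} is a symbolic reference to $1, $2, ...
            for my $i (1 .. $n) {
                push @args, ${$i};
            }
        } else {
            @args = ($&);
        }
        $code->(@args);
    }
}

scan_compat("a1b2", '([a-z])(\d)', sub { print join(",", @_), "\n" });
```

Unlike the original, this version always passes one argument per group, so groups that did not take part in the match arrive as undef, including trailing ones.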
-
Ilya Zakharevich #5
Re: emulating @+ and @-
[A complimentary Cc of this posting was sent to
Tassilo v. Parseval
<tassilo.parseval@post.rwth-aachen.de>], who wrote in article <bjhmfj$d9l$1@nets3.rz.RWTH-Aachen.DE>:
> What about perl_get_sv? My code is 90% XS anyway, so I can just as
> easily add more of it. Would this be reliable?
>
> int
> num_submatches ()

Just copy the code used for access to @-/@+. Lemme see... mg.c:

case '+':
    if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
        paren = rx->lastparen;
        if (paren)
            goto getparen;
    }
    sv_setsv(sv,&PL_sv_undef);
    break;

Check with older Perl sources, but I think that PM_GETRE should be
something like identity macro on older perls...
Yours,
Ilya
Ilya Zakharevich Guest
-
Tassilo v. Parseval #6
Re: emulating @+ and @-
Also sprach Ilya Zakharevich:
> <tassilo.parseval@post.rwth-aachen.de>], who wrote in article <bjhmfj$d9l$1@nets3.rz.RWTH-Aachen.DE>:
>> What about perl_get_sv? My code is 90% XS anyway, so I can just as
>> easily add more of it. Would this be reliable?
>>
>> int
>> num_submatches ()
>
> Just copy the code used for access to @-/@+. Lemme see... mg.c:
>
> case '+':
> if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
> paren = rx->lastparen;
> if (paren)
> goto getparen;
> }
> sv_setsv(sv,&PL_sv_undef);
> break;
>
> Check with older Perl sources, but I think that PM_GETRE should be
> something like identity macro on older perls...

That's smart. I think that this will make it easy to solve my problem.
'struct regexp' (among others) holds
U32 nparens; /* number of parentheses */
U32 lastparen; /* last paren matched */
U32 lastcloseparen; /* last paren matched */
One of those (or even all of them) is probably what I am looking for.
Thanks for your help!
Tassilo
Tassilo v. Parseval Guest
-
Ilya Zakharevich #7
Re: emulating @+ and @-
[A complimentary Cc of this posting was sent to
Tassilo v. Parseval
<tassilo.parseval@post.rwth-aachen.de>], who wrote in article <bjhua3$kkv$1@nets3.rz.RWTH-Aachen.DE>:
> That's smart. I think that this will make it easy to solve my problem.
> 'struct regexp' (among others) holds
>
> U32 nparens; /* number of parentheses */
> U32 lastparen; /* last paren matched */
> U32 lastcloseparen; /* last paren matched */
>
> One of those (or even all of them) is probably what I am looking for.

Keep in mind that struct regexp is used for *two* purposes (I did not
have time to clean this when working with REx engine): it keeps a part
of the state during the match process, and it keeps the info used for
$<digit> etc *after* the match. Be sure to use only the entries
mentioned in mg.c.
Yours,
Ilya
Ilya Zakharevich Guest
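On perls that do have @- and @+ (5.6 and later), the difference between the last group that matched and the number of groups in the pattern can be observed from Perl itself: perlvar documents $#- as the last matched subgroup and $#+ as the number of subgroups in the last successful match. This appears to mirror the lastparen/nparens distinction discussed above, but that mapping is an assumption on my part and not something the thread states. A minimal sketch, with the helper name paren_counts invented for illustration:

```perl
use strict;

# Hypothetical helper: after a successful match, return the index of the
# last group that matched ($#-) and the number of groups in the pattern
# ($#+). Returns an empty list if the match fails.
sub paren_counts {
    my ($string, $re) = @_;
    return unless $string =~ $re;
    return ($#-, $#+);
}

my ($last, $total) = paren_counts("b", qr/(a)?(b)(c)?/);
print "last matched group: $last, groups in pattern: $total\n";
```

With the input "b" the trailing (c)? never matches, so the two numbers differ. A scan() built on the last matched group will therefore not pass trailing unmatched groups to the callback.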
-
Tassilo v. Parseval #8
Re: emulating @+ and @-
Also sprach Ilya Zakharevich:
> <tassilo.parseval@post.rwth-aachen.de>], who wrote in article <bjhua3$kkv$1@nets3.rz.RWTH-Aachen.DE>:
>> That's smart. I think that this will make it easy to solve my problem.
>> 'struct regexp' (among others) holds
>>
>> U32 nparens; /* number of parentheses */
>> U32 lastparen; /* last paren matched */
>> U32 lastcloseparen; /* last paren matched */
>>
>> One of those (or even all of them) is probably what I am looking for.
>
> Keep in mind that struct regexp is used for *two* purposes (I did not
> have time to clean this when working with REx engine): it keeps a part
> of the state during the match process, and it keeps the info used for
> $<digit> etc *after* the match. Be sure to use only the entries
> mentioned in mg.c.

I eventually went for lastparen which seems to work very well. PM_GETRE
had to be added, and that was all. My little XSUB now reads as
int
num_submatches ()
    CODE:
#ifndef PM_GETRE
#  define PM_GETRE(o) ((o)->op_pmregexp)
#endif
        /* guard against a NULL PL_curpm, as the mg.c code does */
        RETVAL = PL_curpm ? PM_GETRE(PL_curpm)->lastparen : 0;
    OUTPUT:
        RETVAL
After a wild and ambitious attempt at adding the whole of @+ and @- to
5.00503 failed (naturally, it didn't yet know about 'D' magic), I can
still implement them as tied arrays. That way I wouldn't need to change
my Perl code.
Ilya, thanks for the invaluable pointers to the relevant bits of the
Perl source. Normally, regexes on the source level really scare me, but
here it was surprisingly not hard at all.
Tassilo
Tassilo v. Parseval Guest




