
How to express "not followed by"? - PERL Miscellaneous


  1. #1

    Default Re: How to express "not followed by"?

    >>>>> "Yi" == Yi Mang <yi_mang> writes:

    Yi> My question is regarding to the regexp "not followed by", for example,
    Yi> if a string has a number followed by a semi-colon, then pass;
    Yi> otherwise, print an error.

    It's hard to tell what your "otherwise" applies to, since the rule can
    be interpreted as either:

    IF has number THEN
    IF number IS followed by semi THEN - PASS
    ELSE - FAIL
    ELSE - FAIL

    Or perhaps

    IF has number THEN
    IF number IS followed by semi THEN - PASS
    ELSE - FAIL
    ELSE - PASS

    Which is it?

    And there's the definition of "number". Do you mean "single digit",
    or "sequence of digits"?

    Perhaps you should clarify the rule, and also give some examples.

    Any answers you get until you've done that are stabbing in the dark.

    print "Just another Perl hacker,"

    --
    Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
    <merlynstonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
    Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
    See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
    Randal L. Schwartz Guest
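
    The two readings of the rule above can be sketched as two small Perl
    subs. This is only an illustration: the sub names are made up, and it
    assumes that "number" means one or more digits and that a single
    number followed by a semicolon is enough to pass.

    ```perl
    use strict;
    use warnings;

    # Reading 1: a line without any number is an error.
    sub passes_strict {
        my ($line) = @_;
        return $line =~ /\d+;/ ? 1 : 0;
    }

    # Reading 2: a line without any number passes; a number must be
    # followed by a semicolon.
    sub passes_lenient {
        my ($line) = @_;
        return 1 unless $line =~ /\d/;    # no number at all: pass
        return $line =~ /\d+;/ ? 1 : 0;
    }

    for my $line ("abc123;", "abc123", "abc") {
        printf "%-8s strict=%d lenient=%d\n",
            $line, passes_strict($line), passes_lenient($line);
    }
    ```

    The two subs only disagree on lines that contain no digits, such as
    "abc", which is exactly the case the question leaves open.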

  2. #2

    Default Re: How to express "not followed by"?

    On Fri, 15 Aug 2003 13:49:04 -0700, Yi Mang wrote:
    > My question is regarding to the regexp "not followed by", for example,
    > if a string has a number followed by a semi-colon, then pass;
    > otherwise, print an error.
    Hmm. If I understand you right, the solution should be pretty
    straightforward?

    if ( /\d;/ ) {
        # Pass
    }
    else {
        # Error
    }

    Or?


    --
    Tore Aursand <toreaursand.no>
    Tore Aursand Guest

  3. #3

    Default Re: How to express "not followed by"?


    "Yi Mang" <yi_mang> wrote in message
    news:832301af.0308151249.71aca6e8posting.google.com...
    > Hi,
    > My question is regarding to the regexp "not followed by", for example,
    > if a string has a number followed by a semi-colon, then pass;
    > otherwise, print an error.
    The regexp could look something like this:

    /\d+(?!;)/

    That would match one or more numeric digits as long as they're not followed
    by a semicolon.

    It could be used like this:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my @test_strings = ( "Test1357;" , "Test2468" );

    foreach ( @test_strings ) {
        if ( $_ =~ /\d+(?!;)/ ) {
            print "$_ matched as $&\n";
        } else {
            print "$_ didn't match.\n";
        }
    }


    If you want to capture the number, it would be:


    /(\d+)(?!;)/

    And the portion that matched would now be captured in $1. Of course in this
    simple example, $& also contains what matched. Lookaheads are never
    considered part of the match, they're simply considered part of the criteria
    that allowed the match.

    Dave



    David Oswald Guest


  5. #5

    Default Re: How to express "not followed by"?


    "Tore Aursand" <toreaursand.no> wrote in message
    news:pan.2003.08.15.21.13.38.329532aursand.no...
    > On Fri, 15 Aug 2003 13:49:04 -0700, Yi Mang wrote:
    > > My question is regarding to the regexp "not followed by", for example,
    > > if a string has a number followed by a semi-colon, then pass;
    > > otherwise, print an error.
    >
    > Hmm. If I understand you right, the solution should be pretty
    > straightforward?
    >
    > if ( /\d;/ ) {
    > # Pass
    > }
    > else {
    > # Error
    > }
    Actually you've just matched only one digit, and included the semicolon in
    the match. Of course he didn't specify if the semicolon should be part of
    the match or not, or if it was just a criterion for matching.

    Previously I incorrectly stated that /\d+(?!;)/ would work, but I misread
    his strategy. I somehow missed where he said that semicolon SHOULD match.
    With that in mind, it seems that his match, as described in the above
    paragraph, could be written as:
    /\d+;/ if he wants the semicolon actually included in the matched text, or
    /\d+(?=;)/ if he only wants the semicolon to be a condition of the match,
    but not included in what gets matched.

    Dave


    David Oswald Guest

  6. #6

    Default Re: How to express "not followed by"?


    "Gunnar Hjalmarsson" <noreplygunnar.cc> wrote in message
    news:bhjkjr$aa4h$1ID-184292.news.uni-berlin.de...
    > David Oswald wrote:
    > > The regexp could look something like this:
    > >
    > > /\d+(?!;)/
    > >
    > > That would match one or more numeric digits ...
    > -------------------^^^^^^^^^^^
    >
    > Yes... And it would match 'abc123;edf' just fine - for the same reason
    > as OP's /\d+[^;]/ matches, i.e. "one or more".
    >
    > So your solution cannot work.
    >
    > If the number of digits is fixed, it's easy: Then both /\d{3}[^;]/ and
    > /\d{3}(?!;)/ fail.
    Actually I misread his request. Now that I re-read what he was looking for,
    I understand him to mean that numeric digits followed by a semicolon should
    pass, but numeric digits not followed by a semicolon should not match.

    I'm not sure why you're limiting the number of numeric digits. But here's
    my second stab at getting it right:

    /\d+(?=;)/ will match any string where a run of one or more digits is
    followed by a semicolon, but will not match if the numeric digits are not
    followed by a semicolon. In this case, the semicolon is not part of the
    match, but rather, a criterion of the match. That means that $& will contain
    the numeric digits, but not the semicolon.

    If he wanted to keep the semicolon, then a lookahead isn't even necessary.
    In that case, the match could be:
    /\d+;/ Now I'm specifying that one or more digits will match if followed by
    a semicolon. If not followed by a semicolon, there won't be a match.

    His subject line states "not followed by", but what he's really asking seems
    to be "followed by", since his first paragraph states what the number should
    be followed by, not what it shouldn't be followed by, except that it
    shouldn't be followed by anything else.

    If, on the other hand, he intends for a match to occur at end of line
    without semicolon, or anywhere in the line with a semicolon, that's a
    different beast, and could be written as follows:

    /(\d+$)|(\d+;)/

    or with lookahead:
    /(\d+$)|(\d+(?=;))/

    The second option would require that the semicolon appear if the number is
    not at the end of line, but wouldn't capture the semicolon into the match.

    Dave


    David Oswald Guest
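
    The difference between consuming the semicolon and only looking ahead
    for it can be seen by printing $& for both patterns. A minimal sketch;
    the helper name matched_text and the sample string are made up for
    illustration.

    ```perl
    use strict;
    use warnings;

    # Return the matched text ($&) for a pattern, or undef when the
    # pattern does not match.
    sub matched_text {
        my ($string, $regex) = @_;
        return $string =~ $regex ? $& : undef;
    }

    my $input = "abc123;def";

    # The semicolon is consumed and becomes part of the match.
    print "consuming: ", matched_text($input, qr/\d+;/), "\n";       # 123;

    # The semicolon is only a condition; it stays outside the match.
    print "lookahead: ", matched_text($input, qr/\d+(?=;)/), "\n";   # 123
    ```

    Both patterns accept and reject exactly the same strings; they differ
    only in what ends up in $&.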

  7. #7

    Default Re: How to express "not followed by"?

    David Oswald wrote:
    > here's my second stab at getting it right:
    >
    > /\d+(?=;)/ will match any string where any number of one or more
    > digits is followed by a semicolon, but will not match if the
    > numeric digits are not followed by a semicolon. In this case, the
    > semicolon is not part of the match, but rather, a criteria of the
    > match. That means that $& will contain the numeric digits, but not
    > the semicolon.
    >
    > If he wanted to keep the semicolon, then a lookahead isn't even
    > necessary. In that case, the match could be:
    > /\d+;/ Now I'm specifying that one or more digits will match if
    > followed by a semicolon. If not followed by a semicolon, there
    > won't be a match.
    /\d+(?=;)/ is basically just a more complicated way to write /\d+;/,
    and if you are not making use of the $& variable, there is no reason
    to bother with a zero-width assertion, is there?
    > His subject line states "not followed by", but what he's really
    > asking seems to be "followed by", since his first paragraph states
    > what the number should be followed by, not what it shouldn't be
    > followed by, except that it shouldn't be followed by anything else.
    Precisely.
    > If, on the other hand, he intends for a match to occur at end of
    > line without semicolon, or anywhere in the line with a semicolon,
    > that's a different beast, and could be written as follows:
    >
    > /(\d+$)|(\d+;)/
    >
    > or with lookahead:
    > /(\d+$)|(\d+(?=;))/
    Or shorter:

    /\d+(?:;|$)/
    > The second option would require that the semicolon appear if the
    > number is not at the end of line, but wouldn't capture the
    > semicolon into the match.
    Rather than thinking of what's captured in the $& variable, I usually
    pay more attention to whether things are captured in the $1, $2,
    etc. variables. Both your last options would capture the whole match
    in either $1 or $2 ...

    --
    Gunnar Hjalmarsson
    Email: [url]http://www.gunnar.cc/cgi-bin/contact.pl[/url]

    Gunnar Hjalmarsson Guest
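
    To see which variant fills $1 and $2, the capture variables can be
    inspected after each match. A small sketch; the helper name captures
    and the sample strings are invented for the example.

    ```perl
    use strict;
    use warnings;

    # Return ($1, $2) after a successful match, or an empty list when
    # the pattern does not match at all.
    sub captures {
        my ($string, $regex) = @_;
        return unless $string =~ $regex;
        return ($1, $2);
    }

    # Show a capture value, spelling out undef.
    sub show { my ($value) = @_; return defined $value ? "'$value'" : "undef" }

    for my $regex (qr/(\d+$)|(\d+;)/, qr/\d+(?:;|$)/) {
        for my $string ("abc123;", "abc123") {
            my ($first, $second) = captures($string, $regex);
            printf "%-20s %-8s \$1=%s \$2=%s\n",
                $regex, $string, show($first), show($second);
        }
    }
    ```

    With the alternation of two capturing groups, the number lands in $1
    or in $2 depending on which branch matched. The (?:...) version
    accepts the same strings but leaves both variables undefined.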

  8. #8

    Default Re: How to express "not followed by"?

    Thanks to all the people who have helped me tackle the problem.
    Sorry I was vague in describing it. Here I'll try to
    describe the problem more clearly:
    I have a test file; some of the lines have a sequence of digits. What
    I'm trying to do is print out the lines that have "a sequence of
    digits not followed by a semicolon" (the same as "a sequence of digits
    followed by a non-semicolon").

    Here is my test file(test.txt):
    123;edf
    abde
    abc1234
    12abc
    abc123;
    abcdef
    defg

    Here is my program:
    open(IN, "<./test.txt") or die "Cannot open test.txt: $!";
    while (<IN>)
    {
        chomp;
        # sequence of digits not followed by a semicolon
        if ( /\d+(?!;)/ )       # Thanks to those who suggest it.
        #if ( /\d+([^;])/ )     # my original post. Got the same result.
        {
            print "Error on line $.: $_\n";
        }
    }
    close(IN);

    The result shows line 1, line 3, line 4 and line 5. But actually only
    line 3 and line 4 should come out. Line 1 and line 5 are the exact
    opposite (followed by a semicolon). I just don't understand why they
    would also match.

    I actually get around the problem by doing the following:
    if ( /\d+/ )
    {
        if ( $' !~ /^;/ )
        {
            print "Error on line $.: $_\n";
        }
    }
    But there has got to be an easy way to do this using only one if
    statement.
    Thanks for all your time and efforts. I appreciate it.
    Yi Mang Guest
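
    Why lines 1 and 5 also match becomes visible when the matched text is
    printed. \d+ first takes all the digits; when the lookahead then sees
    the semicolon, the engine backtracks and gives back one digit. Now the
    next character is a digit, not a semicolon, so the shorter match
    succeeds. A minimal sketch (the sub name naive_match is made up):

    ```perl
    use strict;
    use warnings;

    # Return the text that /\d+(?!;)/ actually matched, or undef when
    # there is no match.
    sub naive_match {
        my ($line) = @_;
        return $line =~ /\d+(?!;)/ ? $& : undef;
    }

    for my $line ("123;edf", "abc1234", "12abc", "abc123;") {
        my $matched = naive_match($line);
        printf "%-8s matched '%s'\n", $line,
            defined $matched ? $matched : "(nothing)";
    }
    ```

    For "123;edf" and "abc123;" the match is "12", not "123". The same
    backtracking explains /\d+[^;]/: there the final digit is matched by
    [^;] instead.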

  9. #9

    Default Re: How to express "not followed by"?

    Yi Mang wrote:
    > what I'm trying to do is print out the lines that have "a
    > sequence of digits not followed by a semicolon" (the same as "a
    > sequence of digits followed by a non-semicolon").
    >
    > Here is my test file(test.txt):
    > 123;edf
    > abde
    > abc1234
    > 12abc
    > abc123;
    > abcdef
    > defg
    <program snipped>

    Then I suppose that you can do:

    if ( /\d(?:[^\d;]|$)/ )

    Actually it's Randal's solution modified to also cover line 3 (which
    ends with a group of digits).

    --
    Gunnar Hjalmarsson
    Email: [url]http://www.gunnar.cc/cgi-bin/contact.pl[/url]

    Gunnar Hjalmarsson Guest

  10. #10

    Default Re: How to express "not followed by"?

    Randal L. Schwartz wrote:
    > Gunnar> Then I suppose that you can do:
    >
    > Gunnar> if ( /\d(?:[^\d;]|$)/ )
    >
    > Probably simpler to look ahead here:
    >
    > if (/\d(?![\d;])/) { ... }
    >
    > Solutions with "or end of line" often look more complicated than
    > they need to be.
    Yes, that looks cleaner.

    One reason why I seldom come to think of zero-width assertions is that
    I usually am particular about portability. However, I just checked,
    and I believe that it's just lookbehind assertions (available only in
    5.005) that are a risk in that respect.

    Anyway, in the spirit of TMTOWTDI, this is another approach that I was
    about to post before I saw your /\d[^\d;]/ suggestion:

    if ( /\d/ and !/\d;/ )

    --
    Gunnar Hjalmarsson
    Email: [url]http://www.gunnar.cc/cgi-bin/contact.pl[/url]

    Gunnar Hjalmarsson Guest
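
    The three tests discussed so far can be run side by side against the
    test file from the question. This is only a sketch: the names in
    %checks and the helper flagged_lines are invented for the comparison.

    ```perl
    use strict;
    use warnings;

    # Each check returns true when a line should be reported as an error.
    my %checks = (
        alternation => sub { $_[0] =~ /\d(?:[^\d;]|$)/ },
        lookahead   => sub { $_[0] =~ /\d(?![\d;])/ },
        two_matches => sub { $_[0] =~ /\d/ and $_[0] !~ /\d;/ },
    );

    # Return the 1-based numbers of the lines that a check reports.
    sub flagged_lines {
        my ($check, @lines) = @_;
        return grep { $check->($lines[$_ - 1]) } 1 .. @lines;
    }

    my @test_file = ("123;edf", "abde", "abc1234", "12abc",
                     "abc123;", "abcdef", "defg");

    for my $name (sort keys %checks) {
        print "$name: @{[ flagged_lines($checks{$name}, @test_file) ]}\n";
    }
    ```

    On this test file all three report lines 3 and 4. They are not fully
    equivalent, though: for a line with several numbers, such as
    "12 34;", the two regex-only checks report the line because of "12",
    while the two-match version does not, since some digit is followed by
    a semicolon.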

  11. #11

    Default Re: How to express "not followed by"?


    "Yi Mang" <yi_mang> wrote in message
    news:832301af.0308161458.62bdbda5posting.google.com...
    > Thanks, guys, that did it. The problem actually should say "a sequence
    > of digits is not followed by a digit nor a semicolon". Now it seems
    > easy to solve.
    > Thanks again.
    You just had a breakthrough, seriously.

    The key to getting a regular expression right is to know exactly, EXACTLY
    what you're looking for it to match, and to not leave it too ambiguous. Now
    "exactly" can indeed mean it matches a broad spectrum of things. .*, for
    example, has an exact outcome though it matches a whole lot. But the
    starting point is pinning down precisely what you want to match, capture,
    leave behind, etc.

    At that point it just becomes a matter of using the tools of your regexp
    implementation to accomplish the precision you have dreamt up.

    Also, earlier you mentioned that "a sequence of digits not followed by a
    semicolon" was the same as "a sequence of digits followed by a
    non-semicolon". I'm sure that you've now found that not to be the case,
    which is what I'm getting at when I say you need to think of things with
    precision when implementing a regexp.

    But it's surprising how easy a regexp can become once you think through
    exactly what you want to have match, and it becomes a sharp sword.

    Dave


    David Oswald Guest
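
    The difference between "followed by a non-semicolon" and "not
    followed by a semicolon" shows up at the end of a line, where nothing
    follows at all. A small sketch using the corrected rule (neither a
    digit nor a semicolon may follow); the sub names are made up.

    ```perl
    use strict;
    use warnings;

    # "Followed by a non-semicolon": a character must actually be there.
    sub followed_by_other {
        my ($line) = @_;
        return $line =~ /\d[^\d;]/ ? 1 : 0;
    }

    # "Not followed by a digit or semicolon": end of string also counts.
    sub not_followed_by {
        my ($line) = @_;
        return $line =~ /\d(?![\d;])/ ? 1 : 0;
    }

    for my $line ("abc1234", "12abc", "abc123;") {
        printf "%-8s followed_by_other=%d not_followed_by=%d\n",
            $line, followed_by_other($line), not_followed_by($line);
    }
    ```

    Only "abc1234" separates the two: the character class needs a real
    character after the last digit, while the negative lookahead is also
    satisfied when the string simply ends.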

  12. #12

    Default Re: How to express "not followed by"?

    Also sprach Gunnar Hjalmarsson:
    > One reason why I seldom come to think of zero-width assertions is that
    > I usually am particular about portability. However, I just checked,
    > and I believe that it's just lookbehind assertions (available only in
    > 5.005) that are a risk in that respect.
    What do you mean with 'available only in 5.005'? Look-behind is still
    there (however, it still has the limitation of only allowing patterns
    with fixed length).

    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus}) !JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+ii ixesixeseg;y~\n~~dddd;eval
    Tassilo v. Parseval Guest

  13. #13

    Default Re: How to express "not followed by"?

    Tassilo v. Parseval wrote:
    > Also sprach Gunnar Hjalmarsson:
    >>One reason why I seldom come to think of zero-width assertions is that
    >>I usually am particular about portability. However, I just checked,
    >>and I believe that it's just lookbehind assertions (available only in
    >>5.005) that are a risk in that respect.
    >
    > What do you mean with 'available only in 5.005'? Look-behind is still
    > there (however, it still has the limitation of only allowing patterns
    > with fixed length).
    Yes, of course it is. I guess I should have said 'only as from 5.005'
    or something. Confusion of languages. :)

    Thanks for pointing it out.

    --
    Gunnar Hjalmarsson
    Email: [url]http://www.gunnar.cc/cgi-bin/contact.pl[/url]

    Gunnar Hjalmarsson Guest
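
    For completeness, a fixed-length look-behind is the mirror image of
    the lookahead used earlier: it tests what comes before the match
    instead of what comes after. A hypothetical sketch; the "#" marker
    and the sub name are arbitrary choices for the example.

    ```perl
    use strict;
    use warnings;

    # Return the first run of digits that is not directly preceded by
    # "#" or by another digit, or undef when there is none.
    sub unmarked_number {
        my ($string) = @_;
        return $string =~ /(?<![#\d])\d+/ ? $& : undef;
    }

    print unmarked_number("id#123 qty 45"), "\n";    # 45
    ```

    Excluding \d in the look-behind plays the same role as excluding \d
    in /\d(?![\d;])/: without it the engine could start the match in the
    middle of "123" and succeed anyway.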
