Trouble with m///g - PERL Beginners

  1. #1

    Trouble with m///g

    Hi,

    I'm trying to extract all four-digit numbers from a string in one fell
    swoop, but I can't seem to come up with the proper regexp. This is my
    first time using /g in a match so maybe there's a trick I'm missing.

    For example, the string

    "1111 2222aa3333 444 55555555 6666 7777-8888"

    should yield

    1111, 2222, 3333, 6666, 7777, 8888.

    Here's one attempt that I thought had a reasonable chance.

    - - - - -
    #!/usr/bin/perl -w
    my $foo = "1111 2222aa3333 444 55555555 6666 7777-8888";
    my @a = ($foo =~ m'[\D^](\d{4})[\D$]'g);
    print "<$foo>\n";
    print(join(":",@a)."\n");
    - - - - -

    <1111 2222aa3333 444 55555555 6666 7777-8888>
    2222:3333:6666

    Thanks for your consideration,
    Chap

    Chap Guest

  2. #2

    RE: Trouble with m///g

    I think this might work.

    /\b\d{4}\b/

    Rob

    -----Original Message-----
    From: Chap Harrison [mailto:com]
    Sent: Thursday, September 30, 2004 10:38 AM
    To: org
    Subject: Trouble with m///g



    Rob Guest

  3. #3

    Re: Trouble with m///g

    > For example, the string 

    That's actually kind of tricky. How about:

    $aa = "1111 2222aa3333 444 55555555 6666 7777-8888";
    @aa = $aa =~ /(?<!\d)\d{4}(?!\d)/g;
    print "$_\n" for @aa;

    That gets 2222 and 3333 also, which the \b solution skips. What it
    says is to get every group of exactly 4 digits that is neither
    preceded nor followed by another digit.

    Dave

    ps - also see perldoc perlre and look for zero-width negative
    look(ahead|behind) assertions
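
    To see the difference concretely, here is a minimal sketch that runs
    both patterns over the sample string from the original post. The
    array names @boundary and @lookaround are made up for illustration.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $foo = "1111 2222aa3333 444 55555555 6666 7777-8888";

    # \b needs a word/non-word transition. Letters and digits are both
    # word characters, so there is no boundary between "2222" and "aa".
    my @boundary = $foo =~ /\b\d{4}\b/g;

    # The lookarounds only inspect the neighbouring character and
    # consume nothing, so any non-digit (or the string edge) will do.
    my @lookaround = $foo =~ /(?<!\d)\d{4}(?!\d)/g;

    print join(":", @boundary), "\n";    # 1111:6666:7777:8888
    print join(":", @lookaround), "\n";  # 1111:2222:3333:6666:7777:8888
    ```

    Neither pattern picks anything out of 55555555, because every
    four-digit window inside it has a digit on at least one side.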
    Dave Guest

  4. #4

    RE: Trouble with m///g

    Please bottom post...
     

    It might, but doesn't. Some testing would be good before posting
    inaccurate responses.
     

    \b matches only at word boundaries, and letters and digits are both
    word characters, so you miss the two sets with the 'aa' between them....
     

    Out of curiosity, based on your description shouldn't it return,

    1111:2222:3333:5555:5555:6666:7777:8888

    Or do you really mean, you are trying to capture all 4 digit strings
    that are not in a string of longer digits? You need to be very explicit
    about what you are after. I think (and have tested) that,

    my @a = ($foo =~ m'(?<!\d{4})\d{4}(?!\d)'g);

    Gives you what you want, though I don't claim to be a regex expert like
    others on the list (who are experts, rather than claiming to be). And I
    *believe* it says: match any 4 digit string not preceded by a 4 digit
    string and not followed by a digit.

    Works?

    http://danconia.org


    Wiggins Guest

  5. #5

    Re: Trouble with m///g

    Hmmm...

    m'\b(\d{4})\b'g
    <1111 2222aa3333 444 55555555 6666 7777-8888>
    1111:6666:7777:8888

    Doesn't give me 2222 or 3333. I think the problem has to do with where
    m///g starts on subsequent iterations. The pattern specifies a
    delimiter for both the start and the end of the target substring, but
    that means it will want to find an ending delim on iteration n,
    followed by a beginning delim on iteration n+1.


    On Sep 30, 2004, at 9:41 AM, Hanson, Rob wrote:
     

    Chap Guest

  6. #6

    RE: Trouble with m///g

    Chap Harrison wrote: 

    TIMTOWTDI:

    @list = grep length==4, /\d+/g
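
    Spelled out in two steps, the one-liner first collects every run of
    digits and then keeps the runs that are exactly four characters long.
    A minimal sketch, assuming the sample string sits in $foo as in the
    original post (the one-liner as written matches against $_ instead);
    @runs is a made-up name for the intermediate list.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $foo = "1111 2222aa3333 444 55555555 6666 7777-8888";

    # Step 1: \d+ is greedy, so each match is a maximal run of digits.
    my @runs = $foo =~ /\d+/g;

    # Step 2: keep only the runs whose length is exactly 4.
    my @list = grep { length($_) == 4 } @runs;

    print join(":", @runs), "\n";  # 1111:2222:3333:444:55555555:6666:7777:8888
    print join(":", @list), "\n";  # 1111:2222:3333:6666:7777:8888
    ```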
    Bob Guest

  7. #7

    Re: Trouble with m///g

    Chap Harrison wrote: 

    The first character class requires that the number is preceded by a
    non-digit character. (The ^ character has no special meaning in a
    character class unless it comes first, where it negates the class.)
    Since the first number is not preceded by anything, 1111 is not
    matched.

    I suppose you meant to do:

    my @a = ($foo =~ m'(?:\D|^)(\d{4})(?:\D|$)'g);

    which gives

    1111:3333:6666:8888

    but that's not what you want either. The reason why e.g. 2222 is not
    matched is that the space after 1111 is included in the first match,
    so the second attempt to match starts at the first '2'...

    You'd better use extended patterns, i.e. zero-width assertions:

    my @a = $foo =~ /(?<!\d)\d{4}(?!\d)/g;

    Read about extended patterns in "perldoc perlre".
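
    The effect of the consumed delimiter can be made visible with pos(),
    which reports where the next /g match attempt will start. This is
    only an illustrative sketch; the @trace array is a made-up name.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $foo = "1111 2222aa3333 444 55555555 6666 7777-8888";

    my @trace;
    # In scalar context, each /g match resumes where the previous one ended.
    while ($foo =~ m'(?:\D|^)(\d{4})(?:\D|$)'g) {
        push @trace, "$1 -> next search starts at offset " . pos($foo);
    }
    print "$_\n" for @trace;
    # 1111 -> next search starts at offset 5
    # 3333 -> next search starts at offset 16
    # 6666 -> next search starts at offset 34
    # 8888 -> next search starts at offset 43
    ```

    Offset 5 is the first '2' of 2222: the space in front of it was
    already used up as the trailing delimiter of 1111, so the pattern
    finds no non-digit there to start from.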

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Guest

  8. #8

    Re: Trouble with m///g


    On Sep 30, 2004, at 9:55 AM, Wiggins d Anconia wrote:
     

    The example was intended to resolve the ambiguities of my informal
    description :-) You correctly surmised what I was after.
     

    And your solution works. Now I'm going to study up on *how* it works!

    Thanks, and also thanks to Dave and Gunnar for what appears to be the
    same solution, and the references to extended patterns and zero-width
    assertions.

    Chap

    Chap Guest

  9. #9

    Re: Trouble with m///g

    > TIMTOWTDI: 

    Shouldn't that be:

    @list = grep length==4, $foo =~ /\d+/g;

    Cool solution, I wouldn't have thought to do it that way. I'm getting
    varying Benchmarking results, though. I think it might have something
    to do with grep speedups from 5.6.1 to 5.8.0... can anyone confirm
    this?

    On a box with 4 Xeon 2gigs with 5.6.1 and Benchmark v1:
               Rate   grep wregex  regex
    grep    55586/s     --   -13%   -23%
    wregex  64061/s    15%     --   -12%
    regex   72569/s    31%    13%     --

    But, on another box with 1 AMD 1gig with 5.8.0 and Benchmark v1.0501:
               Rate wregex  regex   grep
    wregex  31437/s     --   -14%   -18%
    regex   36470/s    16%     --    -5%
    grep    38212/s    22%     5%     --


    Confusing!

    #!/usr/bin/perl -w
    use strict;
    use Benchmark qw/cmpthese/;

    my ($aa);
    $aa = "1111 2222aa3333 444 55555555 6666 7777-8888";

    sub regex { my @aa = $aa =~ /(?<!\d)\d{4}(?!\d)/g }

    # Wiggins ;-)
    sub wregex { my @aa = $aa =~ /(?<!\d{4})\d{4}(?!\d{4})/g }

    sub grep { my @aa = grep length==4, $aa =~ /\d+/g }

    cmpthese(100000, {
        regex  => \&regex,
        wregex => \&wregex,
        grep   => \&grep,
    });
    Dave Guest

  10. Moderated Post

    Re: Trouble with m///g

    Removed by Administrator
    Jan Guest
    Moderated Post

  11. #11

    Re: Trouble with m///g


    On Sep 30, 2004, at 10:41 AM, Jan Eden wrote:
     
    > Careful, you mistyped the original proposition:
    >
    > my @a = ($foo =~ m'(?<!\d)\d{4}(?!\d)'g);

    Oops, sorry - I copied that into the email from Wiggins' reply, but
    actually tested with Dave Gray's. Didn't notice the difference. What
    you posted gives the solution I was after. Thanks for the scrutiny!
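
    The two patterns agree on the sample string, which is probably why
    the mix-up went unnoticed. They differ on runs of five to seven
    digits. A small sketch, using a made-up input "12345 6789" that does
    not appear anywhere in the thread:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $s = "12345 6789";    # hypothetical test input

    # "Not preceded by FOUR digits" still allows one to three digits
    # in front, so the tail of the five-digit run slips through.
    my @four_back = $s =~ /(?<!\d{4})\d{4}(?!\d)/g;

    # "Not preceded by a digit" rejects every window inside a longer run.
    my @one_back = $s =~ /(?<!\d)\d{4}(?!\d)/g;

    print join(":", @four_back), "\n";  # 2345:6789
    print join(":", @one_back), "\n";   # 6789
    ```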

    Chap Guest
