all matches of a regex

  1. #1

    all matches of a regex

    Hi,
    I have been trying to solve a problem that is about to drive me crazy.
    Maybe someone knows the answer (hopefully :)

    I want to get all matches of a pattern in a string, including the overlapping ones.
    For example,
    the string is "xHxxHyyKzDt"
    and the pattern is /^(.*)H(.*)K(.*)D(.*)$/

    so in one round of matching $1=x $2=xxHyy $3=z $4=t
    and in another $1=xHxx $2=yy $3=z $4=t


    while ($sequence=~/$pattern/g )
    doesn't work, I think because the matches are overlapping

    while ($sequence=~/(?=$pattern)/g )
    also doesn't work

    I would really appreciate it if someone could help.
    Thanks.
    oznur

    Öznur Taştan Guest

  3. #2

    Re: all matches of a regex


    Öznur Taştan wrote:
    >Hi,
    >I have been trying to solve a problem that is about to drive me crazy.
    >Maybe someone knows the answer (hopefully :)
    >
    >I want to get all matches of a pattern in a string, including the overlapping
    >ones.
    >For example,
    >the string is "xHxxHyyKzDt"
    >and the pattern is /^(.*)H(.*)K(.*)D(.*)$/
    >
    >so in one round of matching $1=x $2=xxHyy $3=z $4=t
    >and in another $1=xHxx $2=yy $3=z $4=t
    >
    I am not sure what concept you are referring to by "round", but you will never get the first result in any "round": your quantifiers are greedy, so the first pair of parentheses will always try to match as many characters as possible, as long as they are followed by H. So $1 will always be "xHxx", given your example string, $2 will be "yy", etc.

    Your pattern assumes three capital letters in a string as a kind of delimiter and will not be able to use more than one "H" as the first delimiter.

    If the only difference is more than one "H" delimiter in all of your strings, you could try to run two different patterns, the second one using a non-greedy quantifier for your first grouping parentheses:

    /^(.*?)H(.*)K(.*)D(.*)$/
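
    To make the greedy versus non-greedy difference concrete, here is a minimal sketch that runs both patterns against the example string from the question. The variable names @greedy and @lazy are illustrative only.

```perl
use strict;
use warnings;

my $sequence = 'xHxxHyyKzDt';

# Greedy first group: (.*) takes as much as it can while still leaving
# an H, a K and a D for the rest of the pattern, so it stops at the last H.
my @greedy = $sequence =~ /^(.*)H(.*)K(.*)D(.*)$/;

# Non-greedy first group: (.*?) takes as little as it can, so it stops
# at the first H.
my @lazy = $sequence =~ /^(.*?)H(.*)K(.*)D(.*)$/;

print "greedy: @greedy\n";    # greedy: xHxx yy z t
print "lazy:   @lazy\n";      # lazy:   x xxHyy z t
```

    Each pattern still returns only one match, so with more than two candidate H positions neither variant would list every combination.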

    Could you explain in a little more detail what you are trying to achieve? Maybe there's another way to do it.

    HTH,

    Jan
    --
    There are 10 kinds of people: those who understand binary, and those who don't
    Jan Eden Guest

  4. #3

    Re: all matches of a regex

    I didn't expect the reply to come so soon :)
    Indeed, I know the greedy and non-greedy rules, and they are the source of
    the problem :/

    I need a more general solution.
    What I am trying to do is extract all sets of combinations
    when a string matches some patterns. In the example I gave, the first pattern is
    H, the second is K and the third is D. I put them in the single regex
    /^(.*?)H(.*)K(.*)D(.*)$/ to be able to retrieve the substrings in between.
    More generally, they can be looser patterns,
    for example [HED] first
    and [KL]{3} second.
    I want to divide the sequence into substrings at the sites where the
    patterns match, in all combinations, and retrieve the substrings between the
    matched sites of the two patterns.
    I hope that is clearer.
    Thanks
    oznur
    Öznur Taştan Guest

  5. #4

    Re: all matches of a regex

    Öznur Taştan wrote:
    >
    > Hi,
    > I have been trying to solve a problem that is about to drive me crazy.
    > Maybe someone knows the answer (hopefully :)
    >
    > I want to get all matches of a pattern in a string, including the overlapping ones.
    > For example,
    > the string is "xHxxHyyKzDt"
    > and the pattern is /^(.*)H(.*)K(.*)D(.*)$/
    >
    > so in one round of matching $1=x $2=xxHyy $3=z $4=t
    > and in another $1=xHxx $2=yy $3=z $4=t
    >
    >
    > while ($sequence=~/$pattern/g )
    > doesn't work, I think because the matches are overlapping
    >
    > while ($sequence=~/(?=$pattern)/g )
    > also doesn't work
    >
    Hi Öznur.

    The problem is that wildcards in regexes will match either the maximum
    number of characters for a match to work (.*) or the minimum (.*?) and
    nothing in between. The only way I can think of to do this is to
    put an explicit count on your first field and try all possible values,
    like the program below. Others are likely to come up with something
    neater.

    HTH,

    Rob



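    use strict;
    use warnings;

    my $sequence = 'xHxxHyyKzDt';

    # Try every possible length for the first field; only lengths that are
    # immediately followed by an H (and later a K and a D) produce a match.
    foreach my $n (1 .. length $sequence) {

        next unless $sequence =~ /^(.{$n})H(.+)K(.+)D(.+)/;

        printf "\$1 = %-6s", $1;
        printf "\$2 = %-6s", $2;
        printf "\$3 = %-6s", $3;
        printf "\$4 = %-6s", $4;
        print "\n\n";
    }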

    **OUTPUT

    $1 = x     $2 = xxHyy $3 = z     $4 = t

    $1 = xHxx  $2 = yy    $3 = z     $4 = t
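
    For the fixed pattern in the question, another possible approach is to let the regex engine's own backtracking enumerate the matches: record the captures in an embedded code block, then force a failure so the engine tries the next alternative. This is only a sketch. It assumes Perl 5.10 or later for (*FAIL), the names all_matches and @found are made up for illustration, and it uses \z where the question's pattern uses $.

```perl
use strict;
use warnings;

our @found;    # package variable, so the code block inside the regex can reach it

# Return one array ref of captures for every way the pattern can match.
sub all_matches {
    my ($sequence) = @_;
    local @found = ();

    # The (?{ ... }) block runs each time the engine reaches the end of
    # the pattern; (*FAIL) then rejects that match and forces backtracking.
    $sequence =~ /^ (.*) H (.*) K (.*) D (.*) \z
                  (?{ push @found, [ $1, $2, $3, $4 ] })
                  (*FAIL)/x;

    return @found;
}

for my $match ( all_matches('xHxxHyyKzDt') ) {
    print join( ' ', @$match ), "\n";
}
# Prints:
# xHxx yy z t
# x xxHyy z t
```

    The greedy alternative comes out first because that is the order in which the engine explores the possibilities. The overall match always fails by design, so the return value of the match itself is not useful; only @found is.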



    Rob Dixon Guest

  6. #5

    Re: all matches of a regex

    Wow! Creative!
    I think I can modify this for the general case.
    Thanks
    oznur

    Öznur Taştan Guest

  7. #6

    Re: all matches of a regex

    Hi Öznur.

    Yes, I did suspect my previous answer wouldn't handle the general case, but
    I hoped it might be good enough. I'm sure it's not possible using a simple regex.

    I've written a subroutine, split_list(), below. It takes a string of characters to
    split on and a string to split, a little like the split() built-in. It finds all
    the ways to split the string at each of the characters in sequence.

    It's a recursive subroutine to make it neater. It works by finding a way to
    split on the first character in the list and then calling itself to split
    the right-hand half on the remaining characters.

    The return value is an array of all the possibilities with the split
    characters replaced with hyphens.

    Post to the list if you need any help with any part of it.

    (Thanks, this was an interesting little problem!)

    Cheers,

    Rob



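    use strict;
    use warnings;

    # Return every way of splitting $string at the characters of $list,
    # taken in order, with each split character replaced by a hyphen.
    sub split_list {

        my ($list, $string) = @_;

        # No split characters left: the rest of the string is the last field.
        return ($string) unless $list =~ s/(.)//;
        my $split = $1;

        my @splits;

        # Visit every occurrence of the current split character.
        # \Q...\E keeps a character such as '.' from acting as a metacharacter.
        while ( $string =~ /\Q$split\E/g ) {

            my $pos   = pos $string;                  # offset just past the match
            my $left  = substr $string, 0, $pos - 1;  # text before the split character
            my $right = substr $string, $pos;         # text after it

            # Recurse on the right-hand part with the remaining characters.
            push @splits, map "$left-$_", split_list($list, $right);
        }

        return @splits;
    }

    my @ret = split_list('HKD', 'xHxxHyyKzDt');

    print map "$_\n", @ret;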


    **OUTPUT

    x-xxHyy-z-t
    xHxx-yy-z-t
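
    The looser patterns mentioned earlier in the thread, such as [HED] and [KL]{3}, can match more than one character, so the single-character bookkeeping in split_list() does not carry over directly. The following is a rough sketch of one way to extend the same recursion to a list of compiled regexes. The name split_patterns is hypothetical, and the result format is an assumption: the sketch returns array refs rather than hyphen-joined strings.

```perl
use strict;
use warnings;

# Return one array ref of fields for every way of splitting $string at
# successive matches of the regexes in @$patterns, taken in order.
sub split_patterns {
    my ($patterns, $string) = @_;

    # No patterns left: the rest of the string is the last field.
    return ( [$string] ) unless @$patterns;

    my ($first, @rest) = @$patterns;
    my @splits;

    # The lookahead keeps each match zero-width, so overlapping sites
    # are all visited; the capture group records where the site ends.
    while ( $string =~ /(?=($first))/g ) {
        my $left  = substr $string, 0, $-[1];   # text before the matched site
        my $right = substr $string, $+[1];      # text after the matched site

        push @splits, map { [ $left, @$_ ] } split_patterns( \@rest, $right );
    }

    return @splits;
}

print join( '-', @$_ ), "\n"
    for split_patterns( [ qr/H/, qr/K/, qr/D/ ], 'xHxxHyyKzDt' );
# Prints:
# x-xxHyy-z-t
# xHxx-yy-z-t
```

    One limitation of this sketch: at each starting position only the match the engine prefers is tried, so a variable-length pattern such as [KL]{1,3} would contribute only its longest match at each site.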



    Rob Guest

  8. #7

    Re: all matches of a regex

    my $string = "pass test1\n abasdfasdfkl asd asfk as;d as asdf sdja s;lfjasldk f sdlk fjs\n\n\nal;s asdlkj fsdfjl;jkdsf pass test2\n ads;lk pass test3\n as;ljk";

    for my $val ($string =~ /pass\s(.*)\n/g) {
        print ("\n\n\n$val \n\n\n");
    }

    output:



    test1





    test2





    test3
    Unregistered Guest
