Pattern match with 2 conditions


  #1

    Re: Pattern match with 2 conditions

    Stephan Bour <sbour@niaid.nih.gov> writes:
    > #! usr/bin/perl -w
    use strict; # is your friend
    >
    > $/="\n>";
    >
    > while(<>){
    >
    > $seq=$_;
    what's the point of this when you just set it back to "" below?
    >
    > @fields=split(/\n/,$_);
    > $i=-1;
    > $seq="";
    >
    > foreach $line (@fields) {
    why assign $line when you never use it?
    >
    > $i++;
    > if($i==0) {
    > $def=$fields[$i];
    > }
    > else {
    > $seq.=$fields[$i];
    > }
    > }
    This is not in answer to your question, but you could clean this up
    and get rid of that nasty $i stuff with something along the lines of
    (untested):
    my $def = shift(@fields);
    my $seq = '';
    foreach my $line (@fields) {
        $seq .= $line;
    }
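A minimal sketch of the whole record loop (not from the thread), assuming the input really is FASTA read with the `$/ = "\n>"` separator from the original script. It uses `join` in place of the `foreach` accumulation (same result), reads from an in-memory filehandle so it runs without a 600 MB file, and wraps the loop in a hypothetical helper `parse_records`; the sample records are made up.

```perl
use strict;
use warnings;

# Made-up two-record FASTA input, held in memory for the demo.
my $fasta = ">id1 some protein [Homo sapiens]\nAAAAALYALA\nALAALBBBBB\n"
          . ">id2 other protein [Mus musculus]\nCCCCCCCCCC\n";

# Hypothetical helper: returns a list of [ID line, joined sequence] pairs.
sub parse_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = "\n>";                  # read one FASTA record at a time
    my @records;
    while (my $record = <$fh>) {
        chomp $record;                 # removes the trailing "\n>" separator
        $record =~ s/^>//;             # only the first record still starts with '>'
        my @fields = split /\n/, $record;
        my $def    = shift @fields;    # first line is the ID line
        my $seq    = join '', @fields; # the rest is the sequence
        push @records, [$def, $seq];
    }
    close $fh;
    return @records;
}

my @recs = parse_records($fasta);
printf "%s => %s\n", @$_ for @recs;
```

Two separator quirks are handled explicitly: `chomp` strips the trailing "\n>" because `$/` is set to it, and only the very first record still begins with ">".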
    > if ($seq=~ /(.....)(LY.L..L..L)(.....)/ and $def=~ /[Homo sapiens]/) {
    The problems you asked about are in this line. They are:
    1.) [Homo sapiens] is a character class; see perldoc perlre for more
    information. Drop the square brackets if (as I assume) you just
    want to check for the string "Homo sapiens" somewhere in the
    string. Or, if you need it at the beginning, use /^Homo sapiens/
    or, for that matter, ( ...and substr($def, 0, 12) eq 'Homo sapiens')
    2.) The second regex resets the $1..$3 variables to undef(), since
    they refer to the last successful match. You could try switching the
    order of the regexen, which might speed things up a little anyway:
    with "and", the cheap test on $def would run first, and the motif
    search on $seq would be skipped for every record that fails it.
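To see problem 1 concretely, this small sketch (made-up ID lines, not from the thread) runs the original character class, the plain literal and the bracket-escaped literal against three strings and records which ones match. A character class matches a single character from the set, so even a mouse ID line passes the original test.

```perl
use strict;
use warnings;

# Made-up ID lines for illustration only.
my @defs = ('protein X [Homo sapiens]', 'protein Y [Mus musculus]', 'XYZ');

my %result;
for my $def (@defs) {
    $result{$def} = join ',',
        ($def =~ /[Homo sapiens]/   ? 1 : 0),  # class: any ONE of H o m s a p i e n or space
        ($def =~ /Homo sapiens/     ? 1 : 0),  # literal text anywhere in the string
        ($def =~ /\[Homo sapiens\]/ ? 1 : 0);  # literal text including the brackets
    print "$def => class,literal,escaped = $result{$def}\n";
}
```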


    Hope this helps.

    --
    ----------------------------------------------------------------------
    Andrew J Perrin - http://www.unc.edu/~aperrin
    Assistant Professor of Sociology, U of North Carolina, Chapel Hill
    clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu
    Andrew Perrin

  #2

    Re: Pattern match with 2 conditions

    in article <847k77uthr.fsf@perrin.socsci.unc.edu>, Andrew Perrin CLists at
    clists@perrin.socsci.unc.edu wrote on 6/27/03 16:59:
    > [quoted text of post #1 trimmed]
    You're right, you're right and you're right. It is still very slow (the
    FASTA file is over 600 MB) but it works!
    Thanks for your help,
    Stephan.

    Stephan Bour

  #3

    Re: Pattern match with 2 conditions

    Stephan Bour <sbour@niaid.nih.gov> wrote:

    : I have a FASTA file containing alternated ID and sequence lines. I need to
    : find a pattern in the sequence but only return it as a match if the
    : corresponding ID line contains the string [Homo sapiens]. The code below
    : returns all matches when I omit the "and $def=~ /[Homo sapiens]/" string
    ^^^^^^^^^^^^^^
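    Square brackets have meaning in a regex: they make a character class,
    which matches any single character from the set. That will actually
    match any string containing at least one of the letters H, a, e, i, m,
    n, o, p, or s, or a space. Escape the brackets to match a literal
    '[Homo sapiens]'.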

    $def=~ /\[Homo sapiens\]/

    Better still, use index(), which is more suited to finding literal
    substrings.
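A minimal sketch of the index() approach, with a hypothetical helper `has_tag` and a made-up ID line. `index` returns the 0-based offset of the first occurrence, or -1 when the substring is absent, so characters such as `[` are taken literally and need no escaping. Like a regex without /i, it is case-sensitive.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $tag occurs in $def as a literal substring.
sub has_tag {
    my ($def, $tag) = @_;
    return index($def, $tag) >= 0;   # index() gives -1 when $tag is absent
}

my $def = 'gi|0| hypothetical protein [Homo sapiens]';   # made-up ID line
print has_tag($def, '[Homo sapiens]') ? "found\n" : "not found\n";
print index($def, '[Mus musculus]'), "\n";              # not present
```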

    : but gives me the following error when included:
    :
    : Use of uninitialized value in concatenation (.) at blast.pl line 26, <>
    : chunk 22308.
    : Use of uninitialized value in concatenation (.) at blast.pl line 26, <>
    : chunk 22308.
    : Use of uninitialized value in concatenation (.) at blast.pl line 26, <>
    : chunk 22308.

    Those are warnings, not errors.

    : Does that mean that $def is null in the regex?

    The warnings point at the print statement, not at the match.

    [code trimmed]

    : if ($seq=~ /(.....)(LY.L..L..L)(.....)/

    At this point, $1, $2 and $3 are defined when that match succeeds.

    : and $def=~ /[Homo sapiens]/

    And when that match succeeds, the values in $1, $2 and $3 get
    clobbered: this pattern has no capturing groups, so they all become
    undef. Doh!
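The clobbering is easy to reproduce. In this sketch (made-up data, not the original script), the motif match on its own leaves the first capture in `$1`; when the species match also succeeds in the same condition, it becomes the last successful match, and since it has no groups `$1` is undef inside the block. Interpolating that undef value is what produces the "uninitialized value in concatenation" warnings.

```perl
use strict;
use warnings;

my $def = 'gi|0| hypothetical protein [Homo sapiens]';   # made-up ID line
my $seq = 'AAAAA' . 'LYALAALAAL' . 'BBBBB';              # made-up sequence with the motif

my ($first, $inside);
if ($seq =~ /(.....)(LY.L..L..L)(.....)/) {
    $first = $1;                          # capture from the motif match
}
if ($seq =~ /(.....)(LY.L..L..L)(.....)/ and $def =~ /\[Homo sapiens\]/) {
    # The last successful match is the species test, which has no groups.
    $inside = defined $1 ? $1 : 'undef';
}
print "motif match alone:  \$1 = $first\n";
print "after both matches: \$1 = $inside\n";
```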

    : ) {
    : $j++;
    : print ">$def\n\n$1\t$2\t$3\n\n\n";
    : }

    Use the list returned by m// to keep captured substrings around,

    if (
        my @grabs = $seq =~ /(.....)(LY.L..L..L)(.....)/
        and
        $def =~ /\[Homo sapiens\]/
    ) {
        $j++;
        print ">$def\n\n$grabs[0]\t$grabs[1]\t$grabs[2]\n\n\n";
    }

    or use index() in place of the second m//,

    if (
        $seq =~ /(.....)(LY.L..L..L)(.....)/
        and
        index($def, '[Homo sapiens]') >= 0
    ) {
        $j++;
        print ">$def\n\n$1\t$2\t$3\n\n\n";
    }

    or just swap the order of the matches in the condition.
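A sketch putting the three suggested fixes side by side as hypothetical subs (`via_list`, `via_index`, `via_swap`), run against the same made-up record. Each should return the same three captures, and none should accept a non-human ID line. The motif is compiled once with `qr//` so all three subs use an identical pattern.

```perl
use strict;
use warnings;

# The motif from the thread, compiled once so all three subs share it.
my $motif = qr/(.....)(LY.L..L..L)(.....)/;

# Fix 1: keep the captures from the list that m// returns.
sub via_list {
    my ($def, $seq) = @_;
    if ((my @grabs = $seq =~ $motif) and $def =~ /\[Homo sapiens\]/) {
        return @grabs;
    }
    return;
}

# Fix 2: test the ID line with index(), which leaves $1..$3 alone.
sub via_index {
    my ($def, $seq) = @_;
    if ($seq =~ $motif and index($def, '[Homo sapiens]') >= 0) {
        return ($1, $2, $3);
    }
    return;
}

# Fix 3: run the species match first so the motif match is the last one.
sub via_swap {
    my ($def, $seq) = @_;
    if ($def =~ /\[Homo sapiens\]/ and $seq =~ $motif) {
        return ($1, $2, $3);
    }
    return;
}

my $def = 'gi|0| hypothetical protein [Homo sapiens]';   # made-up ID line
my $seq = 'AAAAA' . 'LYALAALAAL' . 'BBBBB';              # made-up sequence

for my $fix (\&via_list, \&via_index, \&via_swap) {
    print join("\t", $fix->($def, $seq)), "\n";
}
```

Of the three, swapping the order also lets the short ID-line test reject a record before the motif search runs over its sequence.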

    Jay Tilton
