Andrew Perrin #1
Re: Pattern match with 2 conditions
Stephan Bour <sbour@niaid.nih.gov> writes:
> #! usr/bin/perl -w
use strict; # is your friend
> $/="\n>";
>
> while(<>){
>
> $seq=$_;
what's the point of this when you just set it back to "" below?
> @fields=split(/\n/,$_);
> $i=-1;
> $seq="";
>
> foreach $line (@fields) {
why assign $line when you never use it?
> $i++;
> if($i==0) {
> $def=$fields[$i];
> }
> else {
> $seq.=$fields[$i];
> }
> }
This is not in answer to your question, but you could clean this up
and get rid of that nasty $i stuff with something along the lines of
(untested):
$def = shift(@fields);
foreach my $line (@fields) {
$seq .= $line
}
> if ($seq=~ /(.....)(LY.L..L..L)(.....)/ and $def=~ /[Homo sapiens]/) {
The problems you asked about are in this line. They are:
1.) [Homo sapiens] is a character class; perldoc perlre for more
information. Drop the square brackets if (as I assume) you just
want to check for the string "Homo sapiens" somewhere in the
string. Or, if you need it at the beginning, use /^Homo sapiens/
or, for that matter, ( ...and substr($def, 0, 12) eq 'Homo sapiens')
2.) The second regex resets the $1..$3 variables to undef(), since
they refer to the last match. You could try switching the order
of the regexen, which might speed things up a little anyway.
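A minimal sketch of point 1, assuming two made-up definition lines. classify() is a hypothetical helper name, not something from the original script; it returns one 0/1 flag per test mentioned above.

```perl
use strict;
use warnings;

# classify() is a hypothetical helper name, used only for this illustration.
# It returns four 0/1 flags, one per test discussed above.
sub classify {
    my ($def) = @_;
    my $class   = $def =~ /[Homo sapiens]/ ? 1 : 0;   # any ONE listed character
    my $literal = $def =~ /Homo sapiens/   ? 1 : 0;   # the string, anywhere
    my $anchor  = $def =~ /^Homo sapiens/  ? 1 : 0;   # the string, at the start
    my $prefix  = substr($def, 0, 12) eq 'Homo sapiens' ? 1 : 0;
    return ($class, $literal, $anchor, $prefix);
}

# Made-up definition lines.
for my $def ('Homo sapiens hypothetical protein',
             'Mus musculus hypothetical protein') {
    print join(' ', classify($def)), "  $def\n";
}
```

The character class flags both lines, because the mouse line also contains an "s" and a space; the other three tests flag only the human line.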
Hope this helps.
--
----------------------------------------------------------------------
Andrew J Perrin - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
clists@perrin.socsci.unc.edu * andrew_perrin (at) unc.edu
Stephan Bour #2
Re: Pattern match with 2 conditions
in article 847k77uthr.fsf@perrin.socsci.unc.edu, Andrew Perrin CLists at
clists@perrin.socsci.unc.edu wrote on 6/27/03 16:59:
> Stephan Bour <sbour@niaid.nih.gov> writes:
>> #! usr/bin/perl -w
> use strict; # is your friend
>> $/="\n>";
>>
>> while(<>){
>>
>> $seq=$_;
> what's the point of this when you just set it back to "" below?
>> @fields=split(/\n/,$_);
>> $i=-1;
>> $seq="";
>>
>> foreach $line (@fields) {
> why assign $line when you never use it?
>> $i++;
>> if($i==0) {
>> $def=$fields[$i];
>> }
>> else {
>> $seq.=$fields[$i];
>> }
>> }
> This is not in answer to your question, but you could clean this up
> and get rid of that nasty $i stuff with something along the lines of
> (untested):
> $def = shift(@fields);
> foreach my $line (@fields) {
> $seq .= $line
> }
>> if ($seq=~ /(.....)(LY.L..L..L)(.....)/ and $def=~ /[Homo sapiens]/) {
> The problems you asked about are in this line. They are:
> 1.) [Homo sapiens] is a character class; perldoc perlre for more
> information. Drop the square brackets if (as I assume) you just
> want to check for the string "Homo sapiens" somewhere in the
> string. Or, if you need it at the beginning, use /^Homo sapiens/
> or, for that matter, ( ...and substr($def, 0, 12) eq 'Homo sapiens')
> 2.) The second regex resets the $1..$3 variables to undef(), since
> they refer to the last match. You could try switching the order
> of the regexen, which might speed things up a little anyway.
>
> Hope this helps.
You're right, you're right and you're right. It is still very slow (the
FASTA file is over 600 MB) but it works!
Thanks for your help,
Stephan.
Jay Tilton #3
Re: Pattern match with 2 conditions
Stephan Bour <sbour@niaid.nih.gov> wrote:
: I have a FASTA file containing alternated ID and sequence lines. I need to
: find a pattern in the sequence but only return it as a match if the
: corresponding ID line contains the string [Homo sapiens]. The code below
: returns all matches when I omit the "and $def=~ /[Homo sapiens]/" string
                                                   ^^^^^^^^^^^^^^
Square brackets have meaning in a regex. That will actually match any
string containing at least one of the characters H, a, e, i, m, n, o, p,
s, or a space. Escape the brackets to match a literal '[Homo sapiens]' .
$def=~ /\[Homo sapiens\]/
Better still, use index(), which is more suited to finding literal
substrings.
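A small sketch of the index() approach, assuming made-up definition lines. has_tag() is a hypothetical helper name. index() returns the zero-based position of the substring, or -1 when it is absent, so the test has to be ">= 0" rather than plain truth.

```perl
use strict;
use warnings;

# has_tag() is a hypothetical helper name, not part of the original script.
sub has_tag {
    my ($def, $tag) = @_;
    return index($def, $tag) >= 0;   # index() returns -1 when $tag is absent
}

my $tag = '[Homo sapiens]';

# Made-up definition lines.
print has_tag('hypothetical protein [Homo sapiens]', $tag) ? "yes\n" : "no\n";
print has_tag('hypothetical protein [Mus musculus]', $tag) ? "yes\n" : "no\n";

# If a regex is still wanted, \Q...\E quotes the metacharacters in $tag,
# so the brackets are matched literally without hand-written backslashes.
print 'hypothetical protein [Homo sapiens]' =~ /\Q$tag\E/ ? "yes\n" : "no\n";
```

This prints yes, no, yes.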
: but gives me the following error when included:
:
: Use of uninitialized value in concatenation (.) at blast.pl line 26, <>
: chunk 22308.
: Use of uninitialized value in concatenation (.) at blast.pl line 26, <>
: chunk 22308.
: Use of uninitialized value in concatenation (.) at blast.pl line 26, <>
: chunk 22308.
Those are warnings, not errors.
: Does that mean that $def is null in the regex?
The warnings point at the print statement, not at the match.
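To see where the undefined values come from, here is a short sketch with a made-up sequence and definition line. capture_states() is a hypothetical helper name; it records $1 after each of three matches.

```perl
use strict;
use warnings;

# capture_states() is a hypothetical helper; the inputs below are made up.
sub capture_states {
    my ($seq, $def) = @_;

    my $ok = $seq =~ /(.....)(LY.L..L..L)(.....)/;
    my $after_seq = $1;                          # set by the match above

    $ok = 'no such tag' =~ /\[Homo sapiens\]/;   # fails: $1 is left alone
    my $after_failed = $1;

    $ok = $def =~ /\[Homo sapiens\]/;            # succeeds, has no groups
    my $after_def = $1;                          # now undef

    return ($after_seq, $after_failed, $after_def);
}

my @states = capture_states('AAAAALYALAALAALCCCCC',
                            'made-up protein [Homo sapiens]');
print join(', ', map { defined $_ ? $_ : 'undef' } @states), "\n";
```

This prints "AAAAA, AAAAA, undef": a failed match leaves $1 alone, but a successful match without capture groups leaves it undefined, which is what the print statement then trips over.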
[code trimmed]
: if ($seq=~ /(.....)(LY.L..L..L)(.....)/
At this point, $1, $2 and $3 are defined when that match succeeds.
: and $def=~ /[Homo sapiens]/
And when that match succeeds, the values in $1, $2 and $3 get
clobbered. Doh!
: ) {
: $j++;
: print ">$def\n\n$1\t$2\t$3\n\n\n";
: }
Use the list returned by m// to keep captured substrings around,
if (
my @grabs = $seq=~ /(.....)(LY.L..L..L)(.....)/
and
$def=~ /\[Homo sapiens\]/
) {
$j++;
print ">$def\n\n$grabs[0]\t$grabs[1]\t$grabs[2]\n\n\n";
}
or use index() in place of the second m//,
if(
$seq=~ /(.....)(LY.L..L..L)(.....)/
and
index($def, '[Homo sapiens]') >= 0
) {
$j++;
print ">$def\n\n$1\t$2\t$3\n\n\n";
}
or just swap the order of the matches in the condition.
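The swapped order could look something like the sketch below. It is only an illustration: find_hits() is a hypothetical helper name, the two-record input is made up, and the record layout (a ">" header line followed by sequence lines) is assumed from the thread rather than taken from the real file.

```perl
use strict;
use warnings;

# find_hits() is a hypothetical helper, not the original blast.pl.
sub find_hits {
    my ($fh) = @_;
    local $/ = "\n>";                  # one FASTA record per read
    my @hits;
    while (my $record = <$fh>) {
        chomp $record;                 # strip the trailing "\n>" separator
        $record =~ s/^>//;             # the first record keeps its ">"
        my ($def, @lines) = split /\n/, $record;
        next unless defined $def;
        my $seq = join '', @lines;
        # Definition test first, sequence test last, so $1..$3 still
        # belong to the sequence match inside the block.
        if (    $def =~ /\[Homo sapiens\]/
            and $seq =~ /(.....)(LY.L..L..L)(.....)/ ) {
            push @hits, ">$def\n\n$1\t$2\t$3\n\n\n";
        }
    }
    return @hits;
}

# Made-up two-record input, read through an in-memory filehandle.
my $fasta = ">p1 [Homo sapiens]\nAAAAALYALA\nALAALCCCCC\n"
          . ">p2 [Mus musculus]\nAAAAALYALAALAALCCCCC\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
print find_hits($fh);
close $fh;
```

Only the p1 record is reported. Putting the definition test first also means the longer sequence regex never runs on records from other species.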