  #1

    formatting the loop

    Hi all!

    Well, based on the input I have received from everyone thus far, I have
    been able to cobble the following code together (see below for the
    input and output of this script).

    Anyway, though it works great I am having a tough time trying to figure
    out WHY it works. I am especially having trouble with the line: "next
    unless s/^\s*(\S+)//" in relation to the while loop it is in.
    Basically, I do not understand how the script is differentiating the
    ">bob" line in the input from the lines of "agactgatcg" (again see
    input and output at bottom). I know that the "$/" has something to do
    with this, but I am not sure how or why it works.

    I hate to sound like a dummy, but if anyone can help me understand WHAT
    the script is doing in the "while loop", I would really appreciate it. I
    think if I can understand the mechanics behind this script, it will only
    help my future understanding of writing PERL scripts, especially
    when it comes to regular expressions and loops. Heck, if there is a
    better way to do certain parts of this, let me know! Also, special
    thanks to James Gray for the help thus far!! Till then, I'll be
    racking my brain with my PERL books!

    The working script:
    _________

    #!/usr/bin/perl

    use warnings;
    use strict;

    print "Enter the path of the INFILE to be processed:\n";

    # For example "rotifer.txt" or "../Desktop/Folder/rotifer.txt"

    chomp (my $infile = <STDIN>);

    open(INFILE, '<', $infile)
        or die "Can't open INFILE for input: $!";

    print "Enter in the path of the OUTFILE:\n";

    # For example "rotifer_out.txt" or "../Desktop/Folder/rotifer_out.txt"

    chomp (my $outfile = <STDIN>);

    open(OUTFILE, '>', $outfile)
        or die "Can't open OUTFILE for output: $!";

    print "Enter in the LENGTH you want the sequence to be:\n";
    my ( $len ) = <STDIN> =~ /(\d+)/ or die "Invalid length parameter";


    print OUTFILE "R 1 $len\n\n\n\n"; # The top of the file.

    $/ = '>'; # Set the input record separator

    while ( <INFILE> ) {
        chomp;
        next unless s/^\s*(\S+)//;
        my $name = $1;
        my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];
        my $sequence = join( ' ', @char);
        $sequence =~ tr/Tt/Uu/;
        print OUTFILE " $sequence $name\n";
    }


    close INFILE;
    close OUTFILE;

    ___________

    Again, this script converts sequence data in either of the following
    formats, single-line or multiline:

    ### input type 1 ###
    >bob
    atcgactagcatcgatcg
    acacgtacgactagcac
    >fred
    actgactacgatcgaca
    acgcgcgatacggcat
    #####

    or (as I posted originally)

    ### input type 2 ###
    >bob
    atcgactagcatcgatcgacacgtacgactagcac
    >fred
    actgactacgatcgacaacgcgcgatacggcat
    #####

    ###output##
    ## Note that the T's are converted to U's in the output! ##

    R 1 42


    a u c g a c u a g c a u c g a u c g a c a c g u a c g a c u a g c a c - - - - - - - bob
    a c u g a c u a c g a u c g a c a a c g c g c g a u a c g g c a u - - - - - - - - - fred

    ####


    Michael S. Robeson II

  #2

    Re: formatting the loop

    On Feb 11, 2004, at 1:27 PM, Michael S. Robeson II wrote:

    [snip]
    > Anyway, though it works great I am having a tough time trying to
    > figure out WHY it works.
    See comments below, in the code.

    [snip]
    > I think if I can understand the mechanics behind this script it will
    > only help my future understanding of writing PERL scripts.
    Perl. The language you are learning is called Perl, not PERL. :)

    [snip]
    > The working script:
    > _________
    >
    > #!/usr/bin/perl
    >
    > use warnings;
    > use strict;
    >
    > print "Enter the path of the INFILE to be processed:\n";
    >
    > # For example "rotifer.txt" or "../Desktop/Folder/rotifer.txt"
    >
    > chomp (my $infile = <STDIN>);
    >
    > open(INFILE, $infile)
    > or die "Can't open INFILE for input: $!";
    >
    > print "Enter in the path of the OUTFILE:\n";
    >
    > # For example "rotifer_out.txt" or "../Desktop/Folder/rotifer_out.txt"
    >
    > chomp (my $outfile = <STDIN>);
    >
    > open(OUTFILE, ">$outfile")
    > or die "Can't open OUTFILE for input: $!";
    >
    > print "Enter in the LENGTH you want the sequence to be:\n";
    > my ( $len ) = <STDIN> =~ /(\d+)/ or die "Invalid length parameter";
    >
    >
    > print OUTFILE "R 1 $len\n\n\n\n"; # The top of the file.
    >
    > $/ = '>'; # Set input operator
    Here's most of the magic. This sets Perl's input record separator ($/)
    to a > character. That means that <INFILE> won't return a sequence of
    characters ending in a \n like it usually does, but a sequence of
    characters ending in a >. In other words, it basically jumps from name
    to name.
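    To see the chunks for yourself, here is a small sketch. It assumes you are happy to read from an in-memory string (open on a reference to a scalar, Perl 5.8 and later) instead of a real file, and it uses local $/ so the change to the separator doesn't leak into the rest of a program:

```perl
use strict;
use warnings;

# Sample data kept in a string so the example needs no real file.
my $data = ">bob\natcgactagc\natcgatcg\n>fred\nactgactacg\n";

# Open the string as if it were a file (in-memory filehandle).
open my $fh, '<', \$data or die "Can't open in-memory data: $!";

my @chunks;
{
    local $/ = '>';    # each read now ends at, and includes, the next '>'
    while ( my $chunk = <$fh> ) {
        push @chunks, $chunk;
    }
}
close $fh;

# Print each chunk with its newlines made visible as \n.
for my $c (@chunks) {
    ( my $shown = $c ) =~ s/\n/\\n/g;
    print "[$shown]\n";
}
```

    The first chunk comes back as a lone ">" because the data starts with the separator, and the last chunk has no ">" at all because the input simply ends there.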
    > while ( <INFILE> ) {
    > chomp;
    chomp() removes a trailing copy of whatever $/ holds, so here it
    removes the trailing >.
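    A quick sketch of that behaviour, comparing chomp under $/ = '>' with the usual "\n" setting (the strings are made-up samples):

```perl
use strict;
use warnings;

my $chunk = "bob\natcgactagc\n>";
{
    local $/ = '>';    # chomp removes a trailing copy of whatever $/ holds
    chomp $chunk;      # so only the '>' goes; the "\n" before it stays
}

my $line = "bob\n";
chomp $line;           # outside the block $/ is "\n" again, so "\n" goes

print length($chunk), " ", length($line), "\n";
```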
    > next unless s/^\s*(\S+)//;
    > my $name = $1;
    Well, if we're reading name to name, the thing right at the beginning of
    our chunk is going to be a name, right? The above removes the name
    and saves it for later use.
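    Here is a sketch of that one substitution applied to a single chunk, assuming the chunk looks the way it does after chomp (multiline sequence, trailing > already gone):

```perl
use strict;
use warnings;

# One chunk as the loop sees it after chomp (the trailing '>' is gone).
$_ = "bob\natcgactagc\natcgatcg\n";

my $name = '';
if ( s/^\s*(\S+)// ) {    # ^ anchors at the start of the whole chunk
    $name = $1;           # \S+ stops at the first "\n", so only "bob" is kept
}

my $letters = join '', /[a-z]/ig;    # what is left is pure sequence

print "name: $name\nsequence: $letters\n";
```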
    > my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];
    If I may, yuck! This builds up a list of all the A-Za-z characters in
    the string, adds a boatload of extra - characters, trims the whole
    list to the length you want and stuffs all that inside @char. It
    also receives a rank of "awful" on the James Gray Scale of
    Readability. ;)
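    To pull that line apart, here is the same list slice wrapped in a subroutine (pad_or_trim is a made-up name for illustration) and run on one short and one long input:

```perl
use strict;
use warnings;

# The slice from the script, wrapped in a sub (pad_or_trim is a made-up name).
sub pad_or_trim {
    my ( $seq, $len ) = @_;
    # ($seq =~ /[a-z]/ig)  -> every letter as its own list element
    # ('-') x $len         -> $len dashes, always enough padding
    # [ 0 .. $len - 1 ]    -> keep only the first $len elements
    my @char = ( ( $seq =~ /[a-z]/ig ), ('-') x $len )[ 0 .. $len - 1 ];
    return @char;
}

print join( ' ', pad_or_trim( "\nacgt\n", 6 ) ), "\n";      # short: padded
print join( ' ', pad_or_trim( "acgtacgtac", 6 ) ), "\n";    # long: trimmed
```

    The short input comes back as "a c g t - -" and the long one as "a c g t a c": the dashes are always there, and the slice decides how many survive.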
    > my $sequence = join( ' ', @char);
    join() the sequence on spaces.
    > $sequence =~ tr/Tt/Uu/;
    Convert T's to U's.
    > print OUTFILE " $sequence $name\n";
    Send it out.
    > }
    >
    >
    > close INFILE;
    > close OUTFILE;
    Hope that helps.

    James

    James Edward Gray II

  #3

    Re: formatting the loop

    James Edward Gray II wrote:
    >
    > > $/ = '>'; # Set input operator
    >
    > Here's most of the magic.
    Exactly. If you don't believe in magic, don't write in Perl:
    most people don't.

    Rob


    Rob Dixon

  #4

    Re: formatting the loop

    See comments below.

    On Feb 11, 2004, at 2:55 PM, James Edward Gray II wrote:
    > On Feb 11, 2004, at 1:27 PM, Michael S. Robeson II wrote:
    >
    > [snip]
    >
    >> Anyway, though it works great I am having a tough time trying to
    >> figure out WHY it works.
    >
    > See comments below, in the code.
    >
    > [snip]
    >
    >> I think if I can understand the mechanics behind this script it will
    >> only help my future understanding of writing PERL scripts.
    >
    > Perl. The language you are learning is called Perl, not PERL. :)
    >
    Hehe, thanks. :-)
    > [snip]
    >
    [snip]
    >> $/ = '>'; # Set input operator
    >
    > Here's most of the magic. This sets Perl's input separator to a >
    > character. That means that <INFILE> won't return a sequence of
    > characters ending in a \n like it usually does, but a sequence of
    > characters ending in a >. It basically jumps name to name, in other
    > words.
    >
    >> while ( <INFILE> ) {
    >> chomp;
    >
    > chomp() will remove the trailing >.
    OK that makes pretty good sense. I understand that now, I hope. See
    next comment.

    >
    >> next unless s/^\s*(\S+)//;
    >> my $name = $1;
    >
    > Well, if we're reading name to name, the thing right a the beginning
    > of our sequence is going to be a name, right? The above removes the
    > name, and saves it for later use.
    OK, I think this is where my problem is. That is, how does it know that
    the characters, as in "bob" or "fred", are the names, and not mistake
    the sequence of letters "agtcaccgatg" for a name to be placed in memory ($name)?
    Basically I am reading the following:

    next unless s/^\s*(\S+)//;

    as "Go to the next line unless you see a line with zero or more
    whitespace characters followed by one or more non-whitespace characters,
    and save the non-whitespace characters in memory." If this is correct,
    then how can perl tell the difference between the lines containing
    "bob" or "fred" (and put them in memory) and the "acgatctagc" (and not
    put these in memory)? Both lines of data seem to fit the
    expression pattern to me. I think it has something to do with how perl
    is reading through the file that makes this work?

    So, there is something I am "missing", not noticing or realizing here.
    Maybe I've been staring at the code for far too long and should take a
    break! :-)

    >
    >> my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];
    >
    > If I may, yuck! This builds up a list of all the A-Za-z characters in
    > the string, adds a boat load of extra - characters, trims the whole
    > list to the length you want and stuffs all that inside @char. It's
    > also receives a rank of "awful", on the James Gray Scale of
    > Readability. ;)
    >
    Yeah, I need to clean that up a bit!

    [snip]

    -Mike

    Michael S. Robeson II

  #5

    Re: formatting the loop

    On Feb 11, 2004, at 2:35 PM, Michael S. Robeson II wrote:
    >>> next unless s/^\s*(\S+)//;
    >>> my $name = $1;
    >>
    >> Well, if we're reading name to name, the thing right a the beginning
    >> of our sequence is going to be a name, right? The above removes the
    >> name, and saves it for later use.
    >
    > OK, I think this is where my problem is. That is, how does it know that
    > the characters, as in "bob" or "fred", are the names, and not mistake
    > the sequence of letters "agtcaccgatg" for a name to be placed in memory ($name)?
    > Basically I am reading the following:
    >
    > next unless s/^\s*(\S+)//;
    >
    > as "Go to the next line
    Not line. We're not reading lines anymore. We're reading chunks of
    characters ending in a >, remember?
    > unless you see a line with zero or more whitespace characters followed
    > by one or more non-whitespace characters
    Not quite. ^ matches at the beginning of our chunk, not the beginning
    of a line. It's "unless you start with zero or more whitespace
    characters, followed by one or more non-whitespace characters..."

    Those "one or more non-whitespace characters" are going to be the name
    at the beginning. The name is followed by a \n, which is a whitespace
    character, so \S+ stops right there and the name never runs into the
    sequence.
    > and save the non-whitespace characters in memory."
    In my English, it reads, "Unless you can rip a name off the front of
    this chunk, skip it." ;) So the only time it skips anything is when
    the whole chunk is whitespace (or empty), which keeps it from finding
    a name. I imagine this only skips the very first read, which probably
    won't have anything interesting between the start of the file and the
    first > character.
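    A sketch that counts the skips, assuming the input starts directly with ">" the way the sample data does (it reads from an in-memory string rather than a file):

```perl
use strict;
use warnings;

my $data = ">bob\natcg\n>fred\nactg\n";
open my $fh, '<', \$data or die "Can't open in-memory data: $!";

my @names;
my $skipped = 0;
{
    local $/ = '>';
    while (<$fh>) {
        chomp;                    # first chunk is just '>', leaving ''
        unless (s/^\s*(\S+)//) {  # nothing name-like at the front ...
            $skipped++;           # ... so this chunk is skipped
            next;
        }
        push @names, $1;
    }
}
close $fh;

print "skipped: $skipped, names: @names\n";
```

    Only the empty first chunk is skipped; every later chunk starts with a name.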
    > If this is correct, then how can perl tell the difference between the
    > lines containing "bob" or "fred" (and put them in memory) and the
    > "acgatctagc" (and not put these in memory) because both lines of data
    > seem to fit the expression pattern to me. I think it has something to
    > do with how perl is reading through the file that makes this work?
    Yes, it's reading > to >. Also, ^ matches at the beginning of a
    string, not a line, by default.
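    A small sketch of the difference: without the /m modifier ^ can only match once, at the start of the string; with /m it also matches right after every "\n" inside it:

```perl
use strict;
use warnings;

my $chunk = "bob\natcg\nactg\n";

my @plain = $chunk =~ /^(\S+)/g;     # ^ = start of the string only
my @multi = $chunk =~ /^(\S+)/mg;    # /m: ^ also matches after each "\n"

print "without /m: @plain\n";
print "with /m:    @multi\n";
```

    The script never uses /m, which is why only "bob" can ever be captured as the name.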

    That's why I gave you the paragraph version earlier today. I thought
    it was a little easier to follow. ;)
    > So, there is something I am "missing", not noticing or realizing here.
    > Maybe I've been staring at the code for far too long and should take a
    > break! :-)
    Definitely. Have a break. It clears the mind. Come back refreshed
    and reread this message until you break through the fog.

    Or just ask more questions and I'll try again. :D

    James

    James Edward Gray II

  #6

    Re: formatting the loop

    On Feb 11, 2004, at 2:55 PM, James Edward Gray II wrote:

    [snip]
    > my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];
    >
    > If I may, yuck! This builds up a list of all the A-Za-z characters in
    > the string, adds a boat load of extra - characters, trims the whole
    > list to the length you want and stuffs all that inside @char. It's
    > also receives a rank of "awful", on the James Gray Scale of
    > Readability. ;)
    [snip]

    Ok, now I understand. I found that my problem was with how the "next"
    command was operating in conjunction with the grouping of characters.
    Ok, making progress. :-)

    Now, about that array slice I have:

    my @char = ( /[a-z]/ig, ( '-' ) x $len) [0 .. $len - 1];

    I know it wastes a lot of memory and makes perl do much extra work.
    However, when I try to replace that line with something like this:

    my @char = ( /[a-z]/ig, ( '-' ) x ($len - length) );

    it doesn't work the way I thought it would (gee, what a thought). I
    would like to express the code similar to
    ( '-' ) x ($len - length)
    because it is easy for me to read and it tells you clearly what is
    going on. However, every time I try to implement something like that, I
    get unexpected output or I have to substantially rewrite the loop, which I
    have been unable to troubleshoot, as you have been seeing. :-) I think
    the 'length' function is also counting any '\n' characters or something,
    because my output ends up with different lengths like this when I use
    the ($len - length) way:

    a c u g a c g a g u - - - - - - - - bob
    a c u g a c u a g c u g - - - - - - - fred

    with this input:
    >bob
    actgacgagt
    >fred
    actgactagctg


    The reason I went with /[a-z]/ig is that some sequence data uses
    other letters to denote ambiguity and other things. I guess I could
    list only the letters it uses, but I was just lazy and typed in the
    entire range of "a to z".
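    If you do want to list only the letters, one option is a character class of the IUPAC nucleotide codes. This is a sketch that assumes your data sticks to that alphabet (A C G T U plus the ambiguity codes R Y S W K M B D H V N); anything else is silently dropped:

```perl
use strict;
use warnings;

# IUPAC nucleotide codes (assumed to be the alphabet your data uses):
# A C G T U plus the ambiguity codes R Y S W K M B D H V N.
my $iupac = qr/[ACGTURYSWKMBDHVN]/i;

my $seq  = "\nacgtnry\nxzj\n";    # 'x', 'z' and 'j' are not nucleotide codes
my @char = $seq =~ /$iupac/g;     # list context: every matching letter

print "@char\n";
```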

    I will be continuing to work on it but here is the code as it stands
    now (with that awful array slice).

    #!/usr/bin/perl

    use warnings;
    use strict;

    print "Enter the path of the INFILE to be processed:\n";

    # For example "rotifer.txt" or "../Desktop/Folder/rotifer.txt"

    chomp (my $infile = <STDIN>);

    open(INFILE, '<', $infile)
        or die "Can't open INFILE for input: $!";

    print "Enter in the path of the OUTFILE:\n";

    # For example "rotifer_out.txt" or "../Desktop/Folder/rotifer_out.txt"

    chomp (my $outfile = <STDIN>);

    open(OUTFILE, '>', $outfile)
        or die "Can't open OUTFILE for output: $!";

    print "Enter in the LENGTH you want the sequence to be:\n";
    my ( $len ) = <STDIN> =~ /(\d+)/ or die "Invalid length parameter";


    print OUTFILE "R 1 $len\n\n\n\n"; # The top of the file.

    $/ = '>'; # Set the input record separator

    while ( <INFILE> ) {
        chomp;
        next unless s/^\s*(.+)//; # delete name and place in memory
        my $name = $1; # whatever is in memory is saved as $name
        # take only sequence letters and add '-' to the end
        my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];
        my $sequence = join( ' ', @char); # turn into scalar
        $sequence =~ tr/Tt/Uu/; # convert T's to U's
        print OUTFILE " $sequence $name\n";
    }


    close INFILE;
    close OUTFILE;

    Michael S. Robeson II

  #7

    Re: formatting the loop

    On Feb 12, 2004, at 10:06 AM, Michael S. Robeson II wrote:
    > On Feb 11, 2004, at 2:55 PM, James Edward Gray II wrote:
    >
    > [snip]
    >
    >> my @char = ( /[a-z]/ig, ( '-' ) x $len )[ 0 .. $len - 1 ];
    >>
    >> If I may, yuck! This builds up a list of all the A-Za-z characters
    >> in the string, adds a boat load of extra - characters, trims the
    >> whole list to the length you want and stuffs all that inside @char.
    >> It's also receives a rank of "awful", on the James Gray Scale of
    >> Readability. ;)
    >
    > [snip]
    >
    > Ok, now I understand. I found that my problem was with how the "next"
    > command was operating in conjunction with the grouping of characters.
    > Ok, making progress. :-)
    Excellent. I knew we would get there.
    > Now, about that array slice I have:
    >
    > my @char = ( /[a-z]/ig, ( '-' ) x $len) [0 .. $len - 1];
    >
    > I know it wastes a lot of memory and makes perl do much extra work.
    > However, when I try to replace that line with something like this:
    Ah, it's pretty small potatoes to quibble over, really. I don't think
    it's in any danger of slowing your code significantly or making you buy
    more RAM.
    > my @char = ( /[a-z]/ig, ( '-' ) x ($len - length) );
    >
    > it doesn't work the way I thought it would (gee what a thought). I
    > would like to express the code similar to ( '-' ) x ($len - length)
    Nah, something like this won't work, because you won't know how many
    sequence characters you have until you stick them somewhere. length()
    works on $_ by default, and $_ still contains a big mess of sequence
    characters and whitespace.
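    A quick sketch of what length sees at that point, using Mike's bob record as the sample $_ (the name already stripped off):

```perl
use strict;
use warnings;

# $_ as it looks inside the loop once the name has been stripped off.
$_ = "\nactgacgagt\n";

my $raw = length $_;                      # counts both "\n" characters
( my $letters = $_ ) =~ tr/A-Za-z//cd;    # a copy with non-letters deleted
my $clean = length $letters;

print "raw: $raw, letters only: $clean\n";
```

    The two extra newlines are exactly why ($len - length) came out two dashes short.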

    I think your big hang-up is trying to do it all in one line. Two or
    three lines are fine, right? <laughs> And of course, there's nothing wrong
    with the current solution. It does work. You only need to replace it
    if you want to. There's always more than one way. Use what you like.
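    For example, a two-step version that keeps the ($len - something) idea but counts array elements instead of characters in $_. This is only a sketch (fixed_length is a hypothetical name), not code from the thread; like the slice, it both pads and trims:

```perl
use strict;
use warnings;

# Collect the letters first, then fix the length of the array.
sub fixed_length {
    my ( $seq, $len ) = @_;
    my @char = $seq =~ /[a-z]/ig;                 # step 1: letters only
    if ( @char < $len ) {
        push @char, ('-') x ( $len - @char );     # step 2a: pad with dashes
    }
    else {
        splice @char, $len;                       # step 2b: drop the extras
    }
    return @char;
}

print join( ' ', fixed_length( "\nactgacgagt\n", 12 ) ), "\n";
```

    Because @char holds only letters, ($len - @char) is the true number of dashes needed, whatever whitespace was in the chunk.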

    I'll see if I can add a suggestion below...
    > #!/usr/bin/perl
    >
    > use warnings;
    > use strict;
    >
    > print "Enter the path of the INFILE to be processed:\n";
    >
    > # For example "rotifer.txt" or "../Desktop/Folder/rotifer.txt"
    >
    > chomp (my $infile = <STDIN>);
    >
    > open(INFILE, $infile)
    > or die "Can't open INFILE for input: $!";
    >
    > print "Enter in the path of the OUTFILE:\n";
    >
    > # For example "rotifer_out.txt" or "../Desktop/Folder/rotifer_out.txt"
    >
    > chomp (my $outfile = <STDIN>);
    >
    > open(OUTFILE, ">$outfile")
    > or die "Can't open OUTFILE for input: $!";
    >
    > print "Enter in the LENGTH you want the sequence to be:\n";
    > my ( $len ) = <STDIN> =~ /(\d+)/ or die "Invalid length parameter";
    >
    >
    > print OUTFILE "R 1 $len\n\n\n\n"; # The top of the file is supposed
    >
    > $/ = '>'; # Set input operator
    >
    > while ( <INFILE> ) {
    > chomp;
    > next unless s/^\s*(.+)//; # delete name and place in memory
    > my $name = $1; # what ever in memory saved as $name
    Right here, $_ holds our sequence, plus some junk. We can just work
    with $_ then, if we want to.
    > my @char = ( /[a-z]/ig, ( '-' ) x $len) [0 .. $len -1]; # take
    > only sequence letters and
    > # and add '-' to the end
    > my $sequence = join( ' ', @char); # turn into scalar
    >
    Alternative to the above two lines (note that, unlike the slice, it
    does not trim sequences longer than $len):

    tr/A-Za-z//cd; # remove junk from $_
    $_ .= '-' x ($len - length) if length() < $len; # add dashes
    s/\b|\B/ /g; # space out (puts a space at every position, both ends included)
    > $sequence =~ tr/Tt/Uu/; # convert T's to U's
    Then this would become:

    tr/Tt/Uu/;
    > print OUTFILE " $sequence $name\n";
    And this:

    print OUTFILE "$_$name\n"; # $_ already starts and ends with a space
    > }
    >
    >
    > close INFILE;
    > close OUTFILE;
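    To try the $_-based steps on a single chunk, they can be wrapped in a subroutine. This is a sketch: format_chunk is a hypothetical helper name, and the chunk is assumed to have been chomped already:

```perl
use strict;
use warnings;

# The $_-based steps applied to one chunk (format_chunk is a made-up name).
sub format_chunk {
    my ( $chunk, $len ) = @_;
    local $_ = $chunk;
    return unless s/^\s*(\S+)//;      # strip the name, as in the script
    my $name = $1;
    tr/A-Za-z//cd;                                        # remove junk
    $_ .= '-' x ( $len - length($_) ) if length($_) < $len;   # add dashes
    s/\b|\B/ /g;                      # a space at every position, both ends
    tr/Tt/Uu/;                        # convert T's to U's
    return "$_$name";                 # $_ already ends with a space
}

print format_chunk( "bob\nactgacgagt\n", 12 ), "\n";
```

    The result keeps the same layout as the original print: a leading space, the spaced-out letters and dashes, one space, then the name.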
    Hope that helps.

    James

    James Edward Gray II
