Professional Web Applications Themes

counting gaps in sequence data - PERL Beginners

I have a set of data that looks something like the following:   acgtt---cgatacg---acgact-----t  acgtacgatac---actgca---ac  acgata---acgatcg----acgt I am having trouble setting up a hash etc., to count the number and types of continuous gaps. For example the 'human' sequence above has 2 sets of 3 gaps and 1 set of 5 gaps. The 'chimp' has 2 sets of 3 gaps and finally the 'mouse' has 1 set of 3 gaps and 1 set of 4 gaps. So, I am having trouble being able to assign a dynamic variable (i.e. gap length) and place that in a pattern match so that ...

  1. #1

    Default counting gaps in sequence data

    I have a set of data that looks something like the following:
     
    acgtt---cgatacg---acgact-----t 
    acgtacgatac---actgca---ac 
    acgata---acgatcg----acgt

    I am having trouble setting up a hash etc., to count the number and
    types of continuous gaps. For example the 'human' sequence above has 2
    sets of 3 gaps and 1 set of 5 gaps. The 'chimp' has 2 sets of 3 gaps
    and finally the 'mouse' has 1 set of 3 gaps and 1 set of 4 gaps.

    So, I am having trouble being able to assign a dynamic variable (i.e.
    gap length) and place that in a pattern match so that it can count how
    many gaps of that length are in that particular sequence. I know how to
    set up a hash to count the number of times a gap appears:
    '$gaptype{$gap}++' or something. The problem is: what is the best way
    (and how) can I set '$gap' to be dynamic.

    I need to know the length of each consecutive string of gaps. I know
    how to count the gaps by using the 'tr' function. But it gets confusing
    when I need to add counts to every instance of that gap length. I also
    need to know the position of each gap (denoted by the position of the
    first gap in that particular instance). I know that I can use the
    'pos()' command for this.

    So, my problem is that I think I know some of the bits of code to put
    into place the problem is I am getting lost on how to structure it all
    together. For now I am just trying to get my output to look like this:

    Human
    number of 3 base pair gaps: 2
    at positions: 6, 16
    number of 5 base pair gaps: 1
    at positions: 25

    Chimp
    ..... and so on ...

    So, any suggestions would be greatly appreciated. If anyone can help me
    out with all or even just bits of this I would greatly appreciate it.
    This should help me get started on some more advanced parsing I need to
    do after this. I like to try and figure things out on my own if I can,
    so even pseudo code would be of great help!

    -Thanks
    -Mike


    Michael Guest

  2. #2

    Default Re: counting gaps in sequence data

    On Thu, 14 Oct 2004 11:02:06 -0600, Michael Robeson <com> wrote: 
    <<SNIP>> 

    Hi Mike,

    This list works best if you show us some code you have tried and
    then we can help you troubleshoot it. It lets us know that you've
    already tried to solve your problem and aren't looking for a free
    scripting service! Show us some of that code you talked about and
    we'll help you out

    --Errin

    PS: is this a common problem/exercise in some class somewhere? I keep
    seeing requests for help having to do with those exact strings of DNA
    data. Is there a bunch of people working on DNA projects using Perl
    somewhere? Or, is this some homework?
    Errin Guest

  3. #3

    Default Re: counting gaps in sequence data

    > PS: is this a common problem/exercise in some class somewhere? I keep 

    bio-informatics is a big area in which Perl is involved... there's even
    a book from O'reilly on the subject...

    also, a mailing-list is available...


    from http://lists.perl.org/ -

    bioperl-announce-l List is for people only interested in announcements
    of Bioperl code releases, updates and events.

    bioperl-l Unmoderated list for general discussion of bioperl modules,
    current & future efforts and related topics.

    -

    in the latter of the two lists, i counted about 80 messages in the first half
    of this month.

    hmm... i might join it... :) might be fun!!

    --
    Willy
    http://www.hackswell.com/corenth
    Willy Guest

  4. #4

    Default Re: counting gaps in sequence data

    On Thu, 14 Oct 2004 15:40:24 -0400, Willy West <com> wrote: 
    >
    > bio-informatics is a big area in which Perl is involved... there's even
    > a book from O'reilly on the subject...
    >
    > also, a mailing-list is available...
    >
    > from http://lists.perl.org/ -
    >
    > bioperl-announce-l List is for people only interested in announcements
    > of Bioperl code releases, updates and events.
    >
    > bioperl-l Unmoderated list for general discussion of bioperl modules,
    > current & future efforts and related topics.
    >
    > -
    >
    > in the latter of the two lists, i counted about 80 messages in the first half
    > of this month.
    >
    > hmm... i might join it... :) might be fun!!
    >
    > --
    > Willy
    > http://www.hackswell.com/corenth
    >
    >[/ref]

    If what you say is true, then maybe Mike needs to take his questions
    to those list? I mean, if the problem he's describing is common and
    the data format he's using is common, I bet it's been solved already.

    Hey mike, have you searched on CPAN (search.cpan.org) for this?
    Errin Guest

  5. #5

    Default Re: counting gaps in sequence data

    On Thu, 14 Oct 2004 14:47:57 -0500, Errin Ln <com> wrote: 
     [/ref]

     

    that might be fine- but his question is fundamentally Perl in nature-
    he may use the information for bio-informatics, but he is looking for
    an answer regarding effective Perl use when dealing with strings-
    that's classic Perl and a classic question for this list :)

    i wish i could answer his question right off the bat, but i can't :/

    awell...



    --
    Willy
    http://www.hackswell.com/corenth
    Willy Guest

  6. #6

    Default Re: counting gaps in sequence data

    On Thu, 14 Oct 2004 16:08:44 -0400, Willy West <com> wrote: 
    > [/ref]

    >
    > that might be fine- but his question is fundamentally Perl in nature-
    > he may use the information for bio-informatics, but he is looking for
    > an answer regarding effective Perl use when dealing with strings-
    > that's classic Perl and a classic question for this list :)
    >
    > i wish i could answer his question right off the bat, but i can't :/
    >
    > awell...
    >[/ref]

    very true. I've been playin' with this problem, it seems very
    fun/interesting. I was thinking this would be alot easier to play
    with if the characters in the string were all in an array. so I
    played with the length() function and the substr() function to push
    them all into an array in a loop. something like this (prolly a
    prettier/easier way to do this):

    use Data::Dumper;

    my $human = "acgtt---cgatacg---acgact-----t";
    my human;

    for my $pos( 0 .. (length($human) -1)) {
    push human, substr($human, $pos, 1);
    }

    print Dumper(\human};

    The above produces the following output, which I was thinking might be
    easier to work with for Mike's problem:

    $VAR1 = [
    'a',
    'c',
    'g',
    't',
    't',
    '-',
    '-',
    '-',
    'c',
    'g',
    'a',
    't',
    'a',
    'c',
    'g',
    '-',
    '-',
    '-',
    'a',
    'c',
    'g',
    'a',
    'c',
    't',
    '-',
    '-',
    '-',
    '-',
    '-',
    't'
    ];

    This is kinda fun!

    --Errin
    Errin Guest

  7. #7

    Default Re: counting gaps in sequence data

    On Thu, Oct 14, 2004 at 11:02:06AM -0600, Michael Robeson wrote:
     
     
     

    Here's some code that works, incorporating most of your ideas. It's
    only a partial solution, but it think it covers the bit that was causing
    you problems. I'll present it without comment, but I'm sure people will
    be able to explain anything you don't understand.

    #!/usr/bin/perl -w

    use strict;
    use Data::Dumper;

    while (<DATA>)
    {
    print, next unless /-/;
    my %gaps;
    push {$gaps{length $1}}, pos() + 1 - length $1 while /(-+)/g;
    print Dumper \%gaps;
    }

    __DATA__
    human
    acgtt---cgatacg---acgact-----t
    chimp
    acgtacgatac---actgca---ac
    mouse
    acgata---acgatcg----acgt

    --
    Paul Johnson - net
    http://www.pjcj.net
    Paul Guest

  8. #8

    Default Re: counting gaps in sequence data

    Here is as far as I can get with some real code and pseudo code. This
    is just to show you that I am actually trying. :-)

    Pseudo - code

    ################################
    # open DNA sequence file
    ################################

    print "Enter in the name of the DNA sequence file:\n";
    chomp (my $dna_seq = <STDIN>);
    open(DNA_SEQ, $dna_seq)
    or die "Can't open sequence file for input: $!";

    ################################################## ######
    # read sequence data into a hash - not sure if this is how I should do
    it?
    ################################################## ######

    my %sequences;

    $/ = '>'; # set to read in paragraph mode

    print "\n***Discovered the following DNA sequences:***\n";

    while ( <DNA_SEQ> ) {
    chomp;
    next unless s/^\s*(.+)//;
    my $name = $1;
    s/\s//g;
    $sequences{$name} = $_;
    print "$name found!\n";
    }
    close DNA_SEQ;


    ###################################
    # search for and characterize gaps
    ###################################

    <<somehow get data from hash and present it to a loop>>

    %gaptype;

    <<major pseudo code below>>
    foreach /\D(-+)\D/ found in each sequece # searches for gaps flanked by
    sequence
    $position = pos($1);
    $gaplength = $1;
    $gaplength =~ tr/-//g; # count the number of '-' for that particular
    # gap being processed
    $gaptype{gaplength}++; # count the number of times each gap type
    appears

    <<somehow get information from loop an print as seen below>>

    ____OUTPUT_____

    Human
    number of 3 base pair gaps: 2
    at positions: 6, 16
    number of 5 base pair gaps: 1
    at positions: 25
    ..
    ... and so on...
    ..


    __DATA__
    human
    acgtt---cgatacg---acgact-----t
    chimp
    acgtacgatac---actgca---ac
    mouse
    acgata---acgatcg----acgt

    Michael Guest

  9. #9

    Default Re: counting gaps in sequence data

    On Thu, 14 Oct 2004 23:23:48 +0200, Paul Johnson <net> wrote: 
    > [/ref]

    Hi Paul,
    I think you missed a critical part of Mike's post!:
     

    I've put together something that will get the first part done
    (counting base pair gaps, I guess is the point!) Code is as follows:

    use warnings;
    use strict;

    use Data::Dumper;

    my %sequences = (
    'human' => "acgtt---cgatacg---acgact-----t",
    'chimp' => "acgtt---cgatacg---acgact-----t",
    'mouse' => "acgata---acgatcg----acgt",
    );
    my %results;

    foreach my $species( keys %sequences ) {
    my $base_pair = 0;
    my $base_pair_value;
    foreach( split( / */, $sequences{$species} )) {
    if( /-/ ) {
    $base_pair = 1;
    $base_pair_value .= $_;
    } elsif( $base_pair ) {
    $results{$species}{length($base_pair_value)} += 1;
    $base_pair = 0;
    $base_pair_value = undef;
    }
    }
    }

    foreach my $species( keys %results ) {
    print "$species = $sequences{$species}\n";
    foreach my $base_pair( keys %{$results{$species}} ) {
    print " Number of $base_pair base pair
    gaps:\t$results{$species}{$base_pair}\n";
    }
    }

    This will produce the following output:

    # dnatest
    human = acgtt---cgatacg---acgact-----t
    Number of 3 base pair gaps: 2
    Number of 5 base pair gaps: 1
    chimp = acgtt---cgatacg---acgact-----t
    Number of 3 base pair gaps: 2
    Number of 5 base pair gaps: 1
    mouse = acgata---acgatcg----acgt
    Number of 4 base pair gaps: 1
    Number of 3 base pair gaps: 1

    I put the sequence in the output for easy troubleshooting and
    checking. I'm still working on figuring out the positional data.

    This IS fun. I'll post when I've got it figured out
    --Errin
    Errin Guest

  10. #10

    Default Re: counting gaps in sequence data

    On Thu, Oct 14, 2004 at 04:50:30PM -0500, Errin Ln wrote:
     

    No, I didn't miss it. I just thought it likely that Mike could get to
    there from where I left off, so I gave a solution to the bit that seemed
    most troublesome.
     

    That data is included in the code I posted.
     

    I look forward to it.

    --
    Paul Johnson - net
    http://www.pjcj.net
    Paul Guest

  11. #11

    Default Re: counting gaps in sequence data

    Yeah, I have just submitted that same question verbatim to the bio-perl
    list. I am still running through some ideas though. I have both
    Bioinformatics perl books. They are not very effective teaching books.

    The books spend too much time on using modules. Though while I
    understand the usefulness of not having to re-write code, it is a bad
    idea for beginners like me. Because re-writing code at first gives me a
    lot of practice. Some of the scripts in the books use like 3-5 modules,
    so it gets confusing on what is going on.

    I mean the books are not useless, but they definitely are structured
    for a class with a teacher.

    :-)

    -Mike

    Michael Guest

  12. #12

    Default Re: counting gaps in sequence data

    On Thursday 14 October 2004 21:44, Michael Robeson wrote: 

    Hi Michael,

    this looks quite promising - I'll try to sketch a solution without giving
    everything away ;-)
     

    This is really important: Always start your scripts with

    use warnings;
    use strict;

    This will help you avoiding errors.
     

    This part is not working completely for me - I get the first sequence only.
    I've never got used to working with the $/ operator, therefore I'd rewrite
    this part like this (without wanting to say that this would be easier than
    your approach):

    print "\n***Discovered the following DNA sequences:***\n";

    # each line holds either a caption or a sequence
    my $is_sequence_line = 0; # first line holds a caption
    my $name;
    while ( <DNA_SEQ> ) {
    chomp;

    if ($is_sequence_line) {
    # this line holds a sequence
    $sequences{$name} = $_;
    print "sequence for $name found!\n";
    }
    else {
    # this line holds a name - remember it
    print "new name : $_\n";
    $name = $_;
    }

    $is_sequence_line = ! $is_sequence_line;
    }
    close DNA_SEQ;

    print "\n";
     

    # first, we need to iterate over the sequences
    while (my ($name, $sequence) = each(%sequences)) {
    # now we can work on each sequence
    print "working on sequence $name\n";
    print "sequence : $sequence\n";
    # find all gaps in the sequence
    # (the g modifier means to search for all occurences)
    while ($sequence =~ /(-+)/g) {
    print "\n";
    # the special variables $`, $& and $' give you the
    # text left of the match, the match itself, and the
    # text right of the match.
    # Using those variables and the length() function,
    # you can print something like this:
    print "found the following gap : '" . $& . "'\n";
    print "the text before the gap : '" . $` . "'\n";
    print "the text after the gap : '" . $' . "'\n";
    print "the text before the gap is " . length ($`) . " chars long.\n";
    }
    print "\n";

    }

    I hope this might give you an approach for handling this - with the variables
    I printed, you should be able to fill your %gaptype hash like you sketched
    above (you find the length of the gap in length($&), and the gap's position
    is the same as the "length of the text left of the gap" + 1).
     

    You could either (easy variant) print this information while running through
    the while loop or (a bit harder, but might be easier in the long run if you
    need to process the data further) store the amount and position of the gaps
    inside a two-dimensional hash of arrays - something like:

    my %gaps =
    ( human =>
    { 3 => [6, 16],
    5 => [25]
    },
    chimp =>
    ...
    )

    It's a bit tricky to deal with structures like this, but once you got used to
    them, you must love them ;-)

    Hope this helps.

    Philipp
    Philipp Guest

  13. #13

    Default RegExp

    Hi,
    I have a bunch of text (multiple paragraphs). I want to find a sentence
    like 'Susan * the water' (No its not a HW problem). Here's what I am
    doing:

    #! /usr/bin/perl


    use warnings;

    $file="temp.txt";
    open(fp,$file) or die "Cant open $file :$!\n";


    local $/="";
    while (<fp>) {
    while( /(susan)(\D+)[the|water]\b/xig )
    {
    print "$1 $2\n"
    }
    }

    this catches sentences like:

    susan swam in the water for long time whereas i wud like
    susan swam in the water

    What am i doing wrong ?

    Mandar

    Mandar Guest

  14. #14

    Default Re: RegExp



    Mandar Rahurkar wrote: 

    Hello,
     


    use strict;
     

    my $file = 'temp.txt';
     

    open(FP,$file) or die "Can not open $file : $!\n";
     

    why ??
     

    while(<FP>) {
     

    (\D+) matches more than you expect. What do you mean in "Susan * the
    water" with the *?

    A word with no digits?

    Try /(susan)\s+([a-zA-Z]+)\s+the water/

    Mayeb if your a bit more descriptive of what you are trying to get

    HTH :)

    Lee.M - JupiterHost.Net
    JupiterHost.Net Guest

  15. #15

    Default Re: RegExp

    From: "JupiterHost.Net" <net> 
    >
    > use strict;

    >
    > my $file = 'temp.txt';

    >
    > open(FP,$file) or die "Can not open $file : $!\n";

    >
    > why ??[/ref]

    Because otherwise the <FP> would read individual lines. And the
    sentence can easily be split to several lines.
    If you set $/ to "" then the file is read in paragraphs, not lines.
    That is each <FP> now returns several lines until it finds an empty
    one.
     
    >
    > while(<FP>) {
    > [/ref]

    \D means "non-digit", What's wrong with sentence
    Susan filled 3 glasses with the water?
    And the \D+ matches as much text as it can, it won't stop on the
    first "the water".

    [the|water] means exactly the same as for example [aehrtw|]. Not what
    you meant right? I believe you want something like this:

    while (/(susan[^.?!]+the\s+water\b)/ig) {
    print $1,"\n";
    }

    Jenda
    ===== cz === http://Jenda.Krynicky.cz =====
    When it comes to wine, women and song, wizards are allowed
    to get drunk and croon as much as they like.
    -- Terry Pratchett in Sourcery

    Jenda Guest

  16. #16

    Default Re: RegExp

    >>>local $/=""; 
    >
    >
    > Because otherwise the <FP> would read individual lines. And the
    > sentence can easily be split to several lines.
    > If you set $/ to "" then the file is read in paragraphs, not lines.
    > That is each <FP> now returns several lines until it finds an empty
    > one.[/ref]

    You rock Jenda! I always learn something new from you. Thanks

    Lee.M - JupiterHost.Net

    JupiterHost.Net Guest

  17. #17

    Default Re: counting gaps in sequence data

    On Thu, 14 Oct 2004 16:11:42 -0600, Michael Robeson <com> wrote: 

    Hi again, Mike!

    I've thrown together the following code. I have not commented this!
    If you have some questions, just ask. I hard coded the sequences for
    my ease-of-use. It looked to me like you have figured out how to grab
    the sequences out of a file and throw them in a hash. This code uses
    some deep nested references, and therefore, some crazy dereferences.
    Have fun with it, I know I did! Things that might look weird: check
    out perldoc -f split for info on using a null-string to split with
    (That's were I found it!) and of course perldoc perlref for all the
    deep nested references and dereferencing stuff! I'm currently reading
    "Learning Perl Objects, References & Modules" by Randal Schwartz. I
    highly recommend it. It helped a lot in this exercise. Here's the
    code:

    use warnings;
    use strict;

    my %sequences = (
    'Human' => "acgtt---cgatacg---acgact-----t",
    'Chimp' => "acgtt---cgatacg---acgact-----t",
    'Mouse' => "acgata---acgatcg----acgt",
    );
    my %results;

    foreach my $species( keys %sequences ) {
    my $is_base_pair_gap = 0;
    my $base_pair_gap;
    my $base_pair_gap_pos;
    my $position = 1;
    foreach( split( / */, $sequences{$species} )) {
    if( /-/ ) {
    unless( $is_base_pair_gap ) {
    $base_pair_gap_pos = $position;
    }
    $is_base_pair_gap = 1;
    $base_pair_gap .= $_;
    } elsif( $is_base_pair_gap ) {
    push
    {$results{$species}{length($base_pair_gap)}}, $base_pair_gap_pos;
    $is_base_pair_gap = 0;
    $base_pair_gap = undef;
    }
    $position++;
    }
    }

    foreach my $species( keys %results ) {
    print "$species:\n";
    foreach my $base_pair_gap( keys %{$results{$species}} ) {
    print " Number of $base_pair_gap base pair gaps:\t",
    scalar( {$results{$species}{$base_pair_gap}}), "\n";
    print " at position(s) ", join( ',',
    {$results{$species}{$base_pair_gap}} ), ".\n";
    }
    print "\n";
    }




    The heart of this code is this line:
    push {$results{$species}{length($base_pair_gap)}}, $base_pair_gap_pos;

    there is a %results hash which has keys that are the different
    species, and values that point to another hash. THAT hash (the inner
    hash) has keys that are the length of the base-pair-gaps, and values
    that point to an array. The array holds a list of the positions of
    those base-pair gaps! The first base pair gap in the human sequence
    is '---' at the 6th character. That looks like this (warning: pseudo
    code for clarity!)
    %results->{'Human'}->{ 3 }->[6]
    When we find the second '---' gap, we add it's position to the array:
    %results->{'Human'}->{ 3 }->[6,16]
    Then, we find a new base-pair-gap ('-----') so we add a new key to inner hash:
    %results->{'Human'}->{ 3 }->[6,16]
    ->{ 5 }->[25]
    Next, we move on to the next species ...
    %results->{'Human'}->{ 3 }->[6,16]
    ->{ 5 }->[25]
    ->{'Mouse'}->{ 3 }->[7]

    So, finally, with Data::Dumper, we can see the %results hash when the
    code is done processing the sequence:

    %results = {
    'Human' => {
    '3' => [
    6,
    16
    ],
    '5' => [
    25
    ]
    },
    'Mouse' => {
    '4' => [
    17
    ],
    '3' => [
    7
    ]
    },
    'Chimp' => {
    '3' => [
    6,
    16
    ],
    '5' => [
    25
    ]
    }
    };


    I hope this is helpful! This really was a lot of fun.

    --Errin
    Errin Guest

  18. #18

    Default Re: counting gaps in sequence data

    On Thu, 14 Oct 2004 15:40:24 -0400, Willy West wrote:
     
    >
    > bio-informatics is a big area in which Perl is involved... there's
    > even a book from O'reilly on the subject...[/ref]

    I think you'll find it's called bioinformatics, and yes it's a huge
    (scientific) field.

    The reason why perl is used so much in bioinformatics is that the majority
    of biological data out there is textual and there's terabytes of it around
    (if not more). There are large databases with +100K entries which are
    pd to extract either protein or DNA data (or other data) for further
    ysis. Look here if you're interested in finding out more:
    http://www.ebi.ac.uk
     

    The problem with bioperl mailing lists is that they are primarily for
    discussing BioPerl, which is a collection of perl modules designed to aid
    the anaylsis of biological data. The problem I find with BioPerl is that
    it isn't very good IMHO. In fact, no-one I know uses it and the few times
    I've tried using it I ended getting so frustrated with how it was written
    that I ended up writing my own code from scratch.

    I actually think there should a specific bioinformatics ng which is
    language agnostic to aid solving common problems, of which there are many.

    Chris.
    Chris Guest

  19. #19

    Default Re: counting gaps in sequence data

    Errin,

    Thanks so much! I will spend the weekend going over what you've posted.
    Looks like I will learn a lot from this post alone. This stuff is so
    addictive. I can spend hours doing this and not realize it. If I am
    successful or not is another story! :-)

    I'll definitely let you know if I have any trouble.

    -Cheers!
    -Mike

    On Oct 15, 2004, at 7:22 AM, Errin Ln wrote:
     
    >
    > Hi again, Mike!
    >
    > I've thrown together the following code. I have not commented this!
    > If you have some questions, just ask. I hard coded the sequences for
    > my ease-of-use. It looked to me like you have figured out how to grab
    > the sequences out of a file and throw them in a hash. This code uses
    > some deep nested references, and therefore, some crazy dereferences.
    > Have fun with it, I know I did! Things that might look weird: check
    > out perldoc -f split for info on using a null-string to split with
    > (That's were I found it!) and of course perldoc perlref for all the
    > deep nested references and dereferencing stuff! I'm currently reading
    > "Learning Perl Objects, References & Modules" by Randal Schwartz. I
    > highly recommend it. It helped a lot in this exercise. Here's the
    > code:
    >
    > use warnings;
    > use strict;
    >
    > my %sequences = (
    > 'Human' => "acgtt---cgatacg---acgact-----t",
    > 'Chimp' => "acgtt---cgatacg---acgact-----t",
    > 'Mouse' => "acgata---acgatcg----acgt",
    > );
    > my %results;
    >
    > foreach my $species( keys %sequences ) {
    > my $is_base_pair_gap = 0;
    > my $base_pair_gap;
    > my $base_pair_gap_pos;
    > my $position = 1;
    > foreach( split( / */, $sequences{$species} )) {
    > if( /-/ ) {
    > unless( $is_base_pair_gap ) {
    > $base_pair_gap_pos = $position;
    > }
    > $is_base_pair_gap = 1;
    > $base_pair_gap .= $_;
    > } elsif( $is_base_pair_gap ) {
    > push
    > {$results{$species}{length($base_pair_gap)}}, $base_pair_gap_pos;
    > $is_base_pair_gap = 0;
    > $base_pair_gap = undef;
    > }
    > $position++;
    > }
    > }
    >
    > foreach my $species( keys %results ) {
    > print "$species:\n";
    > foreach my $base_pair_gap( keys %{$results{$species}} ) {
    > print " Number of $base_pair_gap base pair gaps:\t",
    > scalar( {$results{$species}{$base_pair_gap}}), "\n";
    > print " at position(s) ", join( ',',
    > {$results{$species}{$base_pair_gap}} ), ".\n";
    > }
    > print "\n";
    > }
    >
    >
    >
    >
    > The heart of this code is this line:
    > push {$results{$species}{length($base_pair_gap)}}, $base_pair_gap_pos;
    >
    > there is a %results hash which has keys that are the different
    > species, and values that point to another hash. THAT hash (the inner
    > hash) has keys that are the length of the base-pair-gaps, and values
    > that point to an array. The array holds a list of the positions of
    > those base-pair gaps! The first base pair gap in the human sequence
    > is '---' at the 6th character. That looks like this (warning: pseudo
    > code for clarity!)
    > %results->{'Human'}->{ 3 }->[6]
    > When we find the second '---' gap, we add it's position to the array:
    > %results->{'Human'}->{ 3 }->[6,16]
    > Then, we find a new base-pair-gap ('-----') so we add a new key to
    > inner hash:
    > %results->{'Human'}->{ 3 }->[6,16]
    > ->{ 5 }->[25]
    > Next, we move on to the next species ...
    > %results->{'Human'}->{ 3 }->[6,16]
    > ->{ 5 }->[25]
    > ->{'Mouse'}->{ 3 }->[7]
    >
    > So, finally, with Data::Dumper, we can see the %results hash when the
    > code is done processing the sequence:
    >
    > %results = {
    > 'Human' => {
    > '3' => [
    > 6,
    > 16
    > ],
    > '5' => [
    > 25
    > ]
    > },
    > 'Mouse' => {
    > '4' => [
    > 17
    > ],
    > '3' => [
    > 7
    > ]
    > },
    > 'Chimp' => {
    > '3' => [
    > 6,
    > 16
    > ],
    > '5' => [
    > 25
    > ]
    > }
    > };
    >
    >
    > I hope this is helpful! This really was a lot of fun.
    >
    > --Errin[/ref]

    Michael Guest

Similar Threads

  1. I need some help closing these gaps (CS)
    By fritsche@adobeforums.com in forum Adobe Illustrator Windows
    Replies: 12
    Last Post: July 27th, 06:56 PM
  2. Gaps in page
    By nism in forum Macromedia Flash Sitedesign
    Replies: 2
    Last Post: June 2nd, 11:13 AM
  3. Huge Gaps
    By K in forum Web Design
    Replies: 3
    Last Post: May 2nd, 09:12 PM
  4. SELECTing sequence of data
    By Jacco in forum Microsoft SQL / MS SQL Server
    Replies: 8
    Last Post: July 22nd, 07:49 PM
  5. Gaps between Images
    By jojo in forum Macromedia Dreamweaver
    Replies: 2
    Last Post: July 9th, 08:44 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139