
How to speed up processing two big files - PERL Beginners


  1. #1

    How to speed up processing two big files

    Hi,
    I have two big text files. I need to read one line from the first file, write
    some information from that line to a new file, then search the second file for
    lines with the same control_id and write more information to the new file.
    I wrote it in Perl, but it took half a day to finish joining the two files.
    Do you have any suggestions?

    Below is some of my code.
    ====================================================================

    #!/usr/bin/perl -w
    #use IO::FILE;
    #use strict 'subs';
    #

    $file1 = "file1.txt";
    $file2 = "file2.txt";

    open (SOURCE, "$file1")
        or die "can't open the $file1: $!";

    while (<SOURCE>) {
        $control_id = substr($_, 0, 22);

        open (SINK, ">>newFile.dat")
            or die "can't open the newFile.dat: $!";

        print SINK $control_id;
        # write more to newFile.dat

        open (ADDSOURCE, "$file2")
            or die "can't open the $file2: $!";

        while (<ADDSOURCE>) {
            if ($_ =~ /^$control_id/) {
                print SINK substr($_, 31, 3);
                # write more to newFile.dat
                $weight = substr($_, 48, 7);
                $totalWeight += $weight;
                $_ = <ADDSOURCE>;
                while ($_ =~ /^$control_id/) {
                    print SINK substr($_, 31, 3);
                    # write more to newFile.dat
                    $weight = substr($_, 48, 7);
                    $totalWeight += $weight;
                    $_ = <ADDSOURCE>;
                } # end of inner while
                print SINK "$totalWeight";
                seek(ADDSOURCE, 0, 2)
                    or die "Couldn't seek to the end: $!\n";
            } # end of if
        } # end of while for ADDSOURCE

        close(ADDSOURCE) or die "can't close $file2: $!\n";
        close(SINK) or die "can't close newFile.dat: $!\n";
    } # end of while for SOURCE
    close(SOURCE) or die "can't close $file1: $!\n";


    ====================================================================

    Hannah Guest

  2. #2

    Re: How to speed up processing two big files

    On Monday 12 July 2004 10:59, Tang, Hannah (NIH/NLM) wrote: 

    Hello,
     

    You don't need to open this file inside the loop. Open it once before
    the loop starts.
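
    For example, a minimal untested sketch of that change, keeping the original
    file and filehandle names from your post:

    #!/usr/bin/perl -w
    # untested sketch: open the output file once, before the loop over file1

    $file1 = "file1.txt";

    open (SINK, ">>newFile.dat")
        or die "can't open the newFile.dat: $!";

    open (SOURCE, "$file1")
        or die "can't open the $file1: $!";

    while (<SOURCE>) {
        $control_id = substr($_, 0, 22);
        print SINK $control_id;
        # ... search file2 for $control_id and print to SINK here ...
    }

    close(SOURCE) or die "can't close $file1: $!\n";
    close(SINK)   or die "can't close newFile.dat: $!\n";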

     

    You are doing way too much inside the while loop. This may not speed
    up your program but it will make it a lot easier to read. :-)

    while ( <ADDSOURCE> ) {
        next unless /^$control_id/;
        print SINK substr $_, 31, 3;
        # write more to newFile.dat
        $totalWeight += substr $_, 48, 7;
    }
    print SINK $totalWeight;

     


    Can you fit all of the control ids from "file1.txt" into an array or
    hash in memory? Perhaps a tied hash will help.

    #!/usr/bin/perl -w
    use strict;
    # UNTESTED !!

    my $file1 = 'file1.txt';
    my $file2 = 'file2.txt';

    open SOURCE, $file1 or die "can't open the $file1: $!";
    my ( $order, %control_ids );
    while ( <SOURCE> ) {
        $control_ids{ substr $_, 0, 22 } = {
            order  => ++$order,
            field  => [], # don't know what to call this?
            weight => 0,
        };
    }
    close SOURCE or die "can't close $file1: $!\n";

    open ADDSOURCE, $file2 or die "can't open the $file2: $!";
    while ( <ADDSOURCE> ) {
        my $id = substr $_, 0, 22;
        next unless exists $control_ids{ $id };
        push @{ $control_ids{ $id }{ field } }, substr $_, 31, 3;
        $control_ids{ $id }{ weight } += substr $_, 48, 7;
    }
    close ADDSOURCE or die "can't close $file2: $!\n";

    open SINK, '>>newFile.dat' or die "can't open the newFile.dat: $!";
    for my $id ( sort { $control_ids{ $a }{ order } <=> $control_ids{ $a }{ order } } keys %control_ids ) {
        print SINK $id, @{ $control_ids{ $id }{ field } }, $control_ids{ $id }{ weight };
    }
    close SINK or die "can't close newFile.dat: $!\n";

    __END__
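
    Following up on the tied hash idea above: if even the hash is too big for
    memory, DB_File can keep it on disk. A rough, untested sketch of the first
    pass is below. Note that a plain DB_File hash can only store flat strings,
    so the order, fields and weight would have to be joined into (and split back
    out of) a single value; the database file name 'control_ids.db' is just made
    up for the example.

    #!/usr/bin/perl -w
    use strict;
    use Fcntl;
    use DB_File;
    # UNTESTED !!

    # The tied hash lives in control_ids.db on disk instead of in RAM.
    # Each value is one flat string: order|fields|running weight.
    my %control_ids;
    tie %control_ids, 'DB_File', 'control_ids.db', O_RDWR | O_CREAT, 0644, $DB_HASH
        or die "can't tie control_ids.db: $!";

    my $file1 = 'file1.txt';
    open SOURCE, $file1 or die "can't open the $file1: $!";
    my $order = 0;
    while ( <SOURCE> ) {
        # record the id with its input order, no fields yet, and zero weight
        $control_ids{ substr $_, 0, 22 } = join '|', ++$order, '', 0;
    }
    close SOURCE or die "can't close $file1: $!\n";

    untie %control_ids;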



    John
    --
    use Perl;
    program
    fulfillment
    John Guest

  3. #3

    Re: How to speed up processing two big files

    On Monday 12 July 2004 12:20, John W. Krahn wrote: 

    Oops, that _should_ be:

    for my $id ( sort { $control_ids{ $a }{ order } <=> $control_ids{ $b }{ order } } keys %control_ids ) {

     


    John
    --
    use Perl;
    program
    fulfillment

    John Guest

