
getting one hash out of multiple files - PERL Beginners


  1. #1

    Default getting one hash out of multiple files

    Hi there!

    I'm fairly new to Perl and need some help to accomplish a (simple?) task.

    I extract strings from some logfiles, namely an IP address and bytes,
    using regexes. I use a hash to store each IP address and its associated
    bytes. First I packed all logs into a temporary file, but it was getting
    too big. Now I'm having problems merging the hashes from the single
    logfiles into one hash for all of them. Either I have a hash for each of
    the files or just one for the last processed file.

    Thanks in advance

    Folker Naumann




    Folker Guest

  2. #2

    Default Re: getting one hash out of multiple files



    Folker Naumann wrote: 

    Hello,
     


    Sounds like you have the hard part done :)
     


    Instead of creating a new file that has each file in it (doubling the
    space and memory used), process them one at a time:

    my %ipbytes = ();

    for my $file (@logfiles) {
        open LOG, $file or die $!;
        while (<LOG>) {
            # or however you get the ip and bytes from a line
            my ($ip, $bytes) = split /:/, $_;
            # may want to make sure $bytes is numeric to avoid possible errors
            $ipbytes{$ip} += $bytes;
        }
        close LOG;
    }


    HTH :)

    Lee.M - JupiterHost.Net
    JupiterHost.Net Guest

  3. #3

    Default RE: getting one hash out of multiple files

    Folker Naumann wrote:

    : I'm fairly new to Perl and need some help to accomplish a
    : (simple?) task.
    :
    : I extract strings from some logfiles, namely an IP address
    : and bytes, using regexes. I use a hash to store each IP
    : address and its associated bytes. First I packed all logs
    : into a temporary file, but it was getting too big.
    : Now I'm having problems merging the hashes from the
    : single logfiles into one hash for all of them. Either I have a
    : hash for each of the files or just one for the last
    : processed file.


    Show us your code.


    Charles K. Clarkson
    --
    Mobile Homes Specialist
    254 968-8328

    Charles Guest

  4. #4

    Default Re: getting one hash out of multiple files

    JupiterHost.Net wrote:
     

    Hi,

    I should have included my code in the beginning, because I've already
    done this. I know that in this case a hash is generated for every file,
    but when I do the printing outside the foreach loop, just the hash of the
    last file is printed. I'm aware of that problem but could not figure out
    how to create only one hash for all the files.

    Thanks

    Folker Naumann

    ----------------------------------------------------------
    (...)
    foreach $file (@sortlist) {

        open(LOG, $file) or die "Can't open $file: $!\n";
        @lines = <LOG>;
        foreach my $logline (reverse(@lines)) {

            # Search for host IP address and bytes
            if ( $logline =~ / (\d+\.\d+\.\d+\.\d+) \w*\/\w* (\d+) [A-Z]+/ ) {
                if ($ipload{$1}) { $ipload{$1} += $2 }
                else             { $ipload{$1}  = $2 }
            }
        }

        # Close log file
        close(LOG) or die "Can't close $file: $!\n";

        # Print hash sorted by host IP address
        foreach $ip ( map  { $_->[0] }
                      sort { $a->[1] <=> $b->[1] }
                      map  { [ $_, (/(\d+)$/)[0] ] } keys %ipload ) {
            print "$ip = $ipload{$ip}\n";
        }
    }

    ----------------------------------------------------------

    Folker Guest

  5. #5

    Default Re: getting one hash out of multiple files

    Always always always:

    use strict;
    use warnings;
     

    my @sortlist = ...
    my %ipload = ();
    foreach my $file (@sortlist) {
     

    my @lines = <LOG>;
     

    Why not just
    $ipload{$1} += $2;
    instead of the if else?
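    Putting those pieces together, a minimal sketch of the whole thing might
    look like this (keeping your original regex; building @sortlist from
    @ARGV is just an assumption for illustration):

    use strict;
    use warnings;

    my @sortlist = @ARGV;   # or however you build the list of log files
    my %ipload   = ();

    foreach my $file (@sortlist) {
        open(LOG, $file) or die "Can't open $file: $!\n";
        while (my $logline = <LOG>) {
            # Search for host IP address and bytes
            if ( $logline =~ / (\d+\.\d+\.\d+\.\d+) \w*\/\w* (\d+) [A-Z]+/ ) {
                $ipload{$1} += $2;
            }
        }
        close(LOG) or die "Can't close $file: $!\n";
    }

    # Print once, after all files have been read,
    # sorted numerically by the last octet as in the original post
    foreach my $ip ( map  { $_->[0] }
                     sort { $a->[1] <=> $b->[1] }
                     map  { [ $_, (/(\d+)$/)[0] ] } keys %ipload ) {
        print "$ip = $ipload{$ip}\n";
    }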
     


    JupiterHost.Net Guest

  6. #6

    Default Re: getting one hash out of multiple files

    JupiterHost.Net wrote: 

    Sorry, I just used (...) to indicate that I left out some lines of code,
    including use strict, use warnings and all the initialisations.
     
    >
    >
    > Why not just
    > $ipload{$1} += $2;
    > instead of the if else?
    >

    I thought it would be safer, because some IP addresses have entries in
    multiple files. But I guess it's not necessary.

    Thanks

    Folker Naumann

    Folker Guest

  7. #7

    Default Re: getting one hash out of multiple files

    >> Always always always: 
    >
    > Sorry, I just used (...) to indicate that I left out some lines of code,
    > including use strict, use warnings and all the initialisations.
    >

    Then how did the code you posted work? None of it was declared with the
    scope it should have been (i.e. any of the places I added 'my' to in my
    previous email).
     
    >>
    >> Why not just
    >> $ipload{$1} += $2;
    >> instead of the if else?
    >>
    >
    > I thought it would be safer, because some IP addresses have entries in
    > multiple files. But I guess it's not necessary.

    Don't you want the total for each IP from all the files?

    I'm also still not sure how you put all the log files into one file and
    yet "already did it that way", as you said in your previous email.

    JupiterHost.Net Guest

  8. #8

    Default Re: getting one hash out of multiple files



    Try this script out (replacing the fake filenames with actual ones)
    and see if it will help you sort out what's happening:

    #!/usr/bin/perl

    use strict;
    use warnings;
    use Data::Dumper;

    my %ipload   = ();
    my @sortlist = qw(file1 file2);

    for my $file (@sortlist) {
        open LOG, $file or die "Open $file: $!";
        while (<LOG>) {
            my ($ip, $bytes) =
                $_ =~ m/(\d+\.\d+\.\d+\.\d+) \w*\/\w* (\d+) [A-Z]+/;
            $ipload{$ip} += $bytes if $ip && $bytes;
        }
        close LOG;
    }

    print Dumper \%ipload;

    HTH :) - Lee.M - JupiterHost.Net
    JupiterHost.Net Guest

  9. #9

    Default Re: getting one hash out of multiple files

    Folker Naumann wrote: 

    Hello,
     

    You may be able to do this by using a tied hash which will actually store the
    hash's contents in a file.

    perldoc DB_File
    perldoc AnyDBM_File
    perldoc perldbmfilter
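    For example, a minimal sketch of tying the hash to a file with DB_File
    (the filename ipload.db is just a placeholder):

    use strict;
    use warnings;
    use Fcntl;      # for O_RDWR and O_CREAT
    use DB_File;

    # Tie %ipload to a Berkeley DB file so the data lives on disk,
    # not in memory.
    tie my %ipload, 'DB_File', 'ipload.db', O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot tie ipload.db: $!";

    $ipload{'192.168.0.1'} += 1024;   # used exactly like a normal hash

    untie %ipload;                    # flush and close the file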

     

    You could use File::ReadBackwards (which is a lot more efficient) if you
    really need to read the files backwards, however there is no point here, as
    you are storing the data in a hash, which will not preserve the input order.
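    If you really did need it, a sketch with File::ReadBackwards might look
    like this (the filename is just a placeholder):

    use strict;
    use warnings;
    use File::ReadBackwards;

    my $file = 'access.log';   # placeholder filename

    # Read $file line by line starting from the last line,
    # without slurping the whole file into memory.
    my $bw = File::ReadBackwards->new($file)
        or die "Can't read $file: $!";
    while ( defined( my $logline = $bw->readline ) ) {
        # process $logline as before
    }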

     

    You don't need the if test, as Perl will do the right thing when $ipload{$1}
    doesn't exist yet (autovivification). You can compress the IP address quite a
    bit by using Socket::inet_aton(), which will also confirm that it is a valid
    IP address.
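    For instance (a small illustration; Socket is a core module):

    use strict;
    use warnings;
    use Socket;

    my $packed = inet_aton('192.168.0.1');    # 4-byte packed form of the address
    defined $packed
        or warn "not a valid IPv4 address\n"; # inet_aton returns undef on failure
    print inet_ntoa($packed), "\n";           # back to the dotted-quad string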

     

    You don't need the list slice because without the /g (global) option the
    expression can only match once. Your comment says you are sorting by IP
    address but your code says you are only sorting by the last octet in the
    address. Did you intend to sort by the complete IP address?
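    Sorting by the complete address could be done by comparing the packed
    form, e.g. (a sketch, assuming the hash keys are still the dotted-quad
    strings):

    # Schwartzian Transform: sort dotted-quad keys by all four octets.
    for my $ip ( map  { $_->[0] }
                 sort { $a->[1] cmp $b->[1] }
                 map  { [ $_, pack 'C4', split /\./ ] } keys %ipload ) {
        print "$ip = $ipload{$ip}\n";
    }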

     

    This may work as it doesn't slurp the whole file(s) into memory:

    use warnings;
    use strict;
    use Socket;

    my %ipload;
    {
        local @ARGV = @sortlist;

        while ( <> ) {
            next unless / (\d+\.\d+\.\d+\.\d+) \w*\/\w* (\d+) [A-Z]+/;
            my $ip = inet_aton( $1 ) or do {
                warn "$1 is an invalid IP address.\n";
                next;
                };
            $ipload{ $1 } += $2
        }
    }

    # Print hash sorted by host IP address
    for ( sort keys %ipload ) {
        my $ip = inet_ntoa( $_ );
        print "$ip = $ipload{$_}\n";
    }

    __END__



    John
    --
    use Perl;
    program
    fulfillment
    John Guest

  10. #10

    Default Re: getting one hash out of multiple files

    John W. Krahn wrote: 

    Tied hashes look fairly complicated to me, but I'll give them a try ;)
     
    >
    > You don't need the list slice because without the /g (global) option the
    > expression can only match once. Your comment says you are sorting by IP
    > address but your code says you are only sorting by the last octet in the
    > address. Did you intend to sort by the complete IP address?
    >

    I have to admit that I'm not completely firm with the Schwartzian
    Transform, but it does what I want. Because all addresses belong to
    only one subnet, I just need to sort by the last octet, and I get:

    192.168.0.1
    192.168.0.2
    ....
    192.168.0.255
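    A plain numeric sort on the last octet would also do here (a minimal
    sketch, equivalent to the map/sort/map above for this data):

    # Sort only by the digits after the last dot.
    for my $ip ( sort { ($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0] }
                 keys %ipload ) {
        print "$ip = $ipload{$ip}\n";
    }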
     
    >
    > This may work as it doesn't slurp the whole file(s) into memory:
    >
    > use warnings;
    > use strict;
    > use Socket;
    >
    > my %ipload;
    > { local @ARGV = @sortlist;
    >
    > while ( <> ) {
    > next unless / (\d+\.\d+\.\d+\.\d+) \w*\/\w* (\d+) [A-Z]+/;
    > my $ip = inet_aton( $1 ) or do {
    > warn "$1 is an invalid IP address.\n";
    > next;
    > };
    > $ipload{ $1 } += $2
    > }
    > }
    >
    > # Print hash sorted by host IP address
    > for ( sort keys %ipload ) {
    > my $ip = inet_ntoa( $_ );
    > print "$ip = $ipload{$_}\n";
    > }
    >
    > __END__

    This gives me "Bad arg length for Socket::inet_ntoa, length is 13,
    should be 4 at line..."

    Thanks

    Folker Naumann

    Folker Guest

  11. #11

    Default Re: getting one hash out of multiple files

    Folker Naumann wrote: 
    >
    > This gives me "Bad arg length for Socket::inet_ntoa, length is 13,
    > should be 4 at line..."

    Sorry, the line:

    $ipload{ $1 } += $2

    should be:

    $ipload{ $ip } += $2

    inet_ntoa() expects the four-byte packed value returned by inet_aton(),
    so the hash keys need to be the packed $ip, not the dotted string in $1.



    John
    --
    use Perl;
    program
    fulfillment
    John Guest
