"David Byrne" <dbyrne_commerce> wrote in message news:20030804204911.18868.qmailweb40706.mail.yaho o.com... > I am fairly new to Perl and haven't approached a scipt > this complex or computation this intensive. So I > would certainly appreciate any advice. > > I have successfully created a hash of arrays > equivalent to a 122 x 6152 matrix that I want to run > in 'pairwise combinations' and execute the 'sum of the > difference squares' for each combination. > > In other words: > columns: y1...y122 > rows: x1...x6152 > > so... > comb(y1,y2): > {( y1[x1] - y2[x1] ) ^2 + ( y1[x2] ...
"David Byrne" <dbyrne_commerce> wrote in message news:20030804204911.18868.qmailweb40706.mail.yaho o.com...Hi David.> I am fairly new to Perl and haven't approached a scipt
> this complex or computation this intensive. So I
> would certainly appreciate any advice.
>
> I have successfully created a hash of arrays
> equivalent to a 122 x 6152 matrix that I want to run
> in 'pairwise combinations' and execute the 'sum of the
> difference squares' for each combination.
>
> In other words:
> columns: y1...y122
> rows: x1...x6152
>
> so...
> comb(y1,y2):
> {( y1[x1] - y2[x1] ) ^2 + ( y1[x2] - y2[x2] ) ^2 + ...
> + ( y1[x122] - y2[x122] ) ^2};
>
> comb(y1,y3):
> {( y1[x1] - y3[x1] ) ^2 + ( y1[x2] - y3[x2] ) ^2 + ...
> + ( y1[x122] - y3[x122] ) ^2};.
> .
> .
> comb(y1,y6152)
> comb(y2,y3)
> .
> .
> comb(y2,y6152)
> comb(y3,y4)
> .
> .
> etc.
>
> This is going to be very large. According to the
> combinations formula (nCk, n=6152, k=2), the output
> will be a hash (with, for example, 'y1y2' key and
> 'd^2' value) of about 19 million records.
>
> I think my next step is to create a combinations
> formula, but I'm having problems doing so.
This should do the trick, although without sight of your
original data structure I can't be sure. I assume you
have a hash of array references.
This works by grabbing a list of all the hash keys
into an array and then executing two nested loops.
The outer loop shifts the first key off the array
(say y1) and assigns it to $ya. The inner loop
cycles $yb through the keys that remain in the
list (y2, y3, etc.). Next time around the outer loop
the next key (y2) is assigned to $ya and $yb is
set to y3, y4, etc., so each pair is visited exactly
once. Note that 'keys' returns the keys in no
particular order, so the pairs won't necessarily come
out in y1, y2, y3 sequence. The innermost loop scans
through the data calculating the sum of squared
differences for each value pair and stuffs the answer
into hash %results using a key formed by concatenating
$ya and $yb.
I haven't tested this as it would be a pain to set up
some data, so I'll let you test it for me. I know it
compiles OK!
HTH,
Rob
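use strict;
use warnings;

my %data;      # fill this with your hash of array references
my @keys = keys %data;
my %results;

while ( @keys ) {
    my $ya = shift @keys;              # take the next key off the list
    foreach my $yb (@keys) {           # pair it with every key still left
        my @xa = @{$data{$ya}};
        my @xb = @{$data{$yb}};
        # sum of squared differences over the elements both arrays have
        for (my $i = 0; $i < @xa and $i < @xb; ++$i) {
            $results{$ya.$yb} += ($xa[$i] - $xb[$i]) * ($xa[$i] - $xb[$i]);
        }
    }
}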
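Since the code above was posted untested, here is a minimal sketch of the same nested-loop idea run against a tiny made-up data set, so the arithmetic can be checked by hand. The sub name pairwise_ssd, the sample values, and the sorted key order are assumptions for illustration only; they are not taken from David's actual data. It also works through the array references directly instead of copying each array, which may matter with 6152 values per column.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to a hash of array references and
# returns a hash ref mapping "$ya$yb" to the sum of squared differences.
sub pairwise_ssd {
    my ($data) = @_;
    my @keys = sort keys %$data;   # sorted only so the pair names are predictable
    my %results;

    while (@keys) {
        my $ya = shift @keys;      # first key of the pair
        my $xa = $data->{$ya};
        foreach my $yb (@keys) {   # every key still left in the list
            my $xb = $data->{$yb};
            # Compare only as many elements as both arrays have
            my $n = @$xa < @$xb ? @$xa : @$xb;
            my $sum = 0;
            for my $i (0 .. $n - 1) {
                my $d = $xa->[$i] - $xb->[$i];
                $sum += $d * $d;
            }
            $results{$ya . $yb} = $sum;
        }
    }
    return \%results;
}

# Made-up sample data: three columns of three values each
my %sample = (
    y1 => [1, 2, 3],
    y2 => [2, 4, 6],
    y3 => [0, 0, 0],
);

my $ssd = pairwise_ssd(\%sample);
print "$_ => $ssd->{$_}\n" for sort keys %$ssd;
# prints:
# y1y2 => 14
# y1y3 => 14
# y2y3 => 56
```

A list of n keys produces n*(n-1)/2 pairs: 3 for the sample above, 7381 for 122 keys, and 18,920,476 for 6152 keys, the last of which matches the "about 19 million" in the question.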