Lone Wolf #1
Make this into a script to parse?
I'm back to dealing with the main issue of a badly formatted file being
brought down from an archaic system and needing to be cleaned up before
being passed to another user or a database table. I have the code
below, which pulls the whole file in and parses it line by line. The
problem is still that when the code is done parsing the file, the file
still has a ton of whitespace left in it.
What I would like to do, when I first open the file (in another piece of
this massive script), is tell it to run a subroutine on each piece
that does the same thing as the code below. Unfortunately, I am not sure
how to do this.
This piece I DO have:
sub cleanup{
use strict;
my $file = "info/bad.sql";
my $newfile = "info/inventory.sql";
my $line;
open (OLDFILE, "< $file");
open (NEWFILE, "> $newfile");
while ($line = <OLDFILE>) {
$line =~ s/^ //mg;
$line =~ s/ $//mg;
$line =~ s/\t/|/mg;
$line =~ s/\s+/ /mg;
$line =~ s/^\s*//mg;
$line =~ s/\s*$//mg;
$line =~ s/\s*$//mg;
### The following lines mod the files to reflect inches and feet
$line =~ s/(?<=\d)"/in. /mg;
$line =~ s/(?<=\d)'/ft. /mg;
$line =~ s/^\s+//mg;
$line =~ s/\s+$//mg;
# $line =~ s/\s*\|\s*//mg;
### $line =~ s/ |/|/mg;
### $line =~ s/| /|/mg;
print NEWFILE "$line\n";
}
close OLDFILE;
close NEWFILE;
print "$newfile has now been created\n";
}
The first pass of the code, which puts a piece of the array of data into another
location further back in the file:
sub MySQL_id_data
{
$database_file = "info/salesa1";
open(INF,$database_file) or dienice("Can't open $database_file: $!\n");
@grok = <INF>;
close(INF);
$file1 = "info/salesa1-data";
open (FILE, ">$file1") || die "Can't write to $file1 : error $!\n";
$inv = 1;
foreach $i (@grok)
{
chomp($i);
($item_num,$item_desc,$b1,$b2,$b3,$b4,$cc,$vn,$qoh,$qc,$qor,$bc,$sc,$yp)
= split(/\|/,$i);
print FILE
"$inv|$item_num|$item_desc|$b1|$b2|$b3|$b4|$cc|$vn|$qoh|$qc|$qor|$bc|$item_num|$sc|$yp\n";
$inv++;
}
close FILE;
}
HELP!!
Thanks,
Robert
-
Jeff 'Japhy' Pinyan #2
Re: Make this into a script to parse?
On Feb 4, Lone Wolf said:
>I'm back to dealing with the main issue of a badly formatted file being
>brought down from an archaic system and needing to be cleaned up before
>being passed to another user or a database table. I have the code
>below, which pulls the whole file in and parse it line by line. That
>problem is still that when the stuff is done parsing the file, the file
>still has a ton of white spaces left in it.
> open (OLDFILE, "< $file");
> open (NEWFILE, "> $newfile");
> while ($line = <OLDFILE>) {
> $line =~ s/^ //mg;
> $line =~ s/ $//mg;
> $line =~ s/\t/|/mg;
> $line =~ s/\s+/ /mg;
> $line =~ s/^\s*//mg;
> $line =~ s/\s*$//mg;
> $line =~ s/\s*$//mg;
These regexes (above and below) have NO need for the /m modifier, and only
a few of them have any need for the /g modifier.
$line =~ s/^\s+//; # remove leading spaces
$line =~ s/\s+$//; # remove trailing spaces
$line =~ tr/\t/|/; # change all \t's to |'s
$line =~ tr/ //s; # squash multiple spaces into one space
Those four lines (two regexes, two transliterations) do what the seven
lines above them do.
> $line =~ s/(?<=\d)"/in. /mg;
> $line =~ s/(?<=\d)'/ft. /mg;
Still don't need the /m modifier.
> $line =~ s/^\s+//mg;
> $line =~ s/\s+$//mg;
The first one is totally useless, and the second is only needed because
it's possible $line now ends in "in. ", which means the trailing space
should be removed. The solution, then, is to do the two \d regexes FIRST,
and THEN do the other regexes.
># $line =~ s/\s*\|\s*//mg;
>### $line =~ s/ |/|/mg;
>### $line =~ s/| /|/mg;
Are those not needed, or commented out because they're not working
properly?
> print NEWFILE "$line\n";
> }
> close OLDFILE;
> close NEWFILE;
>
> print "$newfile has now been created\n";
>}
>sub MySQL_id_data {
> $database_file = "info/salesa1";
> open(INF,$database_file) or dienice("Can't open $database_file: $!\n");
> @grok = <INF>;
> close(INF);
There's no reason to slurp a file into an array. Just loop over the lines
of the file like you have with the while loop above.
> $file1 = "info/salesa1-data";
> open (FILE, ">$file1") || die "Can't write to $file1 : error $!\n";
> $inv = 1;
>
> foreach $i (@grok) {
> chomp($i);
>
>($item_num,$item_desc,$b1,$b2,$b3,$b4,$cc,$vn,$qoh,$qc,$qor,$bc,$sc,$yp)
>= split(/\|/,$i);
> print FILE
>"$inv|$item_num|$item_desc|$b1|$b2|$b3|$b4|$cc|$vn|$qoh|$qc|$qor|$bc|$item_num|$sc|$yp\n";
> $inv++;
> }
Oh good God. Do you know what that for loop is DOING?
for each element in @grok:
remove the newline
split it on pipes into some variables
print $inv, those variables with pipes in between, and add a newline
That is terribly insane.
> close FILE;
>}
Here's my rewrite:
sub MySQL_id_data {
my $db_file = "info/salesa1";
my $info_file = "$db_file-data";
open DB, "< $db_file" or dienice("can't open $db_file: $!");
open INFO, "> $info_file" or dienice("can't write $info_file: $!");
print INFO "$.|$_" while <DB>;
close INFO;
close DB;
}
--
Jeff "japhy" Pinyan japhy@pobox.com http://www.pobox.com/~japhy/
RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
<stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.
[ I'm looking for programming work. If you like my work, let me know. ]
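Pulling Jeff's points together, here is a minimal sketch of the per-line cleanup as a reusable sub. The name clean_line, the sample line, and the decision to strip padding around pipes are assumptions made for illustration and do not come from the thread. The unit substitutions run first, so the space they add is trimmed afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): clean one line of the export.
sub clean_line {
    my ($line) = @_;
    chomp $line;

    # Units first, so the trailing space they add is trimmed below.
    $line =~ s/(?<=\d)"/in. /g;   # a double quote after a digit means inches
    $line =~ s/(?<=\d)'/ft. /g;   # a single quote after a digit means feet

    $line =~ tr/\t/|/;            # tabs become field separators
    $line =~ tr/ //s;             # squeeze runs of spaces to one space
    $line =~ s/\s*\|\s*/|/g;      # assumption: no padding wanted around pipes
    $line =~ s/^\s+//;            # trim leading whitespace
    $line =~ s/\s+$//;            # trim trailing whitespace
    return $line;
}

# Made-up sample line with tabs, padding, feet and inches.
print clean_line(qq{  A100\t  Hose   6'  \t 12"  \n}), "\n";
```

Inside the existing while loop, this would be called once per line, for example print NEWFILE clean_line($line), "\n";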
-
Wolf Blaum #3
Re: Make this into a script to parse?
For Quality purposes, Lone Wolf's mail on Thursday 05 February 2004 00:52
may have been monitored or recorded as:
> I'm back to dealing with the main issue of a badly formatted file being
> brought down from an archaic system and needing to be cleaned up before
> being passed to another user or a database table. I have the code
I assume by saying you are back that you are talking of your thread from 12/17:
"get rid of whitespace around pipes??".
> below, which pulls the whole file in and parse it line by line. That
> problem is still that when the stuff is done parsing the file, the file
> still has a ton of white spaces left in it.
Did you try something like
my @fields = split /\s*\|\s*/, $line;
as suggested by James, Jeff and Randy?
Why didn't it work? The problem still looks pretty much the same, doesn't it?
> What I would like to do is when I first open the file (another piece of
> this massive script) is tell it to just run a sub program on each piece
> that does the same thing as the stuff below, unfortunately I am not sure
> of the way to do this.
Frankly, after a while of looking at your code I'm still not sure what you want
to do. That might be due to my ignorance, but you would really help me (and I
guess others too) understand if you could post some sample data before it
goes into your program and a line showing how you expect it to look after
it was processed by your code. I guess that would make it easier to
figure out what goes where.
Wolf
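For reference, a small sketch of the split suggested above. The pattern only removes whitespace next to a pipe, so this version also trims both ends of the line first. The helper name split_fields and the sample record are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a pipe-delimited line into trimmed fields.
sub split_fields {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^\s+|\s+$//g;               # trim both ends of the line first
    return split /\s*\|\s*/, $line, -1;    # limit -1 keeps trailing empty fields
}

# Made-up record with padding around every field and an empty last field.
my @fields = split_fields("  A100 | Garden hose   |  25 |  \n");
print join('|', @fields), "\n";
```

Without the -1 limit, split drops trailing empty fields, which would make records with an empty last column look shorter than the rest.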
-
Lone Wolf #4
RE: Make this into a script to parse?
I tried the my @fields and I did not get it to work, probably because my
coding skills have not improved enough lately to be worthy of perl.
Thank goodness I never said I had perfect code, because I would
definitely be lying.
I attached 2 files, one the beginning data, the other the .sql file that
I load into MySQL database. The files are about 3000 lines before and
after so I cut out the first 30 lines and put them in the files to the
list.
What I need to figure out is how to make a sub call that when I pull in
the file will remove all extraneous white space. Something I can copy
into another Perl program to parse another set of files (ARGH!). I've
learned not to tell the bosses I can write a script to handle the errors
of the salesmen. I currently use a back piece of PHP coding to handle
the extra spaces in the pages that use the data, but for another project
I can't use that work-around.
I know I can do something along the lines of:
(from an HTML generating page with a sort)
foreach $i (sort ByName @grok)
{
chomp($i);
($type,$description,$parts,$numb) = split(/\|/,$i);
print <<INFO2;
<tr><td>$type</td><td>$description</td><td>$parts</td><td>$numb</td></tr>
INFO2
}
The sub program:
sub ByName {
@a = split(/\|/,$a);
@b = split(/\|/,$b);
$a[1] cmp $b[1];
}
But I am still not sure how to make the $i go through, and it is
probably something simple I am missing.
Thanks!!
Robert
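One way to "make the $i go through" is to hand each line to a sub that returns the cleaned fields, and to sort on those fields rather than re-splitting inside the comparison. This is only a sketch: tidy_record and the two sample lines are made up, and @grok stands in for the lines read from the file.

```perl
use strict;
use warnings;

# Hypothetical helper: trim one pipe-delimited record and return its fields.
sub tidy_record {
    my ($record) = @_;
    chomp $record;
    $record =~ s/^\s+|\s+$//g;
    return split /\s*\|\s*/, $record, -1;
}

# Made-up sample lines standing in for @grok.
my @grok = (
    "pump | Sump pump  | 3 | P-1\n",
    "hose |  Garden hose | 2 | H-7\n",
);

# Split each record once, sort on the second field, then print the rows.
my @rows = sort { $a->[1] cmp $b->[1] } map { [ tidy_record($_) ] } @grok;

foreach my $row (@rows) {
    my ($type, $description, $parts, $numb) = @$row;
    print "<tr><td>$type</td><td>$description</td><td>$parts</td><td>$numb</td></tr>\n";
}
```

Splitting once up front means the sort block only compares already-clean values, so the same tidy_record sub can be copied into another script unchanged.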
-
John McKown #5
Re: Make this into a script to parse?
On Wed, 4 Feb 2004, Jeff 'japhy' Pinyan wrote:
<snip>
> >
> > foreach $i (@grok) {
> > chomp($i);
> >
> >($item_num,$item_desc,$b1,$b2,$b3,$b4,$cc,$vn,$qoh,$qc,$qor,$bc,$sc,$yp)
> >= split(/\|/,$i);
> > print FILE
> >"$inv|$item_num|$item_desc|$b1|$b2|$b3|$b4|$cc|$vn|$qoh|$qc|$qor|$bc|$item_num|$sc|$yp\n";
> > $inv++;
> > }
> Oh good God. Do you know what that for loop is DOING?
>
> for each element in @grok:
> remove the newline
> split it on pipes into some variables
> print $inv, those variables with pipes in between, and add a newline
>
> That is terribly insane.
Jeff, the input and output lines are not identical. The output line
prefixes $inv at the front and inserts $item_num between $bc and $sc. I
don't know why $item_num is repeated. Granted that I think a more
efficient construct might be:
my ($item_num,$a,$b) = $i =~ /^(.*?)\|((?:.*?\|){10}.*?)\|(.*)$/;
print LINE "$inv|$item_num|$a|$item_num|$b\n";
I think that I have that right. Well, assuming that the original is
correct.
--
Maranatha!
John McKown
-
Jeff 'Japhy' Pinyan #6
Re: Make this into a script to parse?
On Feb 4, John McKown said:
>On Wed, 4 Feb 2004, Jeff 'japhy' Pinyan wrote:
>> > foreach $i (@grok) {
>> > chomp($i);
>> >
>> >($item_num,$item_desc,$b1,$b2,$b3,$b4,$cc,$vn,$qoh,$qc,$qor,$bc,$sc,$yp)
>> >= split(/\|/,$i);
>> > print FILE
>> >"$inv|$item_num|$item_desc|$b1|$b2|$b3|$b4|$cc|$vn|$qoh|$qc|$qor|$bc|$item_num|$sc|$yp\n";
>> > $inv++;
>> > }
>> Oh good God. Do you know what that for loop is DOING?
>> That is terribly insane.
>Jeff, The input and output lines are not identical. The output line
>prefixes $inv at the front and inserts $item_num between $bc and $sc. I
>don't know why $item_num is repeated. Granted that I think a more
>efficient construct might be:
Bah, I missed that. Then I'd use split(), but just use an array.
while (<IN>) {
local $" = "|";
my @fields = split /\|/;
print OUT "$.|@fields[0..11,0,12..13]";
}
But this begs the question, WHY does item_num have to be used TWICE in the
SAME line of data. This smells of poor coding on the other side. It's
still ugly.
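As a rough sketch of the slice idea above, assuming the 14-field layout from the original post: the helper name renumber and the sample field values are invented for illustration, and join is used in place of setting $" so the separator is explicit.

```perl
use strict;
use warnings;

# Hypothetical helper: number a record and repeat field 0 after field 11.
sub renumber {
    my ($inv, $record) = @_;
    chomp $record;
    my @fields = split /\|/, $record, -1;   # limit -1 keeps trailing empty fields
    return join '|', $inv, @fields[0 .. 11, 0, 12 .. 13];
}

# Made-up record using the field names from the original post as values.
my $sample = join '|', 'A100', 'desc', 'b1', 'b2', 'b3', 'b4',
                       'cc', 'vn', 'qoh', 'qc', 'qor', 'bc', 'sc', 'yp';
print renumber(1, "$sample\n"), "\n";
```

In a while (<IN>) loop the first argument could simply be $. (the current input line number), as in the code above.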
-
Wolf Blaum #7
Re: Make this into a script to parse?
For Quality purposes, Lone Wolf's mail on Thursday 05 February 2004 04:23
may have been monitored or recorded as:
Hi
> Thank goodness I never said I had perfect code, because I would
> definitely be lying.
No worries - I post code to get feedback. That's the whole idea of learning it.
> I attached 2 files, one the beginning data, the other the .sql file that
> I load into MySQL database. The files are about 3000 lines before and
> after so I cut out the first 30 lines and put them in the files to the
> list.
OK - then, again: do not read these files into memory at once unless you really
have to (which should be close to never).
Here is a script that uses your given data:
---snip---
#!/usr/bin/perl
use strict;
use warnings;
my (@fields, $lng);
opendir INDIR , "./sql" or die "Can't open dir with before files:$!";
foreach my $infile (grep {!/^\./} readdir INDIR) {
#read all the files in your home/sql dir
#read only files that do not start with a .
my ($i,$rec) = (0,0);
open INFILE, "<./sql/$infile" or die "Can't open $infile: $!";
open OUTFILE, ">./${infile}.out" or die "Can't open ${infile}.out at home: $!";
while (<INFILE>) {
$rec++;
chomp;
@fields = split /\s*\|\s*/, $_;
$fields[0] =~ s/^\s+//;
#there is probably a way to get rid of the leading spaces in the first
#entry using split, I just couldn't think of any.
$lng = @fields unless $lng; #set $lng for first record
print "The following record: $rec has ", scalar @fields, " fields as compared to $lng fields in the first record! Skip. : $_\n" and next unless $lng == @fields;
#poor quality control of your input data: check if all records have the same
#number of fields or skip and print record otherwise.
$i++;
print OUTFILE $i;
print OUTFILE "|$_" foreach (@fields);
print OUTFILE "|$fields[0]\n"; #your trailing ID
}
close INFILE;
close OUTFILE;
print "Read $rec records from ./sql/$infile and printed $i into ./${infile}.out\n";
}
closedir INDIR;
---snap---
A couple of hints:
The script reads all files in the sql subdir of your home dir and produces the
corresponding filename.out in your homedir.
The split splits as written by Jeff et al.
I couldn't think of a better way to strip the leading spaces from the first
field.
Any better suggestions?
You end up with a final \n in each outfile.
You rewrite it into a sub by substituting the line
foreach my $infile (grep {!/^\./} readdir INDIR) {
with
sub whatever{
....
foreach my $infile (@_) {
and call the sub with
&whatever ("file1", "file2", ...);
Of course you may want to change the open statements too, if you don't have your
infiles in ./sql
Hope that gets you started, Wolf
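A minimal sketch of the sub version described in the hints, under a few assumptions: the name clean_files is hypothetical, the file names are passed in as full paths, each FILE is written to FILE.out next to it, and the field-count check from the script is left out to keep the example short. It trims both ends of the line before splitting, which is one answer to the question about the leading spaces in the first field.

```perl
use strict;
use warnings;

# Hypothetical sub version of the loop: pass it the input file names.
# Each FILE is cleaned into FILE.out; returns the number of records written.
sub clean_files {
    my @infiles = @_;
    my $written = 0;
    foreach my $infile (@infiles) {
        open my $in,  '<', $infile       or die "Can't open $infile: $!";
        open my $out, '>', "$infile.out" or die "Can't write $infile.out: $!";
        my $i = 0;
        while (my $line = <$in>) {
            chomp $line;
            $line =~ s/^\s+|\s+$//g;                    # trim both ends
            my @fields = split /\s*\|\s*/, $line, -1;   # trim around pipes
            next unless @fields;                        # skip blank lines
            $i++;
            # Leading counter, the fields, then the first field repeated.
            print {$out} join('|', $i, @fields, $fields[0]), "\n";
        }
        close $in;
        close $out or die "Can't close $infile.out: $!";
        $written += $i;
    }
    return $written;
}
```

It would be called as clean_files("file1", "file2"); with whatever paths apply. Lexical filehandles and three-argument open are used here so the sub can be pasted into another script without clashing with its handle names.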
-
Wolf Blaum #8
Re: Make this into a script to parse?
For Quality purposes, wolf blaum's mail on Thursday 05 February 2004 06:07
may have been monitored or recorded as:
> The script reads all files in the sql subdir of your home dir and produces
> the corresponding filename.out in your homedir.
Shame on me: of course it reads all the files in the subdir sql of the
CURRENT DIR, not the home dir. Use ~/ if you want your homedir...
Well, I've been here a while...
Something else I forgot: why do you need the count at the beginning of the
line? I hope not as a unique (primary) key for the DB table you feed that
into. There should be an AUTO_INCREMENT in your DB for that.
And talking about DBs:
According to the 3rd rule of normalisation as outlined by E. F. Codd of IBM in
the 1970s (not that I was around at this time...):
"An entity is said to be in 3rd normal form if it is already in 2nd normal
form and no nonidentifying attributes are dependent on any other
nonidentifying attributes."
The repeat of a value like $fields[0] clearly violates this rule.
See www.databasejournal.com/sqletc/article.php/1428511
on DB design.
Good night, wolf
-
R. Joseph Newton #9
Re: Make this into a script to parse?
John McKown wrote:
>
> my ($item_num,$a,$b) = $i =~ /^(.*?)\|((?:.*?\|){10}.*?)\|(.*)$/;
> print LINE "$inv|$item_num|$a|$item_num|$b\n";
>
> I think that I have that right. Well, assuming that the original is
> correct.
No John,
If you are using $a and $b as variables in any context other than the sort
built-in function, then you do not have it right. Choose meaningful variable
names.
Joseph
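To illustrate the naming point, here is a sketch of the same regex-based reshuffle with descriptive lexical names. The sub name repeat_item_num, the names $middle and $tail, and the sample record are made up for this example. The sort call is only there to show that $a and $b remain available for their usual job.

```perl
use strict;
use warnings;

# Hypothetical helper: same reshuffle, with descriptive lexical names.
sub repeat_item_num {
    my ($inv, $record) = @_;
    my ($item_num, $middle, $tail) =
        $record =~ /^([^|]*)\|((?:[^|]*\|){10}[^|]*)\|(.*)$/
        or return;                       # too few fields: return nothing
    return "$inv|$item_num|$middle|$item_num|$tail";
}

# $a and $b stay free for sort, which sets them as package variables.
my @sorted = sort { $a <=> $b } (10, 9, 100);

# Made-up 14-field record: an item number followed by 13 numbered fields.
print repeat_item_num(1, join('|', 'A100', 1 .. 13)), "\n";
print "@sorted\n";
```

Declaring my $a or my $b in the same scope as a sort block hides the package variables that sort fills in, which is why those two names are best left alone.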