Reading Data File Records


  1. #1

    Reading Data File Records

    I'm a little frustrated with Perl's line-by-line file reading and I am
    hoping that someone can help me.

    I have a data file that looks like:

    --
    ! Comment 1
    ! Comment 2
    ! Comment ...
    5 ! number of levels
    *aaa [aaa units] ! space deliminated is common
    1.0 2.0 3.0 4.0 5.0
    *bbb [bbb units] ! csv is possible
    1.0, 2.0, 3.0,
    4.0 5.0
    *ccc [ccc units] ! the file is written from fortran and the number of
    columns is not fixed
    10.0
    20.0
    30.0
    40.0
    50.0
    ....
    --

    Essentially, there is a header block that always begins with '!' in
    the first column. This is followed by the number of elements in each
    data block and an unknown number of data blocks having a set number of
    elements.

    The file is generated using about five lines of FORTRAN so it seems
    somewhat surprising that I am up to 30 lines of Perl with almost no
    end in sight... Does anyone have an example showing how to process a
    file in blocks using Perl?

    Thanks,
    Graham
    Graham Guest

  3. #2

    Re: Reading Data File Records


    "Graham" <GrahamWilsonCA@yahoo.ca> wrote in message
    news:eda30d78.0309090714.2a6f6431@posting.google.com...
    > I'm a little frustrated with Perl's line-by-line file reading and I am
    > hoping that someone can help me.
    >
    > I have a data file that looks like:
    >
    > --
    > ! Comment 1
    > ! Comment 2
    > ! Comment ...
    > 5 ! number of levels
    > *aaa [aaa units] ! space deliminated is common
    > 1.0 2.0 3.0 4.0 5.0
    > *bbb [bbb units] ! csv is possible
    > 1.0, 2.0, 3.0,
    > 4.0 5.0
    > *ccc [ccc units] ! the file is written from fortran and the number of
    > columns is not fixed
    > 10.0
    > 20.0
    > 30.0
    > 40.0
    > 50.0
    > ...
    > --
    >
    > Essentially, there is a header block that always begins with '!' in
    > the first column. This is followed by the number of elements in each
    > data block and an unknown number of data blocks having a set number of
    > elements.
    >
    > The file is generated using about five lines of FORTRAN so it seems
    > somewhat surprising that I am up to 30 lines of Perl with almost no
    > end in sight... Does anyone have an example showing how to process a
    > file in blocks using Perl?

    What do you want to do with it?

    --
    Brian Wakem


    Brian Wakem Guest

  4. #3

    Re: Reading Data File Records

    On 9 Sep 2003 08:14:57 -0700
    GrahamWilsonCA@yahoo.ca (Graham) wrote:
    <snip>
    > The file is generated using about five lines of FORTRAN so it seems
    > somewhat surprising that I am up to 30 lines of Perl with almost no
    > end in sight... Does anyone have an example showing how to process
    > a file in blocks using Perl?
    Post your code - I have no idea what you are trying to do. Maybe it's
    just me ;)

    --
    Jim

    Copyright notice: all code written by the author in this post is
    released under the GPL. See http://www.gnu.org/licenses/gpl.txt
    for more information.

    a fortune quote ...
    You cannot kill time without injuring eternity.

    James Willmore Guest

  5. #4

    Re: Reading Data File Records

    "Graham" <GrahamWilsonCA@yahoo.ca> wrote in message ...
    [snip..]
    > The file is generated using about five lines of FORTRAN so it seems
    > somewhat surprising that I am up to 30 lines of Perl with almost no
    > end in sight... Does anyone have an example showing how to process a
    > file in blocks using Perl?
    I would download the File::Slurp module from CPAN and install it.
    http://search.cpan.org/author/MUIR/File-Slurp-2004.0904/

    ====
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Slurp;

    my @allLines = read_file("data_file_name");
    foreach my $line (@allLines) {
        # in case you need to process each line
        if ($line =~ /^!/) {
            # comment lines
        }
        else {
            # data lines
        }
    }


    Tulan W. Hu Guest

  6. #5

    Re: Reading Data File Records

    GrahamWilsonCA@yahoo.ca (Graham) wrote:

    : I have a data file that looks like:
    :
    : --
    : ! Comment 1
    : ! Comment 2
    : ! Comment ...
    : 5 ! number of levels
    : *aaa [aaa units] ! space deliminated is common
    : 1.0 2.0 3.0 4.0 5.0
    : *bbb [bbb units] ! csv is possible
    : 1.0, 2.0, 3.0,
    : 4.0 5.0
    ^
    ^
    Should there be a comma between those two values?

    : *ccc [ccc units] ! the file is written from fortran and the number of
    : columns is not fixed

    Is this really how the data file is formatted, or did your newsreader
    word-wrap that line for you?

    : 10.0
    : 20.0
    : 30.0
    : 40.0
    : 50.0
    : ...
    : --
    :
    : Essentially, there is a header block that always begins with '!' in
    : the first column. This is followed by the number of elements in each
    : data block and an unknown number of data blocks having a set number of
    : elements.

    The problem is determining where one block ends and another begins when
    the only thing known about the block is how many elements it contains.
    There's no apparent consistency or predictability to how the blocks may
    be formatted, or to how the elements are separated. Altering the input
    record separator, $/, then reading in a number of records isn't going to
    work.

    What might work would be to read lines of data until a block's requisite
    number of elements have been acquired, but the elements themselves will
    need to have a consistent, recognizable format, and a newline character
    has to mark the boundary between blocks. From the sample data, the
    elements all seem to be numbers with one place after the decimal.

    As a first approximation of workable code,

    #!perl
    use warnings;
    use strict;
    my $elems_per_block;
    while(<DATA>) {
        next if /^!/;
        ($elems_per_block) = /^(\d+)/;
        last;
    }
    my @blocks;
    while(<DATA>) {
        my $block = $_;
        my $n = 0;
        while(<DATA>) {
            $block .= $_;
            last if $elems_per_block == ($n += () = /(\b\d+\.\d\b)/g);
        }
        push @blocks, $block;
    }
    for( @blocks ) {
        # whatever processing each block needs
        print "Block:\n$_\n";
    }

    __DATA__
    ! Comment 1
    ! Comment 2
    ! Comment ...
    5 ! number of levels
    *aaa [aaa units] ! space deliminated is common
    1.0 2.0 3.0 4.0 5.0
    *bbb [bbb units] ! csv is possible
    1.0, 2.0, 3.0,
    4.0 5.0
    *ccc [ccc units] ! the file is written from fortran and the number of
    columns is not fixed
    10.0
    20.0
    30.0
    40.0
    50.0
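
    Once each block is collected as a string in @blocks, it still has to
    be turned into numbers. A minimal sketch of that step, assuming the
    block name is the word right after the leading "*" and that every
    value has one digit after the decimal, as in the sample data. The
    helper name parse_block is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: split one block of text (a "*name [units]" line
# followed by its data lines) into a name and a list of numbers.
sub parse_block {
    my ($block) = @_;
    my ($name)  = $block =~ /^\*(\S+)/;        # block id after the "*"
    my @values  = $block =~ /(\b\d+\.\d\b)/g;  # same number pattern as above
    return ($name, \@values);
}

my $block = "*bbb [bbb units] ! csv is possible\n1.0, 2.0, 3.0,\n4.0 5.0\n";
my ($name, $values) = parse_block($block);
print "$name: @$values\n";    # prints "bbb: 1.0 2.0 3.0 4.0 5.0"
```

    Note that a number written in a trailing "!" comment would also be
    picked up by this pattern, so real files may need the comments
    stripped first.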

    : The file is generated using about five lines of FORTRAN so it seems
    : somewhat surprising that I am up to 30 lines of Perl with almost no
    : end in sight...

    Why should that be surprising? You're trying to build a modicum of
    intelligence into one tool to compensate for another's lack of
    sophistication. The Perl program would have a much easier time reading
    if the FORTRAN program was only a little better at writing.

    Jay Tilton Guest

  7. #6

    Re: Reading Data File Records

    On 9 Sep 2003 15:41:03 -0700
    GrahamWilsonCA@yahoo.ca (Graham) wrote:
    > It seems it isn't just you. All I am trying to do is get the data
    > blocks into a suitable perl structure so I can calculate some simple
    > statistics and reformat it for another program. See comments in the
    > second while loop.
    >
    > I really appreciate the help. I have a pile of files with this type
    > of structure (a legacy of an ancient postdoc) that I need to
    > manipulate and reformat.
    First, let me say that each language is going to handle files and
    variables differently. I say this because you commented on using
    FORTRAN. I know nothing about FORTRAN, but have had _some_ dealings
    with COBOL. Some functionality in COBOL is unavailable in Perl (such
    as strictly defining variables). By the same token, there's
    functionality in Perl that is not available in COBOL (such as regular
    expressions). Having said that, here is some untested code that _may_
    fit the bill for you. Again, it's untested and may _not_ be exactly
    what you're looking for. If I'm off, I'm hoping someone will point
    out where the errors are.

    ==untested==
    #!/usr/bin/perl
    use strict;
    use warnings;

    # define the name of the file
    my $file = 'name_of_file_here';

    # define a hash (associative array) for your records
    my %records;

    # open a file handle to the file - die if we can't open it
    open(my $fh, '<', $file)
        or die "Can't open file $file: $!\n";

    # skip the header lines (those leading with a "!"); the first
    # line that is not a header holds the number of levels
    my $numLev;
    while (my $line = <$fh>) {
        next if $line =~ /^!/;
        # the number of levels is the portion before the first "!"
        ($numLev) = $line =~ /^\s*(\d+)/;
        last;
    }

    # the block id that the data lines currently belong to
    my $key;

    # while the file does not return eof
    while (my $line = <$fh>) {
        # chomp the newline off the line
        chomp $line;
        # strip the comments (everything from the first "!" on)
        $line =~ s/!.*//;
        # a line starting with "*" begins a new block; the word after
        # the "*" is the block id used as the hash key
        if ($line =~ /^\*(\S+)/) {
            $key = $1;
            next;
        }
        # split the line on whitespace and/or commas, keeping only
        # the fields that look like numbers
        my @data = grep { /^-?\d/ } split /[\s,]+/, $line;
        # store the values in the hash using the block id as the key
        push @{ $records{$key} }, @data if defined $key;
    }
    close $fh;

    # to retrieve the records ...
    foreach my $k (sort keys %records) {
        print "$k => ", join(" ", @{ $records{$k} }), "\n";
    }
    ==untested==
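
    For the "simple statistics" mentioned in the question, a hash of
    arrays like %records can be summarised with the core List::Util
    module. This is only a sketch: the helper name block_stats and the
    choice of statistics are assumptions, and the sample hash stands in
    for data read from a real file.

```perl
use strict;
use warnings;
use List::Util qw(sum min max);

# Hypothetical helper: basic statistics for one block of numbers.
# Returns undef for an empty list.
sub block_stats {
    my @v = @_;
    return undef unless @v;
    my %stats = (
        n    => scalar @v,
        min  => min(@v),
        max  => max(@v),
        mean => sum(@v) / scalar(@v),
    );
    return \%stats;
}

# Sample data in the same shape as %records (block id => list of values).
my %records = (
    aaa => [ 1.0, 2.0, 3.0, 4.0, 5.0 ],
    ccc => [ 10.0, 20.0, 30.0, 40.0, 50.0 ],
);

for my $k (sort keys %records) {
    my $s = block_stats(@{ $records{$k} });
    printf "%s: n=%d min=%.1f max=%.1f mean=%.1f\n",
        $k, $s->{n}, $s->{min}, $s->{max}, $s->{mean};
}
```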

    HTH

    --
    Jim

    Copyright notice: all code written by the author in this post is
    released under the GPL. See http://www.gnu.org/licenses/gpl.txt
    for more information.

    a fortune quote ...
    What this country needs is a good five cent microcomputer.

    James Willmore Guest

  8. #7

    Re: Reading Data File Records

    Jay Tilton <tiltonj@erols.com> wrote in comp.lang.perl.misc:
    > GrahamWilsonCA@yahoo.ca (Graham) wrote:
    > : The file is generated using about five lines of FORTRAN so it seems
    > : somewhat surprising that I am up to 30 lines of Perl with almost no
    > : end in sight...
    >
    > Why should that be surprising? You're trying to build a modicum of
    > intelligence into one tool to compensate for another's lack of
    > sophistication. The Perl program would have a much easier time reading
    > if the FORTRAN program was only a little better at writing.
    Also, parsing input is generally harder than generating output. Printing
    what comes along is easy. To read it back in, you must often (as in
    the OP's case) understand what you have read so far to know how to
    proceed.

    The C functions printf() and scanf() are an attempt to make printing
    and scanning symmetric. A look at their respective frequency of use
    shows that the attempt wasn't a full success.
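
    The asymmetry can be seen in a few lines of Perl. This is a sketch
    only: the helper name read_values is hypothetical, and it assumes
    that values are separated by commas and/or whitespace and that "!"
    starts a comment, as in the sample file.

```perl
use strict;
use warnings;

my @values = (1.0, 2.0, 3.0);

# Writing: one statement, and the writer is free to choose any layout.
my $written = sprintf("%.1f, %.1f %.1f\n", @values);

# Reading: drop any trailing "!" comment, then accept commas and/or
# whitespace as separators, since the reader cannot know which was used.
sub read_values {
    my ($line) = @_;
    $line =~ s/!.*//s;
    return grep { length } split /[\s,]+/, $line;
}

my @back = read_values($written);
print "@back\n";    # prints "1.0 2.0 3.0"
```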

    Anno
    Anno Siegel Guest

  9. #8

    Re: Reading Data File Records



    Graham wrote:
    >
    > It seems it isn't just you. All I am trying to do is get the data
    > blocks into a suitable perl structure so I can calculate some simple
    > statistics and reformat it for another program. See comments in the
    > second while loop.
    >
    > I really appreciate the help. I have a pile of files with this type
    > of structure (a legacy of an ancient postdoc) that I need to
    > manipulate and reformat.
    snip


    Don't be afraid to slurp the whole file. I slurp 400,000+
    line files very quickly and do the processing. The only
    trouble is if you do it more than once in the program.
    You might see a big slowdown - at least on Win2000.
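
    Slurping needs no extra module: undefining the input record
    separator $/ makes a single read return the whole file. A minimal
    sketch, assuming every data block starts with "*" in the first
    column as in the sample; the helper name slurp is hypothetical, and
    an in-memory file handle stands in for a real file so the example
    is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: read an entire file handle into one string by
# locally undefining the input record separator $/.
sub slurp {
    my ($fh) = @_;
    local $/;
    return scalar <$fh>;
}

# In-memory stand-in for a real data file.
my $text = "! Comment\n2 ! number of levels\n"
         . "*aaa [aaa units]\n1.0 2.0\n"
         . "*bbb [bbb units]\n3.0,\n4.0\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!\n";
my $content = slurp($fh);
close $fh;

# Split before every line that starts with "*", so each chunk after
# the first is one data block.
my ($header, @blocks) = split /^(?=\*)/m, $content;
print scalar(@blocks), " blocks\n";    # prints "2 blocks"
```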

    I never found a good solution to this (yet), so I just
    run a bunch of individual Perl scripts - one for each
    file.

    If you find a better solution, let us know.


    Mike


    Mike Flannigan Guest
