splitting / unpacking line into array


  1. #1

    splitting / unpacking line into array

    I have an input file which has many subrecords. The subrecord type is
    denoted by the first 4 characters of each line. The rest of the line is
    formatted much the way that "pack" would format one. That is, each data
    point in a subtype is always at the same offset with the same length,
    e.g. 10 characters starting at offset 30, or some such. What I'm
    considering is using "unpack" and having a hash contain the unpack
    template based on the subrecord type. Something like:

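    while (<FH>) {
        chomp;
        my $subrec = substr($_, 0, 4);           # record type: first 4 characters
        next unless exists $template{$subrec};   # skip unknown record types
        my @values = unpack $template{$subrec}, $_;
        ....
    }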

    Earlier in the code, I would have created the %template hash which would
    have the template associated with the $subrec from the input file.

    Is this a decent way to do this? Is there a better way?
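
A minimal sketch of the approach described above, assuming text fields and made-up widths. The record types, the field widths, and the `parse_line` helper are all hypothetical; the real layouts would come from the file's documentation. The `A` format in `unpack` reads a fixed-width field and strips trailing spaces.

```perl
use strict;
use warnings;

# Hypothetical templates: a 4-character record type followed by
# fixed-width text fields. The widths are invented for illustration.
my %template = (
    '0100' => 'A4 A10 A20 A15',   # type, name, address, city
    '0101' => 'A4 A8 A8',         # type, two 8-character fields
);

# Return the unpacked fields of one line, or an empty list
# when the record type has no template.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my $subrec = substr($line, 0, 4);
    my $tmpl   = $template{$subrec};
    return unless defined $tmpl;
    return unpack $tmpl, $line;
}

# Build a sample line with pack so the offsets are guaranteed to match.
my $sample = pack('A4 A10 A20 A15', '0100', 'John', '123 Main Street', 'Springfield') . "\n";
my @values = parse_line($sample);
print join('|', @values), "\n";
```

Keeping the templates in one hash means that supporting a new record type is a one-line change, and the loop body never has to mention individual offsets.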

    --
    Maranatha!
    John McKown

    John McKown Guest

  3. #2

    Re: splitting / unpacking line into array

    > I have an input file which has many subrecords. The subrecord type is
    > denoted by the first 4 characters of each line. The rest of the line is
    > formatted much the way that "pack" would format one. That is, each data
    > point in a subtype is always at the same offset with the same length,
    > e.g. 10 characters starting at offset 30, or some such. What I'm
    > considering is using "unpack" and having a hash contain the unpack
    > template based on the subrecord type. Something like:
    >
    > while (<FH>) {
    > my $subrec = substr($_,0,4);
    > my @values = unpack $template{$subrec}, $_;
    > ...
    > }
    >
    > Earlier in the code, I would have created the %template hash which would
    > have the template associated with the $subrec from the input file.
    >
    > Is this a decent way to do this? Is there a better way?
    >
    Sounds pretty good to me. One concern: do the subrecord types always
    have the same number of fields? Unpacking into a plain array may turn
    into a maintenance nightmare, because you have to index into it by
    position to get values, and the positions will differ if the record
    formats are significantly different. Second concern: are you processing
    each record completely within the loop, or do you need to parse them all
    before doing anything with them? In the latter case you may need to
    store them in an array based on type rather than directly in a
    temporary 'values' array.

    For the first concern you might consider using a hash slice, with the
    field names (the keys) for each subtype stored in the same hash that you
    retrieve the record format from.
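
One way the hash-slice suggestion could look, as a sketch only. The `%layout` structure, the field names, the widths, and the `parse_record` helper are assumptions made for illustration. Each record type carries both its unpack template and its list of field names, and a single slice assignment fills a hash keyed by name. The `x4` in the template skips the 4-character type prefix.

```perl
use strict;
use warnings;

# Hypothetical layouts: template plus field names per record type.
my %layout = (
    '0100' => { template => 'x4 A10 A20 A15', fields => [qw(name address city)] },
    '0101' => { template => 'x4 A8 A8',       fields => [qw(some1 some2)] },
);

# Return a hash reference of named fields, or undef for an unknown type.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my $subrec = substr($line, 0, 4);
    my $spec   = $layout{$subrec} or return;
    my %record = (type => $subrec);
    # Hash slice: assign all unpacked values to their field names at once.
    @record{ @{ $spec->{fields} } } = unpack $spec->{template}, $line;
    return \%record;
}

my $rec = parse_record(pack('A4 A10 A20 A15', '0100', 'John', '1 Main St', 'Tulsa') . "\n");
print "$rec->{name} lives in $rec->{city}\n";
```

With this shape the calling code refers to `$rec->{city}` rather than `$values[3]`, so adding or reordering fields only touches `%layout`.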

    Obviously there is also the potential to use objects here but that may
    be overkill depending on what you are doing with the data after you have
    unpacked it....

    http://danconia.org

    Wiggins D Anconia Guest

  4. #3

    Re: splitting / unpacking line into array

    On Mon, 2 Feb 2004, Wiggins d Anconia wrote:
    >
    > Sounds pretty good to me. One concern, do the sub record types always
    > have the same number of fields? Using your array to unpack into may
    > turn into a maintenance nightmare with respect to indexing into it to
    > get values if the record formats are significantly different, etc.
    Actually, that was only an example. I really hope to have the result
    returned more like:

    if ($subrec eq '0100') {    # 'eq' compares strings; '=' would assign
        ($name, $address, $city) = unpack $template{$subrec}, $_;
    } elsif ($subrec eq '0101') {
        ($some1, $some2) = unpack $template{$subrec}, $_;
    }

    and so on for each defined $subrec.
    > Second concern, are you processing the records completely within the
    > loop or needing to parse them all before doing anything with them? In
    > the latter case you may need to store them to an array based on type
    > rather than directly to a 'values' temporary array, etc.
    I will be processing the records one at a time and putting them in
    "persistent storage" for retrieval later in a reporting program. I have
    not yet determined what sort of "persistent storage" I want. Perhaps
    DBM, perhaps PostgreSQL, perhaps MySQL, <whatever>.

    I may end up not even doing this since PostgreSQL, at least, has a way to
    load records from a "flat file". I just like to leave my options open. And
    I'm looking at Perl solutions right now mainly because I'm trying to learn
    Perl.

    <off-topic>
    Also, if I find a "nice" Perl solution, I may implement it "in production"
    on our mainframe (IBM zSeries) at work. The actual data being parsed is a
    RACF (security system) database unload. If I can ftp that data from z/OS
    to our Linux/390 system and do all my reporting there, I can save z/OS CPU
    utilization. That's because Linux/390 on our zSeries runs on a separate
    processor from the z/OS work. The z/OS work cannot use this processor due
    to licensing restrictions. So, any work that I can "offload" from z/OS is
    a net gain because the IFL (Linux processor) is basically idle right now.
    I would then use Perl to create reports which would then be ftp'ed back to
    the z/OS system. This gets me "brownie points" by offloading z/OS
    processing. We are critically short of z/OS processor power and the next
    upgrade would cost 1.5 million dollars in software "upgrade" fees.

    If this works for the database unload, I can use a similar system for RACF
    reports run against the "reformatted audit logs". Again, getting "brownie
    points" for offloading work.

    This is why I'm considering a Perl-only solution. I have Perl on our SuSE
    Linux/390 system. I do not have any SQL database and am not really good
    enough to try to port something like PostgreSQL or MySQL.

    </off-topic>
    >
    > For the first concern you may consider using a hash slice with the keys
    > being associated with the subtype stored in the original hash where you
    > retrieve the record format from.
    >
    Good idea. I'll keep it in mind.

    thanks much!

    --
    Maranatha!
    John McKown


    John McKown Guest

  5. #4

    Re: splitting / unpacking line into array

    John McKown wrote:
    > On Mon, 2 Feb 2004, Wiggins d Anconia wrote:
    >
    >
    >>Sounds pretty good to me. One concern, do the sub record types always
    >>have the same number of fields? Using your array to unpack into may
    >>turn into a maintenance nightmare with respect to indexing into it to
    >>get values if the record formats are significantly different, etc.
    >
    >
    > Actually, that was only an example. I really hope to have the result
    > returned more like:
    >
    > if ($subrec eq '0100') {
    >     ($name, $address, $city) = unpack $template{$subrec}, $_;
    > } elsif ($subrec eq '0101') {
    >     ($some1, $some2) = unpack $template{$subrec}, $_;
    > }
    >
    > and so on for each defined $subrec.
    >
    That works, though you have to repeat your unpack over and over (not a
    big deal). Using the slices you only need it once and don't have to
    check the subrec type at that point, though you will again when you use
    the values. Unless, again, you push each record onto an array in a hash
    where the key is the subtype and then just loop over each of the
    different types, which might make the code more modular, granted the
    data structure would be more complicated (and unordered at that point).
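
A rough sketch of the "array in a hash keyed by subtype" idea, under assumed record types and widths. The `%template` contents and the sample lines are hypothetical. Each unpacked record is pushed onto the array for its type, and a second loop then handles one type at a time.

```perl
use strict;
use warnings;

# Hypothetical templates; x4 skips the 4-character record type.
my %template = (
    '0100' => 'x4 A10 A15',
    '0101' => 'x4 A8 A8',
);

# Sample fixed-width lines, built with pack so the widths match the templates.
my @lines = (
    pack('A4 A10 A15', '0100', 'John', 'Tulsa'),
    pack('A4 A8 A8',   '0101', 'abc',  'def'),
    pack('A4 A10 A15', '0100', 'Mary', 'Dallas'),
);

# Group the unpacked records by type: a hash of arrays of array references.
my %by_type;
for my $line (@lines) {
    my $subrec = substr($line, 0, 4);
    next unless exists $template{$subrec};
    push @{ $by_type{$subrec} }, [ unpack $template{$subrec}, $line ];
}

# Process one record type at a time. The hash has no inherent order,
# so sort the keys if a stable order matters.
for my $subrec (sort keys %by_type) {
    printf "%s: %d record(s)\n", $subrec, scalar @{ $by_type{$subrec} };
}
```

Records of the same type keep their relative file order inside each array, but the interleaving between different types is lost, which is the trade-off mentioned above.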
    >
    >>Second concern, are you processing the records completely within the
    >>loop or needing to parse them all before doing anything with them? In
    >>the latter case you may need to store them to an array based on type
    >>rather than directly to a 'values' temporary array, etc.
    >
    >
    > I will be processing the records one at a time and putting them in
    > "persistent storage" for retrieval later in a reporting program. I have
    > not yet determined what sort of "persistent storage" I want. Perhaps
    > DBM, perhaps PostgreSQL, perhaps MySQL, <whatever>.
    >
    > I may end up not even doing this since PostgreSQL, at least, has a way to
    > load records from a "flat file". I just like to leave my options open. And
    > I'm looking at Perl solutions right now mainly because I'm trying to learn
    > Perl.
    >
    MySQL can load flat files as well, though I don't know about formatted
    files like you describe.
    > <off-topic>
    > Also, if I find a "nice" Perl solution, I may implement it "in production"
    > on our mainframe (IBM zSeries) at work. The actual data being parsed is a
    > RACF (security system) database unload. If I can ftp that data from z/OS
    > to our Linux/390 system and do all my reporting there, I can save z/OS CPU
    > utilization. That's because Linux/390 on our zSeries runs on a separate
    > processor from the z/OS work. The z/OS work cannot use this processor due
    > to licensing restrictions. So, any work that I can "offload" from z/OS is
    > a net gain because the IFL (Linux processor) is basically idle right now.
    > I would then use Perl to create reports which would then be ftp'ed back to
    > the z/OS system. This gets me "brownie points" by offloading z/OS
    > processing. We are critically short of z/OS processor power and the next
    > upgrade would cost 1.5 million dollars in software "upgrade" fees.
    >
    Yikes, I understood just enough of that to know that I am running for
    the hills :-)... Though I will say that it should be doable, and I
    assume you have checked out Net::FTP...
    > If this works for the database unload, I can use a similar system for RACF
    > reports run against the "reformatted audit logs". Again, getting "brownie
    > points" for offloading work.
    >
    > This is why I'm considering a Perl-only solution. I have Perl on our SuSE
    > Linux/390 system. I do not have any SQL database and am not really good
    > enough to try to port something like PostgreSQL or MySQL.
    >
    If you decide against using a "real" database you might consider using
    some of the CSV text file modules. There is even a DBD::CSV that will
    let you use "real" SQL through the DBI, so that if you get to port to a
    database in the future you won't have to change the code much, though
    it is not speedy by any means. There is also XML, but that is all I
    will say for now :-)....
    > </off-topic>
    >
    >>For the first concern you may consider using a hash slice with the keys
    >>being associated with the subtype stored in the original hash where you
    >>retrieve the record format from.
    >>
    >
    >
    > Good idea. I'll keep it in mind.
    >
    > thanks much!
    >
    Good luck,

    http://danconia.org
    Wiggins D'Anconia Guest
