how to parse blank-line-separated records


  1. #1

    how to parse blank-line-separated records

    Hi, all --

    I'm wrestling with a data file containing owners and contact info, and it
    suddenly occurred to me that I could probably change my record separator
    from \n to \n\n (a blank line) and grab a whole record at a time.
    Assuming I figure out how to do that, how do I then match the pieces?

    The file looks a lot like

    header stuff
    code unit
    owner home_phone work_phone
    addr
    city, st zip

    where any of the phone numbers or the addresses might be missing, but we
    can count on the column positions for formatting (and thus parsing).

    So I probably go through a

    while (<>)

    loop and it sucks in each record for me, but then how do I match to get
    the various pieces -- around the newlines?

    Yes, sample code would be welcome :-) So would pointers to where this
    has been done before; I'm just not finding it as I read the code examples
    from _Programming_ (2e) this morning :-(


    TIA & HAND

    :-D
    --
    David T-G * There is too much animal courage in
    (play) davidtg@justpickone.org * society and not sufficient moral courage.
    (work) davidtgwork@justpickone.org -- Mary Baker Eddy, "Science and Health"
    http://justpickone.org/davidtg/ Shpx gur Pbzzhavpngvbaf Qrprapl Npg!



    David T-G Guest

  2. Similar Questions and Discussions

    1. line separated file
      I have line separated file. Every line represent something. I need to extract these lines into Array or something else so after that I can...
    2. blank line between records
      this sort of requirement requires custom work to be done in the itemdatabound an approach would be to check the type of cell being fired (header,...
    3. #25256 [Opn->Bgs]: Parse error: parse error, unexpected $ in ... on line 642
      ID: 25256 Updated by: iliaa@php.net Reported By: a dot schat at streamedge dot com -Status: Open +Status: ...
    4. #25256 [NEW]: Parse error: parse error, unexpected $ in ... on line 642
      From: a dot schat at streamedge dot com Operating system: Linux PHP version: 4.3.1 PHP Bug Type: Compile Failure Bug...
    5. Getting a comma separated list of all records from a view.
      What I would like to do is add records from a view to a table based upon user input. What is making it hard is since I can't give an identity field...
  3. #2

    RE: how to parse blank-line-separated records

    David T-G wrote:
    > Hi, all --
    >
    > I'm wrestling with a data file containing owners and contact info and it
    > suddenly occurred to me that I could probably change my record separator
    > from \n to \n\n (a blank line) and grab the whole record that way.
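    Yes, or set $/ = '' to get "paragraph" mode. That's a little more flexible,
    as Perl will treat a run of several empty lines as a single record
    separator instead of handing you empty records in between. (A line that
    contains only spaces or tabs does not count as empty, though.)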
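A minimal sketch of the difference, assuming a small in-memory sample (the sample text and the `read_records` helper are hypothetical, made up for illustration). With `$/ = "\n\n"` the extra empty lines come back as an empty record; with `$/ = ''` they are collapsed.

```perl
use strict;
use warnings;

# Hypothetical sample: two records separated by three empty lines.
my $data = "unitcode 101 Short Way\nOwner\n\n\n\n"
         . "unitcode 206 Longer Circle\nOwner vonLongName, III\n";

# Read $data with the given record separator and return the chomped records.
sub read_records {
    my ($separator) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    local $/ = $separator;    # restored automatically when the sub returns
    my @records;
    while (my $record = <$fh>) {
        chomp $record;        # removes $/; in paragraph mode, all trailing newlines
        push @records, $record;
    }
    close $fh;
    return @records;
}

my @literal   = read_records("\n\n");  # extra empty lines yield an empty record
my @paragraph = read_records('');      # runs of empty lines act as one separator

printf "%d records vs %d records\n", scalar @literal, scalar @paragraph;
# prints "3 records vs 2 records"
```

In a real script the same `local $/ = '';` would sit just before the `while (<>)` loop, so each iteration receives one whole record.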
    > Assuming I figure out how to do that, then how do I match the pieces?
    >
    > The file looks a lot like
    >
    > header stuff
    > code unit
    > owner home_phone work_phone
    > addr
    > city, st zip
    >
    > where any of the phone numbers or the addresses might be missing, but we
    > can count on the column positions for formatting (and thus parsing).
    >
    > So I probably go through a
    >
    > while (<>)
    >
    > loop and it sucks in each record for me, but then how do I match to get
    > the various pieces -- around the newlines?
    If the paragraphs have a definite fixed format, unpack() is usually the
    easiest way to grab the data. Use 'x' in your template to skip over
    characters, and 'A' to extract a fixed-width field (trailing whitespace is
    stripped). Offsets count from the start of $record, newlines included. So,
    if you want to skip 20 chars, then grab 8 chars, then skip 32 chars, then
    grab 15 chars, you use:

    my @fields = unpack('x20 A8 x32 A15', $record);

    (Use lowercase 'a' instead of 'A' if you want to preserve trailing blanks on
    the fields you extract.)
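To make the template concrete, here is a small sketch on a single fixed-width line. The column layout (name in columns 0-19, home phone in 20-31, work phone in 32-43) and the sample values are invented for illustration, not taken from the real data file.

```perl
use strict;
use warnings;

# Hypothetical owner line: name in columns 0-19, home phone in 20-31,
# work phone in 32-43. These widths are made up for the example.
my $line = sprintf '%-20s%-12s%-12s', 'Jane Owner', '555-0100', '555-0199';

# 'A' takes a fixed number of characters and strips trailing whitespace.
my ($owner, $home, $work) = unpack 'A20 A12 A12', $line;

# 'x' skips characters: jump over name and home phone, keep the work phone.
my ($work_only) = unpack 'x32 A12', $line;

# Lowercase 'a' keeps the trailing padding.
my ($padded_owner) = unpack 'a20', $line;

print "$owner|$home|$work|$work_only\n";   # Jane Owner|555-0100|555-0199|555-0199
```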

    Otherwise, you can construct regexes that grab what you're looking for.
    Depending on your regex, you might need the /m modifier (so ^ and $ match
    at the start and end of each line inside the record) and/or /s (so . also
    matches a newline).
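The effect of the two modifiers can be seen on a short multi-line string modelled on the record layout from the original question (the sample text is illustrative only):

```perl
use strict;
use warnings;

my $record = "header stuff\ncode unit\nowner home_phone work_phone\n";

# Without /m, ^ matches only at the very start of the string.
my @first_only = $record =~ /^(\S+)/g;        # ('header')

# With /m, ^ also matches right after every embedded newline.
my @every_line = $record =~ /^(\S+)/mg;       # ('header', 'code', 'owner')

# Without /s, . never matches a newline, so .* stops at the end of a line.
my ($one_line) = $record =~ /^(header.*)/;    # 'header stuff'

# With /s, . matches newlines too, so .* runs to the end of the record.
my ($all_lines) = $record =~ /^(header.*)/s;  # the whole record
```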
    >
    > Yes, sample code would be welcome :-) So would pointers to where this
    > has been done before; I'm just not finding it as I read the code examples
    > from _Programming_ (2e) this morning :-(
    Bob Showalter Guest

  4. #3

    Re: how to parse blank-line-separated records

    Bob, et al --

    ...and then Bob Showalter said...
    %
    % David T-G wrote:
    % >
    % > suddenly occurred to me that I could probably change my record separator
    % > from \n to \n\n (a blank line) and grab the whole record that way.
    %
    % Yes, or set $/ = '' to get "paragraph" mode. That's a little more flexible,

    Ah, yes; that's what I meant.


    % as Perl will treat any sequence of multiple blank lines as a record
    % separator.

    Right. That's a Good Thing(tm).


    %
    % > Assuming I figure out how to do that, then how do I match the pieces?
    % >
    % > The file looks a lot like
    % >
    % > header stuff
    % > code unit
    % > owner home_phone work_phone
    % > addr
    % > city, st zip
    % >
    % > where any of the phone numbers or the addresses might be missing, but we
    % > can count on the column positions for formatting (and thus parsing).
    ...
    %
    % If the paragraphs have a definite fixed format, usually unpack() is the
    % easiest way to grab the data. Use 'x' in your pattern to skip over bytes,
    % and 'A' to extract a sequence of bytes. So, if you want to skip 20 chars,
    % then grab 8 chars, then skip 32 chars, then grab 15 chars, you use:
    %
    % my @fields = unpack('x20 A8 x32 A15', $record);
    %
    % (use lowercase 'a' instead of 'A' if you want to preserve trailing blanks on
    % the fields you extract.)

    Well, they do, but the lines might be short. That is, we have

    unitcode 101 Short Way
    Owner
    Address
    ...
    unitcode 206 Longer Circle
    Owner vonLongName, III
    Address

    as well as phone numbers that might or might not be there. I'm not sure
    how I'd unpack anything beyond the first newline, since it might fall in a
    different column in each record. I suppose I should have said "we can
    count on the starting column positions if we get that far out in a given
    record", which is probably different!
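One way to cope with short lines, sketched under stated assumptions: split each record into lines first, then cut each field out of its own line with `substr`, treating a line that ends before a field's starting column as an empty field. (`unpack` stops with a fatal "'x' outside of string" error if an 'x' skips past the end of the string, which is why this sketch uses `substr` instead.) The record layout (unit line, owner line, address lines), the column offsets, and the `column` and `parse_record` helpers are all hypothetical; the real offsets would come from the data file.

```perl
use strict;
use warnings;

# Hypothetical column layout of the owner line: [start column, width].
my %owner_cols = (
    owner => [ 0, 20],
    home  => [20, 12],
    work  => [32, 12],
);

# Return the text in $width columns starting at $start, with trailing
# whitespace removed, or '' when the line ends before $start.
sub column {
    my ($line, $start, $width) = @_;
    return '' if length($line) <= $start;
    (my $text = substr($line, $start, $width)) =~ s/\s+\z//;
    return $text;
}

# Assumed record layout: unit line, owner line, then address lines.
sub parse_record {
    my ($record) = @_;
    my ($unit_line, $owner_line, @address) = split /\n/, $record;
    $owner_line = '' unless defined $owner_line;

    my %rec = (unit => $unit_line, address => \@address);
    for my $field (keys %owner_cols) {
        $rec{$field} = column($owner_line, @{ $owner_cols{$field} });
    }
    return \%rec;
}

# Example: a record whose owner line stops after the home phone.
my $record = "unitcode 101 Short Way\n"
           . sprintf('%-20s%-12s', 'Jane Owner', '555-0100') . "\n"
           . "12 Main St";
my $rec = parse_record($record);
print "$rec->{owner} / home $rec->{home} / work '$rec->{work}'\n";
```

Inside the paragraph-mode loop, each chomped `$record` would be passed to `parse_record`; a missing phone then comes back as an empty string instead of causing an error.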


    %
    % Otherwise, you can construct regexes that grab what you're looking for.
    % Depending on your regex, you might need to use the /s and/or /m modifiers to
    % change the way ^, $, and . match within the multi-line string.

    I suppose this will be the way to go, then. I don't see how /s and /m
    will change the begin- and end-of-line matching, though... Where do I
    look for that?
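For the regex route, a sketch that applies `/m` to a whole record, modelled on the second sample above. It assumes a first line of the form `unitcode NNN street` and a line of the form `city, ST 12345` (two-letter state, five-digit zip); those formats and the `parse_by_regex` helper are assumptions for illustration. Both modifiers are described in the perlre documentation.

```perl
use strict;
use warnings;

# Assumed formats: "unitcode NNN street" on one line and
# "city, ST 12345" on another; either line may be absent.
sub parse_by_regex {
    my ($record) = @_;
    my %rec;

    # /m: ^ and $ match at the start and end of every line in $record.
    # No /s: (.+) cannot run past the end of the line it starts on.
    @rec{qw(unit street)}    = $record =~ /^unitcode\s+(\d+)\s+(.+)$/m;
    @rec{qw(city state zip)} = $record =~ /^(.+),\s+([A-Z]{2})\s+(\d{5})$/m;

    # Fields whose line was missing are left undefined.
    return \%rec;
}

my $record = "unitcode 206 Longer Circle\n"
           . "Owner vonLongName, III\n"
           . "1 Elm St\n"
           . "Springfield, IL 62701\n";

my $rec = parse_by_regex($record);
print "$rec->{unit}: $rec->{street}, $rec->{city} $rec->{state} $rec->{zip}\n";
# prints "206: Longer Circle, Springfield IL 62701"
```

Note that the owner line "Owner vonLongName, III" is not mistaken for the city line, because "III" is not followed by whitespace and a zip code.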


    Thanks again & HAND

    :-D



    David T-G Guest

