Quick removal of the beginning of a file? - PERL Miscellaneous


  1. #1

    Re: Quick removal of the beginning of a file?

    Ed Kulis wrote:
    >
    > Hi,
    >
    > I'd like to use perl to chop the head off a file in Unix that an
    > application is writing to without disturbing the file. (Apps we use are
    > Oracle, PeopleSoft, Vantive, Unix scripting)
    Not easily. You can (if your filesystem supports this feature) use
    syscall(SYS_punch(), ...) to free up the disk pages from the front of
    the file. This releases them back to the filesystem, so that your file
    uses less space. Note that the data still in the file will still appear
    to be at the same offset as it had been, and the file will appear to
    still have the same length as it did before. The only differences are
    that less disk space is used, and that the leading bytes of the
    file will now appear to be filled with "\0" bytes.
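
    For illustration, here is a rough modern-Linux equivalent of that hole
    punching, sketched with fallocate(2) and FALLOC_FL_PUNCH_HOLE. The
    syscall number 285 is x86-64 specific (check asm/unistd.h on your
    system), and the filesystem must support hole punching:

    use strict;
    use warnings;

    my $SYS_fallocate        = 285;   # Linux x86-64 ONLY
    my $FALLOC_FL_KEEP_SIZE  = 0x01;
    my $FALLOC_FL_PUNCH_HOLE = 0x02;

    my $filename = shift or die "usage: $0 file\n";
    open my $fh, '+<', $filename or die "open $filename: $!";

    my $half = int( (-s $fh) / 2 );
    die "file too small\n" unless $half > 0;

    # Free the blocks backing the first half. The file keeps its length;
    # reads of the punched range come back as "\0" bytes.
    syscall( $SYS_fallocate, fileno($fh),
             $FALLOC_FL_PUNCH_HOLE | $FALLOC_FL_KEEP_SIZE,
             0, $half ) == -1
        and die "fallocate: $!";
    close $fh;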
    > We've got some immense trace files and I'd like to remove say the first
    > 50% of the lines of a 200 meg file while an app has the file open and the
    > app is writing to it. That way we can manage the filesystem space while
    > retaining the recent interesting information.
    If the app has the file open in append mode, and if you can temporarily
    force the app to stop writing, there's another way to do it. By reading
    and seeking and writing, you can copy bytes from the latter 50% of the
    file to the front, and then call truncate() to make the file shorter.

    Something like this:

    open( FH, "<+", $filename ) or die horribly;
    seek( FH, ((-s FH)/2), 0 ) or die horribly; # 50%
    scalar(<FH>); # skip to the end of the line.
    my $readfrom = tell FH;
    my $writeto = 0;
    my $n;
    my $lastbytes = 0;
    LOOP: while( 1 ) {
    lseek( FH, $readfrom, 0 ) or die horribly;
    my $buf;
    while(1) {
    $n = sysread( FH, $buf, 8192 ) and last;
    unless( defined $n ) {
    redo if $!{EINTR} or $!{EAGAIN};
    die horribly;
    }
    last LOOP if $lastbytes;
    $lastbytes = 1;
    stop_other_app_from_writing();
    } # end while(1)
    lseek( FH, $writeto, 0 ) or die horribly;
    TRYWRITE: {
    my $m = syswrite( FH, $buf );
    unless( $m ) {
    redo TRYWRITE if $!{EINTR} or $!{EAGAIN};
    die horribly;
    }
    substr( $buf, 0, $m ) = "";
    $writeto += $m;
    $n -= $m;
    redo TRYWRITE if $n;
    } # end TRYWRITE:
    } # end LOOP: while(1)
    truncate( FH, $writeto ) or die horribly;
    allow_other_app_to_write();
    close FH;

    Isn't this ugly? :)

    If you leave out the stop_other_app_from_writing and
    allow_other_app_to_write calls, then there's a chance you'll lose
    some lines, due to a race condition.

    You might be better off rotating your log files every so often, then
    removing the oldest ones. This is the standard technique for what you
    want.
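
    For illustration, a minimal sketch of that standard technique --
    essentially what logrotate's "copytruncate" mode does. The filename
    and sizes below are made up:

    use strict;
    use warnings;
    use File::Copy qw(copy);

    sub rotate_copytruncate {
        my ($log, $keep) = @_;
        # Shuffle old copies up: log.2 -> log.3, log.1 -> log.2, ...
        for my $i ( reverse 1 .. $keep - 1 ) {
            rename "$log.$i", "$log." . ($i + 1) if -e "$log.$i";
        }
        copy( $log, "$log.1" ) or die "copy: $!";
        truncate( $log, 0 )    or die "truncate: $!";
        # Anything written between copy() and truncate() is lost --
        # the same race condition discussed above.
    }

    rotate_copytruncate( '/var/tmp/app.trace', 5 );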
    > cat /dev/null > file
    > sometimes works to zero the entire file while it's being written to
    This should *always* truncate the file to 0 length. Not "sometimes".

    You could get the same effect by doing, in perl:

    truncate( "file", 0 );

    Or:

    open( FILE, ">file" );
    close(FILE);

    However, none of these retains any of your old data.
    > because I think that the Unix inode stays the same so it won't make any
    > difference to the apps open state. I'm not clear on the details of the
    > file open states.
    True -- the inode stays the same, so it doesn't make a difference to the
    app which is writing.
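
    A quick way to see this for yourself (the path is hypothetical;
    field 1 of stat() is the inode number):

    use strict;
    use warnings;

    my $file   = '/var/tmp/app.trace';   # hypothetical path
    my $before = (stat $file)[1];        # inode number
    truncate $file, 0 or die "truncate: $!";
    my $after  = (stat $file)[1];
    print "inode unchanged\n" if $before == $after;   # always prints

    # Rename-and-recreate, by contrast, leaves the app writing to the
    # old inode (now under the new name), not to the fresh file:
    rename $file, "$file.old" or die "rename: $!";
    open my $fh, '>', $file or die "open: $!";
    close $fh;
    print "new inode\n" if (stat $file)[1] != $before;  # usually prints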

    But if you want to keep recent data, then it *does* make a difference.
    > I don't want to do anything to the app configuration or startup. I'd
    > like to know if there are any Perl functions/techniques that could, say,
    > move the beginning-of-file pointer to mid-file somewhere.
    The perl techniques for this aren't significantly different from the C
    techniques for it.
    > Is there a clever way to get the size of a file and remove the beginning
    > of it?
    The clever way of getting the size of a file is to use the -s operator,
    either on a filename or on a filehandle.

    See:
    perldoc -f -X
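
    For example (the filename is hypothetical):

    use strict;
    use warnings;

    my $file = '/var/tmp/app.trace';   # hypothetical path
    my $by_name = -s $file;            # size in bytes, from the filename
    open my $fh, '<', $file or die "open: $!";
    my $by_handle = -s $fh;            # same number, from the filehandle
    print "$by_name == $by_handle\n";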

    --
    $a=24;split//,240513;s/\B/ => /for=qw(ac ab bc ba cb ca
    );{push(b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "$[$a%6
    ]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop b))&&redo;}
    Benjamin Goldberg Guest

  2. #2

    Re: Quick removal of the beginning of a file?

    Thanks, Benjamin,

    Great techniques in your response. See some of my comments.


    On 8/18/03 7:54 PM, in article 3F419148.81D4E7FF@hotpop.com, "Benjamin
    Goldberg" <ben.goldberg@hotpop.com> wrote:
    > Ed Kulis wrote:
    >>
    >> Hi,
    >>
    >> I'd like to use perl to chop the head off a file in Unix that an
    >> application is writing to without disturbing the file. (Apps we use are
    >> Oracle, PeopleSoft, Vantive, Unix scripting)
    >
    > Not easily. You can (if your filesystem supports this feature) use
    > syscall(SYS_punch(), ...) to free up the disk pages from the front of
    > the file. This releases them back to the filesystem, so that your file
    > uses less space. Note that the data still in the file will still appear
    > to be at the same offset as it had been, and the file will appear to
    > still have the same length as it did before. The only differences are
    > that less disk space is used, and that the leading bytes of the
    > file will now appear to be filled with "\0" bytes.
    >
    I was afraid that it wasn't easy.
    >> We've got some immense trace files and I'd like to remove say the first
    >> 50% of the lines of a 200 meg file while an app has the file open and the
    >> app is writing to it. That way we can manage the filesystem space while
    >> retaining the recent interesting information.
    >
    > If the app has the file open in append mode, and if you can temporarily
    > force the app to stop writing, there's another way to do it. By reading
    > and seeking and writing, you can copy bytes from the latter 50% of the
    > file to the front, and then call truncate() to make the file shorter.
    >
    I'm trying to stay away from the app.
    > Something like this:
    >
    > open( FH, "<+", $filename ) or die horribly;
    > seek( FH, ((-s FH)/2), 0 ) or die horribly; # 50%
    > scalar(<FH>); # skip to the end of the line.
    This is cool.
    > [snip code]
    >
    > Isn't this ugly? :)
    >
    > If you leave out the stop_other_app_from_writing and
    > allow_other_app_to_write calls, then there's a chance you'll lose
    > some lines, due to a race condition.
    That's what puts the "curse" in "recursive"
    >
    > You might be better off rotating your log files every so often, then
    > removing the oldest ones. This is the standard technique for what you
    > want.
    >
    Yep. I was trying to be fancy so that a developer could always see the last
    x% of a file. BUT! Why don't I just write it to another file before I do the
    cat /dev/null?
    >> cat /dev/null > file
    >> sometimes works to zero the entire file while it's being written to
    Always truncates, but I've had some gremlin reports from developers that the
    app didn't like it and hung or crashed. I don't quite believe it and I think
    it was the "Fallacy of the Last Change".
    >
    > This should *always* truncate the file to 0 length. Not "sometimes".
    >
    > You could get the same effect by doing, in perl:
    >
    > truncate( "file", 0 );
    >
    > Or:
    >
    > open( FILE, ">file" );
    > close(FILE);
    Nice. Perlish, and not a qx{cat ...}.
    >
    > However, for none of these do you retain any of your old data.
    >
    >> because I think that the Unix inode stays the same so it won't make any
    >> difference to the apps open state. I'm not clear on the details of the
    >> file open states.
    >
    > True -- the inode stays the same, so it doesn't make a difference to the
    > app which is writing.
    >
    Thanks for the confirm on this.

    > But if you want to keep recent data, then it *does* make a difference.
    >
    Sounds like I can loop, reading from the source log file and sending the
    lines to other files of a certain fixed size, then truncate the source log
    file from another process.

    Might lose a line or two but that doesn't make much difference in a verbose
    log.

    I can then manage the chunks of the log files and eliminate older ones.
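
    Sketched out, that plan might look something like this -- the path and
    chunk size are invented, and the EOF-clearing seek trick is the usual
    tail -f idiom from perlfaq:

    use strict;
    use warnings;

    my $src      = '/var/tmp/app.trace';    # hypothetical source log
    my $max_size = 10 * 1024 * 1024;        # 10 MB per chunk
    my $chunk    = 0;

    open my $in, '<', $src or die "open $src: $!";
    open my $out, '>', sprintf( '%s.%03d', $src, $chunk )
        or die "open: $!";

    while (1) {
        while ( my $line = <$in> ) {
            print {$out} $line;
            if ( tell($out) >= $max_size ) {   # chunk full: start a new one
                close $out;
                open $out, '>', sprintf( '%s.%03d', $src, ++$chunk )
                    or die "open: $!";
            }
        }
        sleep 1;          # no new data yet; wait a bit
        seek $in, 0, 1;   # clear the EOF flag so <$in> will retry
    }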
    >> I don't want to do anything to the app configuration or startup. I'd
    >> like to know if there are any Perl functions/techniques that could, say,
    >> move the beginning-of-file pointer to mid-file somewhere.
    >
    > The perl techniques for this aren't significantly different from the C
    > techniques for it.
    >
    >> Is there a clever way to get the size of a file and remove the beginning
    >> of it?
    >
    > The clever way of getting the size of a file is to use the -s operator,
    > either on a filename or on a filehandle.
    >
    > See:
    > perldoc -f -X
    Ed Kulis Guest

  3. #3

    Re: Quick removal of the beginning of a file?

    Benjamin Goldberg <ben.goldberg@hotpop.com> wrote in message news:<3F419148.81D4E7FF@hotpop.com>...
    [snip]
    > open( FH, "<+", $filename ) or die horribly;
    Slightly OT question... what's with this "die horribly" usage?
    I've seen it a couple of times and even tried it, but it didn't seem
    to do anything more spectacular than the obvious result of printing
    the string "horribly."
    Carlton Brown Guest

  4. #4

    Re: Quick removal of the beginning of a file?

    carltonbrown@hotmail.com (Carlton Brown) writes:
    > Benjamin Goldberg <ben.goldberg@hotpop.com> wrote in message news:<3F419148.81D4E7FF@hotpop.com>...
    > [snip]
    >> open( FH, "+<", $filename ) or die horribly;
    >
    > Slightly OT question... what's with this "die horribly" usage?
    It's a very odd way to write

    die "couldn't open file: $!"

    AFAICT. I mean, why throw away useful debugging info, just to be
    cute? :)
    > I've seen it a couple of times and even tried it, but it didn't seem
    > to do anything more spectacular than the obvious result of printing
    > the string "horribly."
    That's why it's fun to read, not much fun to use.

    -=Eric
    --
    Come to think of it, there are already a million monkeys on a million
    typewriters, and Usenet is NOTHING like Shakespeare.
    -- Blair Houghton.
    Eric Schwartz Guest

  5. #5

    Re: Quick removal of the beginning of a file?

    >> On Tue, 19 Aug 2003 15:30:37 -0600,
    >> Eric Schwartz <emschwar@pobox.com> said:
    > carltonbrown@hotmail.com (Carlton Brown) writes:
    >> Benjamin Goldberg <ben.goldberg@hotpop.com> wrote in
    >> message news:<3F419148.81D4E7FF@hotpop.com>... [snip]
    >> open( FH, "+<", $filename ) or die horribly;
    >>
    >> Slightly OT question... what's with this "die horribly"
    >> usage?
    > It's a very odd way to write
    > die "couldn't open file: $!"
    > AFAICT. I mean, why throw away useful debugging info,
    > just to be cute? :)
    I think it's meant to be "insert suitable diagnostic
    and/or handling code here".

    "die horribly" has a certain je ne sais quoi about it.
    Tony Curtis Guest

  6. #6

    Re: Quick removal of the beginning of a file?

    Ed Kulis wrote:
    > Benjamin Goldberg wrote:
    [snip]
    > > Something like this:
    > >
    > > open( FH, "+<", $filename ) or die horribly;
    > > seek( FH, int( (-s FH) / 2 ), 0 ) or die horribly; # 50%
    > > scalar(<FH>); # skip to the end of the line.
    > This is cool.
    I got the idea of skipping to the end of the line this way from look.pl.

    [snip]
    > > If you leave out the stop_other_app_from_writing and
    > > allow_other_app_to_write calls, then there's a chance you'll lose
    > > some lines, due to a race condition.
    >
    > That's what puts the "curse" in "recursive"
    Recursion is not involved.

    The specific race condition I'm referring to is: it's possible
    that, after you read the last stuff from the end of the file, and
    before you truncate the file, the other app could decide to add
    more lines.
    If you've *blocked* the app from writing while you're reading the
    last stuff from the file, then this cannot happen.
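
    If the writing app could be made to cooperate, one hypothetical way
    to fill in the stop_/allow_ placeholders would be advisory locking.
    A sketch, workable only if the app also takes flock around its own
    writes:

    use strict;
    use warnings;
    use Fcntl qw(:flock);

    # Hypothetical implementations of the placeholders above. flock is
    # purely advisory: this only works if the writing app also takes a
    # lock around each of its own writes.
    sub stop_other_app_from_writing {
        my ($fh) = @_;
        flock( $fh, LOCK_EX ) or die "flock: $!";
    }

    sub allow_other_app_to_write {
        my ($fh) = @_;
        flock( $fh, LOCK_UN ) or die "flock: $!";
    }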
    > > You might be better off rotating your log files every so often, then
    > > removing the oldest ones. This is the standard technique for what you
    > > want.
    > >
    > Yep. I was trying to be fancy so that a developer could always see the
    > last x% of a file. BUT! Why don't I just write it to another file before
    > I do the cat /dev/null?
    And what happens if, after you copy the last x% to another file, and
    before you do the cat /dev/null, the other app decides to add more
    lines?

    This is the exact same race condition I mentioned above.
    > >> cat /dev/null > file
    > >> sometimes works to zero the entire file while it's being written to
    > >
    > > This should *always* truncate the file to 0 length. Not "sometimes".
    >
    > Always truncates, but I've had some gremlin reports from developers that
    > the app didn't like it and hung or crashed.
    This isn't entirely surprising.

    If the app does not use a filehandle opened in write-only append
    mode, then it might be seeking around in, and reading from, the
    file at the same time you're truncating it. It's not surprising
    for the app to get confused, and hang or die.
    > I don't quite believe it and I think it
    > was the "Fallacy of the Last Change"
    Not believing your users is also a fallacy :)

    [snip]
    > > True -- the inode stays the same, so it doesn't make a difference to
    > > the app which is writing.
    >
    > Thanks for the confirm on this.
    Actually, to be more pedantic, because the inode stays the same, the
    app will continue writing to the same old file (even though it's now
    truncated). Whereas, if you renamed the file and created a new, empty
    one with the old name, the app would continue writing to the same *file*
    (which now has a new name), instead of the new file with the old file's
    name.

    Note that "doesn't make a difference" only *truly* applies if the app
    has the file opened in write-only append mode.

    If the app has the file opened for reading (or reading and writing),
    then it definitely makes a difference. If the app isn't using append
    mode, and merely used seek to get to the end, new data printed will
    *not* start appearing where you truncated the file to; it will instead
    appear at the same offset as the earlier writes, thus creating a
    "sparse file", with the new data someplace you really didn't expect it.
    > > But if you want to keep recent data, then it *does* make a difference.
    >
    > Sounds like I can loop and post reads on the source log file and send
    > the lines to other files of a certain fixed size then truncate the
    > source log file from another process.
    >
    > Might lose a line or two but that doesn't make much difference in a
    > verbose log.
    >
    > I can then manage the chunks of the log files and eliminate older ones.
    If you don't mind losing a line or two, then it sounds like you've got
    your solution.

    --
    $a=24;split//,240513;s/\B/ => /for=qw(ac ab bc ba cb ca
    );{push(b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "$[$a%6
    ]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop b))&&redo;}
    Benjamin Goldberg Guest

  7. #7

    Re: Quick removal of the beginning of a file?

    On Tue, 19 Aug 2003, ben.goldberg@hotpop.com wrote:
    >> > scalar(<FH>); # skip to the end of the line.
    >
    > I got the idea of skipping to the end of the line this way from
    > look.pl.
    Doesn't this cause the whole file to be put into memory temporarily?

    Why not just seek(FH, 0, SEEK_END)? It's O(1) * as opposed to any O(n)
    solution that loops through the lines of the file.

    Ted

    * depending on the filesystem, but generally O(1) I think
    Ted Zlatanov Guest

  8. #8

    Re: Quick removal of the beginning of a file?

    Ted Zlatanov <tzz@lifelogs.com> wrote:
    > On Tue, 19 Aug 2003, ben.goldberg@hotpop.com wrote:
    >
    >>> > scalar(<FH>); # skip to the end of the line.
    > Doesn't this cause the whole file to be put into memory temporarily?

    In scalar context the input operator reads a single line, so no,
    it does not cause the whole file to be put into memory.
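
    A small example of the context difference (the filename is
    hypothetical):

    use strict;
    use warnings;

    open my $fh, '<', '/var/tmp/app.trace' or die "open: $!";
    scalar <$fh>;         # scalar context: reads (and discards) ONE line
    my $next = <$fh>;     # also scalar context: just the next line
    my @rest = <$fh>;     # list context: THIS is what slurps the rest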


    --
    Tad McClellan SGML consulting
    tadmc@augustmail.com Perl programming
    Fort Worth, Texas
    Tad McClellan Guest

  9. #9

    Re: Quick removal of the beginning of a file?

    On Thu, 21 Aug 2003 14:05:07 -0400,
    Ted Zlatanov <tzz@lifelogs.com> wrote:
    > On Tue, 19 Aug 2003, ben.goldberg@hotpop.com wrote:
    >
    >>> > scalar(<FH>); # skip to the end of the line.
    >>
    >> I got the idea of skipping to the end of the line this way from
    >> look.pl.
    >
    > Doesn't this cause the whole file to be put into memory temporarily?
    No. Just one line.
    > Why not just seek(FH, 0, SEEK_END)? It's O(1) * as opposed to any O(n)
    > solution that loops through the lines of the file.
    Because the seek() seeks to the end of the file, instead of the end of
    the line. It's a different thing.
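
    To make the contrast concrete (hypothetical file and offsets):

    use strict;
    use warnings;
    use Fcntl qw(SEEK_SET SEEK_END);

    open my $fh, '<', '/var/tmp/app.trace' or die "open: $!";
    seek $fh, 0, SEEK_END;      # end of *file*: O(1), reads nothing
    seek $fh, 1000, SEEK_SET;   # park somewhere mid-file instead...
    scalar <$fh>;               # ...end of current *line*: one short read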

    If the OP really did want to seek to the end of the file, and not
    just the end of the line, then I apologise for intruding. I couldn't
    get enough history on this thread from my news server to tell.

    Martien
    --
    Martien Verbruggen | Trading Post Australia
    The gene pool could use a little chlorine.
    Martien Verbruggen Guest
