
slow IO - Ruby


  1. #1

    slow IO

    Hello everyone,

    I'm having an IO problem. The code below is designed to read a series
    of text files (most of these are about 8k) and output them into a
    single text file separated by form feed characters. This works very
    quickly with sets of files that are in directories containing (say) 90K
    documents or fewer. But when there are 200k+ documents in the
    directories, it begins to take a substantial amount of time.

    The main problem seems to be that sysread is painfully slow with
    files in very large directories. At this point, it would be
    faster to do the following:

    system("cat #{input_file} >> #{outputFile}")
    system("echo \f >> #{outputFile}")

    It seems to me that there is no way that this should be faster than
    doing a sysread.

    Any help would be appreciated.

    Best,

    Dave

    -- begin code (This has been cleaned a bit and changed to protect the
    innocent)

    # docInfo object is a wrapper for pages array with some additional info
    outputFile = docInfo.outputFile
    output = nil
    isOpen = false

    chunk = (10240 * 2) # 20k read size
    fmode = File::CREAT | File::TRUNC | File::WRONLY

    begin
      docInfo.each do |pageInfo|
        pageNo = pageInfo.pageNo
        start = 0
        count = chunk

        begin
          # open source
          input = File.open(pageInfo.inputFile)
          fileSize = input.stat.size

          # open destination if not already open
          unless isOpen
            output = File.open(outputFile, fmode, 0664)
            isOpen = true
          end

          # copy in chunks, trimming the final read so we never
          # ask for more bytes than the file has left
          while start < fileSize
            count = (fileSize - start) if (start + chunk) > fileSize
            output.syswrite(input.sysread(count))
            start += count
          end
          output.syswrite("\f")
        ensure
          begin
            input.close
          rescue Exception => err
            STDERR << "WARNING: couldn't close #{pageInfo.inputFile}\n"
          end
        end
      end
    ensure
      begin
        output.close if isOpen
      rescue Exception
        STDERR << "WARNING: couldn't close #{outputFile}\n"
      end
    end
    --end code



    David Guest

  2. #2

    Re: slow IO

    On Sat, 14 Feb 2004, David King Landrith wrote:

    input = output = nil

    buf = input.sysread count
    output.syswrite buf
    buf = nil

    # you probably want to _know_ if the system is having probs
    # closing files
    input.close if input
    output.close if output

    alternatively you might be able to use the

    open(path) do |f|
    ...
    end

    idiom with 'output'. i suspect that you were grinding to a halt with too many
    output files open (they were never closed)...
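
    for instance (just a sketch; assumes docInfo and pageInfo behave as in
    your code):

    # the block form closes each file for you, even if an exception is raised
    File.open(docInfo.outputFile, File::CREAT|File::TRUNC|File::WRONLY, 0664) do |output|
      docInfo.each do |pageInfo|
        File.open(pageInfo.inputFile) do |input|
          # your files are ~8k, so a single sysread per file is fine
          output.syswrite(input.sysread(input.stat.size))
        end
        output.syswrite("\f")
      end
    end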

    -a
    --
    ===============================================================================
    | EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
    | PHONE :: 303.497.6469
    | ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
    | URL :: http://www.ngdc.noaa.gov/stp/
    | TRY :: for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done
    ===============================================================================

    Ara.T.Howard Guest

  3. #3

    Re: slow IO


    On Feb 13, 2004, at 9:16 AM, David King Landrith wrote:

    Suffice it to say that few file systems are optimized for the case of
    more than 200k files per directory. From your example, I'm guessing that
    you're using Windows, which I don't know much about. But my advice is to
    restructure your program to use more, smaller directories.

    Seriously, that's a lot of files...
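
    For example, you could fan the files out into subdirectories keyed by
    a hash prefix, so that no single directory ever holds more than a few
    hundred entries. A rough sketch (the helper name and the paths are
    made up for illustration):

    require 'digest/md5'
    require 'fileutils'

    # map a filename onto a two-level directory layout, e.g. "ab/cd/page1.txt";
    # two hex characters per level means at most 256 entries per level
    def fanout_path(base_dir, filename)
      prefix = Digest::MD5.hexdigest(filename)[0, 4]
      File.join(base_dir, prefix[0, 2], prefix[2, 2], filename)
    end

    path = fanout_path('/data/pages', 'page_000123.txt')
    FileUtils.mkdir_p(File.dirname(path))

    Looking a document up later just means computing the same path again.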

    -J



    J.Herre Guest

  4. #4

    Re: slow IO

    In Message-Id: <com>
    David King Landrith <com> writes:

    The problem is probably here: many Strings are created and discarded,
    which may stimulate GC.

    Can you rewrite this to

    # buf should be allocated before the loop.
    input.sysread(count, buf)
    output.syswrite(buf)

    and test its performance? Here buf is updated by sysread and no more
    extra Strings are created.

    # Note that this feature is incorporated from version 1.7.x or later
    # where x > ....well, some point :-P
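
    Applied to the copy loop in your code, that would look like this
    (a sketch; same variable names as in your post):

    buf = ''  # allocate once, before the loop
    while start < fileSize
      count = (fileSize - start) if (start + chunk) > fileSize
      input.sysread(count, buf)   # refills buf in place; no new String per read
      output.syswrite(buf)
      start += count
    end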


    --
    February 14, 2004
    Slow and steady wins the race.



    YANAGAWA Guest

  5. #5

    Re: slow IO


    On Feb 13, 2004, at 10:55 AM, David King Landrith wrote:

    Sorry, no offense intended. I read this as a forward slash, i.e. a
    windows command line switch. My bad.

    system("echo \f >> #{outputFile}")

    It's possible that you're wasting time in the garbage collector. Try
    using a static buffer:

    buf = input.sysread( count, buf )

    J.Herre Guest

  6. #6

    Re: slow IO

    Hi,

    At Sat, 14 Feb 2004 03:55:05 +0900,
    David King Landrith wrote in [ruby-talk:92806]: 

    On what filesystem did you execute it? EXT[23]-fs is not
    efficient for a huge directory.

    --
    Nobu Nakada


    nobu.nokada@softhome.net Guest

  7. #7

    Re: slow IO

    On Saturday 14 February 2004 09:49, nobu.nokada@softhome.net wrote:
    >
    > On what filesystem did you execute it? EXT[23]-fs is not
    > efficient for a huge directory.

    with htree it's perfectly fine. try tune2fs.
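
    for example (assuming an ext2/3 partition on a hypothetical /dev/sdXN;
    run the e2fsck step with the filesystem unmounted):

    tune2fs -O dir_index /dev/sdXN
    e2fsck -fD /dev/sdXN    # -D rebuilds and optimizes existing directory indexes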

    --
    When in doubt, use brute force.
    -- Ken Thompson


    Alexander Guest

  8. #8

    Re: slow IO

    We're using Reiserfs already.


    On Feb 14, 2004, at 9:45 AM, Kirk Haines wrote:

    >> On what filesystem did you execute it? EXT[23]-fs is not
    >> efficient for a huge directory.
    >
    > Yes. If you have a partition that you can do it on (or you can put
    > another drive into the machine to make one), try a reiserfs partition
    > for your data and see if that changes your results in a positive way.
    >
    > Kirk Haines
    -------------------------------------------------------
    David King Landrith
    (w) 617.227.4469x213
    (h) 617.696.7133

    Generic quotes offend nobody.
    --Unknown
    -------------------------------------------------------
    public key available upon request



    David Guest
