Professional Web Applications Themes

Ruby 1.8.1 REXML performance - Ruby

I have a script that uses REXML to stream p an XML file and load a MySQL database from it. The file is about 65000 lines, 2.2 MB. With Ruby 1.8.0, it completes in about 25 seconds. I just installed Ruby 1.8.1, and the performance is at least a factor of 20 worse. This is on a 1 GHz P3 running Gentoo Linux. I don't have head-to-head comparison numbers, but I'm observing the same effect on a 3.2 GHz P4 running RedHat 9. The script runs for 16 minutes when it should have completed in a few seconds. I've commented ...

  1. #1

    Default Ruby 1.8.1 REXML performance

    I have a script that uses REXML to stream p an XML file and load a
    MySQL database from it. The file is about 65000 lines, 2.2 MB. With Ruby
    1.8.0, it completes in about 25 seconds. I just installed Ruby 1.8.1,
    and the performance is at least a factor of 20 worse.

    This is on a 1 GHz P3 running Gentoo Linux. I don't have head-to-head
    comparison numbers, but I'm observing the same effect on a 3.2 GHz P4
    running RedHat 9. The script runs for 16 minutes when it should have
    completed in a few seconds.

    I've commented out the database calls, so I'm just measuring the
    Ruby/REXML performance.

    Any ideas?



    Steven Guest

  2. #2

    Default Re: Ruby 1.8.1 REXML performance

    Here's a little more data. I've stripped the application down to a
    bare-bones stream pr:

    ---------------
    #!/usr/bin/env ruby

    require "rexml/doent"
    require "rexml/streamlistener"

    class Handler
    include REXML::StreamListener

    def initialize
    end

    def tag_start(name, attrs)
    end

    def tag_end(name)
    end

    def text(t)
    end

    def cdata(d)
    end
    end

    REXML::Doent.p_stream($stdin, Handler.new)
    ---------------

    I'm running the tests on two different machines because it's a pain to
    switch minor versions on the same machine. But I have run tests on the
    same machine to verify the results.

    To p a 65,000 line XML file:

    machine1 machine2
    1 GHz 3.2 GHz
    Ruby 1.8.0 Ruby 1.8.1
    17 seconds 515 seconds

    Steve


    Steven Guest

  3. #3

    Default Re: Ruby 1.8.1 REXML performance

    Quick update--I'll post numbers later. It appears that the REXML stream
    pr in Ruby 1.8.1 is worse than O(n) in the number of input lines (or
    tags). Ruby 1.8.0 appears to be pretty close to O(n) (as it should).

    I hope soon I won't be the only person in this thread :-).

    Steve




    Steven Guest

  4. #4

    Default Re: Ruby 1.8.1 REXML performance

    Steven Jenkins <org> wrote in message news:<org>... 

    Steve,

    One thing you can do is run your code through the profiler (ruby -r
    profile yourcode.rb) and show us the results. This could help us
    identify a potential bottleneck in REXML and/or Ruby itself.

    Regards,

    Dan
    Daniel Guest

  5. #5

    Default Re: Ruby 1.8.1 REXML performance

    Daniel Berger wrote: 

    I've been running lots of tests. Unfortunately, the profiler adds a
    large O(n) component to the runtime, so the nonlinear behavior doesn't
    emerge until larger n. (n is the number of tags in the input XML file.)

    I have a lot of data now and I'll have to organize it a little. The
    method call counts look fine--they are all linear in n (with a small
    constant offset.) One thing that does jump out, however, is that as n
    goes from 1000 to 50000, the time per call for String#strip goes
    increases by factor of about 10. For n = 1000, String#strip accounted
    for about 6% of the total runtime. For n = 50000, it's about 30%.
    (Without the profiler, the percentages would be even higher.) That must
    mean String#strip is being passed longer arguments for larger n.

    String#strip is called exactly the same number of times as:

    REXML::Prs::BasePr#pull
    REXML::Prs::BasePr#has_next?
    REXML::Prs::BasePr#empty?
    REXML::IOSource#empty?
    REXML::Source#empty?

    IOSource#empty? and Source#empty both call String#strip directly, so I
    suspect that's where the problem is.

    Steve


    Steven Guest

  6. #6

    Default Re: Ruby 1.8.1 REXML performance

    I've repeated all the tests with Ruby 1.8.0 and it sure looks like
    String#strip is the problem.

    To summarize, I have a bare REXML stream pr that reads its input
    file and does nothing. I'm handing it an input file that looks like this

    <?xml version="1.0" ?>
    <multistatus>
    <response>
    <dsref>0</dsref>
    </response>
    <response>
    <dsref>1</dsref>
    </response>
    ...
    </multistatus>

    I'm varying the number of <response> elements in the file from 100 to
    50000 and calling that n. In Ruby 1.8.0, the runtime is roughly
    proportional to n, as it should be. In 1.8.1, the total runtime is
    greater (which is bad but not pathological) and worse than linear in n
    (which is pathological).

    Here's the profile data for strip over all values of n:

    time/call % total runtime
    min max min max

    Ruby 1.8.0 .01 .02 .82 2.48
    Ruby 1.8.1 .10 1.21 5.08 29.04

    In both cases, the number of calls is proportional to n, although strip
    is called 2.5 times more often in 1.8.0 than 1.8.1. In 1.8.1 (only),
    there is a strong increasing trend in the time per call as n increases.

    I don't suspect the problem is in strip itself. I suspect the problem is
    that REXML creates a string that grows as more input is read and hands
    it repeatedly to strip. I looked briefly at the REXML code and didn't
    see anything obvious. (Other than the fact that calling strip on a
    buffer just to see whether it has any tokens in it is inefficient. That
    just makes it slow, however, not nonlinear--unless the buffer is growing.)

    I hope this is enough for someone to diagnose and fix the problem. I'll
    send the data to anyone who wants it.

    Steve


    Steven Guest

  7. #7

    Default Re: Ruby 1.8.1 REXML performance

    Hello Steven,

    Sunday, February 1, 2004, 6:17:27 PM, you wrote:

    SJ> I've repeated all the tests with Ruby 1.8.0 and it sure looks like
    SJ> String#strip is the problem.

    SJ> To summarize, I have a bare REXML stream pr that reads its input
    SJ> file and does nothing. I'm handing it an input file that looks like this

    SJ> <?xml version="1.0" ?>
    SJ> <multistatus>
    SJ> <response>
    SJ> <dsref>0</dsref>
    SJ> </response>
    SJ> <response>
    SJ> <dsref>1</dsref>
    SJ> </response>
    SJ> ...
    SJ> </multistatus>

    SJ> I'm varying the number of <response> elements in the file from 100 to
    SJ> 50000 and calling that n. In Ruby 1.8.0, the runtime is roughly
    SJ> proportional to n, as it should be. In 1.8.1, the total runtime is
    SJ> greater (which is bad but not pathological) and worse than linear in n
    SJ> (which is pathological).

    SJ> Here's the profile data for strip over all values of n:

    SJ> time/call % total runtime
    SJ> min max min max

    SJ> Ruby 1.8.0 .01 .02 .82 2.48
    SJ> Ruby 1.8.1 .10 1.21 5.08 29.04

    SJ> In both cases, the number of calls is proportional to n, although strip
    SJ> is called 2.5 times more often in 1.8.0 than 1.8.1. In 1.8.1 (only),
    SJ> there is a strong increasing trend in the time per call as n increases.

    SJ> I don't suspect the problem is in strip itself. I suspect the problem is
    SJ> that REXML creates a string that grows as more input is read and hands
    SJ> it repeatedly to strip. I looked briefly at the REXML code and didn't
    SJ> see anything obvious. (Other than the fact that calling strip on a
    SJ> buffer just to see whether it has any tokens in it is inefficient. That
    SJ> just makes it slow, however, not nonlinear--unless the buffer is growing.)

    SJ> I hope this is enough for someone to diagnose and fix the problem. I'll
    SJ> send the data to anyone who wants it.

    You are the only one talking with you in this thread. Do you really
    think that someone else will do your job ? Do it for free ? Welcome to the
    world of OpenSource Software (call it non active supported software),
    write a personal email to the REXML author hope he respond, otherwise
    fix it and hope he adds it to the code, otherwise publish your source
    patch and hope more people use it then the original branch, otherwise repeat
    the same thing every few month as long as you need this library.

    Yes, no doubt, i'm not an open source fan.

    --
    Best regards,
    Lothar mailto:com



    Lothar Guest

  8. #8

    Default Re: Ruby 1.8.1 REXML performance

    On Mon, Feb 02, 2004 at 04:01:09AM +0900, Lothar Scholz wrote: 

    His job ? it's part of the base system after all.
     

    So we should ask for money if you come here and ask for help ?
     

    Go use Java on solaris then. (http://www.archub.org/javamemo.txt).
    And pay Matz for ruby.

    Best regards,

    -Martin


    Martin Guest

  9. #9

    Default Re: Ruby 1.8.1 REXML performance

    Lothar Scholz wrote:
     
    Tho i think this thread would be more relevant on the rexml ml, I have
    been reading the thread and it does interest me (I've not been
    responding because i prob could not help). I think you've been dilegent
    and helpful in pointing out either a bug or performance hit. I did not
    think you were asking for anything for free or whatever - besides, to
    me, that's what these ml's are for, in any case. Good luck in
    finding/helping find the culprit for the performance problem.


    paul Guest

  10. #10

    Default Re: Ruby 1.8.1 REXML performance

    Lothar Scholz wrote: 

    Other people are reading it. You read it. It gets archived.
     

    It's not my job. I observed something interesting and I reported it.
     

    Thanks for the advice. I reported this problem Saturday morning US
    Pacific time. It is now Sunday morning. In the intervening day others
    and I have (1) confirmed my observations, (2) carried on an off-list
    discussion of the problem, and (3) applied a patch developed by kapheine
    that completely solves the problem.

    And we all did it for free. It was even fun.
     

    So what?

    Steve




    Steven Guest

  11. #11

    Default Re: Ruby 1.8.1 REXML performance

    For the record, here's a workaround:

    http://butter.homeunix.net/~kapheine/source.rb

    Here's the bug report:

    http://www.ruby-lang.org/cgi-bin/ruby-bugs/incoming?id=1271;expression=rexml;page=1;user=gues t

    Thanks to all who helped on IRC.

    Steve


    Steven Guest

  12. #12

    Default Re: Ruby 1.8.1 REXML performance

    On Monday, February 2, 2004, 6:29:30 AM, Steven wrote:
     [/ref]
     

    Well done. I was just about to send a private note thanking you for
    your diligence and good investigation and reporting. Lothar's poor
    public comment, however, has compelled me to make my comment public.

    So thank you :)

    Cheers,
    Gavin



    Gavin Guest

  13. #13

    Default (OT) Open Source != Do it yourself (was: Re: Ruby 1.8.1 REXML performance)

    Open source == Learn it yourself by trying and get it right if you can,
    with as much help as others are willing to give you in your pursuit.

    That is one reason why I've loved Ruby so much. The community _is_ a
    community, not just a collection of individuals with similar traits.

    I don't think I have a clue about programming yet, but there are those who
    are willing to help me get a clue.

    I will forever be grateful to those who help - no matter how large or small
    their help might end up being.
     
    repeat 

    No doubt I _am_ an open source and Ruby fan.

    -Rich



    Rich Guest

  14. #14

    Default Re: (OT) the Ruby community (was (OT) Open Source != Do it yourself

    On Mon, 02 Feb 2004 07:12:26 +0900, Rich wrote:
     

    Here's something I never thought I'd get from working on an O/S project:
    I'm in the U.S., but I've communicated via email with Ruby users in Spain,
    Germany, England, Austrailia, and Japan. We're not just a community, but a
    _global_ community.
    Tim Guest

  15. #15

    Default Re: Ruby 1.8.1 REXML performance

    On Mon, 2 Feb 2004 04:01:09 +0900, Lothar Scholz
    <com> wrote:
     

    Hmm, that's the third or fifth message I am reading from you today,
    and they all were negative and/or rude. You don't seem to like Ruby or
    its community much. Someone asked you, in another thread, if you are a
    troll. I believe the question was on target.
     

    Quite apparently not one of decent manners, either.

    M.
    Michael Guest

  16. #16

    Default Re: Ruby 1.8.1 REXML performance

    * Sam Roberts <com> [0228 16:28]: 
    > >
    > > Hmm, that's the third or fifth message I am reading from you today,
    > > and they all were negative and/or rude. You don't seem to like Ruby or
    > > its community much. Someone asked you, in another thread, if you are a
    > > troll. I believe the question was on target.
    > > [/ref]
    >
    > Which is particularly weird, given that Lothar is the principle of a
    > company which claims to be developing IDEs for open source scripting
    > languages (betas expected at end of the year...)!
    >
    > Like Ruby:
    >
    > http://www.ruby-ide.com/arachno_ruby.php[/ref]

    Maybe its a troll impersonating him?
    Or he has MPD?

    --
    If you can read this, you're too close.
    Rasputin :: Jack of All Trades - Master of Nuns


    Dick Guest

  17. #17

    Default Re: Ruby 1.8.1 REXML performance

    Steven Jenkins <org> wrote in message news:<org>... 


    Steve,

    Sorry for the delay. I've been really distracted for the past few
    months with this and that. I'll check your workaround, and your bug
    report. I've got a ton of work to do with REXML; one reason I've been
    taking so long to fix things is that the performance issue with 1.8.x
    is really bothering me; the memory usage of REXML is much worse under
    1.8 than it is under 1.6, and I've been blocked trying to figure that
    out.

    Anyway, sorry -- to everybody -- for being such a laggard. Things
    should be better now.

    --- SER
    Sean Guest

  18. #18

    Default Re: Ruby 1.8.1 REXML performance

    Sean Russell wrote: 

    I know how that goes. Fortunately, Zachary Landau already had a
    workaround; I just had to discover it.

    Good luck with the 1.8 work. I use REXML in a couple of simple but
    important applications. I'm happy with it overall. Performance, as you
    note, could be better, but it's not a show-stopper.

    Steve


    Steven Guest

  19. #19

    Default Re: Ruby 1.8.1 REXML performance

    Steven Jenkins <org> wrote in message news:<org>... 

    Yeah, well... I think REXML is at the point where I'm squeezing
    optimizations out of it. Many of the optimizations I've tried have
    made things worse, and I end up backing them out. I'm not out of
    ideas yet, but I doubt that we'll ever see an order of magnitude
    improvement, until I write an optional back-end that makes use of a
    native parsing library.

    One thing you can do, though, is drop out of the advanced API and use
    the light pr. The light pr API is pretty limited, at the
    moment; it'll be a lot more useful when I get it hooked into XPath,
    but it should nearly double the speed of parsing. It does this by
    eschewing the heavy, traditional, REXML tree objects. The price is
    that you don't get all of the neat helper functions; at that point,
    you're dealing with pretty raw data. The API isn't well doented
    yet, but it works.

    --- SER
    Sean Guest

Similar Threads

  1. ANN: REXML 2.7.2
    By Sean in forum Ruby
    Replies: 17
    Last Post: December 15th, 08:25 PM
  2. Performance: Ruby vs Java
    By Lalit Pant in forum Ruby
    Replies: 16
    Last Post: September 23rd, 11:09 AM
  3. [ANN] ruby-freedb, ruby-serialport, ruby-mp3info moved to Rubyforge
    By guillaume.pierronnet@ratp.fr in forum Ruby
    Replies: 0
    Last Post: August 31st, 11:57 PM
  4. Replies: 0
    Last Post: August 12th, 02:00 AM
  5. Replies: 2
    Last Post: July 16th, 09:49 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139