Professional Web Applications Themes

How can a process consume kernel CPU time without making system calls? - AIX

I'm trying to track down what is going on with a particular rogue application process on a customer site running AIX 4.3.3. Under defined cirstances, the application process can be fired up and while seemingly doing nothing will soar to the top of the topas list of jobs whilst taking the system from largely idle to heavily CPU bound. When this happens, it is always the kernel histogram that goes highest, and if I time(1) the process from start to when I break it out of the resource hogging section I get output like this: usr:5.41 sys:19.53 total:24.94 elapsed:0m31.11s It ...

  1. #1

    Default How can a process consume kernel CPU time without making system calls?

    I'm trying to track down what is going on with a particular rogue
    application process on a customer site running AIX 4.3.3.

    Under defined cirstances, the application process can be fired up and
    while seemingly doing nothing will soar to the top of the topas list of jobs
    whilst taking the system from largely idle to heavily CPU bound.

    When this happens, it is always the kernel histogram that goes highest, and
    if I time(1) the process from start to when I break it out of the resource
    hogging section I get output like this:
    usr:5.41 sys:19.53 total:24.94 elapsed:0m31.11s

    It seemed to me therefore that I could use trace/syscalls to see what system
    calls are contributing to this resource usage, but both of them tell me that
    the pid for the rogue process (the one at the top of the topas list) has
    made *NO* system calls from the time I detect it going into the runaway
    state to the time I stop it.

    Any idea how else it can have spent so much sys time?

    Cheers,

    Ken


    Ken Wallis Guest

  2. #2

    Default Re: How can a process consume kernel CPU time without making system calls?


    run a kernel trace, then use curt to see what is going on.. it will tell
    you exactly what your process is doing..

    you could also use pprof and trpof to do this.

    Rgds
    Mark

    --
    Posted via [url]http://dbforums.com[/url]
    Mark Taylor Guest

  3. #3

    Default Re: How can a process consume kernel CPU time without making system calls?

    Ken Wallis <kenwsempower.aust.com> wrote:
    KW> I'm trying to track down what is going on with a particular rogue
    KW> application process on a customer site running AIX 4.3.3.

    KW> Under defined cirstances, the application process can be fired up and
    KW> while seemingly doing nothing will soar to the top of the topas list of jobs
    KW> whilst taking the system from largely idle to heavily CPU bound.

    KW> When this happens, it is always the kernel histogram that goes highest, and
    KW> if I time(1) the process from start to when I break it out of the resource
    KW> hogging section I get output like this:
    KW> usr:5.41 sys:19.53 total:24.94 elapsed:0m31.11s

    KW> It seemed to me therefore that I could use trace/syscalls to see what system
    KW> calls are contributing to this resource usage, but both of them tell me that
    KW> the pid for the rogue process (the one at the top of the topas list) has
    KW> made *NO* system calls from the time I detect it going into the runaway
    KW> state to the time I stop it.

    That's probably because it has already entered kernel mode by the
    time you start the trace.

    KW> Any idea how else it can have spent so much sys time?

    It's hard to say. You don't tell us what the application does.

    Regards,

    Nicholas

    --
    "Why shouldn't I top-post?" [url]http://www.aglami.com/tpfaq.html[/url]
    "Meanings are another story." [url]http://www.ifas.org/wa/glossolalia.html[/url]
    Nicholas Dronen Guest

  4. #4

    Default Re: How can a process consume kernel CPU time without making system calls?

    Nicholas Dronen wrote:
    > Ken Wallis <kenwsempower.aust.com> wrote:
    >> I'm trying to track down what is going on with a particular rogue
    >> application process on a customer site running AIX 4.3.3.
    >
    >> Under defined cirstances, the application process can be fired up
    >> and while seemingly doing nothing will soar to the top of the topas
    >> list of jobs whilst taking the system from largely idle to heavily
    >> CPU bound.
    >
    >> When this happens, it is always the kernel histogram that goes
    >> highest, and if I time(1) the process from start to when I break it
    >> out of the resource hogging section I get output like this:
    >> usr:5.41 sys:19.53 total:24.94 elapsed:0m31.11s
    >
    >> It seemed to me therefore that I could use trace/syscalls to see
    >> what system calls are contributing to this resource usage, but both
    >> of them tell me that the pid for the rogue process (the one at the
    >> top of the topas list) has made *NO* system calls from the time I
    >> detect it going into the runaway state to the time I stop it.
    >
    > That's probably because it has already entered kernel mode by the
    > time you start the trace.
    I wondered about that.

    The other possibility is that I'm driving trace wrong. Are there
    permissions issues associated with trace? If I start trace as root and then
    run the rogue process as a non-privileged user would that explain why
    tracerpt -p doesn't show anything for the process?
    >> Any idea how else it can have spent so much sys time?
    >
    > It's hard to say. You don't tell us what the application does.
    It reports on a database. By the time it is sitting hogging all the sys CPU
    it is at an input prompt, displaying the first screen of data and asking
    nicely for input regarding what to do next. I can see that a keyboard
    read() would be a system call, and I can understand that if it kept timing
    out and then being restarted that would use resource, but surely then I'd
    see lots of read()s in the trace output. Oh, and BTW, its a single threaded
    app process - there's no background thread working its way through the rest
    of the result set, it just gathers a screenful of data and displays it then
    asks for directions. Its built using a third party toolset so I don't think
    I can profile it that easily.

    Thanks for your input,

    Ken



    Ken Wallis Guest

  5. #5

    Default Re: How can a process consume kernel CPU time without making system calls?

    Mark Taylor wrote:
    > run a kernel trace, then use curt to see what is going on.. it will
    > tell you exactly what your process is doing..
    I'm a bit of an AIX newbie. How do I do a 'kernel trace'. I've been
    looking through the man pages for trace and tried using that and tracerpt to
    tell me what is going on, but tracerpt -p says there are no system calls for
    the process concerned.
    > you could also use pprof and trpof to do this.
    Thanks, I'll do a man tprof and see what that does.

    Cheers,

    Ken


    Ken Wallis Guest

  6. #6

    Default Re: How can a process consume kernel CPU time without making system calls?

    Ken Wallis <kenwsempower.aust.com> wrote:
    KW> Nicholas Dronen wrote:
    >> Ken Wallis <kenwsempower.aust.com> wrote:
    >>> I'm trying to track down what is going on with a particular rogue
    >>> application process on a customer site running AIX 4.3.3.
    >>
    >>> Under defined cirstances, the application process can be fired up
    >>> and while seemingly doing nothing will soar to the top of the topas
    >>> list of jobs whilst taking the system from largely idle to heavily
    >>> CPU bound.
    >>
    >>> When this happens, it is always the kernel histogram that goes
    >>> highest, and if I time(1) the process from start to when I break it
    >>> out of the resource hogging section I get output like this:
    >>> usr:5.41 sys:19.53 total:24.94 elapsed:0m31.11s
    >>
    >>> It seemed to me therefore that I could use trace/syscalls to see
    >>> what system calls are contributing to this resource usage, but both
    >>> of them tell me that the pid for the rogue process (the one at the
    >>> top of the topas list) has made *NO* system calls from the time I
    >>> detect it going into the runaway state to the time I stop it.
    >>
    >> That's probably because it has already entered kernel mode by the
    >> time you start the trace.
    KW> I wondered about that.

    KW> The other possibility is that I'm driving trace wrong. Are there
    KW> permissions issues associated with trace? If I start trace as root and then
    KW> run the rogue process as a non-privileged user would that explain why
    KW> tracerpt -p doesn't show anything for the process?

    I don't have the docs handy, so I don't know what -p does, but
    if you're doing a global trace, it should pick up all process
    activity. (Good tool in AIX 5.x: truss. Better than the
    silly syscalls by *far*.)
    >>> Any idea how else it can have spent so much sys time?
    >>
    >> It's hard to say. You don't tell us what the application does.
    KW> It reports on a database. By the time it is sitting hogging all the sys CPU
    KW> it is at an input prompt, displaying the first screen of data and asking
    KW> nicely for input regarding what to do next. I can see that a keyboard
    KW> read() would be a system call, and I can understand that if it kept timing
    KW> out and then being restarted that would use resource, but surely then I'd
    KW> see lots of read()s in the trace output. Oh, and BTW, its a single threaded
    KW> app process - there's no background thread working its way through the rest
    KW> of the result set, it just gathers a screenful of data and displays it then
    KW> asks for directions. Its built using a third party toolset so I don't think
    KW> I can profile it that easily.

    Perhaps it's doing a select(2) on standard input with a low timeout.

    Regards,

    Nicholas

    --
    "Why shouldn't I top-post?" [url]http://www.aglami.com/tpfaq.html[/url]
    "Meanings are another story." [url]http://www.ifas.org/wa/glossolalia.html[/url]
    Nicholas Dronen Guest

  7. #7

    Default Re: How can a process consume kernel CPU time without making system calls?

    Nicholas Dronen wrote:
    > Ken Wallis <kenwsempower.aust.com> wrote:
    >> Nicholas Dronen wrote:
    >>> Ken Wallis <kenwsempower.aust.com> wrote:
    >>>> I'm trying to track down what is going on with a particular rogue
    >>>> application process on a customer site running AIX 4.3.3.
    >>>
    >>>> Under defined cirstances, the application process can be fired
    >>>> up and while seemingly doing nothing will soar to the top of the
    >>>> topas list of jobs whilst taking the system from largely idle to
    >>>> heavily CPU bound.
    >>>
    >>>> When this happens, it is always the kernel histogram that goes
    >>>> highest, and if I time(1) the process from start to when I break it
    >>>> out of the resource hogging section I get output like this:
    >>>> usr:5.41 sys:19.53 total:24.94 elapsed:0m31.11s
    >>>
    >>>> It seemed to me therefore that I could use trace/syscalls to see
    >>>> what system calls are contributing to this resource usage, but both
    >>>> of them tell me that the pid for the rogue process (the one at the
    >>>> top of the topas list) has made *NO* system calls from the time I
    >>>> detect it going into the runaway state to the time I stop it.
    >>>
    >>> That's probably because it has already entered kernel mode by the
    >>> time you start the trace.
    >
    >> I wondered about that.
    >
    >> The other possibility is that I'm driving trace wrong. Are there
    >> permissions issues associated with trace? If I start trace as root
    >> and then run the rogue process as a non-privileged user would that
    >> explain why tracerpt -p doesn't show anything for the process?
    >
    > I don't have the docs handy, so I don't know what -p does, but
    > if you're doing a global trace, it should pick up all process
    > activity. (Good tool in AIX 5.x: truss. Better than the
    > silly syscalls by *far*.)
    Sadly, this problem only seems to happen on some 4.3.3 sites so I can't use
    truss to see what it is doing.

    tracerpt -p pid restricts the output to calls made by the specified process
    id.

    tracerpt without -p shows me heaps of stuff is happening of course, but
    tracerpt -p <rogue pid> shows nothing.
    >>>> Any idea how else it can have spent so much sys time?
    >>>
    >>> It's hard to say. You don't tell us what the application does.
    >
    >> It reports on a database. By the time it is sitting hogging all the
    >> sys CPU it is at an input prompt, displaying the first screen of
    >> data and asking nicely for input regarding what to do next. I can
    >> see that a keyboard read() would be a system call, and I can
    >> understand that if it kept timing out and then being restarted that
    >> would use resource, but surely then I'd see lots of read()s in the
    >> trace output. Oh, and BTW, its a single threaded app process -
    >> there's no background thread working its way through the rest of the
    >> result set, it just gathers a screenful of data and displays it then
    >> asks for directions. Its built using a third party toolset so I
    >> don't think I can profile it that easily.
    >
    > Perhaps it's doing a select(2) on standard input with a low timeout.
    Perhaps, but shouldn't that show up as lots of select() calls, not *NO*
    calls and lots of sys CPU time?

    Cheers,

    Ken


    Ken Wallis Guest

  8. #8

    Default Re: How can a process consume kernel CPU time without making systemcalls?

    Ken Wallis wrote:
    [SNIP]
    > It reports on a database. By the time it is sitting hogging all the sys CPU
    > it is at an input prompt, displaying the first screen of data and asking
    > nicely for input regarding what to do next. I can see that a keyboard
    > read() would be a system call, and I can understand that if it kept timing
    > out and then being restarted that would use resource, but surely then I'd
    > see lots of read()s in the trace output. Oh, and BTW, its a single threaded
    > app process - there's no background thread working its way through the rest
    > of the result set, it just gathers a screenful of data and displays it then
    > asks for directions. Its built using a third party toolset so I don't think
    > I can profile it that easily.
    >
    > Thanks for your input,
    >
    > Ken
    >
    >
    >
    What is the product?

    Stuff that has been 'ported from DOS/Windows often has nasty little
    "check for keyboard input" loops in it (I should know, I've done it a
    fair few times), but it does not necessarily show up as "read" calls in
    a trace, there are other ways to check for data present on a file
    decriptor (e.g. the FIOCONTROL ioctl).

    For example, xflex programmers have a KEYCHECK command that they use
    heavily in DOS/Windoze programs to look for a keystroke while updating
    some other field on screen (e.g. the current time) which will _always_
    ramp the CPU up to 100% on UNIX systems. It tends to lead to complaints.

    Cheers,
    Gary B-)

    Gary Schmidt Guest

  9. #9

    Default Re: How can a process consume kernel CPU time without making system calls?

    >From: Dale Talcott
    >Sent: Tuesday, August 12, 2003 11:47 PM
    >In comp.unix.aix you write:
    >
    >>Gary Schmidt wrote:
    >>>> It reports on a database. By the time it is sitting hogging all the
    >>>> sys CPU it is at an input prompt, displaying the first screen of
    >>>> data and asking nicely for input regarding what to do next.
    >...
    >>The third party toolset is called jBASE. It comprises a set
    >of libraries
    >>and a compiler that generates 'C' from a variant of BASIC.
    >jBASE originated
    >>on UNIX but also supports Win32.
    >
    >This is an interesting problem. Be sure to post the answer if
    >you get it figured out.
    >
    >Another thing you might try is to use dbx to attach (-a pid) to the
    >process when it is in this state. If dbx is able to attach, you can
    >then get a traceback which should give a good clue about where to
    >look further.
    Thanks Dale, I did that, it showed I was in a read(), and I was able to do a
    where
    and see that absolutely nothing seemed strange about the call stack at that
    point:
    in a read() called from a library function named appropriately for an input
    routine.
    >Also, you asked whether running trace as root would affect whether you
    >could monitor a process run as non-root. The answer is no.
    >Trace records
    >kernel calls, no matter what the source. To double-check, though, if
    >you run trcrpt without -p, do you get any activity for your process?
    >Another check: /var/adm/ras/trcfile should be mode 666, so anyone can
    >write/read it.
    >
    >Here is my little reminder for how to use trace:
    >
    > trace -a -d -L 5000000 -s -k 1B,100,116,117
    > # The -L parameter affects the max data collected.
    > trcon # to start collection
    > trcoff # to stop collection
    > trcstop # to turn off trace
    > trcrpt -O pid=on,timestamp=1 | less
    >
    >Substitute "more" for "less" if you don't have less installed.
    >
    >In your case, I would get your program into its busy state, run the
    >trace command from the above list, then run
    >
    > trcon;sleep 5;trcoff;trcstop
    >
    >Then, run trcrpt and watch for dispatch events for the process id of
    >interest.
    Thanks, I finally managed to get trace to show that my rogue process was
    actually making system calls. The issue was the size of the log file. I
    had to increase the size by a factor of 10 over your suggestion to get it to
    hold enough info to be useful.

    The fundamental difference between when the process runs nicely and when it
    goes wild seems to be that on the nice invocation tracerpt -p gets to a
    single read() and blocks:

    101 130506 kread 9.095927 kread LR = D0186840
    163 130506 kread 9.095927 read(0,2FF1FA50,1)
    /dev/pts/28
    7FF 130506 kread 9.095930 STREAMS (PSE)
    mps_read(001B001C, 2FF3B390, 00000000, 00000000)
    402 130506 kread 9.095936 (001B,001C) ldterm
    wput ptr 708D4400 msg 70799900 msg_type M_READ
    402 130506 kread 9.095937 (001B,001C) ldterm
    wput break at line 1459
    402 130506 kread 9.095937 (001B,001C) ldterm
    wput ret 0 from line 1787
    460 130506 kread 9.095938 e_assert_wait:
    tid=131537 anchor=707BDE44 flag=1 lr=14CF7A8
    462 130506 kread 9.095939 e_block_thread:
    tid=131537 anchor=707BDE44 t_flags=0000 lr=14CF830

    While in the rogue mode, it loops continuously like this:

    101 183150 kread 8.309180 kread LR = D0186840
    163 183150 kread 8.309180 read(0,2FF1FA50,1)
    /dev/pts/80
    7FF 183150 kread 8.309181 STREAMS (PSE)
    mps_read(001B0050, 2FF3B390, 00000000, 00000000)
    402 183150 kread 8.309184 (001B,0050) ldterm
    wput ptr 70865400 msg 70698F00 msg_type M_READ
    402 183150 kread 8.309186 (001B,0050) ldterm
    wput break at line 1459
    402 183150 kread 8.309186 (001B,0050) ldterm
    wput ret 0 from line 1787
    7FF 183150 kread 8.309189 STREAMS (PSE)
    mps_read returned 0
    104 183150 kread 8.309190 return from kread [9 usec]
    101 183150 times 8.309191 times LR = D0846770
    18E 183150 times 8.309192 times: times u=18
    s=0 cu=179 cs=81 (ticks)
    104 183150 times 8.309192 return from times [1 usec]
    101 183150 kread 8.309193 kread LR = D0186840
    163 183150 kread 8.309194 read(0,2FF1FA50,1)
    /dev/pts/80
    7FF 183150 kread 8.309195 STREAMS (PSE)
    mps_read(001B0050, 2FF3B390, 00000000, 00000000)
    402 183150 kread 8.309198 (001B,0050) ldterm
    wput ptr 70865400 msg 70698F00 msg_type M_READ
    402 183150 kread 8.309199 (001B,0050) ldterm
    wput break at line 1459
    402 183150 kread 8.309200 (001B,0050) ldterm
    wput ret 0 from line 1787
    7FF 183150 kread 8.309203 STREAMS (PSE)
    mps_read returned 0
    104 183150 kread 8.309203 return from kread [9 usec]
    101 183150 times 8.309204 times LR = D0846770
    18E 183150 times 8.309205 times: times u=18
    s=0 cu=179 cs=81 (ticks)
    104 183150 times 8.309205 return from times [1 usec]

    So I suspect that in rogue mode stdin has been set up for non-blocking IO
    with ioctl, but I can't see enough detail in the trace output to be sure on
    this. I've tried hacking in a bit of 'C' to do an ioctl() and switch
    FIONBIO off again, but that made no difference.

    Any other thoughts as to why one read would block and another not?

    Thanks for the trace instructions anyway.

    Cheers,

    Ken


    Ken Wallis Guest

  10. #10

    Default Re: How can a process consume kernel CPU time without making system calls?

    "Ken Wallis" <kenwsempower.aust.com> writes:
    > So I suspect that in rogue mode stdin has been set up for non-blocking IO
    > with ioctl, but I can't see enough detail in the trace output to be sure on
    > this. I've tried hacking in a bit of 'C' to do an ioctl() and switch
    > FIONBIO off again, but that made no difference.
    Your trace appears to show your process receiving a constant stream
    of EOFs (read(0, ..., 1) returns 0), and not (properly) checking
    for this.

    Did the user hit ^D ?

    Here is a common programming bug that often leads to such behaviour:

    char ch; /* ought to be int */
    while ((ch = getchar()) != EOF) { ...

    Cheers,
    --
    In order to understand recursion you must first understand recursion.
    Paul Pluzhnikov Guest

  11. #11

    Default Re: How can a process consume kernel CPU time without making system calls?

    Paul Pluzhnikov wrote:
    > "Ken Wallis" <kenwsempower.aust.com> writes:
    >
    >> So I suspect that in rogue mode stdin has been set up for
    >> non-blocking IO with ioctl, but I can't see enough detail in the
    >> trace output to be sure on this. I've tried hacking in a bit of 'C'
    >> to do an ioctl() and switch FIONBIO off again, but that made no
    >> difference.
    >
    > Your trace appears to show your process receiving a constant stream
    > of EOFs (read(0, ..., 1) returns 0), and not (properly) checking
    > for this.
    >
    > Did the user hit ^D ?
    No, nobody hits anything. The read() appears to return 0 in preference to
    blocking.

    From:
    [url]http://publib.boulder.ibm.com/doc_link/en_US/a_doc_lib/libs/basetrf2/read.htm[/url]
    When attempting to read from a character special file that supports
    nonblocking reads, such as a terminal, and no data is currently available:

    a.. If O_NDELAY and O_NONBLOCK are clear (the default), the read
    subroutine blocks until data becomes available.
    b.. If O_NDELAY is set, the read subroutine returns 0.
    c.. If O_NONBLOCK is set, the read subroutine returns -1 and sets the
    errno global variable to EAGAIN if no data is available.
    So, it seems to me that the code hasn't internally done an ioctl(0, FIONBIO,
    ....) but instead has more likely done an fcntl(0, F_SETFL, O_NDELAY).

    Anyhow, I have now managed to isolate a single line change that can be made
    to the configuration of these enquiries which prevents the errant behaviour
    from occurring. All I have to do now is find a simple way to te it
    so I can submit it formally as a bug to the third party vendor.

    Thanks for the input guys. at least I was able to determine that system
    calls *were* being made and that the kernel CPU consumption did have a
    rational explanation.

    Cheers,

    Ken


    Ken Wallis Guest

Similar Threads

  1. Replies: 1
    Last Post: May 30th, 09:28 AM
  2. In-Process Soap Calls
    By Nezaam Joseph in forum ASP.NET Web Services
    Replies: 0
    Last Post: September 9th, 07:35 AM
  3. System.Net.WebException making web service calls about 10% of the time.
    By Chris Langston in forum ASP.NET Web Services
    Replies: 4
    Last Post: September 7th, 10:03 PM
  4. Zip &amp; unzip using system calls
    By MAR in forum PHP Development
    Replies: 3
    Last Post: April 25th, 10:25 AM
  5. Finding system data with system calls.
    By Dr. David Kirkby in forum AIX
    Replies: 2
    Last Post: July 25th, 02:46 PM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139