fork() race in SIGCHLD handler - UNIX Programming

  1. #1

    Default fork() race in SIGCHLD handler

    Hi,

    In tracking down some unexpected behaviour in a program of mine, I have
    noticed that the test code below exhibits a timing race on fork() in Linux
    kernel 2.6.0. If the test code below illustrating the problem is compiled
    and run, the first output is "child_pid is less than zero", all subsequent
    lines being the expected "child_pid is greater than zero". (I get
    "child_pid is greater than zero" at all times with kernel 2.4.23).

    What is clearly happening is that by the time the child process has exited
    and the SIGCHLD handler is executing in the parent process, the parent
    process has not yet had the return value of the fork() call assigned to
    its copy of the static child_pid variable. After fork()ing, the parent
    process resumes execution only after the child process has already
    terminated.

    At first I thought this may be a kernel or glibc bug, but on reflection I am
    not so sure - I do not think the asynchronous SIGCHLD handler in the parent
    process is required to wait around for the process in which it is executing
    to begin executing synchronously after fork()ing. I can get around the
    difficulty by putting a flag in the first line of the parent process after
    the fork(), on which the SIGCHLD handler can block, but does POSIX say
    anything about this? (Apologies for posting this to comp.unix.programmer
    as well as comp.os.linux.development.system, but the comp.unix.programmer
    regulars seem to know a lot about POSIX).

    It is interesting though that the assignment of -1 to child_pid in the last
    line of the signal handler does not cause subsequent calls to the handler
    to exhibit the "unexpected" behaviour. Probably on all iterations after
    the first the address space of the parent process has been set up so that
    copies-on-write to its copy of child_pid take place quickly.

    This is with glibc 2.3.2, and the compiler is gcc-3.2.3.

    Chris.

    //////////////////// test code ///////////////////

    #include <signal.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <time.h>

    pid_t child_pid = -1;

    void childexit_signalhandler(int sig) {

        char message_less[] = "child_pid is less than zero\n";
        char message_greater[] = "child_pid is greater than zero\n";

        waitpid(-1, 0, WNOHANG); /* eliminate zombies */

        /* write to stdout; sizeof - 1 leaves out the trailing NUL */
        if (child_pid < 0)
            write(STDOUT_FILENO, message_less, sizeof(message_less) - 1);
        else if (child_pid > 0)
            write(STDOUT_FILENO, message_greater, sizeof(message_greater) - 1);

        child_pid = -1;
    }

    int main(void) {

        struct sigaction sig_act_chld;
        int index;
        struct timespec delay;
        struct timespec residual_delay;

        sig_act_chld.sa_handler = childexit_signalhandler;
        sigemptyset(&sig_act_chld.sa_mask);
        sig_act_chld.sa_flags = 0;
        sigaction(SIGCHLD, &sig_act_chld, 0);

        for (;;) {
            /* the race: SIGCHLD can be handled before this assignment */
            child_pid = fork();
            if (child_pid > 0) { /* parent process */
                /* set up a 1s delay */
                delay.tv_sec = 1;
                delay.tv_nsec = 0;
                /* resume the sleep if a signal interrupts it */
                while (nanosleep(&delay, &residual_delay) == -1)
                    delay = residual_delay;
            }
            if (!child_pid) { /* child process */
                /* do something meaningless */
                for (index = 0; index < 100; index++);
                _exit(0);
            }
        }
        return 0;
    }


    --
    To reply by e-mail, remove the "--nospam--" in the address
    Chris Guest

  2. #2

    Default Re: fork() race in SIGCHLD handler

    In comp.os.linux.development.system Chris Vine <freeserve.co.uk> wrote: 

    I recall Linus saying that he had twice changed which process executes
    first after fork in the 2.6.0 series. There was some problem such as the
    one you detail below, but it went away ages ago with changes made some
    way back.


    Well, this sounds a bit like the old issue of "who runs first". I
    presume this is UP?
     

    Uh huh. Very likely. So? Isn't that allowed? There's nothing wrong with
    that.
     

    Well, in fact the handler executes (because the child dies)
    before the return value from the fork is known!

    It sounds like you should delay setting the handler in the parent till
    the fork has returned, but that leaves a window in which the child can
    die unattended.

     

    Dunno. I imagine so.
     
     

    Unknown by me. Compiler issues will intervene to make that cloudy.
     

    Well try a different compiler for a start! Though I don't know if it
    will make a difference, since so far life sounds vaguely within spec.
    But mapping the space would be useful. Try no optimization.
     

    It makes me nervous not having this static. Shrug. Shouldn't you
    declare this volatile too?

    So you set the SIGCHLD action to report what child_pid is known to be
    in the parent process.

    Yes, well. Your child dies and activates the handler, but child_pid in
    the handler still looks the way it used to be before being assigned by
    the fork.

    This sounds as though it's partially a problem with gcc. Can you
    scatter a few "volatile"s around, and see if anything changes?

    Peter
    P.T. Guest

  3. #3

    Default Re: fork() race in SIGCHLD handler

    P.T. Breuer wrote:

    [snip]
     

    It is UP, yes, but the scheduler will provide reasonable equality of
    processor time and I expect the real issue is what process has to accept
    the delays on copy-on-write, and that that has changed between kernel 2.4
    and kernel 2.6.

    [snip]
     

    I cannot do that, because in the test code I know that the child will exit
    before fork() has returned in the parent, so I will definitely miss it.

    >
    > Dunno. I imagine so.

    On further thought, of course my idea will not work - the parent and the
    signal handler are in the same thread of execution, so blocking the handler
    will block the parent.

    I suspect the proper approach is to mask off SIGCHLD before fork()ing, and
    then unmask it again in the parent once the fork() has returned. I have
    just tried that and it appears to work as expected. Phew!

    >
    > Unknown by me. Compiler issues will intervene to make that cloudy.

    >
    > Well try a different compiler for a start! Though I don't know if it
    > will make a difference, since so far life sounds vaguely within spec.
    > But mapping the space would be useful. Try no optimization.

    Actually, I don't think the compiler is the issue - the explanation for this
    behaviour is identified by both of us above and should be compiler
    independent. I have tried the same test code on another machine with
    gcc-2.95.3, and with kernel 2.6.0 I get the same problem as with gcc-3.2.
    It also occurs with or without optimisation. Only the mask/unmask approach
    seems to guarantee correct results.

    >
    > It makes me nervous not having this static. Shrug. Shouldn't you
    > declare this volatile too?

    I probably should, but having changed it to volatile the effect is still the
    same.

    Chris.


    --
    To reply by e-mail, remove the "--nospam--" in the address
    Chris Guest

  4. #4

    Default Re: fork() race in SIGCHLD handler

    "P.T. Breuer" wrote: 

    No matter who executes first you should get correct behaviour.
    Even if you know who executes first, you don't know how many
    cycles it will get. Eventually you will have a situation where
    the first executing process gets preempted before it has returned
    from the fork function. It is often desirable to have the child
    run first for performance reasons. Since we are only talking about
    performance, it doesn't hurt too much if in a few rare cases
    the parent executes first.

    printf("%d",fork()==fork()==fork()); /* :-) */

    >
    > Uh huh. Very likely. So? Isn't that allowed? There's nothing wrong with
    > that.

    Not just likely, also desirable. I think you will have the
    smallest number of context switches that way.

    >
    > Well, in fact the handler executes (because the child dies)
    > before the return value from the fork is known!

    Well, there is a race between the process executing and the
    signal handler. You cannot use a lock to protect against a
    signal handler, as that will cause a deadlock. So I guess
    the only solution is to mask the handler while the process
    is in the critical region. And it seems the fork() call and
    the assignment to the variable both need to be in the critical
    region. So masking before fork and unmasking after assigning
    to the variable probably is the best (only?) solution.
     

    You need the handler to be ready before the signal can
    arrive. That means the handler must be set up before the
    fork call which is responsible for the signal eventually
    arriving in the parent process. I don't know what will
    happen if you get the signal while it is blocked and only
    then actually install the handler. I would avoid that by
    installing the handler before fork and just keeping the signal
    blocked during the critical region in the parent.

    >
    > It makes me nervous not having this static.

    Shouldn't make any difference.


    Might be a good idea. But is that really necessary if it
    is protected with a mutual exclusion implemented by
    blocking the signal?

    --
    Kasper Dupont -- der bruger for meget tid paa usenet.
    For sending spam use mailto:au.dk
    /* Would you like fries with that? */
    Kasper Guest

  5. #5

    Default Re: fork() race in SIGCHLD handler

    In comp.os.linux.development.system Kasper Dupont <au.dk> wrote:
     

    I'm not quite sure what you mean by masking - blocking, I take it, with
    signals being queued while the block is in place, and treated when the
    block is removed.

    I'm not sure how you achieve that! One can block signals while a handler
    is running (with sigaction), but he wants to block (and queue) them
    while the handler is not running.

    Oh - you mean use sigprocmask, with SIG_BLOCK. Yes, that will work
    nicely. Is there some limit to the number of queued signals? Or
    are the processes sending the signal blocked until it can be delivered?

    That would make sense, but a synchronous signal would be dangerous!
    A process could deadlock by signalling itself while the signal
    was blocked. So I don't believe synchronous signalling can be in POSIX,
    and hence I believe there must be a finite length to the queue of
    pending signals. Maybe "1". And other signals are lost.

    Oh well, I'm sure you'll enlighten me!

    I would feel better if the compiler were forced to implement this
    global variable the way I expect, as a memory location that one writes
    to directly. That would avoid me feeling uneasy.

    Peter
    P.T. Guest

  6. #6

    Default Re: fork() race in SIGCHLD handler

    Kasper Dupont wrote:

    [snip]
     

    As you will see from my follow-up post of earlier today, this was the
    approach which worked. On reflection, it is probably the only approach
    which will work, and it is relatively clean.

    These timing/race issues with signal handlers are some of the most difficult
    to resolve. The synchronisation of multiple threads seems quite easy in
    comparison, and the fork() race issue I stumbled on is a problem waiting to
    happen, because it is unintuitive (until it happens to you, and then after
    a little thought the problem is obvious).

    Chris.

    --
    To reply by e-mail, remove the "--nospam--" in the address
    Chris Guest

  7. #7

    Default Re: fork() race in SIGCHLD handler

    Chris Vine wrote: 

    Indeed. I'm not sure I would have thought about it
    either. But of course after reading your description
    the problem (and solution) was obvious.

    --
    Kasper Dupont -- der bruger for meget tid paa usenet.
    For sending spam use mailto:au.dk
    /* Would you like fries with that? */
    Kasper Guest

  8. #8

    Default Re: fork() race in SIGCHLD handler

    "P.T. Breuer" wrote: 

    Yes blocking is what I mean. Isn't the term masking also
    sometimes used about this? At least the term masking has
    been used about interrupts. Actually interrupts and
    signals are quite similar.
     

    No problem.
     

    Yes.
     

    One of each type.
     

    Nope, the sending process will continue at once.
    (Maybe this is different in the case of RT signals,
    I know something is different, but I'm not sure
    what it is.)
     

    I think this link should answer what happens in
    the case of synchronous signals.
    http://kt.zork.net/kernel-traffic/kt20031201_243.html#9

    But when exactly is a signal synchronous? If it
    is caused by an exception like SIGSEGV and
    similar cases, it is obviously synchronous. But
    how about kill(getpid(),...) or raise()?
     

    I see you have a point. But is there really any
    problem? Doesn't the compiler know that a system
    call can cause the contents of variables to change?

    Actually at compile time the system call is considered
    an external function in a different object file. So
    the compiler knows nothing about what it is going to
    do. Since the variable is global, ie. not static,
    that means any external function could potentially
    change the variable. So the compiler must be prepared
    to handle cases where a system call changes a variable.

    If the variable was static without being volatile, it
    might be interesting. But still, a pointer to the
    signal handler is given to one system call. Later
    another system call unblocking signals causes the
    handler to get invoked. That could have happened even
    without involving signals and with another object
    file written in pure C code. Just assume the first
    function being called saves the function pointer and
    the second calls the previously saved function pointer.
    (It almost happens that way, though a signal of course
    is more complicated). So I cannot see how the compiler
    could do anything of potential harm, as long as you
    use the necessary system calls to guard against race
    conditions.

    --
    Kasper Dupont -- der bruger for meget tid paa usenet.
    For sending spam use mailto:au.dk
    /* Would you like fries with that? */
    Kasper Guest

  9. #9

    Default Re: fork() race in SIGCHLD handler

    On Tue, 30 Dec 2003 15:50:29 +0000, P.T. Breuer wrote:

    >
    > I recall linus saying that he had changed the order of the first
    > process to execute after fork twice in the 2.6.0 series. There was some
    > problem such as you detail below, but it went away ages ago with the
    > changes made some way back now.

    Your recollection matches mine, except that the problem
    didn't go away, but was made less likely. As far as I know,
    there is nothing one can do, except make the parent and child
    communicate in some way.

    -- Pete

    Pete Guest

  10. #10

    Default Re: fork() race in SIGCHLD handler

    On Tue, 30 Dec 2003 23:41:38 +0000 Chris Vine <freeserve.co.uk> wrote:
    >
    > As you will see from my follow-up post of earlier today, this was the
    > approach which worked. On reflection, it is probably the only approach
    > which will work, and it is relatively clean.

    One (maybe "the") standard way to synchronize between parent and child
    for who-runs-first is a pipe. Create a pipe before forking, then
    whoever must run 2nd reads from the pipe. When the 1st process is
    past the critical section, it closes the pipe. The other process will
    then unblock.

    I didn't read the thread in detail but it sounds like that's what you
    should be doing here. signal masking/unmasking seems roundabout in
    comparison (to me), and it's probably more fragile.

    /fc
    Frank Guest

  11. #11

    Default Re: fork() race in SIGCHLD handler

    Frank Cusack wrote:

    [snip]
     

    The problem was not really a "who runs first" one (although the problem can
    be solved by ordering "who runs first" using pipes, as you suggest).

    My problem was that the SIGCHLD signal handler was not supposed to execute
    until the fork() call in the parent process had returned - that is, until
    it had assigned the child process number to the global variable intended to
    hold it. To do that, blocking with sigprocmask() immediately before the
    fork() call and unblocking (in the parent) immediately after achieves the
    effect wanted. This queues the signal until the SIGCHLD handler is able to
    deal with it.

    The issue could also be dealt with with pipes, by blocking the _exit() or
    exec() call made by the child process and so delaying the event
    (termination of the child process) giving rise to the SIGCHLD signal, and
    thank you for the suggestion. As a means of dealing with my particular
    problem this is probably marginally less attractive as it means, where an
    exec() call is to be made by the child process, the execution of the new
    program in the child process can be delayed, perhaps unnecessarily, and in
    the case of blocking before an _exit() call may, I suppose, give rise to an
    unnecessary context switch by preventing the child process from finishing
    until the parent process had started executing after the fork(). However,
    this all seems pretty marginal, I agree.

    Chris.

    --
    To reply by e-mail, remove the "--nospam--" in the address
    Chris Guest

  12. #12

    Default Re: fork() race in SIGCHLD handler

    > ... the test code below exhibits a timing race on fork() in Linux ...

    To fix the race, implement fork() using the NPTL version of clone:
    int clone(int (*fn)(void *arg), void *child_stack, int flags, void *arg,
    pid_t *ptid, struct user_desc *tls, pid_t *ctid);
    With suitable flags, the kernel sets *ptid and *ctid before returning
    from the syscall.

    Hint: as revealed by strace, fork() in glibc-2.3.2 on RH9 (2.4.20-27.9) uses:
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
    <ignored>, <ignored>, 0x4002f0c8)
    and the only tricky task is finding a suitable fn. Consult the glibc
    source in sysdeps/unix/sysv/linux/i386/clone.S .

    John Guest

  13. #13

    Default Re: fork() race in SIGCHLD handler

    On Wed, 31 Dec 2003, Chris Vine wrote:
     

    OK - but returning from a system call is one of the places where
    the kernel usually checks if there's a signal pending. In the
    case where you're having problems, the signal has been delivered
    before you're ready. But given that fork() doesn't guarantee
    which executes first (parent or child), you have to expect, and
    work around, this sort of thing.
     

    Yes, that's another way of doing it.

    --
    Rich Teer, SCNA, SCSA . * * . * .* .
    . * . .*
    President, * . . /\ ( . . *
    Rite Online Inc. . . / .\ . * .
    .*. / * \ . .
    . /* o \ .
    Voice: +1 (250) 979-1638 * '''||''' .
    URL: http://www.rite-online.net ******************
    Rich Guest

  14. #14

    Default Re: fork() race in SIGCHLD handler

    On Wed, 31 Dec 2003, John Reiser wrote:

    >
    > To fix the race, implement fork() using the NPTL version of clone:

    Why? That makes the OP's code needlessly Linux specific.

    --
    Rich Teer, SCNA, SCSA . * * . * .* .
    . * . .*
    President, * . . /\ ( . . *
    Rite Online Inc. . . / .\ . * .
    .*. / * \ . .
    . /* o \ .
    Voice: +1 (250) 979-1638 * '''||''' .
    URL: http://www.rite-online.net ******************
    Rich Guest

  15. #15

    Default Re: fork() race in SIGCHLD handler

    On Wed, 31 Dec 2003 22:02:30 +0000 Chris Vine <freeserve.co.uk> wrote:
    >
    > The problem was not really a "who runs first" one (although the problem can
    > be solved by ordering "who runs first" using pipes, as you suggest).
    >
    > My problem was that the SIGCHLD signal handler was not supposed to execute
    > until the fork() call in the parent process had returned - that is, until
    ....

    IOW, the classic who-runs-first problem. fork() doesn't return until
    said process (parent or child) is running. You need for the parent to
    run first, to set the global pid var.

    Classic problems deserve classic solutions.
     

    Not to be harsh, but doesn't that seem ridiculous? And there's only
    one point where you'd block, immediately after fork().

    a) If latency is that important, you should pre-fork the children.
    b) If you agree that it's marginal, you should stick with the standard
    solution.

    int p[2];
    char c;
    ....
    if (pipe(p) < 0) {          /* pipe() returns 0 on success */
        perror("pipe"); ...
    }

    gPid = fork();
    if (gPid > 0) {
        /* parent: gPid is already assigned at this point */
        (void) close(p[0]); /* cleanup */
        (void) close(p[1]); /* unblock child */
        ...

    } else if (gPid == 0) {
        (void) close(p[1]);
        /* block on parent: read() returns 0 (EOF) once the parent
           has closed its write end */
        while (read(p[0], &c, 1) < 0) {
            if (errno != EINTR) {
                perror("read"); ...
                break;
            }
        }
        close(p[0]);
        ...

    } else {
        perror("fork"); ...
    }

    For systems where the child runs first, you'll have a context switch back
    to the parent. For systems where the parent runs first, you'll incur no such
    penalty. EVEN IF you are writing ONLY FOR LINUX, you cannot count on the
    child running first! It has changed in the past and will change again.
    Writing your app to depend on the current (undocumented, and for good
    reason) behavior is a mistake.

    /fc
    Frank Guest

  16. #16

    Default Re: fork() race in SIGCHLD handler

    On Wed, 31 Dec 2003 15:19:50 -0800 John Reiser <com> wrote:
    >
    > To fix the race, implement fork() using the NPTL version of clone:
    > int clone(int (*fn)(void *arg), void *child_stack, int flags, void *arg,
    > pid_t *ptid, struct user_desc *tls, pid_t *ctid);
    > With suitable flags, the kernel sets *ptid and *ctid before returning
    > from the syscall.

    That's sick. I don't mean that in the "good" way.

    /fc
    Frank Guest

  17. #17

    Default Re: fork() race in SIGCHLD handler

    On Thu, 01 Jan 2004 01:22:33 GMT Rich Teer <com> wrote:
    >>
    >> To fix the race, implement fork() using the NPTL version of clone:
    >
    > Why? That makes the OP's code needlessly Linux specific.

    And sensitive to an API which is explicitly stated to be in constant flux.

    /fc
    Frank Guest

  18. #18

    Default Re: fork() race in SIGCHLD handler

    Frank Cusack wrote:
    >>
    >> Why? That makes the OP's code needlessly Linux specific.

    On a Linux system using NPTL clone(), it is possible for the programmer to
    guarantee that a SIGCHLD handler has an inexpensive way to know in advance
    and check the pid of every exiting child process. On which other *NIX-like
    system(s) is this possible?
     

    Now that Linux 2.6.0 is out, the API is no longer in constant flux.
    The probability of a change that is not backwards compatible is very small
    [bug fixes excepted, of course.]
    The API is the default in RedHat 9, RedHat Advanced Server 3.0, Fedora Core 1,
    SuSE 9.0, and others. There are several million machines using it today.

    --

    John Guest

  19. #19

    Default Re: fork() race in SIGCHLD handler

    Frank Cusack wrote:

    [snip]

    >
    > Not to be harsh, but doesn't that seem ridiculous? And there's only
    > one point where you'd block, immediately after fork().

    [snip]

    I do not know what point you are trying to make, but it simply isn't true
    that you would only block after the fork() in the case to which I was
    referring. To deal with the issue which started this series of postings,
    if you were to choose to use a pipe to deal with it, it wouldn't matter
    where you blocked provided it was before the _exit() call in the child
    process. Nothing in the child process depended on the state of the parent
    process.
     

    I really don't know what you are talking about. Using sigprocmask() is
    neither undocumented nor a mistake, and doesn't rely on the order in which
    the child and parent run as you suggest (it makes the issue irrelevant).
    You may need to re-read the post to which you thought you were replying.

    Chris.

    --
    To reply by e-mail, remove the "--nospam--" in the address
    Chris Guest
