Tuesday, January 6, 2009

c: to exit or to _exit?

There are two functions that allow you to legally terminate your program: exit and _exit.

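Both of them terminate the calling process: they close its open file descriptors, send the SIGCHLD signal to the parent process, and return the exit code to the parent, unless the parent has set SA_NOCLDWAIT or has set the SIGCHLD handler to SIG_IGN. Only exit is guaranteed to flush all open streams with unwritten buffered data and close them; whether _exit flushes stdio buffers is implementation-defined, and on Linux it does not.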
If the process is a session leader and its controlling terminal is the controlling terminal of the session, then each process in the foreground process group of this controlling terminal is sent a SIGHUP signal, and the terminal is disassociated from this session, allowing it to be acquired by a new controlling process.

The main difference is that exit calls all functions registered with atexit and on_exit before terminating, while _exit does not.
Also, threads terminated by a call to _exit do not invoke their cancellation cleanup handlers or per-thread data destructors.

It is worth noting that returning from the main function behaves the same as calling exit with the returned value.

How does the difference between these calls affect a user or a developer?
First of all, the difference becomes significant for processes that call fork (or any other routine that creates a child thread or process).

I have read about a side effect of calling exit from the child: temporary files being removed unexpectedly. I have never been affected by this when I called exit from a child process, though I try to use _exit in child processes anyway.
The following sample shows that this effect does not occur, at least on my system.

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    FILE *tmp = tmpfile();

    if (fork() == 0)
    {   
        fprintf(tmp, "Hello from child<%d>!\n", getpid());
        fflush(tmp);
        exit(0);
    }   
    else
    {   
        sleep(1);
        wait(NULL);

        char msg[256];
        fseek(tmp, 0, SEEK_SET);
        fgets(msg, 256, tmp);
        printf("message: %s\n", msg);
    }   
            
    return 0;
}
$./exit 
message: Hello from child<16489>!

Things get worse when we are talking about C++.

Destructors of global and static objects are called at the atexit stage.
After the constructor of a global or static object has run, the code generated by GCC registers the destructor by calling
int __cxa_atexit(void (* f)(void *), void *p, void *d);
where f is a function pointer to the destructor, p is the parameter for the destructor, and d is the "home DSO" (DSO = dynamic shared object). When the process exits,
void __cxa_finalize(void *d);
is called with d = 0 in order to destroy all objects registered with __cxa_atexit.

Let's look at some samples that show the side effects of calling _exit or exit from both the child and the parent process.
Say we have an object of class A that is stored in the global memory region.
#include <cstdlib>
#include <iostream>
#include <sys/wait.h>
#include <unistd.h>

using namespace std;

class A
{
    public:
        A(){cout << "A(), " << getpid() << endl;}
        ~A(){cout << "~A(), " << getpid() << endl;}
};

A a;
  • The first case is when both child and parent call exit to terminate themselves.
    int main(int argc, char **argv)
    {
        if (fork() == 0)
        {   
            cout << "child, " << getpid() << endl;
            exit(0);
        }   
        else
        {   
            cout << "parent, " << getpid() << endl;
            sleep(1);
            wait(NULL);
            exit(0);
        }   
                
        return 0;
    }
    The output might be
    $./exit
    A(), 17186
    child, 17187
    ~A(), 17187
    parent, 17186
    ~A(), 17186
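    You can see that the object was constructed once, but its destructor was called twice: once in the child and once in the parent. This can cause really bad things if the destructor touches something both processes share, such as a file or a connection.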
  • The second case is when the child calls exit and the parent calls _exit.
    int main(int argc, char **argv)
    {
        if (fork() == 0)
        {   
            cout << "child, " << getpid() << endl;
            exit(0);
        }   
        else
        {   
            cout << "parent, " << getpid() << endl;
            sleep(1);
            wait(NULL);
            _exit(0);
        }   
                
        return 0;
    }
    The output might be
    $./exit 
    A(), 17212
    parent, 17212
    child, 17213
    ~A(), 17213
    The constructor was called once, in the parent process, and the destructor was called once, but in the child process. You may not notice the effect unless you use the object in the parent process. Even so, it is still dangerous. This combination can be useful, though, if the forked process is going to become a daemon and the parent is no longer needed and terminates immediately.
  • The third case is when both child and parent call _exit.
    int main(int argc, char **argv)
    {
        if (fork() == 0)
        {   
            cout << "child, " << getpid() << endl;
            _exit(0);
        }   
        else
        {   
            cout << "parent, " << getpid() << endl;
            sleep(1);
            wait(NULL);
            _exit(0);
        }   
                
        return 0;
    }
    The output might be
    $./exit 
    A(), 17221
    parent, 17221
    child, 17222
    Here the constructor was called once, but the destructor was never called. This could be fine, but if you do something significant in the destructor, such as closing connections or deleting temporary files, you may run into trouble.
  • And the last one, which should be the correct one, is when the child calls _exit and the parent calls exit.
    int main(int argc, char **argv)
    {
        if (fork() == 0)
        {   
            cout << "child, " << getpid() << endl;
            _exit(0);
        }   
        else
        {   
            cout << "parent, " << getpid() << endl;
            sleep(1);
            wait(NULL);
            exit(0);
        }   
                
        return 0;
    }
    The output should be
    $./exit 
    A(), 17234
    child, 17235
    parent, 17234
    ~A(), 17234
    The parent process created the object and the parent process destroyed it. This is the best behavior you can expect.
The behavior changes a bit if you use vfork instead of fork. Take the first case again, where both processes call exit, but with vfork:
int main(int argc, char **argv)
{

    if (vfork() == 0)
    {   
        cout << "child, " << getpid() << endl;
        exit(0);
    }   
    else
    {   
        cout << "parent, " << getpid() << endl;
        sleep(1);
        wait(NULL);
        exit(0);
    }   
            
    return 0;
}
The output might look like
$./exit 
A(), 17321
child, 17322
~A(), 17322
parent, 17321
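This is similar to the case where the child called exit and the parent called _exit: the destructor ran only in the child. With vfork the new process is created without copying the page tables of the parent process, so the parent and the child use the same memory pages, and the parent is suspended until the child terminates or calls one of the exec functions. Presumably the exit handlers that the child runs are thereby consumed in the memory the parent sees, which is why the parent's exit no longer calls the destructor. Do not rely on this output: POSIX leaves the behavior undefined if a vfork child calls anything other than _exit or an exec function. Using vfork is dangerous in the first place, and on modern systems that use the copy-on-write (COW) technique for forked processes you are unlikely to notice a performance penalty from using plain fork.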

You can't be 100% sure that your project contains no C++ code that defines global variables somewhere. It could be a third-party library that uses static or global objects (singletons, depending on the implementation, could be affected as well). So the best practice is to use _exit to terminate the child process and to use exit (or return from main) to terminate the parent.
