Event-driven program flow is common for interactive systems. Events usually come into the system in a non-deterministic way, and it is not always known how long handling an event will take. Moreover, new events may arrive while the current one is still being processed. It is therefore better to handle events in parallel to avoid piling up pending events.
Some message interchange mechanism is needed to transmit events between the main dispatcher and the handlers.
There are different implementations of message queues, but here I'd like to reinvent the wheel and show that there is still room for imagination.
In this message queue implementation a pipe is used as the transmission path for messages.
A pipe is a generic mechanism to exchange data between processes. It is a unidirectional channel, a bridge, between two processes. One process writes data into the write end of the pipe while another reads the written data from the read end. In most cases there is a writer on one side and a reader on the other.
The concept of one reader/one writer could be extended to multiple readers/multiple writers, but this would complicate things. To keep things simple, the message queue needs two separate pipes per handler: one for sending requests from the dispatcher's side and one for sending responses from the handler's side.
Unlike a FIFO (a named pipe visible in userspace as a special file in the filesystem), a pipe created by a process is visible only to the children of that process (and to the process itself, of course). This is a consequence of how a pipe is represented: a file descriptor for each end.
A pipe is created by calling the pipe routine from the C library:
int pipe(int pipefd[2]);
Calling pipe yields two file descriptors: one refers to the read end of the pipe, the other to the write end. In most cases this call directly invokes a system call of the same name.
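Though not part of the message queue code itself, a minimal sketch of typical usage may help: the parent writes into pipefd[1] and a forked child reads from pipefd[0].

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
	int pipefd[2];
	char buf[32];

	if (pipe(pipefd) == -1) {
		perror("pipe");
		exit(EXIT_FAILURE);
	}
	switch (fork()) {
	case -1:
		perror("fork");
		exit(EXIT_FAILURE);
	case 0:	/* child: reader */
		close(pipefd[1]);	/* close unused write end */
		if (read(pipefd[0], buf, sizeof(buf)) > 0)
			printf("child got: %s\n", buf);
		close(pipefd[0]);
		_exit(0);
	default:	/* parent: writer */
		close(pipefd[0]);	/* close unused read end */
		write(pipefd[1], "ping", 5);
		close(pipefd[1]);
		wait(NULL);
	}
	return 0;
}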
From the Linux kernel sources it is clear that control is immediately passed from sys_pipe to the sys_pipe2 system call:
SYSCALL_DEFINE2(pipe2, int __user *, fildes, int, flags)
{
	int fd[2];
	int error;

	error = do_pipe_flags(fd, flags);
	...
}

SYSCALL_DEFINE1(pipe, int __user *, fildes)
{
	return sys_pipe2(fildes, 0);
}
For now the flags are of no interest to us, which is why sys_pipe is used: it passes zero as the flags argument to sys_pipe2.
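For completeness: when the flags do matter, userspace can pass them directly through pipe2 (exposed by glibc when _GNU_SOURCE is defined). A tiny sketch:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int pipefd[2];

	/* Same as pipe(), but both ends are created non-blocking
	 * and close-on-exec in a single call. */
	if (pipe2(pipefd, O_NONBLOCK | O_CLOEXEC) == -1) {
		perror("pipe2");
		return 1;
	}
	close(pipefd[0]);
	close(pipefd[1]);
	return 0;
}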
For us the most interesting part is hidden in the do_pipe_flags routine; the relevant piece is highlighted below:
int do_pipe_flags(int *fd, int flags)
{
	struct file *fw, *fr;
	...
	fw = create_write_pipe(flags);
	...
	fr = create_read_pipe(fw, flags);
	...
}
Before going further I would like to make a small remark on how pipes are managed inside the Linux kernel. The kernel manages pipes as files on the pseudo filesystem pipefs. This filesystem cannot be mounted from userspace because it has the MS_NOUSER flag, but it is visible from the kernel under the pipe: 'mountpoint':
static struct vfsmount *pipe_mnt;
...
static int pipefs_get_sb(struct file_system_type *fs_type,
			 int flags, const char *dev_name, void *data,
			 struct vfsmount *mnt)
{
	return get_sb_pseudo(fs_type, "pipe:", NULL, PIPEFS_MAGIC, mnt);
}

static struct file_system_type pipe_fs_type = {
	.name		= "pipefs",
	.get_sb		= pipefs_get_sb,
	.kill_sb	= kill_anon_super,
};
static int __init init_pipe_fs(void)
{
	int err = register_filesystem(&pipe_fs_type);

	if (!err) {
		pipe_mnt = kern_mount(&pipe_fs_type);
		...
	}
	...
}
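Even though pipefs cannot be mounted, the "pipe:" name does leak into userspace: /proc/self/fd entries for pipe descriptors resolve to links like pipe:[inode]. A small illustration:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int pipefd[2];
	char path[64], target[64];
	ssize_t n;

	if (pipe(pipefd) == -1) {
		perror("pipe");
		return 1;
	}
	snprintf(path, sizeof(path), "/proc/self/fd/%d", pipefd[0]);
	n = readlink(path, target, sizeof(target) - 1);
	if (n > 0) {
		target[n] = '\0';
		/* prints something like: /proc/self/fd/3 -> pipe:[123456] */
		printf("%s -> %s\n", path, target);
	}
	return 0;
}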
In the do_pipe_flags routine, create_write_pipe allocates an inode on pipefs with the help of get_pipe_inode, which in turn calls alloc_pipe_info:
static struct inode *get_pipe_inode(void)
{
	struct inode *inode = new_inode(pipe_mnt->mnt_sb);
	struct pipe_inode_info *pipe;
	...
	pipe = alloc_pipe_info(inode);
	...
}
Besides initializing pipe-specific data, alloc_pipe_info creates a waitqueue, which is of some interest for the message queue (as will be shown below):
struct pipe_inode_info *alloc_pipe_info(struct inode *inode)
{
	...
	init_waitqueue_head(&pipe->wait);
	...
	return pipe;
}
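This waitqueue is what a reader sleeps on when the pipe is empty (and a writer when it is full). From userspace the effect is simply that read blocks until data arrives, unless O_NONBLOCK is set, in which case it fails with EAGAIN:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int pipefd[2];
	char c;

	if (pipe(pipefd) == -1) {
		perror("pipe");
		return 1;
	}
	/* Make the read end non-blocking so an empty pipe does not
	 * put us to sleep on the pipe's waitqueue. */
	fcntl(pipefd[0], F_SETFL, O_NONBLOCK);
	if (read(pipefd[0], &c, 1) == -1 && errno == EAGAIN)
		printf("pipe is empty, a blocking read would sleep\n");

	/* In the blocking case, a write like this wakes up readers
	 * sleeping on the pipe's waitqueue. */
	write(pipefd[1], "x", 1);
	if (read(pipefd[0], &c, 1) == 1)
		printf("got '%c'\n", c);
	return 0;
}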
Returning to create_write_pipe, the kernel creates a new file with write-only flags and the write_pipefifo_fops file operations:
const struct file_operations write_pipefifo_fops = {
	.llseek		= no_llseek,
	.read		= bad_pipe_r,
	.write		= do_sync_write,
	.aio_write	= pipe_write,
	...
	.open		= pipe_write_open,
	...
};
...
f = alloc_file(pipe_mnt, dentry, FMODE_WRITE, &write_pipefifo_fops);
...
f->f_flags = O_WRONLY | (flags & O_NONBLOCK);
This is the write end of the pipe. create_read_pipe, used to create the read end of the pipe, reuses the job already done by create_write_pipe:
struct file *create_read_pipe(struct file *wrf, int flags)
{
	struct file *f = get_empty_filp();
	...
	/* Grab pipe from the writer */
	f->f_path = wrf->f_path;
	path_get(&wrf->f_path);
	f->f_mapping = wrf->f_path.dentry->d_inode->i_mapping;

	f->f_pos = 0;
	f->f_flags = O_RDONLY | (flags & O_NONBLOCK);
	f->f_op = &read_pipefifo_fops;
	...
}
The inode mapping is unsurprisingly reused from the write end, but the file operations differ:
const struct file_operations read_pipefifo_fops = {
	.llseek		= no_llseek,
	.read		= do_sync_read,
	.aio_read	= pipe_read,
	.write		= bad_pipe_w,
	...
	.open		= pipe_read_open,
	...
};
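The bad_pipe_r and bad_pipe_w stubs simply return -EBADF, so using an end in the wrong direction is rejected. A quick check from userspace:

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int pipefd[2];
	char c = 'x';

	if (pipe(pipefd) == -1) {
		perror("pipe");
		return 1;
	}
	/* Writing to the read end or reading from the write end
	 * fails with EBADF. */
	if (write(pipefd[0], &c, 1) == -1)
		perror("write to read end");	/* Bad file descriptor */
	if (read(pipefd[1], &c, 1) == -1)
		perror("read from write end");	/* Bad file descriptor */
	return 0;
}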
Then, with the help of get_unused_fd_flags, the file descriptors that will be returned to userspace upon successful completion of the sys_pipe system call are created.
There is no need to stay in kernel mode any longer. Let's return to the userspace implementation of the message queue.
The basic structures used to form message requests and responses are divided into header and payload parts:
struct mq_data {
	unsigned long len;
	char *data;
} __attribute__((packed));

struct mq_request {
	unsigned char event;		// event type
	unsigned long long id;		// request id
	unsigned long time;		// issue time
	struct mq_data data;		// payload
} __attribute__((packed));

struct mq_response {
	unsigned char event;		// event type
	unsigned long long req_id;	// request id to which the response belongs
	unsigned long long rsp_id;	// response id
	unsigned long time;		// issue time
	struct mq_data data;		// payload
} __attribute__((packed));
struct mq_data represents the message payload and is common to both requests and responses. The comments are descriptive enough. It is worth mentioning which event types are predefined in the current implementation:
enum mq_event {
	MQ_EVENT_REQUEST = (1 << 0),		// generic request to the client
	MQ_EVENT_REQUEST_NO_ACK = (1 << 1),	// request that doesn't require ACK from the client
	MQ_EVENT_RESPONSE = (1 << 2),		// response from the client
	MQ_EVENT_ACK = (1 << 3),		// ACK to the request that it was processed
	MQ_EVENT_EXIT = (1 << 4),		// call to terminate the client
};
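One thing to note about these structures: struct mq_data carries a pointer, so a message cannot cross the pipe with a single write of the struct; the fixed-size header and the variable-length payload have to be serialized one after the other. The actual send routine is not listed in this part; a hypothetical sketch of the idea, relying on the structures above, could look like this:

#include <unistd.h>

/*
 * Hypothetical helper, not part of the original implementation:
 * the fixed-size header goes first, then the payload bytes that
 * req->data.data points to.
 */
static int mq_send_request(int fd, const struct mq_request *req)
{
	if (write(fd, req, sizeof(*req)) != (ssize_t)sizeof(*req))
		return -1;
	if (req->data.len &&
	    write(fd, req->data.data, req->data.len) != (ssize_t)req->data.len)
		return -1;
	return 0;
}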
The principle of operation is pretty simple. At the initialization stage the core process launches subprocesses that will handle messages; additionally it launches a thread that will receive responses from the handlers:
#define HANDLERS 3

struct mq mq;
struct mq_handler handlers[HANDLERS];

int main(int argc, char **argv)
{
	int i;
	....
	mq_init(&mq);
	....
	for (i = 0; i < HANDLERS; ++i)
		if (mq_launch_handler(&mq, &handlers[i], handler) != 0) {
			fprintf(stderr, "Failed to launch handler\n");
			goto out;
		}
	....
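The handler function passed to mq_launch_handler is not listed in this part. Conceptually it reads requests from endpoints[0] and writes responses back through endpoints[1]; a hypothetical skeleton (payload handling omitted) might look like this:

#include <string.h>
#include <unistd.h>

/*
 * Hypothetical handler skeleton, not the original implementation:
 * endpoints[0] is the read end for requests,
 * endpoints[1] is the write end for responses.
 */
void handler(int endpoints[2])
{
	struct mq_request req;
	struct mq_response rsp;

	while (read(endpoints[0], &req, sizeof(req)) == (ssize_t)sizeof(req)) {
		if (req.event == MQ_EVENT_EXIT)
			break;
		memset(&rsp, 0, sizeof(rsp));
		rsp.event = MQ_EVENT_RESPONSE;
		rsp.req_id = req.id;
		/* ... do the actual work and fill rsp.data here ... */
		write(endpoints[1], &rsp, sizeof(rsp));
	}
}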
The struct mq shown above defines the core message queue. Its definition is as follows:
struct mq {
	struct mq_list handlers;	// queue handlers
	struct mq_lock lock;		// management lock
	short state;			// state of the queue
	pthread_t rsp_manager;		// thread handle for managing responses
	struct mq_list rsp_list;	// list of responses
	pthread_mutex_t rsp_lock;	// lock for the list of responses
};
struct mq_list is a generic doubly linked list implementation; I will not bother discussing it here. The locking primitive struct mq_lock is built upon a POSIX mutex in conjunction with simple reference counting; a possible shape is sketched below.
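The definition of struct mq_lock is not reproduced here; judging by the operations used below (mq_lock_init, mq_lock_lock, mq_lock_unlock, mq_set_available), one purely illustrative shape is:

#include <pthread.h>

/* Illustrative sketch only; the real struct mq_lock may differ. */
struct mq_lock {
	pthread_mutex_t mutex;	// protects the guarded object
	int available;		// the object is ready for use
	unsigned refcount;	// simple reference counting
};

static void mq_lock_init(struct mq_lock *l)
{
	pthread_mutex_init(&l->mutex, NULL);
	l->available = 0;
	l->refcount = 0;
}

static void mq_lock_lock(struct mq_lock *l)
{
	pthread_mutex_lock(&l->mutex);
}

static void mq_lock_unlock(struct mq_lock *l)
{
	pthread_mutex_unlock(&l->mutex);
}

static void mq_set_available(struct mq_lock *l, int available)
{
	mq_lock_lock(l);
	l->available = available;
	mq_lock_unlock(l);
}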
The main job of the response handling thread mentioned above is to fetch packets from the pipe, do some preprocessing (mostly needed for finer scheduling) and put them into a local queue for later use.
struct mq_handler contains the file descriptors for the pipes and a few fields for collecting statistics. Its definition is as follows:
struct mq_handler {
	struct mq_list list;
	struct mq_lock lock;
	int pipein[2];		// 0 -> read end for obtaining requests, 1 -> write end for issuing requests
	int pipeout[2];		// 0 -> read end for obtaining responses, 1 -> write end for issuing responses
	int process;		// handler PID
	unsigned long pushed,	// amount of pushed requests
		      popped;	// amount of popped responses
	unsigned nacked;	// not acked packets
};
mq_init, which appeared above, initializes the message queue structure and launches the management thread:
void
mq_init(struct mq *mq)
{
	mq_list_init(&mq->handlers);
	mq_list_init(&mq->rsp_list);
	mq_lock_init(&mq->lock);
	pthread_mutex_init(&mq->rsp_lock, NULL);

	mq->state = MQ_STATE_STARTING;
	pthread_create(&mq->rsp_manager, NULL, mq_poll_wrapper, mq);
	mq->state = MQ_STATE_RUNNING;
	mq_set_available(&mq->lock, 1);
}
This routine initializes miscellaneous bits and announces that the queue is ready to go.
mq_poll_wrapper triggers mq_poll, which fetches responses from the pipe, and also cleans up 'old' responses that are unlikely to be needed anymore because their timeout has expired; to avoid a memory leak the memory they occupy should be freed:
void *
mq_poll_wrapper(void *data)
{
	struct mq *mq = (struct mq *)data;
	unsigned int last_cleaned = timems(), now;

	for (;;) {
		mq_lock_lock(&mq->lock);
		if (mq->state == MQ_STATE_STOPPING) {
			mq_lock_unlock(&mq->lock);
			return NULL;
		} else if (mq->state == MQ_STATE_STARTING) {
			mq_lock_unlock(&mq->lock);
			release_cpu();
			continue;
		}
		mq_lock_unlock(&mq->lock);

		mq_poll(mq);

		now = timems();
		if (now - last_cleaned > MQ_MAX_RESPONSE_LIFE_TIME_MS) {
			last_cleaned = now;
			mq_cleanup_response_queue(mq, now);
		}
	}

	return NULL;
}
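mq_poll itself is not listed in this part. In essence it has to multiplex over the handlers' response pipes, read complete responses and append them to mq->rsp_list under rsp_lock. A rough, simplified sketch of that idea (iterating the global handlers array from the main snippet for brevity, with payload reading and error handling omitted):

#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/*
 * Simplified illustration of the polling idea, not the original mq_poll:
 * wait briefly on each handler's response pipe and collect whatever
 * header arrived.
 */
static void mq_poll_sketch(struct mq *mq)
{
	struct mq_response rsp;
	struct pollfd pfd;
	int i;

	for (i = 0; i < HANDLERS; ++i) {
		pfd.fd = handlers[i].pipeout[0];	// read end for responses
		pfd.events = POLLIN;
		if (poll(&pfd, 1, 10) > 0 && (pfd.revents & POLLIN) &&
		    read(pfd.fd, &rsp, sizeof(rsp)) == (ssize_t)sizeof(rsp)) {
			pthread_mutex_lock(&mq->rsp_lock);
			/* a real implementation would copy rsp into a node
			 * and link it into mq->rsp_list here */
			pthread_mutex_unlock(&mq->rsp_lock);
			handlers[i].popped++;
		}
	}
}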
mq_launch_handler is a helper routine that creates the pipe endpoints for transmitting requests and responses and launches the request handling process. Its definition is as follows:
int
mq_launch_handler(struct mq *mq,
		  struct mq_handler *handler,
		  void (*f)(int endpoints[2]))
{
	int endpoints[2];

	if (pipe(handler->pipein) == -1) {
		perror("pipe");
		goto fail_pipein;
	}
	if (pipe(handler->pipeout) == -1) {
		perror("pipe");
		goto fail_pipeout;
	}

	endpoints[0] = handler->pipein[0];
	endpoints[1] = handler->pipeout[1];
	if ((handler->process = launch_process(f, endpoints)) == -1)
		goto fail_launch;

	mq_lock_init(&handler->lock);
	handler->nacked = handler->popped = handler->pushed = 0;

	mq_lock_lock(&mq->lock);
	mq_list_add_tail(&mq->handlers, &handler->list);
	mq_lock_unlock(&mq->lock);

	mq_set_available(&handler->lock, 1);
	return 0;

fail_launch:
	close(handler->pipeout[0]);
	close(handler->pipeout[1]);
fail_pipeout:
	close(handler->pipein[0]);
	close(handler->pipein[1]);
fail_pipein:
	return -1;
}
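launch_process is not listed either; presumably it forks, runs the handler on the two endpoints in the child, and returns the child's PID to the dispatcher. A possible sketch:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sketch of launch_process: fork, run the handler on the
 * two endpoints in the child, return the child's PID to the dispatcher.
 */
static int launch_process(void (*f)(int endpoints[2]), int endpoints[2])
{
	pid_t pid = fork();

	if (pid == -1) {
		perror("fork");
		return -1;
	}
	if (pid == 0) {		/* child: becomes the handler */
		f(endpoints);
		_exit(0);
	}
	return (int)pid;	/* parent: dispatcher keeps the PID */
}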
Now everything is ready for sending requests and receiving responses. Further details will be discussed in the following part.