Tuesday, December 23, 2008

asm: overwriting return point of the function/stack overflow

Overwriting a function's return point on the stack (the classic stack overflow attack) is a common attack in the programming world.
To understand how it can be done, we need to know how a function's stack is laid out, how the function executes, and how it hands control back to the parent function once it is done.
The stack of a function, at least on *nix, looks like this at the moment the function is entered:

|function parameters| <--higher memory addresses
|---return  point---| <--%esp (top of the stack on entry)
|--local variables--|
|-------------------| <--lower memory addresses (the stack grows this way)
Let's examine a simple program written in asm. It has a function pc that puts the given character onto stdout and adds '\n'. In main this function is called with the argument '*'.
 1 .text
 2 
 3 pc:
 4     pushl %ebp
 5     movl %esp, %ebp
 6     
 7     subl $4, %esp /*4 bytes for local variables*/
 8     
 9     pushl 8(%ebp)/*get value of the function parameter*/
10     call putchar
11     pushl $0x0a /*new line*/
12     call putchar
13     addl $8, %esp/*align stack*/
14     
15     movl %ebp, %esp
16     popl %ebp
17     ret
18     
19 
20 .global main
21 
22 main:
23     pushl $0x0000002a /*character '*'*/
24     call pc
25     addl $4, %esp/*align stack*/
26     
27     movl $1, %eax
28     movl $0, %ebx
29     int $0x80/*exit(0)*/
I set a breakpoint on line 4 in gdb and got the information about the registers:
(gdb) i r
eax            0xbfae0a34 -1079113164
ecx            0x312f6668 825189992
edx            0x1 1
ebx            0xb7fa4ff4 -1208332300
esp            0xbfae09a4 0xbfae09a4
ebp            0xbfae0a08 0xbfae0a08
esi            0xb7fe2ca0 -1208079200
edi            0x0 0
eip            0x8048384 0x8048384 <pc>
The value of %esp is 0xbfae09a4, so this is the top of the stack of our function pc.
In the *nix world the stack of a process grows from higher memory addresses towards lower ones. So to get the function parameter we add 4 bytes to %esp (the return point takes 4 bytes). [Note: on line 9 I push the value stored at %ebp + 8 rather than at %ebp + 4, because 'pushl %ebp' on line 4 has already decreased %esp by another 4 bytes.]
(gdb) x/c 0xbfae09a4 + 4
0xbfae09a8: 42 '*'
Yes, here we have '*', _because_ we indeed pushed it onto the stack on line 23. At the address stored in %esp we can find the return point
(gdb) x/x 0xbfae09a4 
0xbfae09a4: 0x080483a7
0x080483a7 should be the address of the instruction that follows the call of pc in main. Let's check.
Stepping through the instructions in gdb I got out of pc
(gdb) n
pc () at fcall.s:17
17  ret
(gdb) n
main () at fcall.s:25
25  addl $4, %esp/*align stack*/
(gdb) i r
eax            0xa 10
ecx            0xffffffff -1
edx            0xb7fa60b0 -1208328016
ebx            0xb7fa4ff4 -1208332300
esp            0xbfae09a8 0xbfae09a8
ebp            0xbfae0a08 0xbfae0a08
esi            0xb7fe2ca0 -1208079200
edi            0x0 0
eip            0x80483a7 0x80483a7 <main+7>
You can see that the value of %eip is 0x80483a7, so we were right. To make a program run some other code instead of returning to the parent function, the return point has to be overwritten.
The following code attempts to do so.
It has a function evil whose address is written over the return point of the function pc. The function evil writes '%\n' to the output and calls the exit syscall with exit code 1.
.text

evil:
    pushl %ebp
    movl %esp, %ebp

    pushl $0x00000025 /*character '%'*/
    call putchar
    pushl $0x0a /*new line*/
    call putchar

    movl $1, %eax
    movl $1, %ebx
    int $0x80/*exit(1)*/

pc:
    pushl %ebp
    movl %esp, %ebp

    subl $4, %esp /*4 bytes for local variables*/

    pushl 8(%ebp)/*get value of the function parameter*/
    call putchar
    pushl $0x0a /*new line*/
    call putchar
    addl $8, %esp/*align stack*/

    movl %ebp, %esp
    popl %ebp

    movl $evil, (%esp)/*overwrite the return point with the address of evil*/

    ret


.global main

main:
    pushl $0x0000002a /*character '*'*/
    call pc
    addl $4, %esp/*align stack*/

    movl $1, %eax
    movl $0, %ebx
    int $0x80/*exit(0)*/
The result of the execution of this program should be
$gcc fcall.s -o fcall -g
$./fcall 
*
%
$echo $?
1
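The effect of that single 'movl $evil, (%esp)' can also be modelled without touching a real stack. Below is a minimal C++ sketch, not the real thing: ToyStack, simulate_call and the address used for evil are made up for the illustration (the return address is the one from the gdb session above), and the stack is just an array of 4-byte words with esp as an index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a downward-growing 32-bit stack. Slots are 4-byte words and
// esp is an index into `words` (a lower index stands for a lower address).
struct ToyStack {
    std::vector<std::uint32_t> words;
    std::size_t esp;

    explicit ToyStack(std::size_t slots) : words(slots, 0), esp(slots) {}

    void push(std::uint32_t value) {
        assert(esp > 0 && "toy stack is full");
        words[--esp] = value;   // like pushl: esp moves down, then the word is stored
    }

    std::uint32_t pop() {
        assert(esp < words.size() && "toy stack is empty");
        return words[esp++];    // like popl: the word is read, then esp moves up
    }
};

// Return address taken from the gdb session; the address of evil is hypothetical.
const std::uint32_t kReturnToMain = 0x080483a7;
const std::uint32_t kEvil         = 0x08048384;

// Models main calling pc('*') and returns the address that `ret` would jump to.
std::uint32_t simulate_call(bool overwrite_return_point) {
    ToyStack stack(16);
    stack.push(0x2a);            // main: pushl $0x2a (the argument)
    stack.push(kReturnToMain);   // call pc: pushes the return point
    stack.push(0xbfae0a08);      // pc: pushl %ebp
    (void)stack.pop();           // pc: popl %ebp
    if (overwrite_return_point) {
        stack.words[stack.esp] = kEvil;   // movl $evil, (%esp)
    }
    return stack.pop();          // ret: pops the return point into eip
}
```

In this model simulate_call(false) gives back 0x080483a7, the normal return into main, while simulate_call(true) gives back the made-up address of evil.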

Tuesday, December 16, 2008

c++: multidimensional arrays in the (dynamic) memory

I know several ways to store multidimensional arrays in (dynamic) memory.
I'd like to share them because I have noticed that not all developers understand what is going on in this area.
Let's look at different ways to create a 2-dimensional array of objects of class A, whose code is below.

#include <cstdlib>
#include <iostream>

using namespace std;

class A
{
    public:
        void * operator new(size_t size)
        {
            void *p = malloc(size);
            cout << "new, size: " << size << "\n";
            return p;
        }

        void * operator new[](size_t size)
        {
            void *p = malloc(size);
            cout << "new[], size: " << size << "\n";
            return p;
        }

        // memory taken with malloc has to be given back with free
        void operator delete(void *p)
        {
            free(p);
        }

        void operator delete[](void *p)
        {
            free(p);
        }

        A() 
        {   
            cout << "A()\n";
            id = ++counter;
        }

        ~A()
        {   
            cout << "~A()\n";
        }   

        void call()
        {   
            cout << "id #" << id << ", " << counter << " times constructor of A was called\n";
        }

        static int counter;
        int id; 
};

int A::counter = 0;
I added some code to trace the calls to operator new, the constructor and the destructor.
Each time the constructor is called, the static class variable counter is incremented by 1 and its new value is assigned to the member variable id.
  • The first method is the simplest.
    Simply allocate a 2x2 array of A on the stack.
    cout << "size of A: " << sizeof(A) << "\n";
    A z[2][2];
    z[1][1].call();
    (z[1]+1)->call();
    (*z+3)->call();
    This piece of code produces
    size of A: 4
    A()
    A()
    A()
    A()
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    ~A()
    ~A()
    ~A()
    ~A()
    The constructor was called 4 times, the destructor 4 times, and operator new was not called at all.
  • The second is a bit more complex.
    Allocate memory for a 2x2 array of A on the heap.
    cout << "size of A: " << sizeof(A) << "\n";
    A (*z)[2] = new A[2][2];
    z[1][1].call();
    (z[1]+1)->call();
    (*z+3)->call();
    delete [] z;
    The output should be
    size of A: 4
    new[], size: 20
    A()
    A()
    A()
    A()
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    ~A()
    ~A()
    ~A()
    ~A()
    
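    The constructor was called 4 times, the destructor 4 times, and operator new[] was called once to allocate memory for all 4 objects. The requested 20 bytes are 4 x 4 bytes for the objects plus, on this 32-bit build, 4 extra bytes where the compiler keeps the element count so that delete [] knows how many destructors to run.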
  • The next method allocates a one-dimensional array of 2 pointers to A on the heap, and then allocates memory for the one-dimensional 'sub-arrays'.
    cout << "size of A: " << sizeof(A) << "\n";
    A **z = new A*[2];
    z[0] = new A[2];
    z[1] = new A[2];
    
    z[1][1].call();
    (z[1]+1)->call();
    // (*z+3)->call() is not valid here: the rows are separate allocations,
    // so z[0]+3 points past the end of the first row
    
    delete [] z[0];
    delete [] z[1];
    delete [] z;
    The output should be
    size of A: 4
    new[], size: 12
    A()
    A()
    new[], size: 12
    A()
    A()
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    ~A()
    ~A()
    ~A()
    ~A()
    The constructor was called 2 times after each call to operator new[] (each call allocates memory for 2 objects), and the destructor was called 4 times. The array of pointers itself is allocated with the global operator new[], so it does not show up in the trace.
  • This method is a little bit tricky. We allocate a one-dimensional array of size 4 and simulate a two-dimensional array with pointer arithmetic: element [i][j] lives at index i*2+j.
    cout << "size of A: " << sizeof(A) << "\n";
    A *z = new A[2*2];
    z[2+1].call();
    (z+3)->call();
    delete [] z;
    The output should be
    size of A: 4
    new[], size: 20
    A()
    A()
    A()
    A()
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    ~A()
    ~A()
    ~A()
    ~A()
    The constructor was called 4 times, the destructor 4 times, and operator new[] was called once to allocate memory for all 4 objects.
  • This one is a combination of storing the 2x2 array on the heap and on the stack. First a one-dimensional array of pointers to A is put on the stack, and then heap memory is used for the one-dimensional 'sub-arrays'.
    cout << "size of A: " << sizeof(A) << "\n";
    A *z[2];
    z[0] = new A[2];
    z[1] = new A[2];
    
    z[1][1].call();
    (z[1]+1)->call();
    // (*z+3)->call() is not valid here: the rows are separate allocations,
    // so z[0]+3 points past the end of the first row
    
    delete [] z[0];
    delete [] z[1];
    The output should be
    size of A: 4
    new[], size: 12
    A()
    A()
    new[], size: 12
    A()
    A()
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    ~A()
    ~A()
    ~A()
    ~A()
    The constructor was called 2 times after each call to operator new[] (each call allocates memory for 2 objects), and the destructor was called 4 times.
All methods have their pluses and minuses. One may take more time but require less memory, while another may take more memory but run faster. It depends on how many allocation calls are made, where the memory for the array comes from, and so on. Also remember the C++ restriction for arrays on the stack: their size must be known at compile time. The dark side of heap memory is that it has to be released explicitly once it is no longer used. Some of the methods are also easier to read than others.
The choice is up to you.
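The flat, one-dimensional variant is the easiest one to wrap into a small class. Here is a minimal sketch, assuming std::vector is acceptable instead of raw new[]; the Grid name and its members are mine and are not part of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a rows x cols grid stored in one contiguous block,
// the same layout as `new A[2*2]` in the flat method.
template <typename T>
class Grid {
public:
    Grid(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), cells_(rows * cols) {}

    // Element [row][col] lives at index row * cols + col.
    T &at(std::size_t row, std::size_t col) {
        assert(row < rows_ && col < cols_);
        return cells_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<T> cells_;   // released automatically, no delete [] needed
};
```

With Grid<int> g(2, 2), the element g.at(1, 1) is the fourth cell of the block, exactly like z[2+1] in the flat method, and the memory is released when g goes out of scope.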

Thursday, December 11, 2008

autoconf: square brackets in AS_HELP_STRING

With autoconf (2.63) I didn't manage to get square brackets into AS_HELP_STRING. I tried adding extra [] around the help string, as the manual suggests:

Each time there can be a macro expansion, there is a quotation expansion, i.e., one level of quotes is stripped:
     int tab[10];
     =>int tab10;
     [int tab[10];]
     =>int tab[10];

I found the solution in the configure.ac of the Qore programming language.
Quadrigraphs are used there.
'@<:@' for '[' and '@:>@' for ']' can be used in the autoconf input file.
So now I have nice output from ./configure --help in my project:
....
  --with-mysql[=DIR]      use mysql
  --with-fcgi[=DIR]       use fast CGI
  --with-pcre[=DIR]       use pcre
....
The code in configure.in looks like:
....
AS_HELP_STRING([--with-sqlite@<:@=DIR@:>@], [use sqlite])...
....

Wednesday, December 10, 2008

emacs: the dark side of the force

Recently I decided to try the dark side of the force - emacs.
I have been a Vim user for a long time. Several times I wanted to try emacs but never had a good occasion.
Now I'm working on a project with a huge amount of sources, and I decided to try emacs on it.
It works! ;)

Playing with emacs I found out that it's not as complex as some people say.

The thing I couldn't get used to for some time is that I don't have to press ESC to switch to command mode, press i (INS) to switch to insert mode, and so on.

I haven't found some Vim features (such as visual mode), but I believe I just don't know how to make them work.

The main difference is that there is no distinction between insert mode and command mode. You can run commands while you are editing the text.

All commands (or, better to say, most of them) begin with a control or meta character. Control is usually Ctrl on your keyboard and meta is Alt.

For those who want to try emacs, here is a migration guide: a table of equivalent Vim and emacs commands.

VIM                            EMACS                                       Description
:qa/:wqa/:xa/:qa!              C-x C-c                                     exit (emacs prompts whether to save buffers or not)
h                              C-b                                         left
l                              C-f                                         right
b/B                            M-b                                         word backward
w/W                            M-f                                         word forward
j                              C-n                                         down
k                              C-p                                         up
0                              C-a                                         beginning of the line
$                              C-e                                         end of the line
gg/1G                          M-<                                         beginning of the buffer
G                              M->                                         end of the buffer
x                              C-d                                         delete under cursor
D                              C-k                                         delete from cursor to EOL
dd                             C-k C-k                                     delete line
dw/dW                          M-d                                         delete word
db/dB                          M-{BACKSPACE}                               delete word backwards
:set ignorecase {ENTER} /      C-s {needle in lower case}                  icase search forward
:set ignorecase {ENTER} ?      C-r {needle in lower case}                  icase search backward
/                              C-s                                         search forward
?                              C-r                                         search backward
:set ignorecase {ENTER} /      M-C-s {regexp in lower case}                icase regexp search forward
:set ignorecase {ENTER} ?      M-C-r {regexp in lower case}                icase regexp search backward
:%s/{needle}/{replacement}/gc  M-% {needle} {ENTER} {replacement} {ENTER}  query replace
/                              M-C-s                                       regexp search forward
?                              M-C-r                                       regexp search backward
u                              C-_/C-x u                                   undo
C-R                            C-_/C-x u                                   redo (it's tricky for emacs*)
ESC                            C-g                                         quit the running/entered command (switch to command mode in Vim)
:e file                        C-x C-f                                     open file
:set ro                        C-x C-q                                     set file as read-only
:w                             C-x C-s                                     save buffer
:w file                        C-x C-w file                                save buffer as ...
:wa                            C-x s                                       save all buffers
:buffers                       C-x C-b                                     show buffers
:b [name]                      C-x b [name]                                switch to another buffer
:q/:q!/:wq/:x                  C-x k                                       close buffer
C-w n/:split                   C-x 2                                       split horizontal
C-w v/:vsplit                  C-x 3                                       split vertical
C-w C-w                        C-x o                                       switch to another window
:q                             C-x 0                                       close window
C-w o                          C-x 1                                       close other windows
:! {cmd}                       M-!                                         run shell command
m{a-zA-Z}                      C-@/C-space                                 set mark
(none)                         C-x C-x                                     exchange mark and position
{visual}y                      M-w                                         copy region**
{visual}d                      C-w                                         delete region**
p                              C-y                                         paste
C-V {key}                      C-q {key}                                   insert special char, e.g. ^M
{visual}SHIFT->                C-x TAB                                     indent region
C-]                            M-.                                         find tag
C-t                            M-*                                         return to previous location
:e!                            M-x revert-buffer                           reload buffer from disk

*To redo changes you have undone, type `C-f' or any other command that will harmlessly break the sequence of undoing, then type more undo commands.
**The region is from the current position to the mark.

Other useful emacs commands:
M-g                   go to line
C-x i                 insert file
C-x h                 mark whole buffer
C-x C-t               switch two lines
M-C-a                 beginning of the function
M-C-e                 end of the function
M-a                   beginning of the statement
M-e                   end of the statement
M-C-h                 mark the function
M-/                   autocompletion
M-C-\                 indent region
C-c C-q               indent the whole function according to indentation style
C-c C-c               comment out marked area
M-x uncomment-region  uncomment marked area
M-,                   jump to the next occurrence for tags-search
M-;                   insert comment in code
C-x w h               highlight the text by regexp
C-x w r               disable highlighting the text by regexp

To run emacs without X add -nw command line argument.

To repeat a command several times, type 'C-u {number} {command}' or 'M-{digit} {command}'.

emacs has bookmarks that are close to Vim marks:
C-x r m             set a bookmark at current cursor position
C-x r b             jump to bookmark
C-x r l             list bookmarks
M-x bookmark-write  write all bookmarks to the given file
M-x bookmark-load   load bookmarks from the given file

My ~/.emacs looks like:
(setq load-path (cons "~/.emacs.d" load-path))

(auto-compression-mode t) ; uncompress files before displaying them

(global-font-lock-mode t) ; use colors to highlight commands, etc.
(setq font-lock-maximum-decoration t)
(custom-set-faces)
(transient-mark-mode t) ; highlight the region whenever it is active
(show-paren-mode t) ; highlight matching parenthesis
(global-hi-lock-mode t) ; highlight region by regexp

(column-number-mode t) ; column-number in the mode line

(setq make-backup-files nil)

(setq scroll-conservatively most-positive-fixnum) ; scroll only one line when I move past the bottom of the screen

(add-hook 'text-mode-hook 'turn-on-auto-fill) ; break lines at space when they are too long

(fset 'yes-or-no-p 'y-or-n-p) ; make the y or n suffice for a yes or no question

(setq comment-style 'indent)

(global-set-key (kbd "C-x C-b") 'buffer-menu) ; buffers menu in the same window

(global-set-key (kbd "C-x 3") 'split-window-horizontally-other) ; open new window horizontally and switch to it
(defun split-window-horizontally-other ()
        (interactive)
        (split-window-horizontally)
        (other-window 1)
)

(global-set-key (kbd "C-x 2") 'split-window-vertically-other) ; open new window vertically and switch to it
(defun split-window-vertically-other ()
 (interactive)
 (split-window-vertically)
 (other-window 1)
)

(global-set-key (kbd "C-c c") 'comment-region) ; comment code block
(global-set-key (kbd "C-c u") 'uncomment-region) ; uncomment code block

(global-set-key (kbd "C-x TAB") 'tab-indent-region) ; indent region
(global-set-key (kbd "C-x <backtab>") 'unindent-region) ; unindent region
(defun tab-indent-region ()
    (interactive)
 (setq fill-prefix "\t")
    (indent-region (region-beginning) (region-end) 4)
)
(defun unindent-region ()
    (interactive)
    (indent-region (region-beginning) (region-end) -1)
)

(global-set-key (kbd "TAB") 'self-insert-command)
(global-set-key (kbd "RET") 'newline-and-indent)

(setq key-translation-map (make-sparse-keymap))
(define-key key-translation-map "\177" (kbd "C-="))
(define-key key-translation-map (kbd "C-=") "\177")
(global-set-key "\177" 'delete-backward-char)
(global-set-key (kbd "C-=") 'delete-backward-char)

(setq indent-tabs-mode t)
(setq tab-always-indent t)
(setq default-tab-width 4)

(setq inhibit-startup-message t) ; do not show startup message

(iswitchb-mode t)
(desktop-save-mode t)

(display-time)

Happy emacsing!

Tuesday, December 9, 2008

perl: manipulations with standard output stream

The builtin functions print and write send data to stdout if the filehandle is omitted.
This is really handy.
But if you want to write to some other stream, you have to specify the filehandle for them each time.
It's possible to point STDOUT to some other filehandle by reassigning it

open $nfh, '>test';
$ofh = *STDOUT;
*STDOUT = *$nfh;

print "test";

*STDOUT = $ofh;
close $nfh;
or by using the select function
open $nfh, '>test';
$ofh = select $nfh;

print "test";

select $ofh;
close $nfh;
and still use print and write without specifying the filehandle. The second method, with select, looks better to me.
If you switch between many output streams, a LIFO (Last-In, First-Out) stack of filehandles gives you an easy way to walk through them, like pushd/popd in bash
#!/usr/bin/perl
@fh = ();

open my $nfh, '>test-0';
push @fh, select $nfh;

print "test\n";

open my $nfh, '>test-1';
push @fh, select $nfh;

print "test\n";

close select pop @fh;
print "test\n";
close select pop @fh;

print "test\n";
The result of executing this script
$./test.pl 
test
$cat test-*
test
test
test
Note that I used my with the filehandle in open to give it a fresh, undefined scalar variable. Otherwise open would reuse the filehandle already stored in this variable, and the old stream would be lost. The following code does the same work without my, by resetting the variable before the second open
#!/usr/bin/perl
@fh = ();

open $nfh, '>test-0';
push @fh, select $nfh;

print "test\n";

$nfh = undef;
open $nfh, '>test-1';
push @fh, select $nfh;

print "test\n";

close select pop @fh;
print "test\n";
close select pop @fh;

print "test\n";
The result of executing this script
$./test.pl 
test
$cat test-*
test
test
test
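For comparison, the same push/pop idea can be sketched in C++, where std::cout is redirected by swapping its stream buffer with rdbuf. This is only an analogy to the Perl select trick, not a translation of it: OutputStack and replay_script are hypothetical names, and in-memory string streams stand in for the test-0 and test-1 files.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: a LIFO of stream buffers for std::cout.
class OutputStack {
public:
    // Send std::cout to `target` and remember where it pointed before.
    void push(std::ostream &target) {
        saved_.push_back(std::cout.rdbuf(target.rdbuf()));
    }

    // Restore the destination that was active before the matching push.
    void pop() {
        assert(!saved_.empty());
        std::cout.rdbuf(saved_.back());
        saved_.pop_back();
    }

private:
    std::vector<std::streambuf *> saved_;
};

// Replays the sequence of the Perl script against two in-memory streams
// and returns what ended up in the first and in the second one.
// The final print to the real stdout is left out.
std::pair<std::string, std::string> replay_script() {
    std::ostringstream first, second;
    OutputStack outputs;

    outputs.push(first);
    std::cout << "test\n";    // goes to first
    outputs.push(second);
    std::cout << "test\n";    // goes to second
    outputs.pop();
    std::cout << "test\n";    // goes to first again
    outputs.pop();            // std::cout is back to its original buffer

    return std::make_pair(first.str(), second.str());
}
```

As in the Perl version, the first stream receives two lines and the second one receives a single line.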

Thursday, December 4, 2008

intel VT: how to enable

My Intel(R) Core(TM)2 Duo CPU supports Intel VT, but it's disabled by default in the BIOS.

Currently I'm heavily testing libdodo in different environments. As I'm running an IA32 kernel, I was testing it in x86 environments only. Lately I decided to test it on x86_64 with VMWare. I went to the BIOS, turned on VT support, rebooted ... and VMWare Player told me that my platform supports VT but it's disabled. What the heck? I checked several times that the option was enabled in the BIOS but still couldn't run an x86_64 guest OS.

Googling didn't help. Running the guest OS under qemu's x86_64 emulation was the last option, and it's extremely slow.

When I had almost given up I came across a very interesting article.
The author says that after the VT option has been enabled in the BIOS, the processor must be completely disconnected from power.
I turned off my laptop, removed the power supply and the battery, waited a few seconds, plugged everything back in, booted and started VMWare with an x86_64 guest. BINGO.

Now I'm able to run x86_64 guests on an x86 host.

Tuesday, December 2, 2008

linux: zombies

New processes are created with the fork system call. After fork completes there are two processes: the parent and the child.

A zombie is a process that has finished before its parent, while the parent has not yet made any attempt to reap it.

When the child process terminates, the kernel still holds some minimal information about it in case the parent process asks for it. This information is a task_struct, where the pid, exit code, resource usage information, etc. can be found. All other resources (memory, file descriptors, etc.) are released. For details you can dig into the code of the do_exit function in kernel/exit.c in the linux kernel sources.

The parent is notified about the child's death with the SIGCHLD signal.

When the parent receives SIGCHLD it can get the information about the child by calling one of the wait/waitpid/... system calls. In the interval between the child terminating and the parent calling wait, the child is said to be a zombie. Even though it can't be in the running or idle state, it still occupies an entry in the process table.
As soon as the parent calls wait, this information is passed to it and all remaining information about the child is removed from the kernel. You can find the details in the wait_task_zombie function (kernel/exit.c).

In fact the memory held by a zombie is really small, but the zombie still occupies an entry in the process table, and as the process table has a limited number of entries it is possible for the system to run out of them.

When the parent terminates without waiting for the child, the zombie process is adopted by the 'init' process, which calls wait to clean up after it.

Let's look at a common zombie and its code.

#include <unistd.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (fork() == 0)
    {   
        printf("%d\n", getpid());
        fflush(stdout);
        _exit(0);
    }   
    else
    {   
        printf("%d\n", getpid());
        fflush(stdout);
        while (1) 
            sleep(10);
    }   
    
    return 0;
}
The output should be something like
$./zombie 
4090
4091
Grepping ps's output I got
$ps aux | grep -E "(4090|4091)"
niam      4090  0.0  0.0   1496   340 pts/2    S+   12:24   0:00 ./zombie
niam      4091  0.0  0.0      0     0 pts/2    Z+   12:24   0:00 [zombie] 
You can see that the child process became a zombie.

Zombies may not be a complete evil, but they are very close to it, so their existence should be prevented.

There are several ways to do that.

First of all, if you care why the child finished, you should call wait. This is the only way I know to do that.

I know two modes of wait: blocking and non-blocking. Both are listed below.
  • The blocking method, which suspends the parent until a child terminates.
    #include <unistd.h>
    #include <stdio.h>
    #include <sys/wait.h>
    
    int main(int argc, char **argv)
    {
        if (fork() == 0)
        {   
            printf("%d\n", getpid());
            fflush(stdout);
            _exit(0);
        }   
        else
        {   
            int status;
            wait(&status);
            printf("%d\n", getpid());
            fflush(stdout);
            while (1) 
                sleep(10);
        }   
        
        return 0;
    }
    And the resulting output for this code was
    $./zombie 
    4949
    4950
    
    $ps aux | grep -E '(4949|4950)'
    niam      4949  0.0  0.0   1496   340 pts/2    S+   13:12   0:00 ./zombie
  • The non-blocking method, which doesn't put the parent to sleep but requires multiple calls of waitpid.
    #include <unistd.h>
    #include <stdio.h>
    #include <signal.h>
    #include <sys/wait.h>
    
    int main(int argc, char **argv)
    {
        if (fork() == 0)
        {   
            printf("%d\n", getpid());
            fflush(stdout);
            _exit(0);
        }   
        else
        {   
            printf("%d\n", getpid());
            fflush(stdout);
            int status;
            while (1) 
            {
                waitpid(-1, &status, WNOHANG);
                sleep(10);
            }
        }   
        
        return 0;
    }
    The output was
    $./zombie 
    4932
    4931
    
    $ps aux | grep -E '(4931|4932)'
    niam      4931  0.0  0.0   1496   336 pts/2    S+   13:07   0:00 ./zombie
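If the exit status matters, the blocking variant can be wrapped in a helper that also decodes the status. Here is a minimal sketch in C++; run_child_and_reap is a hypothetical name, while WIFEXITED and WEXITSTATUS are the standard macros from sys/wait.h.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that exits with `exit_code`, reaps it with a blocking
// waitpid and returns the exit code reported by the kernel.
// Returns -1 if fork or waitpid fails or the child did not exit normally.
int run_child_and_reap(int exit_code) {
    assert(exit_code >= 0 && exit_code <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(exit_code);              // child: terminate immediately

    int status = 0;
    pid_t reaped;
    do {
        reaped = waitpid(pid, &status, 0);   // parent: the zombie is reaped here
    } while (reaped == -1 && errno == EINTR);

    if (reaped != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Once the function returns, the child no longer shows up in the ps output, because waitpid has already collected its status.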

Another approach is to disregard the child's exit status and detach it.
  • Set the SA_NOCLDWAIT flag on the SIGCHLD signal action.
    #include <unistd.h>
    #include <stdio.h>
    #include <signal.h>
    
    int main(int argc, char **argv)
    {
        struct sigaction sa;
        sigaction(SIGCHLD, NULL, &sa);
        sa.sa_flags |= SA_NOCLDWAIT;//(since POSIX.1-2001 and Linux 2.6 and later)
        sigaction(SIGCHLD, &sa, NULL);
    
        if (fork() == 0)
        {   
            printf("%d\n", getpid());
            fflush(stdout);
            _exit(0);
        }   
        else
        {   
            printf("%d\n", getpid());
            fflush(stdout);
            while (1) 
                sleep(10);
        }   
        
        return 0;
    }
    The output should be something like
    $./zombie 
    4416
    4417
    
    $ps aux | grep -E '(4416|4417)'
    niam      4416  0.0  0.0   1496   340 pts/2    S+   12:41   0:00 ./zombie
  • Set SIGCHLD signal handler to SIG_IGN(ignore this signal).
    #include <unistd.h>
    #include <stdio.h>
    #include <signal.h>
    
    int main(int argc, char **argv)
    {
        struct sigaction sa;
        sigaction(SIGCHLD, NULL, &sa);
        sa.sa_handler = SIG_IGN;
        sigaction(SIGCHLD, &sa, NULL);
    
        if (fork() == 0)
        {   
            printf("%d\n", getpid());
            fflush(stdout);
            _exit(0);
        }   
        else
        {   
            printf("%d\n", getpid());
            fflush(stdout);
            while (1) 
                sleep(10);
        }   
        
        return 0;
    }
    This code should produce the following output.
    $./zombie 
    4458
    4459
    
    $ps aux | grep -E '(4459|4458)'
    niam      4458  0.0  0.0   1496   340 pts/2    S+   12:45   0:00 ./zombie
    Note that POSIX.1-1990 disallowed setting the action for SIGCHLD to SIG_IGN. POSIX.1-2001 allows this possibility, so that ignoring SIGCHLD can be used to prevent the creation of zombies.
According to the linux-2.6.27 sources, setting the signal handler to SIG_IGN might give a small performance benefit. Here is a piece of code from kernel/exit.c
static int ignoring_children(struct task_struct *parent)     
{                               
    int ret;                                                        
    struct sighand_struct *psig = parent->sighand;     
    unsigned long flags;        
    spin_lock_irqsave(&psig->siglock, flags);            
    ret = (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||     
           (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT));     
    spin_unlock_irqrestore(&psig->siglock, flags);     
    return ret;                
}
The kernel checks the signal handler first, so the flags are not examined at all when the handler is SIG_IGN.

There is another problem: software that you didn't develop may produce zombies while it runs. There is a trick with gdb to get rid of a process's zombies: you can attach to the parent process and call wait manually.
$./zombie 
4980
4981

$ps aux | grep -E '(4980|4981)'
niam      4980  0.0  0.0   1496   336 pts/2    S+   13:19   0:00 ./zombie
niam      4981  0.0  0.0      0     0 pts/2    Z+   13:19   0:00 [zombie] 

$gdb -p 4980
....
(gdb) call wait()
$1 = 4981

$ps aux | grep -E '(4980|4981)'
niam      4980  0.0  0.0   1496   336 pts/2    S+   13:19   0:00 ./zombie

Sunday, November 30, 2008

FreeBSD: sem_open bug

Recently I've been testing libdodo on FreeBSD 7.0.
Every time sem_open was called I received 'Bad system call' and an abort signal from the ksem_open routine.

sem_open is buggy in FreeBSD, man 3 sem_open:

BUGS
     This implementation places strict requirements on the value of name: it
     must begin with a slash (`/'), contain no other slash characters, and be
     less than 14 characters in length not including the terminating null
     character.
Anyway, I was giving sem_open a key with a leading '/' that satisfies these requirements, and it continued to fail.
Program received signal SIGSYS, Bad system call.
[Switching to Thread 0x28f01100 (LWP 100056)]
0x2891c84b in ksem_open () from /lib/libc.so.7
(gdb) bt
#0  0x2891c84b in ksem_open () from /lib/libc.so.7
#1  0x2891209c in sem_open () from /lib/libc.so.7
#2  0x28767386 in single (this=0x804b520, value=1, a_key=@0x804b4dc) at src/pcSyncProcessDataSingle.cc:58
#3  0x0804969c in __static_initialization_and_destruction_0 (__initialize_p=Variable "__initialize_p" is not available.
) at test.cc:24
#4  0x0804aaf5 in __do_global_ctors_aux ()
#5  0x080491ed in _init ()
#6  0x00000000 in ?? ()
#7  0x00000000 in ?? ()
#8  0xbfbfecbc in ?? ()
#9  0x080495a6 in _start ()
#10 0x00000001 in ?? ()
(gdb) up
#1  0x2891209c in sem_open () from /lib/libc.so.7
(gdb) 
#2  0x28767386 in single (this=0x804b520, value=1, a_key=@0x804b4dc) at src/pcSyncProcessDataSingle.cc:58
58  semaphore = sem_open(key.c_str(), O_CREAT, S_IWUSR, value);
(gdb) p key
$1 = {static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x28f4a24c "/c5b80cc1e0"}}
(gdb) p value
$2 = 1
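The problem is on the kernel side: SIGSYS means that the kernel does not provide the system call behind ksem_open. It may simply be that POSIX semaphore support (the sem kernel module on FreeBSD 7) is not loaded; otherwise I should look for another implementation of semaphores in FreeBSD.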

Thursday, November 27, 2008

gmake: command echoing

By default gmake prints each command line before it is executed.
I used to know how to suppress this output, but since I hadn't needed it for some time I actually forgot how to do it.
Recently I wanted to make the build output of libdodo pretty and tried to recall how to tell gmake not to print command lines.
The gmake manual has a chapter that describes how to achieve this.
An '@' sign before the command tells gmake to suppress the echoing of that command line:

@echo "-- Compiling: $^"
@$(CXX) $(DEFINES) $(CPPFLAGS) $(CFLAGS) -fPIC -c $^
The long g++ command line won't appear in the output; instead you'll see
-- Compiling: 'the source file'.cc

Tuesday, November 25, 2008

c++: virtual constructor

Recently I was asked about a virtual constructor in C++.
The concept of a virtual constructor confuses me.
First of all, at the construction stage the object doesn't yet have the v-table of its final class. Construction goes from the base class to the most derived one, so while a base class constructor runs, the object knows nothing about the derived classes. Even if the class could somehow know about them (if the derived v-table were available in the constructor), it would be nonsense to call functions of derived objects that haven't been constructed yet. It could cause undefined behavior if those virtual functions work with the data of their own class.
For me, the constructor is the place where the object is initialized: the class members are set to given or default values, and some actions are taken to prepare the object for further work. There shouldn't be any calls that are resolved via the v-table (to be concrete, virtual functions). If there are some and you expect them to be called in the constructor, they should probably go into a special user-defined virtual 'init' method, documented as one that must be called right after the constructor and that should be redefined in derived classes. The 'init' method should be called only when the object construction is complete.
There are some virtual constructor idioms on the Internet that suggest creating a special virtual method 'construct' that returns a new copy of the object created with 'new'. This may cause memory leaks, but smart pointers can be used to avoid them.
Anyway, these idioms hide the original meaning of the constructor: to construct the object.

Where could a 'virtual' constructor be used?
Say you have class A, which works with local data. At the construction stage you want to connect to the local storage. Later you define class B, which works with remote storage, and you try to connect to the remote storage in the constructor. You realize that the connection to the storage can be made in a virtual 'connect' method. When you call 'connect' in the constructor you expect that in case of

A *a = new B;
B::connect will be called in the constructor of A. But how would A know how to connect to the remote storage if the connection metadata only becomes available in the constructor of class B, which runs later?
With an understanding of how objects are constructed, it should be clear that constructors cannot be virtual, or, better to say, they _should_not_ rely on virtual dispatch of the functions they call.
It's much better to expose the 'connect' method for making the connection than to hide it in the constructor.
Things get worse if 'connect' throws exceptions. It's not a good idea to throw exceptions from a constructor; constructors that may throw are much harder to work with.
Usually I don't expect an exception to be thrown from a constructor, but if one is, things are really bad and the program should probably finish right there. But if I can't connect, well, I should probably wait a while and try again later. It's more expressive to put 'connect' into a try...catch block than to put
A *a = new B;
there and try to reconstruct the object each time the connection fails. That strategy also takes more memory and CPU resources.
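A small sketch of the try...catch variant. FlakyStorage is a hypothetical stand-in for a real storage class; its connect() simply fails a fixed number of times so the retry loop has something to do:

```cpp
#include <stdexcept>

// Hypothetical storage class: connect() throws `failures` times, then succeeds.
class FlakyStorage {
public:
    explicit FlakyStorage(int failures) : failures_left_(failures) {}
    void connect() {
        if (failures_left_ > 0) {
            --failures_left_;
            throw std::runtime_error("storage not reachable");
        }
        connected_ = true;
    }
    bool connected() const { return connected_; }
private:
    int failures_left_;
    bool connected_ = false;
};

// Retries connect() on the same object; returns the attempt that succeeded, or -1.
int connect_with_retry(FlakyStorage& storage, int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        try {
            storage.connect();
            return attempt;
        } catch (const std::runtime_error&) {
            // A real program would wait here before the next attempt.
        }
    }
    return -1;
}

// Convenience wrapper: build one object and report how many attempts it took.
int attempts_needed(int failures, int max_attempts) {
    FlakyStorage storage(failures);
    return connect_with_retry(storage, max_attempts);
}
```

attempts_needed(2, 5) returns 3: the object is constructed once and only the connection is retried.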

Thursday, November 20, 2008

linux: linux-gate.so.1

I recently read a very nice article by Johan Petersson about what linux-gate.so.1 is: the object that shows up as a dependency of every ELF binary built to use shared libraries on x86 Linux.
He mentions that linux-gate.so.1 always has the same address in the executable.
This is rather dangerous, as described in the paper "Exploiting with linux-gate.so.1". A process can be exploited via linux-gate.so.1 because its address is always known. Moreover, it has the same address in all ELF files on the system. Having determined the address of linux-gate.so.1 in any one ELF file on the machine, and having an exploit, you are able to take control over almost every process on the system.
It's possible to change where the vdso is placed, or to disable it completely, by writing the appropriate value to /proc/sys/vm/vdso_enabled:

0: no vdso at all
1: random free page(works only if /proc/sys/kernel/randomize_va_space set to 1)
2: top of the stack
Disabling it is not a good idea because the system can even become unusable. Putting it into a random free page is a good solution, although it may break debuggers and/or reduce performance a bit.
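To see where the vdso ends up in a given process, one can look at its line in /proc/<pid>/maps, which uses the layout "address perms offset dev inode pathname". Below is a small sketch of a helper that picks the address range out of such text. The function name mapping_range is my own, and the helper only parses text that the caller has already read:

```cpp
#include <sstream>
#include <string>

// Returns the "start-end" address range of the first mapping whose line ends
// with `name` (for example "[vdso]"), or an empty string if there is none.
// `maps_text` is expected to hold the contents of a /proc/<pid>/maps file.
std::string mapping_range(const std::string& maps_text, const std::string& name) {
    std::istringstream in(maps_text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.size() >= name.size() &&
            line.compare(line.size() - name.size(), name.size(), name) == 0) {
            return line.substr(0, line.find(' '));   // first column is the range
        }
    }
    return "";
}
```

In a real program the text would come from reading /proc/self/maps with std::ifstream. Running it several times with different vdso_enabled values shows whether the range stays fixed or moves.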

Tuesday, November 18, 2008

perl: default input and pattern-searching space

Perl has plenty of special variables.
The most commonly used is probably $_.
$_ stands for the default input and pattern-searching space.
$_ is used implicitly when reading input streams, as the default argument of many built-in functions, and as the pattern-searching space (when a match is used without the =~ operator).
$_ is also the default iterator variable in a foreach loop if no other variable is supplied.
The following block

while (<STDIN>)
{
    s/[A-Z]*//g;
    print;
}
is equivalent to
while ($_ = <STDIN>)
{
    $_ =~ s/[A-Z]*//g;
    print $_;
}
$_ is a global variable, so this can produce unwanted side effects in some cases. The output of the following code
while (<STDIN>)
{
    print;
    last;
}
print;
{
    print;
    while (<STDIN>)
    {  
        s/[A-Z]*//g;
        print;
        last;
    }  
    print;
}
print;
should be
abcABC<<-- my input string
abcABC
abcABC
abcABC
abcABC<<-- my input string
abc
abc
abc
It's possible to declare $_ with my to limit it to the scope of the enclosing block (in perl 5.9.1 and later); using our restores the global $_.
The output of this code
while (<STDIN>)
{
    print;
    last;
}
print;
{
    print;
    my $_;
    while (<STDIN>)
    {  
        s/[A-Z]*//g;
        print;
        last;
    }  
    print;
}
print;
should be
abcABC<<-- my input string
abcABC
abcABC
abcABC
abcABC<<-- my input string
abc
abc
abcABC
and with our
while (<STDIN>)
{
    print;
    last;
}
print;
{
    print;
    my $_;
    while (<STDIN>)
    {  
        s/[A-Z]*//g;
        print;
        last;
    }  
    our $_;
    print;
}
print;
should be
abcABC<<-- my input string
abcABC
abcABC
abcABC
abcABC<<-- my input string
abc
abcABC
abcABC
Unfortunately perl 5.10 is not yet the default in most Linux distributions, so some workarounds are needed to get the effect of my and our with $_, for example local $_ inside the block or a named loop variable.