Tuesday, December 23, 2008

asm: overwriting return point of the function/stack overflow

Stack overflow (more precisely, overwriting a return address stored on the stack) is a common attack technique.
To understand how it works, we need to know how a function's stack is laid out and how the function hands control back to the parent function when it returns.
On x86 *nix systems, the stack of a function at the moment it is entered looks like this:

|function parameters| <--higher memory addresses
|---return  point---| <--%esp (top of the stack)
|--local variables--|
|-------------------| <--lower memory addresses
Let's examine a simple program written in asm. It has a function pc that writes the given character to stdout followed by '\n'. In main this function is called with the argument '*'.
 1 .text
 2 
 3 pc:
 4     pushl %ebp
 5     movl %esp, %ebp
 6     
 7     subl $4, %esp /*4 bytes for local variables*/
 8     
 9     pushl 8(%ebp)/*get value of the function parameter*/
10     call putchar
11     pushl $0x0a /*new line*/
12     call putchar
13     addl $8, %esp/*allign stack*/
14     
15     movl %ebp, %esp
16     popl %ebp
17     ret
18     
19 
20 .global main
21 
22 main:
23     pushl $0x0000002a /*character '*'*/
24     call pc
25     addl $4, %esp/*allign stack*/
26     
27     movl $1, %eax
28     movl $0, %ebx
29     int $0x80/*exit(0)*/
I set a breakpoint on line 4 in gdb and looked at the registers:
(gdb) i r
eax            0xbfae0a34 -1079113164
ecx            0x312f6668 825189992
edx            0x1 1
ebx            0xb7fa4ff4 -1208332300
esp            0xbfae09a4 0xbfae09a4
ebp            0xbfae0a08 0xbfae0a08
esi            0xb7fe2ca0 -1208079200
edi            0x0 0
eip            0x8048384 0x8048384 <pc>
The value of %esp is 0xbfae09a4, so this is the top of the stack as pc is entered.
In the *nix world on x86 the stack of a process grows from higher memory addresses to lower ones. So to get the function parameter we add 4 bytes to %esp (the return point takes 4 bytes). [Note: on line 9 I pushed the value stored at %ebp + 8 because 'pushl %ebp' decreased %esp by another 4 bytes before it was copied into %ebp.]
(gdb) x/c 0xbfae09a4 + 4
0xbfae09a8: 42 '*'
Yes, here we have '*' because we pushed it onto the stack on line 23. At the address held in %esp we find the return point:
(gdb) x/x 0xbfae09a4 
0xbfae09a4: 0x080483a7
0x080483a7 should be the address of the instruction that follows the call of pc in main. Let's check.
Stepping through the instructions in gdb, I left pc:
(gdb) n
pc () at fcall.s:17
17  ret
(gdb) n
main () at fcall.s:25
25  addl $4, %esp/*allign stack*/
(gdb) i r
eax            0xa 10
ecx            0xffffffff -1
edx            0xb7fa60b0 -1208328016
ebx            0xb7fa4ff4 -1208332300
esp            0xbfae09a8 0xbfae09a8
ebp            0xbfae0a08 0xbfae0a08
esi            0xb7fe2ca0 -1208079200
edi            0x0 0
eip            0x80483a7 0x80483a7 <main+7>
You can see that the value of %eip is 0x80483a7, so we were right. To make a program run some other code instead of returning to the parent function, the return point has to be overwritten.
The following code does exactly that.
It has a function evil whose address is written over the return point of the function pc. The function evil writes '%\n' to the output and calls the exit syscall with exit code 1.
.text

evil:
    pushl %ebp
    movl %esp, %ebp

    pushl $0x00000025 /*character '%'*/
    call putchar
    pushl $0x0a /*new line*/
    call putchar

    movl $1, %eax
    movl $1, %ebx
    int $0x80/*exit(1)*/

pc:
    pushl %ebp
    movl %esp, %ebp

    subl $4, %esp /*4 bytes for local variables*/

    pushl 8(%ebp)
    call putchar
    pushl $0x0a /*new line*/
    call putchar
    addl $8, %esp/*align stack*/

    movl %ebp, %esp
    popl %ebp

    movl $evil, (%esp) /*overwrite the return point with the address of evil*/

    ret


.global main

main:
    pushl $0x0000002a /*character '*'*/
    call pc
    addl $4, %esp/*align stack*/

    movl $1, %eax
    movl $0, %ebx
    int $0x80/*exit(0)*/
The result of executing this program should be
$gcc fcall.s -o fcall -g
$./fcall 
*
%
$echo $?
1
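The return point can also be looked at without dropping to assembly. Here is a small sketch in C++. It assumes GCC or Clang, because it relies on the __builtin_return_address builtin, which is not standard C++; the function name return_point is made up for this illustration. The function only reads the address, it does not overwrite it.

```cpp
// Sketch only: relies on a GCC/Clang builtin, not on standard C++.

// noinline keeps a real call (and therefore a real return point on the
// stack) even when the compiler optimizes.
__attribute__((noinline)) void *return_point()
{
    // Level 0 asks for the return address of the current function,
    // i.e. the address of the instruction that follows the call.
    return __builtin_return_address(0);
}
```

Printing the result of return_point() from main (for example with printf and the %p format) should show an address inside main, in the same way 0x080483a7 pointed into main in the gdb session.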

Tuesday, December 16, 2008

c++: multidimensional arrays in the (dynamic) memory

I know several ways to store multidimensional arrays in (dynamic) memory.
I'd like to share them because I have noticed that not all developers understand what is going on here.
Let's look at different ways to create a 2-dimensional array of objects of class A, whose code is below.

#include <cstdlib>
#include <iostream>

using namespace std;

class A
{
    public:
        void * operator new(size_t size)
        {
            void *p = malloc(size);
            cout << "new, size: " << size << "\n";
            return p;
        }

        void * operator new[](size_t size)
        {
            void *p = malloc(size);
            cout << "new[], size: " << size << "\n";
            return p;
        }

        // memory obtained with malloc has to be returned with free
        void operator delete(void *p)
        {
            free(p);
        }

        void operator delete[](void *p)
        {
            free(p);
        }

        A() 
        {   
            cout << "A()\n";
            id = ++counter;
        }

        ~A()
        {   
            cout << "~A()\n";
        }   

        void call()
        {   
            cout << "id #" << id << ", " << counter << " times constructor of A was called\n";
        }

        static int counter;
        int id; 
};

int A::counter = 0;
I added some code for tracing calls of operator new, the constructor and the destructor.
Each time the constructor is called, the static class variable counter is incremented by 1 and its new value is assigned to the member variable id.
  • The first method is the simplest.
    Simply allocate a 2x2 array of A on the stack.
    cout << "size of A: " << sizeof(A) << "\n";
    A z[2][2];
    z[1][1].call();
    (z[1]+1)->call();
    (*z+3)->call();
    This piece of code produces
    size of A: 4
    A()
    A()
    A()
    A()
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    ~A()
    ~A()
    ~A()
    ~A()
    The constructor was called 4 times and the destructor 4 times; there were no calls of operator new.
  • The second is a bit more complex.
    Allocate memory for a 2x2 array of A on the heap.
    cout << "size of A: " << sizeof(A) << "\n";
    A (*z)[2] = new A[2][2];
    z[1][1].call();
    (z[1]+1)->call();
    (*z+3)->call();
    delete [] z;
    The output should be
    size of A: 4
    new[], size: 20
    A()
    A()
    A()
    A()
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    ~A()
    ~A()
    ~A()
    ~A()
    
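    The constructor was called 4 times, the destructor 4 times, and operator new[] once to allocate memory for all 4 objects (the extra 4 bytes are, as far as I can tell, where the compiler keeps the number of elements).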
  • The next method allocates on the heap a one-dimensional array of 2 pointers to A and then allocates memory for the one-dimensional 'sub-arrays'.
    cout << "size of A: " << sizeof(A) << "\n";
    A **z = new A*[2];
    z[0] = new A[2];
    z[1] = new A[2];
    
    z[1][1].call();
    (z[1]+1)->call();
    (*(z+1)+1)->call(); // *z+3 would run past the end of the 2-element sub-array z[0]
    
    delete [] z[0];
    delete [] z[1];
    delete [] z;
    The output is
    size of A: 4
    new[], size: 12
    A()
    A()
    new[], size: 12
    A()
    A()
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    ~A()
    ~A()
    ~A()
    ~A()
    The constructor was called 2 times after each call of operator new[] (each call allocates memory for 2 objects), and the destructor was called 4 times.
  • This method is a little bit tricky. We allocate a one-dimensional array of size 4 and simulate a two-dimensional array with pointer arithmetic (element [i][j] lives at index i*2+j).
    cout << "size of A: " << sizeof(A) << "\n";
    A *z = new A[2*2];
    z[2+1].call();
    (z+3)->call();
    delete [] z;
    The output is
    size of A: 4
    new[], size: 20
    A()
    A()
    A()
    A()
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    ~A()
    ~A()
    ~A()
    ~A()
    
    The constructor was called 4 times, the destructor 4 times, and operator new[] once to allocate memory for all 4 objects.
  • This one combines storing the 2x2 array on the heap and on the stack. First a one-dimensional array of pointers to A is put onto the stack, and then heap memory is used for the one-dimensional 'sub-arrays'.
    cout << "size of A: " << sizeof(A) << "\n";
    A *z[2];
    z[0] = new A[2];
    z[1] = new A[2];
    
    z[1][1].call();
    (z[1]+1)->call();
    (*(z+1)+1)->call(); // *z+3 would run past the end of the 2-element sub-array z[0]
    
    delete [] z[0];
    delete [] z[1];
    The output is
    size of A: 4
    new[], size: 12
    A()
    A()
    new[], size: 12
    A()
    A()
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    id #4, 4 times constructor of A was called
    ~A()
    ~A()
    ~A()
    ~A()
    
    The constructor was called 2 times after each call of operator new[] (each call allocates memory for 2 objects), and the destructor was called 4 times.
All methods have their '+'s and '-'s. One can take more time but less memory, another can take more memory but run faster. That depends on how many calls are made to allocate memory, where the memory for the array comes from, etc. Also remember the c++ restriction that the size of an array on the stack must be known at compile time. The dark side of heap memory is that it has to be released explicitly once it is no longer used. Some of the methods are easier to read and understand than others.
The choice is up to you.
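The one-dimensional trick can be wrapped so that the index arithmetic and the cleanup live in one place. Below is a minimal sketch, assuming std::vector is acceptable as the storage. The name Grid and its methods are invented for this example and are not part of the code above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: a rows x cols grid stored in one contiguous block.
template <typename T>
class Grid
{
    public:
        Grid(std::size_t rows, std::size_t cols)
            : cols_(cols), data_(rows * cols) // value-initializes every element
        {
        }

        // element [row][col] lives at index row * cols + col
        T &at(std::size_t row, std::size_t col)
        {
            return data_[row * cols_ + col];
        }

        std::size_t size() const
        {
            return data_.size();
        }

    private:
        std::size_t cols_;
        std::vector<T> data_; // releases its memory when the Grid is destroyed
};
```

With this, Grid<A> z(2, 2); z.at(1, 1).call(); would replace both the manual index computation and the explicit delete [], at the price of one heap allocation made by the vector.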

Thursday, December 11, 2008

autoconf: square brackets in AS_HELP_STRING

With autoconf (2.63) I didn't manage to get square brackets into AS_HELP_STRING at first. I tried adding extra [] around the help string, as the manual suggests:

Each time there can be a macro expansion, there is a quotation expansion, i.e., one level of quotes is stripped:
int tab[10];
     =>int tab10;
     [int tab[10];]
     =>int tab[10];

I found the solution in configure.ac of the qore programming language.
Quadrigraphs are used there:
'@<:@' stands for '[' and '@:>@' for ']' in an autoconf input file.
So now I have nice output of ./configure --help in my project:
....
  --with-mysql[=DIR]      use mysql
  --with-fcgi[=DIR]       use fast CGI
  --with-pcre[=DIR]       use pcre
....
The code in configure.in looks like:
....
AS_HELP_STRING([--with-sqlite@<:@=DIR@:>@], [use sqlite])...
....

Wednesday, December 10, 2008

emacs: the dark side of the force

Recently I decided to try the dark side of the force - emacs.
I have been a Vim user for a long time. Several times I wanted to try emacs but never had a good opportunity.
Now I'm working on a project with a huge amount of sources and I decided to try emacs for it.
It works! ;)

Playing with emacs I found out that it's not as complex as some people say.

The thing I couldn't get used to for some time is that I don't have to press ESC to switch to command mode, press i (INS) to switch to insert mode, and so on.

I haven't found some Vim features (such as visual mode), but I believe I just don't know how to make them work.

The main difference is that there is no distinction between an editing mode and a command mode. You can run commands while you are editing the text.

All commands (or, better to say, most of them) begin with a control or meta character. Control is usually Ctrl on your keyboard and meta is Alt.

For those who want to try emacs, here is a migration guide:
a table of equivalent Vim and emacs commands.

VIM | EMACS | Description
:qa/:wqa/:xa/:qa! | C-x C-c | exit (emacs prompts whether to save buffers or not)
h | C-b | left
l | C-f | right
b/B | M-b | word backward
w/W | M-f | word forward
j | C-n | down
k | C-p | up
0 | C-a | beginning of the line
$ | C-e | end of the line
gg/1G | M-< | beginning of the buffer
G | M-> | end of the buffer
x | C-d | delete under cursor
D | C-k | delete from cursor to EOL
dd | C-k C-k | delete line
dw/dW | M-d | delete word
db/dB | M-{BACKSPACE} | delete word backwards
:set ignorecase {ENTER} / | C-s {needle in lower case} | icase search forward
:set ignorecase {ENTER} ? | C-r {needle in lower case} | icase search backward
/ | C-s | search forward
? | C-r | search backward
:set ignorecase {ENTER} / | M-C-s {regexp in lower case} | icase regexp search forward
:set ignorecase {ENTER} ? | M-C-r {regexp in lower case} | icase regexp search backward
:%s/{needle}/{replacement}/gc | M-% {needle} {ENTER} {replacement} {ENTER} | query replace
/ | M-C-s | regexp search forward
? | M-C-r | regexp search backward
u | C-_/C-x u | undo
C-R | C-_/C-x u | redo (it's tricky for emacs*)
ESC | C-g | quit the running/entered command (switch to command mode in Vim)
:e file | C-x C-f | open file
:set ro | C-x C-q | set file as read-only
:w | C-x C-s | save buffer
:w file | C-x C-w file | save buffer as ...
:wa | C-x s | save all buffers
:buffers | C-x C-b | show buffers
:b [name] | C-x b [name] | switch to another buffer
:q/:q!/:wq/:x | C-x k | close buffer
C-w n/:split | C-x 2 | split horizontal
C-w v/:vsplit | C-x 3 | split vertical
C-w C-w | C-x o | switch to another window
:q | C-x 0 | close window
C-w o | C-x 1 | close other windows
:! {cmd} | M-! | run shell command
m{a-zA-Z} | C-@/C-space | set mark
- | C-x C-x | exchange mark and position
{visual}y | M-w | copy region**
{visual}d | C-w | delete region**
p | C-y | paste
C-V {key} | C-q {key} | insert special char, e.g. ^M
{visual}SHIFT-> | C-x TAB | indent region
C-] | M-. | find tag
C-t | M-* | return to previous location
:e! | M-x revert-buffer | reload buffer from disk

*To redo changes you have undone, type `C-f' or any other command that will harmlessly break the
sequence of undoing, then type more undo commands
**region is from current position to the mark

Other useful emacs commands:
M-g | go to line
C-x i | insert file
C-x h | mark whole buffer
C-x C-t | switch two lines
M-C-a | beginning of the function
M-C-e | end of the function
M-a | beginning of the statement
M-e | end of the statement
M-C-h | mark the function
M-/ | autocompletion
M-C-\ | indent region
C-c C-q | indent the whole function according to indentation style
C-c C-c | comment out marked area
M-x uncomment-region | uncomment marked area
M-, | jump to the next occurrence for tags-search
M-; | insert comment in code
C-x w h | highlight the text by regexp
C-x w r | disable highlighting the text by regexp

To run emacs without X, add the -nw command line argument.

To repeat a command several times use 'C-u {number} {command}' or 'M-{digit} {command}'.

emacs has bookmarks that are close to Vim marks:
C-x r m | set a bookmark at current cursor position
C-x r b | jump to bookmark
C-x r l | list bookmarks
M-x bookmark-write | write all bookmarks in given file
M-x bookmark-load | load bookmarks from given file

My ~/.emacs looks like:
(setq load-path (cons "~/.emacs.d" load-path))

(auto-compression-mode t) ; uncompress files before displaying them

(global-font-lock-mode t) ; use colors to highlight commands, etc.
(setq font-lock-maximum-decoration t)
(custom-set-faces)
(transient-mark-mode t) ; highlight the region whenever it is active
(show-paren-mode t) ; highlight matching parenthesis
(global-hi-lock-mode t) ; highlight region by regexp

(column-number-mode t) ; column-number in the mode line

(setq make-backup-files nil)

(setq scroll-conservatively most-positive-fixnum) ; scroll only one line when I move past the bottom of the screen

(add-hook 'text-mode-hook 'turn-on-auto-fill) ; break lines at space when they are too long

(fset 'yes-or-no-p 'y-or-n-p) ; make the y or n suffice for a yes or no question

(setq comment-style 'indent)

(global-set-key (kbd "C-x C-b") 'buffer-menu) ; buffers menu in the same window

(global-set-key (kbd "C-x 3") 'split-window-horizontally-other) ; open new window horizontally and switch to it
(defun split-window-horizontally-other ()
        (interactive)
        (split-window-horizontally)
        (other-window 1)
)

(global-set-key (kbd "C-x 2") 'split-window-vertically-other) ; open new window vertically and switch to it
(defun split-window-vertically-other ()
 (interactive)
 (split-window-vertically)
 (other-window 1)
)

(global-set-key (kbd "C-c c") 'comment-region) ; comment code block
(global-set-key (kbd "C-c u") 'uncomment-region) ; uncomment code block

(global-set-key (kbd "C-x TAB") 'tab-indent-region) ; indent region
(global-set-key (kbd "C-x <backtab>") 'unindent-region) ; unindent region
(defun tab-indent-region ()
    (interactive)
 (setq fill-prefix "\t")
    (indent-region (region-beginning) (region-end) 4)
)
(defun unindent-region ()
    (interactive)
    (indent-region (region-beginning) (region-end) -1)
)

(global-set-key (kbd "TAB") 'self-insert-command)
(global-set-key (kbd "RET") 'newline-and-indent)

(setq key-translation-map (make-sparse-keymap))
(define-key key-translation-map "\177" (kbd "C-="))
(define-key key-translation-map (kbd "C-=") "\177")
(global-set-key "\177" 'delete-backward-char)
(global-set-key (kbd "C-=") 'delete-backward-char)

(setq indent-tabs-mode t)
(setq tab-always-indent t)
(setq default-tab-width 4)

(setq inhibit-startup-message t) ; do not show startup message

(iswitchb-mode t)
(desktop-save-mode t)

(display-time)

Happy emacsing!

Tuesday, December 9, 2008

perl: manipulations with the standard output stream

The builtin functions print and write send data to STDOUT if the filehandle is omitted.
This is really handy.
But if you want to write to some other stream, you have to specify the filehandle for them each time.
It's possible to point STDOUT to some other filehandle by reassigning it

open $nfh, '>test';
$ofh = *STDOUT;
*STDOUT = *$nfh;

print "test";

*STDOUT = $ofh;
close $nfh;
or by using select function
open $nfh, '>test';
$ofh = select $nfh;

print "test";

select $ofh;
close $nfh;
and still use print and write without specifying the filehandle. The second method, with select, looks better to me.
If you switch between many outputs, a LIFO (Last-In, First-Out) stack of filehandles gives an easy way to walk through them, like pushd/popd in bash
#!/usr/bin/perl
@fh = ();

open my $nfh, '>test-0';
push @fh, select $nfh;

print "test\n";

open my $nfh, '>test-1';
push @fh, select $nfh;

print "test\n";

close select pop @fh;
print "test\n";
close select pop @fh;

print "test\n";
The result of executing this script
$./test.pl 
test
$cat test-*
test
test
test
Note that I used my with the filehandle in open to give it a fresh undefined scalar variable. Otherwise open would associate the new stream with the same variable and the old stream would be lost. The following code does the same work without my, by resetting the variable to undef
#!/usr/bin/perl
@fh = ();

open $nfh, '>test-0';
push @fh, select $nfh;

print "test\n";

$nfh = undef;
open $nfh, '>test-1';
push @fh, select $nfh;

print "test\n";

close select pop @fh;
print "test\n";
close select pop @fh;

print "test\n";
The result of executing this script
$./test.pl 
test
$cat test-*
test
test
test

Thursday, December 4, 2008

intel VT: how to enable

My Intel(R) Core(TM)2 Duo CPU supports Intel VT, but it's disabled by default in the BIOS.

I'm currently testing libdodo heavily in different environments. As I'm running an IA32 kernel, I had been testing it in x86 environments. Lately I decided to test it on x86_64 with VMWare. I went to the BIOS, turned on VT support, rebooted ... and VMWare player told me that my platform supports VT but it's disabled. What the heck? I checked several times that the option was enabled in the BIOS but still couldn't run an x86_64 guest OS.

Googling didn't help. Running the guest OS under qemu's x86_64 emulation was the last option. It's extremely slow.

When I had almost given up, I came across a very interesting article.
The author says that after the VT option is enabled in the BIOS, the machine must be completely disconnected from power.
I turned off my laptop, removed the power supply and battery, waited for a few seconds, plugged everything back in, booted and started VMWare with an x86_64 guest. BINGO.

Now I'm able to run x86_64 guests on an x86 host.

Tuesday, December 2, 2008

linux: zombies

New processes are created with the fork system call. After fork completes there are two processes - the parent and the child.

A zombie is a process that has finished before its parent while the parent has not yet made any attempt to reap it.

When the child process terminates, the kernel still holds some minimal information about it in case the parent process asks for it. This information is a task_struct where the pid, exit code, resource usage information, etc. can be found. All other resources (memory, file descriptors, etc.) are released. For details you can dig into the code of the do_exit function in kernel/exit.c in the linux kernel sources.

The parent is notified about the child's death with the SIGCHLD signal.

When the parent receives SIGCHLD it can get the information about the child by calling the wait/waitpid/... system call. In the interval between the child terminating and the parent calling wait, the child is said to be a zombie. Even though it can't be running or idle, it still occupies an entry in the process table.
As soon as the parent process calls wait, the information about the dead child is passed to it and then removed from the kernel. You can find the details in the wait_task_zombie function (kernel/exit.c).

In fact the memory held by a zombie is really small, but it still occupies an entry in the process table, and since the process table has a limited number of entries it is possible for the system to run out of them.

When the parent terminates without waiting for the child, the zombie process is adopted by the 'init' process, which calls wait to clean up after it.

Let's look at a common zombie and its code.

#include <unistd.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (fork() == 0)
    {   
        printf("%d\n", getpid());
        fflush(stdout);
        _exit(0);
    }   
    else
    {   
        printf("%d\n", getpid());
        fflush(stdout);
        while (1) 
            sleep(10);
    }   
    
    return 0;
}
The output should be something like
$./zombie 
4090
4091
Grepping ps's output I got
$ps aux | grep -E "(4090|4091)"
niam      4090  0.0  0.0   1496   340 pts/2    S+   12:24   0:00 ./zombie
niam      4091  0.0  0.0      0     0 pts/2    Z+   12:24   0:00 [zombie] 
You can see that the child process became a zombie.

Zombies may not be a complete evil but they are very close to it, so their existence should be prevented.

There are several ways to do that.

First of all, if you care why the child finished, you should call wait. This is the only way I know to find that out.

I know two modes of waiting: blocking and non-blocking. Both are listed below.
  • The blocking method suspends the parent until a child terminates.
    #include <unistd.h>
    #include <stdio.h>
    #include <sys/wait.h>
    
    int main(int argc, char **argv)
    {
        if (fork() == 0)
        {   
            printf("%d\n", getpid());
            fflush(stdout);
            _exit(0);
        }   
        else
        {   
            int status;
            wait(&status);
            printf("%d\n", getpid());
            fflush(stdout);
            while (1) 
                sleep(10);
        }   
        
        return 0;
    }
    And the resulting output for this code was
    $./zombie 
    4949
    4950
    
    $ps aux | grep -E '(4949|4950)'
    niam      4949  0.0  0.0   1496   340 pts/2    S+   13:12   0:00 ./zombie
  • The non-blocking method won't put the parent to sleep, but requires multiple calls of waitpid.
    #include <unistd.h>
    #include <stdio.h>
    #include <signal.h>
    #include <sys/wait.h>
    
    int main(int argc, char **argv)
    {
        if (fork() == 0)
        {   
            printf("%d\n", getpid());
            fflush(stdout);
            _exit(0);
        }   
        else
        {   
            printf("%d\n", getpid());
            fflush(stdout);
            int status;
            while (1) 
            {
                waitpid(-1, &status, WNOHANG);
                sleep(10);
            }
        }   
        
        return 0;
    }
    The output was
    $./zombie 
    4932
    4931
    
    $ps aux | grep -E '(4931|4932)'
    niam      4931  0.0  0.0   1496   336 pts/2    S+   13:07   0:00 ./zombie
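The two variants can be combined: instead of polling in the main loop, the parent can reap children from a SIGCHLD handler. This is only a sketch; the names reap_children, install_sigchld_reaper and reaped_children are invented for it. The handler uses only waitpid, which is safe to call from a signal handler, and loops because several children may terminate before the handler gets to run.

```cpp
#include <cerrno>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Number of children reaped so far (hypothetical name, for illustration).
static volatile sig_atomic_t reaped_children = 0;

// SIGCHLD handler: reap every child that has already terminated.
extern "C" void reap_children(int)
{
    int saved_errno = errno; // waitpid may change errno under the interrupted code

    // WNOHANG makes waitpid return immediately when nothing is left to reap
    while (waitpid(-1, nullptr, WNOHANG) > 0)
        reaped_children = reaped_children + 1;

    errno = saved_errno;
}

// Installs the handler; returns true on success.
bool install_sigchld_reaper()
{
    struct sigaction sa = {};
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = reap_children;
    // SA_RESTART: restart interrupted system calls,
    // SA_NOCLDSTOP: no SIGCHLD when a child is merely stopped
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, nullptr) == 0;
}
```

The parent would call install_sigchld_reaper() once before the first fork and can then go on with its own work. The exit status is thrown away here; a real program that cares about it would pass a status variable to waitpid instead of nullptr.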

Another approach is to disregard the child's exit status and detach it.
  • Redefine the SIGCHLD signal action to set the SA_NOCLDWAIT flag for it.
    #include <unistd.h>
    #include <stdio.h>
    #include <signal.h>
    
    int main(int argc, char **argv)
    {
        struct sigaction sa;
        sigaction(SIGCHLD, NULL, &sa);
        sa.sa_flags |= SA_NOCLDWAIT;//(since POSIX.1-2001 and Linux 2.6 and later)
        sigaction(SIGCHLD, &sa, NULL);
    
        if (fork() == 0)
        {   
            printf("%d\n", getpid());
            fflush(stdout);
            _exit(0);
        }   
        else
        {   
            printf("%d\n", getpid());
            fflush(stdout);
            while (1) 
                sleep(10);
        }   
        
        return 0;
    }
    The output should be something like
    $./zombie 
    4416
    4417
    
    $ps aux | grep -E '(4416|4417)'
    niam      4416  0.0  0.0   1496   340 pts/2    S+   12:41   0:00 ./zombie
  • Set the SIGCHLD signal handler to SIG_IGN (ignore this signal).
    #include <unistd.h>
    #include <stdio.h>
    #include <signal.h>
    
    int main(int argc, char **argv)
    {
        struct sigaction sa;
        sigaction(SIGCHLD, NULL, &sa);
        sa.sa_handler = SIG_IGN;
        sigaction(SIGCHLD, &sa, NULL);
    
        if (fork() == 0)
        {   
            printf("%d\n", getpid());
            fflush(stdout);
            _exit(0);
        }   
        else
        {   
            printf("%d\n", getpid());
            fflush(stdout);
            while (1) 
                sleep(10);
        }   
        
        return 0;
    }
    This code should produce the following output.
    $./zombie 
    4458
    4459
    
    $ps aux | grep -E '(4459|4458)'
    niam      4458  0.0  0.0   1496   340 pts/2    S+   12:45   0:00 ./zombie
    Note that POSIX.1-1990 disallowed setting the action for SIGCHLD to SIG_IGN. POSIX.1-2001 allows this possibility, so that ignoring SIGCHLD can be used to prevent the creation of zombies.
According to the linux-2.6.27 sources, setting the signal handler to SIG_IGN might give a small performance benefit. Here is a piece of code from kernel/exit.c
static int ignoring_children(struct task_struct *parent)     
{                               
    int ret;                                                        
    struct sighand_struct *psig = parent->sighand;     
    unsigned long flags;        
    spin_lock_irqsave(&psig->siglock, flags);            
    ret = (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN ||     
           (psig->action[SIGCHLD-1].sa.sa_flags & SA_NOCLDWAIT));     
    spin_unlock_irqrestore(&psig->siglock, flags);     
    return ret;                
}
The kernel checks the signal handler first, so with SIG_IGN the flags are not even examined.

There is another problem: software that wasn't developed by you but produces zombies during execution. There is a trick with gdb to get rid of a process's zombies: you can attach to the parent process and call wait manually.
$./zombie 
4980
4981

$ps aux | grep -E '(4980|4981)'
niam      4980  0.0  0.0   1496   336 pts/2    S+   13:19   0:00 ./zombie
niam      4981  0.0  0.0      0     0 pts/2    Z+   13:19   0:00 [zombie] 

$gdb -p 4980
....
(gdb) call wait()
$1 = 4981

$ps aux | grep -E '(4980|4981)'
niam      4980  0.0  0.0   1496   336 pts/2    S+   13:19   0:00 ./zombie

Sunday, November 30, 2008

FreeBSD: sem_open bug

Recently I've been testing libdodo on FreeBSD 7.0.
Every time sem_open was called I received 'bad system call' and an abort signal from the ksem_open routine.

sem_open is buggy in FreeBSD, according to man 3 sem_open:

BUGS
     This implementation places strict requirements on the value of name: it
     must begin with a slash (`/'), contain no other slash characters, and be
     less than 14 characters in length not including the terminating null
     character.
Anyway, I was giving sem_open a key shorter than 14 characters with a leading '/' and it continued to fail.
Program received signal SIGSYS, Bad system call.
[Switching to Thread 0x28f01100 (LWP 100056)]
0x2891c84b in ksem_open () from /lib/libc.so.7
(gdb) bt
#0  0x2891c84b in ksem_open () from /lib/libc.so.7
#1  0x2891209c in sem_open () from /lib/libc.so.7
#2  0x28767386 in single (this=0x804b520, value=1, a_key=@0x804b4dc) at src/pcSyncProcessDataSingle.cc:58
#3  0x0804969c in __static_initialization_and_destruction_0 (__initialize_p=Variable "__initialize_p" is not available.
) at test.cc:24
#4  0x0804aaf5 in __do_global_ctors_aux ()
#5  0x080491ed in _init ()
#6  0x00000000 in ?? ()
#7  0x00000000 in ?? ()
#8  0xbfbfecbc in ?? ()
#9  0x080495a6 in _start ()
#10 0x00000001 in ?? ()
(gdb) up
#1  0x2891209c in sem_open () from /lib/libc.so.7
(gdb) 
#2  0x28767386 in single (this=0x804b520, value=1, a_key=@0x804b4dc) at src/pcSyncProcessDataSingle.cc:58
58  semaphore = sem_open(key.c_str(), O_CREAT, S_IWUSR, value);
(gdb) p key
$1 = {static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x28f4a24c "/c5b80cc1e0"}}
(gdb) p value
$2 = 1
The bug seems to be somewhere in the kernel. I should look for another implementation of semaphores on FreeBSD.
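To rule out the name itself before blaming the kernel, the rules from the BUGS section can be checked in code. This is a sketch that only mirrors the quoted man page text; the function name is_valid_freebsd_sem_name is made up and is not part of libdodo or FreeBSD.

```cpp
#include <string>

// Hypothetical helper: checks a name against the rules quoted from the
// FreeBSD sem_open(3) BUGS section.
bool is_valid_freebsd_sem_name(const std::string &name)
{
    if (name.empty() || name[0] != '/')
        return false;                       // must begin with a slash
    if (name.find('/', 1) != std::string::npos)
        return false;                       // no other slash characters
    return name.size() < 14;                // less than 14 characters, '\0' not counted
}
```

The key from the gdb session, "/c5b80cc1e0", passes this check, which supports the conclusion that the name was not the reason for the failure.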

Thursday, November 27, 2008

gmake: command echoing

By default gmake prints each command line before it is executed.
I knew how to suppress this output, but since I hadn't needed it for some time I actually forgot how to do it.
Recently I wanted to make the build output of libdodo pretty and tried to recall how to tell gmake not to print command lines.
The gmake manual has a chapter that describes how to achieve this.
An '@' sign before the command tells gmake to suppress the echoing of that command line:

@echo "-- Compiling: $^"
@$(CXX) $(DEFINES) $(CPPFLAGS) $(CFLAGS) -fPIC -c $^
The long g++ command line won't appear in the output; instead you'll see
-- Compiling: 'the source file'.cc

Tuesday, November 25, 2008

c++: virtual constructor

Recently I was asked about a virtual constructor in c++.
The concept of a virtual constructor confuses me.
First of all, while a constructor runs the object does not yet have the v-table of the derived class. Construction goes from the base class down to the most derived one, so inside a base class constructor the object knows nothing about the derived classes. Even if it could somehow know (if the derived v-table were available in the constructor), it's nonsense to call functions of derived objects that haven't been constructed yet. It would cause undefined behavior if those virtual functions worked with the data of their own class.
For me, the constructor is the place where the object is initialized - the class members get given or default values and some actions prepare the object for further work. There shouldn't be any calls that are resolved via the v-table (to be concrete - virtual functions). If there are some, and you expect them to be called during construction, they should probably be called in a special user-defined virtual 'init' method; the documentation should state that it has to be called right after the constructor and that it should be redefined in derived classes. Only when the object construction is complete should the 'init' method be called.
There are some virtual constructor idioms on the Internet which suggest creating a special virtual method 'construct' that returns a new copy of the object created with 'new'. This may cause memory leaks, but smart pointers can be used to avoid them.
Anyway, these idioms hide the initial meaning of the constructor - to construct the object.

Where could a 'virtual' constructor be used?
Say you have class A, which works with local data. At construction time you want to connect to the local storage. Later you define class B, which works with remote storage, and you try to connect to the remote storage in the constructor. You realize that the connection to the storage can be made in a virtual 'connect' method. When you call 'connect' in the constructor, you expect that in the case of

A *a = new B;
B::connect will be called in the constructor of A. But how would A know how to connect to the remote storage if the connection metadata only becomes available in the constructor of class B, which runs later?
Once the concepts of object construction are understood, it should be clear that constructors cannot be virtual, or better to say, they _should_not_ rely on virtual dispatch.
It's much better to expose the 'connect' method for making the connection than to hide it in the constructor.
Worse can happen if 'connect' throws exceptions. It's not a good idea to throw exceptions in a constructor: constructors that may throw are much harder to work with.
Usually I don't expect an exception to be thrown from a constructor; if one is, things are really bad and the program should probably finish there. But if I can't connect, well, I should probably wait for a while and try to connect later. It's more expressive to put 'connect' into a try...catch block than to put
A *a = new B;
there and reconstruct the object each time the connection fails. That strategy also takes more memory and CPU resources.
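To make the storage example concrete, here is a minimal sketch. The names Storage, RemoteStorage, log, init and construct_and_init are made up for illustration and do not come from any real library. The base constructor calls connect and gets the base version even though a derived object is being built; the same call made from init, after construction is complete, reaches the derived version.

```cpp
#include <string>

// Hypothetical base class: records which connect() version ran.
class Storage
{
public:
    Storage() { log = connect(); }           // resolves to Storage::connect here
    virtual ~Storage() {}

    virtual std::string connect() { return "local"; }

    // Two-phase initialization: call it after the constructor has finished.
    void init() { log += "," + connect(); }  // dispatched to the dynamic type

    std::string log;
};

class RemoteStorage : public Storage
{
public:
    virtual std::string connect() { return "remote"; }
};

// Builds a derived object, runs init() and returns what was recorded.
std::string construct_and_init()
{
    RemoteStorage storage;
    storage.init();
    return storage.log;
}
```

The recorded string shows one call per phase: the first entry comes from the constructor, the second from init.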

Thursday, November 20, 2008

linux: linux-gate.so.1

I recently read a very nice article by Johan Petersson about linux-gate.so.1, which is linked to all ELF binaries (those compiled to use shared libraries) on x86 linux.
He mentioned that linux-gate.so.1 always has the same address in the executable.
This is rather dangerous, as described in the "Exploiting with linux-gate.so.1" paper. A process can be exploited via linux-gate.so.1 because its address is always known. Moreover, it has the same address in all ELF files on the system. Having determined the address of linux-gate.so.1 in any ELF file on the machine, and having an exploit, you are able to take control over almost every process in the system.
It's possible to change the vdso address or disable it completely by writing the appropriate value to /proc/sys/vm/vdso_enabled:

0: no vdso at all
1: random free page (works only if /proc/sys/kernel/randomize_va_space is set to 1)
2: top of the stack
Disabling it is not a good idea because the system can even become unusable. Putting it into a random free page is a good solution, although it may break debuggers and/or reduce performance a bit.

Tuesday, November 18, 2008

perl: default input and pattern-searching space

Perl has plenty of special variables.
The most used one is probably $_.
$_ stands for the default input and pattern-searching space.
$_ is implicitly assigned when a line is read from an input stream as the sole condition of a while loop. It is also the default argument of many built-in functions (such as print) and the pattern-searching space (when a match or substitution is used without the =~ operator).
$_ is also the default iterator variable in a foreach loop if no other variable is supplied.
The following block

while (<STDIN>)
{
    s/[A-Z]*//g;
    print;
}
is equivalent to
while ($_ = <STDIN>)
{
    $_ =~ s/[A-Z]*//g;
    print $_;
}
$_ is a global variable, so this can produce unwanted side effects in some cases. The output of the following code
while (<STDIN>)
{
    print;
    last;
}
print;
{
    print;
    while (<STDIN>)
    {  
        s/[A-Z]*//g;
        print;
        last;
    }  
    print;
}
print;
should be
abcABC<<-- my input string
abcABC
abcABC
abcABC
abcABC<<-- my input string
abc
abc
abc
It's possible to declare $_ with my to make it local to the enclosing block (in perl 5.9.1 and later); a subsequent our restores the global $_.
The output of this code
while (<STDIN>)
{
    print;
    last;
}
print;
{
    print;
    my $_;
    while (<STDIN>)
    {  
        s/[A-Z]*//g;
        print;
        last;
    }  
    print;
}
print;
should be
abcABC<<-- my input string
abcABC
abcABC
abcABC
abcABC<<-- my input string
abc
abc
abcABC
and with our
while (<STDIN>)
{
    print;
    last;
}
print;
{
    print;
    my $_;
    while (<STDIN>)
    {  
        s/[A-Z]*//g;
        print;
        last;
    }  
    our $_;
    print;
}
print;
should be
abcABC<<-- my input string
abcABC
abcABC
abcABC
abcABC<<-- my input string
abc
abcABC
abcABC
Unfortunately perl 5.10 is not the default in most linux distributions yet, so some workarounds are needed to achieve the functionality of my and our with $_.

Friday, November 14, 2008

c++: name lookup changes in g++ 4.3

Recently I failed to compile my code with g++ 4.3. The error message was something like this:

test.cc:2: error: changes meaning of ‘A’ from ‘class A’
I created a test case to isolate the problem:
class A {};

class B
{
    void foo(const A &a){}
    void A(){}
};
If you try to compile this code with g++ 4.3 you will get
(~~) g++ test.cc -c
test.cc:7: error: declaration of ‘void B::A()’
test.cc:2: error: changes meaning of ‘A’ from ‘class A’
What's going on here?
Method A of class B changes the meaning of the parameter type in method foo.
gcc 4.3 now reports an error in certain situations in c++ where a given name may refer to more than one type or function. Here the name A refers to both the function B::A and the class A.

The reason this isn't allowed in c++ is that if we write A() inside the definition of B, it is ambiguous whether we want to create an object of class A or call this->A().
There are two ways to fix such code.
The first is to rename one of the names. That's not always possible if you really want these names to stay. There are not a lot of synonyms :).
The second is more technical: qualify one of the names so that it is not looked up in the class scope:
class A {};

class B
{
    void foo(const ::A &a){}
    void A(){}
};
This code compiles without any errors from gcc.

The gcc 4.3 release notes additionally mention that the -fpermissive option can be used as a temporary workaround to turn the error into a warning until the code is fixed.

I wondered how gcc dealt with this in previous versions. The following code won't compile with g++ 4.3, but it will with the -fpermissive option. Let's see how g++ deals with this ambiguous situation.
#include <iostream>

using namespace std;

class A
{
    public:
        A()
        {  
            std::cout << "A::A" << std::endl;
        }  
        void operator ()()
        {  
            std::cout << "A::operator ()" << std::endl;
        }  
};

class B
{
    public:
        void operator ()()
        {  
            std::cout << "B::operator ()" << std::endl;
        }  
        void foo(A a) 
        {  
        }  
        B A()
        {  
            std::cout << "B::A" << std::endl;

            return B();
        }  
        void bar()
        {  
            ::A()();
            A()();
        }  
};

int main(int argc, char **argv)
{
    B b; 
    b.bar();

    return 0; 
}
(~~) g++ test.cc -fpermissive -o test
test.cc:34: warning: declaration of ‘B B::A()’
test.cc:7: warning: changes meaning of ‘A’ from ‘class A’
(~~) ./test 
A::A
A::operator ()
B::A
B::operator ()
You can see that within the current scope the name A means the function B::A; we have to use the scope resolution operator to get class A constructed.
To be honest, I always expected such behavior and I'm not sure why this should be considered ambiguous.
If we use the name A, we get the one closest to the current scope; to use the name from another scope we use the scope resolution operator.
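The lookup rule described above can be shown without -fpermissive, as long as the class never uses the unqualified name A as a type. A minimal sketch; the members id and use are invented for this example:

```cpp
class A
{
public:
    int id() const { return 1; }
};

class B
{
public:
    int A() { return 2; }                // member function named like the class

    int use()
    {
        ::A object;                      // qualified name: the global class A
        return object.id() * 10 + A();   // unqualified A() means this->A()
    }
};
```

Here use() combines both results, so the value it returns tells which A each expression picked.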

Thursday, November 13, 2008

bash: defend yourself from overwriting files

I suppose almost everybody has typed '>' instead of '>>' when redirecting output to a file.
In bash it's possible to set the noclobber option to avoid overwriting files.

$touch test
$set -o noclobber
$echo test > test
bash: test: cannot overwrite existing file
If you really know what you are doing, you can use '>|' to overwrite the file anyway.
$set -o | grep noclobber
noclobber       on
$echo test >| test
$cat test
test
A very useful option, and I think it should be on by default.

Tuesday, November 11, 2008

c: variable length arrays

Another feature of c99 is variable length arrays.
Before c99 the size of an array had to be known at compile time. A variable length array is an array of automatic storage duration whose length is determined at run time.

int size = strlen(*argv);
char array[size];

Variable length arrays can be declared only at block scope. Their memory is taken from the stack, so at file (global) scope you still cannot define them.

This is not the same as alloca.
The space of a variable length array is freed at the end of the scope of the array's name, while the space allocated with alloca remains until the end of the function.
Using alloca within a loop, it's possible to allocate an additional block on each iteration. This is impossible with variable length arrays.

c++ doesn't have this feature but g++ supports it as an extension.
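Since standard c++ has no variable length arrays, portable c++ code usually reaches for std::vector when the length is known only at run time. A minimal sketch, with the helper name copy_with_runtime_length invented for this example; note that, unlike a variable length array or alloca, the storage comes from the heap and is released when the vector goes out of scope:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Copies text (including the terminating NUL) into a buffer whose
// length is computed at run time, and returns the buffer length.
std::size_t copy_with_runtime_length(const char *text)
{
    std::size_t size = std::strlen(text) + 1;
    std::vector<char> buffer(size);          // run-time length, heap storage
    std::memcpy(&buffer[0], text, size);
    return buffer.size();
}
```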

Sunday, November 2, 2008

c++: virtual inheritance

Virtual inheritance is an important thing when we are talking about multiple inheritance.
You will often meet the term 'virtual base class', which means that a base class appearing more than once in the inheritance tree is shared between the derived classes.
Let's look at an inheritance tree without a virtual base class

class A
{
};
class B : public A
{
};
class D : public A
{
};
class E : public B, public D
{
};
Class E will have 2 copies of A (inherited through B and D). To be more concrete, let's look at the output of the g++ option -fdump-class-hierarchy
Class A
   size=1 align=1
   base size=0 base align=1
A (0xb7f32680) 0 empty

Class B
   size=1 align=1
   base size=1 base align=1
B (0xb7f326c0) 0 empty
  A (0xb7f32700) 0 empty

Class D
   size=1 align=1
   base size=1 base align=1
D (0xb7f32740) 0 empty
  A (0xb7f32780) 0 empty

Class E
   size=2 align=1
   base size=2 base align=1
E (0xb7f327c0) 0 empty
  B (0xb7f32800) 0 empty
    A (0xb7f32840) 0 empty
  D (0xb7f32880) 1 empty
    A (0xb7f328c0) 1 empty
The most obvious issue is the overhead: E contains 2 instances of A (the entries at 0xb7f32840 and 0xb7f328c0).
The other issue is that you cannot call methods of A from E directly, because there is no unique path from E to A. The following code won't compile: the compiler reports that the reference to methodA is ambiguous.
class A
{
    public:
        virtual void methodA(){}
};
class B : public A
{
};
class D : public A
{
};
class E : public B, public D
{
    virtual void methodE(){ methodA(); }
};
In this case you should explicitly call methodA either through B or through D
virtual void methodE(){ B::methodA(); D::methodA(); }
You will also face a problem when using A as a base of E
A *a = new E;//‘A’ is an ambiguous base of ‘E’
You may say you can do something like this
A *a;
E *e = new E;
void *v = (void *)e;
a = (A *)v;
But there is no chance of getting defined behavior out of this piece of c-ish code.

Now let's look at how things change with a virtual base class.
class A
{
};
class B : virtual public A
{
};
class D : virtual public A
{
};
class E : public B, public D
{
};
Class A is a virtual base class in the code above. Let's look at what g++ says
Class A
   size=1 align=1
   base size=0 base align=1
A (0xb7f7a680) 0 empty

Class B
   size=4 align=4
   base size=4 base align=4
B (0xb7f7a6c0) 0 nearly-empty
    vptridx=0u vptr=((& B::_ZTV1B) + 12u)
  A (0xb7f7a700) 0 empty virtual
      vbaseoffset=-0x00000000c

Class D
   size=4 align=4
   base size=4 base align=4
D (0xb7f7a7c0) 0 nearly-empty
    vptridx=0u vptr=((& D::_ZTV1D) + 12u)
  A (0xb7f7a800) 0 empty virtual
      vbaseoffset=-0x00000000c

Class E
   size=8 align=4
   base size=8 base align=4
E (0xb7f7a880) 0
    vptridx=0u vptr=((& E::_ZTV1E) + 12u)
  B (0xb7f7a8c0) 0 nearly-empty
      primary-for E (0xb7f7a880)
      subvttidx=4u
    A (0xb7f7a900) 0 empty virtual
        vbaseoffset=-0x00000000c
  D (0xb7f7a940) 4 nearly-empty
      subvttidx=8u vptridx=12u vptr=((& E::_ZTV1E) + 24u)
    A (0xb7f7a900) alternative-path
Class E has one instance of A (the entry at 0xb7f7a900). We got rid of the overhead. In the output you can see that there is only one path from E to A. There is no problem compiling the following code
class E : public B, public D
{
    virtual void methodE(){ methodA(); }
};
And A can be used as a base class of E
A *a = new E;
With virtual inheritance we get a better object model, but we can lose some performance because the paths between base and derived classes are resolved at run time through the v-table. With small classes we can even get a size overhead if the v-table is pretty big.

The other thing you should be aware of is that casts which bypass the type system (through void * or reinterpret_cast) between derived and base classes may break your program, because the virtual base is no longer at a fixed offset inside the object and classes with a v-table are no longer POD (Plain Old Data) types. Converting a derived pointer to the base is done implicitly by the compiler. Going from a virtual base back to a derived class cannot be done with static_cast at all; use dynamic_cast instead, which requires the base class to be polymorphic.
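A minimal sketch of the points above, reusing the A/B/D/E hierarchy with a virtual destructor added so that dynamic_cast is allowed. The helper names shares_single_base and round_trip, and the value returned by methodA, are invented for this example:

```cpp
class A
{
public:
    virtual ~A() {}                      // makes A polymorphic for dynamic_cast
    virtual int methodA() { return 42; }
};
class B : virtual public A {};
class D : virtual public A {};
class E : public B, public D
{
public:
    int methodE() { return methodA(); }  // unambiguous: there is only one A
};

// True when the A reached through B and the A reached through D coincide.
bool shares_single_base(E &e)
{
    A *through_b = static_cast<B *>(&e); // implicit B* -> A* conversion
    A *through_d = static_cast<D *>(&e); // implicit D* -> A* conversion
    return through_b == through_d;
}

// Converts to the virtual base implicitly and comes back with dynamic_cast.
bool round_trip(E &e)
{
    A *base = &e;
    return dynamic_cast<E *>(base) == &e;
}
```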

Wednesday, October 29, 2008

c++: specialization of template function

There are some differences between template classes and template functions.
In c++ it isn't possible to overload classes but it's possible to overload functions.

Since you are able to overload functions, you should consider what you are actually doing.
Let's glance at the code below

template<typename T>
void func(T t) 
{
}

template<>
void func<int *>(int *t)
{
}

template<typename T>
void func(T *t)
{
}
Here we have a template function, a specialization of that template function, and an overloaded template function (which is still a primary template).
Which function will be called in case of
int *i = new int(10);
func(i);
Let's see. We have 2 primary templates and 1 template specialization. According to the overload resolution rules of the standard, non-template functions have the highest priority (we don't have any here). If there are no non-template functions (or none of them is suitable), the primary templates are examined and the most specialized one is chosen; if none of them is more specialized than the others, the call is ambiguous and the developer has to specify the template function explicitly. Only after a primary template has been chosen does the compiler look at its specializations and use the one that matches the deduced arguments, if there is one.
In the code above the second primary template is the more specialized one for an int * argument and it has no specializations, so the primary template function will be called
template<typename T> void func(T *t)
Is it possible to get a specialized template function called for the code below?
int *i = new int(10); func(i);
The answer is yes, and what should be done is to specialize the overloaded template function
template<>
void
func<int>(int *t)
{
}
This is the specialization of the
template<typename T>
void func(T *t)
{
}
The specialization should appear _after_ the primary template in the code, so the complete piece of code is
template<typename T>
void func(T t) 
{
}

template<>
void func<int *>(int *t)
{
}

template<typename T>
void func(T *t)
{
}

template<>
void func<int>(int *t)
{
}
With this code
template<> void func<int>(int *t)
will be called for
int *i = new int(10);func(i);
The key is that specialization is not overloading; to be more clear, specializations _do_not_ take part in overload resolution. A specialization of a template function is examined only after its primary template has been chosen.
The better way I see here is to create a non-template function
void func(int *t)
{
}
This one will be chosen first if there is no ambiguity in the prototypes.
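The three situations discussed above can be checked side by side. In this sketch the functions return a string naming themselves instead of returning void, and each variant lives in its own namespace; the namespace names and the returned strings are made up for illustration:

```cpp
#include <string>

namespace first_listing       // two primary templates, one specialization
{
    template<typename T> std::string func(T)   { return "primary func(T)"; }
    template<> std::string func<int *>(int *)  { return "specialization of func(T)"; }
    template<typename T> std::string func(T *) { return "primary func(T *)"; }
}

namespace complete_listing    // plus a specialization of the second primary
{
    template<typename T> std::string func(T)   { return "primary func(T)"; }
    template<> std::string func<int *>(int *)  { return "specialization of func(T)"; }
    template<typename T> std::string func(T *) { return "primary func(T *)"; }
    template<> std::string func<int>(int *)    { return "specialization of func(T *)"; }
}

namespace with_plain_function // a non-template overload wins over both
{
    template<typename T> std::string func(T *) { return "primary func(T *)"; }
    template<> std::string func<int>(int *)    { return "specialization of func(T *)"; }
    std::string func(int *)                    { return "non-template func(int *)"; }
}
```

Calling func with an int * argument in each namespace returns the name of the function that overload resolution actually selected.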

Tuesday, October 28, 2008

c: designated initializers for structures

c99 came with a feature of designated initializers.
A designated initializer, or designator, points out a particular element to be initialized. A designator list is a comma-separated list of one or more designators.

struct
{
    int a; 
    int b; 
    int c; 
} s = {.a = 1, .c = 2};
Unfortunately c++ doesn't have this very handy feature (a restricted form was only added in C++20).

Sunday, October 26, 2008

c/c++: call stack

A lot of languages such as python, java, ruby etc. produce a call trace on an exception.

In c/c++ you don't have such an opportunity.
The call trace can be viewed only by playing with a debugger.

In glibc (since 2.1) you can find the backtrace function declared in execinfo.h
This function returns a backtrace for the calling program (the return addresses of the active function calls).
Using backtrace together with backtrace_symbols you can get the symbol names.
According to the man pages, the symbolic representation of each address returned by backtrace_symbols consists of the function name (if this can be determined), a hexadecimal offset into the function, and the actual return address (in hexadecimal). It should look like:

message: ./a.out(_Z1cv+0x19) [0x80488ad]
message: ./a.out(_Z1bv+0xb) [0x804896f]
message: ./a.out(_Z1av+0xb) [0x804897c]
message: ./a.out(main+0x16) [0x8048994]
message: /lib/libc.so.6(__libc_start_main+0xfb) [0xb7e2764b]
In c++, due to function overloading, namespaces, etc., function names are mangled, so this traceback is hard to read. I really didn't want to parse each string to extract the function names, so I tried dladdr, a glibc extension to dlfcn. dladdr returns the function name and the pathname of the shared object that contains the function. Unfortunately dladdr still returns the mangled name of the function.
Unexpectedly I found __cxa_demangle hidden in the caves of g++. This function demangles a function name, which is exactly what I was looking for.
A short example of traceback generation:
#include <stdio.h>
#include <signal.h>
#include <execinfo.h>
#include <cxxabi.h>
#include <dlfcn.h>
#include <stdlib.h>

void c()
{
    using namespace abi;

    enum
    {  
        MAX_DEPTH = 10 
    }; 

    void *trace[MAX_DEPTH];

    Dl_info dlinfo;

    int status;
    const char *symname;
    char *demangled;

    int trace_size = backtrace(trace, MAX_DEPTH);

    printf("Call stack: \n");

    for (int i=0; i<trace_size; ++i)
    {  
        if(!dladdr(trace[i], &dlinfo))
            continue;

        symname = dlinfo.dli_sname;

        /* dli_sname is NULL when no symbol matches the address */
        demangled = symname ? __cxa_demangle(symname, NULL, 0, &status) : NULL;
        if(demangled && status == 0)
            symname = demangled;

        printf("object: %s, function: %s\n", dlinfo.dli_fname, symname ? symname : "??");

        if (demangled)
            free(demangled);
    }  
}

void b()
{
    c();
}

void a()
{
    b();
}

int main(int argc, char **argv)
{
    a();

    return 0;
}
The executable should be linked with the -rdynamic g++ flag, which instructs the linker to add all symbols, not only the used ones, to the dynamic symbol table.
g++ test.cc -ldl -rdynamic -o test
The output:
Call stack: 
object: ./test, function: c()
object: ./test, function: b()
object: ./test, function: a()
object: ./test, function: main
object: /lib/libc.so.6, function: __libc_start_main
If gcc optimizations are enabled you may not see the whole traceback in some cases. With -O2/-O3
Call stack: 
object: ./test, function: c()
object: ./test, function: main
object: /lib/libc.so.6, function: __libc_start_main
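The demangling step on its own can be wrapped in a small helper. This is a sketch that assumes a toolchain providing <cxxabi.h> (g++ and compatible compilers); the function name demangle is invented here. When the name cannot be demangled, the input is returned unchanged:

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Returns the demangled form of a symbol name, or the name itself
// if __cxa_demangle rejects it. The argument must not be NULL.
std::string demangle(const char *mangled)
{
    int status = 0;
    char *text = abi::__cxa_demangle(mangled, NULL, NULL, &status);
    std::string result = (status == 0 && text) ? text : mangled;
    std::free(text);                     // buffer was allocated with malloc
    return result;
}
```

For the mangled names in the backtrace_symbols output above, demangle("_Z1cv") gives the c() seen in the demangled call stack.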

perl: $#

I was confused by some perl tutorials/books that claim you can use $# to get the size of an array.

Actually $#array returns the last index of the array. Things can get messed up by $[, the special perl variable that holds the index of the first element of an array.
But if the environment hasn't been modified, $#array returns 'size of the array' - 1.

To get the number of elements, use the array in scalar context or pass it to the scalar function.

Please don't use $# to get the number of elements in the array. You may confuse your followers and yourself.

Saturday, October 25, 2008

c: initializing arrays

There are various ways to initialize an array in c99.

A one-dimensional array can be initialized by simply enumerating the values sequentially. Elements not initialized explicitly are initialized with zeros.

int array[3] = {1, 2, 3};
Also you can specify values with designated initializers.
int array[3] = {1, [2] = 3};
In the example above array[0] is 1, array[1] is 0(implicitly initialized) and array[2] is 3.
Multidimensional arrays can also be initialized by enumerating the values sequentially.
int array[2][2] = {1, 2, 3, 4};
The values 1, 2, 3, 4 will be assigned to array[0][0], array[0][1], array[1][0], array[1][1] correspondingly. Grouping the values by nesting level is more expressive.
int array[2][2] = {{1, 2}, {3, 4}};
Like one-dimensional arrays, multidimensional ones can be initialized with designated initializers, assigning a value to each element
int array[2][2] = {[0][0] = 1, [0][1] = 2, [1] = {3, 4}};
or grouping by level
int array[2][2] = {[0] = {1, 2}, [1] = {3, 4}};

Tuesday, October 21, 2008

linux: change the process' name

From time to time developers want to modify the command line of a program as shown by the ps or top utilities.
There is no API in linux for modifying the command line.
This article is linux specific, but I hope that by following these steps you can do it in almost every OS (BTW, FreeBSD has the setproctitle routine that should do the job).

Both ps and top come from the procps package in all linux distributions I have worked with. So they share the same base; let's stick with ps because it's a bit easier to investigate.
To figure out how ps gets information about processes, let's simply trace it

$strace ps aux 2>&1 1>/dev/null|grep $$
stat64("/proc/4391", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/4391/stat", O_RDONLY)       = 6
read(6, "4391 (bash) S 4388 4391 4349 348"..., 1023) = 222
open("/proc/4391/status", O_RDONLY)     = 6
open("/proc/4391/cmdline", O_RDONLY)    = 6
readlink("/proc/4391/fd/2", "/dev/pts/1", 127) = 10
read(6, "4760 (strace) S 4391 4760 4349 3"..., 1023) = 208
read(6, "4761 (grep) S 4391 4760 4349 348"..., 1023) = 196
read(6, "grep\0004391\0", 2047)         = 10
As expected it reads information from procfs.
$cat /proc/$$/cmdline
bash
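Note that the arguments in /proc/<pid>/cmdline are separated by NUL bytes, as the read of "grep\0004391\0" in the strace output shows; cat simply does not make them visible. A minimal sketch of splitting such a buffer into arguments; split_cmdline is a name made up for this example:

```cpp
#include <string>
#include <vector>

// Splits the raw content of /proc/<pid>/cmdline, where every
// argument is terminated by a NUL byte, into separate strings.
std::vector<std::string> split_cmdline(const std::string &raw)
{
    std::vector<std::string> args;
    std::string::size_type start = 0;
    while (start < raw.size()) {
        std::string::size_type end = raw.find('\0', start);
        if (end == std::string::npos)    // last argument without a NUL
            end = raw.size();
        args.push_back(raw.substr(start, end - start));
        start = end + 1;
    }
    return args;
}
```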
Now prepare yourself, I'm going to dig into the kernel sources.

Looking into fs/proc/base.c I found the desired function, proc_pid_cmdline, which is used to show the command line in /proc/<pid>/cmdline
The part of it that we are interested in:
static int proc_pid_cmdline(struct task_struct *task, char * buffer) {
...
    struct mm_struct *mm = get_task_mm(task);

...

    len = mm->arg_end - mm->arg_start;
 
    if (len > PAGE_SIZE)
        len = PAGE_SIZE;

    res = access_process_vm(task, mm->arg_start, buffer, len, 0);

    // If the nul at the end of args has been overwritten, then
    // assume application is using setproctitle(3).
    if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) {
        len = strnlen(buffer, res);
        if (len < res) {
            res = len;
        } else {
            len = mm->env_end - mm->env_start;
            if (len > PAGE_SIZE - res)
                len = PAGE_SIZE - res;
            res += access_process_vm(task, mm->env_start, buffer+res, len, 0);
            res = strnlen(buffer, res);
        }    
    }    
...
It's funny, but the comments mention setproctitle(3). After I saw these lines I tried to find setproctitle for linux but failed. It's interesting that it is mentioned here, as it is available in FreeBSD but not in linux.
Anyway, let's move forward.
The most interesting part here is
    len = mm->arg_end - mm->arg_start;

    if (len > PAGE_SIZE)
        len = PAGE_SIZE;

    res = access_process_vm(task, mm->arg_start, buffer, len, 0);
access_process_vm, defined in mm/memory.c, accesses another process' address space. The prototype:
int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write)
The fifth argument, write, is a flag: if it is 0, access_process_vm reads len bytes of the process' memory, starting at address addr, into buf.
Going back to proc_pid_cmdline, we can see that the string with the command line starts at mm->arg_start and its length is mm->arg_end - mm->arg_start, but not more than PAGE_SIZE. PAGE_SIZE is 4096 bytes, as defined in include/asm-i386/page.h
#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
Well, now we know that /proc/<pid>/cmdline shows at most 4KB, and that is how much we can usefully write.

Who fills the memory between mm->arg_start and mm->arg_end, and what is stored there?
I hope everybody uses the ELF binary format now, so let's go to fs/binfmt_elf.c
The function that sets up the process environment is create_elf_tables.
The part of it we are actually interested in:
static int
create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
        unsigned long load_addr, unsigned long interp_load_addr) {
...
    /* Populate argv and envp */
    p = current->mm->arg_end = current->mm->arg_start;
    while (argc-- > 0) {
        size_t len;
        if (__put_user((elf_addr_t)p, argv++))
            return -EFAULT;
        len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
        if (!len || len > MAX_ARG_STRLEN)
            return -EINVAL;
        p += len;
    }
    if (__put_user(0, argv))
        return -EFAULT;
    current->mm->arg_end = current->mm->env_start = p;
    while (envc-- > 0) {
        size_t len;
        if (__put_user((elf_addr_t)p, envp++))
            return -EFAULT;
        len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
        if (!len || len > MAX_ARG_STRLEN)
            return -EINVAL;
        p += len;
    }
    if (__put_user(0, envp))
        return -EFAULT;
    current->mm->env_end = p;
...
According to create_elf_tables, the argument strings start at current->mm->arg_start, current->mm->arg_end marks the end of the arguments and the start of the environment strings (current->mm->env_start), and current->mm->env_end marks the end of the environment.

To modify the cmdline of the process you have to overwrite the memory starting at current->mm->arg_start. A title longer than the original arguments runs past current->mm->arg_end into the environment, so to keep the program's integrity the environment has to be moved first.

Looking into the sources of getenv/setenv we can see that they access the environ variable (environ(7)). environ is declared in <unistd.h> when _GNU_SOURCE is defined, but it's preferred to declare it in the user program as
extern char **environ;
The following code rewrites the cmdline of the process and moves the environment to its new 'home'.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <limits.h>
#include <sys/user.h>

extern char **environ;

int main(int argc, char **argv)
{   
    unsigned int pid = getpid();
    char proc_pid_cmdline_path[PATH_MAX];
    char cmdline[PAGE_SIZE];
    
    sprintf(proc_pid_cmdline_path, "/proc/%d/cmdline", pid);
    
    FILE *proc_pid_cmdline =  fopen(proc_pid_cmdline_path, "r");
    fgets(cmdline, PAGE_SIZE, proc_pid_cmdline);
    fclose(proc_pid_cmdline);
    
    printf("%s : %s\nenvironment variable HOME = %s\n", proc_pid_cmdline_path, cmdline, getenv("HOME"));
    
    int env_len = -1;
    if (environ) 
        while (environ[++env_len])
            ;
    
    unsigned int size;
    if (env_len > 0)
        size = environ[env_len-1] + strlen(environ[env_len-1]) - argv[0];
    else
        size = argv[argc-1] + strlen(argv[argc-1]) - argv[0];
    
    if (environ)
    {
        
        /* one extra slot for the terminating NULL pointer */
        char **new_environ = malloc((env_len + 1) * sizeof(char *));
        
        unsigned int i = -1;
        while (environ[++i])
            new_environ[i] = strdup(environ[i]);
        new_environ[i] = NULL;
        
        environ = new_environ;
    }   
        
    
    char *args = argv[0];
    memset(args, '\0', size);
    snprintf(args, size - 1, "This is a new title and it should be definetely longer than initial one. Actually we can write %u bytes long string to the title. Well, it works!", size);
    
    proc_pid_cmdline =  fopen(proc_pid_cmdline_path, "r");
    fgets(cmdline, PAGE_SIZE, proc_pid_cmdline);
    fclose(proc_pid_cmdline);
    
    printf("%s : %s\nenvironment variable HOME = %s\n", proc_pid_cmdline_path, cmdline, getenv("HOME"));
    
    return 0; 
}
The output should be
$./chcmdline 
/proc/5865/cmdline : ./chcmdline
environment variable HOME = /home/niam
/proc/5865/cmdline : This is a new title and it should be definetely longer than initial one. Actually we can write 1843 bytes long string to the title. Well, it works!
environment variable HOME = /home/niam
There is a way to expand the memory region that we can overwrite. Just set an environment variable for the process:
$ENV_VAR=$(perl -e 'print "0" x 4096;') ./chcmdline 
/proc/5863/cmdline : ./chcmdline
environment variable HOME = /home/niam
/proc/5863/cmdline : This is a new title and it should be definetely longer than initial one. Actually we can write 5948 bytes long string to the title. Well, it works!
environment variable HOME = /home/niam
That small perl script prints "0" 4096 times to stdout, so, as we can see, the size of the region we can overwrite grew to 5948 bytes.

This code might be portable to other OSes, I don't know. In general, on a POSIX system with the gcc compiler you might be successful with this code.

Monday, October 20, 2008

bash: file descriptors

In bash you may open a file descriptor for reading and/or writing.
To create a file descriptor for reading, writing, or both, use these commands:

[fd]<[source] #read-only
[fd]>[source] #write-only
[fd]<>[source] #read-write
[fd] is a number that denotes the file descriptor, and [source] can be either another file descriptor (with a leading &) or any other source for reading or writing.
If the source is another file descriptor, the new one will be its duplicate.
#file descriptor 3 is opened for reading
#and already contains '1 2 3' which comes from the 'echo' command
3< <(echo 1 2 3)
#file descriptor 3 is opened for writing
#which output will be printed to stdout by 'cat' tool
3> >(cat)
#file descriptor 3 is opened for reading/writing
#from/to test.file file
3<>test.file
To move a descriptor, add the '-' suffix to the [source] descriptor.
3<&2- #descriptor 2 (stderr) is duplicated as descriptor 3 and then closed
By executing
ls -l /proc/$$/fd
you can see which file descriptors are open in the current process.
Using exec to create/redirect/duplicate file descriptors makes the changes take effect in the current shell. Otherwise they apply only to the command they are attached to.
#file descriptor 3 is local for 'while' loop
#you can't use it outside
while read line <&3; do echo $line; done 3<test.file
#file descriptor 3 is visible within current process
exec 3<test.file
while read line <&3; do echo $line; done
Special cases of redirection to stdin of the program are 'Here Documents' and 'Here Strings'.
#the shell reads input from the current source until
#a line containing only [word] (with no trailing blanks) is seen
<<[word]
#the shell reads input from the current source until
#a line containing only [word] (with no trailing blanks) is seen
#but all leading tab characters are stripped from input lines and the line containing [word]
<<-[word]
#[word] is expanded and supplied to the command on its standard input
<<<[word]
The examples
$exec 3<<EOF
> text is here
> EOF
$cat <&3
text is here

$exec 3<<<string
$cat <&3
string

Wednesday, October 15, 2008

gdb: modify values of the variables

While debugging your application you will likely want to change its behavior a bit.
With gdb you can change the values of variables with the set command:

(gdb) whatis a
type = int
By variable name
(gdb) p a
$1 = 5
(gdb) set var a = 10
(gdb) p a
$2 = 10
By variable address
(gdb) p a
$2 = 10
(gdb) p &a
$3 = (int *) 0xbff5ada0
(gdb) set {int}0xbff5ada0 = 15
(gdb) p a
$4 = 15

gdb: examining the core dumps

When your program receives SIGSEGV (Segmentation fault), the kernel terminates it (unless the application handles SIGSEGV).
After a couple of long nights of debugging, tracing and drinking coffee you finally find the line in the sources where your application makes the system send this crafty signal.

The most common problem is that you are usually unable to run the application within a debugger. The fault may be caused by special circumstances, and it's really painful to sit in front of the debugger trying to reproduce it. Multi-threading/multi-processing and network interaction add more complexity.

Core dumps are a good solution here.
The linux kernel is able to write a core dump if the application crashes. The core dump records the state of the process at the time of the crash.

Later you can use gdb to analyze the core dump.

Core dumps are disabled by default in linux.
To enable them you should run

ulimit -c unlimited
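The same limit can also be raised from inside the program with getrlimit/setrlimit. A minimal sketch, assuming a POSIX system; raise_core_limit is a name invented here. It only raises the soft limit up to the current hard limit, so it cannot go beyond what the hard limit allows:

```cpp
#include <sys/resource.h>

// Raises the soft core-file size limit to the hard limit.
// Returns true on success.
bool raise_core_limit()
{
    struct rlimit limit;
    if (getrlimit(RLIMIT_CORE, &limit) != 0)
        return false;
    limit.rlim_cur = limit.rlim_max;     // soft limit := hard limit
    return setrlimit(RLIMIT_CORE, &limit) == 0;
}
```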
By default the kernel writes the core dump into the current working directory of the application. You may customize the file path pattern for core dumps by writing it to /proc/sys/kernel/core_pattern.
According to the current documentation, the pattern may contain the following specifiers
%%  A single % character 
%p  PID of dumped process 
%u  real UID of dumped process 
%g  real GID of dumped process 
%s  number of signal causing dump 
%t  time of dump (secs since 0:00h, 1 Jan 1970) 
%h  hostname (same as the 'nodename' returned by uname(2)) 
%e  executable filename
So with
echo /tmp/%e-%p.core > /proc/sys/kernel/core_pattern
linux should put core dumps into /tmp with file names of the form <executable>-<pid>.core.
Let's try all this.
Say we have this code
#include <stdlib.h>

void
crash()
{
    char a[0];  /* lives on the stack */
    free(a);    /* invalid: a was not allocated with malloc */
}
int
main(int argc, char **argv)
{
    crash();
    return 0;
}
As you can see, the application should cause a segmentation violation on the free call. Let's compile it
gcc test.c -g -o test
and execute
./test 
Segmentation fault (core dumped)
System tells us that core was dumped. Let's see what we have
ll /tmp/*core
-rw------- 1 niam niam  151552 2008-10-15 15:19 /tmp/test-25301.core
Got it. Now I'm going to run gdb
gdb --core /tmp/test-25301.core ./test
gdb clearly tells that application was terminated with SIGSEGV
Core was generated by `./test'.
Program terminated with signal 11, Segmentation fault.
#0  0xb7e4ff97 in free () from /lib/libc.so.6
Now we can use the power of gdb to find the problem code
(gdb) bt
#0  0xb7e4ff97 in free () from /lib/libc.so.6
#1  0x08048392 in crash () at 1.c:9
#2  0x080483aa in main () at 1.c:15
(gdb) up
#1  0x08048392 in crash () at 1.c:9
9  free(a);
(gdb) p a
$1 = 0xbfc5daf8 "\b�ſ�\203\004\b�D�� �ſx�ſ��߷�����\203\004\bx�ſ��߷\001"
(gdb) whatis a
type = char [0]
(gdb) info frame
Stack level 1, frame at 0xbfc5db00:
 eip = 0x8048392 in crash (1.c:9);
 saved eip 0x80483aa called by frame at 0xbfc5db10, caller of frame at 0xbfc5daf0
 source language c.
 Arglist at 0xbfc5daf8, args:
 Locals at 0xbfc5daf8, Previous frame's sp is 0xbfc5db00
 Saved registers:
  ebp at 0xbfc5daf8, eip at 0xbfc5dafc
Here we can see that free was asked to release memory that lives on the stack. 'whatis a' shows that a is an array, not a pointer to heap memory, and 'p a' shows that it is located at 0xbfc5daf8, which is inside the stack frame of crash() (the frame is at 0xbfc5db00 and the saved ebp sits at 0xbfc5daf8).
gdb gave all the information needed for further investigation. The only thing left is to understand who taught you to free an array on the stack o_O.

Monday, October 13, 2008

c/c++: array subscripting operator [ ]

Everybody knows that operator [] selects an element of an array.
Since the elements of an array are stored sequentially in memory, they can also be accessed using pointer arithmetic.

int a[] = {1,2,3};
std::cout << "First:" << *a << std::endl;
std::cout << "Second:" << *(a + 1) << std::endl;
std::cout << "Third:" << *(a + 2) << std::endl;
So, the expression a[index] is equivalent to the expression *(a + index). Since addition is commutative,
*(a + index) == *(index + a)
What does that mean? It means that, by the definition of the built-in operator [], which says that the expression a[b] is equivalent to the expression *((a) + (b)), we can write
int a[] = {1,2,3};
std::cout << "First:" << 0[a] << std::endl;
std::cout << "Second:" << 1[a] << std::endl;
std::cout << "Third:" << 2[a] << std::endl;
This is equivalent to the first listing and to the traditional style
int a[] = {1,2,3};
std::cout << "First:" << a[0] << std::endl;
std::cout << "Second:" << a[1] << std::endl;
std::cout << "Third:" << a[2] << std::endl;

Saturday, October 11, 2008

c++: implicit typenames in templates

A class template is not a complete type until it is instantiated, so in some cases the compiler cannot decide by itself whether a name that depends on a template parameter is a type. Let's look at this example

template<typename T>
class A
{
    public:
        struct B
        {
            T member;
        };
};

template<typename T>
class C
{
    public:
        A<T>::B member;
};
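Your (and mine also ;]) compiler will probably tell you something like "type A<T> is not derived from type C<T>". A<T>::B depends on the template parameter T, and until the template is instantiated the compiler cannot know what B is, so it assumes that it is not a type.
To make this work you have to tell the compiler explicitly that A<T>::B is a type name.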
template<typename T>
class A
{
    public:
        struct B
        {
            T member;
        };
};

template<typename T>
class C
{
    public:
        typename A<T>::B member;
};

Friday, October 10, 2008

c++: template template parameters

A template template parameter allows you to pass a template itself (not an instantiated type) as a template argument. Sounds like a tongue twister. I hope the example below demonstrates the usage of a template template parameter

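#include <memory>
#include <vector>
// std::vector takes two type parameters (element type and allocator),
// so V is declared with two parameters as well
template<typename T, template<typename, typename> class V>
class C
{
   V<T, std::allocator<T> > v;
};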
int main(int argc, char **argv)
{
   C<int,std::vector> c;
   return 0;
}

Thursday, October 9, 2008

css: absolute position

I do not do a lot of markup these days, but an article I read recently opened my eyes to the bright side of absolute positioning of elements.
If you place an absolutely positioned element into a box with relative position, the child's position is calculated relative to the parent box instead of the page, so it stays attached to that box.

This one should be on the right of the box with red border.

c++: RVO and NRVO

RVO stands for "return value optimization" and NRVO for "named return value optimization".
What does all this stuff mean?

Typically, when a function returns an instance of an object, a temporary object is created and copied to the target object via the copy constructor.

RVO is a simple optimization where the compiler doesn't create a temporary when you return an unnamed instance of a class/structure/etc. from a function.

#include <iostream>

class C
{
    public:
        C() 
        {
            std::cout << "Constructor of C." << std::endl;
        }
        C(const C &)
        {
            std::cout << "Copy-constructor of C." << std::endl;
        }
};

C func()
{
    return C();
}

int main(int argc, char **argv)
{
    C c = func();

    return 0;
}
The output should be
Constructor of C.
Here the compiler does not make a copy of the C instance on return. This is a great optimization since constructing an object takes time. The implementation depends on the compiler, but in general the compiler extends the function with one hidden parameter, a reference/pointer to the storage where the caller wants the return value, and constructs the return value directly in that storage.
It may look like
void func(C &__hidden__)
{
    __hidden__ = C(); /*pseudocode: C is constructed in place, nothing is assigned*/
    return;
}
NRVO is more complex. Say you have
class C
{
    public:
        C()
        {
            std::cout << "Constructor of C." << std::endl;
        }
        C(const C &c) 
        {
            std::cout << "Copy-constructor of C." << std::endl;
        }
};

C func()
{
    C c;
    return c;
}

int main(int argc, char **argv)
{
    C c = func();

    return 0;
}
Here the compiler has to deal with the named object c. With NRVO the temporary object on return is not created either. The pseudocode of
C func()
{
    C c;
    c.method();
    c.member = 10;
    return c;
}
might look like
void func(C &__hidden__)
{
    __hidden__ = C();
    __hidden__.method();
    __hidden__.member = 10;
    return;
}
In both cases no temporary object is created (the copy constructor is not invoked) to carry the value from the function to the outside object.

When may this not work?
The situation I know of where these optimizations won't work is when the function has different return paths that return different named objects.

Wednesday, October 8, 2008

c++: inheritance from "template" class and dependance on template parameter

You may find it quite weird that inside a class template you are unable to directly access members of a base class which depends on a template parameter.

template<typename T>
class A
{
    public:
        T member;
};

template<typename T>
class B: public A<T>
{
    public:
        B();
};

template<typename T>
B<T>::B()
{
    T t;
    member = t;
}
This code will raise an error saying that 'member' has not been found. The error is caused by the C++ name lookup rules for templates. The base class A<T> depends on the template parameter T, so the compiler does not look into it when it parses the template: an unqualified name such as 'member' is looked up without considering dependent base classes. Names from such a base have to be made dependent explicitly so that they are looked up at instantiation time. To make this code work you may
  • Qualify the name with this->
    this->member = t;
  • Qualify the name with A<T>::
    A<T>::member = t;
  • Use a using-declaration in the class template
    template<typename T>
    class A
    {
        public:
            T member;
    };
    
    template<typename T>
    class B: public A<T>
    {
        using A<T>::member;
        public:
            B();
    };
    
    template<typename T>
    B<T>::B()
    {
        T t;
        member = t;
    }
Things get less obvious with member function templates, i.e. methods that have their own template parameters.
template<typename T>
class A
{
    public:
        template<typename U>
        void func();
};

template<typename T>
template<typename U>
void
A<T>::func()
{
}

template<typename T>
class B: public A<T>
{
    public:
        B();
};

template<typename T>
B<T>::B()
{
    A<T>::func<int>();
    this->func<int>();
}
You can't write
A<T>::func<int>();
nor
this->func<int>();
The compiler assumes that the < is a less-than operator. In order for the compiler to recognize the function template call, you must add the template keyword.
template<typename T>
B<T>::B()
{
    A<T>::template func<int>();
    this->template func<int>();
}
Some compilers (or some of their versions) don't actually parse the template until instantiation, and those may successfully compile the code without the 'template' keyword. A conforming compiler parses the template definition first, and without knowing the instantiation type it can't know what 'func' refers to. In order to parse correctly, the compiler must know which names are types and which are templates. The 'template' keyword tells the compiler that A<T>::func<int> is a template.

Monday, October 6, 2008

c++: function try-catch block

There is no hidden features in c++. Everything you can find in specification.
Somebody from Somewhere
There is an interesting feature in c++: the function try-catch block. You can replace a function body with a try-catch block.
void function()
try
{
        //do smth
}
catch(...)
{
        //handle exception
}
This is almost the same as
void function()
{
        try
        {
                //do smth
        }
        catch(...)
        {
                //handle exception
        }
}
Quick notes:
  • the scope and lifetime of the parameters of a function or constructor extend into the function try-catch blocks
  • A function try-catch block on main() does not catch exceptions thrown in destructors of objects with static storage duration. In the code below the catch handler won't be called
    class A
    {
            public:
                    ~A()
                    {
                            throw "Exception in ~A";
                    }
    };
    
    int main(int argc, char **argv)
    try
    {
            static A a;
    }
    catch(const char *e)
    {
            std::cout << e << std::endl;
    }
    
  • A function try-catch block on main() does not catch exceptions thrown in constructors/destructors of namespace/global namespace scope objects. In the code below the catch handler won't be called
    namespace N
    {
            class A
            {
                    public:
                            A()
                            {
                                    throw "Exception in A";
                            }
            };
    
            A a;
    }
    
    int main(int argc, char **argv)
    try
    {
    }
    catch(const char *e)
    {
            std::cout << e << std::endl;
    }
  • The run time will rethrow an exception at the end of a function try-catch block's handler of a constructor or destructor. All other functions will return once they have reached the end of their function try block's handler
    class A
    {
            public:
                    A()
                    try
                    {
                            throw "Exception in A";
                    }
                    catch(const char *e)
                    {
                            std::cout << "In A: " << e << std::endl;
                    }
    };
    
    int main(int argc, char **argv)
    try
    {
            A a;
    }
    catch(const char *e)
    {
            std::cout << "In main: " << e << std::endl;
    }
    
    The output
    In A: Exception in A
    In main: Exception in A
    An ordinary function, in contrast, simply returns from its handler:
    int function()
    try
    {
            throw "Exception in function";
    }
    catch(const char *e)
    {
            std::cout << "In function: " << e << std::endl;
    
            return 1;
    }
    
    int main(int argc, char **argv)
    try
    {
            std::cout << function() << std::endl;
    }
    catch(const char *e)
    {
            std::cout << "In main: " << e << std::endl;
    }
    
    The output
    In function: Exception in function
    1
    

Sunday, October 5, 2008

json: comments

I recently read an interesting observation that the JSON format doesn't define comments, though XML, for example, does. The misconception here is that JSON is a data-interchange format, not a language. Yes, programming languages define a comment syntax; at least all languages I know have comments. But JSON is _not_a_language_, while XML is a language, not a format. This is my view: JSON is for machine-to-machine interchange, and machines ignore comments. Anyway, you can reserve some part of the message for a comment:

{
    "comment": ""
}

Friday, October 3, 2008

perl: another way to dereference

When you have structures with nested arrays or hashes which you want to dereference on the fly, without using an extra variable to store the reference, you can use a special syntax:

%{reference_to_hash} and @{reference_to_array}
Next piece of code shows the common usage
$struct = [1, 2, 3, {a=>1, b=>2, c=>[1,2]}];

%hash = %{$struct->[3]};

@array = @{$struct->[3]->{c}};
This is useful when you want to work with the structures themselves rather than with references to them
push @{$struct->[3]->{c}}, (3, 4);

Thursday, October 2, 2008

perl: arrays and hashes

This is more a reminder for me than an article for everybody, as I haven't touched perl for ages.

Small reference on arrays and hashes in perl.

Arrays

Declaration

@array = (1, '1', (2));
@array = (1..20);# by range
Access to array members with index
$array[0];
Define reference to array
$array = \@array; #reference to another array
$array = [1, 3, 5, 7, 9]; #reference to anonymous array
$array = [ @array ]; #reference to anonymous copy
@$array = (2, 4, 6, 8, 10); #implicit reference to anonymous array
To dereference a reference to an array put @ before $
@array = @$array;
@array = @{$array};
Access to members of array by reference with index
$array->[0];# using -> operator
@$array[0];# dereferencing
$$array[0];# dereferencing
Last index and size of the array
$#array;# index of the last element, i.e. number of elements - 1 [note: for an empty array it is -1]
scalar(@array);# number of elements
Here are tricks to remove all elements from an array and to add an element to an array
$#array = -1;
$array[++$#array] = 'value';
Take a slice of an array
@array[0..2];# first, second and third elements
@array[0,2];# first and third elements
Hashes

Declaration
%hash = ('key0', 'value0', 'key1', 'value1');# amount of elements must be even
%hash = ('key0' => 'value0', 'key1' => 'value1');
Access to hash members with key
$hash{'key0'};
Define reference to hash
$hash = \%hash; #reference to another hash
$hash = {1 => 3, 5 => 7}; #reference to anonymous hash
$hash = {1, 3, 5, 7}; #reference to anonymous hash; number of elements must be even
$hash = { %hash }; #reference to anonymous copy
%$hash = (2, 4, 6, 8); #implicit reference to anonymous hash; number of elements must be even
%$hash = (2 => 4, 6 => 8); #implicit reference to anonymous hash
To dereference a reference to a hash put % before $
%hash = %$hash;
%hash = %{$hash};
Access to members of hash by reference with key
$hash->{'key0'};# using -> operator
${$hash}{'key0'};# dereferencing
$$hash{'key0'};# dereferencing
Size of the hash
scalar keys(%hash)
Take a slice of a hash
@hash{'key0','key1'};
@hash{@keys};

c++: separate members from their classes

In my post c++: separate methods from their classes I described how to call a class method through a pointer to a member function. You can do similar stuff with data members. Assume you have a collection of class instances and you want to print the values of some of their members. Again you can define two lists: a list of class instances and a list of pointers to class members. Later you can iterate through these lists to touch the members of the objects.

#include <iostream>
#include <list>

class A
{
    public:
        int m0; 
        int m1; 
};

template<typename T>
void
print(const T &a, 
    int T::*p)
{
    std::cout << a.*p << std::endl;
}

int main(int argc, char **argv)
{
    std::list<A> ls;
    std::list<int A::*> lsm;

    int A::*p0 = &A::m0;

    lsm.push_back(p0);
    lsm.push_back(&A::m1);

    A a0, a1; 
    a0.*p0 = 0;
    a0.m1 = 1;
    a1.m0 = 2;
    a1.m1 = 3;
    
    ls.push_back(a0);
    ls.push_back(a1);
    
    for (std::list<A>::iterator i = ls.begin();i!=ls.end();++i)
        for (std::list<int A::*>::iterator j = lsm.begin();j!=lsm.end();++j)
            print(*i, *j);

    return 0;
}
With this piece of code you will get
0
1
2
3
This way of accessing class members can be combined with pointers to member functions to achieve more power.

Monday, September 29, 2008

c++: Koenig lookup

Koenig lookup, named after Andrew Koenig, a former AT&T researcher and programmer known for his work with C++, is also known as argument-dependent lookup (ADL).
Argument-dependent lookup applies to unqualified function calls: in addition to the usual scopes, the function name is looked up in the namespaces associated with the types of the given arguments.
Let's go through a simple example.

#include <iostream>

namespace NS1 
{
    class A
    {   
        public:
            A() {std::cout << "NS1::A::A";}
    };  

    template<typename T>
    void f(T a)
    {   
    }   
}

int main(int argc, char **argv)
{
    NS1::A v;
    f(v);

    return 0;
}
Without argument-dependent lookup the compiler would look for f only in the enclosing (here global) scope and fail, as there is no such routine there. With argument-dependent lookup the compiler also looks at the arguments and considers functions from the namespaces their types come from. This will compile with standard-conforming compilers but will fail with old ones that don't implement the lookup.

On the other hand, the next piece of code might cause another kind of problem.
#include <iostream>

namespace NS1 
{
    class A
    {   
        public:
            A() {std::cout << "NS1::A::A";}
    };  

    template<typename T>
    void f(T a)
    {   
    }   
}

template<typename T>
void f(T a)
{
}

int main(int argc, char **argv)
{
    NS1::A v;
    f(v);

    return 0;
}
Compilation will fail with a "call of overloaded function is ambiguous" error on compilers that do Koenig lookup, but will succeed on compilers that do not perform it. Old compilers will call the function from the global namespace.

For portability the call should be qualified with the namespace explicitly, e.g. NS1::f(v) or ::f(v).

Monday, September 15, 2008

linux: emergency reboot remote box

Once I faced a problem that was very unusual for me.
I tried to reboot a remote box and was unable to do it, because the kernel thread [pdflush] was in uninterruptible sleep.

For local machines I used to hit SysRq+b, but that doesn't work for a remote box.

The solution was quite simple. Send SysRq+b via the /proc filesystem as root (note that it reboots immediately, without syncing or unmounting disks):

echo b > /proc/sysrq-trigger

Friday, September 12, 2008

python: tuple with one item

When you want to construct a tuple containing only one item you should write

variable = (1,)
You have to add an extra comma after the item. If you forget this comma you get the item itself, not a tuple.
Weird =/

c++: exception in constructor

I wonder why people want to do all the stuff in the constructor.

A constructor does not return anything, so it can't indicate that it failed to do something.

The only way is (please close your eyes here) to throw an exception (now you can open your eyes).

When you throw an exception in the constructor, the destructor of that object will not be called, because the object has not been completely constructed. Only members and base classes that were already fully constructed get destroyed.

So you should clean up the stuff just before throwing the exception:

class A
{
    public:
        A()
        {
            do_stuff();

            if (smth_goes_wrong)
            {
                clean_the_shoes();

                throw ticket_to_hell;
            }  

            do_other_stuff();
        }  
};
Knowing that, you can keep your code safe even when something else may throw an exception in your constructor:
class A
{
    public:
        A()
        {
            try
            {
                do_stuff();
            }
            catch (...)
            {
                clean_the_shoes();

                throw;
            }   

            do_other_stuff();
        }   
};
Or, even better, do not rethrow the caught exception and do not call anything that might throw an exception in the constructor at all. It is much better to have an init method that may fail safely, and to call it after the constructor has finished, where you are able to handle any exceptions safely.

Wednesday, August 6, 2008

c/c++: embed binary data into elf v.2

In the previous post I described how to embed data into an object file.
The other option is to store the data in a c/c++ array.
Again, I'll use data.txt:

$cat data.txt 
data file
To create a source file with this data I'll use xxd utility:
xxd -i data.txt data.c
$cat data.c
unsigned char data_txt[] = {
  0x64, 0x61, 0x74, 0x61, 0x20, 0x66, 0x69, 0x6c, 0x65, 0x0a
};
unsigned int data_txt_len = 10;

A simple c source file that uses this array looks like:
#include <stdio.h>

extern unsigned char data_txt[];
extern unsigned int data_txt_len;

int
main(int argc, char **argv)
{
    printf("%u\n", data_txt_len);
    /* the array is not NUL-terminated, so print exactly data_txt_len bytes */
    printf("%.*s", (int)data_txt_len, data_txt);

    return 0;
}
To compile
gcc test.c data.c

c/c++: embed binary data into elf

It's a great idea to store program data somewhere outside the binary.
It can then be modified to change the program's behavior or for rebranding.

But sometimes you want to keep some data immutable, hidden inside the executable binary.

This can be, for example, help text. If you don't want to have smth like

void
usage (status)
     int status;
{
  fprintf (status ? stderr : stdout, "\
Usage: %s [-nV] [--quiet] [--silent] [--version] [-e script]\n\
        [-f script-file] [--expression=script] [--file=script-file] [file...]\n",
       myname);
  exit (status);
}
and don't want this help text to be stored in a separate file, you can simply embed binary data into your executable.

Consider you have data.txt:
$cat data.txt 
data file
You have to convert it to an elf object file.
I know two ways:
  • use linker:

    ld -r -b binary -o data.o data.txt
  • use objcopy:

    objcopy -I binary -O elf32-i386 --binary-architecture i386 data.txt data.o

Both of these commands produce elf:
$readelf -a data.o 
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          96 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         5
  Section header string table index: 2

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .data             PROGBITS        00000000 000034 00000a 00  WA  0   0  1
  [ 2] .shstrtab         STRTAB          00000000 00003e 000021 00      0   0  1
  [ 3] .symtab           SYMTAB          00000000 000128 000050 10      4   2  4
  [ 4] .strtab           STRTAB          00000000 000178 000043 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

Symbol table '.symtab' contains 5 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000     0 SECTION LOCAL  DEFAULT    1 
     2: 00000000     0 NOTYPE  GLOBAL DEFAULT    1 _binary_data_txt_start
     3: 0000000a     0 NOTYPE  GLOBAL DEFAULT    1 _binary_data_txt_end
     4: 0000000a     0 NOTYPE  GLOBAL DEFAULT  ABS _binary_data_txt_size

Ok, you have data.o with your data in the .data section and three symbols: _binary_data_txt_start, _binary_data_txt_end, _binary_data_txt_size

_binary_data_txt_end and _binary_data_txt_size have the same value here (the data starts at offset 0 in this object file). So I'll use _binary_data_txt_size only.
Let's make a simple c program to use data from the object. It's a bit tricky.
#include <stdio.h>

extern int _binary_data_txt_start;
extern int _binary_data_txt_size;

int
main(int argc, char **argv)
{
    int size = (int)&_binary_data_txt_size;
    char *data = (char *)&_binary_data_txt_start;
    
    printf("%d\n", size);
    /* the data is not NUL-terminated, so print exactly size bytes */
    printf("%.*s", size, data);

    return 0;
}
_binary_data_txt_start and _binary_data_txt_size are symbols without storage of their own: the values listed in the symbol table are what you get by taking their addresses. So &_binary_data_txt_size yields not a location in memory but the value of the symbol, which is the size of the data, and &_binary_data_txt_start yields the address of the data.

To compile

gcc test.c data.o