Tuesday, January 27, 2009

c: alignment of structure

In this article I assume that the computer's word size is 4 bytes (IA32). The main idea is the same on every architecture and I want to keep the article clean, so you just have to adjust the numbers to your platform.

So, what's with the size of the structure?
You should know that the members of a data structure (declared with the struct keyword in C) are aligned to addresses that are multiples of a power of 2. Each member is stored at the closest following address that satisfies its alignment. The whole structure is aligned like its member with the largest alignment, and its size is rounded up to a multiple of that alignment.
The type of each member usually has a default alignment (unless you use the #pragma pack directive). On IA32 it's 1 byte for char, 2 bytes for short and 4 bytes for int. You should check these values for your architecture.
So the structure

struct
{   
    char a;
    short b;
    int c;
} data;
should be 8 bytes long. How is it calculated? The structure has one char, one short and one int member. The layout looks like
+-----------------------------------+
|char| XX |  short |       int      |
+-----------------------------------+
0         2        4                8
The short member is stored 2 bytes from the address of the char because the alignment of short is 2 bytes, so one padding byte (XX) is inserted. The int member is stored right after the short because its offset from the beginning of the structure, 4, is already suitable for the alignment of int.
But if you change the order of the members the whole picture can change, even though the number of members and their sizes stay the same. The structure
struct
{   
    char a;
    int c;
    short b;
} data;
on the same platform should be 12 bytes long. Why? Let's calculate.
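+-----------------------------------------------------------+
|char| XX | XX | XX |        int        |  short  | XX | XX |
+-----------------------------------------------------------+
0                   4                   8         10        12
That's because the address of the int member is adjusted to its alignment, which leaves three padding bytes after the char, and because the size of the whole structure is rounded up to a multiple of 4, which adds two padding bytes after the short.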

Knowing these rules you can optimize the size of a structure just by moving its members around. Let's add one char to the end of the first structure (char, short, int):
struct
{   
    char a;
    short b;
    int c;
    char d;
} data;
The size of the structure is 12. It's easy to calculate. The size of the structure must be a multiple of the largest member alignment, which is 4 in this case. The sequence char, short, int takes 8 bytes. Adding one char to the end forces the compiler to round the size up to 12 bytes; it can't be 9 or anything else, otherwise the int members would be misaligned in an array of such structures. If you look at the layout of the original structure you can see the unused byte that appeared because of the alignment. Let's move d right after (or before, it doesn't matter) the a member:
struct
{   
    char a;
    char d;
    short b;
    int c;
} data;
The size of this structure should be 8 bytes.
+-----------------------------------+
|char|char|  short |       int      |
+-----------------------------------+
0         2        4                8
If you deal with embedded devices, where you have a very small amount of memory, it's a good optimization: 8 bytes vs. 12 bytes, or 1K vs. 1.5K in the case of 128 copies of the structure.

Another approach, which is also platform and compiler dependent, is to use the #pragma pack directive. In general the structure
struct
{   
    int c;
    short b;
} data;
should be 8 bytes long. But if you use #pragma pack with an alignment of 2 bytes you force the members, and therefore the whole structure, to be aligned to 2 bytes.
#pragma pack(push)
#pragma pack(2)

struct
{   
    int c;
    short b;
} data;

#pragma pack(pop)
This structure should be 6 bytes long.

Alignment can cause trouble, especially if the data is transmitted between different platforms where the alignment or the size of a type may differ. If possible, sequences of bytes (chars) should be used instead of raw structures. Their alignment is always 1 byte.

Tuesday, January 20, 2009

c: executing shellcode

In the previous article I described how to overwrite a function's return point to execute some code.
In the *nix world most of the code is written in C, so most likely you will have to deal with stack overflows in C.

The basics remain the same. You have to find the top of the stack frame of a function, calculate the address of the return point and write the address of the beginning of your code into it.
Let's look at the code below.

#include <stdio.h>

void function()
{
    int *p;
    printf("&p: %p\n", &p);
}

int main(int argc, char **argv)
{
    function();

    return 0;
}
The address of the pointer p should be the top of the stack frame of our function. The output should look like
&p: 0xbffdf594
The address might change between program executions. Running this program in gdb, stopping at the beginning of the function and looking at the address of p and the values of the registers, you can see that the difference between &p and %ebp is 4 bytes.
(gdb) l
2 
3 void function()
4 {
5     int *p;
6     printf("&p: %p\n", &p);
7 }
8 
9 int main(int argc, char **argv)
10 {
11     function();
(gdb) b 5
Breakpoint 1 at 0x804838a: file so.c, line 5.
(gdb) r
Breakpoint 1, function () at so.c:6
6     printf("&p: %p\n", &p);
(gdb) i r
esp            0xbff2d490 0xbff2d490
ebp            0xbff2d4a8 0xbff2d4a8
...
(gdb) p &p
$1 = (int **) 0xbff2d4a4
Indeed, &p is at the top of the stack frame, right below the saved %ebp. We should take into account that %ebp is usually pushed onto the stack on function entry, so the difference between &p and the return point is 8 bytes.
(gdb) disass
Dump of assembler code for function function:
0x08048384 <function+0>: push   %ebp
0x08048385 <function+1>: mov    %esp,%ebp
0x08048387 <function+3>: sub    $0x18,%esp
0x0804838a <function+6>: lea    -0x4(%ebp),%eax
0x0804838d <function+9>: mov    %eax,0x4(%esp)
0x08048391 <function+13>: movl   $0x80484a0,(%esp)
0x08048398 <function+20>: call   0x8048298 <printf@plt>
0x0804839d <function+25>: leave  
0x0804839e <function+26>: ret    
We are almost ready for the hack. Let's write some shellcode that would be executed instead of returning from function to main.
I'm not strong in writing shellcode and asm, so let it be a simple piece of code that calls exit with exit code 1. The asm code is
.text

.global main
main:
movl $1, %eax
movl $1, %ebx
int $0x80
Compiling and linking the code is not enough; we can't just use an ELF binary as shellcode. I used objdump to get the disassembled code of main together with its machine code bytes.
$objdump -d shellcode
...
08048354 <main>:
 8048354: b8 01 00 00 00        mov    $0x1,%eax
 8048359: bb 01 00 00 00        mov    $0x1,%ebx
 804835e: cd 80                 int    $0x80
...
The code begins at address 8048354 and ends at 8048360. To use the instructions as shellcode they should be put into a zero-terminated string where each byte is prefixed with '\x'. The string with the shellcode will be "\xb8\x01\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80".
Let's integrate the shellcode into our program.
#include <stdio.h>

char shellcode[] = "\xb8\x01\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80";

void function()
{
    int *p; 
    printf("&p: %p\n", &p);
    p = (int *)&p + 2;
    *p = (int)shellcode;
}

int main(int argc, char **argv)
{
    function();

    return 0;
}
Here I assigned the address &p plus 8 bytes, which should be the return point, to the pointer, so p now points exactly to the return point. Then I write the address of the shellcode to *p, that is, into the return point.
If you execute this code and check the exit code you should see
$./so
&p: 0xbfd262e8
$echo $?
1
As expected the program exited with code 1. Let's walk through the execution process.
(gdb) l
6 {
7     int *p;
8     printf("&p: %p\n", &p);
9     p = (int *)&p + 2;
10     *p = (int)shellcode;
11 }
12 
13 int main(int argc, char **argv)
14 {
15     function();
(gdb) b 11
Breakpoint 1 at 0x80483b0: file so.c, line 11.
(gdb) r
&p: 0xbff2d4a4

Breakpoint 1, function () at so.c:11
11 }
(gdb) n
0x080495b8 in shellcode ()
(gdb) disass
Dump of assembler code for function shellcode:
0x080495b8 <shellcode+0>: mov    $0x1,%eax
0x080495bd <shellcode+5>: mov    $0x1,%ebx
0x080495c2 <shellcode+10>: int    $0x80
0x080495c4 <shellcode+12>: add    %al,(%eax)
Instead of returning to main the execution moved to the address 0x080495b8. The disassembled shellcode is exactly the same as the code we generated; the trailing add instruction is just the zero terminator of the string decoded as code.

Actually this shellcode won't work in the real world because it contains null bytes. Buffer overflow attacks mostly go through the string functions of libc, and those stop copying at the first null byte.

You can find a lot of interesting shellcode at the Metasploit project site.

Please note, I succeeded in running this code with gcc-4.3.2 and linux-2.6.27.
With gcc-4.1.2, gcc-3.4.6 and linux-2.6.25 I didn't manage to run the shellcode and ran into a segfault with and without the -fno-stack-protector gcc flag. I also checked the kernel.randomize_va_space system parameter, but switching it to different values didn't help. Unfortunately I don't know why this is not working. Actually it's failing on
mov    $0x1,%eax
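I have no idea why writing a value to a register causes a segfault. Most likely that's not a gcc 'issue' but a kernel (or kernel configuration) one, because my 2.6.27 kernel is not secure at all: I'm sitting behind a firewall, and the performance benefit of turning off security features is critical for me. A plausible explanation is that the other kernel maps the data section, where shellcode[] lives, as non-executable, so the fault is raised while fetching the first instruction rather than by the register write itself.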

Tuesday, January 13, 2009

c++: partial template specialization of class methods

In C++ it's possible to specialize a single method of a template class.
Before, I thought it was only possible to specialize the whole class and then redefine all of its functions. That could be painful if the template class contains a lot of methods. Of course, the desired class could also be built by deriving from the template class instantiated with the needed type and redefining the needed functions:

#include <iostream>
#include <typeinfo>

template<typename T>
class B
{
    public:
        void operator ()()
        {  
            std::cout << "B<" << typeid(T).name() << ">::operator ()" << std::endl;
        }  
        void method()
        {  
            std::cout << "B<" << typeid(T).name() << ">::method()" << std::endl;
        }  
};

class C: public B<int>
{
    public:
        void operator ()()
        {  
            std::cout << "C<long#pseudo class specialization>::operator () with deriving from B<int>" << std::endl;
        }  
};
int main(int argc, char **argv)
{
    C()();
    C().method();
}
The output you expect should be
C<long#pseudo class specialization>::operator () with deriving from B<int>
B<i>::method()
We get the class C which is the same as B<int> but with a custom operator ().
Unfortunately this code changes the name of the class, and the developer has to keep in mind that class C is B<int> with custom functions. This is not explicit even if you find a better name than C.

Another way, the C++ way, is to specialize the methods of the template class. Let's say we have class A defined below:
#include <iostream>
#include <typeinfo>

template<typename T0>
class A
{
    public:
        void operator ()()
        {
            std::cout << "A<" << typeid(T0).name() << ">::operator ()" << std::endl;
        }
        template<typename T1>
        void method()
        {
            std::cout << "A<" << typeid(T0).name() << ">::method<" << typeid(T1).name() << ">()" << std::endl;
        }
};
Here we can redefine both operator () and method() class methods:
template<>
void
A<float>::operator ()()
{
    std::cout << "A<float#method specialization>::operator ()" << std::endl;
}

template<>
template<>
void
A<float>::method<float>()
{
    std::cout << "A<float#method specialization>::method<float#method specialization>()" << std::endl;
}
operator () for A<float> and method<float> for A<float> have been specialized. What actually happens? The compiler still generates the class A<float> from the generic template, but it uses these two definitions in place of the generic ones for the specialized methods.
The output of
int main(int argc, char **argv)
{
    A<int>()();
    A<float>()();
    A<float>().method<float>();

    return 0;
}
should be
A<i>::operator ()
A<float#method specialization>::operator ()
A<float#method specialization>::method<float#method specialization>()
You can see that appropriate methods have been called.

As I mentioned before, if you do a class specialization you have to redefine all methods:
template<>
class A<double>
{
    public:
        void operator ()()
        {  
            std::cout << "A<double#class specialization>::operator ()" << std::endl;
        }  
        template<typename T1>
        void method()
        {  
            std::cout << "A<double#class specialization>::method<" << typeid(T1).name() << ">()" << std::endl;
        }  
};
Otherwise, if you don't define method for A<double> and it's used somewhere, the compiler will raise an error that there is no member named method in class A<double>. Class specialization should be used instead of method specialization if the specialization changes the behavior of most members of the class. By specializing the class you define a new class with the same name as the template class but tailored for the special case. Using the code above the next program
int main(int argc, char **argv)
{
    A<double>()();
    A<double>().method<int>();

    return 0;
}
should produce
A<double#class specialization>::operator ()
A<double#class specialization>::method<i>()
You see that methods from A<double> have been called.

And even in the case of a template class specialization you are still able to specialize its template methods.
template<>
void
A<double>::method<double>()
{
    std::cout << "A<double#class specialization>::method<double#method specialization>()" << std::endl;
}
The program
int main(int argc, char **argv)
{
    A<double>().method<int>();
    A<double>().method<double>();

    return 0;
}
should show the following messages on stdout
A<double#class specialization>::method<i>()
A<double#class specialization>::method<double#method specialization>()
First the generic (non-specialized) method of A<double> was called and then the specialized one.

Template specialization is a powerful mechanism and should be used with understanding.

Tuesday, January 6, 2009

c: to exit or to _exit?

There are two functions that allow you to legally terminate your program: exit and _exit.

Both of them immediately terminate the process, close its open file descriptors and send a SIGCHLD signal to the parent process, and both return the exit code to the parent unless the parent has set SA_NOCLDWAIT or has set the SIGCHLD handler to SIG_IGN. In addition, exit flushes all open stdio streams with unwritten buffered data and closes them; whether _exit does so is implementation-dependent (on Linux it does not).
If the process is a session leader and its controlling terminal is the controlling terminal of the session, then each process in the foreground process group of this controlling terminal is sent a SIGHUP signal, and the terminal is disassociated from this session, allowing it to be acquired by a new controlling process.

The main difference is that exit calls all functions registered with atexit and on_exit while _exit does not.
Also the threads terminated by a call to _exit() shall not invoke their cancellation cleanup handlers or per-thread data destructors.

It's worth noting that returning from the main function has the same behavior as calling exit with the returned value.

How is a user or developer affected by the difference between these calls?
First of all, the differences become significant when we are talking about processes that call fork (or any other routine that creates a child thread or process).

I have read about side effects of calling exit from the child that caused temporary files to be removed unexpectedly. I have never been affected by this when I used exit from a child process, though I try to use _exit in child processes anyway.
This sample illustrates the absence of this possible effect, at least on my system.

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    FILE *tmp = tmpfile();

    if (fork() == 0)
    {   
        fprintf(tmp, "Hello from child<%d>!\n", getpid());
        fflush(tmp);
        exit(0);
    }   
    else
    {   
        sleep(1);
        wait(NULL);

        char msg[256];
        fseek(tmp, 0, SEEK_SET);
        fgets(msg, 256, tmp);
        printf("message: %s\n", msg);
    }   
            
    return 0;
}
$./exit 
message: Hello from child<16489>!
Things get worse when we are talking about C++.

Destructors of global and static objects are called at the atexit stage.
After the constructor of a global or static object has run, GCC automatically calls the function
int __cxa_atexit(void (* f)(void *), void *p, void *d);
Where f is a function-pointer to the destructor, p is the parameter for the destructor and d is the "home DSO" (DSO = dynamic shared object). When the process exits
void __cxa_finalize(void *d);
should be called with d = 0 in order to destroy all objects registered with __cxa_atexit.

Let's look at some samples that show side effects of calling _exit or exit from both child and parent processes.
Say we have class A that is stored in the global memory region.
#include <iostream>
#include <cstdlib>
#include <unistd.h>
#include <sys/wait.h>

using namespace std;

class A
{
    public:
        A(){cout << "A(), " << getpid() << endl;}
        ~A(){cout << "~A(), " << getpid() << endl;}
};

A a;
  • The first case is when both child and parent call exit to terminate themselves.
    int main(int argc, char **argv)
    {
        if (fork() == 0)
        {   
            cout << "child, " << getpid() << endl;
            exit(0);
        }   
        else
        {   
            cout << "parent, " << getpid() << endl;
            sleep(1);
            wait(NULL);
            exit(0);
        }   
                
        return 0;
    }
    The output might be
    $./exit
    A(), 17186
    child, 17187
    ~A(), 17187
    parent, 17186
    ~A(), 17186
    You see that the object was constructed once but the destructor was called twice, from the child and from the parent process. This can cause really bad things.
  • The second case is when the child calls exit and the parent calls _exit.
    int main(int argc, char **argv)
    {
        if (fork() == 0)
        {   
            cout << "child, " << getpid() << endl;
            exit(0);
        }   
        else
        {   
            cout << "parent, " << getpid() << endl;
            sleep(1);
            wait(NULL);
            _exit(0);
        }   
                
        return 0;
    }
    The output might be
    $./exit 
    A(), 17212
    parent, 17212
    child, 17213
    ~A(), 17213
    The constructor was called once in the parent process and the destructor was called once, but in the child process. You may not notice the effect unless the destructor does something that matters to the parent process. In spite of this it's still dangerous. But this could be used if your forked process is going to become a daemon and the parent is no longer needed and will be terminated immediately.
  • The third case is when both child and parent call _exit.
    int main(int argc, char **argv)
    {
        if (fork() == 0)
        {   
            cout << "child, " << getpid() << endl;
            _exit(0);
        }   
        else
        {   
            cout << "parent, " << getpid() << endl;
            sleep(1);
            wait(NULL);
            _exit(0);
        }   
                
        return 0;
    }
    The output might be
    $./exit 
    A(), 17221
    parent, 17221
    child, 17222
    Here the constructor was called once but the destructor was never called. This could be fine. But if you are doing something significant in the destructor, such as closing connections or deleting temporary files, you may run into trouble.
  • And the last one, which should be the correct one, is when the child calls _exit and the parent calls exit.
    int main(int argc, char **argv)
    {
        if (fork() == 0)
        {   
            cout << "child, " << getpid() << endl;
            _exit(0);
        }   
        else
        {   
            cout << "parent, " << getpid() << endl;
            sleep(1);
            wait(NULL);
            exit(0);
        }   
                
        return 0;
    }
    The output should be
    $./exit 
    A(), 17234
    child, 17235
    parent, 17234
    ~A(), 17234
    The parent process has created the object and it has destroyed it. The best behavior you can expect.
The behavior changes a bit if you use vfork instead of fork. Consider
int main(int argc, char **argv)
{

    if (vfork() == 0)
    {   
        cout << "child, " << getpid() << endl;
        exit(0);
    }   
    else
    {   
        cout << "parent, " << getpid() << endl;
        sleep(1);
        wait(NULL);
        exit(0);
    }   
            
    return 0;
}
The output might look like
$./exit 
A(), 17321
child, 17322
~A(), 17322
parent, 17321
This is similar to the code where the child called exit and the parent called _exit: the destructor runs only in the child. That is a consequence of how vfork works. With vfork the new process is created without copying the page tables of the parent process; the parent and the child share the same memory pages, so the parent most likely finds its exit handlers already consumed by the child. Using vfork is dangerous first of all. And on modern systems that use the COW technique for forked processes you are unlikely to notice any performance gain from it.

You can't be 100% sure that your project contains no C++ code that defines global variables somewhere. It could be a third-party library in your project that uses static or global objects (singletons, depending on the implementation, could be affected too). So the best practice is to use _exit to leave the child process and to use exit (or return from main) to leave the parent.