Wednesday, August 6, 2008

c/c++: embed binary data into elf

It's a great idea to store program data somewhere outside the binary.
It can then be modified to change the program's behavior or for rebranding.

But sometimes you want to keep some data immutable, hidden inside the executable binary.

Help text is one example. If you don't want to have something like

void
usage (status)
     int status;
{
  fprintf (status ? stderr : stdout, "\
Usage: %s [-nV] [--quiet] [--silent] [--version] [-e script]\n\
        [-f script-file] [--expression=script] [--file=script-file] [file...]\n",
       myname);
  exit (status);
}
and you don't want the help text stored in a separate file either, you can simply embed binary data into your executable.

Suppose you have data.txt:
$ cat data.txt
data file
You have to convert it to an ELF object file.
I know two ways:
  • use linker:

    ld -r -b binary -o data.o data.txt
  • use objcopy:

    objcopy -I binary -O elf32-i386 --binary-architecture i386 data.txt data.o

Both of these commands produce an ELF object file:
$ readelf -a data.o
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          96 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         5
  Section header string table index: 2

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .data             PROGBITS        00000000 000034 00000a 00  WA  0   0  1
  [ 2] .shstrtab         STRTAB          00000000 00003e 000021 00      0   0  1
  [ 3] .symtab           SYMTAB          00000000 000128 000050 10      4   2  4
  [ 4] .strtab           STRTAB          00000000 000178 000043 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

Symbol table '.symtab' contains 5 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 00000000     0 SECTION LOCAL  DEFAULT    1 
     2: 00000000     0 NOTYPE  GLOBAL DEFAULT    1 _binary_data_txt_start
     3: 0000000a     0 NOTYPE  GLOBAL DEFAULT    1 _binary_data_txt_end
     4: 0000000a     0 NOTYPE  GLOBAL DEFAULT  ABS _binary_data_txt_size

OK, you now have data.o with your data in the .data section and three symbols: _binary_data_txt_start, _binary_data_txt_end and _binary_data_txt_size.

_binary_data_txt_end and _binary_data_txt_size have the same value here, because in this unlinked object the data starts at offset 0 of its section. So I'll use _binary_data_txt_size only.
Let's write a simple C program that uses the data from the object. It's a bit tricky.
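#include <stdio.h>

/* Both symbols are defined by data.o; only their addresses are meaningful. */
extern int _binary_data_txt_start;
extern int _binary_data_txt_size;

int
main(int argc, char **argv)
{
    /* The "address" of this absolute symbol is the size of the data. */
    int size = (int)&_binary_data_txt_size;
    char *data = (char *)&_binary_data_txt_start;

    printf("%d\n", size);
    /* The data is not NUL-terminated, so limit the output to size bytes. */
    printf("%.*s", size, data);

    return 0;
}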
_binary_data_txt_start and _binary_data_txt_size carry their values in their addresses, not in the memory at those addresses. So &_binary_data_txt_size is not the address of a variable but the value of the symbol itself, which is the size of the data, and &_binary_data_txt_start is the address of the data.

To compile

gcc test.c data.o
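
A variation on test.c, sketched under a few assumptions: the length can also be taken as the distance between _binary_data_txt_end and _binary_data_txt_start, and fwrite does not depend on a terminating NUL. The names blob_length, blob_write and demo_blob are hypothetical. The demo array only stands in for data.o so that the snippet compiles on its own.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * In a real build the two bounds would come from data.o:
 *
 *   extern const char _binary_data_txt_start[];
 *   extern const char _binary_data_txt_end[];
 *
 * The helpers below (hypothetical names) take the bounds as parameters,
 * so they can also be tried on an ordinary array.
 */

/* Number of bytes between the start and end markers. */
size_t blob_length(const char *start, const char *end)
{
    return (size_t)(end - start);
}

/* Write the blob verbatim; returns the number of bytes written. */
size_t blob_write(FILE *out, const char *start, const char *end)
{
    return fwrite(start, 1, blob_length(start, end), out);
}

/* Stand-in for the contents of data.txt: "data file\n", 10 bytes, no NUL. */
const char demo_blob[10] = {
    'd', 'a', 't', 'a', ' ', 'f', 'i', 'l', 'e', '\n'
};
```

With the real symbols the call would be blob_write(stdout, _binary_data_txt_start, _binary_data_txt_end).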

8 comments:

CodeMonkeySteve said...

Thanks, that is profoundly helpful! I was afraid I was going to have to do it The Hard Way ...

S.Humphrey said...

This works great on Linux but doesn't work at all on Windows.

On Windows, gcc gives me:
>gcc main.c data.o
.../ccJstKNT.o:main.c:(.text+0x2d): undefined reference to `_binary_data_txt_size'
.../ccJstKNT.o:main.c:(.text+0x34): undefined reference to `_binary_data_txt_start'
collect2: ld returned 1 exit status

It doesn't link properly.. or something

Ni@m said...

Hello, Bob!
I wonder if you have the _binary_data_txt_size and _binary_data_txt_start in your data.o object file.
How did you generate it? I'm not familiar with the Windows port of the GNU toolchain, but if you generated data.o with objcopy you should replace elf32-i386 with a Windows format. I think it should be pe32-i386 or something like that. Running objcopy without arguments should show the supported targets at the end of its output.

S.Humphrey said...

I tried playing with it but to no avail.

$wine objcopy -I binary -O pe-i386 --binary-architecture i386 data.txt data.o

Works fine.

$wine objdump -t data.o
data.o: file format pe-i386

SYMBOL TABLE:
[ 0] ... 0x00000000 _binary_data_txt_start
[ 1] ... 0x0000000c _binary_data_txt_end
[ 2] ... 0x0000000c _binary_data_txt_size

Looks alright.


$ wine gcc main.c data.o
...:main.c:(.text+0x2d): undefined
...:main.c:(.text+0x34): undefined reference to `_binary_data_txt_end'
collect2: ld returned 1 exit status

Fail.


objdump for the same Linux-compiled file is a bit different and it doesn't contain a lot of junk like the pe-i386 does.

$ objdump -t data.o
data.o: file format elf32-i386

SYMBOL TABLE:
00000000 l d .data 00000000 .data
00000000 g .data 00000000 _binary_data_txt_start
0000000c g .data 00000000 _binary_data_txt_end
0000000c g *ABS* 00000000 _binary_data_txt_size

Since it's elf32 I can use readelf:

$ readelf -s data.o

Symbol table '.symtab' contains 5 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 SECTION LOCAL DEFAULT 1
2: 00000000 0 NOTYPE GLOBAL DEFAULT 1 _binary_data_txt_start
3: 0000000c 0 NOTYPE GLOBAL DEFAULT 1 _binary_data_txt_end
4: 0000000c 0 NOTYPE GLOBAL DEFAULT ABS _binary_data_txt_size

I don't know what's going wrong. It complains when I corrupt the data.o file:
$wine gcc main.c data.o
data.o: could not read symbols: File in wrong format
collect2: ld returned 1 exit status

- Bob

Ni@m said...

This looks strange to me:

$wine objdump -t data.o
data.o: file format pe-i386

SYMBOL TABLE:
[ 0] ... 0x00000000 _binary_data_txt_start
[ 1] ... 0x0000000c _binary_data_txt_end
[ 2] ... 0x0000000c _binary_data_txt_size


It doesn't show which section the symbols belong to. Can you compare it with the objdump output for any other PE object file?

S.Humphrey said...

I ...ed out some stuff because it didn't seem very useful and it was too long.


$ wine objdump -t data.o
data.o: file format pe-i386

SYMBOL TABLE:
[ 0](sec 1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 _binary_data_txt_start
[ 1](sec 1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x0000000c _binary_data_txt_end
[ 2](sec -1)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x0000000c _binary_data_txt_size



Here's another file, color.o that was compiled with MinGW:


$wine objdump -t color.o
err:winedevice:ServiceMain driver L"Vax347s" failed to load

color.o: file format pe-i386

SYMBOL TABLE:
[ 0](sec -2)(fl 0x00)(ty 0)(scl 103) (nx 1) 0x00000000 color.cpp
File
[ 2](sec 1)(fl 0x00)(ty 20)(scl 3) (nx 1) 0x00000000 __Z41__static_initialization_and_destruction_0ii
AUX tagndx 0 ttlsiz 0x0 lnnos 0 next 0
[ 4](sec 6)(fl 0x00)(ty 0)(scl 3) (nx 1) 0x00000000 .text$_ZN9Pineapple5ColorC1Effff
AUX scnlen 0x28 nreloc 0 nlnno 0 checksum 0x0 assoc 0 comdat 2
[ 6](sec 6)(fl 0x00)(ty 20)(scl 2) (nx 0) 0x00000000 __ZN9Pineapple5ColorC1Effff
[ 7](sec 1)(fl 0x00)(ty 20)(scl 3) (nx 0) 0x00000236 __GLOBAL__I__ZN9Pineapple5Color5BLACKE
[ 8](sec 1)(fl 0x00)(ty 0)(scl 3) (nx 1) 0x00000000 .text
AUX scnlen 0x252 nreloc 21 nlnno 0
[ 10](sec 2)(fl 0x00)(ty 0)(scl 3) (nx 1) 0x00000000 .data
AUX scnlen 0x0 nreloc 0 nlnno 0
[ 12](sec 3)(fl 0x00)(ty 0)(scl 3) (nx 1) 0x00000000 .bss
AUX scnlen 0x1c nreloc 0 nlnno 0
[ 14](sec 4)(fl 0x00)(ty 0)(scl 3) (nx 1) 0x00000000 .stab
AUX scnlen 0x33c nreloc 8 nlnno 0
[ 16](sec 5)(fl 0x00)(ty 0)(scl 3) (nx 1) 0x00000000 .stabstr
AUX scnlen 0xb2f nreloc 0 nlnno 0
[ 18](sec 7)(fl 0x00)(ty 0)(scl 3) (nx 1) 0x00000000 .ctors
AUX scnlen 0x4 nreloc 1 nlnno 0
[ 20](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000000 __ZN9Pineapple5Color5BLACKE
[ 21](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000004 __ZN9Pineapple5Color5WHITEE
[ 22](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000008 __ZN9Pineapple5Color3REDE
[ 23](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x0000000c __ZN9Pineapple5Color5GREENE
[ 24](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000010 __ZN9Pineapple5Color4BLUEE
[ 25](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000014 __ZN9Pineapple5Color6YELLOWE
[ 26](sec 3)(fl 0x00)(ty 0)(scl 2) (nx 0) 0x00000018 __ZN9Pineapple5Color17TRANSPARENT_WHITEE
[ 27](sec 0)(fl 0x00)(ty 20)(scl 2) (nx 0) 0x00000000 __Znwj


- Bob

Ni@m said...

Hello, Bob!
I'm sorry for the long delay here. Too much work now.
I can't see anything strange in the output of readelf or objdump.
This might be a bug in the GNU toolchain for Windows.
I believe you can use xxd -i /path/to/data data.h to generate a header file with a blob array. This should be more portable.
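
For the data.txt from the post, the header generated by xxd -i data.txt should look roughly like the first half of this sketch: an unsigned char array plus a length variable, both named after the input file. The function print_data_txt is a hypothetical helper added only for illustration.

```c
#include <stdio.h>

/* What `xxd -i data.txt` is expected to emit for the 10-byte file
 * containing "data file\n". */
unsigned char data_txt[] = {
  0x64, 0x61, 0x74, 0x61, 0x20, 0x66, 0x69, 0x6c, 0x65, 0x0a
};
unsigned int data_txt_len = 10;

/* Hypothetical helper: write the embedded bytes to a stream.
 * Returns the number of bytes written. */
size_t print_data_txt(FILE *out)
{
    return fwrite(data_txt, 1, data_txt_len, out);
}
```

Normally the array would live in the generated data.h and be pulled in with #include. No object-format conversion is involved, so this route does not depend on the linker or on symbol naming.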

Anonymous said...

Sorry about replying to an old thread, but I wanted to post this comment so others can figure out the solution to the Win32 problem encountered here. The issue is as follows: GCC on Win32 (and, I suspect, other compilers on that platform too) automatically prepends a _ to all symbols it defines from compiled sources. So if you declare

extern int _binary_data_txt_start;

ld, the linker, looks for a symbol called

__binary_data_txt_start

(note the two _s at the beginning).
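
To see which prefix a given toolchain applies, one option is the __USER_LABEL_PREFIX__ macro that GCC and Clang predefine. The sketch below is only an illustration of the prefixing described above: linker_name, STR and LABEL_PREFIX are hypothetical names, and the fallback to an empty prefix for other compilers is an assumption.

```c
#include <stdio.h>

/* Two-step stringification so the macro is expanded before being quoted. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* __USER_LABEL_PREFIX__ is predefined by GCC and Clang; other compilers
 * may not have it, in which case this sketch falls back to "". */
#ifdef __USER_LABEL_PREFIX__
#define LABEL_PREFIX STR(__USER_LABEL_PREFIX__)
#else
#define LABEL_PREFIX ""
#endif

/* Hypothetical helper: build the linker-level name for a C identifier.
 * Returns the length snprintf reports (excluding the terminating NUL). */
int linker_name(char *buf, size_t buflen, const char *c_name)
{
    return snprintf(buf, buflen, "%s%s", LABEL_PREFIX, c_name);
}
```

Called with "_binary_data_txt_start", this should give the name unchanged on Linux, and on 32-bit MinGW it is expected to give __binary_data_txt_start, which is the symbol the linker then fails to find in data.o.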

The solution on Windows would be to write

extern int binary_data_txt_start;

which will link normally. By setting up preprocessor macros like

#if defined(_WIN32)

#define DECLARE_BLOB(name) \
extern int binary_ ## name ## _start; \
extern int binary_ ## name ## _size

#define BLOB_ADDRESS(name) \
((void *) & binary_ ## name ## _start)

#define BLOB_LENGTH(name) \
((unsigned int) & binary_ ## name ## _size)

#else

#define DECLARE_BLOB(name) \
extern int _binary_ ## name ## _start; \
extern int _binary_ ## name ## _size

#define BLOB_ADDRESS(name) \
((void *) & _binary_ ## name ## _start)

#define BLOB_LENGTH(name) \
((unsigned int) & _binary_ ## name ## _size)

#endif

you can use the idioms

DECLARE_BLOB(data_txt);

void usage(int status) {
    /* %.*s expects an int precision followed by a char pointer. */
    fprintf(status ? stderr : stdout, "%.*s\n", (int) BLOB_LENGTH(data_txt), (const char *) BLOB_ADDRESS(data_txt));
}

to print the data.txt file portably and, thanks to the %.*s printf directive, regardless of whether your data_txt blob ends with a null byte.
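
The effect of %.*s can be tried without any embedded object. This is a minimal sketch with hypothetical names (blob_to_string, demo_bytes): the array stands in for a blob that is followed directly by unrelated bytes instead of a null byte.

```c
#include <stdio.h>

/* Hypothetical helper: copy at most `len` bytes of an unterminated blob
 * into `out` as a C string. Returns what snprintf returns. */
int blob_to_string(char *out, size_t outlen, const char *blob, int len)
{
    /* The precision passed through `*` caps how many bytes %s reads. */
    return snprintf(out, outlen, "%.*s", len, blob);
}

/* Stand-in blob: 10 meaningful bytes followed by unrelated bytes,
 * with no NUL right after the data. */
const char demo_bytes[] = {
    'd', 'a', 't', 'a', ' ', 'f', 'i', 'l', 'e', '\n', 'X', 'Y', 'Z'
};
```

With a length of 10 only "data file\n" ends up in the output buffer. Note that %.*s still stops early at an embedded null byte, so for truly binary data fwrite is the safer choice.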