Sunday, November 30, 2008

FreeBSD: sem_open bug

Recently I've been testing libdodo on FreeBSD 7.0.
Any time sem_open was called, the program died with 'Bad system call' (SIGSYS) raised inside the ksem_open routine.

sem_open is known to be limited in FreeBSD. From man 3 sem_open:

BUGS
     This implementation places strict requirements on the value of name: it
     must begin with a slash (`/'), contain no other slash characters, and be
     less than 14 characters in length not including the terminating null
     character.
Anyway, I was passing sem_open a key that satisfies these requirements (a leading '/', no other slashes, fewer than 14 characters), and it still failed:
Program received signal SIGSYS, Bad system call.
[Switching to Thread 0x28f01100 (LWP 100056)]
0x2891c84b in ksem_open () from /lib/libc.so.7
(gdb) bt
#0  0x2891c84b in ksem_open () from /lib/libc.so.7
#1  0x2891209c in sem_open () from /lib/libc.so.7
#2  0x28767386 in single (this=0x804b520, value=1, a_key=@0x804b4dc) at src/pcSyncProcessDataSingle.cc:58
#3  0x0804969c in __static_initialization_and_destruction_0 (__initialize_p=Variable "__initialize_p" is not available.
) at test.cc:24
#4  0x0804aaf5 in __do_global_ctors_aux ()
#5  0x080491ed in _init ()
#6  0x00000000 in ?? ()
#7  0x00000000 in ?? ()
#8  0xbfbfecbc in ?? ()
#9  0x080495a6 in _start ()
#10 0x00000001 in ?? ()
(gdb) up
#1  0x2891209c in sem_open () from /lib/libc.so.7
(gdb) 
#2  0x28767386 in single (this=0x804b520, value=1, a_key=@0x804b4dc) at src/pcSyncProcessDataSingle.cc:58
58  semaphore = sem_open(key.c_str(), O_CREAT, S_IWUSR, value);
(gdb) p key
$1 = {static npos = 4294967295, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x28f4a24c "/c5b80cc1e0"}}
(gdb) p value
$2 = 1
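The SIGSYS is raised by the ksem_open system call itself, so the problem is on the kernel side rather than in the arguments. SIGSYS usually means that the running kernel does not implement the system call; on FreeBSD 7 POSIX semaphores are provided by the sem kernel module (or the P1003_1B_SEMAPHORES kernel option), so it is worth checking that it is loaded. Otherwise I should look for another implementation of semaphores in FreeBSD.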
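The requirements quoted from the man page can be checked before sem_open is called, which at least rules the name out as the cause. A minimal sketch, assuming a hypothetical helper isValidSemName that is not part of libdodo or of any system library:

```cpp
#include <cstring>

// Hypothetical helper: checks the constraints quoted from the FreeBSD
// sem_open man page. It does not call sem_open itself.
bool isValidSemName(const char *name)
{
    // The name must begin with a slash.
    if (name == nullptr || name[0] != '/')
        return false;

    // It must contain no other slash characters.
    if (std::strchr(name + 1, '/') != nullptr)
        return false;

    // It must be less than 14 characters long, not counting the '\0'.
    return std::strlen(name) < 14;
}
```

The key from the gdb session, "/c5b80cc1e0", passes this check, which supports the conclusion that the name was not the problem.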

Thursday, November 27, 2008

gmake: command echoing

By default gmake prints each command line before executing it.
I used to know how to suppress this output, but I hadn't needed it for some time and forgot how it is done.
Recently I wanted to make the build output of libdodo prettier and tried to recall how to tell gmake not to print command lines.
The gmake manual has a section that describes how to achieve this.
An '@' sign at the start of a command tells gmake not to echo that command line:

@echo "-- Compiling: $^"
@$(CXX) $(DEFINES) $(CPPFLAGS) $(CFLAGS) -fPIC -c $^
The long g++ command line won't appear in the output; instead you'll see
-- Compiling: 'the source file'.cc

Tuesday, November 25, 2008

c++: virtual constructor

Recently I was asked about virtual constructors in c++.
The concept of a virtual constructor confuses me.
First of all, while a constructor runs, the object's v-table is not yet that of the final class. Construction goes from the base class down to the most derived one, so while a base class constructor is running the object knows nothing about the derived classes, and a virtual call made there resolves to the base class version. Even if the class could somehow reach the derived overrides from its constructor, it would be nonsense to call functions of derived objects that haven't been constructed yet. It would cause undefined behavior if those virtual functions worked with the data of their own class.
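What a virtual call made from a base class constructor actually does can be shown with a small sketch. Base, Derived and seenInCtor are hypothetical names used only for illustration:

```cpp
#include <string>

// Hypothetical classes, used only to illustrate dispatch during construction.
struct Base
{
    std::string seenInCtor;

    Base()
    {
        // While Base::Base runs, the dynamic type of the object is Base,
        // so this call resolves to Base::name even for a Derived object.
        seenInCtor = name();
    }

    virtual ~Base() {}

    virtual std::string name() const { return "Base"; }
};

struct Derived : Base
{
    std::string name() const override { return "Derived"; }
};
```

For a Derived object, seenInCtor holds "Base", while a call to name() after construction returns "Derived".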
For me, a constructor is the place where the object is initialized: the class members get given or default values, and some actions prepare the object for further work. There shouldn't be any calls there that are resolved via the v-table (to be concrete, virtual functions). If there are some, and you expect the derived versions to be called, they should be moved into a special virtual 'init' method that is defined by the user, redefined in derived classes, and documented as something to call right after the constructor. Only when the object's construction is complete should 'init' be called.
There are some virtual constructor idioms on the Internet that suggest creating a special virtual method 'construct' that returns a new copy of the object allocated with 'new'. This can lead to memory leaks, which smart pointers help to avoid.
Anyway, these idioms hide the original meaning of a constructor: to construct the object.
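For reference, the idiom mentioned above usually looks roughly like this. It is a sketch only: Shape and Circle are hypothetical classes, the 'construct' method is named clone here (a common name for it), and std::unique_ptr plays the role of the smart pointer:

```cpp
#include <memory>
#include <string>

// Hypothetical hierarchy illustrating the "virtual constructor" idiom.
struct Shape
{
    virtual ~Shape() = default;

    // Returns a copy with the same dynamic type as *this.
    virtual std::unique_ptr<Shape> clone() const = 0;

    virtual std::string name() const = 0;
};

struct Circle : Shape
{
    std::unique_ptr<Shape> clone() const override
    {
        // The copy is owned by the returned smart pointer,
        // so the caller cannot forget to delete it.
        return std::unique_ptr<Shape>(new Circle(*this));
    }

    std::string name() const override { return "Circle"; }
};
```

Note that clone is an ordinary virtual method called on an already constructed object; no constructor becomes virtual here.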

Where could a 'virtual' constructor be used?
Say you have class A, which works with local data, and in its constructor you want to connect to the local storage. Later you define class B, which works with remote storage, and you try to connect to the remote storage in the constructor as well. You realize that the connection could be made in a virtual 'connect' method. When you call 'connect' in the constructor of A, you expect that in the case of

A *a = new B;
B::connect will be called in the constructor of A. But how would A know how to connect to the remote storage if the connection metadata only becomes available in the constructor of class B, which runs later?
Once the concepts of object construction are understood, it should be clear that constructors cannot be virtual, or, better to say, that they _should_not_ call functions expecting virtual dispatch.
It's much better to expose the 'connect' method for making a connection than to hide it in the constructor.
It gets worse if 'connect' throws exceptions. I don't think it's a good idea to throw exceptions from a constructor: constructors that may throw are much harder to work with.
Usually I don't expect an exception to be thrown from a constructor, and if one is, things are really bad and the program should probably finish there. But if I merely can't connect, I should probably wait some time and try again later. It's more expressive to put 'connect' into a try...catch block than to put
A *a = new B;
there and reconstruct the object each time the connection fails. That strategy also takes more memory and CPU resources.
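The approach argued for above, a public virtual 'connect' that is retried on an already constructed object, might be sketched like this. Storage, FlakyRemoteStorage and connectWithRetry are hypothetical names, and the failing connection is simulated with a counter:

```cpp
#include <stdexcept>

// Hypothetical interface: the constructor does no I/O, connect() is exposed.
class Storage
{
public:
    virtual ~Storage() {}
    virtual void connect() = 0;
};

// Simulated remote storage that fails a given number of times.
class FlakyRemoteStorage : public Storage
{
public:
    explicit FlakyRemoteStorage(int failures)
        : failuresLeft(failures), connected(false)
    {
    }

    void connect() override
    {
        if (failuresLeft > 0)
        {
            --failuresLeft;
            throw std::runtime_error("remote storage unavailable");
        }
        connected = true;
    }

    bool isConnected() const { return connected; }

private:
    int failuresLeft;
    bool connected;
};

// Retries connect() on the same object; nothing is reconstructed.
bool connectWithRetry(Storage &storage, int maxAttempts)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt)
    {
        try
        {
            storage.connect();
            return true;
        }
        catch (const std::runtime_error &)
        {
            // A real program would wait here before the next attempt.
        }
    }
    return false;
}
```

Because connect is called after construction is complete, the call is dispatched to the most derived class as expected.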

Thursday, November 20, 2008

linux: linux-gate.so.1

I recently read a very nice article by Johan Petersson about what linux-gate.so.1 is: the object that is linked to all ELF binaries (those compiled to use shared libraries) on x86 in linux.
He mentioned that linux-gate.so.1 always has the same address in the executable.
This is rather dangerous, as described in the "Exploiting with linux-gate.so.1" paper. A process can be exploited via linux-gate.so.1 because its address is always known. Moreover, it has the same address in all ELF files on the system. Having determined the address of linux-gate.so.1 in any ELF file on the machine, and having an exploit, an attacker is able to take control over almost any vulnerable process on the system.
It's possible to move the vdso or disable it completely by writing the appropriate value to /proc/sys/vm/vdso_enabled:

0: no vdso at all
1: random free page(works only if /proc/sys/kernel/randomize_va_space set to 1)
2: top of the stack
Disabling it is not a good idea because the system can even become unusable. Putting it into a random free page is a good solution, although it may break debuggers and/or reduce performance a bit.

Tuesday, November 18, 2008

perl: default input and pattern-searching space

Perl has plenty of special variables.
The most used is probably $_.
$_ stands for the default input and pattern-searching space.
$_ is used implicitly as the target of line input in a while condition, as the default argument of many built-in functions (print, for example), and as the pattern-searching space when a match or substitution is written without the =~ operator.
$_ is also the default iterator variable in a foreach loop if no other variable is supplied.
The following block

while (<STDIN>)
{
    s/[A-Z]*//g;
    print;
}
is equivalent to
while ($_ = <STDIN>)
{
    $_ =~ s/[A-Z]*//g;
    print $_;
}
$_ is a global variable, so this can produce unwanted side effects. The output of the following code
while (<STDIN>)
{
    print;
    last;
}
print;
{
    print;
    while (<STDIN>)
    {  
        s/[A-Z]*//g;
        print;
        last;
    }  
    print;
}
print;
should be
abcABC<<-- my input string
abcABC
abcABC
abcABC
abcABC<<-- my input string
abc
abc
abc
It's possible to declare $_ with my, which makes it local to the scope of the block (in perl 5.9.1 and later); using our then restores the global $_ for the rest of that scope.
The output of this code
while (<STDIN>)
{
    print;
    last;
}
print;
{
    print;
    my $_;
    while (<STDIN>)
    {  
        s/[A-Z]*//g;
        print;
        last;
    }  
    print;
}
print;
should be
abcABC<<-- my input string
abcABC
abcABC
abcABC
abcABC<<-- my input string
abc
abc
abcABC
and with our
while (<STDIN>)
{
    print;
    last;
}
print;
{
    print;
    my $_;
    while (<STDIN>)
    {  
        s/[A-Z]*//g;
        print;
        last;
    }  
    our $_;
    print;
}
print;
should be
abcABC<<-- my input string
abcABC
abcABC
abcABC
abcABC<<-- my input string
abc
abcABC
abcABC
Unfortunately perl 5.10 is not yet the default in most linux distributions, so some workarounds are needed to get the functionality of my and our with $_.

Friday, November 14, 2008

c++: name lookup changes in g++ 4.3

Recently I've failed to compile my code with g++ 4.3. The error message was something like this:

test.cc:2: error: changes meaning of ‘A’ from ‘class A’
I've created a test case to isolate the problem:
class A {};

class B
{
    void foo(const A &a){}
    void A(){}
};
If you try to compile this code with g++ 4.3 you will get
(~~) g++ test.cc -c
test.cc:7: error: declaration of ‘void B::A()’
test.cc:2: error: changes meaning of ‘A’ from ‘class A’
What's going on here?
Method A of class B changes the meaning of the name used as the parameter type in method foo.
gcc 4.3 now reports an error in certain situations in c++ where a given name may refer to more than one type or function. Here the name A refers both to the function B::A and to the class A.

The reason this isn't allowed in c++ is that if we write A() in the definition of B, it is ambiguous whether we want to create an object of class A or call this->A().
It's possible to fix such code in two ways.
The first is to rename one of the names. That's not always an option if you really want these names to stay. There are not a lot of synonyms :).
The second one is more technical: qualify one of the names so that it is no longer looked up in the class scope:
class A {};

class B
{
    void foo(const ::A &a){}
    void A(){}
};
This code compiles without any errors from gcc.

The gcc 4.3 release notes additionally mention that the -fpermissive option can be used as a temporary workaround to convert the error into a warning until the code is fixed.

I wondered how gcc dealt with this in previous versions. The next code won't compile with g++ 4.3, but it will with the -fpermissive option. Let's see how g++ deals with this ambiguous situation.
#include <iostream>

using namespace std;

class A
{
    public:
        A()
        {  
            std::cout << "A::A" << std::endl;
        }  
        void operator ()()
        {  
            std::cout << "A::operator ()" << std::endl;
        }  
};

class B
{
    public:
        void operator ()()
        {  
            std::cout << "B::operator ()" << std::endl;
        }  
        void foo(A a) 
        {  
        }  
        B A()
        {  
            std::cout << "B::A" << std::endl;

            return B();
        }  
        void bar()
        {  
            ::A()();
            A()();
        }  
};

int main(int argc, char **argv)
{
    B b; 
    b.bar();

    return 0; 
}
(~~) g++ test.cc -fpermissive -o test
test.cc:34: warning: declaration of ‘B B::A()’
test.cc:7: warning: changes meaning of ‘A’ from ‘class A’
(~~) ./test 
A::A
A::operator ()
B::A
B::operator ()
You can see that inside B the name A means the function B::A, the meaning from the current scope; we have to use the scope resolution operator to get class A constructed.
To be honest I always expected such behavior and am not sure why this should be ambiguous.
If we use the name A, we get the one closest to the current scope; to use the name from another scope we use the scope resolution operator.
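The "closest scope wins" rule can be seen in a case that compiles without -fpermissive, because the hidden name is only used inside member function bodies. A small sketch with hypothetical names value and Holder:

```cpp
// Namespace-scope function.
int value()
{
    return 1;
}

struct Holder
{
    // Hides ::value inside the class scope.
    int value() const { return 2; }

    // Unqualified lookup finds the member first.
    int inner() const { return value(); }

    // The scope resolution operator reaches the namespace-scope function.
    int outer() const { return ::value(); }
};
```

Here inner() returns 2 and outer() returns 1, matching the behavior observed with -fpermissive above.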

Thursday, November 13, 2008

bash: defend yourself from overwriting files

I suppose almost everybody has typed '>' instead of '>>' when redirecting output to a file.
In bash it's possible to set the noclobber option to avoid overwriting files.

$touch test
$set -o noclobber
$echo test > test
bash: test: cannot overwrite existing file
If you really know what you are doing, you can use '>|' to overwrite the file anyway.
$set -o | grep noclobber
noclobber       on
$echo test >| test
$cat test
test
A very useful option, and I think it should be on by default.

Tuesday, November 11, 2008

c: variable length arrays

Another feature of c99 is variable length arrays.
Before c99 the size of an array had to be known at compile time. Now an array of automatic storage duration can have a length that is determined at run time.

int size = strlen(*argv);
char array[size];

Variable length arrays can be used only in block scope. Their memory is typically taken from the stack, so at file (global) scope you are still unable to define them.

This is not the same as alloca.
The space of a variable length array is freed at the end of the scope of the array's name, while space allocated with alloca remains until the end of the function.
Using alloca within a loop, it's possible to allocate an additional block on each iteration. This is impossible with variable length arrays.

c++ doesn't have this feature but g++ supports it as an extension.
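In portable c++ the usual substitute for a run-time sized array is std::vector. It is not an equivalent: the storage normally comes from the heap, not the stack. A minimal sketch, where copyToBuffer is a hypothetical helper:

```cpp
#include <cstring>
#include <vector>

// Hypothetical helper: copies a C string into a buffer whose size
// is only known at run time.
std::vector<char> copyToBuffer(const char *text)
{
    // +1 leaves room for the terminating '\0'.
    std::vector<char> buffer(std::strlen(text) + 1);
    std::memcpy(buffer.data(), text, buffer.size());
    return buffer;
}
```

The buffer is released when the vector goes out of scope, which is close to the lifetime rule described for variable length arrays above.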

Sunday, November 2, 2008

c++: virtual inheritance

Virtual inheritance is an important thing when we are talking about multiple inheritance.
You will often meet the term 'virtual base class', which means a base class that appears several times in the inheritance tree but is shared between the derived classes.
Let's look at an inheritance tree without a virtual base class

class A
{
};
class B : public A
{
};
class D : public A
{
};
class E : public B, public D
{
};
Class E will have 2 copies of A (one inherited through B and one through D). To be more concrete, let's look at the output of the -fdump-class-hierarchy g++ option
Class A
   size=1 align=1
   base size=0 base align=1
A (0xb7f32680) 0 empty

Class B
   size=1 align=1
   base size=1 base align=1
B (0xb7f326c0) 0 empty
  A (0xb7f32700) 0 empty

Class D
   size=1 align=1
   base size=1 base align=1
D (0xb7f32740) 0 empty
  A (0xb7f32780) 0 empty

Class E
   size=2 align=1
   base size=2 base align=1
E (0xb7f327c0) 0 empty
  B (0xb7f32800) 0 empty
    A (0xb7f32840) 0 empty
  D (0xb7f32880) 1 empty
    A (0xb7f328c0) 1 empty
The most obvious problem is the overhead: E contains 2 instances of A (listed as 0xb7f32840 and 0xb7f328c0, at offsets 0 and 1).
The other problem is that you are unable to call methods of A from E directly, because there is no unique path from E to A. The following code won't compile: the compiler reports that the reference to methodA is ambiguous.
class A
{
    public:
        virtual void methodA(){}
};
class B : public A
{
};
class D : public A
{
};
class E : public B, public D
{
    virtual void methodE(){ methodA(); }
};
In this case you should explicitly call methodA through either B or D
virtual void methodE(){ B::methodA(); D::methodA(); }
You also face a problem when using A as a base of E
A *a = new E;//‘A’ is an ambiguous base of ‘E’
You may say you can do something like this
A *a; 
    E *e = new E;
    void *v = (void *)e;
    a = (A *)v;
There is no chance of getting defined behavior from this piece of c-ish code.

Now let's look at how things change with a virtual base class.
class A
{
};
class B : virtual public A
{
};
class D : virtual public A
{
};
class E : public B, public D
{
};
Class A is a virtual base class in the code above. Let's look at what g++ says
Class A
   size=1 align=1
   base size=0 base align=1
A (0xb7f7a680) 0 empty

Class B
   size=4 align=4
   base size=4 base align=4
B (0xb7f7a6c0) 0 nearly-empty
    vptridx=0u vptr=((& B::_ZTV1B) + 12u)
  A (0xb7f7a700) 0 empty virtual
      vbaseoffset=-0x00000000c

Class D
   size=4 align=4
   base size=4 base align=4
D (0xb7f7a7c0) 0 nearly-empty
    vptridx=0u vptr=((& D::_ZTV1D) + 12u)
  A (0xb7f7a800) 0 empty virtual
      vbaseoffset=-0x00000000c

Class E
   size=8 align=4
   base size=8 base align=4
E (0xb7f7a880) 0
    vptridx=0u vptr=((& E::_ZTV1E) + 12u)
  B (0xb7f7a8c0) 0 nearly-empty
      primary-for E (0xb7f7a880)
      subvttidx=4u
    A (0xb7f7a900) 0 empty virtual
        vbaseoffset=-0x00000000c
  D (0xb7f7a940) 4 nearly-empty
      subvttidx=8u vptridx=12u vptr=((& E::_ZTV1E) + 24u)
    A (0xb7f7a900) alternative-path
Class E has one instance of A (the single entry 0xb7f7a900). We got rid of the duplicate. In the output you can see that both paths from E now lead to the same A (the second one is marked alternative-path). There is no problem compiling the following code
class E : public B, public D
{
    virtual void methodE(){ methodA(); }
};
And A can be used as a base class for E
A *a = new E;
With virtual inheritance we get a better object model, but we can lose some performance because the paths between base and derived classes are resolved at run time through the v-table. With small classes the v-table machinery can even cost more than the duplicate it removes (compare size=2 and size=8 for E in the dumps above).
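The difference between the two hierarchies can also be checked at run time by comparing the addresses of the A subobject reached through each path. A sketch with hypothetical class names (Shared* uses a virtual base, Plain* does not):

```cpp
// A is given a data member so that writes through one path
// can be observed through the other.
struct A
{
    int value = 0;
    virtual ~A() = default;
};

struct SharedB : virtual A {};
struct SharedD : virtual A {};
struct SharedE : SharedB, SharedD {};

struct PlainB : A {};
struct PlainD : A {};
struct PlainE : PlainB, PlainD {};

// True when both paths lead to the same A subobject.
bool sameBaseSubobject(SharedE &e)
{
    A *viaB = static_cast<SharedB *>(&e);
    A *viaD = static_cast<SharedD *>(&e);
    return viaB == viaD;
}

bool sameBaseSubobject(PlainE &e)
{
    A *viaB = static_cast<PlainB *>(&e);
    A *viaD = static_cast<PlainD *>(&e);
    return viaB == viaD;
}
```

For SharedE the function returns true and a value written through SharedB is visible through SharedD; for PlainE it returns false because each path has its own A.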

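The other thing you should be aware of is that c-style casts between derived and base classes (in either direction) may break your program, because with virtual bases a conversion may need a pointer adjustment that is only known at run time. Use dynamic_cast instead. With a v-table the classes are no longer POD (Plain Old Data) types, and treating pointers to non-POD types as interchangeable raw addresses won't work correctly.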
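A short sketch of the recommended cast, using the same class names as the hierarchy above. The marker member and the toE helper are hypothetical additions for illustration. With a virtual base, static_cast<E *>(a) does not compile, so dynamic_cast is the way from A to E:

```cpp
// A needs at least one virtual function for dynamic_cast to work.
struct A
{
    virtual ~A() {}
};

struct B : virtual A {};
struct D : virtual A {};

struct E : B, D
{
    int marker;
    E() : marker(42) {}
};

// Converts from the virtual base to the most derived class.
// Returns a null pointer when the object is not actually an E.
E *toE(A *a)
{
    return dynamic_cast<E *>(a);
}
```

When the A pointer refers to a real E the original object is returned, and when it refers to a plain B the result is a null pointer, which a c-style cast through void * could never tell you.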