Wednesday, October 29, 2008

c++: specialization of template function

There are some differences between class templates and function templates.
In c++ it isn't possible to overload classes, but it is possible to overload functions.

Since functions can be overloaded, you should be careful about what you are actually doing.
Let's glance at the code below

template<typename T>
void func(T t) 
{
}

template<>
void func<int *>(int *t)
{
}

template<typename T>
void func(T *t)
{
}
Here we have a function template, an explicit specialization of it, and an overloaded function template (which is another primary template).
Which function will be called for
int *i = new int(10);
func(i);
Let's see. We have 2 primary templates and 1 explicit specialization. According to the overload resolution rules of the standard (13.3, Overload resolution), a non-template function is preferred when it matches the arguments at least as well as any template (we don't have any here). If there are no non-template functions (or none of them is suitable), the primary templates are examined and the most specialized one is selected by partial ordering. If neither candidate is more specialized than the other, the call is ambiguous and the compiler asks the developer to disambiguate. Explicit specializations do not take part in this choice: only after a primary template has been selected does the compiler check whether that template has a specialization for the deduced arguments.
In the code above the second primary template will be called
template<typename T> void func(T *t)
because it is more specialized than func(T); func<int *> is a specialization of the first primary template, which was not selected.
Is it possible to have a specialization called for the code below?
int *i = new int(10); func(i);
The answer is yes: what is needed is a specialization of the overloaded (second) primary template
template<>
void
func<int>(int *t)
{
}
This is the specialization of the
template<typename T>
void func(T *t)
{
}
The specialization should appear _after_ the primary template in the code, so the complete piece of code is
template<typename T>
void func(T t) 
{
}

template<>
void func<int *>(int *t)
{
}

template<typename T>
void func(T *t)
{
}

template<>
void func<int>(int *t)
{
}
With this code
template<> void func<int>(int *t)
will be called for
int *i = new int(10);func(i);
The key is that specialization is not overloading; to be clear, specializations _do_not_ overload functions. A specialization of a function template is examined only after its primary template has been chosen.
The better way I see here is to create a non-template function
void func(int *t)
{
}
This one will be chosen first, because a non-template function is preferred over a template whenever it matches the arguments at least as well.
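The three situations discussed in this post can be checked with a small sketch. The namespaces, the Chosen enum and check_overloads are my own illustrative names, not part of the original listings; each overload returns a tag instead of doing real work.

```cpp
#include <cassert>

// Tag returned by each overload so the caller can tell which one was selected.
enum Chosen { PRIMARY_T, SPEC_OF_T, PRIMARY_TPTR, SPEC_OF_TPTR, NON_TEMPLATE };

namespace first_listing {            // specialization declared before func(T *)
    template<typename T> Chosen func(T)    { return PRIMARY_T; }
    template<> Chosen func<int *>(int *)   { return SPEC_OF_T; }     // specializes func(T)
    template<typename T> Chosen func(T *)  { return PRIMARY_TPTR; }
}

namespace complete_listing {         // adds a specialization of func(T *)
    template<typename T> Chosen func(T)    { return PRIMARY_T; }
    template<> Chosen func<int *>(int *)   { return SPEC_OF_T; }     // specializes func(T)
    template<typename T> Chosen func(T *)  { return PRIMARY_TPTR; }
    template<> Chosen func<int>(int *)     { return SPEC_OF_TPTR; }  // specializes func(T *)
}

namespace with_plain_function {      // a non-template overload instead of specializations
    template<typename T> Chosen func(T)    { return PRIMARY_T; }
    template<typename T> Chosen func(T *)  { return PRIMARY_TPTR; }
    inline Chosen func(int *)              { return NON_TEMPLATE; }
}

// Sanity checks for the behaviour described in the post; call it from main().
void check_overloads()
{
    int i = 10;
    double d = 1.0;
    // func(T *) wins by partial ordering; the specialization of func(T) is ignored.
    assert(first_listing::func(&i) == PRIMARY_TPTR);
    // func(T *) wins again, and now it has a specialization for T = int.
    assert(complete_listing::func(&i) == SPEC_OF_TPTR);
    // No specialization exists for double, so the primary template is used.
    assert(complete_listing::func(&d) == PRIMARY_TPTR);
    // A matching non-template function beats both templates.
    assert(with_plain_function::func(&i) == NON_TEMPLATE);
    assert(with_plain_function::func(i) == PRIMARY_T);
}
```

Calling check_overloads() from main should pass silently. The point of the last namespace is that the plain overload takes part in overload resolution itself, so it does not matter where it is declared relative to the templates.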

Tuesday, October 28, 2008

c: designated initializers for structures

C99 introduced designated initializers.
A designated initializer, or designator, points out a particular element to be initialized. A designator list is a comma-separated list of one or more designators.

struct
{
    int a; 
    int b; 
    int c; 
} s = {.a = 1, .c = 2};
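Members that are not named (b here) are initialized to zero.
Unfortunately c++ didn't have this very handy feature at the time of writing; a restricted form was only added later, in C++20.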

Sunday, October 26, 2008

c/c++: call stack

A lot of languages, such as python, java and ruby, print a call trace when an exception is raised.

In c/c++ you don't have such an opportunity out of the box.
Usually the call trace can only be viewed in a debugger.

glibc (since 2.1) provides the backtrace function, declared in execinfo.h.
This function fills a buffer with the backtrace of the calling program (the return addresses of the active function calls).
Combining backtrace with backtrace_symbols you can get symbol names.
According to the man page, backtrace_symbols returns a symbolic representation of each address consisting of the function name (if it can be determined), a hexadecimal offset into the function, and the actual return address (in hexadecimal). It should look like:

message: ./a.out(_Z1cv+0x19) [0x80488ad]
message: ./a.out(_Z1bv+0xb) [0x804896f]
message: ./a.out(_Z1av+0xb) [0x804897c]
message: ./a.out(main+0x16) [0x8048994]
message: /lib/libc.so.6(__libc_start_main+0xfb) [0xb7e2764b]
In c++, due to function overloading, namespaces, etc., function names are mangled, so this traceback is hard to read. I really didn't want to parse each string to extract function names, so I tried dladdr, a glibc extension declared in dlfcn.h. dladdr returns the name of the function and the pathname of the shared object that contains it. Unfortunately dladdr still returns the mangled name of the function.
Unexpectedly I found __cxa_demangle hidden in the caves of g++ (it is declared in cxxabi.h). This function demangles a function name, which is exactly what I was looking for.
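Before the full traceback program, here is __cxa_demangle on its own, wrapped in a small helper. The names demangle and check_demangle are mine, and the "_Z" prefix check is an assumption I add on purpose: as far as I know __cxa_demangle also accepts mangled type names, so short plain C symbols could otherwise be misread as types.

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cstring>    // std::strncmp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang C++ runtime)
#include <string>

// Hypothetical helper: returns the demangled name, or the input unchanged
// when it does not look like a mangled c++ name (e.g. "main").
std::string demangle(const char *mangled)
{
    if (mangled == NULL)
        return "??";
    // Mangled function names start with "_Z"; leave everything else alone.
    if (std::strncmp(mangled, "_Z", 2) != 0)
        return mangled;

    int status = 0;
    char *buf = abi::__cxa_demangle(mangled, NULL, NULL, &status);
    if (status != 0 || buf == NULL)
        return mangled;

    std::string result(buf);
    std::free(buf);   // the buffer was allocated with malloc by __cxa_demangle
    return result;
}

// Checks against the symbols that appear in the backtrace_symbols output above.
void check_demangle()
{
    assert(demangle("_Z1cv") == "c()");
    assert(demangle("main") == "main");
    assert(demangle(NULL) == "??");
}
```

Returning a std::string keeps the malloc/free pair inside the helper, so the caller cannot forget to free the buffer.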
The short example of traceback generation:
#include <stdio.h>
#include <signal.h>
#include <execinfo.h>
#include <cxxabi.h>
#include <dlfcn.h>
#include <stdlib.h>

void c()
{
    using namespace abi;

    enum
    {  
        MAX_DEPTH = 10 
    }; 

    void *trace[MAX_DEPTH];

    Dl_info dlinfo;

    int status;
    const char *symname;
    char *demangled;

    int trace_size = backtrace(trace, MAX_DEPTH);

    printf("Call stack: \n");

    for (int i=0; i<trace_size; ++i)
    {  
        if(!dladdr(trace[i], &dlinfo))
            continue;

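        // dli_sname is NULL when no symbol matches the address
        symname = dlinfo.dli_sname ? dlinfo.dli_sname : "??";

        // on success (status == 0) the result is a malloc'ed string
        demangled = __cxa_demangle(symname, NULL, 0, &status);
        if(status == 0 && demangled)
            symname = demangled;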

        printf("object: %s, function: %s\n", dlinfo.dli_fname, symname);

        if (demangled)
            free(demangled);
    }  
}

void b()
{
    c();
}

void a()
{
    b();
}

int main(int argc, char **argv)
{
    a();

    return 0;
}
The executable should be compiled with -rdynamic g++ flag to instruct the linker to add all symbols, not only used ones, to the dynamic symbol table.
g++ test.cc -ldl -rdynamic -o test
The output:
Call stack: 
object: ./test, function: c()
object: ./test, function: b()
object: ./test, function: a()
object: ./test, function: main
object: /lib/libc.so.6, function: __libc_start_main
If gcc optimizations are enabled you may not see the whole traceback in some cases, for example when calls are inlined or turned into tail calls. With -O2/-O3 the output is
Call stack: 
object: ./test, function: c()
object: ./test, function: main
object: /lib/libc.so.6, function: __libc_start_main

perl: $#

I was confused by some perl tutorials/books that claim you can use $#array to get the size of an array.

Actually $#array returns the last index of the array. Things can get messy with $[, the special perl variable that holds the index of the first element in an array.
If $[ hasn't been modified, $#array returns the size of the array minus 1 (so -1 for an empty array).

To get the number of elements, use the array in scalar context or pass it to the scalar function, e.g. scalar(@array).

Please don't use $# to get the number of elements in an array. You may confuse your followers and yourself.

Saturday, October 25, 2008

c: initializing arrays

There are various ways to initialize an array in c99.

A one-dimensional array can be initialized by simply enumerating the values sequentially. Elements that are not initialized explicitly are set to zero.

int array[3] = {1, 2, 3};
Also you can specify values with designated initializers.
int array[3] = {1, [2] = 3};
In the example above array[0] is 1, array[1] is 0 (implicitly initialized) and array[2] is 3.
Multidimensional arrays can also be initialized by enumerating the values sequentially.
int array[2][2] = {1, 2, 3, 4};
The values 1, 2, 3, 4 are assigned to array[0][0], array[0][1], array[1][0], array[1][1] respectively. Grouping the values by nesting level is more expressive.
int array[2][2] = {{1, 2}, {3, 4}};
Like one-dimensional arrays, multidimensional arrays can be initialized with designated initializers, either assigning a value to each element
int array[2][2] = {[0][0] = 1, [0][1] = 2, [1] = {3, 4}};
or grouping by level
int array[2][2] = {[0] = {1, 2}, [1] = {3, 4}};

Tuesday, October 21, 2008

linux: change the process' name

From time to time developers want to modify the command line of a program as shown by the ps or top utilities.
There is no API in linux to modify the command line.
This article is linux specific, but I hope that by following these steps you can do the same on almost every OS (BTW, FreeBSD has the setproctitle routine that should do the job).

Both ps and top come from the procps package in all linux distributions I have worked with, so they share the same base. Let's concentrate on ps because it's a bit easier to investigate.
To figure out how ps gets information about the processes, let's simply trace it

$strace ps aux 2>&1 1>/dev/null|grep $$
stat64("/proc/4391", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/4391/stat", O_RDONLY)       = 6
read(6, "4391 (bash) S 4388 4391 4349 348"..., 1023) = 222
open("/proc/4391/status", O_RDONLY)     = 6
open("/proc/4391/cmdline", O_RDONLY)    = 6
readlink("/proc/4391/fd/2", "/dev/pts/1", 127) = 10
read(6, "4760 (strace) S 4391 4760 4349 3"..., 1023) = 208
read(6, "4761 (grep) S 4391 4760 4349 348"..., 1023) = 196
read(6, "grep\0004391\0", 2047)         = 10
As expected it reads information from procfs.
$cat /proc/$$/cmdline
bash
Now prepare yourself, I'm going to dig into the kernel sources.

Looking into fs/proc/base.c I found the function proc_pid_cmdline, which produces the command line shown in /proc/<pid>/cmdline
The part of it that we are interested in:
static int proc_pid_cmdline(struct task_struct *task, char * buffer) {
...
    struct mm_struct *mm = get_task_mm(task);

...

    len = mm->arg_end - mm->arg_start;
 
    if (len > PAGE_SIZE)
        len = PAGE_SIZE;

    res = access_process_vm(task, mm->arg_start, buffer, len, 0);

    // If the nul at the end of args has been overwritten, then
    // assume application is using setproctitle(3).
    if (res > 0 && buffer[res-1] != '\0' && len < PAGE_SIZE) {
        len = strnlen(buffer, res);
        if (len < res) {
            res = len;
        } else {
            len = mm->env_end - mm->env_start;
            if (len > PAGE_SIZE - res)
                len = PAGE_SIZE - res;
            res += access_process_vm(task, mm->env_start, buffer+res, len, 0);
            res = strnlen(buffer, res);
        }    
    }    
...
Funnily enough, the comment mentions setproctitle(3). After I saw these lines I tried to find setproctitle for linux but failed. It's interesting that it is mentioned here although it is available in FreeBSD but not in linux.
Anyway, let's move forward.
The most interesting part here is
len = mm->arg_end - mm->arg_start;
 
    if (len > PAGE_SIZE)
        len = PAGE_SIZE;

    res = access_process_vm(task, mm->arg_start, buffer, len, 0);
access_process_vm, defined in mm/memory.c, accesses another process' address space. The prototype:
int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write)
Fifth argument write is a flag, if it is 0 then access_process_vm reads len bytes of the process' memory to buf starting from address addr.
Going back to proc_pid_cmdline we can see that start address of the string with the command line is mm->arg_start and its length is mm->arg_end - mm->arg_start but not bigger than PAGE_SIZE. PAGE_SIZE is set to 4096 bytes as defined in include/asm-i386/page.h
#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
Well, now we know that we have 4KB where we can write.

Who fills the memory between mm->arg_start and mm->arg_end, and what is stored there?
I hope everybody uses the ELF binary format now, so let's go to fs/binfmt_elf.c
The function that creates the process environment is create_elf_tables.
The part of it we are actually interested in:
static int
create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
        unsigned long load_addr, unsigned long interp_load_addr) {
...
    /* Populate argv and envp */
    p = current->mm->arg_end = current->mm->arg_start;
    while (argc-- > 0) {
        size_t len;
        if (__put_user((elf_addr_t)p, argv++))
            return -EFAULT;
        len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
        if (!len || len > MAX_ARG_STRLEN)
            return -EINVAL;
        p += len;
    }
    if (__put_user(0, argv))
        return -EFAULT;
    current->mm->arg_end = current->mm->env_start = p;
    while (envc-- > 0) {
        size_t len;
        if (__put_user((elf_addr_t)p, envp++))
            return -EFAULT;
        len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
        if (!len || len > MAX_ARG_STRLEN)
            return -EINVAL;
        p += len;
    }
    if (__put_user(0, envp))
        return -EFAULT;
    current->mm->env_end = p;
...
According to create_elf_tables, argv[0] points to current->mm->arg_start, and current->mm->arg_end coincides with current->mm->env_start, the beginning of the environment strings (envp), which end at current->mm->env_end.

To modify the cmdline of the process you have to overwrite the memory between current->mm->arg_start and current->mm->arg_end. A title longer than the original arguments spills into the environment strings that follow, so to keep the program's integrity the environment has to be moved first.

Looking into the sources of getenv/setenv we can see that they access the environ variable (environ(7)). environ is declared in <unistd.h>, but it's preferred to declare it in the user program as
extern char **environ;
The following code rewrites the cmdline of the process and moves the environment to its new 'home'.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <limits.h>
#include <sys/user.h>

extern char **environ;

int main(int argc, char **argv)
{   
    unsigned int pid = getpid();
    char proc_pid_cmdline_path[PATH_MAX];
    char cmdline[PAGE_SIZE];
    
    sprintf(proc_pid_cmdline_path, "/proc/%d/cmdline", pid);
    
    FILE *proc_pid_cmdline =  fopen(proc_pid_cmdline_path, "r");
    fgets(cmdline, PAGE_SIZE, proc_pid_cmdline);
    fclose(proc_pid_cmdline);
    
    printf("%s : %s\nenvironment variable HOME = %s\n", proc_pid_cmdline_path, cmdline, getenv("HOME"));
    
    int env_len = -1;
    if (environ) 
        while (environ[++env_len])
            ;
    
    unsigned int size;
    if (env_len > 0)
        size = environ[env_len-1] + strlen(environ[env_len-1]) - argv[0];
    else
        size = argv[argc-1] + strlen(argv[argc-1]) - argv[0];
    
    if (environ)
    {
        
        // one extra slot for the terminating NULL pointer
        char **new_environ = malloc((env_len + 1) * sizeof(char *));
        
        unsigned int i = -1;
        while (environ[++i])
            new_environ[i] = strdup(environ[i]);
        new_environ[i] = NULL;
        
        environ = new_environ;
    }   
        
    
    char *args = argv[0];
    memset(args, '\0', size);
    snprintf(args, size - 1, "This is a new title and it should be definetely longer than initial one. Actually we can write %u bytes long string to the title. Well, it works!", size);
    
    proc_pid_cmdline =  fopen(proc_pid_cmdline_path, "r");
    fgets(cmdline, PAGE_SIZE, proc_pid_cmdline);
    fclose(proc_pid_cmdline);
    
    printf("%s : %s\nenvironment variable HOME = %s\n", proc_pid_cmdline_path, cmdline, getenv("HOME"));
    
    return 0; 
}
The output should be
$./chcmdline 
/proc/5865/cmdline : ./chcmdline
environment variable HOME = /home/niam
/proc/5865/cmdline : This is a new title and it should be definetely longer than initial one. Actually we can write 1843 bytes long string to the title. Well, it works!
environment variable HOME = /home/niam
There is an option to expand memory region that we can overwrite. Just assign environment variable for the process:
$ENV_VAR=$(perl -e 'print "0" x 4096;') ./chcmdline 
/proc/5863/cmdline : ./chcmdline
environment variable HOME = /home/niam
/proc/5863/cmdline : This is a new title and it should be definetely longer than initial one. Actually we can write 5948 bytes long string to the title. Well, it works!
environment variable HOME = /home/niam
That small perl script prints "0" 4096 times to stdout, so, as we can see, the writable area (arguments plus environment) grew to 5948 bytes.

This code might be portable to other OSes, I don't know. In general, on a POSIX system with the gcc compiler you might be successful with this code.

Monday, October 20, 2008

bash: file descriptors

In bash you can open a file descriptor for reading, for writing, or for both.
To create such a descriptor use these redirections:

[fd]<[source] #read-only
[fd]>[source] #write-only
[fd]<>[source] #read-write
[fd] is a number that identifies the file descriptor, and [source] can be either another file descriptor (prefixed with &) or any other source for reading or writing.
If the source is another file descriptor, the new one will be its duplicate.
#file descriptor 3 is opened for reading
#and will yield '1 2 3', the output of the 'echo' command
3< <(echo 1 2 3)
#file descriptor 3 is opened for writing
#which output will be printed to stdout by 'cat' tool
3> >(cat)
#file descriptor 3 is opened for reading/writing
#from/to test.file file
3<>test.file
To move a descriptor, add the '-' suffix to the [source] descriptor.
3<&2- #descriptor 2 (stderr) is duplicated as descriptor 3 and then closed
By executing
ls -l /proc/$$/fd
you can see what file descriptors are opened for current process.
Using exec for creating/redirecting/duplicating file descriptors makes the changes take effect in the current shell. Otherwise they apply only to the command they are attached to.
#file descriptor 3 is local for 'while' loop
#you can't use it outside
while read line <&3; do echo $line; done 3<test.file
#file descriptor 3 is visible within current process
exec 3<test.file
while read line <&3; do echo $line; done
Special cases of redirection to stdin of the program are 'Here Documents' and 'Here Strings'.
#the shell reads input from the current source until
#a line containing only [word] (with no trailing blanks) is seen
<<[word]
#the shell reads input from the current source until
#a line containing only [word] (with no trailing blanks) is seen
#but all leading tab characters are stripped from input lines and the line containing [word]
<<-[word]
#[word] is expanded and supplied to the command on its standard input
<<<[word]
The examples
$exec 3<<EOF
> text is here
> EOF
$cat <&3
text is here

$exec 3<<<string
$cat <&3
string

Wednesday, October 15, 2008

gdb: modify values of the variables

While debugging your application you will likely want to change its behavior a bit.
With gdb you can change the values of variables with the set command:

(gdb) whatis a
type = int
By variable name
(gdb) p a
$1 = 5
(gdb) set var a = 10
(gdb) p a
$2 = 10
By variable address
(gdb) p a
$2 = 10
(gdb) p &a
$3 = (int *) 0xbff5ada0
(gdb) set {int}0xbff5ada0 = 15
(gdb) p a
$4 = 15

gdb: examining the core dumps

When your program receives SIGSEGV (segmentation fault), the kernel automatically terminates it (unless the application handles SIGSEGV).
After a couple of long nights of debugging, tracing and drinking coffee you finally find the line in the sources where your application causes the system to send this crafty signal.

The most common problem is that you are usually unable to run the application within a debugger. The fault may be caused by special circumstances, and it's really painful to sit in front of the debugger trying to reproduce it. Multi-threading, multi-processing and network interaction add even more complexity.

Core dumps are a good solution here.
The linux kernel is able to write a core dump when the application crashes. This core dump records the state of the process at the time of the crash.

Later you can use gdb to analyze the core dump.

Core dumps are often disabled by default in linux (the core file size limit is set to 0).
To enable them, run

ulimit -c unlimited
By default the kernel writes the core dump to the current working directory of the application. You can customize the file path pattern for core dumps by writing it to /proc/sys/kernel/core_pattern.
According to the current documentation, the pattern may contain the following specifiers
%%  A single % character 
%p  PID of dumped process 
%u  real UID of dumped process 
%g  real GID of dumped process 
%s  number of signal causing dump 
%t  time of dump (secs since 0:00h, 1 Jan 1970) 
%h  hostname (same as the 'nodename' returned by uname(2)) 
%e  executable filename
So with
echo /tmp/%e-%p.core > /proc/sys/kernel/core_pattern
linux should put core dumps into /tmp with <executable>-<pid>.core filenames.
Let's try all this.
Say we have this code
#include <stdlib.h>

void
crash()
{
    char a[0];
    free(a);
}
int
main(int argc, char **argv)
{
    crash();
    return 0;
}
As you can see, the application should cause a segmentation violation on the free call. Let's compile it
gcc test.c -g -o test
and execute
./test 
Segmentation fault (core dumped)
System tells us that core was dumped. Let's see what we have
ll /tmp/*core
-rw------- 1 niam niam  151552 2008-10-15 15:19 /tmp/test-25301.core
Got it. Now I'm going to run gdb
gdb --core /tmp/test-25301.core ./test
gdb clearly tells that application was terminated with SIGSEGV
Core was generated by `./test'.
Program terminated with signal 11, Segmentation fault.
#0  0xb7e4ff97 in free () from /lib/libc.so.6
Now we can use power of gdb to catch the problem code
(gdb) bt
#0  0xb7e4ff97 in free () from /lib/libc.so.6
#1  0x08048392 in crash () at 1.c:9
#2  0x080483aa in main () at 1.c:15
(gdb) up
#1  0x08048392 in crash () at 1.c:9
9  free(a);
(gdb) p a
$1 = 0xbfc5daf8 "\b�ſ�\203\004\b�D�� �ſx�ſ��߷�����\203\004\bx�ſ��߷\001"
(gdb) whatis a
type = char [0]
(gdb) info frame
Stack level 1, frame at 0xbfc5db00:
 eip = 0x8048392 in crash (1.c:9);
 saved eip 0x80483aa called by frame at 0xbfc5db10, caller of frame at 0xbfc5daf0
 source language c.
 Arglist at 0xbfc5daf8, args:
 Locals at 0xbfc5daf8, Previous frame's sp is 0xbfc5db00
 Saved registers:
  ebp at 0xbfc5daf8, eip at 0xbfc5dafc
We can see here that free was asked to release memory on the stack. 'whatis a' shows that a is an array, and its address 0xbfc5daf8 lies inside the stack frame of crash (the frame is at 0xbfc5db00), right at the beginning of the stack.
gdb gave all the information needed for further investigation. The only thing left is to understand who taught you to free an array on the stack o_O.

Monday, October 13, 2008

c/c++: array subscripting operator [ ]

Everybody knows that operator [] selects an element of an array.
Since the elements of an array are stored sequentially in memory, they can also be accessed using pointer arithmetic.

int a[] = {1,2,3};
std::cout << "First:" << *a << std::endl;
std::cout << "Second:" << *(a + 1) << std::endl;
std::cout << "Third:" << *(a + 2) << std::endl;
So, the expression a[index] is equivalent to the expression *(a + index). Since addition is commutative,
*(a + index) == *(index + a)
What does that mean? It means that, using the definition of operator [], which says that the expression a[b] is equivalent to the expression *((a) + (b)), we can write
int a[] = {1,2,3};
std::cout << "First:" << 0[a] << std::endl;
std::cout << "Second:" << 1[a] << std::endl;
std::cout << "Third:" << 2[a] << std::endl;
This is equivalent to the first listing and to the traditional style
int a[] = {1,2,3};
std::cout << "First:" << a[0] << std::endl;
std::cout << "Second:" << a[1] << std::endl;
std::cout << "Third:" << a[2] << std::endl;
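The equivalence can also be checked on addresses rather than on printed values. This is only a sketch; sample, same_element, read_swapped and check_subscript are illustrative names of mine. Note that the trick relies on the built-in operator []: a class with an overloaded operator[], such as std::vector, does not accept the index[object] spelling.

```cpp
#include <cassert>
#include <cstddef>

static const int sample[] = {1, 2, 3};

// True when a[i], i[a] and *(a + i) all name the same object.
bool same_element(const int *a, std::size_t i)
{
    return &a[i] == &i[a] && &a[i] == a + i;
}

// Reads an element with the index written first, as in 1[a].
int read_swapped(const int *a, std::size_t i)
{
    return i[a];
}

// Sanity checks; call it from main().
void check_subscript()
{
    for (std::size_t i = 0; i < 3; ++i)
        assert(same_element(sample, i));
    assert(read_swapped(sample, 0) == 1);
    assert(read_swapped(sample, 2) == 3);
}
```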

Saturday, October 11, 2008

c++: implicit typenames in templates

Inside a template, a name that depends on a template parameter cannot be resolved until the template is instantiated, so in some cases the compiler cannot tell whether such a name denotes a type. Let's look at this example

template<typename T>
class A
{
    public:
        struct B
        {
            T member;
        };
};

template<typename T>
class C
{
    public:
        A<T>::B member;
};
Your (and mine also ;]) compiler will probably tell you "type A<T> is not derived from type C<T>". A<T> is not known until T is known, so the compiler cannot tell that A<T>::B names a type and assumes that it does not.
To make this work you have to tell the compiler explicitly that A<T>::B is a type by using the typename keyword.
template<typename T>
class A
{
    public:
        struct B
        {
            T member;
        };
};

template<typename T>
class C
{
    public:
        typename A<T>::B member;
};

Friday, October 10, 2008

c++: template template parameters

A template template parameter allows you to pass a template itself, not an instantiation of it, as a template argument. Sounds like a tongue twister. I hope the example below demonstrates the usage of a template template parameter

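#include <memory>
#include <vector>
// std::vector takes two type parameters: the element type and the allocator
template<typename T, template<typename, typename> class V>
class C
{
   V<T, std::allocator<T> > v;
};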
int main(int argc, char **argv)
{
   C<int,std::vector> c;
   return 0;
}

Thursday, October 9, 2008

css: absolute position

I do not do a lot of markup these days, but an article I read recently opened my eyes to the bright side of absolute positioning.
If you place an element with absolute position into a box with relative position, the child's coordinates are calculated relative to the parent, not relative to the whole page.

This one should be on the right of the box with red border.

c++: RVO and NRVO

RVO stands for "return value optimization" and NRVO for "named return value optimization".
What does all this stuff mean?

Typically, when a function returns an object by value, a temporary object is created and copied to the target object via the copy constructor.

RVO is the simpler optimization: the compiler doesn't create a temporary when you return an unnamed instance of a class/structure/etc. from the function.

#include <iostream>

class C
{
    public:
        C() 
        {
            std::cout << "Constructor of C." << std::endl;
        }
        C(const C &)
        {
            std::cout << "Copy-constructor of C." << std::endl;
        }
};

C func()
{
    return C();
}

int main(int argc, char **argv)
{
    C c = func();

    return 0;
}
The output should be
Constructor of C.
Here the compiler does not make a copy of the C instance on return. This is a great optimization since constructing an object takes time. The implementation depends on the compiler, but in general the compiler adds a hidden parameter to the function, a reference/pointer to the storage where the caller wants the return value, and constructs the return value directly in that storage.
It may look like
void func(C &__hidden__)
{
    __hidden__ = C(); // pseudocode: C is constructed directly in the caller's storage
    return;
}
NRVO is more complex. Say you have
#include <iostream>

class C
{
    public:
        C()
        {
            std::cout << "Constructor of C." << std::endl;
        }
        C(const C &c) 
        {
            std::cout << "Copy-constructor of C." << std::endl;
        }
};

C func()
{
    C c;
    return c;
}

int main(int argc, char **argv)
{
    C c = func();

    return 0;
}
Here the compiler has to deal with the named object c. With NRVO no temporary object should be created on return. The pseudocode of
C func()
{
    C c;
    c.method();
    c.member = 10;
    return c;
}
might look like
void func(C &__hidden__)
{
    __hidden__ = C(); // pseudocode: c lives directly in the caller's storage
    __hidden__.method();
    __hidden__.member = 10;
    return;
}
In both cases no temporary object is created for copying the result out of the function (the copy constructor is not invoked).

When may this not work?
The situation I know of where these optimizations don't apply is a function with several return paths that return different named objects.
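The last case can be observed by counting copy-constructor calls. This is a sketch under my own assumptions: Tracked and the helper functions are illustrative names, and the exact counts depend on the compiler and its flags. With default settings on GCC or Clang I would expect 0 copies for the unnamed and the single named object, and 1 copy when two different named objects may be returned.

```cpp
#include <cassert>

struct Tracked {
    static int copies;               // number of copy-constructor calls so far
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked &other) : value(other.value) { ++copies; }
};
int Tracked::copies = 0;

// Unnamed temporary: a candidate for RVO.
Tracked make_unnamed() { return Tracked(1); }

// The same named object on every return path: a candidate for NRVO.
Tracked make_named() { Tracked t(2); return t; }

// Two different named objects: both cannot live in the caller's storage,
// so returning one of them normally needs a copy.
Tracked make_either(bool first)
{
    Tracked a(3);
    Tracked b(4);
    if (first)
        return a;
    return b;
}

// Each helper resets the counter, makes one call and reports the copies seen.
int copies_for_unnamed()
{
    Tracked::copies = 0;
    Tracked result = make_unnamed();
    assert(result.value == 1);
    return Tracked::copies;
}

int copies_for_named()
{
    Tracked::copies = 0;
    Tracked result = make_named();
    assert(result.value == 2);
    return Tracked::copies;
}

int copies_for_either(bool first)
{
    Tracked::copies = 0;
    Tracked result = make_either(first);
    assert(result.value == (first ? 3 : 4));
    return Tracked::copies;
}
```

Printing the three counters from main is enough to see whether your compiler applies the optimizations. The returned values are correct either way; only the number of copies changes.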

Wednesday, October 8, 2008

c++: inheritance from "template" class and dependence on template parameter

You may find it quite weird that you are unable to directly access the members of a base class that depends on a template parameter.

template<typename T>
class A
{
    public:
        T member;
};

template<typename T>
class B: public A<T>
{
    public:
        B();
};

template<typename T>
B<T>::B()
{
    T t;
    member = t;
}
This code will raise an error saying that 'member' has not been found. The error is a consequence of the c++ name lookup rules. The base class A<T> depends on the template parameter T, so the compiler cannot know which members A<T> will have until T is known (A might be specialized for that T). An unqualified name such as member does not depend on T, so it is looked up when the template is defined, without examining the dependent base, and is not found. Names that come from a dependent base must therefore be written so that they become dependent names themselves. To make this code work you may
  • Qualify the name with this->
    this->member = t;
  • Qualify the name with A<T>::
    A<T>::member = t;
  • Use a 'using' declaration in the class template
    template<typename T>
    class A
    {
        public:
            T member;
    };
    
    template<typename T>
    class B: public A<T>
    {
        using A<T>::member;
        public:
            B();
    };
    
    template<typename T>
    B<T>::B()
    {
        T t;
        member = t;
    }
Things are less clear with member function templates, which have template parameters of their own.
template<typename T>
class A
{
    public:
        template<typename U>
        void func();
};

template<typename T>
template<typename U>
void
A<T>::func()
{
}

template<typename T>
class B: public A<T>
{
    public:
        B();
};

template<typename T>
B<T>::B()
{
    A<T>::func<int>();
    this->func<int>();
}
You can't simply write
this->func<int>();
nor
A<T>::func<int>();
as in the code above. The compiler assumes that the < is a less-than operator. In order for the compiler to recognize the function template call, you must add the template keyword.
template<typename T>
B<T>::B()
{
    A<T>::template func<int>();
    this->template func<int>();
}
Some compilers (or some of their versions) don't actually parse the template until it is instantiated, and those may successfully compile the code without the 'template' keyword. A compiler that parses the template at the point of definition, however, cannot know what 'func' refers to without knowing the instantiation type, yet in order to parse correctly it must know which names are types and which are templates. The 'template' keyword tells the compiler that A<T>::func<int> is a template.

Monday, October 6, 2008

c++: function try-catch block

There are no hidden features in c++. Everything can be found in the specification.
Somebody from Somewhere
There is an interesting feature in c++ - a function try-catch block. You can replace function body with try-catch block.
void function()
try
{
        //do smth
}
catch(...)
{
        //handle exception
}
This is almost the same as
void function()
{
        try
        {
                //do smth
        }
        catch(...)
        {
                //handle exception
        }
}
Quick notes:
  • the scope and lifetime of the parameters of a function or constructor extend into the function try-catch blocks
  • A function try-catch block on main() does not catch exceptions thrown in destructors of objects with static storage duration. In code below catch won't be called
    class A
    {
            public:
                    ~A()
                    {
                            throw "Exception in ~A";
                    }
    };
    
    int main(int argc, char **argv)
    try
    {
            static A a;
    }
    catch(const char *e)
    {
            std::cout << e << std::endl;
    }
    
  • A function try-catch block on main() does not catch exceptions thrown in constructors/destructors of namespace/global namespace scope objects. In code below catch won't be called
    namespace N
    {
            class A
            {
                    public:
                            A()
                            {
                                    throw "Exception in A";
                            }
            };
    
            A a;
    }
    
    int main(int argc, char **argv)
    try
    {
    }
    catch(const char *e)
    {
            std::cout << e << std::endl;
    }
  • At the end of a handler of a function try-catch block of a constructor or destructor, the run time rethrows the exception automatically. All other functions simply return once they reach the end of their function try block's handler
    class A
    {
            public:
                    A()
                    try
                    {
                            throw "Exception in A";
                    }
                    catch(const char *e)
                    {
                            std::cout << "In A: " << e << std::endl;
                    }
    };
    
    int main(int argc, char **argv)
    try
    {
            A a;
    }
    catch(const char *e)
    {
            std::cout << "In main: " << e << std::endl;
    }
    
    The output
    In A: Exception in A
    In main: Exception in A
    int function()
    try
    {
            throw "Exception in function";
    }
    catch(const char *e)
    {
            std::cout << "In function: " << e << std::endl;
    
            return 1;
    }
    
    int main(int argc, char **argv)
    try
    {
            std::cout << function() << std::endl;
    }
    catch(const char *e)
    {
            std::cout << "In main: " << e << std::endl;
    }
    
    The output
    In function: Exception in function
    1
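The first point in the list, about parameters staying in scope inside the handler, has no example above. Here is a minimal sketch of it. The function name divide and its error text are hypothetical, chosen only for illustration.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical function used only to illustrate parameter scope.
std::string divide(int value, int divisor)
try
{
        if (divisor == 0)
                throw std::runtime_error("division by zero");

        std::ostringstream out;
        out << value / divisor;
        return out.str();
}
catch (const std::runtime_error &e)
{
        // 'value' and 'divisor' are still in scope inside the handler.
        std::ostringstream out;
        out << e.what() << ": " << value << " / " << divisor;
        return out.str();
}
```

Calling divide(6, 3) returns the quotient as a string, while divide(6, 0) returns a message built in the handler from both parameters. Variables declared inside the try block itself are not visible in the handler; only the parameters are.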
    

Sunday, October 5, 2008

json: comments

I recently read an interesting observation that the JSON format does not define comments, whereas XML, for example, does. The misconception here is treating JSON as a language: JSON is a data-interchange format, not a language. Yes, programming languages define a comment syntax. At least all languages I know have comments. But JSON is _not_a_language_. And XML is a language, not a format. This is my view. JSON is for machine-to-machine interchange, and machines ignore comments. Anyway, you can reserve a member of the message for a comment:

{
    "comment": ""
}

Friday, October 3, 2008

perl: another way to dereference

When you have structures with nested arrays or hashes and you want to dereference them on the fly, without an extra variable to hold the reference, you can use a special syntax:

%{reference_to_hash} and @{reference_to_array}
The next piece of code shows the common usage
$struct = [1, 2, 3, {a=>1, b=>2, c=>[1,2]}];

%hash = %{$struct->[3]};

@array = @{$struct->[3]->{c}};
This is useful when you want to work with the structures themselves rather than with their references
push @{$struct->[3]->{c}}, (3, 4);

Thursday, October 2, 2008

perl: arrays and hashes

This is more a reminder for myself than an article for everybody, as I haven't touched Perl for ages.

A small reference on arrays and hashes in Perl.

Arrays

Declaration

@array = (1, '1', (2));
@array = (1..20);# by range
Access to array members with index
$array[0];
Define reference to array
$array = \@array; #reference to another array
$array = [1, 3, 5, 7, 9]; #reference to anonymous array
$array = [ @array ]; #reference to anonymous copy
@$array = (2, 4, 6, 8, 10); #assign through the reference; creates an anonymous array if $array is undefined
To dereference a reference to an array put @ before the $
@array = @$array;
@array = @{$array}; #the same, with explicit braces
Access to members of array by reference with index
$array->[0];# using -> operator
${$array}[0];# dereferencing with explicit braces
$$array[0];# dereferencing
Size of the array
scalar(@array);# number of elements
$#array;# index of the last element, i.e. number of elements - 1 (-1 for an empty array)
Here are tricks to remove all elements from an array and to append an element to an array
$#array = -1;
$array[++$#array] = 'value';
Take a slice of an array
@array[0..2];# first, second and third elements
@array[0,2];# first and third elements
Hashes

Declaration
%hash = ('key0', 'value0', 'key1', 'value1');# number of elements must be even
%hash = ('key0' => 'value0', 'key1' => 'value1');
Access to hash members with key
$hash{'key0'};
Define reference to hash
$hash = \%hash; #reference to another hash
$hash = {1 => 3, 5 => 7}; #reference to anonymous hash
$hash = {1, 3, 5, 7}; #reference to anonymous hash; number of elements must be even
$hash = { %hash }; #reference to anonymous copy
%$hash = (2, 4, 6, 8); #assign through the reference; number of elements must be even
%$hash = (2 => 4, 6 => 8); #assign through the reference
To dereference a reference to a hash put % before the $
%hash = %$hash;
%hash = %{$hash}; #the same, with explicit braces
Access to members of hash by reference with key
$hash->{'key0'};# using -> operator
${$hash}{'key0'};# dereferencing with explicit braces
$$hash{'key0'};# dereferencing
Size of the hash
scalar keys(%hash)
Take a slice of a hash
@hash{'key0','key1'};
@hash{@keys};

c++: separate members from their classes

In my post c++: separate methods from their classes I described how to call a class method through a pointer. You can do similar stuff with class data members. Assume you have a collection of class instances and you want to print the values of some of their members. Again you can define two lists: a list of class instances and a list of pointers to class members. Later you can iterate through these lists to access the members of each instance.

#include <iostream>
#include <list>

class A
{
    public:
        int m0; 
        int m1; 
};

template<typename T>
void
print(const T &a, 
    int T::*p)
{
    std::cout << a.*p << std::endl;
}

int main(int argc, char **argv)
{
    std::list<A> ls;
    std::list<int A::*> lsm;

    int A::*p0 = &A::m0;

    lsm.push_back(p0);
    lsm.push_back(&A::m1);

    A a0, a1; 
    a0.*p0 = 0;
    a0.m1 = 1;
    a1.m0 = 2;
    a1.m1 = 3;
    
    ls.push_back(a0);
    ls.push_back(a1);
    
    for (std::list<A>::iterator i = ls.begin();i!=ls.end();++i)
        for (std::list<int A::*>::iterator j = lsm.begin();j!=lsm.end();++j)
            print(*i, *j);

    return 0;
}
With this piece of code you will get
0
1
2
3
This way of accessing class members can be combined with pointers to class methods to achieve more power.
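A rough sketch of such a combination might look like the code below. The methods doubled and negated and the helper apply are hypothetical names made up for this illustration; they are not taken from the earlier post.

```cpp
// Hypothetical class: two data members and two methods to choose from.
class A
{
    public:
        int m0;
        int m1;

        int doubled(int v) const { return v * 2; }
        int negated(int v) const { return -v; }
};

// Calls the method selected by 'f' on the data member selected by 'p'.
template<typename T>
int
apply(const T &a,
    int T::*p,
    int (T::*f)(int) const)
{
    return (a.*f)(a.*p);
}
```

For an instance with m0 = 2 and m1 = 3, apply(a, &A::m0, &A::doubled) gives 4 and apply(a, &A::m1, &A::negated) gives -3. Both the member and the method are picked at run time, so they can be stored in lists and iterated just like in the example above.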