Monday, February 26, 2007

C++ Objects Part 3: Multiple Inheritance

In my previous post I described how single inheritance works with the CodeWarrior C++ compiler - by placing the superclass as the first item in the subclass in memory (and pulling the same trick for the vtable). Now to something more complicated: multiple inheritance!

One thing is clear: we can't do the same trick for multiple inheritance as we can for single inheritance...only one superclass can be first!

The trick: change the pointer when we change the "class" of the pointer. We simply stick both bases into our subclass, but if we cast from the subclass to the second base, we move the pointer in memory to point to the second base class (since it isn't at the same address as the start of the object).

You can see this for yourself with a program like this:
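#include <stdio.h>

class a { int a_; };
class b { int b_; };
class c : public a, public b { };
int main()
{
    c obj;
    // %p expects a void*, so convert each (already adjusted) pointer explicitly.
    printf("a=%p, b=%p, c=%p\n", (void*)(a*)&obj, (void*)(b*)&obj, (void*)(c*)&obj);
}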

When we print the pointer to our object after casting it to each type, we get this:
a=0xbffff460, b=0xbffff464, c=0xbffff460

Note that when we cast to b, the address stored in the pointer changes! This is because c contains a first in memory, then b second.

So we've at least solved the problem for object data: we'll put the base classes into the derived class sequentially and adjust the pointer any time we change the type, so that we point to the base WITHIN the derived class no matter where it is. This meets our rule that a pointer to the base must look like the base. It does because we move the pointer until it does.
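Here is a small sketch of that pointer adjustment that checks itself instead of printing. The helper names (offset_of_b_in_c, round_trip_restores_address, first_base_shares_address) are made up for illustration, and the layout it observes (first base at offset zero, second base right after it) is what common compilers do rather than something the C++ standard promises.

```cpp
#include <cassert>
#include <cstddef>

struct A { int a_; };
struct B { int b_; };
struct C : A, B { };

// Byte distance from the start of a C object to its B subobject.
std::ptrdiff_t offset_of_b_in_c(C& obj) {
    B* as_b = &obj;  // derived-to-base conversion: the compiler adjusts the pointer
    return reinterpret_cast<char*>(as_b) - reinterpret_cast<char*>(&obj);
}

// Round trip C* -> B* -> C*: the downcast undoes the adjustment.
bool round_trip_restores_address(C& obj) {
    B* as_b = &obj;
    C* back = static_cast<C*>(as_b);
    return back == &obj;
}

// On typical ABIs the first base sits at the very start of the object,
// so converting to A* leaves the address unchanged (an assumption, not a rule).
bool first_base_shares_address(C& obj) {
    A* as_a = &obj;
    return static_cast<void*>(as_a) == static_cast<void*>(&obj);
}
```

Calling offset_of_b_in_c on a C object should report a non-zero offset on a typical compiler, while the round trip always lands back on the original address.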

Now let's apply the rule to vtables. We must have a pointer to a vtable as the first item in an object. So if the pointer to an object can point to either of two memory locations (the start of the first or second base) then we can't escape logic: we need TWO pointers to vtables!

The format of a vtable depends on the class we think we have; the contents of a vtable depend on the class we really have. So when we have multiple inheritance, we will have multiple vtables, so that each one can be formatted based on the base class (but filled with function pointers from the derived class).
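To see "format from the base, contents from the derived class" from the outside, here is a minimal sketch in plain C++. The functions call_through_b1 and call_through_b2 only know about the base classes, yet they end up in D's code when handed a D. All names are illustrative, and the sizeof check reflects what common implementations do (one vtable pointer per polymorphic base), not a guarantee of the language.

```cpp
#include <cassert>
#include <string>

struct B1 { virtual ~B1() = default; virtual std::string b1() { return "B1::b1"; } };
struct B2 { virtual ~B2() = default; virtual std::string b2() { return "B2::b2"; } };

struct D : B1, B2 {
    std::string b1() override { return "D::b1"; }
    std::string b2() override { return "D::b2"; }
};

// These callers only see the base-class "format" of the vtable...
std::string call_through_b1(B1& x) { return x.b1(); }
std::string call_through_b2(B2& x) { return x.b2(); }

// ...and on typical compilers a D carries at least two vtable pointers,
// one for each polymorphic base (implementation detail, assumed here).
bool d_has_room_for_two_vtable_pointers() {
    return sizeof(D) >= 2 * sizeof(void*);
}
```

Passing a D to call_through_b2 converts it to a B2&, which is exactly the adjusted pointer described above; the vtable found at that address is laid out like B2's but holds D's function.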

This example will get lengthy, so I will abridge it a little bit...

class B1 { virtual void b1(); };
class B2 { virtual void b2(); };
class D : public B1, public B2 { virtual void b1(); virtual void b2(); virtual void d(); };

In C we might have this:

struct B1_vtable {
    type_id * type_id_ptr;
    int offset;
    void (*b1)(); // ptrs to B1 virt funcs
};
struct B2_vtable {
    type_id * type_id_ptr;
    int offset;
    void (*b2)(); // ptrs to B2 virt funcs
};
struct D_vtable {
    struct B1_vtable parent1;
    struct B2_vtable parent2;
    void (*d)(); // ptrs to D vfuncs
};

Because D contains the first base's virtual function table first, the "header" information (type-id and offset) for the first base is also used for the derived class. This will be important later. Because all of the virtual functions are consecutive in memory, when we have the derived class we can easily find any virtual method's function pointer.

(So when I say we have two vtables, really we have one big vtable with two vtables embedded inside it.)

void B1_b1() {}
void B2_b2() {}
void D_b1() {}
void D_b2() {}
void D_d() {}

static type_id B1_type_id = { ... };
static type_id B2_type_id = { ... };
static type_id D_type_id = { ... };

// Virtual functions for B1:
static B1_vtable sB1_vtable = { &B1_type_id, 0, B1_b1 };
// Virtual functions for B2:
static B2_vtable sB2_vtable = { &B2_type_id, 0, B2_b2 };

So far there are no surprises here, until we look at the virtual function tables for D:

// Virtual functions for D
static D_vtable sD_vtable = {
    { &D_type_id, 0, D_b1 },
    { &D_type_id, -(int)sizeof(B1), D_b2 }, // cast first: negating an unsigned sizeof would wrap
    D_d
};

Here we have a vtable that contains two vtables within it. Thus we have two pointers to our type-ID. If we have a pointer to the second one, it looks like the vtable used when looking at our object of type D as if it were a B2. Remember that the pointer to the object changes as we cast, and that controls which vtable is used. (Whenever we call the virtual method b2(), we have a B2* and therefore our object pointer is adjusted to the second base, whose vtable pointer leads to the second vtable, where we find a pointer to D_b2.)

This is the first time the 'offset' parameter of the vtable isn't 0. I'll explain this in the next post, but for now trust that, since B2 is not the first parent class inside D, it needs a non-zero offset.

struct B1 {
    B1_vtable * vtable;
    // data for B1
};

struct B2 {
    B2_vtable * vtable;
    // data for B2
};

struct D {
    B1 base1; // base1.vtable inited to &sD_vtable.parent1
    B2 base2; // base2.vtable inited to &sD_vtable.parent2
    // derived data for D
};

When a D is initialized, the vtable pointer inside its B1 part is set to one value and the vtable pointer inside its B2 part is set to another; both point into D's combined vtable.
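Putting the pieces above together, here is a compilable sketch of the hand-built layout, written as C-style structs in C++. It is only a model of the scheme described in this post, not actual compiler output. The functions return their own names so the dispatch can be checked, and D_init, B2_init, D_as_B2, call_b1, call_b2 and call_d are helper names invented for this example.

```cpp
#include <cassert>
#include <cstring>

struct type_id { const char* name; };
struct B1; struct B2; struct D;

// Each vtable: header (type-id, offset) followed by function pointers.
struct B1_vtable { const type_id* type_id_ptr; int offset; const char* (*b1)(B1*); };
struct B2_vtable { const type_id* type_id_ptr; int offset; const char* (*b2)(B2*); };
struct D_vtable  { B1_vtable parent1; B2_vtable parent2; const char* (*d)(D*); };

struct B1 { const B1_vtable* vtable; int b1_data; };
struct B2 { const B2_vtable* vtable; int b2_data; };
struct D  { B1 base1; B2 base2; int d_data; };

const char* B1_b1(B1*) { return "B1_b1"; }
const char* B2_b2(B2*) { return "B2_b2"; }
const char* D_b1(B1*)  { return "D_b1"; }
const char* D_b2(B2*)  { return "D_b2"; }
const char* D_d(D*)    { return "D_d"; }

static const type_id B1_type_id = { "B1" };
static const type_id B2_type_id = { "B2" };
static const type_id D_type_id  = { "D" };

static const B1_vtable sB1_vtable = { &B1_type_id, 0, B1_b1 };
static const B2_vtable sB2_vtable = { &B2_type_id, 0, B2_b2 };
static const D_vtable  sD_vtable  = {
    { &D_type_id, 0, D_b1 },
    { &D_type_id, -static_cast<int>(sizeof(B1)), D_b2 },
    D_d
};

// "Constructors": install the vtable pointers.
void B2_init(B2* self) { self->vtable = &sB2_vtable; self->b2_data = 0; }
void D_init(D* self) {
    self->base1.vtable = &sD_vtable.parent1;
    self->base2.vtable = &sD_vtable.parent2;
    self->base1.b1_data = self->base2.b2_data = self->d_data = 0;
}

// Emulates the D* -> B2* cast: move the pointer to the embedded B2.
B2* D_as_B2(D* p) { return &p->base2; }

// "Virtual calls": use whatever vtable the pointed-to subobject carries.
const char* call_b1(B1* p) { return p->vtable->b1(p); }
const char* call_b2(B2* p) { return p->vtable->b2(p); }

// With a real D*, the first vtable pointer addresses the whole D_vtable,
// so D's own functions sit just past the two embedded tables.
const char* call_d(D* p) {
    return reinterpret_cast<const D_vtable*>(p->base1.vtable)->d(p);
}
```

After D_init, call_b2(D_as_B2(&d)) reaches D_b2 even though call_b2 knows nothing about D, while the same call on a plain B2 set up with B2_init reaches B2_b2.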

3 comments:

  1. Err, then what happens to D_d ? How is that virtual function accessed ? I think your above analogy is wrong. It should instead be:

    struct D {
    void *ptr1
    // B1's non-static data members
    void *ptr2
    // B2's non-static data members
    // D's non-static data members
    };

    Now ptr1 points to a struct:
    struct D_vtable {
    D_b1;
    D_d;
    };

    and ptr2 points to:
    struct D_vtable {
    D_b2;
    D_d;
    };

  2. Why is there only a 4-byte difference between the addresses below?
    ---
    When we print out a ptr to our object casted, we get this
    a=0xbffff460, b=0xbffff464, c=0xbffff460
    ---

    Looks like it should have more?

    struct B1 {
    B1_vtable * vtable;
    // data for b2
    };

    struct B2 {
    B2_vtable * vtable;
    // data for b2
    };

    struct D {
    B1 base1; // base1.vtable inited to &sD_vtable.parent1
    B2 base2; // base2.vtable inited to &sD_vtable.parent2
    // derived data for D
    }

    Replies
    1. Probably an ABI change? The post is 13 years old, and was almost certainly done on a 32-bit compiler. On a 64-bit compiler the vtables are 8-byte pointers.
