C++ virtual tables

2022-02-22T20:09:10

One of the principal differences between C and C++ is the latter's direct support for virtual functions. While it is entirely possible, as will be demonstrated, to implement virtual dispatch in C, C++ offers language primitives that instruct the compiler to generate the necessary back end code automatically.

At the center of most implementations is the virtual table. Each object with virtual functions has an associated structure containing function pointers that direct the program to the implementation of the object's methods. This article explores the contents of the virtual tables generated by compilers to implement virtual dispatch in C++. It is based on the GCC implementation, which follows the Itanium ABI. In particular:

The text is divided into sections, mostly following the class categories in the document, with some ancillary material around them:

Motivation
Code generation
Class categories
References
Notes

Links to source files are placed throughout the text. They are also available in the repository.

To limit the scope to even a reasonably long description, virtual inheritance is almost completely disregarded. It significantly complicates the implementation, but is mostly orthogonal — if related — to the concepts presented here. It will be mentioned briefly in cases where it influences the table layout, but the discussion of several aspects will omit the details that pertain to virtual inheritance (e.g. the "trivial" case is described as an object with no virtual functions, when in fact it has the additional restriction of having no virtual bases). Readers interested in the details are encouraged to consult the original document.

Motivation

Our goal is to explore the organization of virtual tables and the operations performed by the compiler to implement virtual calls. Along the way, we will develop a C implementation that is byte-by-byte identical to the one generated by the compiler.

To demonstrate how accurate this reproduction is, here is a C++ program which declares a class hierarchy with multiple inheritance similar to the one presented at the end of this article:

test.cpp

struct S { virtual void sf(void); };
struct T { virtual void tf(void); };

struct U : S, T {
    void tf(void) override;
    virtual void uf(void);
};

struct V : U {
    void tf(void) override;
    void uf(void) override;
    virtual void vf(void);
};

The program contains objects of each class and performs a series of operations on them. However, only V, the bottom of the hierarchy, is defined by the C++ source:

$ objdump --syms --demangle test.o
[…]
0000000000000000  w    O .data.rel.ro._ZTV1V	0000000000000048 vtable for V
0000000000000000  w    F .text._ZN1V2tfEv	0000000000000015 V::tf()
0000000000000000  w    F .text._ZN1V2ufEv	000000000000003a V::uf()
0000000000000000  w    F .text._ZN1V2vfEv	0000000000000015 V::vf()
0000000000000020  w    F .text._ZN1V2tfEv	0000000000000015 non-virtual thunk to V::tf()
0000000000000000 g     F .text.startup	0000000000000104 main
[…]

Where are the other definitions? I am glad you asked:

$ make --always-make test
g++ -O2 -fno-rtti   -c -o test.o test.cpp
cc -O2   -c -o evil.o evil.c
g++ -o test test.o evil.o
$ objdump --syms --demangle evil.o
[…]
0000000000000060 g     O .data.rel.ro.local	0000000000000018 vtable for S
0000000000000040 g     O .data.rel.ro.local	0000000000000018 vtable for T
0000000000000000 g     O .data.rel.ro.local	0000000000000038 vtable for U
0000000000000000 g     F .text	000000000000001c S::sf()
0000000000000020 g     F .text	000000000000001c T::tf()
0000000000000040 g     F .text	000000000000001c U::uf()
0000000000000060 g     F .text	0000000000000044 non-virtual thunk to U::tf()
00000000000000b0 g     F .text	000000000000001c U::tf()
00000000000000d0 g     F .text	000000000000002d f(S*)
0000000000000100 g     F .text	000000000000002d f(T*)
0000000000000130 g     F .text	0000000000000041 f(U*)
0000000000000000 g     O .data.rel.local	0000000000000010 u
0000000000000010 g     O .data.rel.local	0000000000000008 t
0000000000000018 g     O .data.rel.local	0000000000000008 s
[…]

Here is the C++ part of the program and its output:

C++ source

test.cpp

#include <cstdio>

// base and derived classes declared in C++
struct S {
    // base function
    virtual void sf(void);
};

struct T {
    // base function overridden in derived
    virtual void tf(void);
};

struct U : S, T {
    // overridden function
    void tf(void) override;
    // own virtual function
    virtual void uf(void);
};

// derived class defined in C++
struct V : U {
    void tf(void) override { std::printf("V::tf() in %s\n", __FILE__); }
    void uf(void) override {
        std::puts("\n=== C++ obj -> C++ offset -> C base impl ===\n");
        U::uf();
        std::puts("\n=== C++ obj -> C++ offset -> C derived impl ===\n");
        std::printf("V::uf() in %s\n", __FILE__);
    }
    virtual void vf(void) { std::printf("V::vf() in %s\n", __FILE__); }
};

// C++ objects defined in C
extern S s;
extern T t;
extern U u;

// C++ functions defined in C
void f(S *p);
void f(T *p);
void f(U *p);

int main(void) {
    std::puts("=== C obj -> C impl ===\n");
    f(&s);
    f(&t);
    std::puts("\n=== C obj -> C++ offset -> C base impl ===\n");
    f(static_cast<S*>(&u));
    std::puts("\n=== C obj -> C++ offset -> C prelude -> C derived impl ===\n");
    f(static_cast<T*>(&u));
    std::puts("\n=== C obj -> C impl -> virtual calls ===\n");
    f(&u);
    V v;
    std::puts("\n=== C++ obj -> C++ offset -> C base impl ===\n");
    static_cast<S*>(&v)->sf();
    std::puts("\n=== C++ obj -> C++ offset -> C++ derived impl ===\n");
    static_cast<T*>(&v)->tf();
    static_cast<U*>(&v)->uf();
    std::puts("\n=== C++ obj -> C++ impl ===\n");
    (&v)->vf();
}

Output

=== C obj -> C impl ===

f(S*) in evil.c
S::sf() in evil.c
f(T*) in evil.c
T::tf() in evil.c

=== C obj -> C++ offset -> C base impl ===

f(S*) in evil.c
S::sf() in evil.c

=== C obj -> C++ offset -> C prelude -> C derived impl ===

f(T*) in evil.c
non-virtual thunk to U::tf() in evil.c
U::tf() in evil.c

=== C obj -> C impl -> virtual calls ===

f(U*) in evil.c
S::sf() in evil.c
non-virtual thunk to U::tf() in evil.c
U::tf() in evil.c
U::uf() in evil.c

=== C++ obj -> C++ offset -> C base impl ===

S::sf() in evil.c

=== C++ obj -> C++ offset -> C++ derived impl ===

V::tf() in test.cpp

=== C++ obj -> C++ offset -> C base impl ===

U::uf() in evil.c

=== C++ obj -> C++ offset -> C derived impl ===

V::uf() in test.cpp

=== C++ obj -> C++ impl ===

V::vf() in test.cpp

By the end of this article, you will have the knowledge to do such horrible things to your C++ compiler.

Code generation

The analysis that follows will contain several source code listings. Examples will be presented in C++ paired with their generated machine code and data. The exposition of the underlying implementation will be written in C, so as to have no interference that may be mistaken as a power only available to the compiler.

Machine code is generated using GCC, currently at version 11.2.0, using the -O2, -fno-exceptions, and -fno-rtti arguments. Machine code disassembly is generated either using GCC's -S flag, which skips the assembly stage, or by direct analysis of the generated object files using objdump. All code was generated on an x86-64 Linux machine.

The generated assembly is filtered through c++filt in order to present names without mangling. Listings may omit details that are not relevant to the topic being discussed. Also, due to aggressive inlining done by the compiler, some examples may not generate exactly the same code as presented. Some liberty has been taken in combining code generated after the addition of compiler instructions of various sorts, but only when they remained truthful to the code presented, modulo visibility from being declared in the same translation unit. The most notable instances have dedicated notes with detailed explanations.

Examples

$ cat test.c
int main(void) {
    return 0;
}

$ gcc -S -O2 -masm=intel -o - test.c
        .file   "test.c"
        .intel_syntax noprefix
        .text
        .section        .text.startup,"ax",@progbits
        .p2align 4
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        xor     eax, eax
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (GNU) 11.2.0"
        .section        .note.GNU-stack,"",@progbits

$ gcc -O2 -c test.c
$ objdump --disassemble --disassembler-options intel test.o


test.o:     file format elf64-x86-64


Disassembly of section .text.startup:

0000000000000000 <main>:
   0:   31 c0                   xor    eax,eax
   2:   c3                      ret

$ c++filt <<< _ZN1S1fEv
S::f()

Trivial

The Itanium ABI document distinguishes virtual tables for several types of class hierarchies. All types have the same general layout, but some cases do not require all sections. The simplest of these is the trivial class, which has no virtual functions: objects of these classes have no need for a virtual table, and so none is generated and the objects do not contain a virtual pointer.

trivial.cpp

struct {
    void f(void);
} s;

void f(void) {
    s.f();
}

f():
    lea rdi, s[rip]
    jmp ._anon_0::f()@PLT
s:
    .zero 1

The code is as simple as it can be. s has no data members, so it is declared as a single-byte object⁰. The free function f loads the address to be used as the this pointer into rdi¹ (as mandated by the ABI) and transfers control to the object's f method.

Leaf

A leaf class has virtual functions of its own but does not inherit any. A virtual table will be generated for it, and a pointer to that table will be present at the beginning of each class instance.

leaf.cpp

struct S {
    virtual void f(void) {}
} s;

S::f():
    ret
vtable for S:
    .quad 0      # offset to top
    .quad 0      # typeinfo*
    .quad S::f() # virtual functions
s:
    .quad vtable for S+16

Three symbols have been generated for this code (_ZN1S1fEv, _ZTV1S, and s are their unadulterated mangled names), placed in the final executable in the .text, .data.rel.ro, and .data sections, respectively. They are: the implementation of S::f, the virtual table for class S, and the global object s.

The function S::f is empty, as expected. It is defined just so the compiler has enough information to generate the virtual table. The object s would also be empty, since it has no data members, but the compiler has added a hidden virtual pointer to it. The Itanium ABI document calls classes such as S whose only effective data member is the virtual pointer nearly empty classes.

Finally, the virtual table itself is composed of three quad (i.e. 64-bit) words. They are, as indicated by the comments:

Offset to top: this value is used in the implementation of dynamic_cast and is not relevant for this article. It is always zero under single inheritance.
typeinfo pointer: pointer to the object used for run-time type information (RTTI). It will be zero/nullptr in all examples since they are compiled with RTTI disabled.
Virtual function pointers: this is the array which gives virtual tables their name. In this case, it has a single entry: the address of S::f.

Note that the virtual pointer in s points to the third element of the table (16 bytes from the beginning), not to the top. Thus, given an object of type S, a virtual method can be called in the following manner²:

leaf.cpp

void f(S *p) {
    p->f();
}

f(S*):
    mov rax, QWORD PTR [rdi]
    jmp [QWORD PTR [rax]]

The first step is to dereference the pointer p to get to the object. As described above, the virtual pointer is placed at the beginning of each object, and it points directly to the function pointer array, which in this case has a single entry. p is already in the register which corresponds to the first argument and points to the derived object, so calling the virtual function f is a simple matter of doing the required triple-dereference to transfer execution to method.

A quick reference for those unfamiliar with x86 assembly:

rdi: the register containing the first function parameter (p).
[rdi]: dereference the pointer contained in the rdi register.
n[rdi]: apply an immediate offset before dereferencing the pointer, as in the C expression array[n]³.
mov …, QWORD PTR […]: copy the pointed-to, 64-bit (quad word) value.
jmp [QWORD PTR [rax]]: set the instruction pointer to the result of dereferencing the rax register twice, both times as 64-bit values.

The equivalent code in pseudo-C would be:

extern u64 rdi;
u64 **rax = *(u64***)rdi; // mov
goto **rax; // jmp

Using a fictitious triple-dereference addressing mode, the same could be achieved with jmp [QWORD PTR [QWORD PTR [rdi]]] (i.e. goto ***rdi). This is reflected in the triple indirection using the -> and () operators in the C implementation below.

And finally, adding more virtual functions to S simply extends the table. Functions are added in order of declaration. An offset is required in the second-to-last dereference to obtain the correct function pointer⁴. Adding data members has no effect on this layout since they are placed after the virtual pointer. Base classes similarly have no effect, since their data members are placed between the virtual pointer and the derived data members (they have no virtual functions since this is a leaf class).

leaf_fns_inline.cpp

struct S {
    virtual void f(void) {}
    virtual void g(void) {}
    virtual void h(void) {}
};

void f(S *p) {
    p->f(), p->g(), p->h();
}

vtable for S:
    .quad 0
    .quad 0
    .quad S::f()
    .quad S::g()
    .quad S::h()
f:
    push rbp
    mov  rax, QWORD PTR [rdi]
    mov  rbp, rdi
    call [QWORD PTR [rax]]
    mov  rax, QWORD PTR 0[rbp]
    mov  rdi, rbp
    call [QWORD PTR 8[rax]]
    mov  rax, QWORD PTR 0[rbp]
    mov  rdi, rbp
    pop  rbp
    mov  rax, QWORD PTR 16[rax]
    jmp  rax

Implementation

From the description above, we can start to derive an implementation equivalent to that generated by the compiler. First, we declare the header present in every table.

leaf.c

struct typeinfo;

struct vtable_header {
    ptrdiff_t offset_to_top;
    const struct typeinfo *typeinfo;
};

The definition of struct typeinfo is not interesting, so it is declared but left undefined, which is enough to form a pointer to it and include it as a member. This header is used only to describe and adhere to the layout, its members will be zero (and implicitly zero-initialized).

Next, the virtual table for S, which consists of the header and a function pointer for each virtual function. It takes as its first argument a pointer to the object, as is done implicitly in C++. S objects are no longer empty: they now contain a pointer to the function pointer array in the virtual table.

leaf.c

struct S;

struct S_vtable {
    void (*const f)(struct S*);
};

void S_f(struct S*) {}

const struct {
    const struct vtable_header header;
    const struct S_vtable vtable;
} S_vtable = {
    .vtable.f = S_f,
};

struct S {
    const struct S_vtable *vptr;
};

struct S s = {
    .vptr = &S_vtable.vtable,
};

S_f:
    ret
S_vtable:
    .zero 16
    .quad S_f
s:
    .quad S_vtable+16

We can then implement the virtual call, which uses the function pointer in the virtual table arrived at via the virtual pointer:

leaf.c

void f(struct S *p) {
    p->vptr->f(p);
}

f:
    mov rax, QWORD PTR [rdi]
    jmp [QWORD PTR [rax]]

General single inheritance

While not discriminated in the document, this category has enough distinguishing characteristics from the others to be presented in its own section. These are classes which inherit virtual functions through single inheritance only, i.e. each class in the hierarchy inherits from a single base. Their inheritance graph is a single straight line.

Surprisingly little has to be added to support this case, which is one reason why single inheritance is universally supported while languages often forbid multiple and virtual inheritance. Since each class has a single base, their objects are all simple concatenations of all classes in the hierarchy and all objects in the hierarchy are inter-convertible⁵. In other words, they all point to the same block of memory and differ only in how much of that memory each class is aware of. Similarly, their lists of virtual functions are all simple concatenations of each hierarchy.

Since a pointer to a derived class can be passed to a function in one of its bases unchanged and the derived virtual table can be used by any of its bases also unchanged, the only requirement to support single inheritance is that new functions added by a derived class be appended to the virtual table, while overridden functions replace entries in their original positions.

Implementation

Quite a bit of code is required to display the assembly and C implementations. They are interesting, but very long and repetitive. Click the details elements below to view them.

single_inline.cpp

struct S {
    virtual void sf(void) {}
} s;

struct T : S {
    virtual void tf(void) {}
} t;

struct U : T {
    virtual void uf(void) {}
} u;

void f(S *p) {
    p->sf();
}

void f(T *p) {
    p->sf(), p->tf();
}

void f(U *p) {
    p->sf(), p->tf(), p->uf();
}

Assembly

f(S*):
    mov rax, QWORD PTR [rdi]
    jmp [QWORD PTR [rax]]
f(T*):
    push rbp
    mov rax, QWORD PTR [rdi]
    mov rbp, rdi
    call [QWORD PTR [rax]]
    mov rax, QWORD PTR 0[rbp]
    mov rdi, rbp
    pop rbp
    mov rax, QWORD PTR 8[rax]
    jmp rax
f(U*):
    push rbp
    mov rax, QWORD PTR [rdi]
    mov rbp, rdi
    call [QWORD PTR [rax]]
    mov rax, QWORD PTR 0[rbp]
    mov rdi, rbp
    call [QWORD PTR 8[rax]]
    mov rax, QWORD PTR 0[rbp]
    mov rdi, rbp
    pop rbp
    mov rax, QWORD PTR 16[rax]
    jmp rax
vtable for S:
    .quad 0
    .quad 0
    .quad S::sf()
vtable for T:
    .quad 0
    .quad 0
    .quad S::sf()
    .quad T::tf()
vtable for U:
    .quad 0
    .quad 0
    .quad S::sf()
    .quad T::tf()
    .quad U::uf()
u:
    .quad vtable for U+16
t:
    .quad vtable for T+16
s:
    .quad vtable for S+16

Implementation

single.c

struct S;
struct T;
struct U;

struct S_vtable {
    void (*const sf)(struct S*);
};

struct T_vtable {
    struct S_vtable S;
    void (*const tf)(struct T*);
};

struct U_vtable {
    struct T_vtable T;
    void (*const uf)(struct U*);
};

struct S {
    const struct S_vtable *vptr;
};

struct T {
    union {
        const struct T_vtable *vptr;
        struct S s;
    };
};

struct U {
    union {
        const struct U_vtable *vptr;
        struct T t;
    };
};

void S_sf(struct S*) {}
void T_tf(struct T*) {}
void U_uf(struct U*) {}

const struct {
    const struct vtable_header header;
    const struct S_vtable vtable;
} S_vtable = {
    .vtable.sf = S_sf,
};

const struct {
    const struct vtable_header header;
    const struct T_vtable vtable;
} T_vtable = {
    .vtable.S.sf = S_sf,
    .vtable.tf = T_tf,
};

const struct {
    const struct vtable_header header;
    const struct U_vtable vtable;
} U_vtable = {
    .vtable.T.S.sf = S_sf,
    .vtable.T.tf = T_tf,
    .vtable.uf = U_uf,
};

struct S s = {
    .vptr = &S_vtable.vtable,
};

struct T t = {
    .vptr = &T_vtable.vtable,
};

struct U u = {
    .vptr = &U_vtable.vtable,
};

void f_S(struct S *p) {
    p->vptr->sf(p);
}

void f_T(struct T *p) {
    p->vptr->S.sf(&p->s);
    p->vptr->tf(p);
}

void f_U(struct U *p) {
    p->vptr->T.S.sf(&p->t.s);
    p->vptr->T.tf(&p->t);
    p->vptr->uf(p);
}

f_S:
    mov rax, QWORD PTR [rdi]
    jmp [QWORD PTR [rax]]
f_T:
    push rbp
    mov rax, QWORD PTR [rdi]
    mov rbp, rdi
    call [QWORD PTR [rax]]
    mov rax, QWORD PTR 0[rbp]
    mov rdi, rbp
    pop rbp
    mov rax, QWORD PTR 8[rax]
    jmp rax
f_U:
    push rbp
    mov rax, QWORD PTR [rdi]
    mov rbp, rdi
    call [QWORD PTR [rax]]
    mov rax, QWORD PTR 0[rbp]
    mov rdi, rbp
    call [QWORD PTR 8[rax]]
    mov rax, QWORD PTR 0[rbp]
    mov rdi, rbp
    pop rbp
    mov rax, QWORD PTR 16[rax]
    jmp rax
u:
    .quad U_vtable+16
t:
    .quad T_vtable+16
s:
    .quad S_vtable+16
U_vtable:
    .zero 16
    .quad S_sf
    .quad T_tf
    .quad U_uf
T_vtable:
    .zero 16
    .quad S_sf
    .quad T_tf
S_vtable:
    .zero 16
    .quad S_sf

Non-virtual bases only

For the constraints of this article, this category encompasses non-virtual multiple inheritance, where a single class can derive directly from more than one base. The fundamental problem introduced by this category is that pointers are no longer inter-convertible in general: they cannot point to objects of two or more distinct types at the same time. A pointer to the derived class can no longer be used unchanged in all cases by functions expecting a pointer to one of its bases, neither can its virtual table be used in this way.

This is a problem because functions ultimately defined in a base class (e.g. S and T in the implementation above) expect a pointer to an object of that type (S*/T*) as their this pointer argument. When calling these function, pointers to a derived class have to be adjusted (offset).

Note that this also applies for single inheritance, which can be thought of as a degenerate case where all offsets are zero. For analogous reason, the top of a class hierarchy (e.g. S above), called the primary base class, is special in that it can share the virtual table with the derived type and avoid some of the complications described below.

Pointer adjustments

Pointers may need adjustment in two distinct phases of a virtual call. Initially, the pointer to the derived object is cast to a base pointer, which involves offsetting it to point to a sub-object⁶. This offset is done by the compiler whenever a cast occurs. Since it knows the source and destination types, it can generate the offset at compile time and encode it in the machine code.

adj.cpp

void f(B*);

void g(D *p) {
    f(p);
}

g(D*):
    lea    rax, 16[rdi]
    test   rdi, rdi
    cmovne rdi, rax
    jmp    f(B*)@PLT

This sub-object needs to be indistinguishable from a regular object of the base class. Ultimately, this means each sub-object needs its own virtual table — the derived object is no longer a concatenation of all classes with a single virtual pointer at the top. Even worse, it means a given class will have multiple virtual tables: one for objects of that type and an additional one for each hierarchy where that type appears in a sub-object due to multiple inheritance.

In general, the reverse operation is not as trivial. Given a base type, it is not known to which concrete object it belongs and thus which offset to apply. This is the reason for the "offset to top" value, the first member of the virtual table: it contains the offset to be applied to reach the top of the derived object (this also explains why it is always zero in a single-inheritance hierarchy).

dyn.cpp

void *f(B *p) {
    return dynamic_cast<void*>(p);
}

f(B*):
    test rdi, rdi
    je .L3
    mov rax, QWORD PTR [rdi]
    add rdi, QWORD PTR -16[rax]
    mov rax, rdi
    ret
.L3:
    xor eax, eax
    ret

Function preludes

When a pointer to a sub-object is used to call a function that is overridden by a subclass, a second adjustment may be needed to point to the (sub-)object expected by it. A common way to deal with this offset is through the use of function preludes (called thunks in the Itanium ABI). This is a short code fragment which performs the pointer adjustment before executing the original function (called the non-adjusting entry point). The address of this fragment is what is then placed in the virtual table.

Preludes can sometimes be located right before the original function, in which case no change in control flow is necessary: the instruction stream can simply flow into the original function after the pointer is adjusted. Most commonly, they are placed in the same object file and perform a call or direct jump at the end. They may also be used for other purposes, such as adjustments for covariant return types.

Implementation

Here is the final implementation, supporting class hierarchies with multiple inheritance. A variant of this source is what is defined in the evil.c presented at the beginning of the article. There, the objects and functions are carefully named to coincide with the mangled names the compiler would generate so that the linker finds them when it looks for the implementation. This is completely undefined behavior (for this exact reason), but is done here to demonstrate that the implementation of virtual dispatch is valid.

multiple_inline.cpp

struct S {
    virtual void sf(void) {}
} s;

struct T {
    virtual void tf(void) {}
} t;

struct U : S, T {
    void tf(void) override {}
    virtual void uf(void) {}
} u;

void f(S *p) {
    p->sf();
}

void f(T *p) {
    p->tf();
}

void f(U *p) {
    p->sf(), p->tf(), p->uf();
}

Assembly

non-virtual thunk to U::tf():
    sub rdi, 8
    jmp U::tf
f(S*):
    mov rax, QWORD PTR [rdi]
    jmp [QWORD PTR [rax]]
f(T*):
    mov rax, QWORD PTR [rdi]
    jmp [QWORD PTR [rax]]
f(U*):
    push rbp
    mov rax, QWORD PTR [rdi]
    mov rbp, rdi
    call [QWORD PTR [rax]]
    mov rax, QWORD PTR 0[rbp]
    mov rdi, rbp
    call [QWORD PTR 8[rax]]
    mov rax, QWORD PTR 0[rbp]
    mov rdi, rbp
    pop rbp
    mov rax, QWORD PTR 16[rax]
    jmp rax
vtable for S:
    .quad 0
    .quad 0
    .quad S::sf()
vtable for T:
    .quad 0
    .quad 0
    .quad T::tf()
vtable for U:
    .quad 0
    .quad 0
    .quad S::sf()
    .quad U::tf()
    .quad U::uf()
    .quad -8
    .quad 0
    .quad non-virtual
          thunk to U::tf()
s:
    .quad vtable for S+16
t:
    .quad vtable for T+16
u:
    .quad vtable for U+16
    .quad vtable for U+56

Implementation

multiple_inline.c

struct S;
struct T;
struct U;

struct S_vtable {
    void (*const sf)(struct S*);
};

struct T_vtable {
    void (*const tf)(struct T*);
};

struct U_vtable {
    const struct S_vtable S;
    const struct T_vtable T;
    void (*const uf)(struct U*);
};

struct S {
    const struct S_vtable *vptr;
};

struct T {
    const struct T_vtable *vptr;
};

struct U {
    union {
        const struct U_vtable *vptr;
        struct S s;
    };
    struct T t;
};

void S_sf(struct S*) {}
void T_tf(struct T*) {}
void U_tf(struct U*) {}
void U_uf(struct U*) {}

void U_t_tf(struct T* p) {
    U_tf(container_of(p, struct U, t));
}

const struct {
    const struct vtable_header header;
    const struct S_vtable vtable;
} S_vtable = {
    .vtable.sf = S_sf,
};

const struct {
    const struct vtable_header header;
    const struct T_vtable vtable;
} T_vtable = {
    .vtable.tf = T_tf,
};

const struct {
    const struct vtable_header header;
    const struct U_vtable vtable;
    const struct vtable_header T_header;
    const struct T_vtable T_vtable;
} U_vtable = {
    .vtable.S.sf = S_sf,
    .vtable.T.tf =
        (void(*)(struct T*))U_tf,
    .vtable.uf = U_uf,
    .T_header.offset_to_top =
        -offsetof(struct U, t),
    .T_vtable.tf = U_t_tf,
};

struct S s = {
    .vptr = &S_vtable.vtable,
};

struct T t = {
    .vptr = &T_vtable.vtable,
};

struct U u = {
    .vptr = &U_vtable.vtable,
    .t.vptr = &U_vtable.T_vtable,
};

void f_S(struct S *p) {
    p->vptr->sf(p);
}

void f_T(struct T *p) {
    p->vptr->tf(p);
}

void f_U(struct U *p) {
    p->vptr->S.sf(&p->s);
    p->vptr->T.tf((struct T*)p);
    p->vptr->uf(p);
}

U_t_tf:
    sub rdi, 8
    jmp U_tf@PLT
f_S:
    mov rax, QWORD PTR [rdi]
    jmp [QWORD PTR [rax]]
f_T:
    mov rax, QWORD PTR [rdi]
    jmp [QWORD PTR [rax]]
f_U:
    push rbp
    mov rax, QWORD PTR [rdi]
    mov rbp, rdi
    call [QWORD PTR [rax]]
    mov rax, QWORD PTR 0[rbp]
    mov rdi, rbp
    call [QWORD PTR 8[rax]]
    mov rax, QWORD PTR 0[rbp]
    mov rdi, rbp
    pop rbp
    mov rax, QWORD PTR 16[rax]
    jmp rax
S_vtable:
    .zero 16
    .quad S_sf
T_vtable:
    .zero 16
    .quad T_tf
U_vtable:
    .zero 16
    .quad S_sf
    .quad U_tf
    .quad U_uf
    .quad -8
    .zero 8
    .quad U_t_tf
s:
    .quad S_vtable+16
t:
    .quad T_vtable+16
u:
    .quad U_vtable+16
    .quad U_vtable+56

References

This article was the result of my frustration when researching this topic. It tries to provide an intermediate step between the abundant basic introductions and the quasi-inscrutable reference manuals. For this reason, and despite its length, it leaves much to be explored. Here are some pointers (heh) for further reading:

As mentioned previously, virtual inheritance significantly complicates the implementation of virtual dispatch. It does so, however, in a mostly non-intrusive way with regards to what was presented.
Constructors must initialize the virtual pointer(s) and must ensure that they are valid at all points as defined by the standard. Destructors are also virtual functions whose implementation needs some care.
In general, an immense amount of detail has been omitted — destructor variants, covariant return types, construction tables, to name a few.
Bjarne Stroustrup's original article on the C++ implementation, Multiple Inheritance for C++, is a very good summary and introduction.
Morgan Deters' notes, VTable Notes on Multiple Inheritance in GCC C++ Compiler v4.0.1, provide a nice visual accompaniment to these descriptions, especially for the virtual inheritance cases.
Axel-Tobias Schreiner's book, Object-Oriented Programming With ANSI-C, is a phenomenal book on the topic. The C implementation presented here makes no attempt to be usable, it is a simple, reverse-engineered, glorified disassembly of the compiler output. Schreiner's books has an incredibly elegant and beautiful implementation of single-inheritance in C. The book is available in print and online.
Neil Brown's series in LWN, Object-oriented design patterns in the kernel, explores the manifestation of many of these patterns — some less orthodox than others — in the Linux kernel, where object-oriented programming is heavily used.

I hope this was helpful. Happy hacking.

Notes

C++ requires sizeof(x) > 0 for all objects.
lea is the load effective address x86 instruction. In this case, it is a simpler (to the assembler) form of adding the address of s to rip (the instruction pointer). Both this offset and the reference to @PLT are due to the fact that GCC builds position-independent code (PIC) by default, i.e. code that can be placed in shared libraries and dynamically loaded at runtime.
The actual implementation generated by the compiler is not as simple:
```
f(S*):
    mov rax, QWORD PTR [rdi]
    lea rdx, S::f()[rip]
    mov rax, QWORD PTR [rax]
    cmp rax, rdx
    jne .L5
    ret
.L5:
    jmp rax
```
To arrive at the machine code in the main text, the definition of S::f was removed, leaving only its declaration. The reason for this is that if the compiler can see the (trivial) implementation of the function, it can decide to inline the call. In this case, there is no implementation to inline, but the function code would be added between the jne and ret instructions if it existed.

This is only a valid optimization if the function has not been overridden (p may point to a subclass of S since it is not declared final), which is the reason for the extra comparison. The address in the object's virtual table is compared to the address of S::f, and jumped into if it is a different (i.e. overridden) function.

The implementation according to the C version presented later in the article which generates the same machine code is:
leaf_inline.c
```
void f(struct S *p) {
    void (*f)(struct S*) = p->vptr->f;
    __builtin_expect(f == S_f, 1) ? S_f(p) : f(p);
}
```
Which amusingly can also be written as n[array], with the same semantics and mirroring the assembly syntax.

A very subtle aspect of the implementation generated by the compiler here is that the virtual table is reloaded after each virtual call (from rbp, since rdi is a caller-saved register). From the compiler's perspective, this is because the functions, which are opaque since they are behind pointers, have access to the object through the argument p and could change the pointer and even the contents of the table.

Doing so, however, would be undefined behavior since the layout of objects with virtual functions is not defined (it need not even involve a virtual pointer at all). Exposing enough information to the compiler back end to take advantage of this optimization is an active area of research (e.g. Modeling the Invariance of Virtual Pointers in LLVM).

We can emulate this in our C implementation, compare:

multiple_reload.cpp

void f(U *p) {
    p->uf();
    p->uf();
    p->uf();
    p->uf();
}

f(U*):
    push rbp
    mov rax, QWORD PTR [rdi]
    mov rbp, rdi
    call [QWORD PTR 16[rax]]
    mov rax, QWORD PTR 0[rbp]
    mov rdi, rbp
    call [QWORD PTR 16[rax]]
    mov rax, QWORD PTR 0[rbp]
    mov rdi, rbp
    call [QWORD PTR 16[rax]]
    mov rax, QWORD PTR 0[rbp]
    mov rdi, rbp
    pop rbp
    mov rax, QWORD PTR 16[rax]
    jmp rax

vs.

multiple_reload.c

void f_U(struct U *p) {
    void (*f)(struct U*) =
        p->vptr->uf;
    f(p);
    f(p);
    f(p);
    f(p);
}

f_U:
    push rbp
    mov rbp, rdi
    push rbx
    sub rsp, 8
    mov rax, QWORD PTR [rdi]
    mov rbx, QWORD PTR 16[rax]
    call rbx
    mov rdi, rbp
    call rbx
    mov rdi, rbp
    call rbx
    add rsp, 8
    mov rdi, rbp
    mov rax, rbx
    pop rbx
    pop rbp
    jmp rax

Whether this is a beneficial optimization depends on caching behavior and the implementation of f and the function(s) being called.

This is true, both in C and C++, even for unrelated structs when there is a common prefix. C++ 20 provides several type traits to verify that this relation exists for a pair of types:
Allow me a short diatribe. You do not need dynamic memory allocation for dynamic dispatch. This will work just fine:
```
D d;
B *b = &d;
b->vfn();
```
As will static_cast<B*>(&d)->vfn(). It annoys me to no end to see B *b = new D or similar atrocities in dynamic dispatch examples.

assembly c c++ linux programming unix