I have Certified Hand Wavy™ knowledge of how C++ virtual functions work, cobbled together from StackOverflow over the years:

  • The C++ standard does not specify how virtual functions must be implemented
  • Many compilers implement virtual functions using ‘virtual tables’ (vtables)

In other words, I know pretty much nothing, a virtual Jon Snow (pause for polite laughter). So let’s do three things:

  1. Get an official definition of the vtable
  2. Inspect a vtable
  3. See where vptr gets added

Get an official definition of the vtable

I know that Clang and GCC both use the Itanium C++ ABI, which contains a section on Virtual Table Layout. Two things become clear from this:

  • The vtable contains more than just function pointers
  • virtual inheritance complicates things

There are actually five vtable components, four more than I thought:

  • Virtual call (vcall) offsets - pointer adjustment for virtual functions declared in a virtual base
  • Virtual base (vbase) offsets - used to access the virtual bases of an object
  • Offset to top - ‘displacement to the top of the object from the location within the object of the virtual table pointer that addresses this virtual table’
    • Always present
    • Necessary for dynamic_cast
    • This will always be 0 in a complete object virtual table / all of its primary base vtables
    • Appears to only be non-zero with multiple inheritance (specifically, when an object has a secondary vtable)
  • Typeinfo pointer - runtime type information (RTTI) for the object
    • Always present
  • Virtual table address point - this is the virtual table address contained in an object’s virtual pointer
    • Technically not a distinct type of entry - this is just where the vptr in an object points
  • Virtual function pointers - addresses of virtual functions

There is plenty more to learn in the spec, but this is the gist and is enough to start exploring.

Inspect a vtable

Let’s examine the following setup:

class Base {
 public:
  virtual ~Base() = default;
  virtual void BaseA() {}
};

class Derived : public Base {
 public:
  ~Derived() override = default;
  void BaseA() override {}
  virtual void DerivedB() {}
};

int main() {
  Derived d;
  Base& c = d;
  c.BaseA();
}

using two tools:

  1. clang
  2. gdb

Examining with clang

clang itself can dump the vtable layouts for the code above:

$ clang++ -c virtual.cc -Xclang -fdump-vtable-layouts
Vtable for 'Base' (5 entries).
   0 | offset_to_top (0)
   1 | Base RTTI
       -- (Base, 0) vtable address --
   2 | Base::~Base() [complete]
   3 | Base::~Base() [deleting]
   4 | void Base::BaseA()

VTable indices for 'Base' (3 entries).
   0 | Base::~Base() [complete]
   1 | Base::~Base() [deleting]
   2 | void Base::BaseA()

Vtable for 'Derived' (6 entries).
   0 | offset_to_top (0)
   1 | Derived RTTI
       -- (Base, 0) vtable address --
       -- (Derived, 0) vtable address --
   2 | Derived::~Derived() [complete]
   3 | Derived::~Derived() [deleting]
   4 | void Derived::BaseA()
   5 | void Derived::DerivedB()

VTable indices for 'Derived' (4 entries).
   0 | Derived::~Derived() [complete]
   1 | Derived::~Derived() [deleting]
   2 | void Derived::BaseA()
   3 | void Derived::DerivedB()

Unsurprisingly, this aligns with what we read in the ABI docs:

  • Neither vtable has vcall or vbase offsets (since we aren’t using virtual inheritance)
  • offset_to_top is 0 (since we aren’t using virtual or multiple inheritance)
  • Virtual functions follow RTTI

The output has a separate VTable indices structure, which matches what I originally thought the vtable was. More importantly, why is the capitalization different ('Vtable for' vs. ‘VTable indices for’)?

Examining with gdb

With gdb we can jump into a live program and see things in action. For this, let’s take a look at a few things:

  • c’s vptr
  • c’s vtable
  • The names stored in c’s RTTI

First, we’ll compile the program, set a breakpoint just before the final call and run it:

$ clang++ -std=c++20 -O0 -g virtual.cc -o virt && gdb -q virt
Reading symbols from virt...
(gdb) b 17
Breakpoint 1 at 0x401149: file virtual.cc, line 17.
(gdb) r
Breakpoint 1, main () at virtual.cc:17
17        c.BaseA();

Then we can start inspecting:

(gdb) p c
$1 = (Base &) @0x7fffffffdc38: {_vptr$Base = 0x402018 <vtable for Derived+16>}

Right off the bat we see that c has a hidden member, _vptr$Base, that points into its vtable. As we learned earlier, though, the address it holds is not the beginning of the entire vtable: it is <vtable for Derived+16>. Based on what we read, the actual vtable should begin two words before that address (one for offset_to_top, one for RTTI), so let’s jump 0x2 words back from _vptr$Base:

(gdb) x/8gx c._vptr$Base-0x2
0x402008 <_ZTV7Derived>:        0x0000000000000000      0x0000000000402058
0x402018 <_ZTV7Derived+16>:     0x00000000004011d0      0x00000000004011f0
0x402028 <_ZTV7Derived+32>:     0x0000000000401220      0x0000000000401230
0x402038 <_ZTS7Derived>:        0x6465766972654437      0x0000657361423400

Things are looking good so far, but let’s take a brief detour into…

Mangled names

In addition to vtable info, the ABI docs have multiple sections on External Names (a.k.a. Mangling) which contain some interesting bits:

  • Mangled names are prefixed with ‘_Z’ - this is visible in the output above
  • Names have a well-specified encoding - obvious in retrospect, but useful to see where it is codified
  • The vtable/its entities have their own encodings - this further explains some of the debugging output above

So a symbol like _ZTV7Derived is a mangled external name (_Z) for a virtual table (TV) of type 7Derived, a.k.a. Derived’s vtable. 7Derived is a <source-name>, which per the mangling grammar is the byte length of the identifier followed by the identifier itself.

Knowing that, a quick search for ‘gdb demangling’ returns a few options to make GDB’s output a bit smarter:

  • set print demangle on - demangle C++ encoded names
  • set print asm-demangle on - demangle C++ encoded names in disassembly

We’ll flip those on for the rest of this, adding a bit more detail to the output:

(gdb) set print demangle on
(gdb) set print asm-demangle on
(gdb) x/8gx c._vptr$Base-0x2
0x402008 <vtable for Derived>:          0x0000000000000000      0x0000000000402058
0x402018 <vtable for Derived+16>:       0x00000000004011d0      0x00000000004011f0
0x402028 <vtable for Derived+32>:       0x0000000000401220      0x0000000000401230
0x402038 <typeinfo name for Derived>:   0x6465766972654437      0x0000657361423400

With the names demangled, let’s track down c’s class names. We know that the RTTI pointer sits one word before the address held in the vptr, which per the output above is 0x0000000000402058:

(gdb) x/8gx c._vptr$Base-0x2
0x402008 <vtable for Derived>:        0x0000000000000000      0x0000000000402058
0x402018 <vtable for Derived+16>:     0x00000000004011d0      0x00000000004011f0

Per the docs, the Itanium ABI stores RTTI as a class derived from std::type_info:

class type_info {
 public:
  virtual ~type_info();
  bool operator==(const type_info &) const;
  bool operator!=(const type_info &) const;
  bool before(const type_info &) const;
  const char* name() const;
 private:
  type_info (const type_info& rhs);
  type_info& operator= (const type_info& rhs);
  // Added in section 2.9.4.
  const char *__type_name;
};

Note the virtual destructor! This means objects of this type (or of any type derived from it) will have their own vtables.

The RTTI for our object (c) should be of type abi::__si_class_type_info, which per the docs is used for ‘classes containing only a single, public, non-virtual base at offset zero’ and is defined as:

// Base type for class representations.
class __class_type_info : public std::type_info {};

class __si_class_type_info : public __class_type_info {
 public:
  const __class_type_info *__base_type;
};

Summing this up, our RTTI should have:

  • A pointer to a vtable for __si_class_type_info
  • The __type_name of c
  • A pointer to c’s __base_type (which should itself have a __type_name)

The suspense is killing me, let’s see. Here is what the RTTI pointer holds; the first three words should map, in order, to the three items above:

(gdb) x/4gx 0x0000000000402058
0x402058 <typeinfo for Derived>:        0x0000000000403d90      0x0000000000402038
0x402068 <typeinfo for Derived+16>:     0x0000000000402048      0x0000000000000000

and those values hold…

(gdb) x/1gx 0x0000000000403d90
0x403d90 <vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3+16>:   0x00007ffff7e49e70
(gdb) x/1s 0x0000000000402038
0x402038 <typeinfo name for Derived>:   "7Derived"
(gdb) x/2gx 0x0000000000402048
0x402048 <typeinfo for Base>:   0x0000000000403d30     0x0000000000402041
(gdb) x/1s 0x0000000000402041
0x402041 <typeinfo name for Base>:      "4Base"

See where vptr gets added

To wrap this up, where does the vptr actually get added to an object? This must happen in the constructor, so let’s confirm by first dumping the assembly for Derived::Derived:

; (gdb) disas Derived::Derived
0x0000000000401160 <+0>:     push   rbp
0x0000000000401161 <+1>:     mov    rbp,rsp
0x0000000000401164 <+4>:     sub    rsp,0x10
0x0000000000401168 <+8>:     mov    QWORD PTR [rbp-0x8],rdi
0x000000000040116c <+12>:    mov    rax,QWORD PTR [rbp-0x8]
0x0000000000401170 <+16>:    mov    rcx,rax
0x0000000000401173 <+19>:    mov    rdi,rcx
0x0000000000401176 <+22>:    mov    QWORD PTR [rbp-0x10],rax
0x000000000040117a <+26>:    call   0x4011a0 <Base::Base()>
0x000000000040117f <+31>:    movabs rax,0x402008
0x0000000000401189 <+41>:    add    rax,0x10
0x000000000040118f <+47>:    mov    rcx,QWORD PTR [rbp-0x10]
0x0000000000401193 <+51>:    mov    QWORD PTR [rcx],rax
0x0000000000401196 <+54>:    add    rsp,0x10
0x000000000040119a <+58>:    pop    rbp
0x000000000040119b <+59>:    ret

After the prologue, the value in register $rdi (the address of the space the caller wants filled with a Derived object) is stored at $rbp-0x8. That value is then loaded into $rax, copied through $rcx back into $rdi, and saved at $rbp-0x10 prior to calling Base::Base, which looks like:

; (gdb) disas Base::Base
0x00000000004011a0 <+0>:     push   rbp
0x00000000004011a1 <+1>:     mov    rbp,rsp
0x00000000004011a4 <+4>:     movabs rax,0x402070
0x00000000004011ae <+14>:    add    rax,0x10
0x00000000004011b4 <+20>:    mov    QWORD PTR [rbp-0x8],rdi
0x00000000004011b8 <+24>:    mov    rcx,QWORD PTR [rbp-0x8]
0x00000000004011bc <+28>:    mov    QWORD PTR [rcx],rax
0x00000000004011bf <+31>:    pop    rbp
0x00000000004011c0 <+32>:    ret 

Note the movabs/add pair at <+4> and <+14>: 0x402070 is Base’s vtable, and 0x10 past that is the start of the virtual functions (after offset_to_top and RTTI). That value is the vptr, which is then stored into the object whose address was passed in $rdi.

After the base object is constructed and control returns to Derived::Derived, the object contains a vptr that points to Base’s vtable. The lines after the call overwrite it to point to Derived’s vtable (0x402008 + 0x10 = 0x402018, the same <vtable for Derived+16> value we saw in gdb earlier):

0x000000000040117a <+26>:    call   0x4011a0 <Base::Base()>
0x000000000040117f <+31>:    movabs rax,0x402008
0x0000000000401189 <+41>:    add    rax,0x10
0x000000000040118f <+47>:    mov    rcx,QWORD PTR [rbp-0x10]
0x0000000000401193 <+51>:    mov    QWORD PTR [rcx],rax
0x0000000000401196 <+54>:    add    rsp,0x10
0x000000000040119a <+58>:    pop    rbp
0x000000000040119b <+59>:    ret 

Wrapping up

Although this is likely obvious to anyone who has worked with virtual functions, I learned multiple things (namely, that the vtable has a lot more stuff in it than I thought). There is a lot more exploration that could be done (multiple inheritance, virtual inheritance, etc.), but I’ll leave those for Future Me.