Inspecting C++ virtual tables
I have Certified Hand Wavy™ knowledge of how C++ virtual functions work,
cobbled together from StackOverflow over the years:
- The C++ standard does not specify how virtual functions must be implemented
- Many compilers implement virtual functions using ‘virtual tables’ (vtables)
In other words, I know pretty much nothing, a virtual Jon Snow (pause for
polite laughter). So let’s do three things:
- Get an official definition of the vtable
- Inspect a vtable
- See where vptr gets added
Get an official definition of the vtable
I know that Clang and GCC both use the Itanium C++ ABI, which contains a section on the Virtual Table Layout. Two things become clear from this:
- The vtable contains more than just function pointers
- virtual inheritance complicates things
There are actually five vtable components, four more than I thought:
- Virtual call (vcall) offsets - pointer adjustment for virtual functions declared in a virtual base
- Virtual base (vbase) offsets - used to access the virtual bases of an object
- Offset to top - ‘displacement to the top of the object from the location within the object of the virtual table pointer that addresses this virtual table’
  - Always present
  - Necessary for dynamic_cast
  - This will always be 0 in a complete object virtual table / all of its primary base vtables
  - Appears to only be non-zero with multiple inheritance (specifically, when an object has a secondary vtable)
- Typeinfo pointer - runtime type information (RTTI) for the object
  - Always present
- Virtual table address pointer - this is the virtual table address contained in an object’s virtual pointer
  - Technically not a distinct type of entry - this is just where the vptr in an object points
- Virtual function pointers - addresses of virtual functions
There is plenty more to learn in the spec, but this is the gist and is enough to start exploring.
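To make the two always-present entries concrete, here is a minimal sketch that reads them back through an object's vptr. It assumes the Itanium ABI on a 64-bit GCC or Clang target, with the vptr stored as the first word of each polymorphic subobject. VtablePrefix, ReadVtablePrefix, Left, Right, and Both are hypothetical names for illustration, and none of this is portable C++.

```cpp
#include <cstddef>
#include <cstring>
#include <typeinfo>

// Hypothetical view of the two words that sit just before the address a vptr
// holds in an Itanium-ABI vtable.
struct VtablePrefix {
  std::ptrdiff_t offset_to_top;
  const std::type_info* rtti;
};

// Assumption: the first word of the (sub)object is its vptr. That holds for
// GCC and Clang under the Itanium ABI but is not guaranteed by the standard.
inline VtablePrefix ReadVtablePrefix(const void* subobject) {
  const char* vptr = nullptr;
  std::memcpy(&vptr, subobject, sizeof vptr);

  VtablePrefix prefix{};
  // offset_to_top lives two words before the vptr target, RTTI one word before.
  std::memcpy(&prefix.offset_to_top, vptr - 2 * sizeof(void*),
              sizeof prefix.offset_to_top);
  std::memcpy(&prefix.rtti, vptr - sizeof(void*), sizeof prefix.rtti);
  return prefix;
}

// Two polymorphic bases, so that Both carries a secondary vtable for Right.
struct Left { virtual ~Left() = default; long left = 0; };
struct Right { virtual ~Right() = default; long right = 0; };
struct Both : Left, Right {};
```

Given `Both b; Right* r = &b;`, ReadVtablePrefix(&b) should report an offset of 0, while ReadVtablePrefix(r) should report the negative distance from the Right subobject back to the start of b. Both prefixes should point at the RTTI for Both.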
Inspect a vtable
Let’s examine the following setup:
|
|
using two tools:
- clang
- gdb
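For reference, here is a minimal sketch of classes that would produce dumps shaped like the ones below. Only the names Base, Derived, BaseA, and DerivedB come from the tool output; the empty bodies and everything else are assumptions.

```cpp
// Hypothetical reconstruction: only the class and function names are taken
// from the clang dump, the bodies are placeholders.
struct Base {
  virtual ~Base() {}
  virtual void BaseA() {}
};

struct Derived : Base {
  ~Derived() override {}
  void BaseA() override {}      // overrides the slot inherited from Base
  virtual void DerivedB() {}    // adds a new slot at the end of the vtable
};
```

Binding a Derived object to a `Base&` named c and calling `c.BaseA()` through it gives the kind of call site the gdb session stops at.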
Examining with clang
clang itself can dump the vtable layouts for the code above:
$ clang++ -c virtual.cc -Xclang -fdump-vtable-layouts
Vtable for 'Base' (5 entries).
0 | offset_to_top (0)
1 | Base RTTI
-- (Base, 0) vtable address --
2 | Base::~Base() [complete]
3 | Base::~Base() [deleting]
4 | void Base::BaseA()
VTable indices for 'Base' (3 entries).
0 | Base::~Base() [complete]
1 | Base::~Base() [deleting]
2 | void Base::BaseA()
Vtable for 'Derived' (6 entries).
0 | offset_to_top (0)
1 | Derived RTTI
-- (Base, 0) vtable address --
-- (Derived, 0) vtable address --
2 | Derived::~Derived() [complete]
3 | Derived::~Derived() [deleting]
4 | void Derived::BaseA()
5 | void Derived::DerivedB()
VTable indices for 'Derived' (4 entries).
0 | Derived::~Derived() [complete]
1 | Derived::~Derived() [deleting]
2 | void Derived::BaseA()
3 | void Derived::DerivedB()
Unsurprisingly, this aligns with what we read in the ABI docs:
- Neither vtable has a vcall or vbase offset (since we aren’t using virtual inheritance)
- offset_to_top is 0 (since we aren’t using virtual or multiple inheritance)
- Virtual functions follow RTTI
The output has a separate VTable indices structure, which matches what I
originally thought the vtable was. More importantly, why is the capitalization
different (‘Vtable for’ vs. ‘VTable indices for’)?
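One way to read the ‘VTable indices’ listing is as slot numbers counted from the vtable address. As a rough sketch of that idea, a slot can be fetched and called by hand. This assumes x86-64 Linux with GCC or Clang, where each slot is a plain function address and the object pointer is passed as the first argument. The int return values and the CallSlot helper are hypothetical additions for illustration, and real code should never do this.

```cpp
#include <cstddef>
#include <cstring>

struct Base {
  virtual ~Base() {}                   // slots 0 and 1 (complete, deleting)
  virtual int BaseA() { return 1; }    // slot 2
};

struct Derived : Base {
  int BaseA() override { return 2; }       // replaces slot 2
  virtual int DerivedB() { return 3; }     // slot 3
};

// Assumed slot signature: the object pointer is the only argument.
using Slot = int (*)(void*);

// Fetch entry `index` (counted from the address the vptr holds) and call it.
inline int CallSlot(void* object, std::size_t index) {
  const Slot* vptr = nullptr;
  std::memcpy(&vptr, object, sizeof vptr);
  return vptr[index](object);
}
```

On that platform, calling slot 2 on a Derived object should run Derived::BaseA and slot 3 should run Derived::DerivedB, which lines up with indices 2 and 3 in the dump.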
Examining with gdb
With gdb we can jump into a live program and see things in action. For this,
let’s take a look at a few things:
- c’s vptr
- c’s vtable
- The names stored in c’s RTTI
First, we’ll compile the program, set a breakpoint just before the final call and run it:
$ clang++ -std=c++20 -O0 -g virtual.cc -o virt && gdb -q virt
Reading symbols from virt...
(gdb) b 17
Breakpoint 1 at 0x401149: file virtual.cc, line 17.
(gdb) r
Breakpoint 1, main () at virtual.cc:17
17 c.BaseA();
Then we can start inspecting:
(gdb) p c
$1 = (Base &) @0x7fffffffdc38: {_vptr$Base = 0x402018 <vtable for Derived+16>}
Right off the bat we see that c has a hidden member (_vptr$Base) that points
into its vtable. But as we learned earlier, the address it holds is not the
beginning of the entire vtable. Based on what we read, the actual vtable
should begin two words before the address in the vptr (one word for
offset_to_top, one for RTTI), so let’s jump 0x2 words back from _vptr$Base:
(gdb) x/8gx c._vptr$Base-0x2
0x402008 <_ZTV7Derived>: 0x0000000000000000 0x0000000000402058
0x402018 <_ZTV7Derived+16>: 0x00000000004011d0 0x00000000004011f0
0x402028 <_ZTV7Derived+32>: 0x0000000000401220 0x0000000000401230
0x402038 <_ZTS7Derived>: 0x6465766972654437 0x0000657361423400
Things are looking good so far, but let’s take a brief detour into…
Mangled names
In addition to vtable info, the ABI docs have multiple sections on External Names (a.k.a. Mangling) which contain some interesting bits:
- Mangled names are prefixed with ‘_Z’ - this is visible in the output above
- Names have a well-specified encoding - obvious in retrospect, but useful to see where it is codified
- The vtable/its entities have their own encodings - this further explains some of the debugging output above
So a symbol like _ZTV7Derived is a mangled external name (_Z) for a virtual table (TV) of type 7Derived, a.k.a. Derived’s vtable. 7Derived is a source-name, which per the ABI contains the byte length of the identifier + the identifier.
Knowing that, a quick search for ‘gdb demangling’ returns a few options to make GDB’s output a bit smarter:
- set print demangle on - demangle C++ encoded names
- set print asm-demangle on - demangle C++ encoded names in disassembly
We’ll flip those on for the rest of this, adding a bit more detail to the output:
(gdb) set print demangle on
(gdb) set print asm-demangle on
(gdb) x/8gx c._vptr$Base-0x2
0x402008 <vtable for Derived>: 0x0000000000000000 0x0000000000402058
0x402018 <vtable for Derived+16>: 0x00000000004011d0 0x00000000004011f0
0x402028 <vtable for Derived+32>: 0x0000000000401220 0x0000000000401230
0x402038 <typeinfo name for Derived>: 0x6465766972654437 0x0000657361423400
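The same decoding is available from inside a program. Here is a minimal sketch using abi::__cxa_demangle from <cxxabi.h>, which the GCC and Clang runtimes provide. The Demangle wrapper is a hypothetical helper name.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Demangle an Itanium-ABI symbol name. Returns the input unchanged if the
// runtime cannot decode it.
inline std::string Demangle(const char* mangled) {
  int status = 0;
  char* demangled = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || demangled == nullptr) {
    return mangled;
  }
  std::string result(demangled);
  std::free(demangled);  // the buffer is allocated with malloc
  return result;
}
```

Demangle("_ZTV7Derived") gives "vtable for Derived" and Demangle("_ZTS7Derived") gives "typeinfo name for Derived", matching what gdb prints once demangling is switched on.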
With the names demangled, let’s track down c’s class names. We
know that the pointer to c’s RTTI sits one word
before the address its vptr holds, which per the output above is 0x0000000000402058:
(gdb) x/8gx c._vptr$Base-0x2
0x402008 <vtable for Derived>: 0x0000000000000000 0x0000000000402058
0x402018 <vtable for Derived+16>: 0x00000000004011d0 0x00000000004011f0
Per the docs, the Itanium ABI stores RTTI as a class derived from std::type_info:
|
|
Note the highlighted virtual destructor! This means objects of this
type/derived from it will have their own vtables.
Our object (c) should be of type abi::__si_class_type_info, which per the
docs is used for ‘classes containing only a single, public, non-virtual base at
offset zero’ and is defined as:
// Base type for class representations.
class __class_type_info : public std::type_info {};
class __si_class_type_info : public __class_type_info {
public:
const __class_type_info *__base_type;
};
Summing this up, our RTTI should have:
- A pointer to a vtable for __si_class_type_info
- The __type_name of c
- A pointer to c’s __base_type (which should itself have a __type_name)
The suspense is killing me, let’s see. Here is what the RTTI pointer holds, in the same order as the list above:
(gdb) x/4gx 0x0000000000402058
0x402058 <typeinfo for Derived>: 0x0000000000403d90 0x0000000000402038
0x402068 <typeinfo for Derived+16>: 0x0000000000402048 0x0000000000000000
and those values hold…
(gdb) x/1gx 0x0000000000403d90
0x403d90 <vtable for __cxxabiv1::__si_class_type_info@CXXABI_1.3+16>: 0x00007ffff7e49e70
(gdb) x/1s 0x0000000000402038
0x402038 <typeinfo name for Derived>: "7Derived"
(gdb) x/2gx 0x0000000000402048
0x402048 <typeinfo for Base>: 0x0000000000403d30 0x0000000000402041
(gdb) x/1s 0x0000000000402041
0x402041 <typeinfo name for Base>: "4Base"
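Those two strings were already hiding in the earlier x/8gx dump: the words at 0x402038 are the typeinfo names printed as 64-bit integers. Here is a small sketch that lays a word out the way x86-64 stores it, lowest byte first. LittleEndianBytes is a hypothetical helper.

```cpp
#include <cstdint>
#include <string>

// Split a 64-bit word into the eight bytes it occupies in little-endian
// memory, lowest-addressed byte first.
inline std::string LittleEndianBytes(std::uint64_t word) {
  std::string bytes;
  for (int i = 0; i < 8; ++i) {
    bytes.push_back(static_cast<char>((word >> (8 * i)) & 0xff));
  }
  return bytes;
}
```

LittleEndianBytes(0x6465766972654437) yields "7Derived". The next word, 0x0000657361423400, starts with the terminating NUL and then holds "4Base", which is why that name lives at 0x402041.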
See where vptr gets added
To wrap this up, where does the vptr actually get added to an object? This
must happen in the constructor, so let’s confirm by first dumping the assembly
for Derived::Derived:
|
|
After the preamble, the value in register $rdi (the space that the caller is
requesting be filled with a Derived object) is moved into $rbp-0x8. That
value is then moved into $rcx and $rdi prior to calling Base::Base, which looks
like:
|
|
Note the highlighted lines: 0x402070 is Base’s vtable, and 0x10 after
that is the start of the virtual functions (after offset_to_top and RTTI).
That value is the vptr, which is then placed in the object passed in $rdi.
After the base object is created and control returns to Derived::Derived, the
object in $rdi contains a vptr that points to Base’s vtable. That is
adjusted to point to Derived’s vtable in the subsequent lines:
|
|
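The same hand-off can be observed without a disassembler. This is a minimal sketch that assumes the Itanium ABI with the vptr in the first word of the object. ReadVptr, Name, and the recording members are hypothetical names.

```cpp
#include <cstring>
#include <string>

// Copy out the first word of a polymorphic object, which is where GCC and
// Clang place the vptr under the Itanium ABI (an assumption, not portable C++).
inline const void* ReadVptr(const void* object) {
  const void* vptr = nullptr;
  std::memcpy(&vptr, object, sizeof vptr);
  return vptr;
}

struct Base {
  Base() {
    // While Base::Base runs, the object still behaves like a Base.
    vptr_in_base_ctor = ReadVptr(this);
    name_in_base_ctor = Name();
  }
  virtual ~Base() = default;
  virtual std::string Name() const { return "Base"; }

  const void* vptr_in_base_ctor = nullptr;
  std::string name_in_base_ctor;
};

struct Derived : Base {
  // By the time this body runs, the vptr has been repointed at Derived's vtable.
  Derived() { vptr_in_derived_ctor = ReadVptr(this); }
  std::string Name() const override { return "Derived"; }

  const void* vptr_in_derived_ctor = nullptr;
};
```

After `Derived d;`, d.vptr_in_base_ctor and d.vptr_in_derived_ctor should differ, and d.name_in_base_ctor should be "Base", since during Base::Base the object still points at Base’s vtable.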
Wrapping up
Although this is likely obvious to anyone who has worked with virtual
functions, I learned multiple things (namely, that the vtable has a lot more
stuff in it than I thought). There is a lot more exploration that could be done
(multiple inheritance, virtual inheritance, etc.), but I’ll leave those for
Future Me.