IMT-based interface invocation support

The Mono JIT can use an IMT-style invocation system to call interface methods. This considerably reduces runtime memory usage when many interface types are loaded, because the old system required an array in MonoVTable indexed by the interface id, and that array grows linearly as more interfaces are loaded. In some cases there are also speedups, since an interface call can reduce to a virtual call automatically.

IMT instead uses a fixed-size table and hashes each method of the implemented interfaces to a slot in that table. To resolve collisions, each call site stores the interface MonoMethod to be called in a well-known register, and the IMT slot contains a snippet of code that uses it to jump to the proper vtable slot.

The interface invocation sequence becomes (in pseudo-code):

	mov magic_reg, interface_monomethod
	call vtable [imt_slot]

The IMT table is stored at negative addresses in the vtable, as the old interface array used to be.

A small note on the choice of magic_reg for the different JIT backends: the IMT method identifier does not necessarily need to be stored in a register, though doing so is fast and the JIT code already has the infrastructure to handle this case in an arch-independent way. A JIT porter just needs to #define MONO_ARCH_IMT_REG to the chosen register. This register should be part of the MONO_ARCH_CALLEE_REGS set, as it will be handled by the local register allocator (see mini/inssel.brg), and it must not be one of the registers used for argument passing, as you would overwrite an argument in that case. The method-specific trampoline code must also preserve this register (it should already do so if the register is in MONO_ARCH_CALLEE_REGS, since it could have been used for a vtable indirect call).
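
The call sequence described above can be modelled in plain C. This is only a sketch under stated assumptions: the Sketch* names, the table size of 8 and the string hash are all hypothetical and are not taken from the runtime. The "magic register" is modelled as an extra argument passed to whatever code sits in the slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table size; the runtime's real constant is not given here. */
#define SKETCH_IMT_SIZE 8

/* Stand-in for MonoMethod: only its identity matters for dispatch. */
typedef struct SketchMethod { const char *name; } SketchMethod;

/* Code stored in a slot receives the "magic register" and the receiver. */
typedef int (*SketchSlotCode)(const SketchMethod *magic_reg, void *self);

/* Fixed size, no matter how many interfaces are loaded. */
typedef struct SketchIMT { SketchSlotCode slots[SKETCH_IMT_SIZE]; } SketchIMT;

/* Hypothetical hash: any stable function of the method identity would do. */
unsigned sketch_imt_slot(const SketchMethod *m)
{
    unsigned h = 0;
    for (const char *p = m->name; *p; p++)
        h = h * 31u + (unsigned char)*p;
    return h % SKETCH_IMT_SIZE;
}

/* Call site: mov magic_reg, interface_monomethod; call vtable [imt_slot]. */
int sketch_interface_call(const SketchIMT *imt, const SketchMethod *m, void *self)
{
    SketchSlotCode code = imt->slots[sketch_imt_slot(m)];
    assert(code != NULL);
    return code(m, self);
}

/* A non-colliding slot can hold the implementation directly. */
int sketch_increment_impl(const SketchMethod *magic_reg, void *self)
{
    (void)magic_reg; /* not needed when nothing else hashes to this slot */
    return *(int *)self + 1;
}

/* Builds a one-method table and performs a single interface call. */
int sketch_demo(void)
{
    SketchMethod increment = { "ICounter.Increment" };
    SketchIMT imt = { { 0 } };
    int receiver = 41;
    imt.slots[sketch_imt_slot(&increment)] = sketch_increment_impl;
    return sketch_interface_call(&imt, &increment, &receiver);
}
```

Calling sketch_demo () performs one dispatch through the table and returns the incremented receiver value. Passing the identifier as an argument is a simplification: in jitted code it travels in MONO_ARCH_IMT_REG, outside the normal argument registers.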
Note that in the case of a non-colliding IMT slot, the interface call instruction sequence becomes equivalent to a virtual call: the IMT slot will contain the direct trampoline for the method, and the magic trampoline will set the slot to the method's native code address once it is compiled.

In case of collisions in the IMT slot, the JIT performs a linear search if the colliding methods are few, or a binary search otherwise. To make this easier for each JIT port, a sort of internal representation of the code is created: an array of MonoIMTCheckItem structures, built in a way that allows easy generation of a bsearch when the list of colliding methods becomes large. Each item in the array represents either a direct check for a method to be invoked or a bisection check in the bsearch algorithm.

	struct _MonoIMTCheckItem {
		MonoMethod *method;
		int check_target_idx;
		int vtable_slot;
		guint8 *jmp_code;
		guint8 *code_target;
		guint8 is_equals;
		guint8 compare_done;
		guint8 chunk_size;
		guint8 short_branch;
	};

For a direct check, the is_equals value is non-zero and the emitted code should be equivalent to:

	if (magic_reg != item->method)
		jump_to_item (array [item->check_target_idx]);
	jump_to_vtable (item->vtable_slot);

Note that if item->check_target_idx is 0, the jump should be omitted, since this is the end of a linear sequence (you might want to insert a jump to a breakpoint, though, for debugging). Reaching that jump would mean that we have an error: the IMT slot was asked to execute an interface method that the type doesn't implement. In the future we might want to handle this case not with a breakpoint or assert, but either by throwing an InvalidCastException or by going into the runtime and automatically adding support for the interface to the type/vtable: this could be used both for transparent proxies and for the implicit interfaces that vectors provide in 2.0.
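
To make the direct-check semantics concrete, the following sketch interprets a linear chain of is_equals items instead of emitting machine code. It is not the runtime's code: LinMethod, LinCheckItem and LIN_NOT_IMPLEMENTED are hypothetical names, the structure keeps only the fields the check needs, and the sentinel return value models the debugging variant in which the end of the chain traps instead of jumping blindly.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for MonoMethod; only pointer identity is used. */
typedef struct LinMethod { const char *name; } LinMethod;

/* Hypothetical subset of the check item: just what a direct check needs. */
typedef struct LinCheckItem {
    const LinMethod *method;
    int check_target_idx;    /* item to try on mismatch; 0 ends the chain */
    int vtable_slot;
    unsigned char is_equals; /* non-zero for every item in this sketch */
} LinCheckItem;

/* Models the breakpoint a port might place at the end of the chain. */
#define LIN_NOT_IMPLEMENTED (-1)

/* Walks the chain the way the emitted thunk would, returning the slot. */
int lin_resolve(const LinCheckItem *items, const LinMethod *magic_reg)
{
    int idx = 0;
    assert(items != NULL);
    for (;;) {
        const LinCheckItem *item = &items[idx];
        if (magic_reg == item->method)
            return item->vtable_slot;   /* jump_to_vtable */
        if (item->check_target_idx == 0)
            return LIN_NOT_IMPLEMENTED; /* type lacks this method */
        idx = item->check_target_idx;   /* jump_to_item */
    }
}

/* Three colliding methods plus one the type does not implement. */
int lin_demo(int which)
{
    static const LinMethod methods[4] = {
        { "A" }, { "B" }, { "C" }, { "Unimplemented" }
    };
    const LinCheckItem items[3] = {
        { &methods[0], 1,  5, 1 },
        { &methods[1], 2,  9, 1 },
        { &methods[2], 0, 12, 1 },
    };
    assert(which >= 0 && which < 4);
    return lin_resolve(items, &methods[which]);
}
```

Here lin_demo (0), lin_demo (1) and lin_demo (2) yield the vtable slots 5, 9 and 12, while lin_demo (3) falls off the end of the chain and yields LIN_NOT_IMPLEMENTED. Index 0 can double as the end marker because the first item is never the target of a mismatch jump.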
For a bisect check the code is even simpler:

	if (magic_reg >= item->method)
		jump_to_item (array [item->check_target_idx]);

In this case item->check_target_idx is always non-zero.

Note that in both cases item->method becomes an immediate constant in the jitted code.

The other fields in the structure give the backend common storage for data needed during emission. As each item's code is emitted, its start address is stored in the code_target field. Likewise, when a conditional branch is inserted, its address is stored in jmp_code: this way, a single forward pass over the array at the end of the emission phase can patch the branches to point to the proper target item's code (this pass patches the jump_to_item pseudo-instructions described above).

chunk_size can be used to store the size of the code generated for the item: together with the info stored in short_branch, this can be used to choose between short and long branch instructions. It is also used to calculate the size of the code to allocate for the whole IMT thunk.

The compare_done field can be used to avoid doing an additional compare in an is_equals item for the same MonoMethod that was just compared in a bisecting item.

Suppose we have 4 methods colliding in a slot: A, B, C and D. The arch-independent code has already sorted them, so that:

	A < B < C < D

The generated code will look like this (M is the method to call):

	compare (C, M)
	goto upper_sequence if bigger_equals
	/* linear sequence */
	compare (M, A)
	goto B_found if not_equals
	jump to A's slot
B_found:
	jump to B's slot
upper_sequence:
	/* we just did a compare against C, no need to compare again */
	goto D_found if not_equals
	jump to C's slot
D_found:
	jump to D's slot

This optimization is of course only valid on architectures with a flags register.
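
The A, B, C, D example can be written down as an array of check items and interpreted, which shows how check_target_idx and compare_done encode the pseudo-code above. This is a sketch under assumptions: the Bis* names, the array layout and the slot numbers 10 to 13 are hypothetical, and the compare counter exists only to make the effect of compare_done visible. As in the pseudo-code, the last item of each linear sequence jumps to its slot without comparing.

```c
#include <assert.h>
#include <stddef.h>

typedef struct BisMethod { const char *name; } BisMethod;

/* Hypothetical subset of the check item used for interpretation. */
typedef struct BisCheckItem {
    const BisMethod *method;
    int check_target_idx;
    int vtable_slot;
    unsigned char is_equals;    /* 0: bisect check, non-zero: direct check */
    unsigned char compare_done; /* flags already set by the bisect item */
} BisCheckItem;

/* Living in one array gives the ordering A < B < C < D by address. */
static const BisMethod bis_methods[4] = { { "A" }, { "B" }, { "C" }, { "D" } };

static const BisCheckItem bis_items[5] = {
    { &bis_methods[2], 3,  0, 0, 0 }, /* bisect on C, upper half at index 3 */
    { &bis_methods[0], 2, 10, 1, 0 }, /* A, else go to B */
    { &bis_methods[1], 0, 11, 1, 0 }, /* B, end of lower sequence */
    { &bis_methods[2], 4, 12, 1, 1 }, /* C, reuses the bisect compare */
    { &bis_methods[3], 0, 13, 1, 0 }, /* D, end of upper sequence */
};

/* Interprets the items; *compares counts the compare instructions run. */
int bis_resolve(const BisCheckItem *items, const BisMethod *m, int *compares)
{
    int idx = 0;
    for (;;) {
        const BisCheckItem *item = &items[idx];
        if (!item->is_equals) {
            (*compares)++;
            idx = (m >= item->method) ? item->check_target_idx : idx + 1;
            continue;
        }
        if (item->check_target_idx == 0)
            return item->vtable_slot; /* end of sequence: plain jump */
        if (!item->compare_done)
            (*compares)++;
        if (m == item->method)
            return item->vtable_slot;
        idx = item->check_target_idx;
    }
}

/* Slot chosen when method number `which` (0 = A ... 3 = D) is called. */
int bis_slot_for(int which)
{
    int n = 0;
    assert(which >= 0 && which < 4);
    return bis_resolve(bis_items, &bis_methods[which], &n);
}

/* Number of compares executed for the same call. */
int bis_compares_for(int which)
{
    int n = 0;
    assert(which >= 0 && which < 4);
    (void)bis_resolve(bis_items, &bis_methods[which], &n);
    return n;
}
```

In this model A and B each cost two compares (the bisect plus the check against A), while C and D cost only the bisect compare, because the C item reuses the flags left by the bisect item.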
As a further optimization to reduce memory usage, the Mono runtime initially sets the IMT slots to a single-instance magic trampoline, so no memory is used up by thunks for colliding slots until they are needed. When an interface method is called, the magic trampoline fills in the IMT slot with the proper thunk or trampoline, so later calls use the fast path. This single-instance trampoline uses MONO_FAKE_IMT_METHOD as the method it is asking to be compiled and executed: the trampoline code recognizes this special value and retrieves the interface method to call from the usual MONO_ARCH_IMT_REG saved by the trampoline code. Given that only the IMT slots that are actually used will be initialized, this saves quite a bit of memory, as it is unlikely that all the interface methods are called on all the different types.
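
The lazy fill-in can be sketched as a table of function pointers that all start out pointing at one shared trampoline. Everything here is an assumption made for illustration: the Lazy* and lazy_* names, the table size, and the idea that a method carries its slot number and its compiled code directly. The real trampoline works on saved registers and MONO_FAKE_IMT_METHOD rather than on C arguments.

```c
#include <assert.h>
#include <stddef.h>

#define LAZY_IMT_SIZE 4 /* hypothetical table size */

struct LazyVTable;
typedef struct LazyMethod LazyMethod;
typedef int (*LazySlotCode)(struct LazyVTable *vt, const LazyMethod *magic_reg);

struct LazyMethod {
    int imt_slot;          /* precomputed hash slot (assumed) */
    LazySlotCode compiled; /* stands in for the thunk or native code */
};

typedef struct LazyVTable {
    LazySlotCode imt[LAZY_IMT_SIZE];
    int trampoline_hits;   /* bookkeeping for the sketch only */
} LazyVTable;

/* Single shared trampoline: patches the slot, then runs the real target. */
int lazy_magic_trampoline(LazyVTable *vt, const LazyMethod *magic_reg)
{
    assert(magic_reg->compiled != NULL);
    vt->trampoline_hits++;
    vt->imt[magic_reg->imt_slot] = magic_reg->compiled;
    return magic_reg->compiled(vt, magic_reg);
}

/* Every slot starts out pointing at the same trampoline instance. */
void lazy_vtable_init(LazyVTable *vt)
{
    for (int i = 0; i < LAZY_IMT_SIZE; i++)
        vt->imt[i] = lazy_magic_trampoline;
    vt->trampoline_hits = 0;
}

/* Interface call: the method identifier rides along as magic_reg. */
int lazy_call(LazyVTable *vt, const LazyMethod *m)
{
    return vt->imt[m->imt_slot](vt, m);
}

int lazy_answer(LazyVTable *vt, const LazyMethod *magic_reg)
{
    (void)vt;
    (void)magic_reg;
    return 42;
}

/* Calls the same method twice; only the first call hits the trampoline. */
int lazy_demo(void)
{
    LazyVTable vt;
    LazyMethod m = { 2, lazy_answer };
    lazy_vtable_init(&vt);
    int first = lazy_call(&vt, &m);
    int second = lazy_call(&vt, &m);
    int untouched = (vt.imt[0] == lazy_magic_trampoline);
    return (first == 42 && second == 42 && untouched) ? vt.trampoline_hits : -1;
}
```

In this model the second call goes straight to lazy_answer because the slot was patched on the first call, and slots that are never called keep pointing at the shared trampoline, so they cost nothing beyond the table entry itself.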