Mono Ahead Of Time Compiler
===========================
The Ahead of Time compilation feature in Mono allows Mono to
precompile assemblies to minimize JIT time, reduce memory
usage at runtime and increase code sharing across multiple
running Mono applications.
To precompile an assembly, use the following command:

    mono --aot -O=all assembly.exe
The --aot flag instructs Mono to ahead-of-time compile your
assembly, while the -O=all flag instructs Mono to use all the
available optimizations.
Besides code, the AOT file also contains cached metadata which allows
the runtime to avoid certain computations at runtime, like the computation
of generic vtables. This reduces both startup time and memory usage. It is
possible to create an AOT image which contains only this cached information
and no code by using the 'metadata-only' option during compilation:

    mono --aot=metadata-only assembly.exe
This works even on platforms where AOT is not normally supported.
* Position Independent Code
---------------------------
On x86 and x86-64, the code generated for Ahead-of-Time compiled
images is position-independent. This allows the same
precompiled image to be reused across multiple applications
without having different copies, in the same way that
ELF shared libraries work: the code produced can be relocated
to any address in memory.
The implementation of Position Independent Code had a
performance impact on Ahead-of-Time compiled images, but
compiler bootstraps are still faster than JIT-compiled images,
especially with all the new optimizations provided by the Mono
engine.
* How to support Position Independent Code in new Mono Ports
------------------------------------------------------------
Generated native code needs to reference various runtime
structures/functions whose address is only known at run
time. JITted code can simply embed the address into the native
code, but AOT code needs to do an indirection. This
indirection is done through a table called the Global Offset
Table (GOT), which is similar to the GOT table in the ELF
spec. When the runtime saves the AOT image, it saves some
information for each method describing the GOT table entries
used by that method. When loading a method from an AOT image,
the runtime will fill out the GOT entries needed by the
method.
* Computing the address of the GOT
----------------------------------
Methods which need to access the GOT first need to compute its
address. On x86 it is done by code like this:

    call 0
    pop ebx
    add <OFFSET TO GOT>, ebx
    <save got addr to a register>
The variable representing the GOT is stored in
cfg->got_var. It is always allocated to a global register to
prevent some problems with branches + basic blocks.
* Referencing GOT entries
-------------------------
Any time the native code needs to access some other runtime
structure/function (i.e. any time the backend calls
mono_add_patch_info ()), the code pointed to by the patch needs
to load the value from the GOT. For example, instead of:

    call <ABSOLUTE ADDR>

the code should be:

    call *<OFFSET>(<GOT REG>)

Here, <OFFSET> can be 0; it will be fixed up by the AOT compiler.
For more examples of the changes required, see:

    svn diff -r 37739:38213 mini-x86.c
* The Program Linkage Table
---------------------------
As in ELF, calls made from AOT code do not go through the GOT. Instead, a direct call is
made to an entry in the Program Linkage Table (PLT). This is based on the fact that on
most architectures, call instructions use a displacement instead of an absolute address, so
they are already position independent. A PLT entry is usually a jump instruction which
initially points to some trampoline code that transfers control to the AOT loader; the
loader will compile the called method and patch the PLT entry so that further calls are
made directly to the called method.

If the called method is in the same assembly and does not need initialization (i.e. it
doesn't have GOT slots etc.), then the call is made directly, bypassing the PLT.
* The Precompiled File Format
-----------------------------
We use the native object format of the platform. That way it
is possible to reuse existing tools like objdump and the
dynamic loader. All we need is a working assembler, i.e. we
write out a text file which is then passed to gas (the GNU
assembler) to generate the object file.
The precompiled image is stored in a file next to the original
assembly, named by appending the native shared library extension
to the assembly's file name (on Linux, ".so").

For example: basic.exe -> basic.exe.so; corlib.dll -> corlib.dll.so
The following things are saved in the object file and can be
looked up using the equivalent of dlsym:

    - A copy of the assembly GUID.

    - The version of the AOT file format.

    - The optimization flags used to build this precompiled image.

    - Additional information needed by the runtime for using each
      precompiled method, like the GOT entries it uses (the
      method_infos array).

    - A table mapping method indexes to offsets in the method_infos
      array.

    - A table listing all the internal calls referenced by the
      precompiled image.

    - A list of assemblies referenced by this AOT module.

    - The precompiled code itself (the methods array).

    - A table mapping method indexes to offsets in the methods array.

    - Information about methods which is rarely used during normal
      execution, like exception and debug info (the ex_info array).

    - A table mapping method indexes to offsets in the ex_info array.

    - Precomputed metadata used to speed up various runtime functions
      (the class_info array).

    - A table mapping class indexes to offsets in the class_info array.

    - A hash table mapping class names to class indexes, used to speed
      up mono_class_from_name ().

    - The Program Linkage Table.

    - Information needed to find the method belonging to a given PLT
      entry.
* Performance considerations
----------------------------
Using AOT code is a trade-off which might lead to higher or
lower performance, depending on a lot of circumstances. Some
of these are:

- AOT code needs to be loaded from disk before being used, so
  cold startup of an application using AOT code MIGHT be
  slower than using JITed code. Warm startup (when the code is
  already in the machine's cache) should be faster. Also,
  JITing code takes time, and the JIT compiler also needs to
  load additional metadata for the method from disk, so
  startup can be faster even in the cold startup case.
- AOT code is usually compiled with all optimizations turned
  on, while JITted code is usually compiled with default
  optimizations, so the generated code in the AOT case should
  be faster.
- JITted code can directly access runtime data structures and
  helper functions, while AOT code needs to go through an
  indirection (the GOT) to access them, so it will be slower
  and somewhat bigger as well.
- When JITting code, the JIT compiler needs to load a lot of
  metadata about methods and types into memory.
- JITted code has better locality, meaning that if method A
  calls method B, then the native code for A and B is usually
  quite close in memory, leading to better cache behaviour and
  thus improved performance. In contrast, the native code of
  methods inside the AOT file is in a somewhat random order.

* Possible improvements
-----------------------
- Currently, when an AOT module is loaded, all of its
  dependent assemblies are also loaded eagerly, and these
  assemblies need to be exactly the same as the ones loaded
  when the AOT module was created ('hard binding'). Non-hard
  binding should be allowed.
- On x86, the generated code uses call 0, pop REG, add
  GOTOFFSET, REG to materialize the GOT address. Newer
  versions of gcc use a separate function to do this; maybe we
  should do the same.
- Currently, we get vtable addresses from the GOT. Another
  solution would be to store the data from the vtables in the
  .bss section, so accessing them would involve less
  indirection.