docs/aot-compiler.txt

Mono Ahead Of Time Compiler
===========================

        The Ahead of Time compilation feature in Mono allows Mono to
        precompile assemblies to minimize JIT time, reduce memory
        usage at runtime and increase code sharing across multiple
        running Mono applications.

        To precompile an assembly use the following command:

           mono --aot -O=all assembly.exe

        The `--aot' flag instructs Mono to ahead-of-time compile your
        assembly, while the -O=all flag instructs Mono to use all the
        available optimizations.

* Caching metadata
------------------

        Besides code, the AOT file also contains cached metadata information which allows
        the runtime to avoid certain computations at runtime, like the computation of
        generic vtables. This reduces both startup time and memory usage. It is possible
        to create an AOT image which contains only this cached information and no code by
        using the 'metadata-only' option during compilation:

           mono --aot=metadata-only assembly.exe

        This works even on platforms where AOT is not normally supported.

* Position Independent Code
---------------------------

        On x86 and x86-64 the code generated by Ahead-of-Time compiled
        images is position-independent code.  This allows the same
        precompiled image to be reused across multiple applications
        without having different copies. This is the same way in which
        ELF shared libraries work: the code produced can be relocated
        to any address.

        The implementation of Position Independent Code had a
        performance impact on Ahead-of-Time compiled images, but
        bootstrapping the compiler with AOT images is still faster
        than with JIT-compiled code, especially with all the new
        optimizations provided by the Mono engine.

* How to support Position Independent Code in new Mono Ports
------------------------------------------------------------

        Generated native code needs to reference various runtime
        structures/functions whose address is only known at run
        time. JITted code can simply embed the address into the native
        code, but AOT code needs to do an indirection. This
        indirection is done through a table called the Global Offset
        Table (GOT), which is similar to the GOT in the ELF
        spec.  When the runtime saves the AOT image, it saves some
        information for each method describing the GOT entries
        used by that method. When loading a method from an AOT image,
        the runtime will fill out the GOT entries needed by the
        method.

   * Computing the address of the GOT

        Methods which need to access the GOT first need to compute its
        address. On x86 it is done by code like this:

                call <IP + 5>
                pop ebx
                add <OFFSET TO GOT>, ebx
                <save got addr to a register>

        The call pushes the address of the instruction following it, which
        the pop loads into ebx; adding the constant offset to the GOT then
        yields the GOT address.

        The variable representing the GOT is stored in
        cfg->got_var. It is always allocated to a global register to
        prevent some problems with branches + basic blocks.

   * Referencing GOT entries

        Any time the native code needs to access some other runtime
        structure/function (i.e. any time the backend calls
        mono_add_patch_info ()), the code pointed to by the patch needs
        to load the value from the GOT. For example, instead of:

                call <ABSOLUTE ADDR>

        it needs to do:

                call *<OFFSET>(<GOT REG>)

        Here, the <OFFSET> can be 0; it will be fixed up by the AOT compiler.

        For more examples of the changes required, see

                svn diff -r 37739:38213 mini-x86.c

   * The Procedure Linkage Table

        As in ELF, calls made from AOT code do not go through the GOT. Instead, a direct
        call is made to an entry in the Procedure Linkage Table (PLT). This is based on the
        fact that on most architectures, call instructions use a displacement instead of an
        absolute address, so they are already position independent. A PLT entry is usually
        a jump instruction that initially points to trampoline code; the trampoline transfers
        control to the AOT loader, which finds or compiles the called method and patches the
        PLT entry so that further calls go directly to the called method.
        If the called method is in the same assembly and does not need initialization (i.e.
        it doesn't have GOT slots etc), then the call is made directly, bypassing the PLT.

* Implementation
----------------

** The Precompiled File Format
------------------------------

        We use the native object format of the platform. That way it
        is possible to reuse existing tools like objdump and the
        dynamic loader. All we need is a working assembler, i.e. we
        write out a text file which is then passed to gas (the GNU
        assembler) to generate the object file.

        The precompiled image is stored in a file next to the original
        assembly, named after it with the platform's shared library
        extension appended (on Linux this is ".so").

        For example: basic.exe -> basic.exe.so; corlib.dll -> corlib.dll.so

        To avoid symbol lookup overhead and to save space, some things like the
        compiled code of the individual methods are not identified by specific symbols
        like method_code_1234. Instead, they are stored in one big array and the
        offsets inside this array are stored in another array, requiring just two
        symbols. The offsets array is usually named 'FOO_offsets', where FOO is the
        name of the array the offsets refer to (e.g. 'ex_info' and 'ex_info_offsets');
        for 'methods' it is 'method_offsets'.

        Generating code using an assembler and linker has some disadvantages:
        - it requires GNU binutils or an equivalent package to be installed on the
          machine running the AOT compilation.
        - it is slow.

        There is some support in the AOT compiler for directly emitting ELF files, but
        it's not complete (yet).

        The following things are saved in the object file and can be
        looked up using the equivalent of dlsym:

                mono_assembly_guid

                        A copy of the assembly GUID.

                mono_aot_version

                        The version of the AOT file format.

                mono_aot_opt_flags

                        The optimization flags used to build this
                        precompiled image.

                method_infos

                        Contains additional information needed by the runtime for using the
                        precompiled method, like the GOT entries it uses.

                method_info_offsets

                        Maps method indexes to offsets in the method_infos array.

                mono_icall_table

                        A table that lists all the internal calls
                        referenced by the precompiled image.

                mono_image_table

                        A list of assemblies referenced by this AOT
                        module.

                methods

                        The precompiled code itself.

                method_offsets

                        Maps method indexes to offsets in the methods array.

                ex_info

                        Contains information about methods which is rarely used during normal execution,
                        like exception and debug info.

                ex_info_offsets

                        Maps method indexes to offsets in the ex_info array.

                class_info

                        Contains precomputed metadata used to speed up various runtime functions.

                class_info_offsets

                        Maps class indexes to offsets in the class_info array.

                class_name_table

                        A hash table mapping class names to class indexes. Used to speed up
                        mono_class_from_name ().

                plt

                        The Procedure Linkage Table.

                plt_info

                        Contains information needed to find the method belonging to a given PLT entry.

** Source file structure
------------------------

        The AOT infrastructure is split into two files, aot-compiler.c and
        aot-runtime.c. aot-compiler.c contains the AOT compiler which is invoked by
        --aot, while aot-runtime.c contains the runtime support needed for loading
        code and other things from the AOT files.

** Compilation process
----------------------

        AOT compilation consists of the following stages:
        - collecting the methods to be compiled.
        - compiling them using the JIT.
        - emitting the JITted code and other information into an assembly file (.s).
        - assembling the file using the system assembler.
        - linking the resulting object file into a shared library using the system
          linker.

** Handling compiled code
-------------------------

        Each method is identified by a method index. For normal methods, this is
        equivalent to its index in the METHOD metadata table. For runtime generated
        methods (wrappers), it is an arbitrary number.

        Compiled code is created by invoking the JIT, requesting it to create AOT
        code instead of normal code. This is done by the compile_method () function.
        The output of the JIT is compiled code and a set of patches (relocations). Each
        relocation specifies an offset inside the compiled code, and a runtime object
        whose address is accessed at that offset.

        Patches are described by a MonoJumpInfo structure. From the perspective
        of the AOT compiler, there are two kinds of patches:
        - calls, which require an entry in the PLT.
        - everything else, which requires an entry in the GOT.
        How patches are handled is described in the next sections.

        After all the methods are compiled, they are emitted into the output file as
        a byte array called 'methods'. The emission is done by the emit_method_code ()
        and emit_and_reloc_code () functions. Each piece of compiled code is identified
        by the local symbol .Lm_<method index>. While compiled code is emitted, all the
        locations which have an associated patch are rewritten using a platform specific
        process so the final generated code will refer to the PLT and GOT entries
        belonging to the patches. The compiled code array can be accessed using the
        'methods' global symbol.

** Handling patches
-------------------

        Before a piece of AOTed code can be used, the GOT entries used by it must be
        filled out with the addresses of runtime objects. Those objects are identified
        by MonoJumpInfo structures. These structures are saved in a serialized form in
        the AOT file, so the AOT loader can reconstruct them. The serialization is done
        by the encode_patch () function, while the deserialization is done by the
        decode_patch_info () function.

        Every method has an associated method info blob inside the 'method_infos' byte
        array in the AOT file. This contains all the information required to load the
        method at runtime:
        - the first GOT entry used by the method.
        - the number of GOT entries used by the method.
        - the serialized patch info for the GOT entries.

        Some patches, like vtables and icalls, are very common, so instead of emitting
        their info every time they are used by a method, we emit the info only once into
        a byte array named 'got_info', and only emit an index into this array for every
        access.

** The Procedure Linkage Table (PLT)
------------------------------------

        Our PLT is similar to the ELF PLT; it is used to handle calls between methods.
        If method A needs to call method B, then an entry is allocated in the PLT for
        method B, and A calls that entry instead of B directly. This is useful because
        in some cases the runtime needs to do some processing the first time B is
        called. There are two cases:
        - if B is in another assembly, then it needs to be looked up, and then either
          JITted or its corresponding AOT code needs to be found.
        - if B is in the same assembly but has GOT slots, then the GOT slots need to be
          initialized.
        If neither of these cases applies, then the PLT is not used, and the call is made
        directly to the native code of the target method.

        A PLT entry is usually implemented by a jump through a jump table, where the
        jump table entries are initially filled with the address of a trampoline so
        the runtime can get control. After the native code of the called method is
        created/found, the jump table entry is changed to point to the native code.
        All PLT entries also embed an integer offset after the jump which indexes into
        the 'plt_info' table, which stores the information required to find the called
        method. The PLT is emitted by the emit_plt () function.

** Exception/Debug info
-----------------------

        Each compiled method has some additional info generated by the JIT, usable
        for debugging (IL offset-native offset maps) and exception handling
        (saved registers, native offsets of try/catch clauses). Since this info is
        rarely needed, it is saved into a separate byte array called 'ex_info'.

** Cached metadata
------------------

        When the runtime loads a class, it needs to compute a variety of information
        which is not readily available in the metadata, like the instance size, the
        vtable, whether the class has a finalizer/type initializer etc. Computing this
        information takes a lot of time, causes the loading of lots of metadata, and
        usually involves the creation of many runtime data structures
        (MonoMethod/MonoMethodSignature etc), which are long-lived and usually persist
        for the lifetime of the app. To avoid this, we compute the required information
        at AOT compilation time, and save it into the AOT image, into an array called
        'class_info'. The runtime can query this information using the
        mono_aot_get_cached_class_info () function, and if the information is available,
        it can avoid computing it.

** Full AOT mode
----------------

        Some platforms like the iPhone prohibit JITted code, using technical and/or
        legal means. This is a significant problem for the Mono runtime, since it
        generates a lot of code dynamically, using either the JIT or more low-level
        code generation macros. To solve this, the AOT compiler is able to function in
        full-aot or aot-only mode, where it generates and saves all the necessary code
        in the AOT image, so at runtime, no code needs to be generated.
        There are two kinds of code which need to be considered:
        - wrapper methods, that is methods whose IL is generated dynamically by the
          runtime. They are handled by generating them in the add_wrappers () function,
          then emitting them the same way as the 'normal' methods. The only problem is
          that these methods do not have a methoddef token, so we need a separate table
          in the AOT image ('wrapper_info') to find their method index.
        - trampolines and other small hand generated pieces of code. They are handled
          in an ad-hoc way in the emit_trampolines () function.

* Performance considerations
----------------------------

        Using AOT code is a trade-off which might lead to higher or
        lower performance, depending on a lot of circumstances. Some
        of these are:

        - AOT code needs to be loaded from disk before being used, so
          cold startup of an application using AOT code MIGHT be
          slower than using JITted code. Warm startup (when the code is
          already in the machine's cache) should be faster.  Also,
          JITting code takes time, and the JIT compiler also needs to
          load additional metadata for the method from the disk, so
          startup can be faster even in the cold startup case.

        - AOT code is usually compiled with all optimizations turned
          on, while JITted code is usually compiled with default
          optimizations, so the generated code in the AOT case should
          be faster.

        - JITted code can directly access runtime data structures and
          helper functions, while AOT code needs to go through an
          indirection (the GOT) to access them, so it will be slower
          and somewhat bigger as well.

        - When JITting code, the JIT compiler needs to load a lot of
          metadata about methods and types into memory, which
          increases memory usage; AOT code largely avoids this.

        - JITted code has better locality, meaning that if method A
          calls B, then the native code for A and B is usually quite
          close in memory, leading to better cache behaviour and thus
          improved performance. In contrast, the native code of
          methods inside the AOT file is in a somewhat random order.

* Future Work
-------------

        - Currently, when an AOT module is loaded, all of its
          dependent assemblies are also loaded eagerly, and these
          assemblies need to be exactly the same as the ones loaded
          when the AOT module was created ('hard binding'). Non-hard
          binding should be allowed.

        - On x86, the generated code uses call 0, pop REG, add
          GOTOFFSET, REG to materialize the GOT address. Newer
          versions of gcc use a separate function to do this, so we
          might want to do the same.

        - Currently, we get vtable addresses from the GOT. Another
          solution would be to store the data from the vtables in the
          .bss section, so accessing them would involve less
          indirection.