Mono Ahead Of Time Compiler
===========================
-The new mono JIT has sophisticated optimization features. It uses SSA and has a
-pluggable architecture for further optimizations. This makes it possible and
-efficient to use the JIT also for AOT compilation.
+ The Ahead of Time compilation feature in Mono allows Mono to
+ precompile assemblies to minimize JIT time, reduce memory
+ usage at runtime and increase code sharing across multiple
+ running Mono applications.
+
+ To precompile an assembly, use the following command:
+
+ mono --aot -O=all assembly.exe
-* file format: We use the native object format of the platform. That way it is
- possible to reuse existing tools like objdump and the dynamic loader. All we
- need is a working assembler, i.e. we write out a text file which is then
- passed to gas (the gnu assembler) to generate the object file.
+ The `--aot' flag instructs Mono to ahead-of-time compile your
+ assembly, while the -O=all flag instructs Mono to use all the
+ available optimizations.
-* file names: we simply add ".so" to the generated file. For example:
- basic.exe -> basic.exe.so
- corlib.dll -> corlib.dll.so
+* Position Independent Code
+---------------------------
-* staring the AOT compiler: mini --aot assembly_name
+ On x86 and x86-64, the code in Ahead-of-Time compiled images
+ is position-independent. This allows the same precompiled
+ image to be shared by multiple applications without separate
+ copies, in the same way that ELF shared libraries work: the
+ generated code can be relocated to any address.
-The following things are saved in the object file:
+ Position Independent Code has a performance cost in
+ Ahead-of-Time compiled images, but compiler bootstraps are
+ still faster with AOT images than with JIT-compiled code,
+ especially with all the new optimizations provided by the
+ Mono engine.
-* version infos:
+* How to support Position Independent Code in new Mono Ports
+------------------------------------------------------------
-* native code: this is labeled with method_XXXXXXXX: where XXXXXXXX is the
- hexadecimal token number of the method.
+ Generated native code needs to reference various runtime
+ structures/functions whose address is only known at run
+ time. JITted code can simply embed the address into the native
+ code, but AOT code needs to do an indirection. This
+ indirection is done through a table called the Global Offset
+ Table (GOT), which is similar to the GOT in the ELF
+ specification. When the runtime saves the AOT image, it saves some
+ information for each method describing the GOT entries
+ used by that method. When loading a method from an AOT image,
+ the runtime fills out the GOT entries needed by the
+ method.
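As an illustration only, the C sketch below models this scheme in plain C rather than machine code. The names GotEntry, GotRef, PatchKind, fill_got_entries, icall_increment and fake_vtable are hypothetical and do not come from the Mono sources. A GotRef stands in for the per-method GOT information saved in the AOT image, and aot_method plays the role of precompiled code that calls through a GOT slot instead of embedding an absolute address.

```c
#include <assert.h>
#include <stddef.h>

/* A GOT slot can hold either a data address or a function address. */
typedef union {
    void *data;
    int (*fn)(int);
} GotEntry;

/* Hypothetical patch kinds; a real runtime distinguishes many more. */
typedef enum { PATCH_ICALL, PATCH_VTABLE } PatchKind;

/* Per-method record saved by the AOT compiler: "slot N must hold X". */
typedef struct {
    size_t slot;
    PatchKind kind;
} GotRef;

#define GOT_SIZE 16
static GotEntry got[GOT_SIZE]; /* zero-initialized: nothing filled yet */

/* Stand-ins for runtime objects whose addresses are only known at run time. */
static int icall_increment(int x) { return x + 1; }
static int fake_vtable[4];

/* Done by the runtime when a method is loaded from the AOT image. */
static void fill_got_entries(const GotRef *refs, size_t nrefs) {
    for (size_t i = 0; i < nrefs; i++) {
        assert(refs[i].slot < GOT_SIZE);
        switch (refs[i].kind) {
        case PATCH_ICALL:  got[refs[i].slot].fn = icall_increment; break;
        case PATCH_VTABLE: got[refs[i].slot].data = fake_vtable;   break;
        }
    }
}

/* C analogue of an AOT method body: it calls through GOT slot 0
 * instead of embedding the address of icall_increment. */
static int aot_method(int x) {
    return got[0].fn(x);
}
```

The point of the design is that aot_method itself contains no run-time addresses, so its code can be shared; only the GOT, a writable data area, has to be filled in at load time.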
-* additional informations needed by the runtime: For example we need to store
- the code length and the exception tables. We also need a way to patch
- constants only available at runtime (for example vtable and class
- addresses). This is stored i a binary blob labeled method_info_XXXXXXXX:
+ * Computing the address of the GOT
-PROBLEMS:
+ Methods which need to access the GOT first need to compute its
+ address. On x86 this is done by code like this:
+
+ call <IP + 5>
+ pop ebx
+ add <OFFSET TO GOT>, ebx
+ <save got addr to a register>
+
+ The variable representing the GOT is stored in
+ cfg->got_var. It is always allocated to a global register to
+ prevent some problems with branches and basic blocks.
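The arithmetic performed by the call/pop/add sequence can be checked with a small C model. This is a sketch under stated assumptions: a 32-bit address space, offsets chosen for illustration, and hypothetical names (ImageLayout, got_displacement, runtime_got_address). It shows why the sequence is position-independent: the distance between the pop instruction and the GOT is fixed when the image is built, so adding it to the run-time address of the pop yields the GOT address wherever the image happens to be loaded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of an AOT image, as offsets from the image start. */
typedef struct {
    uint32_t label_off; /* offset of the 'pop ebx' instruction */
    uint32_t got_off;   /* offset of the GOT */
} ImageLayout;

/* Link-time constant emitted by the AOT compiler as <OFFSET TO GOT>. */
static int32_t got_displacement(const ImageLayout *l) {
    int64_t d = (int64_t)l->got_off - (int64_t)l->label_off;
    assert(d >= INT32_MIN && d <= INT32_MAX);
    return (int32_t)d;
}

/* Run-time effect of: call <IP + 5>; pop ebx; add <OFFSET TO GOT>, ebx.
 * load_base never appears in the emitted code; it only enters through
 * the return address pushed by 'call'. */
static uint32_t runtime_got_address(uint32_t load_base, const ImageLayout *l) {
    /* 'call' pushes the address of the next instruction ('pop ebx'),
     * which 'pop ebx' then loads into the register. */
    uint32_t ebx = load_base + l->label_off;
    /* Unsigned arithmetic wraps modulo 2^32, like the 32-bit 'add'. */
    ebx += (uint32_t)got_displacement(l);
    return ebx;
}
```

Loading the same layout at two different base addresses gives two different GOT addresses, but the emitted displacement is identical in both cases.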
+
+ * Referencing GOT entries
+
+ Any time the native code needs to access some other runtime
+ structure/function (i.e. any time the backend calls
+ mono_add_patch_info ()), the code pointed to by the patch needs
+ to load the value from the GOT. For example, instead of:
+
+ call <ABSOLUTE ADDR>
+
+ it needs to do:
+
+ call *<OFFSET>(<GOT REG>)
+
+ Here, <OFFSET> can be 0; it will be fixed up by the AOT compiler.
+
+ For more examples of the changes required, see:
+
+ svn diff -r 37739:38213 mini-x86.c
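A minimal sketch of emitting such a call and fixing up its <OFFSET> afterwards, assuming 32-bit x86, 4-byte GOT entries and ebx as the GOT register. The helper names emit_call_via_got and fixup_got_offset are hypothetical, not the Mono backend API. The instruction call *disp32(%ebx) is encoded as opcode 0xFF with ModRM byte 0x93 (mod=10 for a 32-bit displacement, reg=2 selecting the indirect call, rm=3 for ebx), followed by the little-endian displacement, which is first emitted as 0 and patched once the GOT slot is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emits 'call *<OFFSET>(%ebx)' with a placeholder <OFFSET> of 0.
 * Encoding: opcode 0xFF, ModRM 0x93 (mod=10: disp32, reg=2: call, rm=3: ebx). */
static size_t emit_call_via_got(uint8_t *code, size_t pos, size_t *patch_pos) {
    code[pos++] = 0xFF;
    code[pos++] = 0x93;
    *patch_pos = pos;         /* remember where the displacement starts */
    memset(code + pos, 0, 4); /* placeholder displacement of 0 */
    return pos + 4;           /* position after the 6-byte instruction */
}

/* Patches the displacement once the GOT slot for this patch is known. */
static void fixup_got_offset(uint8_t *code, size_t patch_pos, uint32_t slot) {
    assert(slot <= UINT32_MAX / 4);
    uint32_t disp = slot * 4; /* assumption: 4-byte GOT entries */
    /* x86 is little-endian: least significant byte first */
    for (int i = 0; i < 4; i++)
        code[patch_pos + i] = (uint8_t)(disp >> (8 * i));
}
```

Recording the patch position at emission time is what lets the offset start out as 0: the code can be generated before GOT slots are assigned.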
+
+* The Precompiled File Format
+-----------------------------
+
+ We use the native object format of the platform. That way it
+ is possible to reuse existing tools like objdump and the
+ dynamic loader. All we need is a working assembler, i.e. we
+ write out a text file which is then passed to gas (the GNU
+ assembler) to generate the object file.
+
+ The precompiled image is stored in a file next to the original
+ assembly, named by appending the platform's shared library
+ extension to the assembly file name (".so" on Linux).
+
+ For example: basic.exe -> basic.exe.so; corlib.dll -> corlib.dll.so
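Deriving the image name is a simple string operation. The sketch below is hypothetical (aot_image_name is not a Mono function) and hard-codes the Linux ".so" suffix; other platforms would use their own shared library extension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes "<assembly>.so" into 'out'.
 * Returns 0 on success, -1 if 'out' is too small. */
static int aot_image_name(const char *assembly, char *out, size_t outlen) {
    static const char suffix[] = ".so"; /* assumption: Linux suffix */
    assert(assembly != NULL && out != NULL);
    size_t len = strlen(assembly);
    if (len + sizeof suffix > outlen)   /* sizeof suffix counts the NUL */
        return -1;
    memcpy(out, assembly, len);
    memcpy(out + len, suffix, sizeof suffix); /* also copies the NUL */
    return 0;
}
```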
+
+ The following things are saved in the object file and can be
+ looked up using the equivalent of dlsym:
+
+ mono_assembly_guid
+
+ A copy of the assembly GUID.
+
+ mono_aot_version
+
+ The version of the AOT file format.
+
+ mono_aot_opt_flags
+
+ The optimization flags used to build this
+ precompiled image.
+
+ method_infos
+
+ Contains additional information needed by the runtime for using the
+ precompiled method, like the GOT entries it uses.
+
+ method_info_offsets
+
+ Maps method indexes to offsets in the method_infos array.
+
+ mono_icall_table
+
+ A table that lists all the internal calls
+ referenced by the precompiled image.
+
+ mono_image_table
+
+ A list of assemblies referenced by this AOT
+ module.
+
+ method_offsets
+
+ The equivalent of a procedure linkage table.
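To show how method_offsets and method_info_offsets are meant to be used together, here is a sketch with a hypothetical AotModule struct holding pointers to those symbols after they have been looked up in the loaded image. The field types (32-bit offsets, byte blobs) and the function names are assumptions for illustration; the real on-disk encoding may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the image's symbols after lookup. */
typedef struct {
    const uint8_t  *code;                /* start of the native code */
    const uint32_t *method_offsets;      /* method index -> offset into code */
    const uint8_t  *method_infos;        /* blob of per-method information */
    const uint32_t *method_info_offsets; /* method index -> offset into method_infos */
    uint32_t        method_count;
} AotModule;

/* Native code of a method, found through method_offsets. */
static const uint8_t *aot_method_code(const AotModule *m, uint32_t index) {
    assert(index < m->method_count);
    return m->code + m->method_offsets[index];
}

/* Per-method info (e.g. the GOT entries it uses), found through method_info_offsets. */
static const uint8_t *aot_method_info(const AotModule *m, uint32_t index) {
    assert(index < m->method_count);
    return m->method_infos + m->method_info_offsets[index];
}
```

Storing offsets rather than absolute pointers keeps these tables free of relocations, in line with the position-independent design described above.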
+
+* Performance considerations
+----------------------------
+
+Using AOT code is a trade-off which might lead to higher or lower performance,
+depending on many circumstances. Some of these are:
+
+- AOT code needs to be loaded from disk before being used, so cold startup of
+ an application using AOT code MIGHT be slower than using JITted code. Warm
+ startup (when the code is already in the machine's cache) should be faster.
+ Also, JITting code takes time, and the JIT compiler also needs to load
+ additional metadata for the method from the disk, so startup can be faster
+ even in the cold startup case.
+- AOT code is usually compiled with all optimizations turned on, while JITted
+ code is usually compiled with default optimizations, so the generated code
+ in the AOT case should be faster.
+- JITted code can directly access runtime data structures and helper functions,
+ while AOT code needs to go through an indirection (the GOT) to access them,
+ so it will be slower and somewhat bigger as well.
+- When JITting code, the JIT compiler needs to load a lot of metadata about
+ methods and types into memory, which increases memory usage.
+- JITted code has better locality, meaning that if method A calls method B,
+ the native code for A and B is usually quite close in memory, leading to
+ better cache behaviour and thus improved performance. In contrast, the native
+ code of methods inside the AOT file is in a somewhat random order.
+
+* Future Work
+-------------
+
+- Currently, the runtime needs to set up some data structures and fill out
+ GOT entries before a method is first called. This means that even calls to
+ a method whose code is in the same AOT image need to go through the GOT,
+ instead of using a direct call.
+- On x86, the generated code uses call 0, pop REG, add GOTOFFSET, REG to
+ materialize the GOT address. Newer versions of gcc use a separate function
+ to do this; we may need to do the same.
+- Currently, we get vtable addresses from the GOT. Another solution would be
+ to store the data from the vtables in the .bss section, so accessing them
+ would involve less indirection.
-- all precompiled methods must be domain independent, or we add patch infos to
- patch the target doamin.
-- the main problem is how to patch runtime related addresses, for example:
- - current application domain
- - string objects loaded with LDSTR
- - address of MonoClass data
- - static field offsets
- - method addreses
- - virtual function and interface slots