Mono Ahead Of Time Compiler
===========================
The Ahead of Time compilation feature in Mono allows Mono to
precompile assemblies to minimize JIT time, reduce memory
usage at runtime and increase code sharing across multiple
running Mono applications.
To precompile an assembly, use the following command:

    mono --aot -O=all assembly.exe
The --aot flag instructs Mono to ahead-of-time compile your
assembly, while the -O=all flag instructs Mono to use all the
available optimizations.
Besides code, the AOT file also contains cached metadata which allows
the runtime to avoid certain computations at runtime, like the computation
of generic vtables. This reduces both startup time and memory usage. It is
possible to create an AOT image which contains only this cached information
and no code by using the 'metadata-only' option during compilation:

    mono --aot=metadata-only assembly.exe
This works even on platforms where AOT is not normally supported.
* Position Independent Code
---------------------------
On x86 and x86-64, the code generated for Ahead-of-Time compiled
images is position-independent. This allows the same
precompiled image to be reused across multiple applications
without having different copies, in the same way that
ELF shared libraries work: the code produced can be relocated
to any address in memory.
The implementation of Position Independent Code had a
performance impact on Ahead-of-Time compiled images, but
compiler bootstraps are still faster than JIT-compiled images,
especially with all the new optimizations provided by the Mono
engine.
* How to support Position Independent Code in new Mono Ports
------------------------------------------------------------
Generated native code needs to reference various runtime
structures/functions whose address is only known at run
time. JITted code can simply embed the address into the native
code, but AOT code needs to do an indirection. This
indirection is done through a table called the Global Offset
Table (GOT), which is similar to the GOT table in the ELF
spec. When the runtime saves the AOT image, it saves some
information for each method describing the GOT table entries
used by that method. When loading a method from an AOT image,
the runtime will fill out the GOT entries needed by the
method.
* Computing the address of the GOT
----------------------------------
Methods which need to access the GOT first need to compute its
address. On x86 it is done by code like this:

    call 0
    pop ebx
    add <OFFSET TO GOT>, ebx
    <save got addr to a register>
The variable representing the GOT is stored in
cfg->got_var. It is always allocated to a global register to
prevent some problems with branches + basic blocks.
* Referencing GOT entries
-------------------------
Any time the native code needs to access some other runtime
structure/function (i.e. any time the backend calls
mono_add_patch_info ()), the code pointed to by the patch needs
to load the value from the GOT. For example, instead of:

    call <ABSOLUTE ADDR>

the code should be:

    call *<OFFSET>(<GOT REG>)

Here, <OFFSET> can be 0; it will be fixed up by the AOT compiler.
For more examples of the changes required, see:

    svn diff -r 37739:38213 mini-x86.c
* The Program Linkage Table
---------------------------
As in ELF, calls made from AOT code do not go through the GOT. Instead, a direct call is
made to an entry in the Program Linkage Table (PLT). This is based on the fact that on
most architectures, call instructions use a displacement instead of an absolute address, so
they are already position independent. A PLT entry is usually a jump instruction which
initially points to some trampoline code that transfers control to the AOT loader; the
loader will compile the called method and patch the PLT entry so that further calls are
made directly to the called method.

If the called method is in the same assembly and does not need initialization (i.e. it
doesn't have GOT slots etc.), then the call is made directly, bypassing the PLT.
* The Precompiled File Format
-----------------------------
We use the native object format of the platform. That way it
is possible to reuse existing tools like objdump and the
dynamic loader. All we need is a working assembler, i.e. we
write out a text file which is then passed to gas (the GNU
assembler) to generate the object file.
The precompiled image is stored in a file next to the original
assembly, named by appending the native shared library extension
to the assembly's file name (on Linux, ".so").

For example: basic.exe -> basic.exe.so; corlib.dll -> corlib.dll.so
The following things are saved in the object file and can be
looked up using the equivalent of dlsym:

    - A copy of the assembly GUID.

    - The version of the AOT file format.

    - The optimization flags used to build this precompiled image.

    - Additional information needed by the runtime for using each
      precompiled method, like the GOT entries it uses (the
      method_infos array).

    - A table mapping method indexes to offsets in the method_infos
      array.

    - A table listing all the internal calls referenced by the
      precompiled image.

    - A list of assemblies referenced by this AOT module.

    - The precompiled code itself (the methods array).

    - A table mapping method indexes to offsets in the methods array.

    - Information about methods which is rarely used during normal
      execution, like exception and debug info (the ex_info array).

    - A table mapping method indexes to offsets in the ex_info array.

    - Precomputed metadata used to speed up various runtime functions
      (the class_info array).

    - A table mapping class indexes to offsets in the class_info array.

    - A hash table mapping class names to class indexes, used to speed
      up mono_class_from_name ().

    - The Program Linkage Table.

    - Information needed to find the method belonging to a given PLT
      entry.
* Performance considerations
----------------------------
Using AOT code is a trade-off which might lead to higher or
lower performance, depending on a lot of circumstances. Some
of these are:

- AOT code needs to be loaded from disk before being used, so
  cold startup of an application using AOT code MIGHT be
  slower than using JITed code. Warm startup (when the code is
  already in the machine's cache) should be faster. Also,
  JITing code takes time, and the JIT compiler also needs to
  load additional metadata for the method from disk, so
  startup can be faster even in the cold startup case.
- AOT code is usually compiled with all optimizations turned
  on, while JITted code is usually compiled with default
  optimizations, so the generated code in the AOT case should
  be faster.
- JITted code can directly access runtime data structures and
  helper functions, while AOT code needs to go through an
  indirection (the GOT) to access them, so it will be slower
  and somewhat bigger as well.
- When JITting code, the JIT compiler needs to load a lot of
  metadata about methods and types into memory.
- JITted code has better locality, meaning that if method A
  calls method B, then the native code for A and B is usually
  quite close in memory, leading to better cache behaviour and
  thus improved performance. In contrast, the native code of
  methods inside the AOT file is in a somewhat random order.

* Possible improvements
-----------------------
- Currently, when an AOT module is loaded, all of its
  dependent assemblies are also loaded eagerly, and these
  assemblies need to be exactly the same as the ones loaded
  when the AOT module was created ('hard binding'). Non-hard
  binding should be allowed.
- On x86, the generated code uses call 0, pop REG, add
  GOTOFFSET, REG to materialize the GOT address. Newer
  versions of gcc use a separate function to do this; maybe we
  should do the same.
- Currently, we get vtable addresses from the GOT. Another
  solution would be to store the data from the vtables in the
  .bss section, so accessing them would involve less
  indirection.