Mono Ahead Of Time Compiler
===========================

The Ahead of Time compilation feature in Mono allows Mono to
precompile assemblies to minimize JIT time, reduce memory
usage at runtime and increase the code sharing across multiple
running Mono applications.

To precompile an assembly use the following command:

    mono --aot -O=all assembly.exe

The `--aot' flag instructs Mono to ahead-of-time compile your
assembly, while the `-O=all' flag instructs Mono to use all the
available optimizations.

Besides code, the AOT file also contains cached metadata information which allows
the runtime to avoid certain computations at runtime, like the computation of
generic vtables. This reduces both startup time and memory usage. It is possible
to create an AOT image which contains only this cached information and no code by
using the 'metadata-only' option during compilation:

    mono --aot=metadata-only assembly.exe

This works even on platforms where AOT is not normally supported.

* Position Independent Code
---------------------------

On x86 and x86-64 the code in Ahead-of-Time compiled
images is position-independent code. This allows the same
precompiled image to be reused across multiple applications
without having different copies. This is the same way in which
ELF shared libraries work: the code produced can be relocated.

The implementation of Position Independent Code had a
performance impact on Ahead-of-Time compiled images, but
compiler bootstraps are still faster with them than with JIT-compiled
code, especially with all the new optimizations provided by the Mono
JIT.

* How to support Position Independent Code in new Mono Ports
------------------------------------------------------------

Generated native code needs to reference various runtime
structures/functions whose address is only known at run
time. JITted code can simply embed the address into the native
code, but AOT code needs to do an indirection. This
indirection is done through a table called the Global Offset
Table (GOT), which is similar to the GOT in the ELF
spec. When the runtime saves the AOT image, it saves some
information for each method describing the GOT entries
used by that method. When loading a method from an AOT image,
the runtime will fill out the GOT entries needed by the
method.

* Computing the address of the GOT

Methods which need to access the GOT first need to compute its
address. On the x86 it is done by code like this:

    call 0
    pop ebx
    add <OFFSET TO GOT>, ebx
    <save got addr to a register>

The variable representing the GOT is stored in
cfg->got_var. It is always allocated to a global register to
prevent some problems with branches + basic blocks.

* Referencing GOT entries

Any time the native code needs to access some other runtime
structure/function (i.e. any time the backend calls
mono_add_patch_info ()), the code pointed to by the patch needs
to load the value from the GOT. For example, instead of:

    call <ABSOLUTE ADDR>

it needs to do:

    call *<OFFSET>(<GOT REG>)

Here, the <OFFSET> can be 0; it will be fixed up by the AOT compiler.
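
The indirection described above can be modelled in C as a table of pointers
that is empty when the code is produced and filled in by the loader. This is
only an illustrative sketch: got, GOT_SLOT_HELPER, fill_got and aot_code are
hypothetical names chosen for the example, not Mono runtime symbols.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*helper_fn)(int);

/* Hypothetical GOT: one slot per runtime object the code refers to. */
enum { GOT_SLOT_HELPER = 0, GOT_SIZE = 4 };
static helper_fn got[GOT_SIZE];

/* A runtime function whose address is only known at run time. */
static int runtime_helper(int x) { return x + 1; }

/* Stand-in for the loader: fill the slot before the code runs. */
void fill_got(void) { got[GOT_SLOT_HELPER] = runtime_helper; }

/* Stand-in for AOT code: the C analogue of call *<OFFSET>(<GOT REG>). */
int aot_code(int x) {
    assert(got[GOT_SLOT_HELPER] != NULL); /* slot must be filled first */
    return got[GOT_SLOT_HELPER](x);
}
```

The compiled code only knows the slot number, so the same code works no matter
where the helper ends up in memory.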

For more examples on the changes required, see

    svn diff -r 37739:38213 mini-x86.c

* The Procedure Linkage Table

As in ELF, calls made from AOT code do not go through the GOT. Instead, a direct call is
made to an entry in the Procedure Linkage Table (PLT). This is based on the fact that on
most architectures, call instructions use a displacement instead of an absolute address, so
they are already position independent. A PLT entry is usually a jump instruction, which
initially points to some trampoline code which transfers control to the AOT loader. The
loader will compile the called method, and patch the PLT entry so that further calls are made
directly to the called method.
If the called method is in the same assembly, and does not need initialization (i.e. it
doesn't have GOT slots etc), then the call is made directly, bypassing the PLT.
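
The rule for bypassing the PLT can be written down as a small predicate. This
is a sketch under stated assumptions: MethodDesc and its fields are
hypothetical, and "needs initialization" is reduced to "has GOT slots", which
is only the example given in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a method, just enough for the decision. */
typedef struct {
    int assembly_id;     /* which assembly the method lives in */
    int got_slot_count;  /* > 0 means the method needs initialization */
} MethodDesc;

/* Returns true when a call from caller to callee must go through the PLT. */
bool call_needs_plt(const MethodDesc *caller, const MethodDesc *callee) {
    assert(caller != NULL && callee != NULL);
    if (caller->assembly_id != callee->assembly_id)
        return true;                     /* cross-assembly call */
    return callee->got_slot_count > 0;   /* same assembly, needs init */
}
```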

** The Precompiled File Format
------------------------------

We use the native object format of the platform. That way it
is possible to reuse existing tools like objdump and the
dynamic loader. All we need is a working assembler, i.e. we
write out a text file which is then passed to gas (the GNU
assembler) to generate the object file.

The precompiled image is stored in a file next to the original
assembly that is precompiled, with the native extension for a shared
library appended to the file name (on Linux it is ".so").

For example: basic.exe -> basic.exe.so; corlib.dll -> corlib.dll.so

To avoid symbol lookup overhead and to save space, some things like the
compiled code of the individual methods are not identified by specific symbols
like method_code_1234. Instead, they are stored in one big array and the
offsets inside this array are stored in another array, requiring just two
symbols. The offsets array is usually named 'FOO_offsets', where FOO is the
array the offsets refer to, for example 'methods' and 'method_offsets'.
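
The two-array layout can be sketched as follows. The array contents are made
up for the example, and method_code is a hypothetical accessor, not a function
from the Mono sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the 'methods' array: all method bodies back to back. */
static const uint8_t methods[] = {
    0x90, 0xc3,        /* body of method 0 */
    0x31, 0xc0, 0xc3   /* body of method 1 */
};

/* Stand-in for 'method_offsets': where each body starts in 'methods'. */
static const uint32_t method_offsets[] = { 0, 2 };

/* Find the code for a method index using only the two arrays. */
const uint8_t *method_code(size_t index) {
    assert(index < sizeof method_offsets / sizeof method_offsets[0]);
    return methods + method_offsets[index];
}
```

Only 'methods' and 'method_offsets' would need to be exported, however many
methods the image contains.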

Generating code using an assembler and linker has some disadvantages:
- it requires GNU binutils or an equivalent package to be installed on the
  machine running the aot compilation.

There is some support in the aot compiler for directly emitting ELF files, but
it is not complete (yet).

The following things are saved in the object file and can be
looked up using the equivalent to dlsym:

- A copy of the assembly GUID.
- The version of the AOT file format.
- The optimization flags used to build this precompiled image.
- Additional information needed by the runtime for using the
  precompiled method, like the GOT entries it uses.
- A table that maps method indexes to offsets in the method_infos array.
- A table that lists all the internal calls
  referenced by the precompiled image.
- A list of assemblies referenced by this AOT module.
- The precompiled code itself.
- A table that maps method indexes to offsets in the methods array.
- Information about methods which is rarely used during normal execution,
  like exception and debug info.
- A table that maps method indexes to offsets in the ex_info array.
- Precomputed metadata used to speed up various runtime functions.
- A table that maps class indexes to offsets in the class_info array.
- A hash table mapping class names to class indexes. Used to speed up
  mono_class_from_name ().
- The Procedure Linkage Table.
- Information needed to find the method belonging to a given PLT entry.

** Source file structure
------------------------

The AOT infrastructure is split into two files, aot-compiler.c and
aot-runtime.c. aot-compiler.c contains the AOT compiler which is invoked by
--aot, while aot-runtime.c contains the runtime support needed for loading
code and other things from the aot files.

** Compilation process
----------------------

AOT compilation consists of the following stages:
- collecting the methods to be compiled.
- compiling them using the JIT.
- emitting the JITted code and other information into an assembly file (.s).
- assembling the file using the system assembler.
- linking the resulting object file into a shared library using the system
  linker.

** Handling compiled code
-------------------------

Each method is identified by a method index. For normal methods, this is
equivalent to its index in the METHOD metadata table. For runtime generated
methods (wrappers), it is an arbitrary number.

Compiled code is created by invoking the JIT, requesting it to create AOT
code instead of normal code. This is done by the compile_method () function.
The output of the JIT is compiled code and a set of patches (relocations). Each
relocation specifies an offset inside the compiled code, and a runtime object
whose address is accessed at that offset.

Patches are described by a MonoJumpInfo structure. From the perspective
of the AOT compiler, there are two kinds of patches:
- calls, which require an entry in the PLT.
- everything else, which requires an entry in the GOT.
How patches are handled is described in the next section.

After all the methods are compiled, they are emitted into the output file into
a byte array called 'methods'. The emission
is done by the emit_method_code () and emit_and_reloc_code () functions. Each
piece of compiled code is identified by the local symbol .Lm_<method index>.
While compiled code is emitted, all the locations which have an associated patch
are rewritten using a platform specific process so the final generated code will
refer to the PLT and GOT entries belonging to the patches.
The compiled code array can be accessed using the 'methods' global symbol.
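
The split between the two kinds of patches can be illustrated with a small
classifier. This is a simplified sketch: Patch, PatchKind and the function
names are hypothetical and much smaller than the real MonoJumpInfo structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical patch kinds; the real structure distinguishes many more. */
typedef enum { PATCH_METHOD_CALL, PATCH_VTABLE, PATCH_CLASS } PatchKind;

typedef struct {
    size_t    code_offset;  /* where in the compiled code the patch applies */
    PatchKind kind;         /* what kind of runtime object is referenced */
} Patch;

typedef enum { TABLE_PLT, TABLE_GOT } Table;

/* Calls get a PLT entry; everything else gets a GOT entry. */
Table table_for_patch(const Patch *p) {
    assert(p != NULL);
    return p->kind == PATCH_METHOD_CALL ? TABLE_PLT : TABLE_GOT;
}

/* Count how many GOT entries the patches of one method need. */
size_t count_got_slots(const Patch *patches, size_t n) {
    size_t slots = 0;
    for (size_t i = 0; i < n; i++)
        if (table_for_patch(&patches[i]) == TABLE_GOT)
            slots++;
    return slots;
}
```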

** Handling patches
-------------------

Before a piece of AOTed code can be used, the GOT entries used by it must be
filled out with the addresses of runtime objects. Those objects are identified
by MonoJumpInfo structures. These structures are saved in a serialized form in
the AOT file, so the AOT loader can reconstruct them. The serialization is done
by the encode_patch () function, while the deserialization is done by the
decode_patch_info () function.

Every method has an associated method info blob inside the 'method_info' byte
array in the AOT file. This contains all the information required to load the
method:
- the first got entry used by the method.
- the number of got entries used by the method.
- the serialized patch info for the got entries.

Some patches, like vtables and icalls, are very common, so instead of emitting their
info every time they are used by a method, we emit the info only once into a
byte array named 'got_info', and only emit an index into this array for every
use.
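
A loader for such a method info blob might look like the sketch below. The
byte layout (one byte per field) and the shared_objects table are assumptions
made for the example; the actual encoding used by encode_patch () is not
described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { GOT_SIZE = 8 };
static uintptr_t got[GOT_SIZE];

/* Hypothetical stand-in for 'got_info': index -> resolved address. */
static const uintptr_t shared_objects[] = { 0x1000, 0x2000, 0x3000 };

/*
 * Assumed record layout, one byte per field:
 *   [first GOT entry][number of entries][got_info index] * number
 */
void load_method_info(const uint8_t *info) {
    size_t first = info[0], count = info[1];
    assert(first + count <= GOT_SIZE);
    for (size_t i = 0; i < count; i++) {
        uint8_t idx = info[2 + i];
        assert(idx < sizeof shared_objects / sizeof shared_objects[0]);
        got[first + i] = shared_objects[idx];  /* fill the GOT entry */
    }
}

/* Read back a GOT entry (0 means it has not been filled). */
uintptr_t got_slot(size_t i) {
    assert(i < GOT_SIZE);
    return got[i];
}
```

Storing only an index per use is what keeps the per-method blob small when
many methods refer to the same vtable or icall.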

** The Procedure Linkage Table (PLT)
------------------------------------

Our PLT is similar to the ELF PLT; it is used to handle calls between methods.
If method A needs to call method B, then an entry is allocated in the PLT for
method B, and A calls that entry instead of B directly. This is useful because
in some cases the runtime needs to do some processing the first time B is
called:
- if B is in another assembly, then it needs to be looked up, then JITted or the
  corresponding AOT code needs to be found.
- if B is in the same assembly, but has got slots, then the got slots need to be
  initialized.
If none of these cases is true, then the PLT is not used, and the call is made
directly to the native code of the target method.

A PLT entry is usually implemented by a jump through a jump table, where the
jump table entries are initially filled with the address of a trampoline so
the runtime can get control, and after the native code of the called method is
created/found, the jump table entry is changed to point to the native code.
All PLT entries also embed an integer offset after the jump which indexes into
the 'plt_info' table, which stores the information required to find the called
method. The PLT is emitted by the emit_plt () function.
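
The lazy patching of PLT entries can be modelled in portable C. This is a
sketch, not the real mechanism: PltEntry, find_method and call_plt are
hypothetical, the plt_info contents are invented, and a NULL target stands in
for "still points at the trampoline".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int (*method_fn)(int);

static int add_one(int x) { return x + 1; }
static int negate(int x)  { return -x; }

/* Hypothetical plt_info blob: NUL-terminated method names back to back. */
static const char plt_info[] = "add_one\0negate";

/* A PLT entry: the jump target plus the embedded offset into plt_info. */
typedef struct { method_fn target; uint32_t info_offset; } PltEntry;
static PltEntry plt[] = { { NULL, 0 }, { NULL, 8 } };

/* Stand-in for the runtime finding the native code of a method. */
static method_fn find_method(const char *name) {
    if (strcmp(name, "add_one") == 0) return add_one;
    if (strcmp(name, "negate") == 0)  return negate;
    return NULL;
}

/* Call through a PLT entry, patching it on first use. */
int call_plt(unsigned index, int arg) {
    assert(index < sizeof plt / sizeof plt[0]);
    PltEntry *e = &plt[index];
    if (e->target == NULL) {  /* first call: trampoline path */
        e->target = find_method(plt_info + e->info_offset);
        assert(e->target != NULL);
    }
    return e->target(arg);    /* later calls go straight to the code */
}
```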

** Exception/Debug info
-----------------------

Each compiled method has some additional info generated by the JIT, usable
for debugging (IL offset-native offset maps) and exception handling
(saved registers, native offsets of try/catch clauses). Since this info is
rarely needed, it is saved into a separate byte array called 'ex_info'.

** Cached metadata
------------------

When the runtime loads a class, it needs to compute a variety of information
which is not readily available in the metadata, like the instance size,
vtable, whether the class has a finalizer/type initializer etc. Computing this
information requires a lot of time, causes the loading of lots of metadata,
and it usually involves the creation of many runtime data structures
(MonoMethod/MonoMethodSignature etc), which are long-lived, and usually persist
for the lifetime of the app. To avoid this, we compute the required information
at aot compilation time, and save it into the aot image, into an array called
'class_info'. The runtime can query this information using the
mono_aot_get_cached_class_info () function, and if the information is available,
it can avoid computing it.
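
The "use the cache if present, otherwise compute" pattern can be sketched as
below. ClassInfo, its fields and the function names are hypothetical and do
not reflect the signature of mono_aot_get_cached_class_info ().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the cached per-class data. */
typedef struct {
    bool present;        /* false when nothing was cached for the class */
    int  instance_size;
    bool has_finalizer;
} ClassInfo;

/* Stand-in for the 'class_info' array, indexed by class index. */
static const ClassInfo class_info[] = {
    { true, 16, false },   /* class 0: cached at AOT compilation time */
    { false, 0, false },   /* class 1: not cached */
};

static int slow_path_calls;

/* Stand-in for the expensive computation from metadata. */
static ClassInfo compute_class_info(size_t class_index) {
    (void)class_index;
    slow_path_calls++;
    return (ClassInfo){ true, 24, true };
}

/* Prefer the cached data; fall back to computing it. */
ClassInfo get_class_info(size_t class_index) {
    assert(class_index < sizeof class_info / sizeof class_info[0]);
    if (class_info[class_index].present)
        return class_info[class_index];
    return compute_class_info(class_index);
}

int slow_path_count(void) { return slow_path_calls; }
```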

** Full AOT mode
----------------

Some platforms like the iPhone prohibit JITted code, using technical and/or
legal means. This is a significant problem for the mono runtime, since it
generates a lot of code dynamically, using either the JIT or more low-level
code generation macros. To solve this, the AOT compiler is able to function in
full-aot or aot-only mode, where it generates and saves all the necessary code
in the aot image, so at runtime, no code needs to be generated.
There are two kinds of code which need to be considered:
- wrapper methods, that is methods whose IL is generated dynamically by the
  runtime. They are handled by generating them in the add_wrappers () function,
  then emitting them the same way as the 'normal' methods. The only problem is
  that these methods do not have a methoddef token, so we need a separate table
  in the aot image ('wrapper_info') to find their method index.
- trampolines and other small hand generated pieces of code. They are handled
  in an ad-hoc way in the emit_trampolines () function.

* Performance considerations
----------------------------

Using AOT code is a trade-off which might lead to higher or
lower performance, depending on a lot of circumstances. Some
things to consider:

- AOT code needs to be loaded from disk before being used, so
  cold startup of an application using AOT code MIGHT be
  slower than using JITed code. Warm startup (when the code is
  already in the machine's cache) should be faster. Also,
  JITing code takes time, and the JIT compiler also needs to
  load additional metadata for the method from the disk, so
  startup can be faster even in the cold startup case.

- AOT code is usually compiled with all optimizations turned
  on, while JITted code is usually compiled with default
  optimizations, so the generated code in the AOT case should
  be faster.

- JITted code can directly access runtime data structures and
  helper functions, while AOT code needs to go through an
  indirection (the GOT) to access them, so it will be slower
  and somewhat bigger as well.

- When JITting code, the JIT compiler needs to load a lot of
  metadata about methods and types into memory.

- JITted code has better locality, meaning that if method A
  calls B, then the native code for A and B is usually quite
  close in memory, leading to better cache behaviour and thus
  improved performance. In contrast, the native code of
  methods inside the AOT file is in a somewhat random order.

* Future Work
-------------

- Currently, when an AOT module is loaded, all of its
  dependent assemblies are also loaded eagerly, and these
  assemblies need to be exactly the same as the ones loaded
  when the AOT module was created ('hard binding'). Non-hard
  binding should be allowed.

- On x86, the generated code uses call 0, pop REG, add
  GOTOFFSET, REG to materialize the GOT address. Newer
  versions of gcc use a separate function to do this, maybe we
  should do the same.

- Currently, we get vtable addresses from the GOT. Another
  solution would be to store the data from the vtables in the
  .bss section, so accessing them would involve less
  indirection.