X-Git-Url: http://wien.tomnetworks.com/gitweb/?a=blobdiff_plain;f=docs%2Faot-compiler.txt;h=3d77c0a11cadb0f71c4569e7b4159b5935f81c88;hb=122601004c427fa2da2c0878403100c303b9d7bc;hp=ab1af90d96584d1eafae6b96adb883570a185724;hpb=0abc2e6270020edc4a5b4c66f93b4ae582815f20;p=mono.git

diff --git a/docs/aot-compiler.txt b/docs/aot-compiler.txt
index ab1af90d965..3d77c0a11ca 100644
--- a/docs/aot-compiler.txt
+++ b/docs/aot-compiler.txt
@@ -1,44 +1,393 @@
 Mono Ahead Of Time Compiler
 ===========================

-The new mono JIT has sophisticated optimization features. It uses SSA and has a
-pluggable architecture for further optimizations. This makes it possible and
-efficient to use the JIT also for AOT compilation.
+    The Ahead of Time compilation feature in Mono allows Mono to
+    precompile assemblies to minimize JIT time, reduce memory
+    usage at runtime and increase the code sharing across multiple
+    running Mono applications.

+    To precompile an assembly use the following command:
+
+        mono --aot -O=all assembly.exe
-* file format: We use the native object format of the platform. That way it is
-  possible to reuse existing tools like objdump and the dynamic loader. All we
-  need is a working assembler, i.e. we write out a text file which is then
-  passed to gas (the gnu assembler) to generate the object file.
+    The `--aot' flag instructs Mono to ahead-of-time compile your
+    assembly, while the -O=all flag instructs Mono to use all the
+    available optimizations.

-* file names: we simply add ".so" to the generated file. For example:
-  basic.exe -> basic.exe.so
-  corlib.dll -> corlib.dll.so
+* Caching metadata
+------------------

-* staring the AOT compiler: mini --aot assembly_name
+    Besides code, the AOT file also contains cached metadata information which allows
+    the runtime to avoid certain computations at runtime, like the computation of
+    generic vtables. This reduces both startup time and memory usage. It is possible
+    to create an AOT image which contains only this cached information and no code by
+    using the 'metadata-only' option during compilation:

-The following things are saved in the object file:
+        mono --aot=metadata-only assembly.exe

-* version infos:
+    This works even on platforms where AOT is not normally supported.

-* native code: this is labeled with method_XXXXXXXX: where XXXXXXXX is the
-  hexadecimal token number of the method.
+* Position Independent Code
+---------------------------

-* additional informations needed by the runtime: For example we need to store
-  the code length and the exception tables. We also need a way to patch
-  constants only available at runtime (for example vtable and class
-  addresses). This is stored i a binary blob labeled method_info_XXXXXXXX:
+    On x86 and x86-64 the code generated by Ahead-of-Time compiled
+    images is position-independent code. This allows the same
+    precompiled image to be reused across multiple applications
+    without having different copies: this is the same way in which
+    ELF shared libraries work: the code produced can be relocated
+    to any address.

-PROBLEMS:
+    The implementation of Position Independent Code had a
+    performance impact on Ahead-of-Time compiled images but
+    compiler bootstraps are still faster than JIT-compiled images,
+    especially with all the new optimizations provided by the Mono
+    engine.

-- all precompiled methods must be domain independent, or we add patch infos to
-  patch the target doamin.
+* How to support Position Independent Code in new Mono Ports
+------------------------------------------------------------
+
+    Generated native code needs to reference various runtime
+    structures/functions whose address is only known at run
+    time. JITted code can simply embed the address into the native
+    code, but AOT code needs to do an indirection. This
+    indirection is done through a table called the Global Offset
+    Table (GOT), which is similar to the GOT table in the ELF
+    spec. When the runtime saves the AOT image, it saves some
+    information for each method describing the GOT table entries
+    used by that method. When loading a method from an AOT image,
+    the runtime will fill out the GOT entries needed by the
+    method.
+
+    * Computing the address of the GOT
+
+    Methods which need to access the GOT first need to compute its
+    address. On the x86 it is done by code like this:
+
+        call
+        pop ebx
+        add , ebx
+
+
+    The variable representing the got is stored in
+    cfg->got_var. It is always allocated to a global register to
+    prevent some problems with branches + basic blocks.
+
+    * Referencing GOT entries
+
+    Any time the native code needs to access some other runtime
+    structure/function (i.e. any time the backend calls
+    mono_add_patch_info ()), the code pointed to by the patch needs
+    to load the value from the got. For example, instead of:
+
+        call
+    it needs to do:
+        call *()
+
+    Here, the can be 0, it will be fixed up by the AOT compiler.
+
+    For more examples on the changes required, see
+
+        svn diff -r 37739:38213 mini-x86.c
+
+    * The Procedure Linkage Table
+
+    As in ELF, calls made from AOT code do not go through the GOT. Instead, a direct call is
+    made to an entry in the Procedure Linkage Table (PLT). This is based on the fact that on
+    most architectures, call instructions use a displacement instead of an absolute address, so
+    they are already position independent. A PLT entry is usually a jump instruction, which
+    initially points to some trampoline code which transfers control to the AOT loader, which
+    will compile the called method, and patch the PLT entry so that further calls are made
+    directly to the called method.
+    If the called method is in the same assembly, and does not need initialization (i.e. it
+    doesn't have GOT slots etc), then the call is made directly, bypassing the PLT.
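The lazy patching of a PLT entry described above can be illustrated with a small C sketch. It is not taken from the Mono sources: a plain function pointer stands in for the PLT jump, and the names plt_slot, plt_trampoline, callee and call_via_plt are hypothetical. The slot initially points at a trampoline, and the first call rewrites it so that later calls reach the target directly.

```c
/* Hypothetical model of one PLT entry; none of these names exist in Mono. */
typedef int (*method_fn)(int);

/* Counts how often the "AOT loader" stand-in had to resolve the target. */
int resolve_count = 0;

/* The method that the PLT entry should eventually point to. */
int callee(int x)
{
    return x * 2;
}

int plt_trampoline(int x);

/* The PLT slot starts out pointing at the trampoline. */
method_fn plt_slot = plt_trampoline;

/* Stand-in for the trampoline plus AOT loader: locate the target,
 * patch the slot, then forward the call that triggered the lookup. */
int plt_trampoline(int x)
{
    resolve_count++;
    plt_slot = callee;   /* further calls bypass the trampoline */
    return plt_slot(x);
}

/* AOT code calls through the slot instead of an absolute address. */
int call_via_plt(int x)
{
    return plt_slot(x);
}
```

Calling call_via_plt () twice should resolve the target only once; the second call goes straight to callee ().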
+
+* Implementation
+----------------
+
+** The Precompiled File Format
+-----------------------------
+
+    We use the native object format of the platform. That way it
+    is possible to reuse existing tools like objdump and the
+    dynamic loader. All we need is a working assembler, i.e. we
+    write out a text file which is then passed to gas (the gnu
+    assembler) to generate the object file.
+
+    The precompiled image is stored in a file next to the original
+    assembly that is precompiled, with the native extension for a shared
+    library (on Linux, ".so" is appended to the assembly file name).
+
+    For example: basic.exe -> basic.exe.so; corlib.dll -> corlib.dll.so
+
+    To avoid symbol lookup overhead and to save space, some things like the
+    compiled code of the individual methods are not identified by specific symbols
+    like method_code_1234. Instead, they are stored in one big array and the
+    offsets inside this array are stored in another array, requiring just two
+    symbols. The offsets array is usually named 'FOO_offsets', where FOO is the
+    array the offsets refer to, like 'methods', and 'method_offsets'.
+
+    Generating code using an assembler and linker has some disadvantages:
+    - it requires GNU binutils or an equivalent package to be installed on the
+      machine running the aot compilation.
+    - it is slow.
+
+    There is some support in the aot compiler for directly emitting elf files, but
+    it's not complete (yet).
+
+    The following things are saved in the object file and can be
+    looked up using the equivalent to dlsym:
+
+        mono_assembly_guid
+
+            A copy of the assembly GUID.
+
+        mono_aot_version
+
+            The version of the AOT file format.
+
+        mono_aot_opt_flags
+
+            The optimization flags used to build this
+            precompiled image.
+
+        method_infos
+
+            Contains additional information needed by the runtime for using the
+            precompiled method, like the GOT entries it uses.
+
+        method_info_offsets
+
+            Maps method indexes to offsets in the method_infos array.
+
+        mono_icall_table
+
+            A table that lists all the internal calls
+            referenced by the precompiled image.
+
+        mono_image_table
+
+            A list of assemblies referenced by this AOT
+            module.
+
+        methods
+
+            The precompiled code itself.
+
+        method_offsets
+
+            Maps method indexes to offsets in the methods array.
+
+        ex_info
+
+            Contains information about methods which is rarely used during normal execution,
+            like exception and debug info.
+
+        ex_info_offsets
+
+            Maps method indexes to offsets in the ex_info array.
+
+        class_info
+
+            Contains precomputed metadata used to speed up various runtime functions.
+
+        class_info_offsets
+
+            Maps class indexes to offsets in the class_info array.
+
+        class_name_table
+
+            A hash table mapping class names to class indexes. Used to speed up
+            mono_class_from_name ().
+
+        plt
+
+            The Procedure Linkage Table
+
+        plt_info
+
+            Contains information needed to find the method belonging to a given PLT entry.
+
+** Source file structure
+-----------------------------
+
+    The AOT infrastructure is split into two files, aot-compiler.c and
+    aot-runtime.c. aot-compiler.c contains the AOT compiler which is invoked by
+    --aot, while aot-runtime.c contains the runtime support needed for loading
+    code and other things from the aot files.
+
+** Compilation process
+----------------------------
+
+    AOT compilation consists of the following stages:
+    - collecting the methods to be compiled.
+    - compiling them using the JIT.
+    - emitting the JITted code and other information into an assembly file (.s).
+    - assembling the file using the system assembler.
+    - linking the resulting object file into a shared library using the system
+      linker.
+
+** Handling compiled code
+----------------------------
+
+    Each method is identified by a method index. For normal methods, this is
+    equivalent to its index in the METHOD metadata table. For runtime generated
+    methods (wrappers), it is an arbitrary number.
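The method index and the 'methods' / 'method_offsets' pair described earlier can be pictured with a short C sketch. The array contents, the element types, and the helper names lookup_method_code and method_code_size are assumptions made for illustration. Deriving a code size from the next offset is likewise only an assumption of this sketch, not something the text above states.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the 'methods' and 'method_offsets' symbols.
 * The byte values are made up; a real image holds machine code here. */
const uint8_t methods[] = {
    0x90, 0xc3,          /* method index 0: 2 bytes */
    0x31, 0xc0, 0xc3,    /* method index 1: 3 bytes */
    0xc3                 /* method index 2: 1 byte  */
};
const uint32_t method_offsets[] = { 0, 2, 5 };

#define METHOD_COUNT (sizeof method_offsets / sizeof method_offsets[0])

/* Map a method index to the start of its code, or NULL if the index
 * is out of range. Only the two array symbols are needed. */
const uint8_t *lookup_method_code(size_t method_index)
{
    if (method_index >= METHOD_COUNT)
        return NULL;
    return methods + method_offsets[method_index];
}

/* Assumption of this sketch: a method's code ends where the next one
 * begins, and the last one ends at the end of the array. */
size_t method_code_size(size_t method_index)
{
    size_t end;

    if (method_index >= METHOD_COUNT)
        return 0;
    end = (method_index + 1 < METHOD_COUNT)
        ? method_offsets[method_index + 1]
        : sizeof methods;
    return end - method_offsets[method_index];
}
```

With this layout, a lookup needs one bounds check and one addition, and no per-method symbol has to be resolved.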
+    Compiled code is created by invoking the JIT, requesting it to create AOT
+    code instead of normal code. This is done by the compile_method () function.
+    The output of the JIT is compiled code and a set of patches (relocations). Each
+    relocation specifies an offset inside the compiled code, and a runtime object
+    whose address is accessed at that offset.
+    Patches are described by a MonoJumpInfo structure. From the perspective
+    of the AOT compiler, there are two kinds of patches:
+    - calls, which require an entry in the PLT table.
+    - everything else, which require an entry in the GOT table.
+    How patches are handled is described in the next section.
+    After all the methods are compiled, they are emitted into the output file into
+    a byte array called 'methods'. The emission
+    is done by the emit_method_code () and emit_and_reloc_code () functions. Each
+    piece of compiled code is identified by the local symbol .Lm_.
+    While compiled code is emitted, all the locations which have an associated patch
+    are rewritten using a platform specific process so the final generated code will
+    refer to the plt and got entries belonging to the patches.
+    The compiled code array
+    can be accessed using the 'methods' global symbol.
+
+** Handling patches
+----------------------------
+
+    Before a piece of AOTed code can be used, the GOT entries used by it must be
+    filled out with the addresses of runtime objects. Those objects are identified
+    by MonoJumpInfo structures. These structures are saved in a serialized form in
+    the AOT file, so the AOT loader can deconstruct them. The serialization is done
+    by the encode_patch () function, while the deserialization is done by the
+    decode_patch_info () function.
+    Every method has an associated method info blob inside the 'method_info' byte
+    array in the AOT file. This contains all the information required to load the
+    method at runtime:
+    - the first got entry used by the method.
+    - the number of got entries used by the method.
+    - the serialized patch info for the got entries.
+    Some patches, like vtables and icalls, are very common, so instead of emitting their
+    info every time they are used by a method, we emit the info only once into a
+    byte array named 'got_info', and only emit an index into this array for every
+    access.
+
+** The Procedure Linkage Table (PLT)
+------------------------------------
+
+    Our PLT is similar to the elf PLT, it is used to handle calls between methods.
+    If method A needs to call method B, then an entry is allocated in the PLT for
+    method B, and A calls that entry instead of B directly. This is useful because
+    in some cases the runtime needs to do some processing the first time B is
+    called.
+    There are two cases:
+    - if B is in another assembly, then it needs to be looked up, then JITted or the
+      corresponding AOT code needs to be found.
+    - if B is in the same assembly, but has got slots, then the got slots need to be
+      initialized.
+    If none of these cases is true, then the PLT is not used, and the call is made
+    directly to the native code of the target method.
+    A PLT entry is usually implemented by a jump through a jump table, where the
+    jump table entries are initially filled up with the address of a trampoline so
+    the runtime can get control, and after the native code of the called method is
+    created/found, the jump table entry is changed to point to the native code.
+    All PLT entries also embed an integer offset after the jump which indexes into
+    the 'plt_info' table, which stores the information required to find the called
+    method. The PLT is emitted by the emit_plt () function.
+
+** Exception/Debug info
+----------------------------
+
+    Each compiled method has some additional info generated by the JIT, usable
+    for debugging (IL offset-native offset maps) and exception handling
+    (saved registers, native offsets of try/catch clauses). Since this info is
+    rarely needed, it is saved into a separate byte array called 'ex_info'.
+
+** Cached metadata
+---------------------------
+
+    When the runtime loads a class, it needs to compute a variety of information
+    which is not readily available in the metadata, like the instance size,
+    vtable, whether the class has a finalizer/type initializer etc. Computing this
+    information requires a lot of time, causes the loading of lots of metadata,
+    and it usually involves the creation of many runtime data structures
+    (MonoMethod/MonoMethodSignature etc), which are long living, and usually persist
+    for the lifetime of the app. To avoid this, we compute the required information
+    at aot compilation time, and save it into the aot image, into an array called
+    'class_info'. The runtime can query this information using the
+    mono_aot_get_cached_class_info () function, and if the information is available,
+    it can avoid computing it.
+
+** Full AOT mode
+-------------------------
+
+    Some platforms like the iPhone prohibit JITted code, using technical and/or
+    legal means. This is a significant problem for the mono runtime, since it
+    generates a lot of code dynamically, using either the JIT or more low-level
+    code generation macros. To solve this, the AOT compiler is able to function in
+    full-aot or aot-only mode, where it generates and saves all the necessary code
+    in the aot image, so at runtime, no code needs to be generated.
+    There are two kinds of code which need to be considered:
+    - wrapper methods, that is methods whose IL is generated dynamically by the
+      runtime. They are handled by generating them in the add_wrappers () function,
+      then emitting them the same way as the 'normal' methods. The only problem is
+      that these methods do not have a methoddef token, so we need a separate table
+      in the aot image ('wrapper_info') to find their method index.
+    - trampolines and other small hand generated pieces of code. They are handled
+      in an ad-hoc way in the emit_trampolines () function.
+
+* Performance considerations
+----------------------------
+
+    Using AOT code is a trade-off which might lead to better or
+    worse performance, depending on a lot of circumstances. Some
+    of these are:
+
+    - AOT code needs to be loaded from disk before being used, so
+      cold startup of an application using AOT code MIGHT be
+      slower than using JITed code. Warm startup (when the code is
+      already in the machine's cache) should be faster. Also,
+      JITing code takes time, and the JIT compiler also needs to
+      load additional metadata for the method from the disk, so
+      startup can be faster even in the cold startup case.
+
+    - AOT code is usually compiled with all optimizations turned
+      on, while JITted code is usually compiled with default
+      optimizations, so the generated code in the AOT case should
+      be faster.
+
+    - JITted code can directly access runtime data structures and
+      helper functions, while AOT code needs to go through an
+      indirection (the GOT) to access them, so AOT code will be
+      slower and somewhat bigger as well.
+
+    - When JITting code, the JIT compiler needs to load a lot of
+      metadata about methods and types into memory.
+
+    - JITted code has better locality, meaning that if method A
+      calls B, then the native code for A and B is usually quite
+      close in memory, leading to better cache behaviour and thus
+      improved performance. In contrast, the native code of
+      methods inside the AOT file is in a somewhat random order.
+
+* Future Work
+-------------
+
+    - Currently, when an AOT module is loaded, all of its
+      dependent assemblies are also loaded eagerly, and these
+      assemblies need to be exactly the same as the ones loaded
+      when the AOT module was created ('hard binding'). Non-hard
+      binding should be allowed.
+
+    - On x86, the generated code uses call 0, pop REG, add
+      GOTOFFSET, REG to materialize the GOT address. Newer
+      versions of gcc use a separate function to do this, maybe we
+      need to do the same.
+
+    - Currently, we get vtable addresses from the GOT. Another
+      solution would be to store the data from the vtables in the
+      .bss section, so accessing them would involve less
+      indirection.
+
-- the main problem is how to patch runtime related addresses, for example:
-  - current application domain
-  - string objects loaded with LDSTR
-  - address of MonoClass data
-  - static field offsets
-  - method addreses
-  - virtual function and interface slots
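As a rough illustration of the "Handling patches" section of the new text, the following C sketch fills a range of GOT slots from a per-method info record. It is a simplified model under stated assumptions, not Mono's loader. The struct layouts, the two patch kinds, the fixed GOT size, and every name in it (patch, method_info, resolve_patch, init_got_slots and the example data) are hypothetical. Real patches are MonoJumpInfo structures decoded from a serialized blob.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much reduced stand-in for a decoded patch. */
enum patch_kind { PATCH_VTABLE, PATCH_ICALL };

struct patch {
    enum patch_kind kind;
};

/* Mirrors the three items stored in a method info blob: first GOT
 * entry, number of GOT entries, and the patch info for those entries. */
struct method_info {
    uint32_t first_got_entry;
    uint32_t n_got_entries;
    const struct patch *patches;   /* one patch per GOT entry */
};

#define GOT_SIZE 8
void *got[GOT_SIZE];

/* Dummy runtime objects whose addresses get stored in the GOT. */
int dummy_vtable;
int dummy_icall;

/* Stand-in for resolving a patch to the address of a runtime object. */
void *resolve_patch(const struct patch *p)
{
    switch (p->kind) {
    case PATCH_VTABLE: return &dummy_vtable;
    case PATCH_ICALL:  return &dummy_icall;
    }
    return NULL;
}

/* Fill the GOT slots used by one method before its code is run.
 * Returns 0 on success, -1 if the slot range does not fit the GOT. */
int init_got_slots(const struct method_info *info)
{
    uint32_t i;

    if (info->first_got_entry > GOT_SIZE ||
        info->n_got_entries > GOT_SIZE - info->first_got_entry)
        return -1;
    for (i = 0; i < info->n_got_entries; i++)
        got[info->first_got_entry + i] = resolve_patch(&info->patches[i]);
    return 0;
}

/* Example data: a method using GOT slots 3 and 4. */
const struct patch example_patches[] = { { PATCH_VTABLE }, { PATCH_ICALL } };
const struct method_info example_info = { 3, 2, example_patches };
```

After init_got_slots (&example_info) succeeds, slots 3 and 4 hold the addresses of the two dummy objects and all other slots are left untouched.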