+* How to support Position Independent Code in new Mono Ports
+------------------------------------------------------------
+
+ Generated native code needs to reference various runtime
+ structures/functions whose address is only known at run
+ time. JITted code can simply embed the address into the native
+ code, but AOT code needs to do an indirection. This
+ indirection is done through a table called the Global Offset
+ Table (GOT), which is similar to the GOT in the ELF
+ spec. When the runtime saves the AOT image, it saves some
+ information for each method describing the GOT entries
+ used by that method. When loading a method from an AOT image,
+ the runtime will fill out the GOT entries needed by the
+ method.
+
+ * Computing the address of the GOT
+
+ Methods which need to access the GOT first need to compute its
+ address. On the x86 it is done by code like this:
+
+ call <IP + 5>
+ pop ebx
+ add <OFFSET TO GOT>, ebx
+ <save got addr to a register>
+
+ The variable representing the GOT is stored in
+ cfg->got_var. It is always allocated to a global register to
+ prevent some problems with branches and basic blocks.
+
+ * Referencing GOT entries
+
+ Any time the native code needs to access some other runtime
+ structure/function (i.e. any time the backend calls
+ mono_add_patch_info ()), the code pointed to by the patch needs
+ to load the value from the GOT. For example, instead of:
+
+ call <ABSOLUTE ADDR>
+ it needs to do:
+ call *<OFFSET>(<GOT REG>)
+
+ Here, the <OFFSET> can be 0; it will be fixed up by the AOT compiler.
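+
+ The difference between the two call forms can be sketched in C. This is
+ only an illustration: the GotEntry type, the slot numbers and the helper
+ functions are hypothetical names, not part of the Mono sources, and a real
+ backend emits machine code rather than C.
+
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical GOT slot: holds either a data address or a function address. */
typedef union {
    void *data;
    int (*func)(int);
} GotEntry;

enum { SLOT_HELPER = 0, SLOT_COUNTER = 1, GOT_SIZE = 2 };
static GotEntry got[GOT_SIZE];   /* zero-initialized until the loader runs */

/* Stand-ins for a runtime helper function and a runtime structure. */
static int runtime_helper(int x) { return x + 1; }
static int runtime_counter = 41;

/* Conceptually what a loader does: fill the slots a method needs. */
static void fill_got(void)
{
    got[SLOT_HELPER].func = runtime_helper;
    got[SLOT_COUNTER].data = &runtime_counter;
}

/* JIT-style code: the addresses are known, so they are used directly. */
static int jit_style(void)
{
    return runtime_helper(runtime_counter);
}

/* AOT-style code: every access goes through a GOT slot, which is the
 * C analogue of `call *<OFFSET>(<GOT REG>)`. */
static int aot_style(const GotEntry *got_base)
{
    int *counter = got_base[SLOT_COUNTER].data;
    return got_base[SLOT_HELPER].func(*counter);
}
```
+
+ In this sketch, aot_style () only works after fill_got () has run, which
+ mirrors the requirement that GOT entries are filled before AOT code is used.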
+
+ For more examples on the changes required, see
+
+ svn diff -r 37739:38213 mini-x86.c
+
+ * The Procedure Linkage Table
+
+ As in ELF, calls made from AOT code do not go through the GOT. Instead, a
+ direct call is made to an entry in the Procedure Linkage Table (PLT). This
+ works because on most architectures, call instructions use a displacement
+ instead of an absolute address, so they are already position independent.
+ A PLT entry is usually a jump instruction, which initially points to some
+ trampoline code. The trampoline transfers control to the AOT loader, which
+ finds or compiles the called method and patches the PLT entry so that
+ further calls go directly to the called method.
+ If the called method is in the same assembly and does not need
+ initialization (i.e. it doesn't have GOT slots etc.), then the call is made
+ directly, bypassing the PLT.
+
+* Implementation
+----------------
+
+** The Precompiled File Format
+-----------------------------
+
+ We use the native object format of the platform. That way it
+ is possible to reuse existing tools like objdump and the
+ dynamic loader. All we need is a working assembler, i.e. we
+ write out a text file which is then passed to gas (the gnu
+ assembler) to generate the object file.
+
+ The precompiled image is stored in a file next to the original
+ assembly. Its name is the assembly's file name with the native shared
+ library extension appended (".so" on Linux).
+
+ For example: basic.exe -> basic.exe.so; corlib.dll -> corlib.dll.so
+
+ To avoid symbol lookup overhead and to save space, some things like the
+ compiled code of the individual methods are not identified by specific symbols
+ like method_code_1234. Instead, they are stored in one big array and the
+ offsets inside this array are stored in another array, requiring just two
+ symbols. The offsets array is usually named 'FOO_offsets', where FOO is the
+ array the offsets refer to, like 'methods', and 'method_offsets'.
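+
+ The blob-plus-offsets layout can be sketched in C as follows. The byte
+ values are placeholders rather than real method code, method_code () is a
+ hypothetical helper, and the extra trailing offset used to compute lengths
+ is an assumption of this sketch, not something the format is known to have.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One big array: three variable-length items stored back to back.
 * The contents are placeholder bytes. */
static const uint8_t methods[] = {
    0x90, 0xC3,          /* item 0: 2 bytes */
    0x31, 0xC0, 0xC3,    /* item 1: 3 bytes */
    0xC3                 /* item 2: 1 byte  */
};

/* method_offsets[i] is where item i starts inside 'methods'.
 * Assumption of this sketch: one extra entry marks the end of the blob. */
static const uint32_t method_offsets[] = { 0, 2, 5, 6 };
enum { METHOD_COUNT = 3 };

/* Look up an item by index using only the two arrays (two symbols). */
static const uint8_t *method_code(size_t index, size_t *len)
{
    assert(index < METHOD_COUNT);
    if (len != NULL)
        *len = method_offsets[index + 1] - method_offsets[index];
    return methods + method_offsets[index];
}
```
+
+ Only 'methods' and 'method_offsets' need to be visible to the loader here;
+ no per-method symbol is required.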
+
+ Generating code using an assembler and linker has some disadvantages:
+ - it requires GNU binutils or an equivalent package to be installed on the
+ machine running the aot compilation.
+ - it is slow.
+
+ There is some support in the aot compiler for directly emitting ELF files, but
+ it's not complete (yet).
+
+ The following things are saved in the object file and can be
+ looked up using the equivalent to dlsym:
+
+ mono_assembly_guid
+
+ A copy of the assembly GUID.
+
+ mono_aot_version
+
+ The version of the AOT file format.
+
+ mono_aot_opt_flags
+
+ The optimization flags used to build this
+ precompiled image.
+
+ method_infos
+
+ Contains additional information needed by the runtime for using the
+ precompiled method, like the GOT entries it uses.
+
+ method_info_offsets
+
+ Maps method indexes to offsets in the method_infos array.
+
+ mono_icall_table
+
+ A table that lists all the internal calls
+ referenced by the precompiled image.
+
+ mono_image_table
+
+ A list of assemblies referenced by this AOT
+ module.
+
+ methods
+
+ The precompiled code itself.
+
+ method_offsets
+
+ Maps method indexes to offsets in the methods array.
+
+ ex_info
+
+ Contains information about methods which is rarely used during normal execution,
+ like exception and debug info.
+
+ ex_info_offsets
+
+ Maps method indexes to offsets in the ex_info array.
+
+ class_info
+
+ Contains precomputed metadata used to speed up various runtime functions.
+
+ class_info_offsets
+
+ Maps class indexes to offsets in the class_info array.
+
+ class_name_table
+
+ A hash table mapping class names to class indexes. Used to speed up
+ mono_class_from_name ().
+
+ plt
+
+ The Procedure Linkage Table.
+
+ plt_info
+
+ Contains information needed to find the method belonging to a given PLT entry.
+
+** Source file structure
+-----------------------------
+
+ The AOT infrastructure is split into two files, aot-compiler.c and
+ aot-runtime.c. aot-compiler.c contains the AOT compiler which is invoked by
+ --aot, while aot-runtime.c contains the runtime support needed for loading
+ code and other things from the aot files.
+
+** Compilation process
+----------------------------
+
+ AOT compilation consists of the following stages:
+ - collecting the methods to be compiled.
+ - compiling them using the JIT.
+ - emitting the JITted code and other information into an assembly file (.s).
+ - assembling the file using the system assembler.
+ - linking the resulting object file into a shared library using the system
+ linker.
+
+** Handling compiled code
+----------------------------
+
+ Each method is identified by a method index. For normal methods, this is
+ equivalent to its index in the METHOD metadata table. For runtime generated
+ methods (wrappers), it is an arbitrary number.
+ Compiled code is created by invoking the JIT, requesting it to create AOT
+ code instead of normal code. This is done by the compile_method () function.
+ The output of the JIT is compiled code and a set of patches (relocations). Each
+ relocation specifies an offset inside the compiled code, and a runtime object
+ whose address is accessed at that offset.
+ Patches are described by a MonoJumpInfo structure. From the perspective
+ of the AOT compiler, there are two kinds of patches:
+ - calls, which require an entry in the PLT.
+ - everything else, which require an entry in the GOT.
+ How patches are handled is described in the next section.
+ After all the methods are compiled, they are emitted into the output file into
+ a byte array called 'methods'. The emission
+ is done by the emit_method_code () and emit_and_reloc_code () functions. Each
+ piece of compiled code is identified by the local symbol .Lm_<method index>.
+ While compiled code is emitted, all the locations which have an associated patch
+ are rewritten using a platform specific process so the final generated code will
+ refer to the PLT and GOT entries belonging to the patches.
+ The compiled code array can be accessed using the 'methods' global symbol.
+
+** Handling patches
+----------------------------
+
+ Before a piece of AOTed code can be used, the GOT entries used by it must be
+ filled out with the addresses of runtime objects. Those objects are identified
+ by MonoJumpInfo structures. These structures are saved in a serialized form in
+ the AOT file, so the AOT loader can reconstruct them. The serialization is done
+ by the encode_patch () function, while the deserialization is done by the
+ decode_patch_info () function.
+ Every method has an associated method info blob inside the 'method_info' byte
+ array in the AOT file. This contains all the information required to load the
+ method at runtime:
+ - the first GOT entry used by the method.
+ - the number of GOT entries used by the method.
+ - the serialized patch info for the GOT entries.
+ Some patches, like vtables and icalls, are very common, so instead of emitting
+ their info every time they are used by a method, we emit the info only once
+ into a byte array named 'got_info', and only emit an index into this array for
+ every access.
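+
+ The sharing of common patch info can be sketched in C as a small interning
+ table. Everything here is hypothetical: the Patch type, the kind constants
+ and intern_patch () are illustrative names, and the linear search stands in
+ for whatever lookup the AOT compiler actually uses.
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified patch description: a kind plus a target name. */
enum { PATCH_VTABLE = 1, PATCH_ICALL = 2 };

typedef struct {
    int kind;
    const char *target;
} Patch;

/* Shared table in the spirit of 'got_info': each distinct patch is stored
 * once, and methods refer to it by index. */
enum { MAX_SHARED = 16 };
static Patch got_info[MAX_SHARED];
static size_t got_info_len;

/* Return the index of an equal patch, adding it only if it is not there. */
static size_t intern_patch(Patch p)
{
    for (size_t i = 0; i < got_info_len; i++) {
        if (got_info[i].kind == p.kind &&
            strcmp(got_info[i].target, p.target) == 0)
            return i;
    }
    assert(got_info_len < MAX_SHARED);
    got_info[got_info_len] = p;
    return got_info_len++;
}
```
+
+ Two methods that both reference the same vtable would get the same index
+ back, so the description of that vtable is emitted only once.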
+
+** The Procedure Linkage Table (PLT)
+------------------------------------
+
+ Our PLT is similar to the ELF PLT; it is used to handle calls between methods.
+ If method A needs to call method B, then an entry is allocated in the PLT for
+ method B, and A calls that entry instead of B directly. This is useful because
+ in some cases the runtime needs to do some processing the first time B is
+ called.
+ There are two cases:
+ - if B is in another assembly, then it needs to be looked up, then JITted or the
+ corresponding AOT code needs to be found.
+ - if B is in the same assembly, but has GOT slots, then the GOT slots need to be
+ initialized.
+ If neither of these cases applies, then the PLT is not used, and the call is made
+ directly to the native code of the target method.
+ A PLT entry is usually implemented by a jump through a jump table, where the
+ jump table entries are initially filled with the address of a trampoline so
+ the runtime can get control, and after the native code of the called method is
+ created/found, the jump table entry is changed to point to the native code.
+ All PLT entries also embed an integer offset after the jump which indexes into
+ the 'plt_info' table, which stores the information required to find the called
+ method. The PLT is emitted by the emit_plt () function.
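+
+ The lazy patching of the jump table can be sketched in C with function
+ pointers. This is a simplified model under stated assumptions: all names
+ are hypothetical, 'plt_info' is reduced to a table of target functions, and
+ each trampoline hard-codes its own index where a real PLT entry would embed
+ an offset after the jump.
+
```c
#include <assert.h>
#include <stddef.h>

typedef int (*method_fn)(int);

enum { PLT_SIZE = 2 };

/* Counts how often the slow "runtime gets control" path is taken. */
static int plt_resolver_calls;

/* Stand-ins for the native code of two called methods. */
static int method_double(int x) { return 2 * x; }
static int method_negate(int x) { return -x; }

/* Simplified 'plt_info': what is needed to find the callee of each entry. */
static const method_fn plt_info[PLT_SIZE] = { method_double, method_negate };

static int trampoline0(int x);
static int trampoline1(int x);

/* Jump table: every entry initially points at a trampoline. */
static method_fn plt_jump_table[PLT_SIZE] = { trampoline0, trampoline1 };

/* Find the callee, patch the jump table entry, then forward the call. */
static int resolve_and_call(size_t plt_index, int arg)
{
    method_fn target = plt_info[plt_index];
    plt_resolver_calls++;
    plt_jump_table[plt_index] = target;   /* later calls skip this path */
    return target(arg);
}

static int trampoline0(int x) { return resolve_and_call(0, x); }
static int trampoline1(int x) { return resolve_and_call(1, x); }

/* What a caller does: jump through the table entry, never to B directly. */
static int call_via_plt(size_t plt_index, int arg)
{
    return plt_jump_table[plt_index](arg);
}
```
+
+ In this sketch the first call through an entry goes through the trampoline,
+ and every later call through the same entry reaches the target directly.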
+
+** Exception/Debug info
+----------------------------
+
+ Each compiled method has some additional info generated by the JIT, usable
+ for debugging (IL offset-native offset maps) and exception handling
+ (saved registers, native offsets of try/catch clauses). Since this info is
+ rarely needed, it is saved into a separate byte array called 'ex_info'.
+
+** Cached metadata
+---------------------------
+
+ When the runtime loads a class, it needs to compute a variety of information
+ which is not readily available in the metadata, like the instance size,
+ vtable, whether the class has a finalizer/type initializer etc. Computing this
+ information requires a lot of time, causes the loading of lots of metadata,
+ and it usually involves the creation of many runtime data structures
+ (MonoMethod/MonoMethodSignature etc), which are long living, and usually persist
+ for the lifetime of the app. To avoid this, we compute the required information
+ at aot compilation time, and save it into the aot image, into an array called
+ 'class_info'. The runtime can query this information using the
+ mono_aot_get_cached_class_info () function, and if the information is available,
+ it can avoid computing it.
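+
+ The query-then-fall-back pattern can be sketched in C as below. The struct
+ fields, the table contents and the function names are hypothetical, and the
+ signature shown is not that of mono_aot_get_cached_class_info (); the sketch
+ only illustrates skipping the expensive computation when cached data exists.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the per-class data an AOT image could cache. */
typedef struct {
    bool present;            /* false if nothing was saved for this class */
    unsigned instance_size;
    bool has_finalizer;
    bool has_cctor;
} CachedClassInfo;

/* Hypothetical 'class_info' contents, indexed by class index. */
static const CachedClassInfo class_info[] = {
    { true, 16, false, true },
    { false, 0, false, false },   /* class 1: no cached data */
    { true, 24, true, false },
};
enum { CLASS_COUNT = sizeof class_info / sizeof class_info[0] };

static int slow_path_calls;   /* counts expensive recomputations */

/* Stand-in for the expensive metadata walk done without cached data. */
static unsigned compute_instance_size(size_t class_index)
{
    slow_path_calls++;
    return 100 + (unsigned)class_index;
}

/* Query the cache; returns false when the caller must compute the data. */
static bool get_cached_class_info(size_t class_index, CachedClassInfo *out)
{
    if (class_index >= (size_t)CLASS_COUNT || !class_info[class_index].present)
        return false;
    *out = class_info[class_index];
    return true;
}

/* Use the cached value if available, otherwise take the slow path. */
static unsigned class_instance_size(size_t class_index)
{
    CachedClassInfo info;
    if (get_cached_class_info(class_index, &info))
        return info.instance_size;
    return compute_instance_size(class_index);
}
```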
+
+** Full AOT mode
+-------------------------
+
+ Some platforms like the iPhone prohibit JITted code, using technical and/or
+ legal means. This is a significant problem for the mono runtime, since it
+ generates a lot of code dynamically, using either the JIT or more low-level
+ code generation macros. To solve this, the AOT compiler is able to function in
+ full-aot or aot-only mode, where it generates and saves all the necessary code
+ in the aot image, so at runtime, no code needs to be generated.
+ There are two kinds of code which need to be considered:
+ - wrapper methods, that is methods whose IL is generated dynamically by the
+ runtime. They are handled by generating them in the add_wrappers () function,
+ then emitting them the same way as the 'normal' methods. The only problem is
+ that these methods do not have a methoddef token, so we need a separate table
+ in the aot image ('wrapper_info') to find their method index.
+ - trampolines and other small hand generated pieces of code. They are handled
+ in an ad-hoc way in the emit_trampolines () function.
+
+* Performance considerations
+----------------------------
+
+ Using AOT code is a trade-off which might lead to higher or
+ lower performance, depending on a lot of circumstances. Some
+ of these are:
+
+ - AOT code needs to be loaded from disk before being used, so
+ cold startup of an application using AOT code MIGHT be
+ slower than using JITted code. Warm startup (when the code is
+ already in the machine's cache) should be faster. Also,
+ JITting code takes time, and the JIT compiler also needs to
+ load additional metadata for the method from the disk, so
+ startup can be faster even in the cold startup case.
+
+ - AOT code is usually compiled with all optimizations turned
+ on, while JITted code is usually compiled with default
+ optimizations, so the generated code in the AOT case should
+ be faster.
+
+ - JITted code can directly access runtime data structures and
+ helper functions, while AOT code needs to go through an
+ indirection (the GOT) to access them, so it will be slower
+ and somewhat bigger as well.
+
+ - When JITting code, the JIT compiler needs to load a lot of
+ metadata about methods and types into memory.
+
+ - JITted code has better locality, meaning that if method A
+ calls method B, then the native code for A and B is usually quite
+ close in memory, leading to better cache behaviour thus
+ improved performance. In contrast, the native code of
+ methods inside the AOT file is in a somewhat random order.
+
+* Future Work
+-------------
+
+ - Currently, when an AOT module is loaded, all of its
+ dependent assemblies are also loaded eagerly, and these
+ assemblies need to be exactly the same as the ones loaded
+ when the AOT module was created ('hard binding'). Non-hard
+ binding should be allowed.
+
+ - On x86, the generated code uses call 0, pop REG, add
+ GOTOFFSET, REG to materialize the GOT address. Newer
+ versions of gcc use a separate function to do this, maybe we
+ need to do the same.
+
+ - Currently, we get vtable addresses from the GOT. Another
+ solution would be to store the data from the vtables in the
+ .bss section, so accessing them would involve less
+ indirection.
+