X-Git-Url: http://wien.tomnetworks.com/gitweb/?a=blobdiff_plain;f=docs%2Faot-compiler.txt;h=3d77c0a11cadb0f71c4569e7b4159b5935f81c88;hb=122601004c427fa2da2c0878403100c303b9d7bc;hp=22a868099cbf003664849f79367db2a81fbac3cf;hpb=e004ed5c5a87d6701afbdf1ce435020d6b834ec3;p=mono.git

diff --git a/docs/aot-compiler.txt b/docs/aot-compiler.txt
index 22a868099cb..3d77c0a11ca 100644
--- a/docs/aot-compiler.txt
+++ b/docs/aot-compiler.txt
@@ -14,6 +14,19 @@ Mono Ahead Of Time Compiler
 	assembly, while the -O=all flag instructs Mono to use all the
 	available optimizations.
 
+* Caching metadata
+------------------
+
+	Besides code, the AOT file also contains cached metadata information which allows
+	the runtime to avoid certain computations, like the computation of generic
+	vtables. This reduces both startup time and memory usage. It is possible
+	to create an AOT image which contains only this cached information and no code by
+	using the 'metadata-only' option during compilation:
+
+	   mono --aot=metadata-only assembly.exe
+
+	This works even on platforms where AOT is not normally supported.
+
 * Position Independent Code
 ---------------------------
 
@@ -76,7 +89,22 @@ Mono Ahead Of Time Compiler
 	
 	svn diff -r 37739:38213 mini-x86.c 
 
-* The Precompiled File Format
+	* The Procedure Linkage Table
+
+	As in ELF, calls made from AOT code do not go through the GOT. Instead, a direct call is
+	made to an entry in the Procedure Linkage Table (PLT). This is based on the fact that on
+	most architectures, call instructions use a displacement instead of an absolute address, so
+	they are already position independent. A PLT entry is usually a jump instruction, which
+	initially points to some trampoline code which transfers control to the AOT loader, which
+	will find or compile the called method, and patch the PLT entry so that further calls are
+	made directly to the called method.
+	If the called method is in the same assembly, and does not need initialization (i.e. it
+	doesn't have GOT slots etc), then the call is made directly, bypassing the PLT.
+
+* Implementation
+----------------
+
+** The Precompiled File Format
 -----------------------------
 	
 	We use the native object format of the platform. That way it
@@ -90,6 +118,21 @@ Mono Ahead Of Time Compiler
 	library (on Linux its ".so" to the generated file). 
 
 	For example: basic.exe -> basic.exe.so; corlib.dll -> corlib.dll.so
+
+	To avoid symbol lookup overhead and to save space, some things like the
+	compiled code of the individual methods are not identified by specific symbols
+	like method_code_1234. Instead, they are stored in one big array and the
+	offsets inside this array are stored in another array, requiring just two
+	symbols. The offsets array is usually named 'FOO_offsets', where FOO is the
+	array the offsets refer to, for example 'methods' and 'method_offsets'.
+
+	Generating code using an assembler and linker has some disadvantages:
+	- it requires GNU binutils or an equivalent package to be installed on the
+	  machine running the AOT compilation.
+	- it is slow.
+
+	There is some support in the AOT compiler for directly emitting ELF files, but
+	it is not complete (yet).
 	
 	The following things are saved in the object file and can be
 	looked up using the equivalent to dlsym:
@@ -125,49 +168,226 @@ Mono Ahead Of Time Compiler
 	
 			A list of assemblies referenced by this AOT
 			module.
-	
+
+		methods
+			
+			The precompiled code itself.
+			
 		method_offsets
 	
-			The equivalent to a procedure linkage table. 
-	
+			Maps method indexes to offsets in the methods array.
+
+		ex_info
+
+			Contains information about methods which is rarely used during normal execution, 
+			like exception and debug info.
+
+		ex_info_offsets
+
+			Maps method indexes to offsets in the ex_info array.
+
+		class_info
+
+			Contains precomputed metadata used to speed up various runtime functions.
+
+		class_info_offsets
+
+			Maps class indexes to offsets in the class_info array.
+
+		class_name_table
+
+			A hash table mapping class names to class indexes. Used to speed up 
+			mono_class_from_name ().
+
+		plt
+
+			The Procedure Linkage Table.
+
+		plt_info
+
+			Contains information needed to find the method belonging to a given PLT entry.
+
+** Source file structure
+-----------------------------
+
+	The AOT infrastructure is split into two files, aot-compiler.c and
+	aot-runtime.c. aot-compiler.c contains the AOT compiler which is invoked by
+	--aot, while aot-runtime.c contains the runtime support needed for loading
+	code and other data from the AOT files.
+
+** Compilation process
+----------------------------
+
+	AOT compilation consists of the following stages:
+	- collecting the methods to be compiled.
+	- compiling them using the JIT.
+	- emitting the JITted code and other information into an assembly file (.s).
+	- assembling the file using the system assembler.
+	- linking the resulting object file into a shared library using the system
+	  linker.
+
+** Handling compiled code
+----------------------------
+
+	  Each method is identified by a method index. For normal methods, this is
+	equivalent to its index in the METHOD metadata table. For runtime generated
+	methods (wrappers), it is an arbitrary number.
+	  Compiled code is created by invoking the JIT, requesting it to create AOT
+	code instead of normal code. This is done by the compile_method () function.
+	The output of the JIT is compiled code and a set of patches (relocations). Each
+	relocation specifies an offset inside the compiled code, and a runtime object
+	whose address is accessed at that offset.
+	Patches are described by a MonoJumpInfo structure. From the perspective
+	of the AOT compiler, there are two kinds of patches:
+	- calls, which require an entry in the PLT.
+	- everything else, which requires an entry in the GOT.
+	How patches are handled is described in the next section.
+	  After all the methods are compiled, they are emitted into the output file as
+	a byte array called 'methods'. The emission is done by the emit_method_code ()
+	and emit_and_reloc_code () functions. Each piece of compiled code is identified
+	by the local symbol .Lm_<method index>.
+	While compiled code is emitted, all the locations which have an associated patch
+	are rewritten using a platform specific process so the final generated code will
+	refer to the PLT and GOT entries belonging to the patches.
+	The compiled code array can be accessed using the 'methods' global
+	symbol.
+
+** Handling patches
+----------------------------
+
+	  Before a piece of AOTed code can be used, the GOT entries used by it must be
+	filled out with the addresses of runtime objects. Those objects are identified
+	by MonoJumpInfo structures. These structures are saved in a serialized form in
+	the AOT file, so the AOT loader can reconstruct them. The serialization is done
+	by the encode_patch () function, while the deserialization is done by the
+	decode_patch_info () function.
+	Every method has an associated method info blob inside the 'method_info' byte
+	array in the AOT file. This contains all the information required to load the
+	method at runtime:
+	- the first GOT entry used by the method.
+	- the number of GOT entries used by the method.
+	- the serialized patch info for the GOT entries.
+	Some patches, like vtables and icalls, are very common, so instead of emitting their
+	info every time they are used by a method, we emit the info only once into a
+	byte array named 'got_info', and only emit an index into this array for every
+	access.
+
+** The Procedure Linkage Table (PLT)
+------------------------------------
+
+	Our PLT is similar to the ELF PLT: it is used to handle calls between methods.
+	If method A needs to call method B, then an entry is allocated in the PLT for
+	method B, and A calls that entry instead of B directly. This is useful because
+	in some cases the runtime needs to do some processing the first time B is
+	called.
+	There are two cases:
+	- if B is in another assembly, then it needs to be looked up, then JITted or the
+	  corresponding AOT code needs to be found.
+	- if B is in the same assembly, but has GOT slots, then the GOT slots need to be
+	  initialized.
+	If neither of these cases applies, then the PLT is not used, and the call is made
+	directly to the native code of the target method.
+	A PLT entry is usually implemented by a jump through a jump table, where the
+	jump table entries are initially filled with the address of a trampoline so
+	the runtime can get control, and after the native code of the called method is
+	created/found, the jump table entry is changed to point to the native code.
+	All PLT entries also embed an integer offset after the jump which indexes into
+	the 'plt_info' table, which stores the information required to find the called
+	method. The PLT is emitted by the emit_plt () function.
+
+** Exception/Debug info
+----------------------------
+
+	Each compiled method has some additional info generated by the JIT, usable 
+	for debugging (IL offset-native offset maps) and exception handling 
+	(saved registers, native offsets of try/catch clauses). Since this info is
+	rarely needed, it is saved into a separate byte array called 'ex_info'.
+
+** Cached metadata
+---------------------------
+
+	When the runtime loads a class, it needs to compute a variety of information
+	which is not readily available in the metadata, like the instance size,
+	vtable, whether the class has a finalizer/type initializer etc. Computing this
+	information requires a lot of time, causes the loading of lots of metadata,
+	and it usually involves the creation of many runtime data structures
+	(MonoMethod/MonoMethodSignature etc), which are long-lived, and usually persist
+	for the lifetime of the app. To avoid this, we compute the required information
+	at AOT compilation time, and save it into the AOT image, into an array called
+	'class_info'. The runtime can query this information using the
+	mono_aot_get_cached_class_info () function, and if the information is available,
+	it can avoid computing it.
+
+** Full AOT mode
+-------------------------
+
+	Some platforms like the iPhone prohibit JITted code, using technical and/or
+	legal means. This is a significant problem for the mono runtime, since it
+	generates a lot of code dynamically, using either the JIT or more low-level
+	code generation macros. To solve this, the AOT compiler is able to function in
+	full-aot or aot-only mode, where it generates and saves all the necessary code
+	in the AOT image, so at runtime, no code needs to be generated.
+	There are two kinds of code which need to be considered:
+	- wrapper methods, that is methods whose IL is generated dynamically by the
+	  runtime. They are handled by generating them in the add_wrappers () function,
+	  then emitting them the same way as the 'normal' methods. The only problem is
+	  that these methods do not have a methoddef token, so we need a separate table
+	  in the AOT image ('wrapper_info') to find their method index.
+	- trampolines and other small hand generated pieces of code. They are handled
+	  in an ad-hoc way in the emit_trampolines () function.
+
 * Performance considerations
 ----------------------------
 
-Using AOT code is a trade-off which might lead to higher or slower performance,
-depending on a lot of circumstances. Some of these are:
-
-- AOT code needs to be loaded from disk before being used, so cold startup of
-  an application using AOT code MIGHT be slower than using JITed code. Warm
-  startup (when the code is already in the machines cache) should be faster.
-  Also, JITing code takes time, and the JIT compiler also need to load 
-  additional metadata for the method from the disk, so startup can be faster
-  even in the cold startup case.
-- AOT code is usually compiled with all optimizations turned on, while JITted
-  code is usually compiled with default optimizations, so the generated code
-  in the AOT case should be faster.
-- JITted code can directly access runtime data structures and helper functions,
-  while AOT code needs to go through an indirection (the GOT) to access them,
-  so it will be slower and somewhat bigger as well.
-- When JITting code, the JIT compiler needs to load a lot of metadata about
-  methods and types into memory.
-- JITted code has better locality, meaning that if A method calls B, then
-  the native code for A and B is usually quite close in memory, leading to
-  better cache behaviour thus improved performance. In contrast, the native
-  code of methods inside the AOT file is in a somewhat random order.
+	Using AOT code is a trade-off which might lead to higher or
+	lower performance, depending on a lot of circumstances. Some
+	of these are:
+	
+	- AOT code needs to be loaded from disk before being used, so
+	  cold startup of an application using AOT code MIGHT be
+	  slower than using JITted code. Warm startup (when the code is
+	  already in the machine's cache) should be faster.  Also,
+	  JITting code takes time, and the JIT compiler also needs to
+	  load additional metadata for the method from the disk, so
+	  startup can be faster even in the cold startup case.
+
+	- AOT code is usually compiled with all optimizations turned
+	  on, while JITted code is usually compiled with default
+	  optimizations, so the generated code in the AOT case should
+	  be faster.
+
+	- JITted code can directly access runtime data structures and
+	  helper functions, while AOT code needs to go through an
+	  indirection (the GOT) to access them, so it will be slower
+	  and somewhat bigger as well.
+
+	- When JITting code, the JIT compiler needs to load a lot of
+	  metadata about methods and types into memory.
 
+	- JITted code has better locality, meaning that if method A
+	  calls B, then the native code for A and B is usually quite
+	  close in memory, leading to better cache behaviour and thus
+	  improved performance. In contrast, the native code of
+	  methods inside the AOT file is in a somewhat random order.
+	
 * Future Work
 -------------
 
-- Currently, the runtime needs to setup some data structures and fill out
-  GOT entries before a method is first called. This means that even calls to
-  a method whose code is in the same AOT image need to go through the GOT,
-  instead of using a direct call.
-- On x86, the generated code uses call 0, pop REG, add GOTOFFSET, REG to 
-  materialize the GOT address. Newer versions of gcc use a separate function
-  to do this, maybe we need to do the same.
-- Currently, we get vtable addresses from the GOT. Another solution would be
-  to store the data from the vtables in the .bss section, so accessing them
-  would involve less indirection.
+	- Currently, when an AOT module is loaded, all of its
+	  dependent assemblies are also loaded eagerly, and these
+	  assemblies need to be exactly the same as the ones loaded
+	  when the AOT module was created ('hard binding'). Non-hard
+	  binding should be allowed.
+
+	- On x86, the generated code uses call 0, pop REG, add
+	  GOTOFFSET, REG to materialize the GOT address. Newer
+	  versions of gcc use a separate function to do this, maybe we
+	  need to do the same.
 
+	- Currently, we get vtable addresses from the GOT. Another
+	  solution would be to store the data from the vtables in the
+	  .bss section, so accessing them would involve less
+	  indirection.
+
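
Note on the 'methods' / 'method_offsets' layout described in the file format
section of this patch: the lookup it implies can be sketched in a few lines of
C. This is only an illustration under stated assumptions, not the code in
aot-runtime.c. The struct, the function name, the NO_CODE sentinel and the
32-bit offset width are all hypothetical, and the toy arrays stand in for the
two symbols that a real loader would resolve with the equivalent of dlsym.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: marks a method index with no precompiled code. */
#define NO_CODE UINT32_MAX

/* Hypothetical view of the two symbols after they have been looked up. */
typedef struct {
	const uint8_t  *methods;         /* one big blob of compiled code */
	const uint32_t *method_offsets;  /* method index -> offset into methods */
	uint32_t        method_count;    /* number of entries in method_offsets */
} AotImageSketch;

/* Return the start of the code for method_index, or NULL if there is none. */
static const uint8_t *
sketch_find_method_code (const AotImageSketch *image, uint32_t method_index)
{
	assert (image != NULL);

	/* Indexes past the end of the offsets array have no code. */
	if (method_index >= image->method_count)
		return NULL;

	uint32_t offset = image->method_offsets [method_index];
	if (offset == NO_CODE)
		return NULL;

	/* No per-method symbol is needed, only the base plus the offset. */
	return image->methods + offset;
}

/* Toy data: three method slots, the middle one has no code. */
static const uint8_t  sketch_methods []        = { 0x90, 0xc3, 0x31, 0xc0, 0xc3 };
static const uint32_t sketch_method_offsets [] = { 0, NO_CODE, 2 };
static const AotImageSketch sketch_image = {
	sketch_methods, sketch_method_offsets, 3
};
```

With this layout, finding the code of a method costs one bounds check and one
array read, whatever the number of methods, which is the saving in symbol
lookups that the patch text describes.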