X-Git-Url: http://wien.tomnetworks.com/gitweb/?a=blobdiff_plain;f=docs%2Fmini-doc.txt;h=ba19c1031f112256c83295ebe53f2ecf6dfb8680;hb=1e3585a07b795e05a4b7f83c8ea383c1884878f4;hp=cfe1a91d365b1ccb52ee51376966bd7df55910fc;hpb=5a710fa941e8d9c64a6228fe173cf50976eed05e;p=mono.git

diff --git a/docs/mini-doc.txt b/docs/mini-doc.txt
index cfe1a91d365..ba19c1031f1 100644
--- a/docs/mini-doc.txt
+++ b/docs/mini-doc.txt
@@ -2,19 +2,36 @@
		A new JIT compiler for the Mono Project

	   Miguel de Icaza (miguel@{ximian.com,gnome.org}),
+	   Paolo Molaro (lupus@{ximian.com,debian.org})
+
+	This document describes the overall design of the Mono JIT up
+	to version 2.0.  After Mono 2.0 the JIT engine was upgraded
+	from a tree-based intermediate representation to a linear
+	intermediate representation.
+
+	The Linear IL is documented here:
+
+		http://www.mono-project.com/Linear_IL
-
* Abstract

	Mini is a new compilation engine for the Mono runtime.  The
	new engine is designed to bring new code generation
-	optimizations, portability and precompilation.
+	optimizations, portability and pre-compilation.

	In this document we describe the design decisions and the
	architecture of the new compilation engine.

* Introduction

+	Mono is an Open Source implementation of the .NET Framework:
+	it is made up of a runtime engine that implements the ECMA
+	Common Language Infrastructure (CLI), a set of compilers that
+	target the CLI and a large collection of class libraries.
+
+	This article discusses the new code generation facilities that
+	have been added to the Mono runtime.
+
	First we discuss the overall architecture of the Mono runtime,
	and how code generation fits into it; then we discuss the
	development and basic architecture of our first JIT compiler
@@ -61,6 +78,24 @@
	CIL code on the fly to marshal parameters, and then this code
	is in turn processed by the JIT engine.
+	Mono has now gone through three different JIT engines; these
+	are:
+
+	* Original JIT engine: 2002, hard to port, hard to
+	  implement optimizations.
+
+	* Second JIT engine, used up until Mono 2.0: very
+	  portable, many new optimizations.
+
+	* Third JIT engine: replaced the code generation layer, moving
+	  it from a tree-based representation to a linear
+	  representation.
+
+	For more information on the code generation changes, see our
+	web site for the details on the Linear IL:
+
+		http://www.mono-project.com/Linear_IL
+
* Previous Experiences

	Mono has built a JIT engine, which has been used to bootstrap
@@ -119,7 +154,7 @@
	inferred.

	At this point the JIT would pass the constructed forest of
-	trees to the architecture-dependant JIT compiler.
+	trees to the architecture-dependent JIT compiler.

	The architecture dependent code then performed register
	allocation (optionally using linear scan allocation for
@@ -153,7 +188,7 @@
	application domains, or generate code that will not be shared
	across application domains.

-* Objectives of the new JIT engine.
+* Second Generation JIT engine.

	We wanted to support a number of features that were missing:
@@ -197,7 +232,7 @@
	necessary that the JIT engine works in most of today's
	commercial hardware platforms.

-* Features of the new JIT engine.
+* Features of the Second JIT engine.

	The new JIT engine was architected by Dietmar Maurer and Paolo
	Molaro, based on the new objectives.
@@ -366,13 +401,14 @@
	variables to registers.  The linear scan pass uses the
	information that was previously gathered by the loop nesting
	and loop structure computation to favor variables in inner
-	loops.
+	loops.  This process updates the basic block `nesting' field
+	which is later used during liveness analysis.

	Stack space is then reserved for the local variables and any
	temporary variables generated during the various
	optimizations.
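
	The linear scan pass described above can be sketched in plain
	C.  This is a hypothetical, heavily simplified model — the
	Interval struct and linear_scan () function below are invented
	for illustration and are not Mono's actual data structures:
	each variable gets a live interval plus a loop-nesting weight,
	and when no register is free the least-nested interval is
	spilled to the stack.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch of a linear scan allocator that favors
 * variables in inner loops.  Invented for illustration; not the
 * actual Mono structures.
 */

#define NREGS 2

typedef struct {
	int start, end;	/* live range, as instruction indexes */
	int nesting;	/* loop nesting depth of the variable's uses */
	int reg;	/* assigned hard register, or -1 if spilled */
} Interval;

/* `vars' must be sorted by interval start point */
static void
linear_scan (Interval *vars, int n)
{
	Interval *active [NREGS] = { NULL };
	int i, r;

	for (i = 0; i < n; i++) {
		Interval *cur = &vars [i];
		Interval *victim;
		int vr = -1;

		/* expire intervals that ended before `cur' starts */
		for (r = 0; r < NREGS; r++)
			if (active [r] && active [r]->end < cur->start)
				active [r] = NULL;

		/* grab a free register if one exists */
		for (r = 0; r < NREGS; r++)
			if (!active [r])
				break;
		if (r < NREGS) {
			active [r] = cur;
			cur->reg = r;
			continue;
		}

		/* no free register: spill the least nested interval */
		victim = cur;
		for (r = 0; r < NREGS; r++)
			if (active [r]->nesting < victim->nesting) {
				victim = active [r];
				vr = r;
			}
		if (victim == cur) {
			cur->reg = -1;	/* cur itself stays on the stack */
		} else {
			victim->reg = -1;
			active [vr] = cur;
			cur->reg = vr;
		}
	}
}
```

	With two registers and three overlapping variables, the
	variable used only at nesting depth 0 loses its register to
	the more deeply nested ones, which is exactly the bias the
	loop nesting information buys.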
-** Instruction selection
+** Instruction selection: Only used up until Mono 2.0

	At this point, the BURS instruction selector is invoked to
	transform the tree-based representation into a list of
@@ -480,7 +516,9 @@
	operations if possible.

	The rest of the code generation is fairly simple: a switch
-	statement is used to generate code for each of the MonoInsts
+	statement is used to generate code for each of the MonoInsts;
+	in the mono/mini/mini-ARCH.c files this method is called
+	"mono_arch_output_basic_block".

	We always try to allocate code in sequence, instead of just using
	malloc.  This way we increase spatial locality which gives a massive
@@ -496,7 +534,7 @@
	The difference is in the set of optimizations that are turned
	on for each mode: Just-in-Time compilation should be as fast
	as possible, while Ahead-of-Time compilation can take as long
-	as required, because this is not done at a time criticial
+	as required, because this is not done at a time critical
	time.

	With AOT compilation, we can afford to turn all of the
@@ -509,7 +547,7 @@
	assembler, which generates a loadable module.

	At execution time, when an assembly is loaded from the disk,
-	the runtime engine will probe for the existance of a
+	the runtime engine will probe for the existence of a
	pre-compiled image.  If the pre-compiled image exists, then it
	is loaded, and the method invocations are resolved to the code
	contained in the loaded module.
@@ -525,18 +563,48 @@

* SSA-based optimizations

-	SSA form simplifies many optimization because each variable has exactly
-	one definition site.  All uses of a variable are "dominated" by its
-	definition, which enables us to implement algorithm like:
+	SSA form simplifies many optimizations because each variable
+	has exactly one definition site.  This means that each
+	variable is only initialized once.

-	* conditional constant propagation
+	For example, code like this:

-	* array bound check removal
+		a = 1
+		..
+		a = 2
+		call (a)

-	* dead code elimination
+	Is internally turned into:

-	And we can implement those algorithm in a efficient way using SSA.
+		a1 = 1
+		..
+		a2 = 2
+		call (a2)

+	In the presence of branches, like:
+
+		if (x)
+			a = 1
+		else
+			a = 2
+
+		call (a)
+
+	The code is turned into:
+
+		if (x)
+			a1 = 1;
+		else
+			a2 = 2;
+		a3 = phi (a1, a2)
+		call (a3)
+
+	All uses of a variable are "dominated" by its definition.
+
+	This representation is useful as it simplifies the
+	implementation of a number of optimizations like conditional
+	constant propagation, array bounds check removal and dead code
+	elimination.

* Register allocation.
@@ -554,7 +622,7 @@

	2) liveness information for the variables

-	3) (optionally) loop info to favour variables that are used in
+	3) (optionally) loop info to favor variables that are used in
	inner loops.

	During the instruction selection phase, symbolic registers are
@@ -567,20 +635,7 @@
	registers, fixed registers and clobbered registers by each
	operation.

-	-----------
-* Bootstrap
-
-	The Mini bootstrap parses the arguments passed on the command
-	line, and initializes the JIT runtime.  Each time the
-	mini_init() routine is invoked, a new Application Domain will
-	be returned.
-
-* Signal handlers
-
-	mono_runtime_install_handlers
-
-* BURG Code Generator Generator
+* BURG Code Generator Generator: Only used up to Mono 2.0

	monoburg was written by Dietmar Maurer.  It is based on the
	papers from Christopher W. Fraser, Robert R. Henry and Todd
@@ -594,6 +649,120 @@
	JIT.  This simplifies the code because we can directly pass
	DAGs and don't need to convert them to trees.

+* Adding IL opcodes: an exercise (from a post by Paolo Molaro)
+
+	mini.c is the file that reads the IL code stream and decides
+	how each IL instruction is implemented
+	(mono_method_to_ir () func), so you always have to add an
+	entry to the big switch inside the function: there are plenty
+	of examples in that file.
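
	As a toy model of that big switch — every identifier below is
	invented for illustration, and the real mono_method_to_ir ()
	does far more bookkeeping — the structure is a loop over the
	opcode stream that dispatches on each opcode and emits
	internal instructions:

```c
#include <assert.h>

/*
 * Toy model of the mono_method_to_ir () "big switch": walk an
 * opcode stream and translate each opcode into an internal
 * instruction.  The opcodes and the Ins struct are invented for
 * this sketch; they are not the real Mono CEE_* values.
 */

enum { TOY_LDC, TOY_ADD, TOY_RET };

typedef struct { int op, arg; } Ins;

/* returns the number of instructions emitted, or -1 on error */
static int
method_to_ir (const int *code, int len, Ins *out)
{
	int i, count = 0;

	for (i = 0; i < len; i++) {
		switch (code [i]) {
		case TOY_LDC:
			/* an immediate operand follows the opcode */
			out [count].op = TOY_LDC;
			out [count++].arg = code [++i];
			break;
		case TOY_ADD:
			out [count].op = TOY_ADD;
			out [count++].arg = 0;
			break;
		case TOY_RET:
			out [count].op = TOY_RET;
			out [count++].arg = 0;
			break;
		default:
			return -1;	/* unknown opcode */
		}
	}
	return count;
}
```

	Adding a new opcode then amounts to adding one more case to
	the switch, which mirrors what the text above asks you to do
	in mini.c.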
+
+	An IL opcode can be implemented in a number of ways, depending
+	on what it does and how it needs to do it.
+
+	Some opcodes are implemented using a helper function: one of
+	the simpler examples is the CEE_STELEM_REF implementation.
+
+	In this case the opcode implementation is written in a C
+	function.  You will need to register the function with the jit
+	before you can use it (mono_register_jit_call) and you need to
+	emit the call to the helper using the mono_emit_jit_icall()
+	function.
+
+	This is the simplest way to add a new opcode and it doesn't
+	require any arch-specific change (though it's limited to what
+	you can do in C code and the performance may be limited by the
+	function call).
+
+	Other opcodes can be implemented with one or more of the already
+	implemented low-level instructions.
+
+	An example is the OP_STRLEN opcode which implements
+	String.Length using a simple load from memory.  In this case
+	you need to add a rule to the appropriate burg file,
+	describing what the arguments of the opcode are and what its
+	'return' value is, if any.
+
+	The OP_STRLEN case is:
+
+	reg: OP_STRLEN (reg) {
+		MONO_EMIT_LOAD_MEMBASE_OP (s, tree, OP_LOADI4_MEMBASE, state->reg1,
+			state->left->reg1, G_STRUCT_OFFSET (MonoString, length));
+	}
+
+	The above means: the OP_STRLEN takes a register as an argument
+	and returns its value in a register.  The implementation of
+	this is included in the braces.
+
+	The opcode returns a value in an integer register
+	(state->reg1) by performing an int32 load of the length field
+	of the MonoString represented by the input register
+	(state->left->reg1): before the burg rules are applied, the
+	internal representation is based on trees, so you get the
+	left/right pointers (state->left and state->right
+	respectively; the result is stored in state->reg1).
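
	In plain C, the load that this rule emits boils down to an
	int32 read at a fixed offset from the object pointer.  A
	minimal sketch, assuming a reduced stand-in layout for
	MonoString (the real object header is more involved):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reduced stand-in for MonoString: only the length field matters
 * here; the real Mono object layout is more involved.
 */
typedef struct {
	void *vtable;
	void *sync;
	int32_t length;
} ToyMonoString;

/*
 * What the emitted OP_LOADI4_MEMBASE amounts to: an int32 load
 * from the base register plus the offset of the length field.
 */
static int32_t
toy_strlen (const ToyMonoString *s)
{
	return *(const int32_t *) ((const char *) s + offsetof (ToyMonoString, length));
}
```

	G_STRUCT_OFFSET in the rule plays the role offsetof () plays
	here: it turns the field name into the constant displacement
	baked into the load instruction.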
+
+	This instruction implementation doesn't require arch-specific
+	changes (it is using the MONO_EMIT_LOAD_MEMBASE_OP which is
+	available on all platforms), and usually the produced code is
+	fast.
+
+	Next we have opcodes that must be implemented with new low-level
+	architecture specific instructions (either because of performance
+	considerations or because the functionality can't be implemented
+	in other ways).
+
+	You need a burg rule in this case, too.  For example,
+	consider the OP_CHECK_THIS opcode (used to raise an exception
+	if the this pointer is null).  The burg rule simply reads:
+
+	stmt: OP_CHECK_THIS (reg) {
+		mono_bblock_add_inst (s->cbb, tree);
+	}
+
+	Note that this opcode does not return a value (hence the
+	"stmt") and it takes a register as input.
+
+	mono_bblock_add_inst (s->cbb, tree) just adds the instruction
+	(the tree variable) to the current basic block (s->cbb).  In
+	mini this is the place where the internal representation
+	switches from the tree format to the low-level format (the
+	list of simple instructions).
+
+	In this case the actual opcode implementation is delegated to
+	the arch-specific code.  A low-level opcode needs an entry in
+	the machine description (the *.md files in mini/).  This entry
+	describes what kinds of registers, if any, are used by the
+	instruction, as well as other details such as constraints or
+	other hints to the low-level engine which are architecture
+	specific.
+
+	cpu-pentium.md, for example, has the following entry:
+
+		checkthis: src1:b len:3
+
+	This means the instruction uses an integer register as a base
+	pointer (basically a load or store is done on it) and it takes
+	3 bytes of native code to implement it.
+
+	Now you just need to provide the low-level implementation for
+	the opcode in one of the mini-$arch.c files, in the
+	mono_arch_output_basic_block() function.  There is a big switch
+	here too.
+	The x86 implementation is:
+
+	case OP_CHECK_THIS:
+		/* ensure ins->sreg1 is not NULL */
+		x86_alu_membase_imm (code, X86_CMP, ins->sreg1, 0, 0);
+		break;
+
+	If the $arch-codegen.h header file doesn't have the code to
+	emit the low-level native code, you'll need to write that as
+	well.
+
+	Complex opcodes with register constraints may require other
+	changes to the local register allocator, but usually they are
+	not needed.
+
* Future

	Profile-based optimization is something that we are very
@@ -625,4 +794,4 @@
	processors, and some of the framework exists today in our
	register allocator and the instruction selector to cope with
	this, but has not been finished.  The instruction selection
-	would happen at the same time as local register allocation.
\ No newline at end of file
+	would happen at the same time as local register allocation.