X-Git-Url: http://wien.tomnetworks.com/gitweb/?a=blobdiff_plain;f=docs%2Fmini-doc.txt;h=ba19c1031f112256c83295ebe53f2ecf6dfb8680;hb=1e3585a07b795e05a4b7f83c8ea383c1884878f4;hp=cfe1a91d365b1ccb52ee51376966bd7df55910fc;hpb=5a710fa941e8d9c64a6228fe173cf50976eed05e;p=mono.git

diff --git a/docs/mini-doc.txt b/docs/mini-doc.txt
index cfe1a91d365..ba19c1031f1 100644
--- a/docs/mini-doc.txt
+++ b/docs/mini-doc.txt
@@ -2,19 +2,36 @@
		A new JIT compiler for the Mono Project

	   Miguel de Icaza (miguel@{ximian.com,gnome.org}),
+	   Paolo Molaro (lupus@{ximian.com,debian.org})
+
+	This document describes the overall design of the Mono JIT up
+	to version 2.0.  After Mono 2.0 the JIT engine was upgraded
+	from a tree-based intermediate representation to a linear
+	intermediate representation.
+
+	The Linear IL is documented here:
+
+		http://www.mono-project.com/Linear_IL
-
* Abstract

	Mini is a new compilation engine for the Mono runtime.  The
	new engine is designed to bring new code generation
-	optimizations, portability and precompilation.
+	optimizations, portability and pre-compilation.

	In this document we describe the design decisions and the
	architecture of the new compilation engine.

* Introduction

+	Mono is an Open Source implementation of the .NET Framework:
+	it is made up of a runtime engine that implements the ECMA
+	Common Language Infrastructure (CLI), a set of compilers that
+	target the CLI and a large collection of class libraries.
+
+	This article discusses the new code generation facilities that
+	have been added to the Mono runtime.
+
	First we discuss the overall architecture of the Mono runtime,
	and how code generation fits into it; then we discuss the
	development and basic architecture of our first JIT compiler
@@ -61,6 +78,24 @@
	CIL code on the fly to marshal parameters, and then this code
	is in turn processed by the JIT engine.
+	Mono has now gone through three different JIT engines; these
+	are:
+
+	* Original JIT engine: 2002, hard to port, hard to
+	  implement optimizations.
+
+	* Second JIT engine, used up until Mono 2.0: very
+	  portable, many new optimizations.
+
+	* Third JIT engine: replaced the code generation layer, moving
+	  it from a tree-based representation to a linear
+	  representation.
+
+	For more information on the code generation changes, see our
+	web site for the details on the Linear IL:
+
+		http://www.mono-project.com/Linear_IL
+
* Previous Experiences

	Mono has built a JIT engine, which has been used to bootstrap
@@ -119,7 +154,7 @@
	inferred.

	At this point the JIT would pass the constructed forest of
-	trees to the architecture-dependant JIT compiler.
+	trees to the architecture-dependent JIT compiler.

	The architecture dependent code then performed register
	allocation (optionally using linear scan allocation for
@@ -153,7 +188,7 @@
	application domains, or generate code that will not be shared
	across application domains.

-* Objectives of the new JIT engine.
+* Second Generation JIT engine.

	We wanted to support a number of features that were missing:
@@ -197,7 +232,7 @@
	necessary that the JIT engine works in most of today's
	commercial hardware platforms.

-* Features of the new JIT engine.
+* Features of the Second JIT engine.

	The new JIT engine was architected by Dietmar Maurer and Paolo
	Molaro, based on the new objectives.
@@ -366,13 +401,14 @@
	variables to registers.  The linear scan pass uses the
	information that was previously gathered by the loop nesting
	and loop structure computation to favor variables in inner
-	loops.
+	loops.  This process updates the basic block `nesting' field
+	which is later used during liveness analysis.

	Stack space is then reserved for the local variables and any
	temporary variables generated during the various
	optimizations.
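
	The linear scan pass described above can be sketched in plain
	C.  This is a hypothetical, heavily simplified model — the
	Interval struct and linear_scan () function below are invented
	for illustration and are not Mono's actual data structures:
	each variable gets a live interval plus a loop-nesting weight,
	and when no register is free the least-nested interval is
	spilled to the stack.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch of a linear scan allocator that favors
 * variables in inner loops.  Invented for illustration; not the
 * actual Mono structures.
 */

#define NREGS 2

typedef struct {
	int start, end;	/* live range, as instruction indexes */
	int nesting;	/* loop nesting depth of the variable's uses */
	int reg;	/* assigned hard register, or -1 if spilled */
} Interval;

/* `vars' must be sorted by interval start point */
static void
linear_scan (Interval *vars, int n)
{
	Interval *active [NREGS] = { NULL };
	int i, r;

	for (i = 0; i < n; i++) {
		Interval *cur = &vars [i];
		Interval *victim;
		int vr = -1;

		/* expire intervals that ended before `cur' starts */
		for (r = 0; r < NREGS; r++)
			if (active [r] && active [r]->end < cur->start)
				active [r] = NULL;

		/* grab a free register if one exists */
		for (r = 0; r < NREGS; r++)
			if (!active [r])
				break;
		if (r < NREGS) {
			active [r] = cur;
			cur->reg = r;
			continue;
		}

		/* no free register: spill the least nested interval */
		victim = cur;
		for (r = 0; r < NREGS; r++)
			if (active [r]->nesting < victim->nesting) {
				victim = active [r];
				vr = r;
			}
		if (victim == cur) {
			cur->reg = -1;	/* cur itself stays on the stack */
		} else {
			victim->reg = -1;
			active [vr] = cur;
			cur->reg = vr;
		}
	}
}
```

	With two registers and three overlapping variables, the
	variable used only at nesting depth 0 loses its register to
	the more deeply nested ones, which is exactly the bias the
	loop nesting information buys.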
-** Instruction selection
+** Instruction selection: Only used up until Mono 2.0

	At this point, the BURS instruction selector is invoked to
	transform the tree-based representation into a list of
@@ -480,7 +516,9 @@
	operations if possible.

	The rest of the code generation is fairly simple: a switch
-	statement is used to generate code for each of the MonoInsts
+	statement is used to generate code for each of the MonoInsts;
+	in the mono/mini/mini-ARCH.c files this method is called
+	"mono_arch_output_basic_block".

	We always try to allocate code in sequence, instead of just using
	malloc.  This way we increase spatial locality which gives a massive
@@ -496,7 +534,7 @@
	The difference is in the set of optimizations that are turned
	on for each mode: Just-in-Time compilation should be as fast
	as possible, while Ahead-of-Time compilation can take as long
-	as required, because this is not done at a time criticial
+	as required, because this is not done at a time critical
	time.

	With AOT compilation, we can afford to turn all of the
@@ -509,7 +547,7 @@
	assembler, which generates a loadable module.

	At execution time, when an assembly is loaded from the disk,
-	the runtime engine will probe for the existance of a
+	the runtime engine will probe for the existence of a
	pre-compiled image.  If the pre-compiled image exists, then it
	is loaded, and the method invocations are resolved to the code
	contained in the loaded module.
@@ -525,18 +563,48 @@

* SSA-based optimizations

-	SSA form simplifies many optimization because each variable has exactly
-	one definition site.  All uses of a variable are "dominated" by its
-	definition, which enables us to implement algorithm like:
+	SSA form simplifies many optimizations because each variable
+	has exactly one definition site.  This means that each
+	variable is only initialized once.

-	* conditional constant propagation
+	For example, code like this:

-	* array bound check removal
+		a = 1
+		..
+		a = 2
+		call (a)

-	* dead code elimination
+	Is internally turned into:

-	And we can implement those algorithm in a efficient way using SSA.
+		a1 = 1
+		..
+		a2 = 2
+		call (a2)

+	In the presence of branches, like:
+
+		if (x)
+			a = 1
+		else
+			a = 2
+
+		call (a)
+
+	The code is turned into:
+
+		if (x)
+			a1 = 1;
+		else
+			a2 = 2;
+		a3 = phi (a1, a2)
+		call (a3)
+
+	All uses of a variable are "dominated" by its definition.
+
+	This representation is useful as it simplifies the
+	implementation of a number of optimizations like conditional
+	constant propagation, array bounds check removal and dead code
+	elimination.

* Register allocation.
@@ -554,7 +622,7 @@

	2) liveness information for the variables

-	3) (optionally) loop info to favour variables that are used in
+	3) (optionally) loop info to favor variables that are used in
	inner loops.

	During the instruction selection phase, symbolic registers are
@@ -567,20 +635,7 @@
	registers, fixed registers and clobbered registers by each
	operation.

-	-----------
-* Bootstrap
-
-	The Mini bootstrap parses the arguments passed on the command
-	line, and initializes the JIT runtime.  Each time the
-	mini_init() routine is invoked, a new Application Domain will
-	be returned.
-
-* Signal handlers
-
-	mono_runtime_install_handlers
-
-* BURG Code Generator Generator
+* BURG Code Generator Generator: Only used up to Mono 2.0

	monoburg was written by Dietmar Maurer.  It is based on the
	papers from Christopher W. Fraser, Robert R. Henry and Todd
@@ -594,6 +649,120 @@
	JIT.  This simplifies the code because we can directly pass
	DAGs and don't need to convert them to trees.

+* Adding IL opcodes: an exercise (from a post by Paolo Molaro)
+
+	mini.c is the file that reads the IL code stream and decides
+	how each IL instruction is implemented
+	(mono_method_to_ir () func), so you always have to add an
+	entry to the big switch inside the function: there are plenty
+	of examples in that file.
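
	As a toy model of that big switch — every identifier below is
	invented for illustration, and the real mono_method_to_ir ()
	does far more bookkeeping — the structure is a loop over the
	opcode stream that dispatches on each opcode and emits
	internal instructions:

```c
#include <assert.h>

/*
 * Toy model of the mono_method_to_ir () "big switch": walk an
 * opcode stream and translate each opcode into an internal
 * instruction.  The opcodes and the Ins struct are invented for
 * this sketch; they are not the real Mono CEE_* values.
 */

enum { TOY_LDC, TOY_ADD, TOY_RET };

typedef struct { int op, arg; } Ins;

/* returns the number of instructions emitted, or -1 on error */
static int
method_to_ir (const int *code, int len, Ins *out)
{
	int i, count = 0;

	for (i = 0; i < len; i++) {
		switch (code [i]) {
		case TOY_LDC:
			/* an immediate operand follows the opcode */
			out [count].op = TOY_LDC;
			out [count++].arg = code [++i];
			break;
		case TOY_ADD:
			out [count].op = TOY_ADD;
			out [count++].arg = 0;
			break;
		case TOY_RET:
			out [count].op = TOY_RET;
			out [count++].arg = 0;
			break;
		default:
			return -1;	/* unknown opcode */
		}
	}
	return count;
}
```

	Adding a new opcode then amounts to adding one more case to
	the switch, which mirrors what the text above asks you to do
	in mini.c.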
+
+	An IL opcode can be implemented in a number of ways, depending
+	on what it does and how it needs to do it.
+
+	Some opcodes are implemented using a helper function: one of
+	the simpler examples is the CEE_STELEM_REF implementation.
+
+	In this case the opcode implementation is written in a C
+	function.  You will need to register the function with the jit
+	before you can use it (mono_register_jit_call) and you need to
+	emit the call to the helper using the mono_emit_jit_icall()
+	function.
+
+	This is the simplest way to add a new opcode and it doesn't
+	require any arch-specific change (though it's limited to what
+	you can do in C code and the performance may be limited by the
+	function call).
+
+	Other opcodes can be implemented with one or more of the already
+	implemented low-level instructions.
+
+	An example is the OP_STRLEN opcode which implements
+	String.Length using a simple load from memory.  In this case
+	you need to add a rule to the appropriate burg file,
+	describing what the arguments of the opcode are and what its
+	'return' value is, if any.
+
+	The OP_STRLEN case is:
+
+	reg: OP_STRLEN (reg) {
+		MONO_EMIT_LOAD_MEMBASE_OP (s, tree, OP_LOADI4_MEMBASE, state->reg1,
+			state->left->reg1, G_STRUCT_OFFSET (MonoString, length));
+	}
+
+	The above means: the OP_STRLEN takes a register as an argument
+	and returns its value in a register.  The implementation of
+	this is included in the braces.
+
+	The opcode returns a value in an integer register
+	(state->reg1) by performing an int32 load of the length field
+	of the MonoString represented by the input register
+	(state->left->reg1): before the burg rules are applied, the
+	internal representation is based on trees, so you get the
+	left/right pointers (state->left and state->right
+	respectively; the result is stored in state->reg1).
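
	In plain C, the load that this rule emits boils down to an
	int32 read at a fixed offset from the object pointer.  A
	minimal sketch, assuming a reduced stand-in layout for
	MonoString (the real object header is more involved):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reduced stand-in for MonoString: only the length field matters
 * here; the real Mono object layout is more involved.
 */
typedef struct {
	void *vtable;
	void *sync;
	int32_t length;
} ToyMonoString;

/*
 * What the emitted OP_LOADI4_MEMBASE amounts to: an int32 load
 * from the base register plus the offset of the length field.
 */
static int32_t
toy_strlen (const ToyMonoString *s)
{
	return *(const int32_t *) ((const char *) s + offsetof (ToyMonoString, length));
}
```

	G_STRUCT_OFFSET in the rule plays the role offsetof () plays
	here: it turns the field name into the constant displacement
	baked into the load instruction.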
+
+	This instruction implementation doesn't require arch-specific
+	changes (it is using the MONO_EMIT_LOAD_MEMBASE_OP which is
+	available on all platforms), and usually the produced code is
+	fast.
+
+	Next we have opcodes that must be implemented with new low-level
+	architecture specific instructions (either because of performance
+	considerations or because the functionality can't be implemented
+	in other ways).
+
+	You need a burg rule in this case, too.  For example,
+	consider the OP_CHECK_THIS opcode (used to raise an exception
+	if the this pointer is null).  The burg rule simply reads:
+
+	stmt: OP_CHECK_THIS (reg) {
+		mono_bblock_add_inst (s->cbb, tree);
+	}
+
+	Note that this opcode does not return a value (hence the
+	"stmt") and it takes a register as input.
+
+	mono_bblock_add_inst (s->cbb, tree) just adds the instruction
+	(the tree variable) to the current basic block (s->cbb).  In
+	mini this is the place where the internal representation
+	switches from the tree format to the low-level format (the
+	list of simple instructions).
+
+	In this case the actual opcode implementation is delegated to
+	the arch-specific code.  A low-level opcode needs an entry in
+	the machine description (the *.md files in mini/).  This entry
+	describes what kinds of registers, if any, are used by the
+	instruction, as well as other details such as constraints or
+	other hints to the low-level engine which are architecture
+	specific.
+
+	cpu-pentium.md, for example, has the following entry:
+
+		checkthis: src1:b len:3
+
+	This means the instruction uses an integer register as a base
+	pointer (basically a load or store is done on it) and it takes
+	3 bytes of native code to implement it.
+
+	Now you just need to provide the low-level implementation for
+	the opcode in one of the mini-$arch.c files, in the
+	mono_arch_output_basic_block() function.  There is a big switch
+	here too.
+	The x86 implementation is:
+
+	case OP_CHECK_THIS:
+		/* ensure ins->sreg1 is not NULL */
+		x86_alu_membase_imm (code, X86_CMP, ins->sreg1, 0, 0);
+		break;
+
+	If the $arch-codegen.h header file doesn't have the code to
+	emit the low-level native code, you'll need to write that as
+	well.
+
+	Complex opcodes with register constraints may require other
+	changes to the local register allocator, but usually they are
+	not needed.
+
* Future

	Profile-based optimization is something that we are very
@@ -625,4 +794,4 @@
	processors, and some of the framework exists today in our
	register allocator and the instruction selector to cope with
	this, but has not been finished.  The instruction selection
-	would happen at the same time as local register allocation.
\ No newline at end of file
+	would happen at the same time as local register allocation.