X-Git-Url: http://wien.tomnetworks.com/gitweb/?a=blobdiff_plain;f=docs%2Fjit-thoughts;h=106469dc13e392ea0765f28dd9b8e20e5fff1cbd;hb=52e98abd6e5c6d10ddea91a529f7b1b2336e0696;hp=5d209f042311707fa274e8e48a5e621e5c44a078;hpb=c93716ea1994eb1140ebca099a25511b6854d60c;p=mono.git diff --git a/docs/jit-thoughts b/docs/jit-thoughts index 5d209f04231..106469dc13e 100644 --- a/docs/jit-thoughts +++ b/docs/jit-thoughts @@ -11,50 +11,6 @@ We are designing a JIT compiler, so we have to consider two things: The current approach is to keep the JITer as simple as possible, and thus as fast as possible. The generated code quality will suffer from that. -We do not map local variables to registers at the moment, and this makes the -whole JIT much easier, for example we do not need to identify basic block -boundaries or the lifetime of local variables, or select the variables which -are worth to put into a register. - -Register allocation is thus done only inside the trees of the forest, and each -tree can use the full set of registers. We simply split a tree if we get out of -registers, for example the following tree: - - - add(R0) - / \ - / \ - a(R0) add(R1) - / \ - / \ - b(R1) add(R2) - / \ - / \ - c(R2) b(R3) - -can be transformed to: - - - stloc(t1) add(R0) - | / \ - | / \ - add(R0) a(R0) add(R1) - / \ / \ - / \ / \ - c(R0) b(R1) b(R1) t1(R2) - - -Please notice that the split trees use less registers than the original -tree. - -Triggering JIT compilation: -=========================== - -The current approach is to call functions indirectly. The address to call is -stored in the MonoMethod structure. For each method we create a trampoline -function. When called, this function does the JIT compilation and replaces the -trampoline with the compiled method address. - Register Allocation: ==================== @@ -63,16 +19,6 @@ allocation. For example this is needed by call, which return the value always in EAX on x86. The current implementation works without such system, due to special forest generation. - -X86 Register Allocation: -======================== - -We can use 8bit or 16bit registers on the x86. If we use that feature we have -more registers to allocate, which maybe prevents some register spills. We -currently ignore that ability and always allocate 32 bit registers, because I -think we would gain very little from that optimisation and it would complicate -the code. - Different Register Sets: ======================== @@ -89,26 +35,44 @@ call class methods for ALU operations like add or sub. Sure, this method will be be a bit inefficient. The more performant solution is to allocate two 32bit registers for each 64bit -value. We add a new non terminal to the monoburg grammar called long_reg. The +value. We add a new non terminal to the monoburg grammar called "lreg". The register allocation routines takes care of this non terminal and allocates two -registers for them. - +32 bit registers for them. Forest generation: ================== -It seems that trees generated from the CIL language have some special -properties, i.e. the trees already represents basic blocks, so there can be no -branches to the inside of such a tree. All results of those trees are stored to -memory. +Consider the following code: + +OPCODE: STACK LOCALS +LDLOC.0 (5) [5,0] +LDC.1 (5,1) [5,0] +STLOC.0 (5) [1,0] +STLOC.1 () [1,5] + +A simple forest generation generates: + +STLOC.0(LDC.1) +STLOC.1(LDLOC.0) + +Which is wrong, since it stores the wrong value (1 instead of 5). Instead we +must generate something like: + +STLOC.TMP(LDLOC.0) +STLOC.0(LDC.1) +STLOC.1(LDLOC.TMP) + +Where STLOC.TMP saves the value into a new temporary variable. -One idea was to drive the code generation directly from the CIL code, without -generating an intermediate forest of trees. I think this is not possible, -because you always have to gather some attributes and attach it to the -instruction (for example the register allocation info). So I thing generating a -tree is the right thing and that also works perfectly with monoburg. IMO we -would not get any benefit from trying to feed monoburg directly with CIL -instructions. +We also need a similar solution for basic block boundaries when the stack depth +is not zero. We can simply save those values to new temporary values. Consider +the following basic block with one instruction: + +LDLOC.1 +This should generate a tree like: + +STLOC.TMP(LDLOC.1) Please notice that an intelligent register allocator can +still allocate registers for those new variables. DAG handling: ============= @@ -141,11 +105,21 @@ STLOC(ADD (LDLOC, LDLOC)) This is what lcc is doing, if I understood 12.8, page 342, 343? -Value Types: -============ +Possible Optimisations: +======================= + +Miguel said ORP does some optimisation on IL level, for example moving array +bounds checking out of loops: + +for (i = 0; i < N; i++) { check_range (a, i); a [i] = X; } + +id transformed to: + +if (in_range (a, 0, N)) { for (i = 0; i < N; i++) a[i] = X; } +else for (i = 0; i < N; i++) { check_range (a, i); a [i] = X; } + +The "else" is only to keep original semantics (exception handling). -The only CLI instructions which can handle value types are loads and stores, -either to local variable, to the stack or to array elements. Value types with a -size smaller than sizeof(int) are handled like any other basic type. For other -value types we load the base address and emit block copies to store them. +We need loop detection logic in order to implement this (dominator tree). +AFAIK CACAO also implements this. \ No newline at end of file