Fix LinearGradientMode parameter validation to match corefx (#5672)

diff --git a/docs/jit-thoughts b/docs/jit-thoughts

index 4d70b189bcdf1c68315ae22c024322ba8ad7f40a..106469dc13e392ea0765f28dd9b8e20e5fff1cbd 100644 (file)
--- a/docs/jit-thoughts
+++ b/docs/jit-thoughts
@@ -1,18 +1,82 @@
  Just some thoughts for the JITer:
  
-X86 register allocation:
+General issues:
+===============
+
+We are designing a JIT compiler, so we have to consider two things:
+
+- the quality of the generated code
+- the time needed to generate that code
+
+The current approach is to keep the JITer as simple as possible, and thus as
+fast as possible. The generated code quality will suffer from that.
+
+Register Allocation:
+====================
+
+With lcc you can assign a fixed register to a tree before register
+allocation. For example, this is needed by calls, which always return the
+value in EAX on x86. The current implementation works without such a system,
+due to special forest generation.
+
+Different Register Sets:
  ========================
  
-We can use 8bit or 16bit registers on the x86. If we use that feature we have
-more registers to allocate, which maybe prevents some register spills. We
-currently ignore that ability and always allocate 32 bit registers, because I
-think we would gain very little from that optimisation and it would complicate
-the code.
+Most processors have more than one register set, at least one for floating
+point values, and one for integers. Should we support architectures with more
+than two sets? Does anyone know of such an architecture?
+
+64bit Integer Values:
+=====================
  
+I can imagine two different implementations. One possibility would be to treat
+long (64bit) values simply like any other value type. This implies that we
+call class methods for ALU operations like add or sub. Sure, this method will
+be a bit inefficient.
+
+The more performant solution is to allocate two 32bit registers for each 64bit
+value. We add a new nonterminal to the monoburg grammar called "lreg". The
+register allocation routines take care of this nonterminal and allocate two
+32bit registers for it.
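
To illustrate the two-register approach, here is a minimal C sketch of 64bit add and sub carried out on two 32bit halves, the way an add/adc or sub/sbb instruction pair would do it on x86. The names lreg_pair, lreg_add, lreg_sub, lreg_from_u64 and lreg_to_u64 are hypothetical and exist only for this example; they are not part of monoburg or the JIT.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an "lreg": one 64bit value held in two 32bit registers. */
typedef struct {
    uint32_t lo; /* low 32 bits */
    uint32_t hi; /* high 32 bits */
} lreg_pair;

lreg_pair lreg_from_u64(uint64_t v)
{
    lreg_pair p;
    p.lo = (uint32_t)(v & 0xFFFFFFFFu);
    p.hi = (uint32_t)(v >> 32);
    return p;
}

uint64_t lreg_to_u64(lreg_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Add the low words, then add the high words plus the carry out of the low add. */
lreg_pair lreg_add(lreg_pair a, lreg_pair b)
{
    lreg_pair r;
    uint32_t carry;
    r.lo = a.lo + b.lo;
    carry = r.lo < a.lo; /* unsigned wrap-around means a carry occurred */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Subtract the low words, then subtract the high words minus the borrow. */
lreg_pair lreg_sub(lreg_pair a, lreg_pair b)
{
    lreg_pair r;
    uint32_t borrow = a.lo < b.lo; /* low subtraction would wrap around */
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - borrow;
    return r;
}

/* Small sanity check comparing against native 64bit arithmetic. */
void lreg_selfcheck(void)
{
    uint64_t x = 0x00000001FFFFFFFFull, y = 0x0000000000000001ull;
    assert(lreg_to_u64(lreg_add(lreg_from_u64(x), lreg_from_u64(y))) == x + y);
    assert(lreg_to_u64(lreg_sub(lreg_from_u64(x), lreg_from_u64(y))) == x - y);
}
```

The point of the sketch is that each 64bit ALU operation becomes two 32bit operations linked by a carry or borrow, so no method call is needed.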
  
  Forest generation:
  ==================
  
+Consider the following code: 
+
+OPCODE:                STACK           LOCALS
+LDLOC.0                (5)             [5,0]
+LDC.1          (5,1)           [5,0]
+STLOC.0                (5)             [1,0]
+STLOC.1                ()              [1,5]
+
+A simple forest generation generates: 
+
+STLOC.0(LDC.1)
+STLOC.1(LDLOC.0)
+
+This is wrong, since it stores the wrong value into local 1 (1 instead of 5).
+Instead we must generate something like:
+
+STLOC.TMP(LDLOC.0)
+STLOC.0(LDC.1)
+STLOC.1(LDLOC.TMP)
+
+Where STLOC.TMP saves the value into a new temporary variable. 
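
One possible way to get this behaviour is sketched below in C: the forest builder simulates the evaluation stack, and before emitting a store to a local it spills every pending load of that same local into a fresh temporary. This is only a sketch under assumptions, not the actual implementation; forest_builder, fb_ldloc, fb_ldc and fb_stloc are hypothetical names, and temporaries are numbered (TMP0, TMP1, ...) so that several of them can coexist.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { MAX_STACK = 16, OUT_SIZE = 512 };

typedef enum { N_CONST, N_LOCAL, N_TEMP } node_kind;
typedef struct { node_kind kind; int index; } node;

typedef struct {
    node stack[MAX_STACK]; /* simulated evaluation stack */
    int sp;                /* number of entries on the stack */
    int next_temp;         /* next free temporary number */
    char out[OUT_SIZE];    /* emitted trees, one per line */
} forest_builder;

void fb_init(forest_builder *fb) { memset(fb, 0, sizeof *fb); }

static void format_node(char *buf, size_t size, node n)
{
    switch (n.kind) {
    case N_CONST: snprintf(buf, size, "LDC.%d", n.index); break;
    case N_LOCAL: snprintf(buf, size, "LDLOC.%d", n.index); break;
    case N_TEMP:  snprintf(buf, size, "LDLOC.TMP%d", n.index); break;
    }
}

/* Append one tree of the form DEST<index>(<value>) to the output. */
static void emit(forest_builder *fb, const char *dest, int index, node value)
{
    char operand[32];
    size_t used = strlen(fb->out);
    format_node(operand, sizeof operand, value);
    snprintf(fb->out + used, sizeof fb->out - used, "%s%d(%s)\n", dest, index, operand);
}

static void fb_push(forest_builder *fb, node_kind kind, int index)
{
    assert(fb->sp < MAX_STACK);
    fb->stack[fb->sp].kind = kind;
    fb->stack[fb->sp].index = index;
    fb->sp++;
}

void fb_ldloc(forest_builder *fb, int local) { fb_push(fb, N_LOCAL, local); }
void fb_ldc(forest_builder *fb, int value)   { fb_push(fb, N_CONST, value); }

void fb_stloc(forest_builder *fb, int local)
{
    node value;
    int i;
    assert(fb->sp > 0);
    value = fb->stack[--fb->sp];
    /* Any pending load of this local must see the old value: save it first. */
    for (i = 0; i < fb->sp; i++) {
        if (fb->stack[i].kind == N_LOCAL && fb->stack[i].index == local) {
            int t = fb->next_temp++;
            emit(fb, "STLOC.TMP", t, fb->stack[i]);
            fb->stack[i].kind = N_TEMP;
            fb->stack[i].index = t;
        }
    }
    emit(fb, "STLOC.", local, value);
}
```

Feeding the sequence LDLOC.0, LDC.1, STLOC.0, STLOC.1 through fb_ldloc, fb_ldc and fb_stloc yields the three trees shown above, with TMP0 in place of TMP.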
+
+We also need a similar solution for basic block boundaries when the stack depth
+is not zero. We can simply save those values to new temporary variables. Consider
+the following basic block with one instruction:
+
+LDLOC.1
+
+This should generate a tree like:
+
+STLOC.TMP(LDLOC.1)
+
+Please notice that an intelligent register allocator can
+still allocate registers for those new variables.
+
+DAG handling:
+=============
+
  Monoburg can't handle DAGs, instead we need real trees as input for
  the code generator. So we have two problems:
  
@@ -41,3 +105,21 @@ STLOC(ADD (LDLOC, LDLOC))
  
  This is what lcc is doing, if I understood 12.8, page 342, 343?
  
+Possible Optimisations:
+=======================
+
+Miguel said ORP does some optimisation on IL level, for example moving array
+bounds checking out of loops:
+
+for (i = 0; i < N; i++) { check_range (a, i); a [i] = X; }
+
+is transformed to:
+
+if (in_range (a, 0, N)) { for (i = 0; i < N; i++) a[i] = X; }  
+else for (i = 0; i < N; i++) { check_range (a, i); a [i] = X; }
+
+The "else" is only to keep original semantics (exception handling).
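
The transformation can be written out as a small, self-contained C sketch. The int_array type and the functions fill_checked and fill_hoisted are hypothetical names used only for illustration; a failed range check returns -1 here as a stand-in for throwing an exception.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical managed array: data plus its length. */
typedef struct {
    int *data;
    size_t length;
} int_array;

/* Per-access check: nonzero when index i is valid. */
int check_range(const int_array *a, size_t i)
{
    return i < a->length;
}

/* Whole-loop check: nonzero when every index in [first, end) is valid. */
int in_range(const int_array *a, size_t first, size_t end)
{
    return first <= end && end <= a->length;
}

/* Original loop: one range check per iteration. */
int fill_checked(int_array *a, size_t n, int x)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (!check_range(a, i))
            return -1; /* stands in for the exception */
        a->data[i] = x;
    }
    return 0;
}

/* Transformed loop: the check is hoisted out of the fast path. */
int fill_hoisted(int_array *a, size_t n, int x)
{
    size_t i;
    if (in_range(a, 0, n)) {
        for (i = 0; i < n; i++)
            a->data[i] = x; /* no per-iteration check needed */
        return 0;
    }
    /* Slow path keeps the original semantics, including partial writes. */
    return fill_checked(a, n, x);
}
```

Both versions behave the same from the outside: when n exceeds the array length, the elements before the failing index are still written before the failure is reported.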
+
+We need loop detection logic in order to implement this (dominator tree).
+
+AFAIK CACAO also implements this.
+\ No newline at end of file