doc/design_onstack_replacement.txt

   1 ONSTACK REPLACEMENT IN CACAO
   2 ============================
   3 Author: Edwin Steiner
   4
   5 ** This is a design sketch! **
   6
   7
   8 Overview
   9 --------
  10
  11 On the whole, method replacement works like this:
  12
  13   * a method foo is compiled into a replaceable version fooV1.  fooV1
  14   has a set of "replacement points" and for each replacement point it
  15   has a description (either a data structure or generated code) how to
  16   map the "execution state" at this point to the "source state".
  17
  18   when the compiler decides to replace fooV2, it does the
  19   following:
  20
  21   * a new version fooV2 of the method is compiled. Together with fooV2
  22   the compiler creates a "replacement mapping" that maps each
  23   replacement point of fooV1 to a point in fooV2 and specifies how the
  24   source state has to be mapped to the execution state at this point.
  25
  26   * fooV2 replaces fooV1 in the vftbl. From now on all threads that
  27   enter foo will got to fooV2.
  28
  29   IF (per-replacement-point replacement-out code is needed)
  30       * If not already done, the compiler creates a chunk of
  31       "replacement-out code" for each replacement point of fooV1. If
  32       there is pre-generated replacement-out code it may have fields
  33       that are set now.
  34   ENDIF
  35
  36   * each replacement point in fooV1 is patched with a jump (or bsr)
  37     to the replacement-out code.
  38
  39   * eventually threads will reach replacement points in fooV1 and be
  40   "beamed" to fooV2.
  41
  42   * fooV1 has to be kept around undefinitely because it is not possible
  43   (not feasible?) to determine if there is a thread that may reach fooV1
  44   code in the future.
  45
  46
  47 Replacement Points
  48 ------------------
  49
  50 Replacement points must be placed in a way such that there is a _static_
  51 upper bound to the number of instructions that a thread must execute
  52 _inside the method body_ until it reaches a replacement point.
  53
  54 NOTE that this does _not_ place an upper bound on the _time_ it takes to
  55 reach a replacement point, because execution can enter a blocking
  56 instruction or a method call at any point and take a possibly unlimited
  57 amount of time to complete that.
  58
  59 CACAO will place a replacement point at method entry and at each target
  60 of a backward branch.
  61
  62 XXX Is the replacement point at the method entry necessary?
  63 Replacement-at-entry is performed by the vftbl anyway.
  64
  65
  66 Execution State
  67 ---------------
  68
  69 The execution state that has to be transformed during
  70 onstack replacement consists of:
  71
  72    1) the program counter
  73
  74    2) live CPU registers
  75       XXX caution: BSR return values!!
  76
  77    3a) live local variables and spill slots
  78       on the stack
  79       XXX caution: BSR return values!!
  80
  81    3b) the locked object of the method if it is synchronized
  82
  83    3c) the return address to the caller and the
  84       link to the previous activation record
  85
  86    3d) saved callee-save registers
  87
  88
  89 ad 1) The program counter is in 1-to-1 correspondance to the replacement
  90 point reached.
  91
  92 ad 2) the register allocator knows which registers are live at the
  93 replacement point. This information can be encoded either in a special
  94 data structure or be transformed into generated replacement-out code
  95 that is created at the same time the method is compiled.
  96
  97 ad 3*) These data are similarily organized for all replacement points in
  98 a method so we probably want to treat these with common code in order to
  99 save space.
 100
 101
 102 Source State
 103 ------------
 104
 105 The source state is a virtual state comprising:
 106
 107    1) the Java Bytecode program counter
 108
 109    2) the values on the Java stack
 110
 111    3) the values in Java local variables
 112
 113    4) synchronization state and locked object of the method
 114
 115    5) the frames (activation records) of Java methods
 116       on the VM stack
 117
 118 The source state is "virtual" in the sense that there does not exists a
 119 real data structure containing these data. However, the source state
 120 must be reconstructable from the execution state at each replacement
 121 point.
 122
 123 It does not matter if the source state values are actually constructed
 124 or if there is a direct translation (execution state V1) -> (execution
 125 state V2) that uses the source state as a conceptual link
 126 between these states.
 127
 128
 129 One-Step vs. Multi-Step Replacement
 130 -----------------------------------
 131
 132 Conceptually the actions from reaching the replacement point in fooV1
 133 to continuing execution in fooV2 are the following:
 134
 135    1) read the execution state of the fooV1 activation
 136
 137    2) remove the activation record corresponding to this invocation
 138       of fooV1
 139
 140    3) build an activation record for the (faked)
 141       invocation of fooV2.
 142
 143    4) write the execution state of the fooV2 activation
 144
 145    5) jump to the fooV2 code
 146
 147
 148 It would be possible to perform all these actions with a single
 149 big chunk of generated code for each replacement point. I think,
 150 however, that a clean separation of the phases has many
 151 advantages.
 152
 153 Phases:
 154
 155         1) done by generated replacement-out code (possibly
 156            a replacement-point specific part and a method
 157            specific part) or by a general replacement-out
 158            function written in assembler.
 159
 160         2-3) implemented in C. Some parts are architecture
 161            specific, however, as they depend on the structure
 162            of the stack frames.
 163
 164         4-5) done by generated replacement-in code or a
 165            general replacement-in function written in assembler.
 166
 167 This separation minimizes platform specific assembler code
 168 and opens possibilities for debugging/logging that will
 169 probably be invaluable when implementing this stuff.
 170
 171
 172 Replacement-Out Code
 173 --------------------
 174
 175 OPTION ROC-1: per-replacement-point code
 176
 177 Replacement-out code could look like this:
 178
 179   replacement_point_jumps_here:
 180         sub   %esp,SIZE_OF_STRUCT
 181         mov   REPLACEMENT_POINT_INDEX,ofsindex(%esp)
 182         mov   %regX,ofsX(%esp)
 183         mov   %regY,ofsX(%esp)
 184         jmp   method_specific_code
 185
 186    /* code for other replacement points */
 187
 188    method_specific_code:
 189         mov   %esp,%eax
 190         add   %eax,OFFSET_OF_STACKFRAME_FROM_CURRENT_ESP
 191         mov   %eax,ofsframe(%esp)
 192         /* other method specific copying ... */
 193         push  %esp /* arg0 */
 194         call  method_replace(execution_state *es)
 195
 196 Note that the current method can be found in the
 197 stack frame, so we do not have to explicitely pass that
 198 info. Information on the new code (fooV2) will be
 199 in the methodinfo.
 200
 201 OPTION ROC-2: general replacement-out function
 202
 203 This option uses a single platform-specific function for
 204 all replacement points. This requires that the replacement
 205 points are patched with a _bsr_ instead of a jump, so the
 206 PC of the replacement point can be found without special
 207 per-replacement-point code.
 208
 209 per-replacement-point data is stored in a hash indexed
 210 by the PC of the replacement point.
 211
 212    PC --indexto--> rplpointinfo
 213                       \--ptrto-> codeinfo
 214                                     \--ptrto-> methodinfo
 215
 216 The replacement point PC is a safe index because old code
 217 will never be freed anyway.
 218
 219 Splitting methodinfo and codeinfo
 220 ---------------------------------
 221
 222 Recompilation will require that cacao deal with
 223 a method (repr. by a `methodinfo`) that has several
 224 realizations in JIT code (`codeinfo` or something
 225 like that). There will always be a current codeinfo
 226 and zero or more old codeinfos that have to be
 227 kept around. (For example we may have to do stack
 228 traces from pcs in old code.)
 229
 230
 231 vim: tw=72
 232