docs/abc-removal.txt

   1
   2 Here "abc" stays for "array bounds check", or "array bound checks", or
   3 some combination of the two...
   4
   5 ------------------------------------------------------------------------------
   6 USAGE
   7
   8 Simply use the "abcrem" optimization invoking mono.
   9
  10 To see if bound checks are actually removed, use "mono -v" and grep for
  11 "ARRAY-ACCESS" in the output, there should be a message for each check
  12 that has been removed.
  13
  14 To trace the algorithm execution, use "-v -v -v", and be prepared to be
  15 totally submersed by debugging messages...
  16
  17 ------------------------------------------------------------------------------
  18 EFFECTIVENESS
  19
  20 The abc removal code can now always remove bound checks from "clean"
  21 array scans in loops, and generally anyway there are clear conditions that
  22 already state that the index is "safe".
  23
  24 To be clearer, and give an idea of what the algorithm can and cannot do
  25 without describing it in detail... keep in mind that only "redunant" checks
  26 cannot be removed. By "redundant", I mean "already explicitly checked" in
  27 the method code.
  28
  29 Unfortunately, analyzing complex expressions is not so easy (see below for
  30 details), so the capabilities of this "abc remover" are limited.
  31
  32 These are simple guidelines:
  33 - Only expressions of the following kinds are handled:
  34   - constant
  35   - variable [+/- constant]
  36 - Only comparisons between "handled" expressions are understood.
  37 - "switch" statements are not yet handled.
  38
  39 This means that code like this will be handled well:
  40
  41 for (int i = 0; i < a.Length; i++) {
  42         a [i] = .....
  43 }
  44
  45 The "i" variable could be declared out of the "for", the "for" could be a
  46 "while", and maybe even implemented with "goto", the array could be scanned
  47 in reverse order, and everything would still work.
  48 I could have a temporary variable storing the array length, and check on
  49 it inside the loop, and the abc removal would still occurr, like this:
  50
  51 int i = 0;
  52 int l = a.Length;
  53 while ( i < l ) {
  54         a [i] ......
  55 }
  56
  57 or this:
  58
  59 int l = a.Length;
  60 for (int i = l; i > 0; i--) {
  61         a [i] = .....
  62 }
  63
  64 The following two examples would work:
  65
  66 for (int i = 0; i < (a.Length -1); i++) .....
  67 for (int i = 0; i < a.Length; i += 2) .....
  68
  69 But just something like this
  70
  71 int delta = -1;
  72 for (int i = 0; i < (a.Length + delta); i++) .....
  73
  74 or like this
  75
  76 int delta = +2;
  77 for (int i = 0; i < a.Length; i += delta) .....
  78
  79 would not work, the check would stay there, sorry :-(
  80 (unless a combination of cfold, consprop and copyprop is used, too, which
  81 would make the constant value of "delta" explicit).
  82
  83 Just to make you understand how things are tricky... this would work!
  84
  85 int limit = a.Length - 1;
  86 for (int i = 0; i < limit; i++) {
  87         a [i] = .....
  88 }
  89
  90
  91 A detailed explanation of the reason why things are done like this is given
  92 below...
  93
  94 ------------------------------------------------------------------------------
  95 THE ALGORITHM
  96
  97 This array bound check removal (abc removal) algorithm is based on
  98 symbolic execution concepts. I handle the removal like an "unreachable
  99 code elimination" (and in fact the optimization could be extended to remove
 100 also other unreachable sections of code, due to branches that "always go the
 101 same way").
 102
 103 In symbolic execution, variables do not have actual (numeric) values, but
 104 instead symbolic expressions (like "a", or "x+y").
 105 Also, branch conditions are handled like symbolic conditions (like "i<k"),
 106 which state relations between variable values.
 107
 108 The SSA representation inside mini is somewhat close to a symbolic
 109 representation of the execution of the compiled method.
 110 Particularly, the representation of variable values is exactly a symbolic
 111 one. It is enough to find all CEE_STIND_* instructions which store to a
 112 local variable, and their second argument is exactly the variable value.
 113 Actually, "cfg->vars [<variable-index>]->def" should contain exactly
 114 those store instructions, and the "get_variable_value_from_ssa_store"
 115 function extracts the variable value from there.
 116
 117 On the other hand, the conditions under which each basic block is
 118 executed are not made fully explicit.
 119
 120 However, it is not difficult to make them so.
 121 Each BB that has more than one exit BB, in practice must end either with
 122 a conditional branch instruction or with a switch instruction.
 123 In the first case, the BB has exactly two exit BBs, and their execution
 124 conditions are easy to get from the condition of the branch (see the
 125 "get_relation_from_branch_instruction" function, and expecially the end of
 126 "analyze_block" in abcremoval.c.
 127 If there is a switch, the jump condition of every exit BB is the equality
 128 of the switch argument with the particular index associated with its case
 129 (but the current implementation does not handle switch statements yet).
 130
 131 These individual conditions are in practice associated to each arc that
 132 connects BBs in the CFG (with the simple case that unconditional branches
 133 have a "TRUE" condition, because they always happen).
 134
 135 So, for each BB, its *proper* entry condition is the union of all the
 136 conditions associated to arcs that enter the BB. The "union" is like a
 137 logical "or", in the sense that either of the condition could be true,
 138 they are not necessarily all true. This means that if I can enter a BB
 139 in two ways, and in one case I know that "x>0", and in the other that
 140 "x==0", actually in the BB I know that "x>=0", which is a weaker
 141 condition (the union of the two).
 142
 143 Also, the *complete* entry condition for a BB is the "intersection" of all
 144 the entry conditions of its dominators. This is true because each dominator
 145 is the only way to reach the BB, so the entry condition of each dominator
 146 must be true if the control flow reached the BB. This translates to the
 147 logical "and" of all the "proper" conditions of the BBs met walking up in the
 148 dominator tree. So, if one says "x>0", and another "x==1", then I know
 149 that "x==1", which is a stronger condition (the intersection of the two).
 150 Note that, if the two conditions were "x>0" and "x==0", then the block would
 151 be unreachable (the intersection is empty), because some branch is impossible.
 152
 153 Another observation is that, inside each BB, every variable is subject to the
 154 complete entry condition of that very same BB, and not the one in which it is
 155 defined (with the "complete entry condition" being the thing I defined before,
 156 sorry if these terms "proper" and "complete" are strange, I found nothing
 157 better).
 158 This happens because the branch conditions are related to the control flow.
 159 I can define "i=a", and if I am in a BB where "a>0", then "i>0", but not
 160 otherwise.
 161
 162 So, intuitively, if the available conditions say "i>=0", and i is used as an
 163 index in an array access, then the lower bound check can be omitted.
 164 If the condition also says "(i>=0)&&(i<array.length)", the abc removal can
 165 occur.
 166
 167 So, a complete solution to the problem of abc removal would be the following:
 168 for each array access, build a system of equations containing:
 169 [1] all the symbolic variable definitions
 170 [2] the complete entry condition of the BB in which the array access occurs
 171 [3] the two "goal functions" ("index >=0" and "index < array.length")
 172 If the system is valid for *each possible* variable value, then the goal
 173 functions are always true, and the abc can be removed.
 174
 175 All this discussion is useful to give a precise specification to the problem
 176 we are trying to solve.
 177 The trouble is that, in the general case, the resulting system of equations
 178 is like a predicate in first order logic, which is semi-decidable, and its
 179 general solution is anyway too complex to be attempted in a JIT compiler
 180 (which should not contain a full fledged theorem prover).
 181
 182 Therefore, we must cut some corner.
 183
 184
 185 By the way, there is also another big problem, which is caused by "recursive"
 186 symbolic definitions. These definition can (and generally do) happen every
 187 time there is a loop. For instance, in the following piece of code
 188
 189 for ( int i = 0; i < array.length; i++ ) {
 190         Console.WriteLine( "array [i] = " + array [i] );
 191 }
 192
 193 one of the definitions of i is a PHI that can be either 0 or "i + 1".
 194
 195 Now, we know that mathematically "i = i + 1" does not make sense, and in
 196 fact symbolic values are not "equations", they are "symbolic definitions".
 197
 198 The actual symbolic value of i is a generic "n", where "n" is the number of
 199 iterations of the loop, but this is terrible to handle (and in more complex
 200 examples the symbolic value of i simply cannot be written, because i is
 201 calculated in an iterative way).
 202
 203 However, the definition "i = i + 1" tells us something about i: it tells us
 204 that i "grows". So (from the PHI definition) we know that i is either 0, or
 205 "grows". This is enough to tell that "i>=0", which is what we want!
 206 It is important to note that recursive definitions can only occurr inside
 207 PHI definitions, because actually a variable cannot be defined *only* in terms
 208 of itself!
 209
 210
 211 At this point, I can explain which corners I want to cut to make the
 212 problem solvable. It will not remove all the abc that could theoretically
 213 be removed, but at least it will work.
 214
 215 The easiest way to cut corners is to only handle expressions which are
 216 "reasonably simple", and ignore the rest.
 217 Keep in mind that ignoring an expression is not harmful in itself.
 218 The algorithm will be simply "less powerful", because it will ignore
 219 conditions that could have caused to the removal of an abc, but will
 220 not remove checks "by mistake" (so the resulting code will be in any case
 221 correct).
 222
 223 The expressions we handle are the following (all of integer type):
 224 - constant
 225 - variable
 226 - variable + constant
 227 - constant + variable
 228 - variable - constant
 229
 230 And, of course, PHI definitions.
 231 Any other expression causes the introduction of an "any" value in the
 232 evaluation, which makes all values that depend from it unknown as well.
 233
 234 We will call these kind of definitions "summarizable" definitions.
 235
 236 In a first attempt, we can consider only branch conditions that have the
 237 simplest possible form (the comparison of two summarizable expressions).
 238
 239 We can also simplify the effect of variable definitions, keeping only
 240 what is relevant to know: their value range with respect to zero and with
 241 respect to the length of the array we are currently handling.
 242
 243 One particular note on PHI functions: they work (obviously) like the logical
 244 "or" of their definitions, and therefore are equivalent to the "logical or"
 245 of the summarization of their definitions.
 246
 247 About recursive definitions (which, believe me, are the worst thing in all
 248 this mess), we handle only "monotonic" ones. That is, we try to understand
 249 if the recursive definition (which, as we said above, must happen because
 250 of a loop) always "grows" or "gets smaller". In all other cases, we decide
 251 we cannot handle it.
 252
 253
 254 One critical thing, once we have defined all these data structures, is how
 255 the evaluation is actually performed.
 256
 257 In a first attempt I coded a "brute force" approach, which for each BB
 258 tried to examine all possible conditions between all variables, filling
 259 a sort of "evaluation matrix". The problem was that the complexity of this
 260 evaluation was quadratic (or worse) on the number of variables, and that
 261 many wariables were examined even if they were not involved in any array
 262 access.
 263
 264 Following the ABCD paper (http://citeseer.ist.psu.edu/bodik00abcd.html),
 265 I rewrote the algorithm in a more "sparse" way.
 266 Now, the main data structure is a graph of relations between variables, and
 267 each attempt to remove a check performs a traversal of the graph, looking
 268 for a path from the index to the array length that satisfies the properties
 269 "index >= 0" and "index < length". If such a path is found, the check is
 270 removed. It is true that in theory *each* traversal has a complexity which
 271 is exponential on the number of variables, but in practice the graph is not
 272 very connected, so the traversal terminates quickly.
 273
 274
 275 Then, the algorithm to optimize one method looks like this:
 276
 277 [1] Preparation:
 278     [1a] Build the SSA representation.
 279     [1b] Prepare the evaluation graph (empty)
 280     [1b] Summarize each varible definition, and put the resulting relations
 281          in the evaluation graph
 282 [2] Analyze each BB, starting from the entry point and following the
 283     dominator tree:
 284     [2a] Summarize its entry condition, and put the resulting relations
 285          in the evaluation graph (this is the reason why the BBs are examined
 286          following the dominator tree, so that conditions are added to the
 287          graph in a "cumulative" way)
 288     [2b] Scan the BB instructions, and for each array access perform step [3]
 289     [2c] Process children BBs following the dominator tree (step [2])
 290     [2d] Remove from the evaluation area the conditions added at step [2a]
 291          (so that backtracking along the tree the area is properly cleared)
 292 [3] Attempt the removal:
 293     [3a] Summarize the index expression, to see if we can handle it; there
 294          are three cases: the index is either a constant, or a variable (with
 295          an optional delta) or cannot be handled (is a "any")
 296     [3b] If the index can be handled, traverse the evaluation area searching
 297          a path from the index variable to the array length (if the index is
 298          a constant, just examine the array length to see if it has some
 299          relation with this constant)
 300     [3c] Use the results of step [3b] to decide if the check is redundant
 301
 302