docs/abc-removal.txt

   1
   2                 Arrays Bounds Check Elimination (ABC)
   3                          in the Mono Runtime
   4
   5                Massimiliano Mantione (mass@ximian.com)
   6
   7 Here "abc" stays for "array bounds check", or "array bound checks", or
   8 some combination of the two.
   9
  10 * Usage
  11
  12         Simply use the "abcrem" optimization invoking mono.
  13
  14         To see if bound checks are actually removed, use "mono -v" and
  15         grep for "ARRAY-ACCESS" in the output, there should be a
  16         message for each check that has been removed.
  17
  18         To trace the algorithm execution, use "-v -v -v", and be
  19         prepared to be totally submersed by debugging messages...
  20
  21 * Effectiveness
  22
  23         The abc removal code can now always remove bound checks from
  24         "clean" array scans in loops, and generally anyway there are
  25         clear conditions that already state that the index is "safe".
  26
  27         To be clearer, and give an idea of what the algorithm can and
  28         cannot do without describing it in detail... keep in mind that
  29         only "redundant" checks cannot be removed. By "redundant", I
  30         mean "already explicitly checked" in the method code.
  31
  32         Unfortunately, analyzing complex expressions is not so easy
  33         (see below for details), so the capabilities of this "abc
  34         remover" are limited.
  35
  36         These are simple guidelines:
  37
  38         - Only expressions of the following kinds are handled:
  39           - constant
  40           - variable [+/- constant]
  41         - Only comparisons between "handled" expressions are understood.
  42         - "switch" statements are not yet handled.
  43
  44         This means that code like this will be handled well:
  45
  46                 for (int i = 0; i < a.Length; i++) {
  47                         a [i] = .....
  48                 }
  49
  50         The "i" variable could be declared out of the "for", the "for"
  51         could be a "while", and maybe even implemented with "goto",
  52         the array could be scanned in reverse order, and everything
  53         would still work.
  54
  55         I could have a temporary variable storing the array length,
  56         and check on it inside the loop, and the abc removal would
  57         still occurr, like this:
  58
  59                 int i = 0;
  60                 int l = a.Length;
  61                 while ( i < l ) {
  62                         a [i] ......
  63                 }
  64
  65         or this:
  66
  67                 int l = a.Length;
  68                 for (int i = l; i > 0; i--) {
  69                         a [i] = .....
  70                 }
  71
  72         The following two examples would work:
  73
  74                 for (int i = 0; i < (a.Length -1); i++) .....
  75                 for (int i = 0; i < a.Length; i += 2) .....
  76
  77         But just something like this
  78
  79                 int delta = -1;
  80                 for (int i = 0; i < (a.Length + delta); i++) .....
  81
  82         or like this
  83
  84                 int delta = +2;
  85                 for (int i = 0; i < a.Length; i += delta) .....
  86
  87         would not work, the check would stay there. (unless
  88         a combination of cfold, consprop and copyprop is used, too,
  89         which would make the constant value of "delta" explicit).
  90
  91         Just to make you understand how things are tricky... this would work!
  92
  93                 int limit = a.Length - 1;
  94                 for (int i = 0; i < limit; i++) {
  95                         a [i] = .....
  96                 }
  97
  98         A detailed explanation of the reason why things are done like
  99         this is given below.
 100
 101 * The Algorithm
 102
 103         This array bound check removal (abc removal) algorithm is
 104         based on symbolic execution concepts. I handle the removal
 105         like an "unreachable code elimination" (and in fact the
 106         optimization could be extended to remove also other
 107         unreachable sections of code, due to branches that "always go
 108         the same way").
 109
 110         In symbolic execution, variables do not have actual (numeric)
 111         values, but instead symbolic expressions (like "a", or "x+y").
 112         Also, branch conditions are handled like symbolic conditions
 113         (like "i<k"), which state relations between variable values.
 114
 115         The SSA representation inside mini is somewhat close to a
 116         symbolic representation of the execution of the compiled
 117         method.
 118
 119         Particularly, the representation of variable values is exactly
 120         a symbolic one. It is enough to find all CEE_STIND_*
 121         instructions which store to a local variable, and their second
 122         argument is exactly the variable value.  Actually, "cfg->vars
 123         [<variable-index>]->def" should contain exactly those store
 124         instructions, and the "get_variable_value_from_ssa_store"
 125         function extracts the variable value from there.
 126
 127         On the other hand, the conditions under which each basic block
 128         is executed are not made fully explicit.
 129
 130         However, it is not difficult to make them so.
 131
 132         Each BB that has more than one exit BB, in practice must end
 133         either with a conditional branch instruction or with a switch
 134         instruction.
 135
 136         In the first case, the BB has exactly two exit BBs, and their
 137         execution conditions are easy to get from the condition of the
 138         branch (see the "get_relation_from_branch_instruction"
 139         function, and especially the end of "analyze_block" in
 140         abcremoval.c.
 141
 142         If there is a switch, the jump condition of every exit BB is
 143         the equality of the switch argument with the particular index
 144         associated with its case (but the current implementation does
 145         not handle switch statements yet).
 146
 147         These individual conditions are in practice associated to each
 148         arc that connects BBs in the CFG (with the simple case that
 149         unconditional branches have a "TRUE" condition, because they
 150         always happen).
 151
 152         So, for each BB, its *proper* entry condition is the union of
 153         all the conditions associated to arcs that enter the BB. The
 154         "union" is like a logical "or", in the sense that either of
 155         the condition could be true, they are not necessarily all
 156         true. This means that if I can enter a BB in two ways, and in
 157         one case I know that "x>0", and in the other that "x==0",
 158         actually in the BB I know that "x>=0", which is a weaker
 159         condition (the union of the two).
 160
 161         Also, the *complete* entry condition for a BB is the
 162         "intersection" of all the entry conditions of its
 163         dominators. This is true because each dominator is the only
 164         way to reach the BB, so the entry condition of each dominator
 165         must be true if the control flow reached the BB. This
 166         translates to the logical "and" of all the "proper" conditions
 167         of the BBs met walking up in the dominator tree. So, if one
 168         says "x>0", and another "x==1", then I know that "x==1", which
 169         is a stronger condition (the intersection of the two).
 170
 171         Note that, if the two conditions were "x>0" and "x==0", then
 172         the block would be unreachable (the intersection is empty),
 173         because some branch is impossible.
 174
 175         Another observation is that, inside each BB, every variable is
 176         subject to the complete entry condition of that very same BB,
 177         and not the one in which it is defined (with the "complete
 178         entry condition" being the thing I defined before, sorry if
 179         these terms "proper" and "complete" are strange, I found
 180         nothing better).
 181
 182         This happens because the branch conditions are related to the
 183         control flow.  I can define "i=a", and if I am in a BB where
 184         "a>0", then "i>0", but not otherwise.
 185
 186         So, intuitively, if the available conditions say "i>=0", and i
 187         is used as an index in an array access, then the lower bound
 188         check can be omitted.  If the condition also says
 189         "(i>=0)&&(i<array.length)", the abc removal can occur.
 190
 191         So, a complete solution to the problem of abc removal would be
 192         the following: for each array access, build a system of
 193         equations containing:
 194
 195                 [1] all the symbolic variable definitions
 196
 197                 [2] the complete entry condition of the BB in which
 198                 the array access occurs
 199
 200                 [3] the two "goal functions" ("index >=0" and "index <
 201                 array.length")
 202
 203         If the system is valid for *each possible* variable value, then the goal
 204         functions are always true, and the abc can be removed.
 205
 206         All this discussion is useful to give a precise specification
 207         to the problem we are trying to solve.
 208
 209         The trouble is that, in the general case, the resulting system
 210         of equations is like a predicate in first order logic, which
 211         is semi-decidable, and its general solution is anyway too
 212         complex to be attempted in a JIT compiler (which should not
 213         contain a full fledged theorem prover).
 214
 215         Therefore, we must cut some corner.
 216
 217         There is also another big problem, which is caused by
 218         "recursive" symbolic definitions. These definition can (and
 219         generally do) happen every time there is a loop. For instance,
 220         in the following piece of code:
 221
 222                 for ( int i = 0; i < array.length; i++ ) {
 223                         Console.WriteLine( "array [i] = " + array [i] );
 224                 }
 225
 226         one of the definitions of i is a PHI that can be either 0 or
 227         "i + 1".
 228
 229         Now, we know that mathematically "i = i + 1" does not make
 230         sense, and in fact symbolic values are not "equations", they
 231         are "symbolic definitions".
 232
 233         The actual symbolic value of i is a generic "n", where "n" is
 234         the number of iterations of the loop, but this is terrible to
 235         handle (and in more complex examples the symbolic value of i
 236         simply cannot be written, because i is calculated in an
 237         iterative way).
 238
 239         However, the definition "i = i + 1" tells us something about
 240         i: it tells us that i "grows". So (from the PHI definition) we
 241         know that i is either 0, or "grows". This is enough to tell
 242         that "i>=0", which is what we want!  It is important to note
 243         that recursive definitions can only occurr inside PHI
 244         definitions, because actually a variable cannot be defined
 245         *only* in terms of itself!
 246
 247         At this point, I can explain which corners I want to cut to
 248         make the problem solvable. It will not remove all the abc that
 249         could theoretically be removed, but at least it will work.
 250
 251         The easiest way to cut corners is to only handle expressions
 252         which are "reasonably simple", and ignore the rest.
 253
 254         Keep in mind that ignoring an expression is not harmful in
 255         itself.  The algorithm will be simply "less powerful", because
 256         it will ignore conditions that could have caused to the
 257         removal of an abc, but will not remove checks "by mistake" (so
 258         the resulting code will be in any case correct).
 259
 260         The expressions we handle are the following (all of integer
 261         type):
 262
 263                 - constant
 264                 - variable
 265                 - variable + constant
 266                 - constant + variable
 267                 - variable - constant
 268
 269         And, of course, PHI definitions.
 270
 271         Any other expression causes the introduction of an "any" value
 272         in the evaluation, which makes all values that depend from it
 273         unknown as well.
 274
 275         We will call these kind of definitions "summarizable"
 276         definitions.
 277
 278         In a first attempt, we can consider only branch conditions
 279         that have the simplest possible form (the comparison of two
 280         summarizable expressions).
 281
 282         We can also simplify the effect of variable definitions,
 283         keeping only what is relevant to know: their value range with
 284         respect to zero and with respect to the length of the array we
 285         are currently handling.
 286
 287         One particular note on PHI functions: they work (obviously)
 288         like the logical "or" of their definitions, and therefore are
 289         equivalent to the "logical or" of the summarization of their
 290         definitions.
 291
 292         About recursive definitions (which, believe me, are the worst
 293         thing in all this mess), we handle only "monotonic" ones. That
 294         is, we try to understand if the recursive definition (which,
 295         as we said above, must happen because of a loop) always
 296         "grows" or "gets smaller". In all other cases, we decide we
 297         cannot handle it.
 298
 299         One critical thing, once we have defined all these data
 300         structures, is how the evaluation is actually performed.
 301
 302         In a first attempt I coded a "brute force" approach, which for
 303         each BB tried to examine all possible conditions between all
 304         variables, filling a sort of "evaluation matrix". The problem
 305         was that the complexity of this evaluation was quadratic (or
 306         worse) on the number of variables, and that many variables
 307         were examined even if they were not involved in any array
 308         access.
 309
 310         Following the ABCD paper:
 311
 312                   http://citeseer.ist.psu.edu/bodik00abcd.html
 313
 314         I rewrote the algorithm in a more "sparse" way.
 315
 316         Now, the main data structure is a graph of relations between
 317         variables, and each attempt to remove a check performs a
 318         traversal of the graph, looking for a path from the index to
 319         the array length that satisfies the properties "index >= 0"
 320         and "index < length". If such a path is found, the check is
 321         removed. It is true that in theory *each* traversal has a
 322         complexity which is exponential on the number of variables,
 323         but in practice the graph is not very connected, so the
 324         traversal terminates quickly.
 325
 326
 327         Then, the algorithm to optimize one method looks like this:
 328
 329                 [1] Preparation:
 330
 331                     [1a] Build the SSA representation.
 332
 333                     [1b] Prepare the evaluation graph (empty)
 334
 335                     [1b] Summarize each variable definition, and put
 336                          the resulting relations in the evaluation
 337                          graph
 338
 339                 [2] Analyze each BB, starting from the entry point and
 340                     following the dominator tree:
 341
 342                     [2a] Summarize its entry condition, and put the resulting relations
 343                          in the evaluation graph (this is the reason
 344                          why the BBs are examined following the
 345                          dominator tree, so that conditions are added
 346                          to the graph in a "cumulative" way)
 347
 348                     [2b] Scan the BB instructions, and for each array
 349                          access perform step [3]
 350
 351                     [2c] Process children BBs following the dominator
 352                          tree (step [2])
 353
 354                     [2d] Remove from the evaluation area the conditions added at step [2a]
 355                          (so that backtracking along the tree the area
 356                          is properly cleared)
 357
 358                 [3] Attempt the removal:
 359
 360                     [3a] Summarize the index expression, to see if we can handle it; there
 361                          are three cases: the index is either a
 362                          constant, or a variable (with an optional
 363                          delta) or cannot be handled (is a "any")
 364
 365                     [3b] If the index can be handled, traverse the evaluation area searching
 366                          a path from the index variable to the array
 367                          length (if the index is a constant, just
 368                          examine the array length to see if it has
 369                          some relation with this constant)
 370
 371                     [3c] Use the results of step [3b] to decide if the check is redundant
 372
 373