docs/abc-removal.txt

   1
   2 Here "abc" stays for "array bounds check", or "array bound checks", or
   3 some combination of the two...
   4
   5 ------------------------------------------------------------------------------
   6 USAGE
   7
   8 Simply use the "abcrem" optimization invoking mono.
   9
  10 To see if bound checks are actually removed, use "mono -v" and grep for
  11 "ARRAY-ACCESS" in the output, there should be a message for each check
  12 that has been removed.
  13
  14 To trace the algorithm execution, use "-v -v -v", and be prepared to be
  15 totally submersed by debugging messages...
  16
  17 ------------------------------------------------------------------------------
  18 EFFECTIVENESS
  19
  20 The abc removal code can now always remove bound checks from "clean"
  21 array scans in loops, and generally anyway there are clear conditions that
  22 already state that the index is "safe".
  23
  24 To be clearer, and give an idea of what the algorithm can and cannot do
  25 without describing it in detail... keep in mind that only "redunant" checks
  26 cannot be removed. By "redundant", I mean "already explicitly checked" in
  27 the method code.
  28
  29 Unfortunately, analyzing complex expressions is not so easy (see below for
  30 details), so the capabilities of this "abc remover" are limited.
  31
  32 These are simple guidelines:
  33 - Only simple comparisons between variables are understood.
  34 - Only increment-decrement operations are handled.
  35 - "switch" statements are not yet handled.
  36
  37 This means that code like this will be handled well:
  38
  39 for (int i = 0; i < a.Length; i++) {
  40         a [i] = .....
  41 }
  42
  43 The "i" variable could be declared out of the "for", the "for" could be a
  44 "while", and maybe even implemented with "goto", the array could be scanned
  45 in reverse order, and everything would still work.
  46 I could have a temporary variable storing the array length, and check on
  47 it inside the loop, and the abc removal would still occurr, like this:
  48
  49 int i = 0;
  50 int l = a.Length;
  51 while ( i < l ) {
  52         a [i] ......
  53 }
  54
  55 or this:
  56
  57 int l = a.Length;
  58 for (int i = l; i > 0; i--) {
  59         a [i] = .....
  60 }
  61
  62
  63 But just something like this
  64
  65 for (int i = 0; i < (a.Length -1); i++) .....
  66
  67 or like this
  68
  69 for (int i = 0; i < a.Length; i += 2) .....
  70
  71 would not work, the check would stay there, sorry :-(
  72
  73 Just to make you understand how things are tricky... this would work!
  74
  75 int limit = a.Length - 1;
  76 for (int i = 0; i < limit; i++) {
  77         a [i] = .....
  78 }
  79
  80
  81 A detailed explanation of the reason why things are done like this is given
  82 below...
  83
  84 All in all, the compiling phase is generally non so fast, and the abc removed
  85 are so few, that there is hardly a performance gain in using the "abcrem"
  86 flag in the compiling options for short programs like the ones I tried.
  87
  88 Anyway, various bound checks *are* removed, and this is good :-)
  89
  90 ------------------------------------------------------------------------------
  91 THE ALGORITHM
  92
  93 Unfortunately, this time google has not been my friend, and I did not
  94 find an algorithm I could reasonably implement in the time frame I had,
  95 so I had to invent one. The two cleasses of algorithms I found were either
  96 simply an analisys of variable ranges (which is not very different from
  97 what I'm doing), or they were based on type systems richer than the .NET
  98 one (particularly, they required integer types to have an explicit range,
  99 which could be related to array ranges). By the way, it seems that the
 100 gcj people do not have an array bound check removal optimization (they
 101 just have a compiler flag that unconditionally removes all the checks,
 102 but this is not what we want).
 103
 104 This array bound check removal (abc removal) algorithm is based on
 105 symbolic execution concepts. I handle the removal like an "unreachable
 106 code elimination" (and in fact the optimization could be extended to remove
 107 also other unreachable sections of code, due to branches that "always go the
 108 same way").
 109
 110 In symbolic execution, variables do not have actual (numeric) values, but
 111 instead symbolic expressions (like "a", or "x+y").
 112 Also, branch conditions are handled like symbolic conditions (like "i<k"),
 113 which state relations between variable values.
 114
 115 The SSA representation inside mini is somewhat close to a symbolic
 116 representation of the execution of the compiled method.
 117 Particularly, the representation of variable values is exactly a symbolic
 118 one. It is enough to find all CEE_STIND_* instructions which store to a
 119 local variable, and their second argument is exactly the variable value.
 120 Actually, "cfg->vars [<variable-index>]->def" should contain exactly
 121 those store instructions, and the "get_variable_value_from_ssa_store"
 122 function extracts the variable value from there.
 123
 124 On the other hand, the conditions under which each basic block is
 125 executed are not made fully explicit.
 126
 127 However, it is not difficult to make them so.
 128 Each BB that has more than one exit BB, in practice must end either with
 129 a conditional branch instruction or with a switch instruction.
 130 In the first case, the BB has exactly two exit BBs, and their execution
 131 conditions are easy to get from the condition of the branch (see the
 132 "get_relation_from_branch_instruction" function, and expecially the end of
 133 "analyze_block" in abcremoval.c.
 134 If there is a switch, the jump condition of every exit BB is the equality
 135 of the switch argument with the particular index associated with its case
 136 (but the current implementation does not handle switch statements yet).
 137
 138 These individual conditions are in practice associated to each arc that
 139 connects BBs in the CFG (with the simple case that unconditional branches
 140 have a "TRUE" condition, because they always happen).
 141
 142 So, for each BB, its *proper* entry condition is the union of all the
 143 conditions associated to arcs that enter the BB. The "union" is like a
 144 logical "or", in the sense that either of the condition could be true,
 145 they are not necessarily all true. This means that if I can enter a BB
 146 in two ways, and in one case I know that "x>0", and in the other that
 147 "x==0", actually in the BB I know that "x>=0", which is a weaker
 148 condition (the union of the two).
 149
 150 Also, the *complete* entry condition for a BB is the "intersection" of all
 151 the entry conditions of its dominators. This is true because each dominator
 152 is the only way to reach the BB, so the entry condition of each dominator
 153 must be true if the control flow reached the BB. This translates to the
 154 logical "and" of all the "proper" conditions of the BBs met walking up in the
 155 dominator tree. So, if one says "x>0", and another "x==1", then I know
 156 that "x==1", which is a stronger condition (the intersection of the two).
 157 Note that, if the two conditions were "x>0" and "x==0", then the block would
 158 be unreachable (the intersection is empty), because some branch is impossible.
 159
 160 Another observation is that, inside each BB, every variable is subject to the
 161 complete entry condition of that very same BB, and not the one in which it is
 162 defined (with the "complete entry condition" being the thing I defined before,
 163 sorry if these terms "proper" and "complete" are strange, I found nothing
 164 better).
 165 This happens because the branch conditions are related to the control flow.
 166 I can define "i=a", and if I am in a BB where "a>0", then "i>0", but not
 167 otherwise.
 168
 169 So, intuitively, if the available conditions say "i>=0", and i is used as an
 170 index in an array access, then the lower bound check can be omitted.
 171 If the condition also says "(i>=0)&&(i<array.length)", the abc removal can
 172 occur.
 173
 174 So, a complete solution to the problem of abc removal would be the following:
 175 for each array access, build a system of equations containing:
 176 [1] all the symbolic variable definitions
 177 [2] the complete entry condition of the BB in which the array access occurs
 178 [3] the two "goal functions" ("index >=0" and "index < array.length")
 179 If the system is valid for *each possible* variable value, then the goal
 180 functions are always true, and the abc can be removed.
 181
 182 All this discussion is useful to give a precise specification to the problem
 183 we are trying to solve.
 184 The trouble is that, in the general case, the resulting system of equations
 185 is like a predicate in first order logic, which is semi-decidable, and its
 186 general solution is anyway too complex to be attempted in a JIT compiler
 187 (which should not contain a full fledged theorem prover).
 188
 189 Therefore, we must cut some corner.
 190
 191
 192 By the way, there is also another big problem, which is caused by "recursive"
 193 symbolic definitions. These definition can (and generally do) happen every
 194 time there is a loop. For instance, in the following piece of code
 195
 196 for ( int i = 0; i < array.length; i++ ) {
 197         Console.WriteLine( "array [i] = " + array [i] );
 198 }
 199
 200 one of the definitions of i is a PHI that can be either 0 or "i + 1".
 201
 202 Now, we know that mathematically "i = i + 1" does not make sense, and in
 203 fact symbolic values are not "equations", they are "symbolic definitions".
 204
 205 The symbolic value of i is a generic "n", where "n" is the number of
 206 iterations of the loop, but this is terrible to handle (and in more complex
 207 examples the symbolic definition of i simply cannot be written, because i is
 208 calculated in an iterative way).
 209
 210 However, the definition "i = i + 1" tells us something about i: it tells us
 211 that i "grows". So (from the PHI definition) we know that i is either 0, or
 212 "grows". This is enough to tell that "i>=0", which is what we want!
 213 It is important to note that recursive definitions can only occurr inside
 214 PHI definitions, because actually a variable cannot be defined *only* in terms
 215 of itself!
 216
 217
 218 At this point, I can explain which corners I want to cut to make the
 219 problem solvable. It will not remove all the abc that could theoretically
 220 be removed, but at least it will work.
 221
 222 The easiest way to cut corners is to only handle expressions which are
 223 "reasonably simple", and ignore the rest.
 224 Keep in mind that ignoring an expression is not harmful in itself.
 225 The algorithm will be simply "less powerful", because it will ignore
 226 conditions that could have caused to the removal of an abc, but will
 227 not remove checks "by mistake" (so the resulting code will be in any case
 228 correct).
 229
 230 In a first attempt, we can consider only conditions that have the simple
 231 possible form, which is "(compare simpleExpression simpleExpression)",
 232 where "simpleExpression" is either "(ldind.* local[*])" or
 233 "(ldlen (ldind.ref local[*]))" (or a constant, of course).
 234
 235 We can also "summarize" variable definitions, keeping only what is relevant
 236 to know: if they are greater or smaller than zero or other variables.
 237
 238 One particular note on PHI functions: they work (obviously) like the logical
 239 "or" of their definitions, and therefore are equivalent to the "logical or"
 240 of the summarization of their definitions.
 241
 242 About recursive definitions (which, believe me, are the worst thing in all
 243 this mess), we handle only "monotonic" ones. That is, we try to understand
 244 if the recursive definition (which, as we said above, must happen because
 245 of a loop) always "grows" or "gets smaller". In all other cases, we decide
 246 we cannot handle it.
 247
 248 So, the algorithm to optimize one method could be something like:
 249
 250 [1] Build the SSA representation.
 251 [2] Analyze each BB. Particularly, do the following:
 252     [2a] summarize its exit conditions
 253     [2b] summarize all the variable definitions occurring in the BB
 254     [2c] check if it contains array accesses (only as optimization, so that
 255          step 3 will be performed only when necessary)
 256 [3] Perform the removal. For each BB that contains array accesses, do:
 257     [3a] build an evaluation area where for each variable we keep track
 258          of its relations with zero and with each other variable.
 259     [3b] populate the evaluation area with the initial data (the variable
 260          definitions and the complete entry condition of the BB)
 261     [3c] propagate the relations between variables (so that from "a<b" and
 262          "b<c" we know that "a<c") in the evaluation area
 263     [3d] for each array access in the BB, use the evaluation area to "match
 264          conditions", to see if the goal functions are satisfied
 265
 266
 267 ------------------------------------------------------------------------------
 268 LOW LEVEL DESIGN DECISIONS
 269
 270 One of the problems I had to solve was the representation of the relations
 271 between symbolic values.
 272 The "canonical" solution would be to have a complex, potentially recursive
 273 data structure, somewhat similar to MonoInst, and use it all the time (maybe
 274 MonoInst could be used "as is").
 275 The problem is that, at this point, the software would have to "play" with
 276 these structures, for instance to deduce "b < a-1" from "a > b+1", because we
 277 are interested in what value b has, and not a. And maybe the software should
 278 be able to do this without modifying the original condition, because it was
 279 used elsewhere (or could be reused).
 280 Then, the software should also be able to apply logical relations to
 281 conditions (like "(a>0)||(c<1)"), and manipulate them, too.
 282
 283 In my opinion, an optimizer written in this way risks to be too complex, and
 284 its performance not acceptable in a JIT compiler.
 285 Therefore, I decided to represent values in a "summarized" way.
 286 For instance, "x = a + 1" becomes "x > a" (and, by the way, "x = x + 1"
 287 would look like "x > x", which means that "x grows").
 288 So, each variable value is simply a "relation" of the variable with other
 289 variables or constants. Anything more complex is ignored.
 290 Relations are represented with flags (EQ, LT and GT), so that it is easy
 291 to combine them logically (it is enough to perform the bitwise operations
 292 on the flags). The condition (EQ|LT|GT) means we know nothing special, any
 293 value is possible. The empty condition would mean an unreachable statement
 294 (because no value is acceptable).
 295
 296 There is still the problem of identifying variables, and where to store all
 297 these relations. After some thinking, I decided that a "brute force" approach
 298 is the easier, and probably also the fastest. In practice, I keep in memory
 299 an array of "variable relations", where for each variable I record its
 300 relation with the constants 0 and 1 and an array of its relations with all
 301 other variables. The evaluation area, therefore, looks like a square matrix
 302 as large as the number of variables. I *know* that the matrix is rather sparse
 303 (in the sense that only few of its cells are significant), but I decided that
 304 this data structure was the simplest and more efficient anyway. After all,
 305 each cell takes just one byte, and any other data structure must be some
 306 kind of associative array (like a g_hash_table). Such a data structue is
 307 difficult to use with a MonoMemPool, and in the end is slower than a plain
 308 array, and has the storage overhead of all the pointers... in the end, maybe
 309 the full matrix is not a waste at all.
 310 <note>
 311 After a few benchmarks, I clearly see that the matrix works well if the
 312 variables are not too much, otherwise it gets "too sparse", and much time
 313 is wasted in the propagation phase. Maybe the data structure could be changed
 314 after a more careful evaluation of the problem...
 315 </note>
 316
 317 With these assumptions, all the memory used to represent values, conditions
 318 and the evaluation area can be allocated in advance (from a MonoMemPool), and
 319 used for all the analysis of one method. Moreover, the performance of logical
 320 operations is high (they are simple bitwise operations on bytes).
 321 <note>
 322 For the propagation phase, there was a function I could not easily code
 323 with bitwise operations. As that phase is a real pain for the performance
 324 (the complexity is quadratic, and it must be executed iteratively until
 325 nothing changes), I decided to code it in perl, and generate a precomputed
 326 table then used by the C code (just 64 bytes). I think this is faster than
 327 all those branches... Really, I could think again to phases 3c and 3d (see
 328 above), maybe there is a better way to do them.
 329 </note>
 330
 331 Of course not all expressions can be handled by the summarization, so the
 332 optimizer is not as effective as a full blown theorem prover...
 333
 334 Another thing related to design... the implementation is as less intrusive
 335 as possible.
 336 I "patched" the main files to add an option for abc removal, and handle it
 337 just like any other option, and then wrote all the relevant code in two
 338 separated files (one header for data structures, and a C file for the code).
 339 If the patch turns out to be useful also for other pourposes (like a more
 340 generic "unreachable code elimination", that detects branches that will never
 341 be taken using the analysis of variable values), it will not be difficult
 342 to integrate the branch summarizations into the BasicBlock data structures,
 343 and so on.
 344
 345 One last thing... in some places, I knowingly coded in an inefficient way.
 346 For instance, "remove_abc_from_block" does something only if the flag
 347 "has_array_access_instructions" is true, but I call it anyway, letting it
 348 do nothing (calling it conditionally would be better). There are several
 349 things like this in the code, but I was in a hurry, and preferred to make
 350 the code more readable (it is already difficult to understand as it is).
 351
 352 ------------------------------------------------------------------------------
 353 WHEN IS ABC REMOVAL POSSIBLE? WHEN IS IT USEFUL?
 354
 355 Warning! Random ideas ahead!
 356
 357 In general, managed code should be safe, so abc removal is possible only if
 358 there already is some branch in the method that does the check.
 359 So, one check on the index is "mandatory", the abc removal can only avoid a
 360 second one (which is useless *because* of the first one).
 361 If in the code there is not a conditional branch that "avoids" to access
 362 an array out of its bounds, in general the abc removal cannot occur.
 363
 364 It could happen, however, that one method is called from another one, and
 365 the check on the index is performed in the caller.
 366
 367 For instance:
 368
 369 void
 370 caller( int[] a, int n ) {
 371         if ( n <= a.Length ) {
 372                 for ( int i = 0; i < n; i++ ) {
 373                         callee( a, i );
 374                 }
 375         }
 376 }
 377
 378 void callee( int[] a, int i ) {
 379         Console.WriteLine( "a [i] = " + a [i] );
 380 }
 381
 382
 383 In this case, it should be possible to have two versions of the callee, one
 384 without checks, to be called from methods that do the check themselves, and
 385 another to be called otherwise.
 386 This kind of optimization could be profile driven: if one method is executed
 387 several times, wastes CPU time for bound checks, and is mostly called from
 388 another one, it could make sense to analyze the situation in detail.
 389
 390 However, one simple way to perform this optimization would be to inline
 391 the callee (so that the abc removal algorithm can see both methods at once).
 392
 393 The biggest benefits of abc removal can be seen for array accesses that
 394 occur "often", like the ones inside inner loops. This is why I tried to
 395 handle "recursive" variable definitions, because without them you cannot
 396 optimize inside loops, which is also where you need it most.
 397
 398 Another possible optimization (alwais related to loops) is the following:
 399
 400 void method( int[] a, int n ) {
 401         for ( int i = 0; i < n; i++ ) {
 402                 Console.WriteLine( "a [i] = " + a [i] );
 403         }
 404 }
 405
 406 In this case, the check on n is missing, so the abc could not be removed.
 407 However, this would mean that i will be checked twice at each loop iteration,
 408 one against n, and the other against a.Length. The code could be transformed
 409 like this:
 410
 411 void method( int[] a, int n ) {
 412         if ( n < a.Length ) {
 413                 for ( int i = 0; i < n; i++ ) {
 414                         Console.WriteLine( "a [i] = " + a [i] ); // Remove abc
 415                 }
 416         } else {
 417                 for ( int i = 0; i < n; i++ ) {
 418                         Console.WriteLine( "a [i] = " + a [i] ); // Leave abc
 419                 }
 420         }
 421 }
 422
 423 This could result in performance improvements (again, probably this
 424 optimization should be profile driven).
 425
 426 ------------------------------------------------------------------------------
 427 OPEN ISSUES
 428
 429 There are several issues that should still be addressed...
 430
 431 One is related to aliasing. For now, I decided to operate only on local
 432 variables, and ignore aliasing problems (and alias analisys has not yet
 433 been implemented anyway).
 434 Actually, I identified the local arrays with their length, because I
 435 totally ignore the contents of arrays (and objects in general).
 436
 437 Also, in several places in the code I only handle some cases, and ignore
 438 other more complex ones, which could anyway work (there are comments where
 439 this happens). Anyway, this is not harmful (the code is not as effective
 440 as it could be).
 441
 442 Another possible improvement is the explicit handling of all constants.
 443 For now, code like
 444
 445 void
 446 method () {
 447         int[] a = new int [16];
 448         for ( int i = 0; i < 16; i++ ) {
 449                 a [i] = i;
 450         }
 451 }
 452
 453 is not handled, because the two constants "16" are lost when variable
 454 relations are summarized (actually they are not lost, they are stored in the
 455 summarized values, but they are not yet used correctly).
 456
 457 The worst thing, anyway, is that for now I fail completely in distinguishing
 458 between signed and unsigned variables/operations/conditions, and between
 459 different integer sizes (in bytes). I did this on pourpose, just for lack of
 460 time, but this can turn out to be terribly wrong.
 461 The problem is caused by the fact that I handle arithmetical operations and
 462 conditions as "ideal" operations, but actually they can overflow and/or
 463 underflow, giving "surprising" results.
 464
 465 For instance, look at the following two methods:
 466
 467 public static void testSignedOverflow()
 468 {
 469   Console.WriteLine( "Testing signed overflow..." );
 470   int[] ia = new int[70000];
 471   int i = 1;
 472   while ( i <ia.Length )
 473   {
 474     try
 475     {
 476       Console.WriteLine( " i = " + i );
 477       ia[i] = i;
 478     }
 479     catch ( IndexOutOfRangeException e )
 480     {
 481       Console.WriteLine( "Yes, we had an overflow (i = " + i + ")" );
 482     }
 483     i *= 60000;
 484   }
 485   Console.WriteLine( "Testing signed overflow done." );
 486 }
 487
 488 public static void testUnsignedOverflow()
 489 {
 490   Console.WriteLine( "Testing unsigned overflow..." );
 491   int[] ia = new int[70000];
 492   uint i = 1;
 493   while ( i <ia.Length )
 494   {
 495     try
 496     {
 497       Console.WriteLine( " i = " + i );
 498       ia[i] = (int) i;
 499     }
 500     catch ( IndexOutOfRangeException e )
 501     {
 502       Console.WriteLine( "Yes, we had an overflow (i = " + i + ")" );
 503     }
 504     i *= 60000;
 505   }
 506   Console.WriteLine( "Testing unsigned overflow done." );
 507 }
 508
 509 In the "testSignedOverflow" method, the exception will be thrown, while
 510 in the "testUnsignedOverflow" everything will go on smoothly.
 511 The problem is that the test "i <ia.Length" is a signed comparison in the
 512 first case, and will not exit from the loop if the index is negative.
 513 Therefore, in the first method, the abc should not be removed.
 514
 515 In practice, the abc removal code should try to predict if any given
 516 expression could over/under-flow, and act accordingly.
 517 For now, I made so that the only accepted expressions are plain increments
 518 and decrements of variables (which cannot over/under-flow without being
 519 trapped by the existing conditions). Anyway, this issue should be analyzed
 520 better.
 521 By the way, in summarizations there is a field that keeps track of the
 522 relation with "1", which I planned to use exactly to summarize products and
 523 quotients, but it is never used (as I said, only increment and decrement
 524 operations are properly summarized).
 525