diff --git a/docs/abc-removal.txt b/docs/abc-removal.txt
index 82bad22330e..8c10579eace 100755
--- a/docs/abc-removal.txt
+++ b/docs/abc-removal.txt
@@ -30,8 +30,10 @@ Unfortunately, analyzing complex expressions is not so easy (see below for
details), so the capabilities of this "abc remover" are limited.
These are simple guidelines:

-- Only simple comparisons between variables are understood.
-- Only increment-decrement operations are handled.
+- Only expressions of the following kinds are handled:
+  - constant
+  - variable [+/- constant]
+- Only comparisons between "handled" expressions are understood.
- "switch" statements are not yet handled.

This means that code like this will be handled well:

@@ -59,16 +61,24 @@ for (int i = l; i > 0; i--) {
    a [i] = .....
}

+The following two examples would work:
+
+for (int i = 0; i < (a.Length -1); i++) .....
+for (int i = 0; i < a.Length; i += 2) .....
+
But just something like this

-for (int i = 0; i < (a.Length -1); i++) .....
+int delta = -1;
+for (int i = 0; i < (a.Length + delta); i++) .....

or like this

-for (int i = 0; i < a.Length; i += 2) .....
+int delta = +2;
+for (int i = 0; i < a.Length; i += delta) .....

would not work; the check would stay there, sorry :-(
+(unless a combination of cfold, consprop and copyprop is used, too, which
+would make the constant value of "delta" explicit).

Just to make you understand how tricky things are... this would work!

@@ -81,26 +91,9 @@ for (int i = 0; i < limit; i++) {
A detailed explanation of the reason why things are done like this is given
below...

-All in all, the compilation phase is generally not so fast, and the removed
-abcs are so few, that there is hardly a performance gain in using the
-"abcrem" flag in the compilation options for short programs like the ones
-I tried.
-
-Anyway, various bound checks *are* removed, and this is good :-)
-
-----------------------------------------------------------------------------
THE ALGORITHM

-Unfortunately, this time google has not been my friend, and I did not
-find an algorithm I could reasonably implement in the time frame I had,
-so I had to invent one. The two classes of algorithms I found either
-simply performed an analysis of variable ranges (which is not very
-different from what I am doing), or were based on type systems richer
-than the .NET one (particularly, they required integer types to have an
-explicit range, which could be related to array ranges). By the way, it
-seems that the gcj people do not have an array bound check removal
-optimization (they just have a compiler flag that unconditionally removes
-all the checks, but this is not what we want).
-
This array bound check removal (abc removal) algorithm is based on symbolic
execution concepts. I handle the removal like an "unreachable code
elimination" (and in fact the optimization could be extended to remove
@@ -202,9 +195,9 @@ one of the definitions of i is a PHI that can be either 0 or "i + 1".
Now, we know that mathematically "i = i + 1" does not make sense, and in
fact symbolic values are not "equations", they are "symbolic definitions".
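To illustrate, a loop like "for (int i = 0; i < a.Length; i++)" could look
roughly like this in SSA form (a sketch with invented names, not the actual
mono intermediate representation):

    i1 = 0
loop:
    i2 = PHI( i1, i3 )              // i2 is either 0 (i1) or "i2 + 1" (i3)
    if ( i2 >= a.Length ) goto exit
    a [i2] = .....
    i3 = i2 + 1
    goto loop
exit:

Here "i = i + 1" is really "i3 = i2 + 1": a definition telling how the new
value is obtained from the old one, not an equation to be solved.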
-The symbolic value of i is a generic "n", where "n" is the number of
+The actual symbolic value of i is a generic "n", where "n" is the number of
iterations of the loop, but this is terrible to handle (and in more complex
-examples the symbolic definition of i simply cannot be written, because i is
+examples the symbolic value of i simply cannot be written, because i is
calculated in an iterative way).

However, the definition "i = i + 1" tells us something about i: it tells us

@@ -227,13 +220,25 @@ conditions that could have caused the removal of an abc, but will not
remove checks "by mistake" (so the resulting code will be in any case
correct).

-In a first attempt, we can consider only conditions that have the simplest
-possible form, which is "(compare simpleExpression simpleExpression)",
-where "simpleExpression" is either "(ldind.* local[*])" or
-"(ldlen (ldind.ref local[*]))" (or a constant, of course).
+The expressions we handle are the following (all of integer type):
+- constant
+- variable
+- variable + constant
+- constant + variable
+- variable - constant
+
+And, of course, PHI definitions.
+Any other expression causes the introduction of an "any" value in the
+evaluation, which makes all values that depend on it unknown as well.
+
+We will call these kinds of definitions "summarizable" definitions.

-We can also "summarize" variable definitions, keeping only what is relevant
-to know: if they are greater or smaller than zero or other variables.
+In a first attempt, we can consider only branch conditions that have the
+simplest possible form (the comparison of two summarizable expressions).
+
+We can also simplify the effect of variable definitions, keeping only
+what is relevant to know: their value range with respect to zero and with
+respect to the length of the array we are currently handling.

One particular note on PHI functions: they work (obviously) like the logical
"or" of their definitions, and therefore are equivalent to the "logical or"
@@ -245,281 +250,53 @@ if the recursive definition (which, as we said above, must happen because
of a loop) always "grows" or "gets smaller". In all other cases, we decide
we cannot handle it.

-So, the algorithm to optimize one method could be something like:
-
-[1] Build the SSA representation.
-[2] Analyze each BB. Particularly, do the following:
-    [2a] summarize its exit conditions
-    [2b] summarize all the variable definitions occurring in the BB
-    [2c] check if it contains array accesses (only as optimization, so that
-         step 3 will be performed only when necessary)
-[3] Perform the removal. For each BB that contains array accesses, do:
-    [3a] build an evaluation area where for each variable we keep track
-         of its relations with zero and with each other variable.
-    [3b] populate the evaluation area with the initial data (the variable
-         definitions and the complete entry condition of the BB)
-    [3c] propagate the relations between variables (so that from "a < b"
-         and "b < c" we can also see that "a < c")
-    [3d] examine each array access, and remove the check when the index is
-         provably not negative and less than the array length
-
------------------------------------------------------------------------------
-THE IMPLEMENTATION
-
-In theory, the "right" way to do steps [3c] and [3d] would be a full blown
-"theorem prover": a piece of software able to "solve" each condition for
-every variable it contains, understanding, for instance, that "b < a-1" is
-just another way of writing "a > b+1", because we are interested in what
-value b has, and not a. And maybe the software should be able to do this
-without modifying the original condition, because it was used elsewhere (or
-could be reused).
-Then, the software should also be able to apply logical relations to
-conditions (like "(a>0)||(c<1)"), and manipulate them, too.
-
-In my opinion, an optimizer written in this way risks being too complex,
-and its performance would not be acceptable in a JIT compiler.
-Therefore, I decided to represent values in a "summarized" way.
-For instance, "x = a + 1" becomes "x > a" (and, by the way, "x = x + 1"
-would look like "x > x", which means that "x grows").
-So, each variable value is simply a "relation" of the variable with other
-variables or constants. Anything more complex is ignored.
-Relations are represented with flags (EQ, LT and GT), so that it is easy
-to combine them logically (it is enough to perform the bitwise operations
-on the flags). The condition (EQ|LT|GT) means we know nothing special, any
-value is possible. The empty condition would mean an unreachable statement
-(because no value is acceptable).
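To make the flag representation concrete, a minimal sketch in C (the names
are invented for illustration; they are not the ones in the actual mono
source):

typedef unsigned char MonoValueRelation;   /* hypothetical type */

#define RELATION_EQ   1   /* "x" can be equal to "y"    */
#define RELATION_LT   2   /* "x" can be less than "y"   */
#define RELATION_GT   4   /* "x" can be greater than "y" */
#define RELATION_ANY  (RELATION_EQ | RELATION_LT | RELATION_GT)

/* A PHI works like the logical "or" of its definitions: */
static MonoValueRelation
phi_relation (MonoValueRelation def1, MonoValueRelation def2)
{
    return def1 | def2;
}

/* Two facts about the same pair of variables combine with a logical
 * "and"; an empty result would mean an unreachable statement: */
static MonoValueRelation
combine_facts (MonoValueRelation fact1, MonoValueRelation fact2)
{
    return fact1 & fact2;
}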
-There is still the problem of identifying variables, and where to store all
-these relations. After some thinking, I decided that a "brute force"
-approach is the easiest, and probably also the fastest. In practice, I keep
-in memory an array of "variable relations", where for each variable I
-record its relation with the constants 0 and 1, and an array of its
-relations with all other variables. The evaluation area, therefore, looks
-like a square matrix as large as the number of variables. I *know* that the
-matrix is rather sparse (in the sense that only a few of its cells are
-significant), but I decided that this data structure was the simplest and
-most efficient anyway. After all, each cell takes just one byte, and any
-other data structure would be some kind of associative array (like a
-g_hash_table). Such a data structure is difficult to use with a
-MonoMemPool, in the end is slower than a plain array, and has the storage
-overhead of all the pointers... so maybe the full matrix is not a waste
-at all.
-
-After a few benchmarks, I clearly see that the matrix works well if the
-variables are not too many, otherwise it gets "too sparse", and much time
-is wasted in the propagation phase. Maybe the data structure could be
-changed after a more careful evaluation of the problem...
-
-With these assumptions, all the memory used to represent values, conditions
-and the evaluation area can be allocated in advance (from a MonoMemPool),
-and used for all the analysis of one method. Moreover, the performance of
-logical operations is high (they are simple bitwise operations on bytes).
-
-For the propagation phase, there was a function I could not easily code
-with bitwise operations. As that phase is a real pain for the performance
-(the complexity is quadratic, and it must be executed iteratively until
-nothing changes), I decided to code it in perl, and generate a precomputed
-table then used by the C code (just 64 bytes). I think this is faster than
-all those branches... Really, I could think again about phases 3c and 3d
-(see above), maybe there is a better way to do them.
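A sketch of how such a precomputed table could be used in the propagation
phase (invented names again; the real table is the one generated by the
perl script):

/* "rel" is the n x n matrix of relation bytes described above. The table
 * maps two 3-bit relations, "x ? y" and "y ? z" (64 combinations), to the
 * relation "x ? z" that they imply, which then narrows what we already
 * know about "x ? z": */
static int
propagate_one (unsigned char *rel, int n, int x, int y, int z,
               const unsigned char *table)
{
    unsigned char implied = table [(rel [x*n + y] << 3) | rel [y*n + z]];
    unsigned char merged = rel [x*n + z] & implied;

    if (merged != rel [x*n + z]) {
        rel [x*n + z] = merged;
        return 1;   /* something changed: iterate the whole phase again */
    }
    return 0;
}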
-Of course not all expressions can be handled by the summarization, so the
-optimizer is not as effective as a full blown theorem prover...
-
-Another thing related to design... the implementation is as unintrusive as
-possible.
-I "patched" the main files to add an option for abc removal, and handle it
-just like any other option, and then wrote all the relevant code in two
-separate files (one header for data structures, and a C file for the code).
-If the patch turns out to be useful also for other purposes (like a more
-generic "unreachable code elimination" that detects branches that will
-never be taken using the analysis of variable values), it will not be
-difficult to integrate the branch summarizations into the BasicBlock data
-structures, and so on.
-
-One last thing... in some places, I knowingly coded in an inefficient way.
-For instance, "remove_abc_from_block" does something only if the flag
-"has_array_access_instructions" is true, but I call it anyway, letting it
-do nothing (calling it conditionally would be better). There are several
-things like this in the code, but I was in a hurry, and preferred to make
-the code more readable (it is already difficult to understand as it is).
-
------------------------------------------------------------------------------
-WHEN IS ABC REMOVAL POSSIBLE? WHEN IS IT USEFUL?
-
-Warning! Random ideas ahead!
-
-In general, managed code should be safe, so abc removal is possible only if
-there already is some branch in the method that performs the check.
-So, one check on the index is "mandatory"; the abc removal can only avoid a
-second one (which is useless *because* of the first one).
-If the code does not contain a conditional branch that "avoids" accessing
-an array out of its bounds, in general the abc removal cannot occur.
-
-It could happen, however, that one method is called from another one, and
-the check on the index is performed in the caller.
-
-For instance:
-
-void caller( int[] a, int n ) {
-    if ( n <= a.Length ) {
-        for ( int i = 0; i < n; i++ ) {
-            callee( a, i );
-        }
-    }
-}
-
-void callee( int[] a, int i ) {
-    Console.WriteLine( "a [i] = " + a [i] );
-}
-
-In this case, it should be possible to have two versions of the callee, one
-without checks, to be called from methods that do the check themselves, and
-another to be called otherwise.
-This kind of optimization could be profile driven: if one method is executed
-many times, wastes CPU time in bound checks, and is mostly called from
-another method, it could make sense to analyze the situation in detail.
-
-However, one simple way to perform this optimization would be to inline
-the callee (so that the abc removal algorithm can see both methods at once).
-
-The biggest benefits of abc removal can be seen for array accesses that
-occur "often", like the ones inside inner loops. This is why I tried to
-handle "recursive" variable definitions: without them you cannot optimize
-inside loops, which is also where you need it most.
-
-Another possible optimization (always related to loops) is the following:
-
-void method( int[] a, int n ) {
-    for ( int i = 0; i < n; i++ ) {
-        Console.WriteLine( "a [i] = " + a [i] );
-    }
-}
-
-In this case, the check on n is missing, so the abc could not be removed.
-However, this means that i will be checked twice at each loop iteration,
-once against n, and once against a.Length. The code could be transformed
-like this:
-
-void method( int[] a, int n ) {
-    if ( n < a.Length ) {
-        for ( int i = 0; i < n; i++ ) {
-            Console.WriteLine( "a [i] = " + a [i] ); // Remove abc
-        }
-    } else {
-        for ( int i = 0; i < n; i++ ) {
-            Console.WriteLine( "a [i] = " + a [i] ); // Leave abc
-        }
-    }
-}
-
-This could result in performance improvements (again, this optimization
-should probably be profile driven).
-
------------------------------------------------------------------------------
-OPEN ISSUES
-
-There are several issues that should still be addressed...
-
-One is related to aliasing. For now, I decided to operate only on local
-variables, and ignore aliasing problems (and alias analysis has not yet
-been implemented anyway).
-Actually, I identified the local arrays with their length, because I
-totally ignore the contents of arrays (and objects in general).
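For instance, here is a sketch of a case that is not handled (the array is
a field, not a local variable; an invented example):

int[] a;    // an instance field

void method( int n ) {
    if ( n <= a.Length ) {
        for ( int i = 0; i < n; i++ ) {
            a [i] = i;    // the check cannot be removed: between the test
                          // on a.Length and this access, another thread
                          // (or a callee) could store a shorter array in
                          // the field "a"
        }
    }
}

Since arrays are identified through their length, anything that can change
which array a variable refers to invalidates what we knew about it.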
-Also, in several places in the code I only handle some cases, and ignore
-other more complex ones which could be handled anyway (there are comments
-where this happens). This is not harmful, anyway: it just means the code is
-not as effective as it could be.
-
-Another possible improvement is the explicit handling of all constants.
-For now, code like
-
-void
-method () {
-    int[] a = new int [16];
-    for ( int i = 0; i < 16; i++ ) {
-        a [i] = i;
-    }
-}
-
-is not handled, because the two constants "16" are lost when variable
-relations are summarized (actually they are not lost, they are stored in
-the summarized values, but they are not yet used correctly).
-
-The worst thing, anyway, is that for now I fail completely in distinguishing
-between signed and unsigned variables/operations/conditions, and between
-different integer sizes (in bytes). I did this on purpose, just for lack of
-time, but it can turn out to be terribly wrong.
-The problem is caused by the fact that I handle arithmetical operations and
-conditions as "ideal" operations, but actually they can overflow and/or
-underflow, giving "surprising" results.
-
-For instance, look at the following two methods:
-
-public static void testSignedOverflow()
-{
-    Console.WriteLine( "Testing signed overflow..." );
-    int[] ia = new int[70000];
-    int i = 1;
-    while ( i ..... ) {
-        .....
-    }
-}
-
-public static void testUnsignedOverflow()
-{
-    Console.WriteLine( "Testing unsigned overflow..." );
-    .....
-}
-
-In the "testSignedOverflow" method, the exception will be thrown, while
-in the "testUnsignedOverflow" everything will go on smoothly.
-The problem is that the test on "i" is a signed comparison in one method
-and an unsigned one in the other; handling both as "ideal" comparisons, as
-I do now, could lead to removing a check that the overflow makes necessary.
+The summarized relations between variables can be seen as an "evaluation
+graph", where each node is a variable and each edge is a relation between
+two variables. To verify that an array access is safe, we traverse this
+graph, searching a path from the index to the array length that proves
+"index >= 0" and "index < length". If such a path is found, the check is
+removed. It is true that in theory *each* traversal has a complexity which
+is exponential in the number of variables, but in practice the graph is not
+very connected, so the traversal terminates quickly.
+
+Then, the algorithm to optimize one method looks like this:
+
+[1] Preparation:
+    [1a] Build the SSA representation.
+    [1b] Prepare the evaluation graph (empty).
+    [1c] Summarize each variable definition, and put the resulting
+         relations in the evaluation graph.
+[2] Analyze each BB, starting from the entry point and following the
+    dominator tree:
+    [2a] Summarize its entry condition, and put the resulting relations
+         in the evaluation graph (this is the reason why the BBs are
+         examined following the dominator tree, so that conditions are
+         added to the graph in a "cumulative" way).
+    [2b] Scan the BB instructions, and for each array access perform
+         step [3].
+    [2c] Process the children BBs following the dominator tree (applying
+         step [2] recursively).
+    [2d] Remove from the evaluation graph the conditions added at step
+         [2a] (so that, when backtracking along the tree, the graph is
+         properly cleared).
+[3] Attempt the removal:
+    [3a] Summarize the index expression, to see if we can handle it; there
+         are three cases: the index is either a constant, or a variable
+         (with an optional delta), or cannot be handled (it is an "any").
+    [3b] If the index can be handled, traverse the evaluation graph
+         searching a path from the index variable to the array length (if
+         the index is a constant, just examine the array length to see if
+         it has some relation with this constant).
+    [3c] Use the results of step [3b] to decide if the check is redundant.
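To make steps [3a]-[3c] concrete, here is a worked example, a guarded loop
like the "caller" method shown earlier:

if ( n <= a.Length ) {
    for ( int i = 0; i < n; i++ ) {
        a [i] = .....
    }
}

Inside the loop body the evaluation graph contains "i >= 0" (from the
definitions "i = 0" and "i = i + 1", a recursive definition that always
grows), "i < n" (from the loop condition) and "n <= a.Length" (from the
"if" condition). Searching a path from "i" to "a.Length" finds
"i < n <= a.Length", which proves "i < a.Length"; together with "i >= 0",
this makes the bound check on "a [i]" redundant, so it is removed.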