                 Arrays Bounds Check Elimination (ABC)
                        in the Mono Runtime

                 Massimiliano Mantione (mass@ximian.com)

Here "abc" stands for "array bounds check", or "array bound checks", or
some combination of the two.

* Usage

        Simply use the "abcrem" optimization when invoking mono.

        To see if bound checks are actually removed, use "mono -v" and
        grep for "ARRAY-ACCESS" in the output; there should be a
        message for each check that has been removed.

        To trace the algorithm execution, use "-v -v -v", and be
        prepared to be totally submerged in debugging messages...

* Effectiveness

        The abc removal code can now always remove bound checks from
        "clean" array scans in loops, and more generally whenever
        clear conditions already state that the index is "safe".

        To be clearer, and to give an idea of what the algorithm can
        and cannot do without describing it in detail... keep in mind
        that only "redundant" checks can be removed. By "redundant",
        I mean "already explicitly checked" in the method code.
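        For instance, in a method like the following (just an
        illustrative sketch, with a made-up name), the two parts of
        the guard are exactly the two conditions that the bound check
        on "a [i]" would verify, so that check is redundant and a
        pass like the one described here should be able to drop it:

                static int GetOrZero (int[] a, int i) {
                    if (i >= 0 && i < a.Length)
                        return a [i];   // both bounds are implied by the guard
                    return 0;
                }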
        Unfortunately, analyzing complex expressions is not so easy
        (see below for details), so the capabilities of this "abc
        remover" are limited.

        These are simple guidelines:

        - Only expressions of the following kinds are handled:
          - constant
          - variable [+/- constant]
        - Only comparisons between "handled" expressions are understood.
        - "switch" statements are not yet handled.

        This means that code like this will be handled well:

                for (int i = 0; i < a.Length; i++) {
                    a [i] = .....
                }

        The "i" variable could be declared outside the "for", the
        "for" could be a "while", and maybe even implemented with
        "goto", the array could be scanned in reverse order, and
        everything would still work.

        I could have a temporary variable storing the array length,
        and check on it inside the loop, and the abc removal would
        still occur, like this:

                int i = 0;
                int l = a.Length;
                while (i < l) {
                    a [i] ......
                }

        or this:

                int l = a.Length;
                for (int i = l - 1; i >= 0; i--) {
                    a [i] = .....
                }

        The following two examples would work:

                for (int i = 0; i < (a.Length - 1); i++) .....
                for (int i = 0; i < a.Length; i += 2) .....

        But just something like this

                int delta = -1;
                for (int i = 0; i < (a.Length + delta); i++) .....

        or like this

                int delta = +2;
                for (int i = 0; i < a.Length; i += delta) .....

        would not work: the check would stay there (unless a
        combination of cfold, consprop and copyprop is used as well,
        which would make the constant value of "delta" explicit).

        Just to show how tricky things are... this would work!

                int limit = a.Length - 1;
                for (int i = 0; i < limit; i++) {
                    a [i] = .....
                }

        A detailed explanation of the reason why things are done like
        this is given below.

* The Algorithm

        This array bound check removal (abc removal) algorithm is
        based on symbolic execution concepts. I handle the removal
        like an "unreachable code elimination" (and in fact the
        optimization could be extended to also remove other
        unreachable sections of code, due to branches that "always go
        the same way").

        In symbolic execution, variables do not have actual (numeric)
        values, but instead symbolic expressions (like "a", or
        "x+y"), and branch conditions are handled like symbolic
        conditions (like "i < k").

        The SSA representation helps here: since each variable is
        defined exactly once, its definition is also its symbolic
        value, so "vars [i]->def" should contain exactly those store
        instructions, and the "get_variable_value_from_ssa_store"
        function extracts the variable value from there.

        On the other hand, the conditions under which each basic
        block is executed are not made fully explicit.

        However, it is not difficult to make them so.

        Each BB that has more than one exit BB must in practice end
        either with a conditional branch instruction or with a switch
        instruction.

        In the first case, the BB has exactly two exit BBs, and their
        execution conditions are easy to get from the condition of
        the branch (see the "get_relation_from_branch_instruction"
        function, and especially the end of "analyze_block" in
        abcremoval.c).

        If there is a switch, the jump condition of every exit BB is
        the equality of the switch argument with the particular index
        associated with its case (but the current implementation does
        not handle switch statements yet).

        These individual conditions are in practice associated with
        each arc that connects BBs in the CFG (with the simple case
        that unconditional branches have a "TRUE" condition, because
        they always happen).
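        For example, in the following sketch (the method name is made
        up for illustration), the arc entering the "then" block
        carries the condition "i < a.Length", and the arc entering
        the "else" block carries "i >= a.Length"; inside the "then"
        block the upper bound check on "a [i]" is therefore implied,
        while the lower one is not, because nothing there states that
        "i >= 0":

                static void ClearIfSmall (int[] a, int i) {
                    if (i < a.Length) {
                        // this arc carries "i < a.Length"
                        a [i] = 0;
                    } else {
                        // this arc carries "i >= a.Length"
                    }
                }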
        So, for each BB, its *proper* entry condition is the union of
        all the conditions associated with the arcs that enter the
        BB. The "union" is like a logical "or", in the sense that
        either of the conditions could be true; they are not
        necessarily all true. This means that if I can enter a BB in
        two ways, and in one case I know that "x>0", and in the other
        that "x==0", then inside the BB I only know that "x>=0",
        which is a weaker condition (the union of the two).

        Also, the *complete* entry condition for a BB is the
        "intersection" of all the entry conditions of its
        dominators. This is true because every path that reaches the
        BB passes through each of its dominators, so the entry
        condition of each dominator must have been true if the
        control flow reached the BB. This translates to the logical
        "and" of all the "proper" conditions of the BBs met walking
        up the dominator tree. So, if one says "x>0", and another
        "x==1", then I know that "x==1", which is a stronger
        condition (the intersection of the two).

        Note that, if the two conditions were "x>0" and "x==0", then
        the block would be unreachable (the intersection is empty),
        because some branch is impossible.

        Another observation is that, inside each BB, every variable
        is subject to the complete entry condition of that very same
        BB, and not to the one of the BB in which it is defined (with
        the "complete entry condition" being the thing I defined
        above; sorry if these terms "proper" and "complete" are
        strange, I found nothing better).

        This happens because the branch conditions are related to the
        control flow. I can define "i=a", and if I am in a BB where
        "a>0", then "i>0", but not otherwise.

        So, intuitively, if the available conditions say "i>=0", and
        i is used as an index in an array access, then the lower
        bound check can be omitted. If the conditions also say
        "(i>=0)&&(i<array.length)", the whole check can be removed.

        In practice, the available conditions form a sort of system
        of relations, and each abc is a pair of "goal functions" to
        be verified against that system ("index >= 0" and "index <
        array.length").

        If the system is valid for *each possible* variable value,
        then the goal functions are always true, and the abc can be
        removed.

        All this discussion is useful to give a precise specification
        to the problem we are trying to solve.

        The trouble is that, in the general case, the resulting
        system of equations is like a predicate in first order logic,
        which is semi-decidable, and its general solution is anyway
        too complex to be attempted in a JIT compiler (which should
        not contain a full fledged theorem prover).

        Therefore, we must cut some corners.

        There is also another big problem, which is caused by
        "recursive" symbolic definitions. These definitions can (and
        generally do) happen every time there is a loop. For
        instance, in the following piece of code:

                for (int i = 0; i < array.Length; i++) {
                    Console.WriteLine ("array [i] = " + array [i]);
                }

        one of the definitions of i is a PHI that can be either 0 or
        "i + 1".

        Now, we know that mathematically "i = i + 1" does not make
        sense, and in fact symbolic values are not "equations", they
        are "symbolic definitions".

        The actual symbolic value of i is a generic "n", where "n" is
        the number of iterations of the loop, but this is terrible to
        handle (and in more complex examples the symbolic value of i
        simply cannot be written, because i is calculated in an
        iterative way).

        However, the definition "i = i + 1" tells us something about
        i: it tells us that i "grows". So (from the PHI definition)
        we know that i is either 0, or "grows". This is enough to
        tell that "i>=0", which is what we want!
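        By contrast, a recursive definition that is not monotonic
        gives no such guarantee. In the following sketch (an
        illustrative example, with a made-up name) the definitions of
        "i" that reach the access are, in effect, "0", "i + 1" and
        "i - 1", so "i >= 0" cannot be established and the lower
        bound check on "a [i]" has to stay (the upper one should
        still be implied by the loop condition); indeed, when
        "forward" is false the access really does go out of bounds:

                static void Fill (int[] a, bool forward) {
                    int i = 0;
                    while (i < a.Length) {
                        a [i] = 0;
                        if (forward)
                            i++;    // i sometimes grows...
                        else
                            i--;    // ...and sometimes shrinks
                    }
                }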
        It is important to note that recursive definitions can only
        occur inside PHI definitions, because a variable cannot be
        defined *only* in terms of itself!

        At this point, I can explain which corners I want to cut to
        make the problem solvable. The result will not remove every
        abc that could theoretically be removed, but at least it will
        work.

        The easiest way to cut corners is to only handle expressions
        which are "reasonably simple", and ignore the rest.

        Keep in mind that ignoring an expression is not harmful in
        itself. The algorithm will simply be "less powerful", because
        it will ignore conditions that could have allowed the removal
        of an abc, but it will not remove checks "by mistake" (so the
        resulting code will be correct in any case).

        The expressions we handle are the following (all of integer
        type):

        - constant
        - variable
        - variable + constant
        - constant + variable
        - variable - constant

        And, of course, PHI definitions.

        Any other expression causes the introduction of an "any"
        value in the evaluation, which makes all values that depend
        on it unknown as well.

        We will call definitions of this kind "summarizable"
        definitions.

        In a first attempt, we can consider only branch conditions
        that have the simplest possible form (the comparison of two
        summarizable expressions).

        We can also simplify the effect of variable definitions,
        keeping only what is relevant to know: their value range with
        respect to zero and with respect to the length of the array
        we are currently handling.

        One particular note on PHI functions: they work (obviously)
        like the logical "or" of their definitions, and therefore are
        equivalent to the "logical or" of the summarization of their
        definitions.

        About recursive definitions (which, believe me, are the worst
        thing in all this mess), we handle only "monotonic" ones.
        That is, we try to understand if the recursive definition
        (which, as we said above, must happen because of a loop)
        always "grows" or always "gets smaller". In all other cases,
        we decide we cannot handle it.

        One critical thing, once we have defined all these data
        structures, is how the evaluation is actually performed.

        In a first attempt I coded a "brute force" approach, which
        for each BB tried to examine all possible conditions between
        all variables, filling a sort of "evaluation matrix". The
        problem was that the complexity of this evaluation was
        quadratic (or worse) in the number of variables, and that
        many variables were examined even if they were not involved
        in any array access.

        Following the ABCD paper:

                http://citeseer.ist.psu.edu/bodik00abcd.html

        I rewrote the algorithm in a more "sparse" way.

        Now, the main data structure is a graph of relations between
        variables, and each attempt to remove a check performs a
        traversal of the graph, looking for a path from the index to
        the array length that satisfies the properties "index >= 0"
        and "index < length". If such a path is found, the check is
        removed. It is true that in theory *each* traversal has a
        complexity which is exponential in the number of variables,
        but in practice the graph is not very connected, so the
        traversal terminates quickly.
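        To make the "graph of relations" idea more concrete, here is
        a small hypothetical sketch (the class, method and node names
        are invented, and the encoding is a simplification in the
        spirit of the ABCD paper, not the actual abcremoval.c data
        structures). An edge from "from" to "to" with weight "delta"
        records the known relation "to <= from + delta"; a path from
        the array length to the index with total weight at most -1
        then proves "index <= length - 1", and a path from the index
        to "zero" with total weight at most 0 proves "index >= 0":

                using System;
                using System.Collections.Generic;

                class RelationGraph {
                    // edge (from -> to, delta) means "to <= from + delta"
                    readonly Dictionary<string, List<(string To, int Delta)>> edges =
                        new Dictionary<string, List<(string To, int Delta)>> ();

                    public void AddRelation (string from, string to, int delta) {
                        if (!edges.TryGetValue (from, out var list))
                            edges [from] = list = new List<(string To, int Delta)> ();
                        list.Add ((to, delta));
                    }

                    // Is there a path proving "target <= source + bound"?
                    public bool Proves (string source, string target, int bound) {
                        return Search (source, target, 0, bound, new HashSet<string> ());
                    }

                    // Depth first search: exponential in theory, fine in
                    // practice because the graph is sparse (as noted above).
                    bool Search (string node, string target, int acc, int bound,
                                 HashSet<string> path) {
                        if (node == target)
                            return acc <= bound;
                        if (path.Contains (node) || !edges.TryGetValue (node, out var list))
                            return false;
                        path.Add (node);
                        foreach (var (to, delta) in list)
                            if (Search (to, target, acc + delta, bound, path))
                                return true;
                        path.Remove (node);
                        return false;
                    }
                }

                class Demo {
                    static void Main () {
                        var g = new RelationGraph ();
                        // int limit = a.Length - 1;   =>  limit <= a.Length - 1
                        g.AddRelation ("a.Length", "limit", -1);
                        // loop condition "i < limit"  =>  i <= limit - 1
                        g.AddRelation ("limit", "i", -1);
                        // "i" is a monotonic PHI of "0" and "i + 1"  =>  0 <= i
                        g.AddRelation ("i", "zero", 0);

                        Console.WriteLine (g.Proves ("i", "zero", 0));       // True: "i >= 0"
                        Console.WriteLine (g.Proves ("a.Length", "i", -1));  // True: "i < a.Length"
                    }
                }

        With both goal relations proved, the check on "a [i]" in the
        "int limit = a.Length - 1" example shown earlier can be
        removed.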
        Then, the algorithm to optimize one method looks like this:

        [1] Preparation:

            [1a] Build the SSA representation.

            [1b] Prepare the (empty) evaluation graph.

            [1c] Summarize each variable definition, and put the
                 resulting relations in the evaluation graph.

        [2] Analyze each BB, starting from the entry point and
            following the dominator tree:

            [2a] Summarize its entry condition, and put the resulting
                 relations in the evaluation graph (this is the
                 reason why the BBs are examined following the
                 dominator tree, so that conditions are added to the
                 graph in a "cumulative" way).

            [2b] Scan the BB instructions, and for each array access
                 perform step [3].

            [2c] Process the children BBs following the dominator
                 tree (step [2]).

            [2d] Remove from the evaluation graph the conditions
                 added at step [2a] (so that, backtracking along the
                 tree, the graph is properly cleared).

        [3] Attempt the removal:

            [3a] Summarize the index expression, to see if we can
                 handle it; there are three cases: the index is
                 either a constant, or a variable (with an optional
                 delta), or it cannot be handled (it is an "any").

            [3b] If the index can be handled, traverse the evaluation
                 graph searching for a path from the index variable
                 to the array length (if the index is a constant,
                 just examine the array length to see if it has some
                 relation with this constant).

            [3c] Use the results of step [3b] to decide if the check
                 is redundant.
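        As a closing illustration, here is a plausible reading of how
        the steps above apply to the canonical loop (the annotations
        are indicative, not a trace of the actual implementation):

                static void Clear (int[] a) {
                    // [1c] "i" summarizes as a PHI of "0" and "i + 1":
                    //      a monotonic, growing definition, so "i >= 0".
                    for (int i = 0; i < a.Length; i++) {
                        // [2a] the entry condition of the loop body adds
                        //      the relation "i < a.Length" to the
                        //      evaluation graph.
                        // [3a] the index expression is the plain
                        //      variable "i".
                        // [3b] the graph proves both "i >= 0" and
                        //      "i < a.Length", so...
                        a [i] = 0;  // [3c] ...this bound check is removed.
                    }
                }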