2010-01-18 Zoltan Varga <vargaz@gmail.com>

[mono.git] / docs / abc-removal.txt
diff --git a/docs/abc-removal.txt b/docs/abc-removal.txt

index 8c10579eaceefa09824ad445bc220a343d8a7c79..b44089c0de9ac4d5a5269fd91a6a4c8301f9c950 100755 (executable)
--- a/docs/abc-removal.txt
+++ b/docs/abc-removal.txt
@@ -1,302 +1,373 @@
  
-Here "abc" stays for "array bounds check", or "array bound checks", or
-some combination of the two...
-
-------------------------------------------------------------------------------
-USAGE
-
-Simply use the "abcrem" optimization invoking mono.
-
-To see if bound checks are actually removed, use "mono -v" and grep for
-"ARRAY-ACCESS" in the output, there should be a message for each check
-that has been removed.
-
-To trace the algorithm execution, use "-v -v -v", and be prepared to be
-totally submersed by debugging messages...
-
-------------------------------------------------------------------------------
-EFFECTIVENESS
-
-The abc removal code can now always remove bound checks from "clean"
-array scans in loops, and generally anyway there are clear conditions that
-already state that the index is "safe".
+               Arrays Bounds Check Elimination (ABC)
+                        in the Mono Runtime
  
-To be clearer, and give an idea of what the algorithm can and cannot do
-without describing it in detail... keep in mind that only "redunant" checks
-cannot be removed. By "redundant", I mean "already explicitly checked" in
-the method code.
-
-Unfortunately, analyzing complex expressions is not so easy (see below for
-details), so the capabilities of this "abc remover" are limited.
-
-These are simple guidelines:
-- Only expressions of the following kinds are handled:
-  - constant
-  - variable [+/- constant]
-- Only comparisons between "handled" expressions are understood.
-- "switch" statements are not yet handled.
-
-This means that code like this will be handled well:
-
-for (int i = 0; i < a.Length; i++) {
-       a [i] = .....
-}
-
-The "i" variable could be declared out of the "for", the "for" could be a
-"while", and maybe even implemented with "goto", the array could be scanned
-in reverse order, and everything would still work.
-I could have a temporary variable storing the array length, and check on
-it inside the loop, and the abc removal would still occurr, like this:
-
-int i = 0;
-int l = a.Length;
-while ( i < l ) {
-       a [i] ......
-}
-
-or this:
-
-int l = a.Length;
-for (int i = l; i > 0; i--) {
-       a [i] = .....
-}
-
-The following two examples would work:
-
-for (int i = 0; i < (a.Length -1); i++) .....
-for (int i = 0; i < a.Length; i += 2) .....
-
-But just something like this
-
-int delta = -1;
-for (int i = 0; i < (a.Length + delta); i++) .....
-
-or like this
-
-int delta = +2;
-for (int i = 0; i < a.Length; i += delta) .....
-
-would not work, the check would stay there, sorry :-(
-(unless a combination of cfold, consprop and copyprop is used, too, which
-would make the constant value of "delta" explicit).
-
-Just to make you understand how things are tricky... this would work!
-
-int limit = a.Length - 1;
-for (int i = 0; i < limit; i++) {
-       a [i] = .....
-}
-
-
-A detailed explanation of the reason why things are done like this is given
-below...
-
-------------------------------------------------------------------------------
-THE ALGORITHM
-
-This array bound check removal (abc removal) algorithm is based on
-symbolic execution concepts. I handle the removal like an "unreachable
-code elimination" (and in fact the optimization could be extended to remove
-also other unreachable sections of code, due to branches that "always go the
-same way").
-
-In symbolic execution, variables do not have actual (numeric) values, but
-instead symbolic expressions (like "a", or "x+y").
-Also, branch conditions are handled like symbolic conditions (like "i<k"),
-which state relations between variable values.
-
-The SSA representation inside mini is somewhat close to a symbolic
-representation of the execution of the compiled method.
-Particularly, the representation of variable values is exactly a symbolic
-one. It is enough to find all CEE_STIND_* instructions which store to a
-local variable, and their second argument is exactly the variable value.
-Actually, "cfg->vars [<variable-index>]->def" should contain exactly
-those store instructions, and the "get_variable_value_from_ssa_store"
-function extracts the variable value from there.
-
-On the other hand, the conditions under which each basic block is
-executed are not made fully explicit.
-
-However, it is not difficult to make them so.
-Each BB that has more than one exit BB, in practice must end either with
-a conditional branch instruction or with a switch instruction.
-In the first case, the BB has exactly two exit BBs, and their execution
-conditions are easy to get from the condition of the branch (see the
-"get_relation_from_branch_instruction" function, and expecially the end of
-"analyze_block" in abcremoval.c.
-If there is a switch, the jump condition of every exit BB is the equality
-of the switch argument with the particular index associated with its case
-(but the current implementation does not handle switch statements yet).
-
-These individual conditions are in practice associated to each arc that
-connects BBs in the CFG (with the simple case that unconditional branches
-have a "TRUE" condition, because they always happen).
-
-So, for each BB, its *proper* entry condition is the union of all the
-conditions associated to arcs that enter the BB. The "union" is like a
-logical "or", in the sense that either of the condition could be true,
-they are not necessarily all true. This means that if I can enter a BB
-in two ways, and in one case I know that "x>0", and in the other that
-"x==0", actually in the BB I know that "x>=0", which is a weaker
-condition (the union of the two).
-
-Also, the *complete* entry condition for a BB is the "intersection" of all
-the entry conditions of its dominators. This is true because each dominator
-is the only way to reach the BB, so the entry condition of each dominator
-must be true if the control flow reached the BB. This translates to the
-logical "and" of all the "proper" conditions of the BBs met walking up in the
-dominator tree. So, if one says "x>0", and another "x==1", then I know
-that "x==1", which is a stronger condition (the intersection of the two).
-Note that, if the two conditions were "x>0" and "x==0", then the block would
-be unreachable (the intersection is empty), because some branch is impossible.
-
-Another observation is that, inside each BB, every variable is subject to the
-complete entry condition of that very same BB, and not the one in which it is
-defined (with the "complete entry condition" being the thing I defined before,
-sorry if these terms "proper" and "complete" are strange, I found nothing
-better).
-This happens because the branch conditions are related to the control flow.
-I can define "i=a", and if I am in a BB where "a>0", then "i>0", but not
-otherwise.
-
-So, intuitively, if the available conditions say "i>=0", and i is used as an
-index in an array access, then the lower bound check can be omitted.
-If the condition also says "(i>=0)&&(i<array.length)", the abc removal can
-occur.
-
-So, a complete solution to the problem of abc removal would be the following:
-for each array access, build a system of equations containing:
-[1] all the symbolic variable definitions
-[2] the complete entry condition of the BB in which the array access occurs
-[3] the two "goal functions" ("index >=0" and "index < array.length")
-If the system is valid for *each possible* variable value, then the goal
-functions are always true, and the abc can be removed.
-
-All this discussion is useful to give a precise specification to the problem
-we are trying to solve.
-The trouble is that, in the general case, the resulting system of equations
-is like a predicate in first order logic, which is semi-decidable, and its
-general solution is anyway too complex to be attempted in a JIT compiler
-(which should not contain a full fledged theorem prover).
-
-Therefore, we must cut some corner.
-
-
-By the way, there is also another big problem, which is caused by "recursive"
-symbolic definitions. These definition can (and generally do) happen every
-time there is a loop. For instance, in the following piece of code
-
-for ( int i = 0; i < array.length; i++ ) {
-       Console.WriteLine( "array [i] = " + array [i] );
-}
-
-one of the definitions of i is a PHI that can be either 0 or "i + 1".
-
-Now, we know that mathematically "i = i + 1" does not make sense, and in
-fact symbolic values are not "equations", they are "symbolic definitions".
-
-The actual symbolic value of i is a generic "n", where "n" is the number of
-iterations of the loop, but this is terrible to handle (and in more complex
-examples the symbolic value of i simply cannot be written, because i is
-calculated in an iterative way).
-
-However, the definition "i = i + 1" tells us something about i: it tells us
-that i "grows". So (from the PHI definition) we know that i is either 0, or
-"grows". This is enough to tell that "i>=0", which is what we want!
-It is important to note that recursive definitions can only occurr inside
-PHI definitions, because actually a variable cannot be defined *only* in terms
-of itself!
-
-
-At this point, I can explain which corners I want to cut to make the
-problem solvable. It will not remove all the abc that could theoretically
-be removed, but at least it will work.
-
-The easiest way to cut corners is to only handle expressions which are
-"reasonably simple", and ignore the rest.
-Keep in mind that ignoring an expression is not harmful in itself.
-The algorithm will be simply "less powerful", because it will ignore
-conditions that could have caused to the removal of an abc, but will
-not remove checks "by mistake" (so the resulting code will be in any case
-correct).
-
-The expressions we handle are the following (all of integer type):
-- constant
-- variable
-- variable + constant
-- constant + variable
-- variable - constant
-
-And, of course, PHI definitions.
-Any other expression causes the introduction of an "any" value in the
-evaluation, which makes all values that depend from it unknown as well.
-
-We will call these kind of definitions "summarizable" definitions.
-
-In a first attempt, we can consider only branch conditions that have the
-simplest possible form (the comparison of two summarizable expressions).
-
-We can also simplify the effect of variable definitions, keeping only
-what is relevant to know: their value range with respect to zero and with
-respect to the length of the array we are currently handling.
-
-One particular note on PHI functions: they work (obviously) like the logical
-"or" of their definitions, and therefore are equivalent to the "logical or"
-of the summarization of their definitions.
-
-About recursive definitions (which, believe me, are the worst thing in all
-this mess), we handle only "monotonic" ones. That is, we try to understand
-if the recursive definition (which, as we said above, must happen because
-of a loop) always "grows" or "gets smaller". In all other cases, we decide
-we cannot handle it.
-
-
-One critical thing, once we have defined all these data structures, is how
-the evaluation is actually performed.
-
-In a first attempt I coded a "brute force" approach, which for each BB
-tried to examine all possible conditions between all variables, filling
-a sort of "evaluation matrix". The problem was that the complexity of this
-evaluation was quadratic (or worse) on the number of variables, and that
-many wariables were examined even if they were not involved in any array
-access.
-
-Following the ABCD paper (http://citeseer.ist.psu.edu/bodik00abcd.html),
-I rewrote the algorithm in a more "sparse" way.
-Now, the main data structure is a graph of relations between variables, and
-each attempt to remove a check performs a traversal of the graph, looking
-for a path from the index to the array length that satisfies the properties
-"index >= 0" and "index < length". If such a path is found, the check is
-removed. It is true that in theory *each* traversal has a complexity which
-is exponential on the number of variables, but in practice the graph is not
-very connected, so the traversal terminates quickly.
-
-
-Then, the algorithm to optimize one method looks like this:
-
-[1] Preparation:
-    [1a] Build the SSA representation.
-    [1b] Prepare the evaluation graph (empty)
-    [1b] Summarize each varible definition, and put the resulting relations
-         in the evaluation graph
-[2] Analyze each BB, starting from the entry point and following the
-    dominator tree:
-    [2a] Summarize its entry condition, and put the resulting relations
-         in the evaluation graph (this is the reason why the BBs are examined
-        following the dominator tree, so that conditions are added to the
-        graph in a "cumulative" way)
-    [2b] Scan the BB instructions, and for each array access perform step [3]
-    [2c] Process children BBs following the dominator tree (step [2])
-    [2d] Remove from the evaluation area the conditions added at step [2a]
-         (so that backtracking along the tree the area is properly cleared)
-[3] Attempt the removal:
-    [3a] Summarize the index expression, to see if we can handle it; there
-         are three cases: the index is either a constant, or a variable (with
-        an optional delta) or cannot be handled (is a "any")
-    [3b] If the index can be handled, traverse the evaluation area searching
-         a path from the index variable to the array length (if the index is
-        a constant, just examine the array length to see if it has some
-        relation with this constant)
-    [3c] Use the results of step [3b] to decide if the check is redundant
+              Massimiliano Mantione (mass@ximian.com)
  
+Here "abc" stays for "array bounds check", or "array bound checks", or
+some combination of the two.
+
+* Usage
+
+       Simply use the "abcrem" optimization invoking mono.
+
+       To see if bound checks are actually removed, use "mono -v" and
+       grep for "ARRAY-ACCESS" in the output, there should be a
+       message for each check that has been removed.
+       
+       To trace the algorithm execution, use "-v -v -v", and be
+       prepared to be totally submersed by debugging messages...
+
+* Effectiveness
+
+       The abc removal code can now always remove bound checks from
+       "clean" array scans in loops, and generally anyway there are
+       clear conditions that already state that the index is "safe".
+       
+       To be clearer, and give an idea of what the algorithm can and
+       cannot do without describing it in detail... keep in mind that
+       only "redunant" checks cannot be removed. By "redundant", I
+       mean "already explicitly checked" in the method code.
+       
+       Unfortunately, analyzing complex expressions is not so easy
+       (see below for details), so the capabilities of this "abc
+       remover" are limited.
+       
+       These are simple guidelines:
+
+       - Only expressions of the following kinds are handled:
+         - constant
+         - variable [+/- constant]
+       - Only comparisons between "handled" expressions are understood.
+       - "switch" statements are not yet handled.
+       
+       This means that code like this will be handled well:
+       
+               for (int i = 0; i < a.Length; i++) {
+                       a [i] = .....
+               }
+               
+       The "i" variable could be declared out of the "for", the "for"
+       could be a "while", and maybe even implemented with "goto",
+       the array could be scanned in reverse order, and everything
+       would still work.
+
+       I could have a temporary variable storing the array length,
+       and check on it inside the loop, and the abc removal would
+       still occurr, like this:
+       
+               int i = 0;
+               int l = a.Length;
+               while ( i < l ) {
+                       a [i] ......
+               }
+               
+       or this:
+       
+               int l = a.Length;
+               for (int i = l; i > 0; i--) {
+                       a [i] = .....
+               }
+               
+       The following two examples would work:
+       
+               for (int i = 0; i < (a.Length -1); i++) .....
+               for (int i = 0; i < a.Length; i += 2) .....
+               
+       But just something like this
+       
+               int delta = -1;
+               for (int i = 0; i < (a.Length + delta); i++) .....
+               
+       or like this
+       
+               int delta = +2;
+               for (int i = 0; i < a.Length; i += delta) .....
+               
+       would not work, the check would stay there. (unless
+       a combination of cfold, consprop and copyprop is used, too,
+       which would make the constant value of "delta" explicit).
+       
+       Just to make you understand how things are tricky... this would work!
+       
+               int limit = a.Length - 1;
+               for (int i = 0; i < limit; i++) {
+                       a [i] = .....
+               }
+               
+       A detailed explanation of the reason why things are done like
+       this is given below.
+       
+* The Algorithm
+
+       This array bound check removal (abc removal) algorithm is
+       based on symbolic execution concepts. I handle the removal
+       like an "unreachable code elimination" (and in fact the
+       optimization could be extended to remove also other
+       unreachable sections of code, due to branches that "always go
+       the same way").
+       
+       In symbolic execution, variables do not have actual (numeric)
+       values, but instead symbolic expressions (like "a", or "x+y").
+       Also, branch conditions are handled like symbolic conditions
+       (like "i<k"), which state relations between variable values.
+       
+       The SSA representation inside mini is somewhat close to a
+       symbolic representation of the execution of the compiled
+       method.
+       
+       Particularly, the representation of variable values is exactly
+       a symbolic one. It is enough to find all CEE_STIND_*
+       instructions which store to a local variable, and their second
+       argument is exactly the variable value.  Actually, "cfg->vars
+       [<variable-index>]->def" should contain exactly those store
+       instructions, and the "get_variable_value_from_ssa_store"
+       function extracts the variable value from there.
+       
+       On the other hand, the conditions under which each basic block
+       is executed are not made fully explicit.
+       
+       However, it is not difficult to make them so.
+
+       Each BB that has more than one exit BB, in practice must end
+       either with a conditional branch instruction or with a switch
+       instruction.
+       
+       In the first case, the BB has exactly two exit BBs, and their
+       execution conditions are easy to get from the condition of the
+       branch (see the "get_relation_from_branch_instruction"
+       function, and expecially the end of "analyze_block" in
+       abcremoval.c.
+
+       If there is a switch, the jump condition of every exit BB is
+       the equality of the switch argument with the particular index
+       associated with its case (but the current implementation does
+       not handle switch statements yet).
+       
+       These individual conditions are in practice associated to each
+       arc that connects BBs in the CFG (with the simple case that
+       unconditional branches have a "TRUE" condition, because they
+       always happen).
+       
+       So, for each BB, its *proper* entry condition is the union of
+       all the conditions associated to arcs that enter the BB. The
+       "union" is like a logical "or", in the sense that either of
+       the condition could be true, they are not necessarily all
+       true. This means that if I can enter a BB in two ways, and in
+       one case I know that "x>0", and in the other that "x==0",
+       actually in the BB I know that "x>=0", which is a weaker
+       condition (the union of the two).
+       
+       Also, the *complete* entry condition for a BB is the
+       "intersection" of all the entry conditions of its
+       dominators. This is true because each dominator is the only
+       way to reach the BB, so the entry condition of each dominator
+       must be true if the control flow reached the BB. This
+       translates to the logical "and" of all the "proper" conditions
+       of the BBs met walking up in the dominator tree. So, if one
+       says "x>0", and another "x==1", then I know that "x==1", which
+       is a stronger condition (the intersection of the two).
+
+       Note that, if the two conditions were "x>0" and "x==0", then
+       the block would be unreachable (the intersection is empty),
+       because some branch is impossible.
+       
+       Another observation is that, inside each BB, every variable is
+       subject to the complete entry condition of that very same BB,
+       and not the one in which it is defined (with the "complete
+       entry condition" being the thing I defined before, sorry if
+       these terms "proper" and "complete" are strange, I found
+       nothing better).
+
+       This happens because the branch conditions are related to the
+       control flow.  I can define "i=a", and if I am in a BB where
+       "a>0", then "i>0", but not otherwise.
+       
+       So, intuitively, if the available conditions say "i>=0", and i
+       is used as an index in an array access, then the lower bound
+       check can be omitted.  If the condition also says
+       "(i>=0)&&(i<array.length)", the abc removal can occur.
+       
+       So, a complete solution to the problem of abc removal would be
+       the following: for each array access, build a system of
+       equations containing:
+
+               [1] all the symbolic variable definitions
+       
+               [2] the complete entry condition of the BB in which
+               the array access occurs
+               
+               [3] the two "goal functions" ("index >=0" and "index <
+               array.length")
+               
+       If the system is valid for *each possible* variable value, then the goal
+       functions are always true, and the abc can be removed.
+       
+       All this discussion is useful to give a precise specification
+       to the problem we are trying to solve.
+
+       The trouble is that, in the general case, the resulting system
+       of equations is like a predicate in first order logic, which
+       is semi-decidable, and its general solution is anyway too
+       complex to be attempted in a JIT compiler (which should not
+       contain a full fledged theorem prover).
+       
+       Therefore, we must cut some corner.
+       
+       There is also another big problem, which is caused by
+       "recursive" symbolic definitions. These definition can (and
+       generally do) happen every time there is a loop. For instance,
+       in the following piece of code:
+       
+               for ( int i = 0; i < array.length; i++ ) {
+                       Console.WriteLine( "array [i] = " + array [i] );
+               }
+               
+       one of the definitions of i is a PHI that can be either 0 or
+       "i + 1".
+       
+       Now, we know that mathematically "i = i + 1" does not make
+       sense, and in fact symbolic values are not "equations", they
+       are "symbolic definitions".
+       
+       The actual symbolic value of i is a generic "n", where "n" is
+       the number of iterations of the loop, but this is terrible to
+       handle (and in more complex examples the symbolic value of i
+       simply cannot be written, because i is calculated in an
+       iterative way).
+       
+       However, the definition "i = i + 1" tells us something about
+       i: it tells us that i "grows". So (from the PHI definition) we
+       know that i is either 0, or "grows". This is enough to tell
+       that "i>=0", which is what we want!  It is important to note
+       that recursive definitions can only occurr inside PHI
+       definitions, because actually a variable cannot be defined
+       *only* in terms of itself!
+       
+       At this point, I can explain which corners I want to cut to
+       make the problem solvable. It will not remove all the abc that
+       could theoretically be removed, but at least it will work.
+       
+       The easiest way to cut corners is to only handle expressions
+       which are "reasonably simple", and ignore the rest.
+
+       Keep in mind that ignoring an expression is not harmful in
+       itself.  The algorithm will be simply "less powerful", because
+       it will ignore conditions that could have caused to the
+       removal of an abc, but will not remove checks "by mistake" (so
+       the resulting code will be in any case correct).
+       
+       The expressions we handle are the following (all of integer
+       type):
+
+               - constant
+               - variable
+               - variable + constant
+               - constant + variable
+               - variable - constant
+               
+       And, of course, PHI definitions.
+
+       Any other expression causes the introduction of an "any" value
+       in the evaluation, which makes all values that depend from it
+       unknown as well.
+       
+       We will call these kind of definitions "summarizable"
+       definitions.
+       
+       In a first attempt, we can consider only branch conditions
+       that have the simplest possible form (the comparison of two
+       summarizable expressions).
+       
+       We can also simplify the effect of variable definitions,
+       keeping only what is relevant to know: their value range with
+       respect to zero and with respect to the length of the array we
+       are currently handling.
+       
+       One particular note on PHI functions: they work (obviously)
+       like the logical "or" of their definitions, and therefore are
+       equivalent to the "logical or" of the summarization of their
+       definitions.
+       
+       About recursive definitions (which, believe me, are the worst
+       thing in all this mess), we handle only "monotonic" ones. That
+       is, we try to understand if the recursive definition (which,
+       as we said above, must happen because of a loop) always
+       "grows" or "gets smaller". In all other cases, we decide we
+       cannot handle it.
+       
+       One critical thing, once we have defined all these data
+       structures, is how the evaluation is actually performed.
+       
+       In a first attempt I coded a "brute force" approach, which for
+       each BB tried to examine all possible conditions between all
+       variables, filling a sort of "evaluation matrix". The problem
+       was that the complexity of this evaluation was quadratic (or
+       worse) on the number of variables, and that many wariables
+       were examined even if they were not involved in any array
+       access.
+       
+       Following the ABCD paper:
+
+                 http://citeseer.ist.psu.edu/bodik00abcd.html
+
+       I rewrote the algorithm in a more "sparse" way.
+
+       Now, the main data structure is a graph of relations between
+       variables, and each attempt to remove a check performs a
+       traversal of the graph, looking for a path from the index to
+       the array length that satisfies the properties "index >= 0"
+       and "index < length". If such a path is found, the check is
+       removed. It is true that in theory *each* traversal has a
+       complexity which is exponential on the number of variables,
+       but in practice the graph is not very connected, so the
+       traversal terminates quickly.
+       
+       
+       Then, the algorithm to optimize one method looks like this:
+
+               [1] Preparation:
+
+                   [1a] Build the SSA representation.
+
+                   [1b] Prepare the evaluation graph (empty)
+
+                   [1b] Summarize each varible definition, and put
+                        the resulting relations in the evaluation
+                        graph
+
+               [2] Analyze each BB, starting from the entry point and
+                   following the dominator tree:
+
+                   [2a] Summarize its entry condition, and put the resulting relations
+                        in the evaluation graph (this is the reason
+                        why the BBs are examined following the
+                        dominator tree, so that conditions are added
+                        to the graph in a "cumulative" way)
+
+                   [2b] Scan the BB instructions, and for each array
+                        access perform step [3]
+
+                   [2c] Process children BBs following the dominator
+                        tree (step [2])
+
+                   [2d] Remove from the evaluation area the conditions added at step [2a]
+                        (so that backtracking along the tree the area
+                        is properly cleared)
+
+               [3] Attempt the removal:
+
+                   [3a] Summarize the index expression, to see if we can handle it; there
+                        are three cases: the index is either a
+                        constant, or a variable (with an optional
+                        delta) or cannot be handled (is a "any")
+
+                   [3b] If the index can be handled, traverse the evaluation area searching
+                        a path from the index variable to the array
+                        length (if the index is a constant, just
+                        examine the array length to see if it has
+                        some relation with this constant)
+
+                   [3c] Use the results of step [3b] to decide if the check is redundant
+