Here "abc" stays for "array bounds check", or "array bound checks", or
some combination of the two...

------------------------------------------------------------------------------
USAGE

Simply use the "abcrem" optimization invoking mono.

To see if bound checks are actually removed, use "mono -v" and grep for
"ARRAY-ACCESS" in the output, there should be a message for each check
that has been removed.

To trace the algorithm execution, use "-v -v -v", and be prepared to be
totally submersed by debugging messages...

------------------------------------------------------------------------------
EFFECTIVENESS

The abc removal code can now always remove bound checks from "clean"
array scans in loops, and generally anyway there are clear conditions that
already state that the index is "safe".

To be clearer, and give an idea of what the algorithm can and cannot do
without describing it in detail... keep in mind that only "redunant" checks
cannot be removed. By "redundant", I mean "already explicitly checked" in
the method code.

Unfortunately, analyzing complex expressions is not so easy (see below for
details), so the capabilities of this "abc remover" are limited.

These are simple guidelines:
- Only simple comparisons between variables are understood.
- Only increment-decrement operations are handled.
- "switch" statements are not yet handled.

This means that code like this will be handled well:

for (int i = 0; i < a.Length; i++) {
	a [i] = .....
}

The "i" variable could be declared out of the "for", the "for" could be a
"while", and maybe even implemented with "goto", the array could be scanned
in reverse order, and everything would still work.
I could have a temporary variable storing the array length, and check on
it inside the loop, and the abc removal would still occurr, like this:

int i = 0;
int l = a.Length;
while ( i < l ) {
	a [i] ......
}

or this:

int l = a.Length;
for (int i = l; i > 0; i--) {
	a [i] = .....
}


But just something like this

for (int i = 0; i < (a.Length -1); i++) .....

or like this

for (int i = 0; i < a.Length; i += 2) .....

would not work, the check would stay there, sorry :-(

Just to make you understand how things are tricky... this would work!

int limit = a.Length - 1;
for (int i = 0; i < limit; i++) {
	a [i] = .....
}


A detailed explanation of the reason why things are done like this is given
below...

All in all, the compiling phase is generally non so fast, and the abc removed
are so few, that there is hardly a performance gain in using the "abcrem"
flag in the compiling options for short programs like the ones I tried.

Anyway, various bound checks *are* removed, and this is good :-)

------------------------------------------------------------------------------
THE ALGORITHM

Unfortunately, this time google has not been my friend, and I did not
find an algorithm I could reasonably implement in the time frame I had,
so I had to invent one. The two cleasses of algorithms I found were either
simply an analisys of variable ranges (which is not very different from
what I'm doing), or they were based on type systems richer than the .NET
one (particularly, they required integer types to have an explicit range,
which could be related to array ranges). By the way, it seems that the
gcj people do not have an array bound check removal optimization (they
just have a compiler flag that unconditionally removes all the checks,
but this is not what we want).

This array bound check removal (abc removal) algorithm is based on
symbolic execution concepts. I handle the removal like an "unreachable
code elimination" (and in fact the optimization could be extended to remove
also other unreachable sections of code, due to branches that "always go the
same way").

In symbolic execution, variables do not have actual (numeric) values, but
instead symbolic expressions (like "a", or "x+y").
Also, branch conditions are handled like symbolic conditions (like "i<k"),
which state relations between variable values.

The SSA representation inside mini is somewhat close to a symbolic
representation of the execution of the compiled method.
Particularly, the representation of variable values is exactly a symbolic
one. It is enough to find all CEE_STIND_* instructions which store to a
local variable, and their second argument is exactly the variable value.
Actually, "cfg->vars [<variable-index>]->def" should contain exactly
those store instructions, and the "get_variable_value_from_ssa_store"
function extracts the variable value from there.

On the other hand, the conditions under which each basic block is
executed are not made fully explicit.

However, it is not difficult to make them so.
Each BB that has more than one exit BB, in practice must end either with
a conditional branch instruction or with a switch instruction.
In the first case, the BB has exactly two exit BBs, and their execution
conditions are easy to get from the condition of the branch (see the
"get_relation_from_branch_instruction" function, and expecially the end of
"analyze_block" in abcremoval.c.
If there is a switch, the jump condition of every exit BB is the equality
of the switch argument with the particular index associated with its case
(but the current implementation does not handle switch statements yet).

These individual conditions are in practice associated to each arc that
connects BBs in the CFG (with the simple case that unconditional branches
have a "TRUE" condition, because they always happen).

So, for each BB, its *proper* entry condition is the union of all the
conditions associated to arcs that enter the BB. The "union" is like a
logical "or", in the sense that either of the condition could be true,
they are not necessarily all true. This means that if I can enter a BB
in two ways, and in one case I know that "x>0", and in the other that
"x==0", actually in the BB I know that "x>=0", which is a weaker
condition (the union of the two).

Also, the *complete* entry condition for a BB is the "intersection" of all
the entry conditions of its dominators. This is true because each dominator
is the only way to reach the BB, so the entry condition of each dominator
must be true if the control flow reached the BB. This translates to the
logical "and" of all the "proper" conditions of the BBs met walking up in the
dominator tree. So, if one says "x>0", and another "x==1", then I know
that "x==1", which is a stronger condition (the intersection of the two).
Note that, if the two conditions were "x>0" and "x==0", then the block would
be unreachable (the intersection is empty), because some branch is impossible.

Another observation is that, inside each BB, every variable is subject to the
complete entry condition of that very same BB, and not the one in which it is
defined (with the "complete entry condition" being the thing I defined before,
sorry if these terms "proper" and "complete" are strange, I found nothing
better).
This happens because the branch conditions are related to the control flow.
I can define "i=a", and if I am in a BB where "a>0", then "i>0", but not
otherwise.

So, intuitively, if the available conditions say "i>=0", and i is used as an
index in an array access, then the lower bound check can be omitted.
If the condition also says "(i>=0)&&(i<array.length)", the abc removal can
occur.

So, a complete solution to the problem of abc removal would be the following:
for each array access, build a system of equations containing:
[1] all the symbolic variable definitions
[2] the complete entry condition of the BB in which the array access occurs
[3] the two "goal functions" ("index >=0" and "index < array.length")
If the system is valid for *each possible* variable value, then the goal
functions are always true, and the abc can be removed.

All this discussion is useful to give a precise specification to the problem
we are trying to solve.
The trouble is that, in the general case, the resulting system of equations
is like a predicate in first order logic, which is semi-decidable, and its
general solution is anyway too complex to be attempted in a JIT compiler
(which should not contain a full fledged theorem prover).

Therefore, we must cut some corner.


By the way, there is also another big problem, which is caused by "recursive"
symbolic definitions. These definition can (and generally do) happen every
time there is a loop. For instance, in the following piece of code

for ( int i = 0; i < array.length; i++ ) {
	Console.WriteLine( "array [i] = " + array [i] );
}

one of the definitions of i is a PHI that can be either 0 or "i + 1".

Now, we know that mathematically "i = i + 1" does not make sense, and in
fact symbolic values are not "equations", they are "symbolic definitions".

The symbolic value of i is a generic "n", where "n" is the number of
iterations of the loop, but this is terrible to handle (and in more complex
examples the symbolic definition of i simply cannot be written, because i is
calculated in an iterative way).

However, the definition "i = i + 1" tells us something about i: it tells us
that i "grows". So (from the PHI definition) we know that i is either 0, or
"grows". This is enough to tell that "i>=0", which is what we want!
It is important to note that recursive definitions can only occurr inside
PHI definitions, because actually a variable cannot be defined *only* in terms
of itself!


At this point, I can explain which corners I want to cut to make the
problem solvable. It will not remove all the abc that could theoretically
be removed, but at least it will work.

The easiest way to cut corners is to only handle expressions which are
"reasonably simple", and ignore the rest.
Keep in mind that ignoring an expression is not harmful in itself.
The algorithm will be simply "less powerful", because it will ignore
conditions that could have caused to the removal of an abc, but will
not remove checks "by mistake" (so the resulting code will be in any case
correct).

In a first attempt, we can consider only conditions that have the simple
possible form, which is "(compare simpleExpression simpleExpression)",
where "simpleExpression" is either "(ldind.* local[*])" or
"(ldlen (ldind.ref local[*]))" (or a constant, of course).

We can also "summarize" variable definitions, keeping only what is relevant
to know: if they are greater or smaller than zero or other variables.

One particular note on PHI functions: they work (obviously) like the logical
"or" of their definitions, and therefore are equivalent to the "logical or"
of the summarization of their definitions.

About recursive definitions (which, believe me, are the worst thing in all
this mess), we handle only "monotonic" ones. That is, we try to understand
if the recursive definition (which, as we said above, must happen because
of a loop) always "grows" or "gets smaller". In all other cases, we decide
we cannot handle it.

So, the algorithm to optimize one method could be something like:

[1] Build the SSA representation.
[2] Analyze each BB. Particularly, do the following:
    [2a] summarize its exit conditions
    [2b] summarize all the variable definitions occurring in the BB
    [2c] check if it contains array accesses (only as optimization, so that
         step 3 will be performed only when necessary)
[3] Perform the removal. For each BB that contains array accesses, do:
    [3a] build an evaluation area where for each variable we keep track
         of its relations with zero and with each other variable.
    [3b] populate the evaluation area with the initial data (the variable
         definitions and the complete entry condition of the BB)
    [3c] propagate the relations between variables (so that from "a<b" and
         "b<c" we know that "a<c") in the evaluation area
    [3d] for each array access in the BB, use the evaluation area to "match
         conditions", to see if the goal functions are satisfied


------------------------------------------------------------------------------
LOW LEVEL DESIGN DECISIONS

One of the problems I had to solve was the representation of the relations
between symbolic values.
The "canonical" solution would be to have a complex, potentially recursive
data structure, somewhat similar to MonoInst, and use it all the time (maybe
MonoInst could be used "as is").
The problem is that, at this point, the software would have to "play" with
these structures, for instance to deduce "b < a-1" from "a > b+1", because we
are interested in what value b has, and not a. And maybe the software should
be able to do this without modifying the original condition, because it was
used elsewhere (or could be reused).
Then, the software should also be able to apply logical relations to
conditions (like "(a>0)||(c<1)"), and manipulate them, too.

In my opinion, an optimizer written in this way risks to be too complex, and
its performance not acceptable in a JIT compiler.
Therefore, I decided to represent values in a "summarized" way.
For instance, "x = a + 1" becomes "x > a" (and, by the way, "x = x + 1"
would look like "x > x", which means that "x grows").
So, each variable value is simply a "relation" of the variable with other
variables or constants. Anything more complex is ignored.
Relations are represented with flags (EQ, LT and GT), so that it is easy
to combine them logically (it is enough to perform the bitwise operations
on the flags). The condition (EQ|LT|GT) means we know nothing special, any
value is possible. The empty condition would mean an unreachable statement
(because no value is acceptable).

There is still the problem of identifying variables, and where to store all
these relations. After some thinking, I decided that a "brute force" approach
is the easier, and probably also the fastest. In practice, I keep in memory
an array of "variable relations", where for each variable I record its
relation with the constants 0 and 1 and an array of its relations with all
other variables. The evaluation area, therefore, looks like a square matrix
as large as the number of variables. I *know* that the matrix is rather sparse
(in the sense that only few of its cells are significant), but I decided that
this data structure was the simplest and more efficient anyway. After all,
each cell takes just one byte, and any other data structure must be some
kind of associative array (like a g_hash_table). Such a data structue is
difficult to use with a MonoMemPool, and in the end is slower than a plain
array, and has the storage overhead of all the pointers... in the end, maybe
the full matrix is not a waste at all.
<note>
After a few benchmarks, I clearly see that the matrix works well if the
variables are not too much, otherwise it gets "too sparse", and much time
is wasted in the propagation phase. Maybe the data structure could be changed
after a more careful evaluation of the problem...
</note>

With these assumptions, all the memory used to represent values, conditions
and the evaluation area can be allocated in advance (from a MonoMemPool), and
used for all the analysis of one method. Moreover, the performance of logical
operations is high (they are simple bitwise operations on bytes).
<note>
For the propagation phase, there was a function I could not easily code
with bitwise operations. As that phase is a real pain for the performance
(the complexity is quadratic, and it must be executed iteratively until
nothing changes), I decided to code it in perl, and generate a precomputed
table then used by the C code (just 64 bytes). I think this is faster than
all those branches... Really, I could think again to phases 3c and 3d (see
above), maybe there is a better way to do them.
</note>

Of course not all expressions can be handled by the summarization, so the
optimizer is not as effective as a full blown theorem prover...

Another thing related to design... the implementation is as less intrusive
as possible.
I "patched" the main files to add an option for abc removal, and handle it
just like any other option, and then wrote all the relevant code in two
separated files (one header for data structures, and a C file for the code).
If the patch turns out to be useful also for other pourposes (like a more
generic "unreachable code elimination", that detects branches that will never
be taken using the analysis of variable values), it will not be difficult
to integrate the branch summarizations into the BasicBlock data structures,
and so on.

One last thing... in some places, I knowingly coded in an inefficient way.
For instance, "remove_abc_from_block" does something only if the flag
"has_array_access_instructions" is true, but I call it anyway, letting it
do nothing (calling it conditionally would be better). There are several
things like this in the code, but I was in a hurry, and preferred to make
the code more readable (it is already difficult to understand as it is).

------------------------------------------------------------------------------
WHEN IS ABC REMOVAL POSSIBLE? WHEN IS IT USEFUL?

Warning! Random ideas ahead!

In general, managed code should be safe, so abc removal is possible only if
there already is some branch in the method that does the check.
So, one check on the index is "mandatory", the abc removal can only avoid a
second one (which is useless *because* of the first one).
If in the code there is not a conditional branch that "avoids" to access
an array out of its bounds, in general the abc removal cannot occur.

It could happen, however, that one method is called from another one, and
the check on the index is performed in the caller.

For instance:

void
caller( int[] a, int n ) {
	if ( n <= a.Length ) {
		for ( int i = 0; i < n; i++ ) {
			callee( a, i );
		}
	}
}

void callee( int[] a, int i ) {
	Console.WriteLine( "a [i] = " + a [i] );
}


In this case, it should be possible to have two versions of the callee, one
without checks, to be called from methods that do the check themselves, and
another to be called otherwise.
This kind of optimization could be profile driven: if one method is executed
several times, wastes CPU time for bound checks, and is mostly called from
another one, it could make sense to analyze the situation in detail.

However, one simple way to perform this optimization would be to inline
the callee (so that the abc removal algorithm can see both methods at once).

The biggest benefits of abc removal can be seen for array accesses that
occur "often", like the ones inside inner loops. This is why I tried to
handle "recursive" variable definitions, because without them you cannot
optimize inside loops, which is also where you need it most.

Another possible optimization (alwais related to loops) is the following:

void method( int[] a, int n ) {
	for ( int i = 0; i < n; i++ ) {
		Console.WriteLine( "a [i] = " + a [i] );
	}
}

In this case, the check on n is missing, so the abc could not be removed.
However, this would mean that i will be checked twice at each loop iteration,
one against n, and the other against a.Length. The code could be transformed
like this:

void method( int[] a, int n ) {
	if ( n < a.Length ) {
		for ( int i = 0; i < n; i++ ) {
			Console.WriteLine( "a [i] = " + a [i] ); // Remove abc
		}
	} else {
		for ( int i = 0; i < n; i++ ) {
			Console.WriteLine( "a [i] = " + a [i] ); // Leave abc
		}
	}
}

This could result in performance improvements (again, probably this
optimization should be profile driven).

------------------------------------------------------------------------------
OPEN ISSUES

There are several issues that should still be addressed...

One is related to aliasing. For now, I decided to operate only on local
variables, and ignore aliasing problems (and alias analisys has not yet
been implemented anyway).
Actually, I identified the local arrays with their length, because I
totally ignore the contents of arrays (and objects in general).

Also, in several places in the code I only handle some cases, and ignore
other more complex ones, which could anyway work (there are comments where
this happens). Anyway, this is not harmful (the code is not as effective
as it could be).

Another possible improvement is the explicit handling of all constants.
For now, code like

void
method () {
	int[] a = new int [16];
	for ( int i = 0; i < 16; i++ ) {
		a [i] = i;
	}
}

is not handled, because the two constants "16" are lost when variable
relations are summarized (actually they are not lost, they are stored in the
summarized values, but they are not yet used correctly).

The worst thing, anyway, is that for now I fail completely in distinguishing
between signed and unsigned variables/operations/conditions, and between
different integer sizes (in bytes). I did this on pourpose, just for lack of
time, but this can turn out to be terribly wrong.
The problem is caused by the fact that I handle arithmetical operations and
conditions as "ideal" operations, but actually they can overflow and/or
underflow, giving "surprising" results.

For instance, look at the following two methods:

public static void testSignedOverflow()
{
  Console.WriteLine( "Testing signed overflow..." );
  int[] ia = new int[70000];
  int i = 1;
  while ( i <ia.Length )
  {
    try
    {
      Console.WriteLine( " i = " + i );
      ia[i] = i;
    }
    catch ( IndexOutOfRangeException e )
    {
      Console.WriteLine( "Yes, we had an overflow (i = " + i + ")" );
    }
    i *= 60000;
  }
  Console.WriteLine( "Testing signed overflow done." );
}

public static void testUnsignedOverflow()
{
  Console.WriteLine( "Testing unsigned overflow..." );
  int[] ia = new int[70000];
  uint i = 1;
  while ( i <ia.Length )
  {
    try
    {
      Console.WriteLine( " i = " + i );
      ia[i] = (int) i;
    }
    catch ( IndexOutOfRangeException e )
    {
      Console.WriteLine( "Yes, we had an overflow (i = " + i + ")" );
    }
    i *= 60000;
  }
  Console.WriteLine( "Testing unsigned overflow done." );
}

In the "testSignedOverflow" method, the exception will be thrown, while
in the "testUnsignedOverflow" everything will go on smoothly.
The problem is that the test "i <ia.Length" is a signed comparison in the
first case, and will not exit from the loop if the index is negative.
Therefore, in the first method, the abc should not be removed.

In practice, the abc removal code should try to predict if any given
expression could over/under-flow, and act accordingly.
For now, I made so that the only accepted expressions are plain increments
and decrements of variables (which cannot over/under-flow without being
trapped by the existing conditions). Anyway, this issue should be analyzed
better.
By the way, in summarizations there is a field that keeps track of the
relation with "1", which I planned to use exactly to summarize products and
quotients, but it is never used (as I said, only increment and decrement
operations are properly summarized).