How Control Flow Analysis works in MCS
        
                                Martin Baulig
                              (martin@gnome.org)
                                      2002

This document gives you a short overview of how control flow analysis
works in MCS.

Control Flow Analysis is used to ensure that local variables are not
accessed before they have been initialized and that all `out'
parameters have been assigned before control leaves the current
method.  To do this, the compiler keeps two bitvectors - one for the
locals and one for the out parameters (the latter one isn't needed if
a method doesn't have any `out' parameters).

Each time the compiler encounters a branching of the program's control
flow, these bitvectors are duplicated for each of the possible
branches and then merged after the branching.

* Simple branchings

As a first rule, a local variable is only considered initialized after
a branching if it's initialized in all possible branches:

    1    int a;
    2    if (something)
    3       a = 3;
    4    else
    5       Console.WriteLine ("Hello World");
    6    Console.WriteLine (a);

In this example there's one branching of control flow (lines 2-5)
which has two branches: the `if' block in line 3 and the `else' block
in line 5.  In line 6, the local `a' hasn't been initialized because
it is only initialized in the `if' case, but not in the `else' case.
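
The merge for this example can be sketched with
System.Collections.BitArray.  This is only a minimal sketch and not the
MCS implementation: the `MergeSketch' class and its `Merge' method are
hypothetical names, and bit 0 is assumed to stand for the local `a'.

```csharp
using System;
using System.Collections;

// Hypothetical sketch, not MCS code: one bit per local, set once the
// local has been assigned.
static class MergeSketch
{
    // A local counts as assigned after the branching only if every branch
    // assigned it, which is a bitwise AND over the branch vectors.
    public static BitArray Merge(params BitArray[] branches)
    {
        if (branches.Length == 0)
            throw new ArgumentException("at least one branch is required");

        BitArray merged = new BitArray(branches[0]);   // copy the first branch
        for (int i = 1; i < branches.Length; i++)
            merged.And(branches[i]);                   // And() updates merged in place
        return merged;
    }

    static void Main()
    {
        BitArray before = new BitArray(1);             // bit 0 is `a', not yet assigned

        BitArray ifBranch = new BitArray(before);      // duplicate for the `if' block
        ifBranch[0] = true;                            // a = 3;
        BitArray elseBranch = new BitArray(before);    // duplicate for the `else' block

        Console.WriteLine(Merge(ifBranch, elseBranch)[0]);   // False
    }
}
```

Copying each vector before changing it keeps the state from before the
branching intact, so every branch starts from the same snapshot.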

* Control may return from the current block

However, there's an exception to this rule: control may return from
the current block before it reaches its end:

    1    int a;
    2    if (something)
    3       a = 3;
    4    else
    5       return;
    6    Console.WriteLine (a);

In this case, line 6 is only reached from the `if' case, but not from
the `else' case, so the `else' clause does not matter for whether the
local `a' is initialized in line 6.

This means that we must change this simple rule to:

     A local variable must be initialized in all possible branches
     which do not return.
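
A minimal sketch of this refined rule, assuming one bit per local and a
flag per branch which says whether that branch always returns.  The
`BranchSketch' class and its members are hypothetical and are not taken
from the MCS sources.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch, not MCS code.
sealed class BranchSketch
{
    public readonly BitArray Assigned;   // one bit per local
    public readonly bool Returns;        // true if the branch always returns

    public BranchSketch(BitArray assigned, bool returns)
    {
        Assigned = assigned;
        Returns = returns;
    }

    // AND together only the branches which can reach the code after the
    // branching.  The result is null if every branch returns, which means
    // that the code after the branching is unreachable.
    public static BitArray Merge(IEnumerable<BranchSketch> branches)
    {
        BitArray merged = null;
        foreach (BranchSketch branch in branches) {
            if (branch.Returns)
                continue;                                 // never reaches the merge point
            if (merged == null)
                merged = new BitArray(branch.Assigned);   // copy the first live branch
            else
                merged.And(branch.Assigned);
        }
        return merged;
    }

    static void Main()
    {
        BitArray ifState = new BitArray(1);
        ifState[0] = true;                        // a = 3;
        BitArray elseState = new BitArray(1);     // the `else' block only returns

        BitArray after = Merge(new[] {
            new BranchSketch(ifState, false),
            new BranchSketch(elseState, true)
        });
        Console.WriteLine(after[0]);              // True
    }
}
```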

* `out' parameters

As a simple rule, an `out' parameter must be assigned before control
leaves the current method.  If `a' is an `out' parameter in the
following example, it must have been assigned by the time control
reaches line 5 and line 8 because control may return to the caller in
these lines.

    1    if (something)
    2       Console.WriteLine ("Hello World");
    3    else {
    4       a = 8;
    5       return;
    6    }
    7    a = 9;
    8    return;

* Return vs. Break

This is not as simple as it looks.  Let's assume `b' is an `out'
parameter:

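    1    int a;
    2    do {
    3       if (something)
    4          a = 3;
    5       else
    6          break;
    7       b = a;
    8    } while (some_condition);
    9    Console.WriteLine (a);

Regarding the local `a', the assignment in line 7 is allowed, but the
output in line 9 is not, because line 9 may also be reached through
the `break' in line 6, where `a' hasn't been assigned.  However,
control only leaves the current block in line 6, but not the whole
method, so the `out' parameter `b' doesn't need to be assigned at that
point.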

That's why the control flow code distinguishes between `break' and
`return' statements - a `break' statement is any statement which makes
control leave the current block (break, continue, goto, return) and a
`return' statement is a statement which makes control leave the
current method (return).

There are two `FlowReturns' states in the FlowBranching class:
`Breaks' specifies whether control may leave the current block before
reaching its end and `Returns' specifies whether control may leave the
current method before reaching the end of the current block.

At the end of each flow branching which may return, but which does not
always throw an exception, the state of all `out' parameters is
checked and if one of them isn't initialized, an error is reported.
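
The difference can be seen in a small, complete C# method.  This
example is an illustration written for this document; the names
`OutParameterExample' and `TryFirstPositive' are made up and do not
come from MCS.

```csharp
using System;

static class OutParameterExample
{
    // `result' must be assigned before every `return', but not before `break'.
    public static bool TryFirstPositive(int[] values, out int result)
    {
        foreach (int v in values) {
            if (v == 0)
                break;            // only leaves the loop: `result' may be unassigned here
            if (v > 0) {
                result = v;       // assigned before control leaves the method
                return true;
            }
        }
        result = -1;              // covers the `break' and the normal end of the loop
        return false;
    }

    static void Main()
    {
        int found;
        bool ok = TryFirstPositive(new[] { -4, 7, 9 }, out found);
        Console.WriteLine(ok + " " + found);   // True 7
    }
}
```

Without the assignment `result = -1;' the final `return' would be
reached with `result' unassigned, and a C# compiler is expected to
reject the method.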

* Forward gotos

In the following example, the local `a' isn't initialized in line 5
since the assignment in line 3 is never executed:

    1    int a;
    2    goto World;
    3    a = 4;
    4  World:
    5    Console.WriteLine (a);

Each time the compiler encounters a forward goto, it duplicates the
bitvector with the current state of the locals and `out' parameters
and passes it to LabeledStatement.AddUserVector() to tell the label
that it may be reached with a forward goto from a code position with
that state.  When the label is reached, the state of all these jump
origins is merged with the current state - see
UsageVector.MergeJumpOrigins.
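
The bookkeeping described above might look roughly like the following
sketch.  `LabelSketch', `AddOrigin' and `MergeAtLabel' are hypothetical
names and do not mirror the real LabeledStatement or UsageVector code.
The sketch also assumes that the merge is a bitwise AND and that a
missing fall-through state is passed as null.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for a label; not the MCS LabeledStatement class.
sealed class LabelSketch
{
    readonly List<BitArray> origins = new List<BitArray>();

    // Record a snapshot of the state at a forward `goto' to this label.
    public void AddOrigin(BitArray state)
    {
        origins.Add(new BitArray(state));   // copy, the caller keeps changing its own
    }

    // State once the label is reached.  `fallThrough' is null when the code
    // directly before the label cannot complete, e.g. after a `goto'.
    // Returns null if the label cannot be reached at all.
    public BitArray MergeAtLabel(BitArray fallThrough)
    {
        BitArray merged = fallThrough == null ? null : new BitArray(fallThrough);
        foreach (BitArray origin in origins) {
            if (merged == null)
                merged = new BitArray(origin);
            else
                merged.And(origin);         // assigned only if assigned on every path
        }
        return merged;
    }

    static void Main()
    {
        BitArray atGoto = new BitArray(1);  // bit 0 is `a', unassigned at `goto World;'
        LabelSketch world = new LabelSketch();
        world.AddOrigin(atGoto);

        // `a = 4;' follows the goto, so nothing falls through to the label.
        BitArray atLabel = world.MergeAtLabel(null);
        Console.WriteLine(atLabel[0]);      // False
    }
}
```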

* Exception blocks

Things get a bit more difficult in exception blocks.

There are a few rules in exception blocks:

* A local is considered initialized after an exception block if it
  has been initialized either in the try block and in all catch blocks
  which do not always return, or in the finally block.

* If the try block always throws an exception and the local is
  initialized in all catch blocks which do not always return, then it
  will be initialized after the exception block.

* The code after the exception block is only reached if either the
  whole try block is executed or none of the catch blocks may ever
  return.  Since an exception may occur at any time in a try block,
  the compiler can't know whether the whole try block will be executed
  or not - so it needs to look only at the catch blocks to find out
  whether the exception block may return or not.  The rule here is
  that an exception block may return unless it has at least one catch
  block and none of the catch blocks may ever return.

* Since the finally block is always executed, it is fine for an
  `out' parameter to be initialized there.  The same applies to
  locals - if a local or an `out' parameter is initialized in a
  finally block, then it's always initialized.

* If the try or a catch block may return, all `out' parameters which
  aren't initialized in the finally block must have been initialized
  before the potential return.

Internally, the code handles a return in a try or catch block like a
forward goto to the finally block.
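
The first and the fourth rule can be illustrated with two small C#
methods.  They are examples written for this document, with made-up
names (`ExceptionBlockExample', `ParseOrDefault', `Run'), and are not
taken from the MCS sources or its test suite.

```csharp
using System;

static class ExceptionBlockExample
{
    // `value' is assigned in the try block and in every catch block which
    // can reach the code after the exception block.
    public static int ParseOrDefault(string text)
    {
        int value;
        try {
            value = int.Parse(text);   // assigned if the try block completes
        } catch (FormatException) {
            value = -1;                // this catch block falls through, so it assigns
        } catch (OverflowException) {
            throw;                     // never reaches the code after the block
        }
        return value;                  // assigned on every path which gets here
    }

    // An `out' parameter which is only assigned in the finally block.
    public static void Run(Action work, out bool attempted)
    {
        try {
            work();
        } finally {
            attempted = true;          // the finally block is always executed
        }
    }

    static void Main()
    {
        Console.WriteLine(ParseOrDefault("42"));    // 42
        Console.WriteLine(ParseOrDefault("abc"));   // -1

        bool attempted;
        Run(() => { }, out attempted);
        Console.WriteLine(attempted);               // True
    }
}
```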


Last updated August 5th, 2002
Martin Baulig <martin@gnome.org>