How Control Flow Analysis works in MCS Martin Baulig (martin@gnome.org) 2002 This document gives you a short overview how control flow analysis works in MCS. Control Flow Analysis is used to ensure that local variables are not accessed before they have been initialized and that all `out' parameters have been assigned before control leaves the current method. To do this, the compiler keeps two bitvectors - one for the locals and one for the out parameters (the latter one isn't needed if a method doesn't have any `out' parameters). Each time the compiler encounters a branching of the program's control flow, these bitvectors are duplicated for each of the possible branches and then merged after the branching. * Simple branchings As a first rule, a local variable is only initialized if it's initialized in all possible branches: 1 int a; 2 if (something) 3 a = 3; 4 else 5 Console.WriteLine ("Hello World"); 6 Console.WriteLine (a); In this example there's one branching of control flow (lines 2-5) which has two branches: the `if' block in line 3 and the `else' block in line 5. In line 6, the local `a' hasn't been initialized because it is only initialized in the `if' case, but not in the `else' case. * Control may return from the current block However, there's an exception to this rule: control may return from the current block before it reaches its end: 1 int a; 2 if (something) 3 a = 3; 4 else 5 return; 6 Console.WriteLine (a); In this case, line 6 will only be reached in the `if' case, but not in the `else' case - so for the local `a' to be initialized, the `else' clause is of no importance. This means that we must change this simple rule to: A local variable must be initialized in all possible branches which do not return. * `out' parameters As a simple rule, an `out' parameter must be assigned before control leaves the current method. If `a' is an `out' parameter in the following example, it must have been initialized in line 5 and in line 8 because control may return to the caller in these lines. 1 if (something) 2 Console.WriteLine ("Hello World"); 3 else { 4 a = 8; 5 return; 6 } 7 a = 9; 8 return; * Return vs. Break This is not so simple as it looks like, let's assume `b' is an `out' parameter: 1 int a; 2 do { 3 if (something) 3 a = 3; 4 else 5 break; 6 b = a; 7 } while (some_condition); 8 Console.WriteLine (a); Regarding the local `a', the assignment in line 6 is allowed, but the output in line 8 is not. However, control only leaves the current block in line 5, but not the whole method. That's why the control flow code distinguishes between `break' and `return' statements - a `break' statement is any statement which makes control leave the current block (break, continue, goto, return) and a `return' statement is a statement which makes control leave the current method (return). There are two `FlowReturns' states in the FlowBranching class: `Breaks' specifies whether control may leave the current block before reaching its end and `Returns' specifies whether control may leave the current method before reaching the end of the current block. At the end of each flow branching which may return, but which does not always throw an exception, the state of all `out' parameters is checked and if one of them isn't initialized, an error is reported. * Forward gotos In the following example, the local `a' isn't initialized in line 5 since the assignment in line 3 is never executed: 1 int a; 2 goto World; 3 a = 4; 4 World: 5 Console.WriteLine (a); Each time the compiler encounters a forward goto, it duplicates the bitvector with the current state of the locals and `out' parameters and passed it to LabeledStatement.AddUserVector() to tell the label that may be reached with a forward goto from a code position with that state. When the label is reached, the state of all these jump origins is merged with the current state - see UsageVector.MergeJumpOrigins. * Exception blocks Things get a bit more difficult in exception blocks. There are a few rules in exception blocks: * A local is considered as being initialized after an exception block if it has been initialized in either the try block and all not always returning catch blocks or in the finally block. * If the try block always throws an exception and the local is initialized in all not always returning catch blocks, then it will be initialized after the exception block. * The code after the exception block is only reached if either the whole try block is executed or none of the catch blocks may ever return. Since an exception may occur at any time in a try block, the compiler can't know whether the whole try block will be executed or not - so it needs to look only at the catch blocks to find out whether the exception block may return or not. The rule here is that an exception block may return unless it has at least one catch block and none of the catch blocks may ever return. * Since the finally block is always executed, it is fine for an `out' parameter to be initialized there. The same applies for locals - if a local or an `out' parameter is initialized in a finally block, then it's always initialized. * If the try or a catch block may return, all `out' parameters which aren't initialized in the finally block must have been initialized before the potential return. Internally, the code handles a return in a try or catch block like a forward goto to the finally block. Last updated August 5th, 2002 Martin Baulig