Miguel de Icaza
(miguel@ximian.com)
- 2002, 2007
+ 2002, 2007, 2009
* Abstract
into an empty operation. Otherwise the above will become
a return statement that can infer return types.
+* Evaluation API
+
+ The compiler can now be used as a library. The API is exposed
+ by the Mono.CSharp.Evaluator class, which currently accepts
+ statements and expressions passed as strings and can either
+ compile them or compile and execute them immediately.
+
+ As of April 2009 this creates a new in-memory assembly for
+ each statement evaluated.
+
+ To support this evaluator mode, the evaluator API primes the
+ tokenizer with an initial character that would not appear in
+ valid C# code and is one of:
+
+ int EvalStatementParserCharacter = 0x2190; // Unicode Left Arrow
+ int EvalCompilationUnitParserCharacter = 0x2191; // Unicode Up Arrow
+ int EvalUsingDeclarationsParserCharacter = 0x2192; // Unicode Right Arrow
+
+ These characters are turned into the following tokens:
+
+ %token EVAL_STATEMENT_PARSER
+ %token EVAL_COMPILATION_UNIT_PARSER
+ %token EVAL_USING_DECLARATIONS_UNIT_PARSER
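+
+ The priming scheme can be illustrated with a small standalone
+ sketch. It is not the tokenizer from the compiler: the enum and
+ the class and method names are hypothetical, and only the three
+ code points are taken from the text above. It shows how a
+ single leading character can select the first token.

```csharp
using System;

// Hypothetical token names for this sketch; only the three code points
// below are taken from the text above.
public enum FirstToken {
    EvalStatementParser,
    EvalCompilationUnitParser,
    EvalUsingDeclarationsUnitParser,
    Ordinary
}

public static class PrimedTokenizerSketch {
    public const int EvalStatementParserCharacter = 0x2190;
    public const int EvalCompilationUnitParserCharacter = 0x2191;
    public const int EvalUsingDeclarationsParserCharacter = 0x2192;

    // Prepend the priming character to the user's input.
    public static string Prime (int primingCharacter, string input)
    {
        return ((char) primingCharacter).ToString () + input;
    }

    // Classify only the first character; a real tokenizer would then
    // continue tokenizing the rest of the input as usual.
    public static FirstToken Classify (string primedInput)
    {
        if (primedInput.Length == 0)
            return FirstToken.Ordinary;

        switch ((int) primedInput [0]) {
        case EvalStatementParserCharacter:
            return FirstToken.EvalStatementParser;
        case EvalCompilationUnitParserCharacter:
            return FirstToken.EvalCompilationUnitParser;
        case EvalUsingDeclarationsParserCharacter:
            return FirstToken.EvalUsingDeclarationsUnitParser;
        default:
            return FirstToken.Ordinary;
        }
    }

    public static void Main ()
    {
        string primed = Prime (EvalStatementParserCharacter, "1+2;");
        Console.WriteLine (Classify (primed));
    }
}
```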
+
+ This means that the first token returned by the tokenizer when
+ used by the Evaluator API is a special token that helps the
+ yacc parser go from the traditional parsing of a full
+ compilation-unit to the interactive parsing.
+
+ The entry production for the compiler basically becomes:
+
+ compilation_unit
+ //
+ // The standard rules
+ //
+ : outer_declarations opt_EOF
+ | outer_declarations global_attributes opt_EOF
+ | global_attributes opt_EOF
+ | opt_EOF /* allow empty files */
+
+ //
+ // The rule that allows interactive parsing
+ //
+ | interactive_parsing { Lexer.CompleteOnEOF = false; } opt_EOF
+ ;
+
+ //
+ // This is where Evaluator API drives the compilation
+ //
+ interactive_parsing
+ : EVAL_STATEMENT_PARSER EOF
+ | EVAL_USING_DECLARATIONS_UNIT_PARSER using_directives
+ | EVAL_STATEMENT_PARSER
+ interactive_statement_list opt_COMPLETE_COMPLETION
+ | EVAL_COMPILATION_UNIT_PARSER
+ interactive_compilation_unit
+ ;
+
+ There is some ambiguity in the input, for example between the
+ using directive and the using statement. To resolve it, a
+ micro-predicting parser with multiple tokens of lookahead is
+ used in eval.cs to produce the actual token that will drive
+ the compilation.
+
+ This distinguishes between inputs such as:
+ using System;
+ vs
+ using (var x = File.OpenRead ("file")) {}
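+
+ A minimal sketch of this kind of lookahead follows. It is not
+ the code from eval.cs: it works on raw characters instead of
+ tokens, and the names InputKind and Classify are hypothetical.
+ It assumes that an open parenthesis right after the `using'
+ keyword is enough to tell the statement from the directive.

```csharp
using System;

public enum InputKind { UsingDirective, Statement }

public static class UsingLookaheadSketch {
    // Peek past the `using' keyword to decide which parser to prime.
    public static InputKind Classify (string input)
    {
        const string keyword = "using";
        string s = input.TrimStart ();

        if (!s.StartsWith (keyword, StringComparison.Ordinal))
            return InputKind.Statement;

        string rest = s.Substring (keyword.Length);

        // Something like "usingFoo" is an identifier, not the keyword.
        if (rest.Length > 0 && (char.IsLetterOrDigit (rest [0]) || rest [0] == '_'))
            return InputKind.Statement;

        // "using (" starts a using statement, anything else a directive.
        if (rest.TrimStart ().StartsWith ("(", StringComparison.Ordinal))
            return InputKind.Statement;

        return InputKind.UsingDirective;
    }

    public static void Main ()
    {
        Console.WriteLine (Classify ("using System;"));
        Console.WriteLine (Classify ("using (var x = File.OpenRead (\"file\")) {}"));
    }
}
```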
+
+ This is the meaning of these new initial tokens:
+
+ EVAL_STATEMENT_PARSER
+ Used to parse statements or expressions as statements.
+
+ EVAL_USING_DECLARATIONS_UNIT_PARSER
+ This instructs the parser to merely do using-directive
+ parsing instead of statement parsing.
+
+ EVAL_COMPILATION_UNIT_PARSER
+ Used to evaluate toplevel declarations like namespaces
+ and classes.
+
+ The feature is currently disabled because later stages
+ of the compiler are not yet able to lookup previous
+ definitions of classes.
+
+ What happens is that between each call to Evaluate()
+ we reset the compiler state, and in doing so we also
+ drop any existing definitions, so evaluating "class X
+ {}" followed by "class Y : X {}" does not currently
+ work.
+
+ We need to make sure that new type definitions used
+ interactively are preserved from one evaluation to the
+ next.
+
+ In the evaluator, the expression or statement `BODY' is hosted
+ inside a wrapper class. If the statement is a variable
+ declaration, then the declaration is split from the assignment
+ into a DECLARATION and a BODY.
+
+ This is what the code generated looks like:
+
+ public class Foo : $InteractiveBaseClass {
+ DECLARATION
+
+ static void Host (ref object $retval)
+ {
+ BODY
+ }
+ }
+
+ Since statements and expressions are mixed together, and it
+ is useful to use the Evaluator to compute expressions, the
+ value of an expression such as "1+2" is returned in the
+ `retval' reference parameter.
+
+ To support this, the reference retval parameter is set to a
+ special internal value that means "Value was not set" before
+ the method Host is invoked. During parsing the parser turns
+ expressions like "1+2" into:
+
+ retval = 1 + 2;
+
+ This is done using a special OptionalAssign
+ ExpressionStatement class.
+
+ When the Host method returns, if the value of retval is still
+ the special flag, no value was set. Otherwise the result of
+ the expression is in retval.
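+
+ The "value was not set" convention can be sketched in
+ isolation. The code below is not what the compiler generates:
+ the NotSet sentinel, the Host delegate and the Invoke helper
+ are hypothetical names, and the two host methods are written
+ by hand where the evaluator would emit them.

```csharp
using System;

public static class RetvalSketch {
    // Hypothetical sentinel: its identity means "value was not set".
    static readonly object NotSet = new object ();

    public delegate void Host (ref object retval);

    // What an expression such as "1+2" is turned into.
    public static void ExpressionHost (ref object retval)
    {
        retval = 1 + 2;
    }

    // A plain statement never assigns retval.
    public static void StatementHost (ref object retval)
    {
        Console.WriteLine ("statement executed");
    }

    // Returns true, and the value, only if the host assigned retval.
    public static bool Invoke (Host host, out object result)
    {
        object retval = NotSet;
        host (ref retval);

        bool wasSet = !object.ReferenceEquals (retval, NotSet);
        result = wasSet ? retval : null;
        return wasSet;
    }

    public static void Main ()
    {
        object value;

        if (Invoke (ExpressionHost, out value))
            Console.WriteLine ("result: " + value);
        if (!Invoke (StatementHost, out value))
            Console.WriteLine ("no value was set");
    }
}
```

+ Comparing against a dedicated sentinel object, and not against
+ null, lets an expression that evaluates to null still count as
+ a result.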
+
+ The `InteractiveBaseClass' is the base class for the wrapper
+ class. This allows embedders to provide different base classes
+ that expose new static methods that could be useful during
+ expression evaluation.
+
+ Our default implementation is InteractiveBaseClass. New
+ implementations should derive from it and set the property
+ in the Evaluator to the derived class.
+
+ In the future we will move to creating dynamic methods as the
+ wrapper for this code.
+
+* Code Completion
+
+ Support for code completion is available to allow the compiler
+ to provide a list of possible completions at any given point
+ in the parsing process. This is used for Tab-completion in
+ an interactive shell, or for visual aids in GUI shells that
+ show possible method completions.
+
+ This feature is available as part of the Evaluator API, where
+ a special method, GetCompletions, returns a list of possible
+ completions given a partial input.
+
+ The parser and tokenizer work together so that the tokenizer,
+ upon reaching the end of the input, generates the following
+ tokens: GENERATE_COMPLETION, followed by as many
+ COMPLETE_COMPLETION tokens as the parser needs, and finally
+ the EOF token.
+
+ GENERATE_COMPLETION needs to be handled in every production
+ where the user is likely to press the TAB key in the shell (or
+ in the future the GUI, or an explicit request in an IDE).
+ COMPLETE_COMPLETION must be handled throughout the grammar to
+ provide a way of completing the parsed expression. See below
+ for details.
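+
+ The end-of-input token sequence can be sketched as follows.
+ This is a toy model, not the compiler's tokenizer: the Tok
+ enum, the classes and the Drive helper are hypothetical, and
+ the sketch assumes that a CompleteOnEOF flag (the name appears
+ in the compilation_unit rule shown earlier) is what ends the
+ stream of COMPLETE_COMPLETION tokens.

```csharp
using System;
using System.Collections.Generic;

public enum Tok { Char, GenerateCompletion, CompleteCompletion, EOF }

public class CompletingLexerSketch {
    readonly string input;
    int pos;
    bool generated;

    // While true, end of input yields completion tokens and not EOF.
    public bool CompleteOnEOF = true;

    public CompletingLexerSketch (string input)
    {
        this.input = input;
    }

    public Tok Next ()
    {
        if (pos < input.Length) {
            pos++;
            return Tok.Char;        // stands in for real tokens
        }
        if (!CompleteOnEOF)
            return Tok.EOF;
        if (!generated) {
            generated = true;
            return Tok.GenerateCompletion;
        }
        return Tok.CompleteCompletion;
    }
}

public static class CompletingLexerDemo {
    // Plays the parser: accepts windDownSteps COMPLETE_COMPLETION
    // tokens, then clears the flag so that the lexer produces EOF.
    public static List<Tok> Drive (string input, int windDownSteps)
    {
        var lexer = new CompletingLexerSketch (input);
        var seen = new List<Tok> ();
        int remaining = Math.Max (0, windDownSteps);

        while (true) {
            Tok t = lexer.Next ();
            seen.Add (t);
            if (t == Tok.EOF)
                break;
            if (t == Tok.GenerateCompletion && remaining == 0)
                lexer.CompleteOnEOF = false;
            else if (t == Tok.CompleteCompletion && --remaining == 0)
                lexer.CompleteOnEOF = false;
        }
        return seen;
    }

    public static void Main ()
    {
        foreach (Tok t in Drive ("(x", 1))
            Console.WriteLine (t);
    }
}
```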
+
+ For the member access case, I have added productions that
+ mirror the non-completing productions, for example:
+
+ primary_expression DOT IDENTIFIER GENERATE_COMPLETION
+ {
+ LocatedToken lt = (LocatedToken) $3;
+ $$ = new CompletionMemberAccess ((Expression) $1, lt.Value, lt.Location);
+ }
+
+ This mirrors:
+
+ primary_expression DOT IDENTIFIER opt_type_argument_list
+ {
+ LocatedToken lt = (LocatedToken) $3;
+ $$ = new MemberAccess ((Expression) $1, lt.Value, (TypeArguments) $4, lt.Location);
+ }
+
+ The CompletionMemberAccess is a new kind of
+ Mono.CSharp.Expression that does the actual lookup. It
+ internally mimics some of the MemberAccess code but has been
+ tuned for this particular use.
+
+ After the initial GENERATE_COMPLETION token is processed, the
+ tokenizer will emit COMPLETE_COMPLETION tokens. This is done
+ to help the parser produce a valid result from the partial
+ input it received. For example, it is able to produce a valid
+ AST from "(x" even though the parenthesis has not been closed.
+ This is achieved by sprinkling the grammar with productions
+ that can cope with this "winding down" token. For example,
+ this is what parenthesized_expression looks like now:
+
+ parenthesized_expression
+ : OPEN_PARENS expression CLOSE_PARENS
+ {
+ $$ = new ParenthesizedExpression ((Expression) $2);
+ }
+ //
+ // New production
+ //
+ | OPEN_PARENS expression COMPLETE_COMPLETION
+ {
+ $$ = new ParenthesizedExpression ((Expression) $2);
+ }
+ ;
+
+ Once we have wrapped up everything we generate the last EOF token.
+
+ When the AST is complete we actually trigger the regular
+ semantic analysis process. The DoResolve method of each node
+ in our abstract syntax tree will compute the result and
+ communicate the possible completions by throwing an exception
+ of type CompletionResult.
+
+ So for example if the user types "T" and the completion is
+ "ToString", we return "oString".
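+
+ The two ideas above, reporting completions by throwing and
+ returning only the missing suffix, can be sketched together.
+ None of this is the compiler's code: CompletionResultSketch
+ stands in for CompletionResult, Resolve stands in for a
+ DoResolve method, and the member list is supplied by hand.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the compiler's CompletionResult exception.
public class CompletionResultSketch : Exception {
    public readonly string BaseText;
    public readonly string [] Result;

    public CompletionResultSketch (string baseText, string [] result)
    {
        BaseText = baseText;
        Result = result;
    }
}

public static class CompletionSketch {
    // Plays the role of DoResolve: it never returns normally, it
    // reports the completions by throwing.
    static void Resolve (string partial, IEnumerable<string> members)
    {
        var suffixes = new List<string> ();

        foreach (string name in members) {
            // Keep only the part the user has not typed yet.
            if (name.StartsWith (partial, StringComparison.Ordinal))
                suffixes.Add (name.Substring (partial.Length));
        }
        throw new CompletionResultSketch (partial, suffixes.ToArray ());
    }

    public static string [] Complete (string partial, IEnumerable<string> members)
    {
        try {
            Resolve (partial, members);
        } catch (CompletionResultSketch cr) {
            return cr.Result;
        }
        return new string [0];
    }

    public static void Main ()
    {
        string [] members = { "ToString", "GetType", "GetHashCode" };

        foreach (string s in Complete ("T", members))
            Console.WriteLine (s);
    }
}
```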
+
+** Enhancing Completion
+
+ Code completion is a process that will be curated over time.
+ Just like producing good error reports or warnings is an
+ iterative process to find a good balance, the code completion
+ engine in the compiler will require tuning to find the right
+ balance for the end user.
+
+ This section explains the basic process by which you can
+ improve the code completion by using a real life sample.
+
+ Once you add the GENERATE_COMPLETION token to your grammar
+ rule, chances are, you will need to alter the grammar to
+ support COMPLETE_COMPLETION all the way up to the toplevel
+ production.
+
+ To debug this, you will want to try the completion with either
+ a sample program or with the `csharp' tool.
+
+ I use this setup:
+
+ $ csharp -v -v
+
+ This will turn on the parser debugging output and will
+ generate a lot of data when parsing its input.
+
+ To start with a new completion scheme, type your C# code and
+ then hit the tab key to trigger the completion engine. In the
+ generated output you will want to look for the first time that
+ the parser got the GENERATE_COMPLETION token. It will look
+ like this:
+
+ lex state 414 reading GENERATE_COMPLETION value {interactive}(1,35):
+
+ The first word `lex' indicates that the parser called the
+ lexer at state 414 (more on this in a second) and it got back
+ from the lexer the token GENERATE_COMPLETION. If this is a
+ new kind of completion, chances are you will get an error
+ immediately, as the rules at that point do not know how to
+ cope with the stream of COMPLETE_COMPLETION tokens that will
+ follow. The errors will look like this:
+
+ error syntax error
+ pop state 414 on error
+ pop state 805 on error
+ pop state 628 on error
+ pop state 417 on error
+
+ The first line means that the parser has entered the error
+ state and will pop states until it can find a production that
+ can deal with the error. At that point an error message will
+ be displayed.
+
+ Open the file `y.output' which describes the parser states
+ generated by jay and search for the state that was reported
+ previously in `lex' that got the GENERATE_COMPLETION:
+
+ state 414
+ object_or_collection_initializer : OPEN_BRACE . opt_member_initializer_list CLOSE_BRACE (444)
+ object_or_collection_initializer : OPEN_BRACE . member_initializer_list COMMA CLOSE_BRACE (445)
+ opt_member_initializer_list : . (446)
+
+ We now know that the parser was in the middle of parsing an
+ `object_or_collection_initializer' and had already seen the
+ OPEN_BRACE token.
+
+ The `.' after OPEN_BRACE indicates the current state of the
+ parser, and this is where our parser got the
+ GENERATE_COMPLETION token. As you can see from the three
+ rules in this sample, support for GENERATE_COMPLETION did not
+ exist.
+
+ So we must edit the grammar to add a production for this case.
+ I made the code look like this:
+
+ member_initializer
+ [...]
+ | GENERATE_COMPLETION
+ {
+ LocatedToken lt = $1 as LocatedToken;
+ $$ = new CompletionElementInitializer (GetLocation ($1));
+ }
+ [...]
+
+ This new production creates an instance of the class
+ CompletionElementInitializer and returns it as the value of
+ the production. The following is a trivial implementation
+ that always returns "foo" and "bar" as the two completions,
+ and it illustrates how things work:
+
+ public class CompletionElementInitializer : CompletingExpression {
+ public CompletionElementInitializer (Location l)
+ {
+ this.loc = l;
+ }
+
+ public override Expression DoResolve (EmitContext ec)
+ {
+ string [] result = new string [] { "foo", "bar" };
+ throw new CompletionResult ("", result);
+ }
+
+ //
+ // You should implement CloneTo if your CompletingExpression
+ // keeps references to Statements or Expressions. CloneTo
+ // is used by the lambda engine, so you should always
+ // implement this
+ //
+ protected override void CloneTo (CloneContext clonectx, Expression t)
+ {
+ // We do not keep references to anything interesting
+ // so cloning is an empty operation.
+ }
+ }
+
+
+ We then rebuild our compiler:
+
+ (cd mcs/; make cs-parser.jay)
+ (cd tools/csharplib; make install)
+
+ And re-run csharp:
+
+ (cd tools/csharp; csharp -v -v)
+
+ Chances are, you will get another error, but this time it will
+ not be for GENERATE_COMPLETION, since we already handled that
+ one. This time it will be for COMPLETE_COMPLETION.
+
+ The remainder of the process is iterative: you need to locate
+ the state where this error happens. It will look like this:
+
+ lex state 623 reading COMPLETE_COMPLETION value {interactive}(1,35):
+ error syntax error
+
+ Then make sure that the state can handle a COMPLETE_COMPLETION
+ at this point. When receiving COMPLETE_COMPLETION the
+ parser needs to complete constructing the parse tree, so
+ productions that handle COMPLETE_COMPLETION need to wrap
+ things up with whatever data they have available, so that
+ the parser can complete.
+
+ To avoid rule duplication you can use the
+ opt_COMPLETE_COMPLETION production and append it to an
+ existing production:
+
+ foo : bar opt_COMPLETE_COMPLETION {
+ ..
+ }
+
* Miscellaneous
** Error Processing.
RootContext.WarningLevel in a few places to decide whether a
warning is worth reporting to the user or not.
-* Debugging the compiler
+** Debugging the compiler
Sometimes it is convenient to find *how* a particular error
message is being reported from, to do that, you might want to use
You can use this with -warnaserror to obtain the same effect
with warnings.
-* Debugging the Parser.
+** Debugging the Parser.
A useful trick while debugging the parser is to pass the -v
command line option to the compiler.
* Editing the compiler sources
- The compiler sources are intended to be edited with 134 columns of width
+ The compiler sources are intended to be edited with 134
+ columns of width.
* Quick Hacks