Code to do semantic analysis and emit the attributes
is here.
- rootcontext.cs:
+ module.cs:
Keeps track of the types defined in the source code,
as well as the assemblies loaded.
top of the stack as a number of values `TAKING',
`TAKEN_BEFORE', `ELSE_SEEN', `PARENT_TAKING'.
+ To debug problems in your grammar, you need to edit the
+ Makefile and make sure that the -ct options are passed to
+ jay. The current incarnation says:
+
+ ./../jay/jay -c < ./../jay/skeleton.cs cs-parser.jay
+
+ During debugging, you want to change this to:
+
+ ./../jay/jay -cvt < ./../jay/skeleton.cs cs-parser.jay
+
+ This generates a parser with debugging information and allows
+ you to activate verbose parser output in both the csharp
+ command and the mcs command by passing the "-v -v" flag (-v
+ twice).
+
+ When you do this, standard output will have a dump of the
+ tokens parsed and how the parser reacted to those. You can
+ look up the states with the y.output file that contains the
+ entire parser state diagram in human readable form.
+
** Locations
Locations are encoded as a 32-bit number (the Location
The token 0 is reserved for ``anonymous'' locations, ie. if we
don't know the location (Location.Null).
- The tokenizer also tracks the column number for a token, but
- this is currently not being used or encoded. It could
- probably be encoded in the low 9 bits, allowing for columns
- from 1 to 512 to be encoded.
-
* The Parser
The parser is written using Jay, which is a port of Berkeley
In the Mono C# compiler, we use a class for each of the
statements and expressions in the C# language. For example,
- there is a `While' class for the the `while' statement, a
+ there is a `While' class for the `while' statement, a
`Cast' class to represent a cast expression and so on.
There is a Statement class, and an Expression class which are
During parsing, the compiler will create the various trees of
expressions. These expressions have to be resolved before they
- are can be used. The semantic analysis is implemented by
+ can be used. The semantic analysis is implemented by
resolving each of the expressions created during parsing and
creating fully resolved expressions.
a [i++]++
a [i++] += 5;
+
+** Optimizations
+
+	The compiler performs some limited high-level optimizations
+	when the -optimize option is used.
+
+*** Instance field initializer to default value
+
+ Code to optimize:
+
+ class C
+ {
+ enum E
+ {
+ Test
+ }
+
+ int i = 0; // Field will not be redundantly assigned
+		int i2 = new int (); // This will also be completely optimized out
+
+		E e = E.Test; // Even this will be optimized away.
+ }
** Statements
* Elements that contain code are now invoked to
perform semantic analysis and code generation.
+
+* References loading
+
+ Most programs use external references (assemblies and modules).
+	The compiler loads all top-level types from referenced
+	assemblies into an import cache. Initially it imports only
+	top-level types that are valid in C#; all other members are
+	imported on demand, when they are needed.
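+
+	A hypothetical sketch of the demand-driven pattern (the type
+	and member names here are illustrative only, not the
+	compiler's actual types):
+
+	class ImportedType {
+		string [] members;	// null until the first query
+
+		public string [] GetMembers ()
+		{
+			// Member info is materialized lazily, the
+			// first time somebody asks for it
+			if (members == null)
+				members = LoadMembersFromMetadata ();
+			return members;
+		}
+
+		string [] LoadMembersFromMetadata ()
+		{
+			// Would read the member rows from the
+			// assembly metadata here
+			return new string [0];
+		}
+	}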
+
+* Namespaces definition
+
+	Before any type resolution can be done we define all compiled
+	namespaces. This is done mainly to prepare the using clauses
+	of each namespace block before any type resolution takes
+	place.
+
+* Types definition
+
+	The first step of type definition is to resolve the base
+	class or base interfaces to correctly set up the type
+	hierarchy before any member is defined.
+
+	At this point we do some error checking and verify that
+	member inheritance is correct, along with other
+	member-oriented checks.
+
+ By the time we are done, all classes, structs and interfaces
+ have been defined and all their members have been defined as
+ well.
+
+* MemberCache
+
+	MemberCache is one of the core compiler components. It
+	maintains information about types and their members. It tries
+	to be as fast as possible because almost all resolve
+	operations end up querying member information in some way.
+
+	MemberCache is specification oriented rather than definition
+	oriented, in order to maintain the differences between
+	inflated versions of generic types. This makes the
+	MemberCache simple to use, because the consumer does not need
+	to care how to inflate the current member; the returned type
+	information is always correctly inflated. However, setting
+	the MemberCache up is one of the most complicated parts of
+	the compiler, due to the possible dependencies when types are
+	being defined and the complexity of nested types.
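+
+	As an aside, the same inflation behaviour can be observed
+	with plain System.Reflection (this is only an illustration of
+	the concept, not compiler code):
+
+	using System;
+	using System.Collections.Generic;
+	using System.Reflection;
+
+	class InflationDemo {
+		static void Main ()
+		{
+			// On the open generic definition, Add takes a `T'
+			MethodInfo open = typeof (List<>).GetMethod ("Add");
+			Console.WriteLine (open.GetParameters ()[0].ParameterType);
+
+			// On the constructed List<int>, the same member
+			// already reports the inflated type System.Int32
+			MethodInfo closed = typeof (List<int>).GetMethod ("Add");
+			Console.WriteLine (closed.GetParameters ()[0].ParameterType);
+		}
+	}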
* Output Generation
into an empty operation. Otherwise the above will become
a return statement that can infer return types.
+* Debugger support
+
+	The compiler produces a .mdb symbol file for a better
+	debugging experience. The process is quite straightforward.
+	For every statement or block there is an entry in the symbol
+	file. Each entry consists of the start location of the
+	statement and its starting IL offset in the method. For most
+	statements this is easy, but a few need special handling
+	(e.g. do, while).
+
+	When a sequence point is needed to represent an original
+	source location and no IL is emitted for that line, we emit a
+	`nop' instruction. This is done only for very few constructs
+	(e.g. a block's opening brace).
+
+	Captured variables are not treated differently at the moment.
+	The debugger has internal knowledge of their mangled names
+	and how to decode them.
+
+* IKVM.Reflection vs System.Reflection
+
+	The Mono compiler can be compiled against different
+	reflection backends. At the moment we support
+	System.Reflection and IKVM.Reflection; both use the same API
+	as the official System.Reflection.Emit API, which allows us
+	to maintain a single version of the compiler with a few using
+	aliases to specialize.
+
+	The backends are not pluggable; the compiler has to be built
+	with the STATIC define when targeting IKVM.Reflection.
+
+	IKVM.Reflection is used for static compilation. This means
+	the compiler runs in batch mode, like most compilers do. It
+	can target any runtime version and use any mscorlib. mcs.exe
+	uses IKVM.Reflection.
+
+	System.Reflection is used for dynamic compilation. This mode
+	is used by our REPL and the Evaluator API. The produced IL
+	code is not written to disk but executed by the runtime
+	(JIT). Mono.CSharp.dll uses System.Reflection and
+	System.Reflection.Emit.
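+
+	A minimal sketch of the using-alias approach; the alias name
+	`MetaType' is made up for illustration, the real aliases in
+	the compiler sources may differ:
+
+	#if STATIC
+	using MetaType = IKVM.Reflection.Type;
+	#else
+	using MetaType = System.Type;
+	#endif
+
+	// The rest of the compiler then refers to MetaType and is
+	// compiled once per backend.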
+
* Evaluation API
The compiler can now be used as a library, the API exposed
Once we have wrapped up everything we generate the last EOF token.
- When the AST is complete we actually trigger the regular semantic
- analysis process. The DoResolve method of each node in our abstract
- syntax tree will compute the result and communicate the possible
- completions by throwing an exception of type CompletionResult.
+ When the AST is complete we actually trigger the regular
+ semantic analysis process. The DoResolve method of each node
+ in our abstract syntax tree will compute the result and
+ communicate the possible completions by throwing an exception
+ of type CompletionResult.
So for example if the user type "T" and the completion is
"ToString" we return "oString".
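+
+	Conceptually the suffix computation looks like this
+	(illustrative code, not the actual implementation):
+
+	static string CompletionSuffix (string typed, string candidate)
+	{
+		// Return the part of the candidate that the user
+		// has not typed yet, e.g. ("T", "ToString") gives
+		// back "oString"
+		return candidate.StartsWith (typed) ?
+			candidate.Substring (typed.Length) : candidate;
+	}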
+** Enhancing Completion
+
+	Code completion is a process that will be curated over time.
+	Just like producing good error reports and warnings, it is an
+	iterative process: the code completion engine in the compiler
+	will require tuning to find the right balance for the end
+	user.
+
+ This section explains the basic process by which you can
+ improve the code completion by using a real life sample.
+
+ Once you add the GENERATE_COMPLETION token to your grammar
+ rule, chances are, you will need to alter the grammar to
+ support COMPLETE_COMPLETION all the way up to the toplevel
+ production.
+
+ To debug this, you will want to try the completion with either
+ a sample program or with the `csharp' tool.
+
+ I use this setup:
+
+ $ csharp -v -v
+
+ This will turn on the parser debugging output and will
+ generate a lot of data when parsing its input (make sure that
+ your parser has been compiled with the -v flag, see above for
+ details).
+
+ To start with a new completion scheme, type your C# code and
+ then hit the tab key to trigger the completion engine. In the
+ generated output you will want to look for the first time that
+ the parser got the GENERATE_COMPLETION token, it will look
+ like this:
+
+ lex state 414 reading GENERATE_COMPLETION value {interactive}(1,35):
+
+	The first word `lex' indicates that the parser called the
+	lexer at state 414 (more on this in a second) and got back
+	the token GENERATE_COMPLETION from the lexer. If this is a
+	new kind of completion, chances are you will get an error
+	immediately, as the rules at that point do not know how to
+	cope with the stream of COMPLETE_COMPLETION tokens that will
+	follow. The errors will look like this:
+
+ error syntax error
+ pop state 414 on error
+ pop state 805 on error
+ pop state 628 on error
+ pop state 417 on error
+
+ The first line means that the parser has entered the error
+ state and will pop states until it can find a production that
+ can deal with the error. At that point an error message will
+ be displayed.
+
+ Open the file `y.output' which describes the parser states
+ generated by jay and search for the state that was reported
+ previously in `lex' that got the GENERATE_COMPLETION:
+
+ state 414
+ object_or_collection_initializer : OPEN_BRACE . opt_member_initializer_list CLOSE_BRACE (444)
+ object_or_collection_initializer : OPEN_BRACE . member_initializer_list COMMA CLOSE_BRACE (445)
+ opt_member_initializer_list : . (446)
+
+	We now know that the parser was in the middle of parsing an
+	`object_or_collection_initializer' and had already seen the
+	OPEN_BRACE token.
+
+ The `.' after OPEN_BRACE indicates the current state of the
+ parser, and this is where our parser got the
+ GENERATE_COMPLETION token. As you can see from the three
+ rules in this sample, support for GENERATE_COMPLETION did not
+ exist.
+
+	So we must edit the grammar to add a production for this
+	case. I made the code look like this:
+
+ member_initializer
+ [...]
+ | GENERATE_COMPLETION
+ {
+ $$ = new CompletionElementInitializer (GetLocation ($1));
+ }
+ [...]
+
+	This new production creates a CompletionElementInitializer
+	and returns it as the value for the production. The following
+	is a trivial implementation that always returns "foo" and
+	"bar" as the two completions, and it illustrates how things
+	work:
+
+ public class CompletionElementInitializer : CompletingExpression {
+ public CompletionElementInitializer (Location l)
+ {
+ this.loc = l;
+ }
+
+ public override Expression DoResolve (EmitContext ec)
+ {
+			string [] result = new string [] { "foo", "bar" };
+ throw new CompletionResult ("", result);
+ }
+
+ //
+		// You should implement CloneTo if your CompletingExpression
+		// keeps references to Statements or Expressions. CloneTo
+		// is used by the lambda engine, so you should always
+		// implement this.
+ //
+ protected override void CloneTo (CloneContext clonectx, Expression t)
+ {
+ // We do not keep references to anything interesting
+ // so cloning is an empty operation.
+ }
+ }
+
+
+ We then rebuild our compiler:
+
+ (cd mcs/; make cs-parser.jay)
+ (cd class/Mono.CSharp; make install)
+
+ And re-run csharp:
+
+ (cd tools/csharp; csharp -v -v)
+
+	Chances are, you will get another error, but this time it
+	will not be for GENERATE_COMPLETION, as we already handled
+	that one. This time it will be for COMPLETE_COMPLETION.
+
+	The remainder of the process is iterative: you need to locate
+ the state where this error happens. It will look like this:
+
+ lex state 623 reading COMPLETE_COMPLETION value {interactive}(1,35):
+ error syntax error
+
+	Then make sure that the state can handle a
+	COMPLETE_COMPLETION at this point. When receiving
+	COMPLETE_COMPLETION the parser needs to finish constructing
+	the parse tree, so productions that handle
+	COMPLETE_COMPLETION need to wrap things up with whatever data
+	they have available and just make it possible for the parser
+	to complete.
+
+ To avoid rule duplication you can use the
+ opt_COMPLETE_COMPLETION production and append it to an
+ existing production:
+
+ foo : bar opt_COMPLETE_COMPLETION {
+ ..
+ }
+
* Miscellaneous
** Error Processing.
`--fatal' flag is passed to the compiler, the Report.Error
routine will throw an exception. This can be used to pinpoint
the location of the bug and examine the variables around the
- error location.
+	error location. If you pass a number to --fatal, the
+	exception will only be thrown when the error count reaches
+	the specified number.
Warnings can be turned into errors by using the `--werror'
flag to the compiler.