SUBDIRS =
include ../build/rules.make
--DISTFILES = clr-abi.txt compiler control-flow-analysis.txt order.txt
++DISTFILES = clr-abi.txt compiler.txt control-flow-analysis.txt order.txt
all-local install-local clean-local test-local run-test-local run-test-ondotnet-local uninstall-local:
+++ /dev/null
-- The Internals of the Mono C# Compiler
--
-- Miguel de Icaza
-- (miguel@ximian.com)
-- 2002
--
--* Abstract
--
-- The Mono C# compiler is a C# compiler written in C# itself.
-- Its goals are to provide a free and alternate implementation
-- of the C# language. The Mono C# compiler generates ECMA CIL
-- images through the use of the System.Reflection.Emit API which
-- enable the compiler to be platform independent.
--
--* Overview: How the compiler fits together
--
-- The compilation process is managed by the compiler driver (it
-- lives in driver.cs).
--
-- The compiler reads a set of C# source code files, and parses
-- them. Any assemblies or modules that the user might want to
-- use with his project are loaded after parsing is done.
--
-- Once all the files have been parsed, the type hierarchy is
-- resolved. First interfaces are resolved, then types and
-- enumerations.
--
-- Once the type hierarchy is resolved, every type is populated:
-- fields, methods, indexers, properties, events and delegates
-- are entered into the type system.
--
-- At this point the program skeleton has been completed. The
-- next process is to actually emit the code for each of the
-- executable methods. The compiler drives this from
-- RootContext.EmitCode.
--
-- Each type then has to populate its methods: populating a
-- method requires creating a structure that is used as the state
-- of the block being emitted (this is the EmitContext class) and
-- then generating code for the topmost statement (the Block).
--
-- Code generation has two steps: the first step is the semantic
-- analysis (Resolve method) that resolves any pending tasks, and
-- guarantees that the code is correct. The second phase is the
-- actual code emission. All errors are flagged during in the
-- "Resolution" process.
--
-- After all code has been emitted, then the compiler closes all
-- the types (this basically tells the Reflection.Emit library to
-- finish up the types), resources, and definition of the entry
-- point are done at this point, and the output is saved to
-- disk.
--
-- The following list will give you an idea of where the
-- different pieces of the compiler live:
--
-- Infrastructure:
--
-- driver.cs:
-- This drives the compilation process: loading of
-- command line options; parsing the inputs files;
-- loading the referenced assemblies; resolving the type
-- hierarchy and emitting the code.
--
-- codegen.cs:
--
-- The state tracking for code generation.
--
-- attribute.cs:
--
-- Code to do semantic analysis and emit the attributes
-- is here.
--
-- rootcontext.cs:
--
-- Keeps track of the types defined in the source code,
-- as well as the assemblies loaded.
--
-- typemanager.cs:
--
-- This contains the MCS type system.
--
-- report.cs:
--
-- Error and warning reporting methods.
--
-- support.cs:
--
-- Assorted utility functions used by the compiler.
--
-- Parsing
--
-- cs-tokenizer.cs:
--
-- The tokenizer for the C# language, it includes also
-- the C# pre-processor.
--
-- cs-parser.jay, cs-parser.cs:
--
-- The parser is implemented using a C# port of the Yacc
-- parser. The parser lives in the cs-parser.jay file,
-- and cs-parser.cs is the generated parser.
--
-- location.cs:
--
-- The `location' structure is a compact representation
-- of a file, line, column where a token, or a high-level
-- construct appears. This is used to report errors.
--
-- Expressions:
--
-- ecore.cs
--
-- Basic expression classes, and interfaces most shared
-- code and static methods are here.
--
-- expression.cs:
--
-- Most of the different kinds of expressions classes
-- live in this file.
--
-- assign.cs:
--
-- The assignment expression got its own file.
--
-- constant.cs:
--
-- The classes that represent the constant expressions.
--
-- literal.cs
--
-- Literals are constants that have been entered manually
-- in the source code, like `1' or `true'. The compiler
-- needs to tell constants from literals apart during the
-- compilation process, as literals sometimes have some
-- implicit extra conversions defined for them.
--
-- cfold.cs:
--
-- The constant folder for binary expressions.
--
-- Statements
--
-- statement.cs:
--
-- All of the abstract syntax tree elements for
-- statements live in this file. This also drives the
-- semantic analysis process.
--
-- iterators.cs:
--
-- Contains the support for implementing iterators from
-- the C# 2.0 specification.
--
-- Declarations, Classes, Structs, Enumerations
--
-- decl.cs
--
-- This contains the base class for Members and
-- Declaration Spaces. A declaration space introduces
-- new names in types, so classes, structs, delegates and
-- enumerations derive from it.
--
-- class.cs:
--
-- Methods for holding and defining class and struct
-- information, and every member that can be in these
-- (methods, fields, delegates, events, etc).
--
-- The most interesting type here is the `TypeContainer'
-- which is a derivative of the `DeclSpace'
--
-- delegate.cs:
--
-- Handles delegate definition and use.
--
-- enum.cs:
--
-- Handles enumerations.
--
-- interface.cs:
--
-- Holds and defines interfaces. All the code related to
-- interface declaration lives here.
--
-- parameter.cs:
--
-- During the parsing process, the compiler encapsulates
-- parameters in the Parameter and Parameters classes.
-- These classes provide definition and resolution tools
-- for them.
--
-- pending.cs:
--
-- Routines to track pending implementations of abstract
-- methods and interfaces. These are used by the
-- TypeContainer-derived classes to track whether every
-- method required is implemented.
--
--
--* The parsing process
--
-- All the input files that make up a program need to be read in
-- advance, because C# allows declarations to happen after an
-- entity is used, for example, the following is a valid program:
--
-- class X : Y {
-- static void Main ()
-- {
-- a = "hello"; b = "world";
-- }
-- string a;
-- }
--
-- class Y {
-- public string b;
-- }
--
-- At the time the assignment expression `a = "hello"' is parsed,
-- it is not know whether a is a class field from this class, or
-- its parents, or whether it is a property access or a variable
-- reference. The actual meaning of `a' will not be discovered
-- until the semantic analysis phase.
--
--** The Tokenizer and the pre-processor
--
-- The tokenizer is contained in the file `cs-tokenizer.cs', and
-- the main entry point is the `token ()' method. The tokenizer
-- implements the `yyParser.yyInput' interface, which is what the
-- Yacc/Jay parser will use when fetching tokens.
--
-- Token definitions are generated by jay during the compilation
-- process, and those can be references from the tokenizer class
-- with the `Token.' prefix.
--
-- Each time a token is returned, the location for the token is
-- recorded into the `Location' property, that can be accessed by
-- the parser. The parser retrieves the Location properties as
-- it builds its internal representation to allow the semantic
-- analysis phase to produce error messages that can pin point
-- the location of the problem.
--
-- Some tokens have values associated with it, for example when
-- the tokenizer encounters a string, it will return a
-- LITERAL_STRING token, and the actual string parsed will be
-- available in the `Value' property of the tokenizer. The same
-- mechanism is used to return integers and floating point
-- numbers.
--
-- C# has a limited pre-processor that allows conditional
-- compilation, but it is not as fully featured as the C
-- pre-processor, and most notably, macros are missing. This
-- makes it simple to implement in very few lines and mesh it
-- with the tokenizer.
--
-- The `handle_preprocessing_directive' method in the tokenizer
-- handles all the pre-processing, and it is invoked when the '#'
-- symbol is found as the first token in a line.
--
-- The state of the pre-processor is contained in a Stack called
-- `ifstack', this state is used to track the if/elif/else/endif
-- nesting and the current state. The state is encoded in the
-- top of the stack as a number of values `TAKING',
-- `TAKEN_BEFORE', `ELSE_SEEN', `PARENT_TAKING'.
--
--** Locations
--
-- Locations are encoded as a 32-bit number (the Location
-- struct) that map each input source line to a linear number.
-- As new files are parsed, the Location manager is informed of
-- the new file, to allow it to map back from an int constant to
-- a file + line number.
--
-- Prior to parsing/tokenizing any source files, the compiler
-- generates a list of all the source files and then reserves the
-- low N bits of the location to hold the source file, where N is
-- large enough to hold at least twice as many source files as were
-- specified on the command line (to allow for a #line in each file).
-- The upper 32-N bits are the line number in that file.
--
-- The token 0 is reserved for ``anonymous'' locations, ie. if we
-- don't know the location (Location.Null).
--
-- The tokenizer also tracks the column number for a token, but
-- this is currently not being used or encoded. It could
-- probably be encoded in the low 9 bits, allowing for columns
-- from 1 to 512 to be encoded.
--
--* The Parser
--
-- The parser is written using Jay, which is a port of Berkeley
-- Yacc to Java, that I later ported to C#.
--
-- Many people ask why the grammar of the parser does not match
-- exactly the definition in the C# specification. The reason is
-- simple: the grammar in the C# specification is designed to be
-- consumed by humans, and not by a computer program. Before
-- you can feed this grammar to a tool, it needs to be simplified
-- to allow the tool to generate a correct parser for it.
--
-- In the Mono C# compiler, we use a class for each of the
-- statements and expressions in the C# language. For example,
-- there is a `While' class for the the `while' statement, a
-- `Cast' class to represent a cast expression and so on.
--
-- There is a Statement class, and an Expression class which are
-- the base classes for statements and expressions.
--
--** Namespaces
--
-- Using list.
--
--* Internal Representation
--
--** Expressions
--
-- Expressions in the Mono C# compiler are represented by the
-- `Expression' class. This is an abstract class that particular
-- kinds of expressions have to inherit from and override a few
-- methods.
--
-- The base Expression class contains two fields: `eclass' which
-- represents the "expression classification" (from the C#
-- specs) and the type of the expression.
--
-- Expressions have to be resolved before they are can be used.
-- The resolution process is implemented by overriding the
-- `DoResolve' method. The DoResolve method has to set the
-- `eclass' field and the `type', perform all error checking and
-- computations that will be required for code generation at this
-- stage.
--
-- The return value from DoResolve is an expression. Most of the
-- time an Expression derived class will return itself (return
-- this) when it will handle the emission of the code itself, or
-- it can return a new Expression.
--
-- For example, the parser will create an "ElementAccess" class
-- for:
--
-- a [0] = 1;
--
-- During the resolution process, the compiler will know whether
-- this is an array access, or an indexer access. And will
-- return either an ArrayAccess expression or an IndexerAccess
-- expression from DoResolve.
--
-- All errors must be reported during the resolution phase
-- (DoResolve) and if an error is detected the DoResolve method
-- will return null which is used to flag that an error condition
-- has ocurred, this will be used to stop compilation later on.
-- This means that anyone that calls Expression.Resolve must
-- check the return value for null which would indicate an error
-- condition.
--
-- The second stage that Expressions participate in is code
-- generation, this is done by overwriting the "Emit" method of
-- the Expression class. No error checking must be performed
-- during this stage.
--
--** Simple Names, MemberAccess
--
-- One of the most important classes in the compiler is
-- "SimpleName" which represents a simple name (from the C#
-- specification). The names during the resolution time are
-- bound to field names, parameter names or local variable names.
--
-- More complicated expressions like:
--
-- Math.Sin
--
-- Are composed using the MemberAccess class which contains a
-- name (Math) and a SimpleName (Sin), this helps driving the
-- resolution process.
--
--** Types
--
-- The parser creates expressions to represent types during
-- compilation. For example:
--
-- class Sample {
--
-- Version vers;
--
-- }
--
--
-- That will produce a "SimpleName" expression for the "Version"
-- word. And in this particular case, the parser will introduce
-- "Version vers" as a field declaration.
--
-- During the resolution process for the fields, the compiler
-- will have to resolve the word "Version" to a type. This is
-- done by using the "ResolveAsType" method in Expression instead
-- of using "Resolve".
--
-- ResolveAsType just turns on a different set of code paths for
-- things like SimpleNames and does a different kind of error
-- checking than the one used by regular expressions.
--
--
--** Constants
--
-- Constants in the Mono C# compiler are represented by the
-- abstract class `Constant'. Constant is in turn derived from
-- Expression. The base constructor for `Constant' just sets the
-- expression class to be an `ExprClass.Value', Constants are
-- born in a fully resolved state, so the `DoResolve' method
-- only returns a reference to itself.
--
-- Each Constant should implement the `GetValue' method which
-- returns an object with the actual contents of this constant, a
-- utility virtual method called `AsString' is used to render a
-- diagnostic message. The output of AsString is shown to the
-- developer when an error or a warning is triggered.
--
-- Constant classes also participate in the constant folding
-- process. Constant folding is invoked by those expressions
-- that can be constant folded invoking the functionality
-- provided by the ConstantFold class (cfold.cs).
--
-- Each Constant has to implement a number of methods to convert
-- itself into a Constant of a different type. These methods are
-- called `ConvertToXXXX' and they are invoked by the wrapper
-- functions `ToXXXX'. These methods only perform implicit
-- numeric conversions. Explicit conversions are handled by the
-- `Cast' expression class.
--
-- The `ToXXXX' methods are the entry point, and provide error
-- reporting in case a conversion can not be performed.
--
--** Constant Folding
--
-- The C# language requires constant folding to be implemented.
-- Constant folding is hooked up in the Binary.Resolve method.
-- If both sides of a binary expression are constants, then the
-- ConstantFold.BinaryFold routine is invoked.
--
-- This routine implements all the binary operator rules, it
-- is a mirror of the code that generates code for binary
-- operators, but that has to be evaluated at runtime.
--
-- If the constants can be folded, then a new constant expression
-- is returned, if not, then the null value is returned (for
-- example, the concatenation of a string constant and a numeric
-- constant is deferred to the runtime).
--
--** Side effects
--
-- a [i++]++
-- a [i++] += 5;
--
--** Statements
--
--* The semantic analysis
--
-- Hence, the compiler driver has to parse all the input files.
-- Once all the input files have been parsed, and an internal
-- representation of the input program exists, the following
-- steps are taken:
--
-- * The interface hierarchy is resolved first.
-- As the interface hierarchy is constructed,
-- TypeBuilder objects are created for each one of
-- them.
--
-- * Classes and structure hierarchy is resolved next,
-- TypeBuilder objects are created for them.
--
-- * Constants and enumerations are resolved.
--
-- * Method, indexer, properties, delegates and event
-- definitions are now entered into the TypeBuilders.
--
-- * Elements that contain code are now invoked to
-- perform semantic analysis and code generation.
--
--* Output Generation
--
--** Code Generation
--
-- The EmitContext class is created any time that IL code is to
-- be generated (methods, properties, indexers and attributes all
-- create EmitContexts).
--
-- The EmitContext keeps track of the current namespace and type
-- container. This is used during name resolution.
--
-- An EmitContext is used by the underlying code generation
-- facilities to track the state of code generation:
--
-- * The ILGenerator used to generate code for this
-- method.
--
-- * The TypeContainer where the code lives, this is used
-- to access the TypeBuilder.
--
-- * The DeclSpace, this is used to resolve names through
-- RootContext.LookupType in the various statements and
-- expressions.
--
-- Code generation state is also tracked here:
--
-- * CheckState:
--
-- This variable tracks the `checked' state of the
-- compilation, it controls whether we should generate
-- code that does overflow checking, or if we generate
-- code that ignores overflows.
--
-- The default setting comes from the command line
-- option to generate checked or unchecked code plus
-- any source code changes using the checked/unchecked
-- statements or expressions. Contrast this with the
-- ConstantCheckState flag.
--
-- * ConstantCheckState
--
-- The constant check state is always set to `true' and
-- cant be changed from the command line. The source
-- code can change this setting with the `checked' and
-- `unchecked' statements and expressions.
--
-- * IsStatic
--
-- Whether we are emitting code inside a static or
-- instance method
--
-- * ReturnType
--
-- The value that is allowed to be returned or NULL if
-- there is no return type.
--
-- * ReturnLabel
--
-- A `Label' used by the code if it must jump to it.
-- This is used by a few routines that deals with exception
-- handling.
--
-- * HasReturnLabel
--
-- Whether we have a return label defined by the toplevel
-- driver.
--
-- * ContainerType
--
-- Points to the Type (extracted from the
-- TypeContainer) that declares this body of code
-- summary>
--
--
-- * IsConstructor
--
-- Whether this is generating code for a constructor
--
-- * CurrentBlock
--
-- Tracks the current block being generated.
--
-- * ReturnLabel;
--
-- The location where return has to jump to return the
-- value
--
-- A few variables are used to track the state for checking in
-- for loops, or in try/catch statements:
--
-- * InFinally
--
-- Whether we are in a Finally block
--
-- * InTry
--
-- Whether we are in a Try block
--
-- * InCatch
--
-- Whether we are in a Catch block
--
-- * InUnsafe
-- Whether we are inside an unsafe block
--
-- Methods exposed by the EmitContext:
--
-- * EmitTopBlock()
--
-- This emits a toplevel block.
--
-- This routine is very simple, to allow the anonymous
-- method support to roll its two-stage version of this
-- routine on its own.
--
-- * NeedReturnLabel ():
--
-- This is used to flag during the resolution phase that
-- the driver needs to initialize the `ReturnLabel'
--
--* Anonymous Methods
--
-- The introduction of anonymous methods in the compiler changed
-- various ways of doing things in the compiler. The most
-- significant one is the hard split between the resolution phase
-- and the emission phases of the compiler.
--
-- For instance, routines that referenced local variables no
-- longer can safely create temporary variables during the
-- resolution phase: they must do so from the emission phase,
-- since the variable might have been "captured", hence access to
-- it can not be done with the local-variable operations from the
-- runtime.
--
-- The code emission is in:
--
-- EmitTopBlock ()
--
-- Which drives the process, it first resolves the topblock, then
-- emits the required metadata (local variable definitions) and
-- finally emits the code.
--
--* Miscellaneous
--
--** Error Processing.
--
-- Errors are reported during the various stages of the
-- compilation process. The compiler stops its processing if
-- there are errors between the various phases. This simplifies
-- the code, because it is safe to assume always that the data
-- structures that the compiler is operating on are always
-- consistent.
--
-- The error codes in the Mono C# compiler are the same as those
-- found in the Microsoft C# compiler, with a few exceptions
-- (where we report a few more errors, those are documented in
-- mcs/errors/errors.txt). The goal is to reduce confusion to
-- the users, and also to help us track the progress of the
-- compiler in terms of the errors we report.
--
-- The Report class provides error and warning display functions,
-- and also keeps an error count which is used to stop the
-- compiler between the phases.
--
-- A couple of debugging tools are available here, and are useful
-- when extending or fixing bugs in the compiler. If the
-- `--fatal' flag is passed to the compiler, the Report.Error
-- routine will throw an exception. This can be used to pinpoint
-- the location of the bug and examine the variables around the
-- error location.
--
-- Warnings can be turned into errors by using the `--werror'
-- flag to the compiler.
--
-- The report class also ignores warnings that have been
-- specified on the command line with the `--nowarn' flag.
--
-- Finally, code in the compiler uses the global variable
-- RootContext.WarningLevel in a few places to decide whether a
-- warning is worth reporting to the user or not.
--
--* Debugging the compiler
--
-- Sometimes it is convenient to find *how* a particular error
-- message is being reported from, to do that, you might want to use
-- the --fatal flag to mcs. The flag will instruct the compiler to
-- abort with a stack trace execution when the error is reported.
--
-- You can use this with -warnaserror to obtain the same effect
-- with warnings.
--
--* Editing the compiler sources
--
-- The compiler sources are intended to be edited with 134 columns of width
--
--- /dev/null
--- /dev/null
++ The Internals of the Mono C# Compiler
++
++ Miguel de Icaza
++ (miguel@ximian.com)
++ 2002
++
++* Abstract
++
++ The Mono C# compiler is a C# compiler written in C# itself.
++ Its goals are to provide a free and alternate implementation
++ of the C# language. The Mono C# compiler generates ECMA CIL
++ images through the use of the System.Reflection.Emit API, which
++ enables the compiler to be platform independent.
++
++* Overview: How the compiler fits together
++
++ The compilation process is managed by the compiler driver (it
++ lives in driver.cs).
++
++ The compiler reads a set of C# source code files, and parses
++ them. Any assemblies or modules that the user might want to
++ use with their project are loaded after parsing is done.
++
++ Once all the files have been parsed, the type hierarchy is
++ resolved. First interfaces are resolved, then types and
++ enumerations.
++
++ Once the type hierarchy is resolved, every type is populated:
++ fields, methods, indexers, properties, events and delegates
++ are entered into the type system.
++
++ At this point the program skeleton has been completed. The
++ next process is to actually emit the code for each of the
++ executable methods. The compiler drives this from
++ RootContext.EmitCode.
++
++ Each type then has to populate its methods: populating a
++ method requires creating a structure that is used as the state
++ of the block being emitted (this is the EmitContext class) and
++ then generating code for the topmost statement (the Block).
++
++ Code generation has two steps: the first step is the semantic
++ analysis (Resolve method) that resolves any pending tasks, and
++ guarantees that the code is correct. The second phase is the
++ actual code emission. All errors are flagged during the
++ "Resolution" process.
++
++ After all code has been emitted, the compiler closes all
++ the types (this basically tells the Reflection.Emit library to
++ finish up the types). Resources and the definition of the entry
++ point are handled at this point, and the output is saved to
++ disk.
++
++ The following list will give you an idea of where the
++ different pieces of the compiler live:
++
++ Infrastructure:
++
++ driver.cs:
++ This drives the compilation process: loading of
++ command line options; parsing the input files;
++ loading the referenced assemblies; resolving the type
++ hierarchy and emitting the code.
++
++ codegen.cs:
++
++ The state tracking for code generation.
++
++ attribute.cs:
++
++ Code to do semantic analysis and emit the attributes
++ is here.
++
++ rootcontext.cs:
++
++ Keeps track of the types defined in the source code,
++ as well as the assemblies loaded.
++
++ typemanager.cs:
++
++ This contains the MCS type system.
++
++ report.cs:
++
++ Error and warning reporting methods.
++
++ support.cs:
++
++ Assorted utility functions used by the compiler.
++
++ Parsing
++
++ cs-tokenizer.cs:
++
++ The tokenizer for the C# language; it also includes
++ the C# pre-processor.
++
++ cs-parser.jay, cs-parser.cs:
++
++ The parser is implemented using a C# port of the Yacc
++ parser. The parser lives in the cs-parser.jay file,
++ and cs-parser.cs is the generated parser.
++
++ location.cs:
++
++ The `location' structure is a compact representation
++ of a file, line, column where a token, or a high-level
++ construct appears. This is used to report errors.
++
++ Expressions:
++
++ ecore.cs
++
++ Basic expression classes and interfaces; most shared
++ code and static methods are here.
++
++ expression.cs:
++
++ Most of the different kinds of expression classes
++ live in this file.
++
++ assign.cs:
++
++ The assignment expression got its own file.
++
++ constant.cs:
++
++ The classes that represent the constant expressions.
++
++ literal.cs
++
++ Literals are constants that have been entered manually
++ in the source code, like `1' or `true'. The compiler
++ needs to tell constants from literals apart during the
++ compilation process, as literals sometimes have some
++ implicit extra conversions defined for them.
++
++ cfold.cs:
++
++ The constant folder for binary expressions.
++
++ Statements
++
++ statement.cs:
++
++ All of the abstract syntax tree elements for
++ statements live in this file. This also drives the
++ semantic analysis process.
++
++ iterators.cs:
++
++ Contains the support for implementing iterators from
++ the C# 2.0 specification.
++
++ Declarations, Classes, Structs, Enumerations
++
++ decl.cs
++
++ This contains the base class for Members and
++ Declaration Spaces. A declaration space introduces
++ new names in types, so classes, structs, delegates and
++ enumerations derive from it.
++
++ class.cs:
++
++ Methods for holding and defining class and struct
++ information, and every member that can be in these
++ (methods, fields, delegates, events, etc).
++
++ The most interesting type here is the `TypeContainer'
++ which is a derivative of `DeclSpace'.
++
++ delegate.cs:
++
++ Handles delegate definition and use.
++
++ enum.cs:
++
++ Handles enumerations.
++
++ interface.cs:
++
++ Holds and defines interfaces. All the code related to
++ interface declaration lives here.
++
++ parameter.cs:
++
++ During the parsing process, the compiler encapsulates
++ parameters in the Parameter and Parameters classes.
++ These classes provide definition and resolution tools
++ for them.
++
++ pending.cs:
++
++ Routines to track pending implementations of abstract
++ methods and interfaces. These are used by the
++ TypeContainer-derived classes to track whether every
++ method required is implemented.
++
++
++* The parsing process
++
++ All the input files that make up a program need to be read in
++ advance, because C# allows declarations to happen after an
++ entity is used, for example, the following is a valid program:
++
++ class X : Y {
++ static void Main ()
++ {
++ a = "hello"; b = "world";
++ }
++ string a;
++ }
++
++ class Y {
++ public string b;
++ }
++
++ At the time the assignment expression `a = "hello"' is parsed,
++ it is not known whether `a' is a class field from this class, or
++ its parents, or whether it is a property access or a variable
++ reference. The actual meaning of `a' will not be discovered
++ until the semantic analysis phase.
++
++** The Tokenizer and the pre-processor
++
++ The tokenizer is contained in the file `cs-tokenizer.cs', and
++ the main entry point is the `token ()' method. The tokenizer
++ implements the `yyParser.yyInput' interface, which is what the
++ Yacc/Jay parser will use when fetching tokens.
++
++ Token definitions are generated by jay during the compilation
++ process, and those can be referenced from the tokenizer class
++ with the `Token.' prefix.
++
++ Each time a token is returned, the location for the token is
++ recorded into the `Location' property, that can be accessed by
++ the parser. The parser retrieves the Location properties as
++ it builds its internal representation to allow the semantic
++ analysis phase to produce error messages that can pinpoint
++ the location of the problem.
++
++ Some tokens have values associated with them, for example when
++ the tokenizer encounters a string, it will return a
++ LITERAL_STRING token, and the actual string parsed will be
++ available in the `Value' property of the tokenizer. The same
++ mechanism is used to return integers and floating point
++ numbers.
++
++ C# has a limited pre-processor that allows conditional
++ compilation, but it is not as fully featured as the C
++ pre-processor, and most notably, macros are missing. This
++ makes it simple to implement in very few lines and mesh it
++ with the tokenizer.
++
++ The `handle_preprocessing_directive' method in the tokenizer
++ handles all the pre-processing, and it is invoked when the '#'
++ symbol is found as the first token in a line.
++
++ The state of the pre-processor is contained in a Stack called
++ `ifstack'. This state is used to track the if/elif/else/endif
++ nesting and the current state. The state is encoded in the
++ top of the stack as a number of values `TAKING',
++ `TAKEN_BEFORE', `ELSE_SEEN', `PARENT_TAKING'.
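++
++ The stack-based tracking described above can be sketched as
++ follows. This is only an illustration and not the code in
++ cs-tokenizer.cs: the class name, the method names and the
++ numeric flag values are assumptions, `#elif' is left out, and
++ a generic Stack is used for brevity.
++
++ ```csharp
++ using System;
++ using System.Collections.Generic;
++
++ class IfStackSketch {
++     // Flag values are assumptions made for this sketch.
++     const int TAKING = 1, TAKEN_BEFORE = 2, ELSE_SEEN = 4, PARENT_TAKING = 8;
++
++     readonly Stack<int> ifstack = new Stack<int> ();
++
++     // True when the lines in the current region should be compiled.
++     public bool Taking {
++         get { return ifstack.Count == 0 || (ifstack.Peek () & TAKING) != 0; }
++     }
++
++     public void If (bool condition)
++     {
++         // A nested #if can only be taken if the enclosing region is.
++         int state = Taking ? PARENT_TAKING : 0;
++         if (condition && state != 0)
++             state |= TAKING | TAKEN_BEFORE;
++         ifstack.Push (state);
++     }
++
++     public void Else ()
++     {
++         int state = ifstack.Pop ();
++         if ((state & ELSE_SEEN) != 0)
++             throw new InvalidOperationException ("duplicate #else");
++
++         // Take the #else branch only if no earlier branch was taken.
++         bool take = (state & PARENT_TAKING) != 0 && (state & TAKEN_BEFORE) == 0;
++         state = (state & ~TAKING) | ELSE_SEEN;
++         if (take)
++             state |= TAKING | TAKEN_BEFORE;
++         ifstack.Push (state);
++     }
++
++     public void EndIf ()
++     {
++         ifstack.Pop ();
++     }
++
++     static void Main ()
++     {
++         var pp = new IfStackSketch ();
++         pp.If (false);
++         Console.WriteLine (pp.Taking);   // inside a false #if
++         pp.Else ();
++         Console.WriteLine (pp.Taking);   // inside its #else
++         pp.EndIf ();
++         Console.WriteLine (pp.Taking);   // back at the top level
++     }
++ }
++ ```
++
++ Keeping `PARENT_TAKING' in each entry means a nested block
++ never needs to look further down the stack to know whether
++ its enclosing region is being compiled.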
++
++** Locations
++
++ Locations are encoded as a 32-bit number (the Location
++ struct) that maps each input source line to a linear number.
++ As new files are parsed, the Location manager is informed of
++ the new file, to allow it to map back from an int constant to
++ a file + line number.
++
++ Prior to parsing/tokenizing any source files, the compiler
++ generates a list of all the source files and then reserves the
++ low N bits of the location to hold the source file, where N is
++ large enough to hold at least twice as many source files as were
++ specified on the command line (to allow for a #line in each file).
++ The upper 32-N bits are the line number in that file.
++
++ The token 0 is reserved for ``anonymous'' locations, i.e. if we
++ don't know the location (Location.Null).
++
++ The tokenizer also tracks the column number for a token, but
++ this is currently not being used or encoded. It could
++ probably be encoded in the low 9 bits, allowing for columns
++ from 1 to 512 to be encoded.
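++
++ As a rough illustration of the encoding described above, here
++ is a minimal sketch; it is not the code in location.cs. The
++ names `LocationSketch', `BitsFor', `Pack', `FileOf' and
++ `LineOf' are hypothetical, and the sketch assumes that file
++ indexes start at 1 so that the value 0 stays free for the
++ anonymous location.
++
++ ```csharp
++ using System;
++
++ static class LocationSketch {
++     // Smallest N whose 2^N values can hold twice the number of
++     // source files, plus the reserved index 0.
++     public static int BitsFor (int sourceFiles)
++     {
++         int needed = 2 * sourceFiles + 1;
++         int bits = 0;
++         while ((1 << bits) < needed)
++             bits++;
++         return bits;
++     }
++
++     // Low `bits' bits hold the file index, the rest hold the line.
++     public static uint Pack (int file, int line, int bits)
++     {
++         return ((uint) line << bits) | (uint) file;
++     }
++
++     public static int FileOf (uint loc, int bits)
++     {
++         return (int) (loc & ((1u << bits) - 1));
++     }
++
++     public static int LineOf (uint loc, int bits)
++     {
++         return (int) (loc >> bits);
++     }
++
++     static void Main ()
++     {
++         int bits = BitsFor (3);          // three files on the command line
++         uint loc = Pack (2, 120, bits);  // file #2, line 120
++         Console.WriteLine ("{0} bits, file {1}, line {2}",
++                            bits, FileOf (loc, bits), LineOf (loc, bits));
++     }
++ }
++ ```
++
++ The trade-off is visible in `Pack': every bit given to the
++ file index is taken away from the largest line number that
++ can be represented.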
++
++* The Parser
++
++ The parser is written using Jay, which is a port of Berkeley
++ Yacc to Java, which I later ported to C#.
++
++ Many people ask why the grammar of the parser does not match
++ exactly the definition in the C# specification. The reason is
++ simple: the grammar in the C# specification is designed to be
++ consumed by humans, and not by a computer program. Before
++ you can feed this grammar to a tool, it needs to be simplified
++ to allow the tool to generate a correct parser for it.
++
++ In the Mono C# compiler, we use a class for each of the
++ statements and expressions in the C# language. For example,
++ there is a `While' class for the `while' statement, a
++ `Cast' class to represent a cast expression and so on.
++
++ There is a Statement class, and an Expression class which are
++ the base classes for statements and expressions.
++
++** Namespaces
++
++ Using list.
++
++* Internal Representation
++
++** Expressions
++
++ Expressions in the Mono C# compiler are represented by the
++ `Expression' class. This is an abstract class: each
++ particular kind of expression inherits from it and overrides
++ a few methods.
++
++ The base Expression class contains two fields: `eclass' which
++ represents the "expression classification" (from the C#
++ specs) and the type of the expression.
++
++ Expressions have to be resolved before they can be used.
++ The resolution process is implemented by overriding the
++ `DoResolve' method. The DoResolve method has to set the
++ `eclass' field and the `type', perform all error checking and
++ computations that will be required for code generation at this
++ stage.
++
++ The return value from DoResolve is an expression. Most of the
++ time an Expression derived class will return itself (return
++ this) when it will handle the emission of the code itself, or
++ it can return a new Expression.
++
++ For example, the parser will create an "ElementAccess" class
++ for:
++
++ a [0] = 1;
++
++ During the resolution process, the compiler will know whether
++ this is an array access or an indexer access, and will
++ return either an ArrayAccess expression or an IndexerAccess
++ expression from DoResolve.
++
++ All errors must be reported during the resolution phase
++ (DoResolve). If an error is detected, the DoResolve method
++ returns null to flag that an error condition has occurred;
++ this is used to stop compilation later on. This means that
++ anyone who calls Expression.Resolve must check the return
++ value for null, which indicates an error condition.
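++
++ The two rules above (DoResolve may hand back a different
++ expression, and null means an error was already reported) can
++ be sketched with a toy model. The class names echo the ones in
++ the text, but the bodies are hypothetical: this DoResolve
++ takes no arguments and uses System.Reflection on a target
++ Type purely to keep the example self-contained.
++
++ ```csharp
++ using System;
++ using System.Collections.Generic;
++ using System.Reflection;
++
++ // Simplified stand-in for the compiler's Expression class.
++ abstract class Expression {
++     public Type type;
++
++     // Returns the expression to use from now on, or null on error.
++     public abstract Expression DoResolve ();
++
++     public Expression Resolve () { return DoResolve (); }
++ }
++
++ class ArrayAccess : Expression {
++     public ArrayAccess (Type elementType) { type = elementType; }
++     public override Expression DoResolve () { return this; }
++ }
++
++ class IndexerAccess : Expression {
++     public IndexerAccess (Type valueType) { type = valueType; }
++     public override Expression DoResolve () { return this; }
++ }
++
++ // What the parser creates for `a [0]' before `a' is known.
++ class ElementAccess : Expression {
++     readonly Type target;   // type of `a' in `a [0]'
++
++     public ElementAccess (Type target) { this.target = target; }
++
++     public override Expression DoResolve ()
++     {
++         if (target.IsArray)
++             return new ArrayAccess (target.GetElementType ());
++
++         foreach (MemberInfo member in target.GetDefaultMembers ()) {
++             PropertyInfo indexer = member as PropertyInfo;
++             if (indexer != null && indexer.GetIndexParameters ().Length > 0)
++                 return new IndexerAccess (indexer.PropertyType);
++         }
++
++         // Report now, during resolution, and flag the failure.
++         Console.Error.WriteLine (
++             "error CS0021: Cannot apply indexing with [] to `{0}'", target);
++         return null;
++     }
++ }
++
++ class Demo {
++     static void Main ()
++     {
++         Type [] targets = {
++             typeof (int []), typeof (List<string>), typeof (int)
++         };
++         foreach (Type t in targets) {
++             Expression e = new ElementAccess (t).Resolve ();
++             if (e == null)
++                 continue;   // an error was already reported
++             Console.WriteLine ("{0}: {1} of type {2}",
++                 t, e.GetType ().Name, e.type);
++         }
++     }
++ }
++ ```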
++
++ The second stage that Expressions participate in is code
++ generation. This is done by overriding the "Emit" method of
++ the Expression class. No error checking should be performed
++ during this stage.
++
++** Simple Names, MemberAccess
++
++ One of the most important classes in the compiler is
++ "SimpleName" which represents a simple name (from the C#
++ specification). During resolution, these names are bound
++ to field names, parameter names or local variable names.
++
++ More complicated expressions like:
++
++ Math.Sin
++
++ are composed using the MemberAccess class which contains a
++ name (Math) and a SimpleName (Sin); this helps drive the
++ resolution process.
++
++** Types
++
++ The parser creates expressions to represent types during
++ compilation. For example:
++
++ class Sample {
++
++ Version vers;
++
++ }
++
++
++ That will produce a "SimpleName" expression for the "Version"
++ word. And in this particular case, the parser will introduce
++ "Version vers" as a field declaration.
++
++ During the resolution process for the fields, the compiler
++ will have to resolve the word "Version" to a type. This is
++ done by using the "ResolveAsType" method in Expression instead
++ of using "Resolve".
++
++ ResolveAsType just turns on a different set of code paths for
++ things like SimpleNames and does a different kind of error
++ checking than the one used by regular expressions.
++
++
++** Constants
++
++ Constants in the Mono C# compiler are represented by the
++ abstract class `Constant'. Constant is in turn derived from
++ Expression. The base constructor for `Constant' just sets the
++ expression class to be an `ExprClass.Value'. Constants are
++ born in a fully resolved state, so the `DoResolve' method
++ only returns a reference to itself.
++
++ Each Constant should implement the `GetValue' method which
++ returns an object with the actual contents of this constant. A
++ utility virtual method called `AsString' is used to render a
++ diagnostic message. The output of AsString is shown to the
++ developer when an error or a warning is triggered.
++
++ Constant classes also participate in the constant folding
++ process. Expressions that can be constant folded do so by
++ invoking the functionality provided by the ConstantFold
++ class (cfold.cs).
++
++ Each Constant has to implement a number of methods to convert
++ itself into a Constant of a different type. These methods are
++ called `ConvertToXXXX' and they are invoked by the wrapper
++ functions `ToXXXX'. These methods only perform implicit
++ numeric conversions. Explicit conversions are handled by the
++ `Cast' expression class.
++
++ The `ToXXXX' methods are the entry point, and provide error
++ reporting in case a conversion cannot be performed.
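++
++ The split between the two families of methods might look
++ roughly like this. Only the `ConvertToXXXX'/`ToXXXX' naming
++ pattern, `GetValue' and `AsString' come from the text; the
++ concrete methods, their return types and the error message
++ are assumptions of the sketch.
++
++ ```csharp
++ using System;
++
++ abstract class Constant {
++     public abstract object GetValue ();
++
++     public virtual string AsString ()
++     {
++         return GetValue ().ToString ();
++     }
++
++     // ConvertToXXXX: implicit numeric conversions only; the
++     // default answer is null, meaning "no implicit conversion".
++     public virtual IntConstant ConvertToInt () { return null; }
++     public virtual LongConstant ConvertToLong () { return null; }
++
++     // ToXXXX: the entry points, which add the error reporting.
++     public IntConstant ToInt ()
++     {
++         IntConstant c = ConvertToInt ();
++         if (c == null)
++             ReportError ("int");
++         return c;
++     }
++
++     public LongConstant ToLong ()
++     {
++         LongConstant c = ConvertToLong ();
++         if (c == null)
++             ReportError ("long");
++         return c;
++     }
++
++     void ReportError (string target)
++     {
++         Console.Error.WriteLine (
++             "error: cannot implicitly convert constant `{0}' to {1}",
++             AsString (), target);
++     }
++ }
++
++ class IntConstant : Constant {
++     public readonly int Value;
++     public IntConstant (int value) { Value = value; }
++     public override object GetValue () { return Value; }
++     public override IntConstant ConvertToInt () { return this; }
++
++     // int -> long is an implicit numeric conversion.
++     public override LongConstant ConvertToLong ()
++     {
++         return new LongConstant (Value);
++     }
++ }
++
++ class LongConstant : Constant {
++     public readonly long Value;
++     public LongConstant (long value) { Value = value; }
++     public override object GetValue () { return Value; }
++     public override LongConstant ConvertToLong () { return this; }
++
++     // long -> int is explicit only, so ConvertToInt stays null.
++ }
++
++ class Demo {
++     static void Main ()
++     {
++         Console.WriteLine (new IntConstant (7).ToLong ().Value);
++         Console.WriteLine (new LongConstant (7).ToInt () == null);
++     }
++ }
++ ```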
++
++** Constant Folding
++
++ The C# language requires constant folding to be implemented.
++ Constant folding is hooked up in the Binary.Resolve method.
++ If both sides of a binary expression are constants, then the
++ ConstantFold.BinaryFold routine is invoked.
++
++ This routine implements all the binary operator rules. It
++ mirrors the code that generates code for binary operators,
++ except that the generated code has to be evaluated at
++ runtime, while this routine evaluates the operation during
++ compilation.
++
++ If the constants can be folded, then a new constant expression
++ is returned; if not, then the null value is returned (for
++ example, the concatenation of a string constant and a numeric
++ constant is deferred to the runtime).
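++
++ A miniature version of this hook is sketched below. The names
++ Binary, Constant and ConstantFold.BinaryFold follow the text,
++ but the signatures and the handful of operators shown are
++ assumptions; overflow handling is left out entirely.
++
++ ```csharp
++ using System;
++
++ abstract class Expression {
++     public abstract Expression DoResolve ();
++ }
++
++ abstract class Constant : Expression {
++     public abstract object GetValue ();
++
++     // Constants are born resolved.
++     public override Expression DoResolve () { return this; }
++ }
++
++ class IntConstant : Constant {
++     public readonly int Value;
++     public IntConstant (int value) { Value = value; }
++     public override object GetValue () { return Value; }
++ }
++
++ class StringConstant : Constant {
++     public readonly string Value;
++     public StringConstant (string value) { Value = value; }
++     public override object GetValue () { return Value; }
++ }
++
++ static class ConstantFold {
++     // Returns the folded constant, or null if it cannot fold.
++     public static Constant BinaryFold (char op, Constant left, Constant right)
++     {
++         IntConstant l = left as IntConstant;
++         IntConstant r = right as IntConstant;
++         if (l == null || r == null)
++             return null;   // e.g. string + int is left to the runtime
++
++         switch (op) {
++         case '+': return new IntConstant (l.Value + r.Value);
++         case '-': return new IntConstant (l.Value - r.Value);
++         case '*': return new IntConstant (l.Value * r.Value);
++         }
++         return null;
++     }
++ }
++
++ class Binary : Expression {
++     readonly char op;
++     Expression left, right;
++
++     public Binary (char op, Expression left, Expression right)
++     {
++         this.op = op;
++         this.left = left;
++         this.right = right;
++     }
++
++     public override Expression DoResolve ()
++     {
++         left = left.DoResolve ();
++         right = right.DoResolve ();
++         if (left == null || right == null)
++             return null;
++
++         Constant lc = left as Constant;
++         Constant rc = right as Constant;
++         if (lc != null && rc != null) {
++             Constant folded = ConstantFold.BinaryFold (op, lc, rc);
++             if (folded != null)
++                 return folded;   // replaces the Binary node
++         }
++         return this;             // code will be emitted instead
++     }
++ }
++
++ class Demo {
++     static void Main ()
++     {
++         Expression e = new Binary ('+', new IntConstant (2),
++             new Binary ('*', new IntConstant (3), new IntConstant (4)));
++         Console.WriteLine (((Constant) e.DoResolve ()).GetValue ());
++     }
++ }
++ ```
++
++ Because folding happens bottom-up in DoResolve, the nested
++ `3 * 4' has already become a constant by the time the outer
++ `+' is considered.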
++
++** Side effects
++
++ a [i++]++
++ a [i++] += 5;
++
++** Statements
++
++* The semantic analysis
++
++ First, the compiler driver has to parse all the input files.
++ Once all the input files have been parsed, and an internal
++ representation of the input program exists, the following
++ steps are taken:
++
++ * The interface hierarchy is resolved first.
++ As the interface hierarchy is constructed,
++ TypeBuilder objects are created for each one of
++ them.
++
++ * The class and structure hierarchy is resolved next,
++ and TypeBuilder objects are created for them.
++
++ * Constants and enumerations are resolved.
++
++ * Method, indexer, property, delegate and event
++ definitions are now entered into the TypeBuilders.
++
++ * Elements that contain code are now invoked to
++ perform semantic analysis and code generation.
++
++* Output Generation
++
++** Code Generation
++
++ The EmitContext class is created any time that IL code is to
++ be generated (methods, properties, indexers and attributes all
++ create EmitContexts).
++
++ The EmitContext keeps track of the current namespace and type
++ container. This is used during name resolution.
++
++ An EmitContext is used by the underlying code generation
++ facilities to track the state of code generation:
++
++ * The ILGenerator used to generate code for this
++ method.
++
++ * The TypeContainer where the code lives; this is used
++ to access the TypeBuilder.
++
++ * The DeclSpace; this is used to resolve names through
++ RootContext.LookupType in the various statements and
++ expressions.
++
++ Code generation state is also tracked here:
++
++ * CheckState:
++
++ This variable tracks the `checked' state of the
++ compilation. It controls whether we should generate
++ code that does overflow checking, or code that
++ ignores overflows.
++
++ The default setting comes from the command line
++ option to generate checked or unchecked code plus
++ any source code changes using the checked/unchecked
++ statements or expressions. Contrast this with the
++ ConstantCheckState flag.
++
++ * ConstantCheckState
++
++ The constant check state is always set to `true' and
++ cannot be changed from the command line. The source
++ code can change this setting with the `checked' and
++ `unchecked' statements and expressions.
++
++ * IsStatic
++
++ Whether we are emitting code inside a static or
++ instance method
++
++ * ReturnType
++
++ The type of the value that may be returned, or NULL
++ if there is no return type.
++
++ * ReturnLabel
++
++ A `Label' used by the code if it must jump to it.
++ This is used by a few routines that deal with exception
++ handling.
++
++ * HasReturnLabel
++
++ Whether we have a return label defined by the toplevel
++ driver.
++
++ * ContainerType
++
++ Points to the Type (extracted from the
++ TypeContainer) that declares this body of code
++
++ * IsConstructor
++
++ Whether this is generating code for a constructor
++
++ * CurrentBlock
++
++ Tracks the current block being generated.
++
++ * ReturnLabel
++
++ The location where return has to jump to return the
++ value
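++
++ Seen from the user's side, the difference between the
++ CheckState and ConstantCheckState entries above corresponds to
++ standard C# behaviour, which the following fragment
++ illustrates. It is ordinary user code, not compiler source,
++ and the mapping onto the two fields is our reading of the
++ descriptions above.
++
++ ```csharp
++ using System;
++
++ class CheckStateDemo {
++     // Non-constant arithmetic: overflow checking is forced on
++     // here, whatever the command-line default is.
++     public static int AddOneChecked (int value)
++     {
++         return checked (value + 1);
++     }
++
++     // Non-constant arithmetic with checking forced off.
++     public static int AddOneUnchecked (int value)
++     {
++         return unchecked (value + 1);
++     }
++
++     static void Main ()
++     {
++         // Constant expressions are checked by default: without
++         // the `unchecked' wrapper this would not compile.
++         const int wrapped = unchecked (int.MaxValue + 1);
++         Console.WriteLine (wrapped == int.MinValue);
++
++         Console.WriteLine (AddOneUnchecked (int.MaxValue) == int.MinValue);
++
++         try {
++             AddOneChecked (int.MaxValue);
++         } catch (OverflowException) {
++             Console.WriteLine ("overflow detected at runtime");
++         }
++     }
++ }
++ ```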
++
++ A few variables are used to track the state for checking in
++ for loops, or in try/catch statements:
++
++ * InFinally
++
++ Whether we are in a Finally block
++
++ * InTry
++
++ Whether we are in a Try block
++
++ * InCatch
++
++ Whether we are in a Catch block
++
++ * InUnsafe
++
++ Whether we are inside an unsafe block
++
++ Methods exposed by the EmitContext:
++
++ * EmitTopBlock()
++
++ This emits a toplevel block.
++
++ This routine is very simple, to allow the anonymous
++ method support to roll its two-stage version of this
++ routine on its own.
++
++ * NeedReturnLabel ():
++
++ This is used to flag during the resolution phase that
++ the driver needs to initialize the `ReturnLabel'
++
++* Anonymous Methods
++
++ The introduction of anonymous methods in the compiler changed
++ various ways of doing things in the compiler. The most
++ significant one is the hard split between the resolution phase
++ and the emission phases of the compiler.
++
++ For instance, routines that reference local variables can no
++ longer safely create temporary variables during the
++ resolution phase: they must do so from the emission phase,
++ since the variable might have been "captured", in which case
++ it cannot be accessed with the local-variable operations from
++ the runtime.
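++
++ A small user-level example of a captured variable is shown
++ below. The names are made up for the illustration. The
++ comments describe the usual implementation strategy for such
++ variables (moving them into a compiler-generated helper
++ object), which is stated here as general background rather
++ than as a description of this compiler's exact output.
++
++ ```csharp
++ using System;
++
++ delegate int Counter ();
++
++ class CaptureDemo {
++     public static Counter MakeCounter ()
++     {
++         // `count' looks like an ordinary local, but the anonymous
++         // method below captures it, so it has to outlive this
++         // call and cannot simply live in a local variable slot.
++         int count = 0;
++
++         return delegate {
++             count++;
++             return count;
++         };
++     }
++
++     static void Main ()
++     {
++         Counter next = MakeCounter ();
++         next ();
++         Console.WriteLine (next ());   // prints 2
++     }
++ }
++ ```
++
++ Whether `count' is captured is only known once the whole
++ method body has been resolved, which is why decisions about
++ how to store locals have to wait for the emission phase.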
++
++ The code emission is in:
++
++ EmitTopBlock ()
++
++ This drives the process: it first resolves the topblock, then
++ emits the required metadata (local variable definitions) and
++ finally emits the code.
++
++* Miscellaneous
++
++** Error Processing
++
++ Errors are reported during the various stages of the
++ compilation process. The compiler stops its processing if
++ there are errors between the various phases. This simplifies
++ the code, because it is always safe to assume that the data
++ structures that the compiler is operating on are
++ consistent.
++
++ The error codes in the Mono C# compiler are the same as those
++ found in the Microsoft C# compiler, with a few exceptions
++ (where we report a few more errors, those are documented in
++ mcs/errors/errors.txt). The goal is to reduce confusion for
++ users, and also to help us track the progress of the
++ compiler in terms of the errors we report.
++
++ The Report class provides error and warning display functions,
++ and also keeps an error count which is used to stop the
++ compiler between the phases.
++
++ A couple of debugging tools are available here, and are useful
++ when extending or fixing bugs in the compiler. If the
++ `--fatal' flag is passed to the compiler, the Report.Error
++ routine will throw an exception. This can be used to pinpoint
++ the location of the bug and examine the variables around the
++ error location.
++
++ Warnings can be turned into errors by using the `--werror'
++ flag to the compiler.
++
++ The report class also ignores warnings that have been
++ specified on the command line with the `--nowarn' flag.
++
++ Finally, code in the compiler uses the global variable
++ RootContext.WarningLevel in a few places to decide whether a
++ warning is worth reporting to the user or not.
++
++* Debugging the compiler
++
++ Sometimes it is convenient to find *how* a particular error
++ message comes to be reported. To do that, you might want to use
++ the --fatal flag to mcs. The flag will instruct the compiler to
++ abort with a stack trace when the error is reported.
++
++ You can use this with -warnaserror to obtain the same effect
++ with warnings.
++
++* Editing the compiler sources
++
++ The compiler sources are intended to be edited with a width of 134 columns.
++