Rename this file

author Miguel de Icaza <miguel@gnome.org>

Tue, 12 Sep 2006 16:37:52 +0000 (16:37 -0000)

committer Miguel de Icaza <miguel@gnome.org>

Tue, 12 Sep 2006 16:37:52 +0000 (16:37 -0000)
diff --cc mcs/docs/Makefile

index 74f77840747d0daa9c76086af26d3d03ec3f439b,74f77840747d0daa9c76086af26d3d03ec3f439b..8e8b87fd9009cc1b6dc2db47e9316877c2d24e10
--- 1/mcs/docs/Makefile
--- 2/mcs/docs/Makefile
+++ b/mcs/docs/Makefile
@@@ -2,7 -2,7 +2,7 @@@ thisdir = doc
   SUBDIRS = 
   include ../build/rules.make
   
--DISTFILES = clr-abi.txt compiler control-flow-analysis.txt order.txt
++DISTFILES = clr-abi.txt compiler.txt control-flow-analysis.txt order.txt
   
   all-local install-local clean-local test-local run-test-local run-test-ondotnet-local uninstall-local:
   
diff --cc mcs/docs/compiler

index 8b340d2834df15016abf315b7ad647ad4384e13e,8b340d2834df15016abf315b7ad647ad4384e13e..0000000000000000000000000000000000000000

deleted file mode 100755,100755
--- 1/mcs/docs/compiler
--- 2/mcs/docs/compiler
+++ /dev/null
@@@ -1,672 -1,672 +1,0 @@@
--                     The Internals of the Mono C# Compiler
--      
--                              Miguel de Icaza
--                            (miguel@ximian.com)
--                                    2002
--
--* Abstract
--
--      The Mono C# compiler is a C# compiler written in C# itself.
--      Its goals are to provide a free and alternate implementation
--      of the C# language.  The Mono C# compiler generates ECMA CIL
--      images through the use of the System.Reflection.Emit API which
--      enable the compiler to be platform independent.
--      
--* Overview: How the compiler fits together
--
--      The compilation process is managed by the compiler driver (it
--      lives in driver.cs).
--
--      The compiler reads a set of C# source code files, and parses
--      them.  Any assemblies or modules that the user might want to
--      use with his project are loaded after parsing is done.
--
--      Once all the files have been parsed, the type hierarchy is
--      resolved.  First interfaces are resolved, then types and
--      enumerations.
--
--      Once the type hierarchy is resolved, every type is populated:
--      fields, methods, indexers, properties, events and delegates
--      are entered into the type system.  
--
--      At this point the program skeleton has been completed.  The
--      next process is to actually emit the code for each of the
--      executable methods.  The compiler drives this from
--      RootContext.EmitCode.
--
--      Each type then has to populate its methods: populating a
--      method requires creating a structure that is used as the state
--      of the block being emitted (this is the EmitContext class) and
--      then generating code for the topmost statement (the Block).
--
--      Code generation has two steps: the first step is the semantic
--      analysis (Resolve method) that resolves any pending tasks, and
--      guarantees that the code is correct.  The second phase is the
--      actual code emission.  All errors are flagged during in the
--      "Resolution" process. 
--
--      After all code has been emitted, then the compiler closes all
--      the types (this basically tells the Reflection.Emit library to
--      finish up the types), resources, and definition of the entry
--      point are done at this point, and the output is saved to
--      disk. 
--
--      The following list will give you an idea of where the
--      different pieces of the compiler live:
--
--      Infrastructure:
--
--          driver.cs:
--              This drives the compilation process: loading of
--              command line options; parsing the inputs files;
--              loading the referenced assemblies; resolving the type
--              hierarchy and emitting the code. 
--
--          codegen.cs:
--              
--              The state tracking for code generation. 
--
--          attribute.cs:
--
--              Code to do semantic analysis and emit the attributes
--              is here.
--
--          rootcontext.cs:
--
--              Keeps track of the types defined in the source code,
--              as well as the assemblies loaded.  
--
--          typemanager.cs:
--
--              This contains the MCS type system.
--
--          report.cs:
--
--              Error and warning reporting methods.
--
--          support.cs:
--
--              Assorted utility functions used by the compiler.
--              
--      Parsing
--
--          cs-tokenizer.cs:
--
--              The tokenizer for the C# language, it includes also
--              the C# pre-processor.
--
--          cs-parser.jay, cs-parser.cs:
--
--              The parser is implemented using a C# port of the Yacc
--              parser.  The parser lives in the cs-parser.jay file,
--              and cs-parser.cs is the generated parser.
--
--          location.cs:
--
--              The `location' structure is a compact representation
--              of a file, line, column where a token, or a high-level
--              construct appears.  This is used to report errors.
--
--      Expressions:
--        
--          ecore.cs
--      
--              Basic expression classes, and interfaces most shared
--              code and static methods are here.
--
--          expression.cs:
--
--              Most of the different kinds of expressions classes
--              live in this file.
--
--          assign.cs:
--
--              The assignment expression got its own file.
--
--          constant.cs:
--
--              The classes that represent the constant expressions.
--
--          literal.cs
--              
--              Literals are constants that have been entered manually
--              in the source code, like `1' or `true'.  The compiler
--              needs to tell constants from literals apart during the
--              compilation process, as literals sometimes have some
--              implicit extra conversions defined for them. 
--
--          cfold.cs:
--
--              The constant folder for binary expressions.
--
--      Statements
--
--          statement.cs:
--
--              All of the abstract syntax tree elements for
--              statements live in this file.  This also drives the
--              semantic analysis process.
--
--          iterators.cs:
--
--              Contains the support for implementing iterators from
--              the C# 2.0 specification.
--
--      Declarations, Classes, Structs, Enumerations
--
--          decl.cs
--
--              This contains the base class for Members and
--              Declaration Spaces.   A declaration space introduces
--              new names in types, so classes, structs, delegates and
--              enumerations derive from it.
--
--          class.cs:
--              
--              Methods for holding and defining class and struct
--              information, and every member that can be in these
--              (methods, fields, delegates, events, etc).
--
--              The most interesting type here is the `TypeContainer'
--              which is a derivative of the `DeclSpace' 
--
--          delegate.cs:
--
--              Handles delegate definition and use. 
--
--          enum.cs:
--
--              Handles enumerations.
--
--          interface.cs:
--
--              Holds and defines interfaces.  All the code related to
--              interface declaration lives here.
--
--          parameter.cs:
--
--              During the parsing process, the compiler encapsulates
--              parameters in the Parameter and Parameters classes.
--              These classes provide definition and resolution tools
--              for them.
--
--          pending.cs:
--
--              Routines to track pending implementations of abstract
--              methods and interfaces.  These are used by the
--              TypeContainer-derived classes to track whether every
--              method required is implemented.
--
--      
--* The parsing process
--
--      All the input files that make up a program need to be read in
--      advance, because C# allows declarations to happen after an
--      entity is used, for example, the following is a valid program:
--
--      class X : Y {
--              static void Main ()
--              {
--                      a = "hello"; b = "world";
--              }
--              string a;
--      }
--      
--      class Y {
--              public string b;
--      }
--
--      At the time the assignment expression `a = "hello"' is parsed,
--      it is not know whether a is a class field from this class, or
--      its parents, or whether it is a property access or a variable
--      reference.  The actual meaning of `a' will not be discovered
--      until the semantic analysis phase.
--
--** The Tokenizer and the pre-processor
--
--      The tokenizer is contained in the file `cs-tokenizer.cs', and
--      the main entry point is the `token ()' method.  The tokenizer
--      implements the `yyParser.yyInput' interface, which is what the
--      Yacc/Jay parser will use when fetching tokens.  
--
--      Token definitions are generated by jay during the compilation
--      process, and those can be references from the tokenizer class
--      with the `Token.' prefix. 
--
--      Each time a token is returned, the location for the token is
--      recorded into the `Location' property, that can be accessed by
--      the parser.  The parser retrieves the Location properties as
--      it builds its internal representation to allow the semantic
--      analysis phase to produce error messages that can pin point
--      the location of the problem. 
--
--      Some tokens have values associated with it, for example when
--      the tokenizer encounters a string, it will return a
--      LITERAL_STRING token, and the actual string parsed will be
--      available in the `Value' property of the tokenizer.   The same
--      mechanism is used to return integers and floating point
--      numbers. 
--
--      C# has a limited pre-processor that allows conditional
--      compilation, but it is not as fully featured as the C
--      pre-processor, and most notably, macros are missing.  This
--      makes it simple to implement in very few lines and mesh it
--      with the tokenizer.
--
--      The `handle_preprocessing_directive' method in the tokenizer
--      handles all the pre-processing, and it is invoked when the '#'
--      symbol is found as the first token in a line.  
--
--      The state of the pre-processor is contained in a Stack called
--      `ifstack', this state is used to track the if/elif/else/endif
--      nesting and the current state.  The state is encoded in the
--      top of the stack as a number of values `TAKING',
--      `TAKEN_BEFORE', `ELSE_SEEN', `PARENT_TAKING'.
--
--** Locations
--
--      Locations are encoded as a 32-bit number (the Location
--      struct) that map each input source line to a linear number.
--      As new files are parsed, the Location manager is informed of
--      the new file, to allow it to map back from an int constant to
--      a file + line number.
--
--      Prior to parsing/tokenizing any source files, the compiler
--      generates a list of all the source files and then reserves the
--      low N bits of the location to hold the source file, where N is
--      large enough to hold at least twice as many source files as were
--      specified on the command line (to allow for a #line in each file).
--      The upper 32-N bits are the line number in that file.
--
--      The token 0 is reserved for ``anonymous'' locations, ie. if we
--      don't know the location (Location.Null).
--
--      The tokenizer also tracks the column number for a token, but
--      this is currently not being used or encoded.  It could
--      probably be encoded in the low 9 bits, allowing for columns
--      from 1 to 512 to be encoded.
--
--* The Parser
--
--      The parser is written using Jay, which is a port of Berkeley
--      Yacc to Java, that I later ported to C#. 
--
--      Many people ask why the grammar of the parser does not match
--      exactly the definition in the C# specification.  The reason is
--      simple: the grammar in the C# specification is designed to be
--      consumed by humans, and not by a computer program.  Before
--      you can feed this grammar to a tool, it needs to be simplified
--      to allow the tool to generate a correct parser for it. 
--
--      In the Mono C# compiler, we use a class for each of the
--      statements and expressions in the C# language.  For example,
--      there is a `While' class for the the `while' statement, a
--      `Cast' class to represent a cast expression and so on.
--
--      There is a Statement class, and an Expression class which are
--      the base classes for statements and expressions. 
--
--** Namespaces
--      
--      Using list.
--
--* Internal Representation
--
--** Expressions
--
--      Expressions in the Mono C# compiler are represented by the
--      `Expression' class.  This is an abstract class that particular
--      kinds of expressions have to inherit from and override a few
--      methods.
--
--      The base Expression class contains two fields: `eclass' which
--      represents the "expression classification" (from the C#
--      specs) and the type of the expression.
--
--      Expressions have to be resolved before they are can be used.
--      The resolution process is implemented by overriding the
--      `DoResolve' method.  The DoResolve method has to set the
--      `eclass' field and the `type', perform all error checking and
--      computations that will be required for code generation at this
--      stage. 
--
--      The return value from DoResolve is an expression.  Most of the
--      time an Expression derived class will return itself (return
--      this) when it will handle the emission of the code itself, or
--      it can return a new Expression.
--
--      For example, the parser will create an "ElementAccess" class
--      for:
--
--              a [0] = 1;
--
--      During the resolution process, the compiler will know whether
--      this is an array access, or an indexer access.  And will
--      return either an ArrayAccess expression or an IndexerAccess
--      expression from DoResolve.
--
--      All errors must be reported during the resolution phase
--      (DoResolve) and if an error is detected the DoResolve method
--      will return null which is used to flag that an error condition
--      has ocurred, this will be used to stop compilation later on.
--      This means that anyone that calls Expression.Resolve must
--      check the return value for null which would indicate an error
--      condition.
--
--      The second stage that Expressions participate in is code
--      generation, this is done by overwriting the "Emit" method of
--      the Expression class.  No error checking must be performed
--      during this stage.
--
--** Simple Names, MemberAccess
--
--      One of the most important classes in the compiler is
--      "SimpleName" which represents a simple name (from the C#
--      specification).  The names during the resolution time are
--      bound to field names, parameter names or local variable names.
--
--      More complicated expressions like:
--
--              Math.Sin
--
--      Are composed using the MemberAccess class which contains a
--      name (Math) and a SimpleName (Sin), this helps driving the
--      resolution process.
--
--** Types
--
--      The parser creates expressions to represent types during
--      compilation.  For example:
--
--         class Sample {
--
--              Version vers;
--
--         }
--
--
--      That will produce a "SimpleName" expression for the "Version"
--      word.  And in this particular case, the parser will introduce
--      "Version vers" as a field declaration.
--
--      During the resolution process for the fields, the compiler
--      will have to resolve the word "Version" to a type.  This is
--      done by using the "ResolveAsType" method in Expression instead
--      of using "Resolve".
--
--      ResolveAsType just turns on a different set of code paths for
--      things like SimpleNames and does a different kind of error
--      checking than the one used by regular expressions. 
--
--
--** Constants
--
--      Constants in the Mono C# compiler are represented by the
--      abstract class `Constant'.  Constant is in turn derived from
--      Expression.  The base constructor for `Constant' just sets the
--      expression class to be an `ExprClass.Value', Constants are
--      born in a fully resolved state, so the `DoResolve' method
--      only returns a reference to itself.
--
--      Each Constant should implement the `GetValue' method which
--      returns an object with the actual contents of this constant, a
--      utility virtual method called `AsString' is used to render a
--      diagnostic message.  The output of AsString is shown to the
--      developer when an error or a warning is triggered.
--
--      Constant classes also participate in the constant folding
--      process.  Constant folding is invoked by those expressions
--      that can be constant folded invoking the functionality
--      provided by the ConstantFold class (cfold.cs).   
--
--      Each Constant has to implement a number of methods to convert
--      itself into a Constant of a different type.  These methods are
--      called `ConvertToXXXX' and they are invoked by the wrapper
--      functions `ToXXXX'.  These methods only perform implicit
--      numeric conversions.  Explicit conversions are handled by the
--      `Cast' expression class.
--
--      The `ToXXXX' methods are the entry point, and provide error
--      reporting in case a conversion can not be performed.
--
--** Constant Folding
--
--      The C# language requires constant folding to be implemented.
--      Constant folding is hooked up in the Binary.Resolve method.
--      If both sides of a binary expression are constants, then the
--      ConstantFold.BinaryFold routine is invoked.  
--
--      This routine implements all the binary operator rules, it
--      is a mirror of the code that generates code for binary
--      operators, but that has to be evaluated at runtime.
--
--      If the constants can be folded, then a new constant expression
--      is returned, if not, then the null value is returned (for
--      example, the concatenation of a string constant and a numeric
--      constant is deferred to the runtime). 
--
--** Side effects
--
--      a [i++]++ 
--      a [i++] += 5;
--
--** Statements
--
--* The semantic analysis 
--
--      Hence, the compiler driver has to parse all the input files.
--      Once all the input files have been parsed, and an internal
--      representation of the input program exists, the following
--      steps are taken:
--
--              * The interface hierarchy is resolved first.
--                As the interface hierarchy is constructed,
--                TypeBuilder objects are created for each one of
--                them. 
--
--              * Classes and structure hierarchy is resolved next,
--                TypeBuilder objects are created for them.
--
--              * Constants and enumerations are resolved.
--
--              * Method, indexer, properties, delegates and event
--                definitions are now entered into the TypeBuilders. 
--
--              * Elements that contain code are now invoked to
--                perform semantic analysis and code generation.
--
--* Output Generation
--
--** Code Generation
--
--      The EmitContext class is created any time that IL code is to
--      be generated (methods, properties, indexers and attributes all
--      create EmitContexts).  
--
--      The EmitContext keeps track of the current namespace and type
--      container.  This is used during name resolution.
--
--      An EmitContext is used by the underlying code generation
--      facilities to track the state of code generation:
--
--              * The ILGenerator used to generate code for this
--                method.
--
--              * The TypeContainer where the code lives, this is used
--                to access the TypeBuilder.
--
--              * The DeclSpace, this is used to resolve names through
--                RootContext.LookupType in the various statements and
--                expressions. 
--      
--      Code generation state is also tracked here:
--
--              * CheckState:
--
--                This variable tracks the `checked' state of the
--                compilation, it controls whether we should generate
--                code that does overflow checking, or if we generate
--                code that ignores overflows.
--                
--                The default setting comes from the command line
--                option to generate checked or unchecked code plus
--                any source code changes using the checked/unchecked
--                statements or expressions.  Contrast this with the
--                ConstantCheckState flag.
--
--              * ConstantCheckState
--                
--                The constant check state is always set to `true' and
--                cant be changed from the command line.  The source
--                code can change this setting with the `checked' and
--                `unchecked' statements and expressions.
--                
--              * IsStatic
--                
--                Whether we are emitting code inside a static or
--                instance method
--                
--              * ReturnType
--                
--                The value that is allowed to be returned or NULL if
--                there is no return type.
--                
--              * ReturnLabel 
--
--                A `Label' used by the code if it must jump to it.
--                This is used by a few routines that deals with exception
--                handling.
--
--              * HasReturnLabel
--
--                Whether we have a return label defined by the toplevel
--                driver.
--                
--              * ContainerType
--                
--                Points to the Type (extracted from the
--                TypeContainer) that declares this body of code
--                summary>
--                
--                
--              * IsConstructor
--                
--                Whether this is generating code for a constructor
--
--              * CurrentBlock
--
--                Tracks the current block being generated.
--
--              * ReturnLabel;
--              
--                The location where return has to jump to return the
--                value
--
--      A few variables are used to track the state for checking in
--      for loops, or in try/catch statements:
--
--              * InFinally
--              
--                Whether we are in a Finally block
--
--              * InTry
--
--                Whether we are in a Try block
--
--              * InCatch
--                
--                Whether we are in a Catch block
--
--              * InUnsafe
--                Whether we are inside an unsafe block
--
--      Methods exposed by the EmitContext:
--
--              * EmitTopBlock()
--
--                This emits a toplevel block. 
--
--                This routine is very simple, to allow the anonymous
--                method support to roll its two-stage version of this
--                routine on its own.
--
--              * NeedReturnLabel ():
--
--                This is used to flag during the resolution phase that 
--                the driver needs to initialize the `ReturnLabel'
--
--* Anonymous Methods
--
--      The introduction of anonymous methods in the compiler changed
--      various ways of doing things in the compiler.  The most
--      significant one is the hard split between the resolution phase
--      and the emission phases of the compiler.
--
--      For instance, routines that referenced local variables no
--      longer can safely create temporary variables during the
--      resolution phase: they must do so from the emission phase,
--      since the variable might have been "captured", hence access to
--      it can not be done with the local-variable operations from the
--      runtime.
--
--      The code emission is in:
--
--              EmitTopBlock ()
--
--      Which drives the process, it first resolves the topblock, then
--      emits the required metadata (local variable definitions) and
--      finally emits the code.
--              
--* Miscellaneous
--
--** Error Processing.
--
--      Errors are reported during the various stages of the
--      compilation process.  The compiler stops its processing if
--      there are errors between the various phases.  This simplifies
--      the code, because it is safe to assume always that the data
--      structures that the compiler is operating on are always
--      consistent.
--
--      The error codes in the Mono C# compiler are the same as those
--      found in the Microsoft C# compiler, with a few exceptions
--      (where we report a few more errors, those are documented in
--      mcs/errors/errors.txt).  The goal is to reduce confusion to
--      the users, and also to help us track the progress of the
--      compiler in terms of the errors we report. 
--
--      The Report class provides error and warning display functions,
--      and also keeps an error count which is used to stop the
--      compiler between the phases.  
--
--      A couple of debugging tools are available here, and are useful
--      when extending or fixing bugs in the compiler.  If the
--      `--fatal' flag is passed to the compiler, the Report.Error
--      routine will throw an exception.  This can be used to pinpoint
--      the location of the bug and examine the variables around the
--      error location.
--
--      Warnings can be turned into errors by using the `--werror'
--      flag to the compiler. 
--
--      The report class also ignores warnings that have been
--      specified on the command line with the `--nowarn' flag.
--
--      Finally, code in the compiler uses the global variable
--      RootContext.WarningLevel in a few places to decide whether a
--      warning is worth reporting to the user or not.  
--
--* Debugging the compiler
--
--      Sometimes it is convenient to find *how* a particular error
--      message is being reported from, to do that, you might want to use
--      the --fatal flag to mcs.  The flag will instruct the compiler to 
--      abort with a stack trace execution when the error is reported.
--
--      You can use this with -warnaserror to obtain the same effect
--      with warnings. 
--
--* Editing the compiler sources
--
--      The compiler sources are intended to be edited with 134 columns of width
--      
diff --cc mcs/docs/compiler.txt

index 0000000000000000000000000000000000000000,0000000000000000000000000000000000000000..8b340d2834df15016abf315b7ad647ad4384e13e

new file mode 100755 (executable)
--- /dev/null
--- /dev/null
+++ b/mcs/docs/compiler.txt
@@@ -1,0 -1,0 +1,672 @@@
++                     The Internals of the Mono C# Compiler
++      
++                              Miguel de Icaza
++                            (miguel@ximian.com)
++                                    2002
++
++* Abstract
++
++      The Mono C# compiler is a C# compiler written in C# itself.
++      Its goals are to provide a free and alternate implementation
++      of the C# language.  The Mono C# compiler generates ECMA CIL
++      images through the use of the System.Reflection.Emit API which
++      enable the compiler to be platform independent.
++      
++* Overview: How the compiler fits together
++
++      The compilation process is managed by the compiler driver (it
++      lives in driver.cs).
++
++      The compiler reads a set of C# source code files, and parses
++      them.  Any assemblies or modules that the user might want to
++      use with his project are loaded after parsing is done.
++
++      Once all the files have been parsed, the type hierarchy is
++      resolved.  First interfaces are resolved, then types and
++      enumerations.
++
++      Once the type hierarchy is resolved, every type is populated:
++      fields, methods, indexers, properties, events and delegates
++      are entered into the type system.  
++
++      At this point the program skeleton has been completed.  The
++      next process is to actually emit the code for each of the
++      executable methods.  The compiler drives this from
++      RootContext.EmitCode.
++
++      Each type then has to populate its methods: populating a
++      method requires creating a structure that is used as the state
++      of the block being emitted (this is the EmitContext class) and
++      then generating code for the topmost statement (the Block).
++
++      Code generation has two steps: the first step is the semantic
++      analysis (Resolve method) that resolves any pending tasks, and
++      guarantees that the code is correct.  The second phase is the
++      actual code emission.  All errors are flagged during in the
++      "Resolution" process. 
++
++      After all code has been emitted, then the compiler closes all
++      the types (this basically tells the Reflection.Emit library to
++      finish up the types), resources, and definition of the entry
++      point are done at this point, and the output is saved to
++      disk. 
++
++      The following list will give you an idea of where the
++      different pieces of the compiler live:
++
++      Infrastructure:
++
++          driver.cs:
++              This drives the compilation process: loading of
++              command line options; parsing the inputs files;
++              loading the referenced assemblies; resolving the type
++              hierarchy and emitting the code. 
++
++          codegen.cs:
++              
++              The state tracking for code generation. 
++
++          attribute.cs:
++
++              Code to do semantic analysis and emit the attributes
++              is here.
++
++          rootcontext.cs:
++
++              Keeps track of the types defined in the source code,
++              as well as the assemblies loaded.  
++
++          typemanager.cs:
++
++              This contains the MCS type system.
++
++          report.cs:
++
++              Error and warning reporting methods.
++
++          support.cs:
++
++              Assorted utility functions used by the compiler.
++              
++      Parsing
++
++          cs-tokenizer.cs:
++
++              The tokenizer for the C# language, it includes also
++              the C# pre-processor.
++
++          cs-parser.jay, cs-parser.cs:
++
++              The parser is implemented using a C# port of the Yacc
++              parser.  The parser lives in the cs-parser.jay file,
++              and cs-parser.cs is the generated parser.
++
++          location.cs:
++
++              The `location' structure is a compact representation
++              of a file, line, column where a token, or a high-level
++              construct appears.  This is used to report errors.
++
++      Expressions:
++        
++          ecore.cs
++      
++              Basic expression classes, and interfaces most shared
++              code and static methods are here.
++
++          expression.cs:
++
++              Most of the different kinds of expressions classes
++              live in this file.
++
++          assign.cs:
++
++              The assignment expression got its own file.
++
++          constant.cs:
++
++              The classes that represent the constant expressions.
++
++          literal.cs:
++
++              Literals are constants that have been entered manually
++              in the source code, like `1' or `true'.  The compiler
++              needs to tell constants and literals apart during the
++              compilation process, as literals sometimes have some
++              implicit extra conversions defined for them.
++
++          cfold.cs:
++
++              The constant folder for binary expressions.
++
++      Statements
++
++          statement.cs:
++
++              All of the abstract syntax tree elements for
++              statements live in this file.  This also drives the
++              semantic analysis process.
++
++          iterators.cs:
++
++              Contains the support for implementing iterators from
++              the C# 2.0 specification.
++
++      Declarations, Classes, Structs, Enumerations
++
++          decl.cs:
++
++              This contains the base class for Members and
++              Declaration Spaces.   A declaration space introduces
++              new names in types, so classes, structs, delegates and
++              enumerations derive from it.
++
++          class.cs:
++              
++              Methods for holding and defining class and struct
++              information, and every member that can be in these
++              (methods, fields, delegates, events, etc).
++
++              The most interesting type here is the `TypeContainer',
++              which derives from `DeclSpace'.
++
++          delegate.cs:
++
++              Handles delegate definition and use. 
++
++          enum.cs:
++
++              Handles enumerations.
++
++          interface.cs:
++
++              Holds and defines interfaces.  All the code related to
++              interface declaration lives here.
++
++          parameter.cs:
++
++              During the parsing process, the compiler encapsulates
++              parameters in the Parameter and Parameters classes.
++              These classes provide definition and resolution tools
++              for them.
++
++          pending.cs:
++
++              Routines to track pending implementations of abstract
++              methods and interfaces.  These are used by the
++              TypeContainer-derived classes to track whether every
++              method required is implemented.
++
++      
++* The parsing process
++
++      All the input files that make up a program need to be read in
++      advance, because C# allows declarations to happen after an
++      entity is used.  For example, the following is a valid program:
++
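++      class X : Y {
++              static void Main ()
++              {
++                      // Both fields are declared after this use.
++                      a = "hello"; b = "world";
++              }
++              // static, so that the static Main can assign it
++              static string a;
++      }
++
++      class Y {
++              public static string b;
++      }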
++
++      At the time the assignment expression `a = "hello"' is parsed,
++      it is not known whether `a' is a field of this class or of one
++      of its parents, or whether it is a property access or a
++      variable reference.  The actual meaning of `a' will not be
++      discovered until the semantic analysis phase.
++
++** The Tokenizer and the pre-processor
++
++      The tokenizer is contained in the file `cs-tokenizer.cs', and
++      the main entry point is the `token ()' method.  The tokenizer
++      implements the `yyParser.yyInput' interface, which is what the
++      Yacc/Jay parser will use when fetching tokens.  
++
++      Token definitions are generated by jay during the compilation
++      process, and they can be referenced from the tokenizer class
++      with the `Token.' prefix.
++
++      Each time a token is returned, the location for the token is
++      recorded in the `Location' property, which can be accessed by
++      the parser.  The parser retrieves the Location property as
++      it builds its internal representation, so that the semantic
++      analysis phase can produce error messages that pinpoint
++      the location of the problem.
++
++      Some tokens have values associated with them.  For example,
++      when the tokenizer encounters a string, it will return a
++      LITERAL_STRING token, and the actual string parsed will be
++      available in the `Value' property of the tokenizer.   The same
++      mechanism is used to return integers and floating point
++      numbers.
++
++      C# has a limited pre-processor that allows conditional
++      compilation, but it is not as fully featured as the C
++      pre-processor, and most notably, macros are missing.  This
++      makes it simple to implement in very few lines and mesh it
++      with the tokenizer.
++
++      The `handle_preprocessing_directive' method in the tokenizer
++      handles all the pre-processing, and it is invoked when the '#'
++      symbol is found as the first token in a line.  
++
++      The state of the pre-processor is kept in a Stack called
++      `ifstack', which tracks the if/elif/else/endif nesting and
++      the current state.  The state is encoded in the entry at the
++      top of the stack as a combination of the values `TAKING',
++      `TAKEN_BEFORE', `ELSE_SEEN' and `PARENT_TAKING'.
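++
++      As a rough illustration of how such a stack can drive
++      conditional compilation, here is a minimal sketch.  The class
++      name, the method names and the numeric flag values are
++      assumptions made for this example (only `ifstack' and the four
++      state names come from the description above), #elif is left
++      out for brevity, and the real logic in cs-tokenizer.cs may
++      differ.
++
++      using System;
++      using System.Collections.Generic;
++
++      // Hypothetical model, not the actual tokenizer code.
++      class IfStackSketch {
++              // Assumed bit values; the states are combined as flags.
++              const int TAKING = 1;         // current region is compiled
++              const int TAKEN_BEFORE = 2;   // an earlier branch was taken
++              const int ELSE_SEEN = 4;      // #else was already seen
++              const int PARENT_TAKING = 8;  // enclosing region is compiled
++
++              Stack<int> ifstack = new Stack<int> ();
++
++              // True when tokens in the current region should be returned.
++              public bool Taking {
++                      get { return ifstack.Count == 0 || (ifstack.Peek () & TAKING) != 0; }
++              }
++
++              public void If (bool condition)
++              {
++                      int state = 0;
++                      if (Taking) {
++                              state |= PARENT_TAKING;
++                              if (condition)
++                                      state |= TAKING | TAKEN_BEFORE;
++                      }
++                      ifstack.Push (state);
++              }
++
++              public void Else ()
++              {
++                      int state = ifstack.Pop ();
++                      if ((state & ELSE_SEEN) != 0)
++                              throw new InvalidOperationException ("#else after #else");
++
++                      // Take the #else branch only if the parent region is
++                      // compiled and no earlier branch was taken.
++                      bool take = (state & PARENT_TAKING) != 0 && (state & TAKEN_BEFORE) == 0;
++                      state = (state & ~TAKING) | ELSE_SEEN;
++                      if (take)
++                              state |= TAKING | TAKEN_BEFORE;
++                      ifstack.Push (state);
++              }
++
++              public void EndIf ()
++              {
++                      ifstack.Pop ();
++              }
++      }
++
++      In this model a nested #if inside a region that is not being
++      compiled never sets PARENT_TAKING, so neither it nor its #else
++      branch can be taken.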
++
++** Locations
++
++      Locations are encoded as a 32-bit number (the Location
++      struct) that maps each input source line to a linear number.
++      As new files are parsed, the Location manager is informed of
++      the new file, to allow it to map back from an int constant to
++      a file + line number.
++
++      Prior to parsing/tokenizing any source files, the compiler
++      generates a list of all the source files and then reserves the
++      low N bits of the location to hold the source file, where N is
++      large enough to hold at least twice as many source files as were
++      specified on the command line (to allow for a #line in each file).
++      The upper 32-N bits are the line number in that file.
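++
++      The packing described above can be sketched as follows.  This
++      is only an illustration: the type name `LocationSketch', its
++      members and the way N is computed are assumptions for this
++      example, not the compiler's actual Location struct.
++
++      using System;
++
++      // Hypothetical model of the file/line packing.
++      struct LocationSketch {
++              static int file_bits;   // N: low bits that hold the file index
++              readonly int token;     // 0 stands for an unknown location
++
++              // Reserve enough bits for twice the number of source files.
++              public static void Initialize (int source_file_count)
++              {
++                      int needed = source_file_count * 2;
++                      file_bits = 0;
++                      while ((1 << file_bits) < needed)
++                              file_bits++;
++              }
++
++              // Assumes lines are numbered from 1, so that a real
++              // location never collides with the reserved value 0.
++              public LocationSketch (int file, int line)
++              {
++                      token = (line << file_bits) | file;
++              }
++
++              public int File {
++                      get { return token & ((1 << file_bits) - 1); }
++              }
++
++              public int Line {
++                      get { return (int) ((uint) token >> file_bits); }
++              }
++
++              public bool IsNull {
++                      get { return token == 0; }
++              }
++      }
++
++      With three source files, Initialize (3) reserves 3 bits, so
++      file 2, line 10 is stored as (10 << 3) | 2 = 82, and File and
++      Line recover 2 and 10 from that value.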
++
++      The value 0 is reserved for ``anonymous'' locations, i.e. for
++      when we don't know the location (Location.Null).
++
++      The tokenizer also tracks the column number for a token, but
++      this is currently not being used or encoded.  It could
++      probably be encoded in the low 9 bits, allowing for columns
++      from 1 to 512 to be encoded.
++
++* The Parser
++
++      The parser is written using Jay, which is a port of Berkeley
++      Yacc to Java, that I later ported to C#. 
++
++      Many people ask why the grammar of the parser does not match
++      exactly the definition in the C# specification.  The reason is
++      simple: the grammar in the C# specification is designed to be
++      consumed by humans, and not by a computer program.  Before
++      you can feed this grammar to a tool, it needs to be simplified
++      to allow the tool to generate a correct parser for it. 
++
++      In the Mono C# compiler, we use a class for each of the
++      statements and expressions in the C# language.  For example,
++      there is a `While' class for the `while' statement, a
++      `Cast' class to represent a cast expression and so on.
++
++      There is a Statement class, and an Expression class which are
++      the base classes for statements and expressions. 
++
++** Namespaces
++      
++      Using list.
++
++* Internal Representation
++
++** Expressions
++
++      Expressions in the Mono C# compiler are represented by the
++      `Expression' class.  This is an abstract class that particular
++      kinds of expressions have to inherit from and override a few
++      methods.
++
++      The base Expression class contains two fields: `eclass' which
++      represents the "expression classification" (from the C#
++      specs) and the type of the expression.
++
++      Expressions have to be resolved before they can be used.
++      The resolution process is implemented by overriding the
++      `DoResolve' method.  The DoResolve method has to set the
++      `eclass' field and the `type', and perform all the error
++      checking and computations that will be required for code
++      generation at this stage.
++
++      The return value from DoResolve is an expression.  Most of the
++      time an Expression-derived class will return itself (return
++      this) when it will handle the emission of the code itself;
++      otherwise it can return a new Expression.
++
++      For example, the parser will create an "ElementAccess" class
++      for:
++
++              a [0] = 1;
++
++      During the resolution process, the compiler will determine
++      whether this is an array access or an indexer access, and
++      DoResolve will return either an ArrayAccess expression or an
++      IndexerAccess expression accordingly.
++
++      All errors must be reported during the resolution phase
++      (DoResolve).  If an error is detected, the DoResolve method
++      returns null to flag that an error condition has occurred;
++      this is used to stop compilation later on.  This means that
++      anyone who calls Expression.Resolve must check the return
++      value for null, which indicates an error condition.
++
++      The second stage that Expressions participate in is code
++      generation.  This is done by overriding the "Emit" method of
++      the Expression class.  No error checking must be performed
++      during this stage.
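++
++      The two stages can be sketched with a heavily simplified
++      model.  Everything below is an assumption made for
++      illustration: the `scope' dictionary stands in for the real
++      resolution context, Emit appends text instead of generating
++      IL, and the class bodies are not the ones found in ecore.cs
++      or expression.cs.
++
++      using System;
++      using System.Collections.Generic;
++
++      enum ExprClass { Invalid, Value, Variable, IndexerAccess }
++
++      abstract class Expression {
++              public ExprClass eclass = ExprClass.Invalid;
++              public Type type;
++
++              // Returns the resolved expression, or null on error.
++              public abstract Expression DoResolve (Dictionary<string, Type> scope);
++
++              // Stand-in for IL generation; performs no error checking.
++              public abstract void Emit (List<string> output);
++      }
++
++      class ElementAccess : Expression {
++              readonly string name;
++
++              public ElementAccess (string name)
++              {
++                      this.name = name;
++              }
++
++              public override Expression DoResolve (Dictionary<string, Type> scope)
++              {
++                      Type t;
++                      if (!scope.TryGetValue (name, out t)) {
++                              // Errors are reported here, never in Emit.
++                              Console.Error.WriteLine ("error: `{0}' is not defined", name);
++                              return null;
++                      }
++
++                      // Hand over to a more specific expression.
++                      Expression target;
++                      if (t.IsArray)
++                              target = new ArrayAccess (t);
++                      else
++                              target = new IndexerAccess (t);
++                      return target.DoResolve (scope);
++              }
++
++              public override void Emit (List<string> output)
++              {
++                      throw new InvalidOperationException ("unresolved ElementAccess");
++              }
++      }
++
++      class ArrayAccess : Expression {
++              readonly Type array_type;
++
++              public ArrayAccess (Type array_type)
++              {
++                      this.array_type = array_type;
++              }
++
++              public override Expression DoResolve (Dictionary<string, Type> scope)
++              {
++                      eclass = ExprClass.Variable;
++                      type = array_type.GetElementType ();
++                      return this;
++              }
++
++              public override void Emit (List<string> output)
++              {
++                      output.Add ("ldelem " + type.Name);
++              }
++      }
++
++      class IndexerAccess : Expression {
++              readonly Type container;
++
++              public IndexerAccess (Type container)
++              {
++                      this.container = container;
++              }
++
++              public override Expression DoResolve (Dictionary<string, Type> scope)
++              {
++                      eclass = ExprClass.IndexerAccess;
++                      // Placeholder: a real compiler would look up the indexer.
++                      type = typeof (object);
++                      return this;
++              }
++
++              public override void Emit (List<string> output)
++              {
++                      output.Add ("call " + container.Name + ".get_Item");
++              }
++      }
++
++      A caller resolves first, checks the result for null, and only
++      then calls Emit on the expression that DoResolve returned,
++      which may be a different object from the one the parser
++      created.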
++
++** Simple Names, MemberAccess
++
++      One of the most important classes in the compiler is
++      "SimpleName" which represents a simple name (from the C#
++      specification).  The names during the resolution time are
++      bound to field names, parameter names or local variable names.
++
++      More complicated expressions like:
++
++              Math.Sin
++
++      are composed using the MemberAccess class, which contains a
++      name (Math) and a SimpleName (Sin); this helps drive the
++      resolution process.
++
++** Types
++
++      The parser creates expressions to represent types during
++      compilation.  For example:
++
++         class Sample {
++
++              Version vers;
++
++         }
++
++
++      This will produce a "SimpleName" expression for the "Version"
++      word, and in this particular case the parser will introduce
++      "Version vers" as a field declaration.
++
++      During the resolution process for the fields, the compiler
++      will have to resolve the word "Version" to a type.  This is
++      done by using the "ResolveAsType" method in Expression instead
++      of using "Resolve".
++
++      ResolveAsType just turns on a different set of code paths for
++      things like SimpleNames and does a different kind of error
++      checking than the one used by regular expressions. 
++
++
++** Constants
++
++      Constants in the Mono C# compiler are represented by the
++      abstract class `Constant'.  Constant is in turn derived from
++      Expression.  The base constructor for `Constant' just sets the
++      expression class to `ExprClass.Value'.  Constants are
++      born in a fully resolved state, so the `DoResolve' method
++      only returns a reference to itself.
++
++      Each Constant should implement the `GetValue' method, which
++      returns an object with the actual contents of this constant.
++      A utility virtual method called `AsString' is used to render
++      a diagnostic message.  The output of AsString is shown to the
++      developer when an error or a warning is triggered.
++
++      Constant classes also participate in the constant folding
++      process.  Expressions that can be constant folded do so by
++      invoking the functionality provided by the ConstantFold
++      class (cfold.cs).
++
++      Each Constant has to implement a number of methods to convert
++      itself into a Constant of a different type.  These methods are
++      called `ConvertToXXXX' and they are invoked by the wrapper
++      functions `ToXXXX'.  These methods only perform implicit
++      numeric conversions.  Explicit conversions are handled by the
++      `Cast' expression class.
++
++      The `ToXXXX' methods are the entry point, and provide error
++      reporting in case a conversion cannot be performed.
++
++** Constant Folding
++
++      The C# language requires constant folding to be implemented.
++      Constant folding is hooked up in the Binary.Resolve method.
++      If both sides of a binary expression are constants, then the
++      ConstantFold.BinaryFold routine is invoked.  
++
++      This routine implements all the binary operator rules.  It
++      mirrors the code that generates code for binary operators,
++      except that the operation is evaluated at compile time
++      instead of being left to be evaluated at runtime.
++
++      If the constants can be folded, then a new constant expression
++      is returned; if not, then the null value is returned (for
++      example, the concatenation of a string constant and a numeric
++      constant is deferred to the runtime).
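++
++      A minimal sketch of this contract is shown below.  The
++      classes are simplified stand-ins written for this example
++      (the operator is a plain char, only int and string constants
++      exist, and overflow checking is left out); they are not the
++      code in cfold.cs.
++
++      using System;
++
++      abstract class Constant {
++              public abstract object GetValue ();
++      }
++
++      class IntConstant : Constant {
++              public readonly int Value;
++
++              public IntConstant (int value)
++              {
++                      Value = value;
++              }
++
++              public override object GetValue ()
++              {
++                      return Value;
++              }
++      }
++
++      class StringConstant : Constant {
++              public readonly string Value;
++
++              public StringConstant (string value)
++              {
++                      Value = value;
++              }
++
++              public override object GetValue ()
++              {
++                      return Value;
++              }
++      }
++
++      static class ConstantFoldSketch {
++              // Returns the folded constant, or null when the operation
++              // has to be left to the generated code.
++              public static Constant BinaryFold (char op, Constant left, Constant right)
++              {
++                      IntConstant li = left as IntConstant, ri = right as IntConstant;
++                      if (li != null && ri != null) {
++                              switch (op) {
++                              case '+': return new IntConstant (li.Value + ri.Value);
++                              case '-': return new IntConstant (li.Value - ri.Value);
++                              case '*': return new IntConstant (li.Value * ri.Value);
++                              }
++                              return null;
++                      }
++
++                      StringConstant ls = left as StringConstant, rs = right as StringConstant;
++                      if (op == '+' && ls != null && rs != null)
++                              return new StringConstant (ls.Value + rs.Value);
++
++                      // Mixed operands, such as string + int, are not folded.
++                      return null;
++              }
++      }
++
++      In this model BinaryFold ('+', new IntConstant (2), new
++      IntConstant (3)) yields an IntConstant holding 5, while a
++      string constant plus an int constant yields null.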
++
++** Side effects
++
++      a [i++]++ 
++      a [i++] += 5;
++
++** Statements
++
++* The semantic analysis 
++
++      As explained above, the compiler driver has to parse all the
++      input files first.
++      Once all the input files have been parsed, and an internal
++      representation of the input program exists, the following
++      steps are taken:
++
++              * The interface hierarchy is resolved first.
++                As the interface hierarchy is constructed,
++                TypeBuilder objects are created for each one of
++                them. 
++
++              * The class and struct hierarchy is resolved next;
++                TypeBuilder objects are created for them.
++
++              * Constants and enumerations are resolved.
++
++              * Method, indexer, property, delegate and event
++                definitions are now entered into the TypeBuilders.
++
++              * Elements that contain code are now invoked to
++                perform semantic analysis and code generation.
++
++* Output Generation
++
++** Code Generation
++
++      The EmitContext class is created any time that IL code is to
++      be generated (methods, properties, indexers and attributes all
++      create EmitContexts).  
++
++      The EmitContext keeps track of the current namespace and type
++      container.  This is used during name resolution.
++
++      An EmitContext is used by the underlying code generation
++      facilities to track the state of code generation:
++
++              * The ILGenerator used to generate code for this
++                method.
++
++              * The TypeContainer where the code lives, this is used
++                to access the TypeBuilder.
++
++              * The DeclSpace, this is used to resolve names through
++                RootContext.LookupType in the various statements and
++                expressions. 
++      
++      Code generation state is also tracked here:
++
++              * CheckState:
++
++                This variable tracks the `checked' state of the
++                compilation, it controls whether we should generate
++                code that does overflow checking, or if we generate
++                code that ignores overflows.
++                
++                The default setting comes from the command line
++                option to generate checked or unchecked code plus
++                any source code changes using the checked/unchecked
++                statements or expressions.  Contrast this with the
++                ConstantCheckState flag.
++
++              * ConstantCheckState
++                
++                The constant check state is always set to `true' and
++                cannot be changed from the command line.  The source
++                code can change this setting with the `checked' and
++                `unchecked' statements and expressions.
++                
++              * IsStatic
++                
++                Whether we are emitting code inside a static or
++                instance method
++                
++              * ReturnType
++                
++                The value that is allowed to be returned or NULL if
++                there is no return type.
++                
++              * ReturnLabel 
++
++                A `Label' used by the code if it must jump to it.
++                This is used by a few routines that deal with exception
++                handling.
++
++              * HasReturnLabel
++
++                Whether we have a return label defined by the toplevel
++                driver.
++                
++              * ContainerType
++                
++                Points to the Type (extracted from the
++                TypeContainer) that declares this body of code.
++                
++                
++              * IsConstructor
++                
++                Whether this is generating code for a constructor
++
++              * CurrentBlock
++
++                Tracks the current block being generated.
++
++              * ReturnLabel
++
++                The location where return has to jump to return the
++                value.
++
++      A few variables are used to track the state for checking in
++      for loops, or in try/catch statements:
++
++              * InFinally
++              
++                Whether we are in a Finally block
++
++              * InTry
++
++                Whether we are in a Try block
++
++              * InCatch
++                
++                Whether we are in a Catch block
++
++              * InUnsafe
++
++                Whether we are inside an unsafe block
++
++      Methods exposed by the EmitContext:
++
++              * EmitTopBlock()
++
++                This emits a toplevel block. 
++
++                This routine is very simple, to allow the anonymous
++                method support to roll its two-stage version of this
++                routine on its own.
++
++              * NeedReturnLabel ()
++
++                This is used to flag during the resolution phase that
++                the driver needs to initialize the `ReturnLabel'.
++
++* Anonymous Methods
++
++      The introduction of anonymous methods in the compiler changed
++      various ways of doing things in the compiler.  The most
++      significant one is the hard split between the resolution phase
++      and the emission phases of the compiler.
++
++      For instance, routines that referenced local variables can no
++      longer safely create temporary variables during the
++      resolution phase: they must do so from the emission phase,
++      since the variable might have been "captured", in which case
++      access to it cannot be done with the local-variable
++      operations from the runtime.
++
++      The code emission is driven by:
++
++              EmitTopBlock ()
++
++      It first resolves the topblock, then emits the required
++      metadata (local variable definitions) and finally emits the
++      code.
++              
++* Miscellaneous
++
++** Error Processing
++
++      Errors are reported during the various stages of the
++      compilation process.  The compiler stops its processing if
++      there are errors between the various phases.  This simplifies
++      the code, because it is always safe to assume that the data
++      structures that the compiler is operating on are consistent.
++
++      The error codes in the Mono C# compiler are the same as those
++      found in the Microsoft C# compiler, with a few exceptions
++      (where we report a few more errors; those are documented in
++      mcs/errors/errors.txt).  The goal is to reduce confusion for
++      users, and also to help us track the progress of the
++      compiler in terms of the errors we report.
++
++      The Report class provides error and warning display functions,
++      and also keeps an error count which is used to stop the
++      compiler between the phases.  
++
++      A couple of debugging tools are available here, and are useful
++      when extending or fixing bugs in the compiler.  If the
++      `--fatal' flag is passed to the compiler, the Report.Error
++      routine will throw an exception.  This can be used to pinpoint
++      the location of the bug and examine the variables around the
++      error location.
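++
++      The interplay between the error count, the phases and the
++      `--fatal' behaviour can be sketched as follows.  The names
++      `ReportSketch', `DriverSketch' and `Compile' are hypothetical
++      and the message format is only an approximation; this is not
++      the code in report.cs or driver.cs.
++
++      using System;
++
++      // Hypothetical model of the error counter.
++      static class ReportSketch {
++              public static int Errors;
++              public static bool Fatal;   // models the `--fatal' flag
++
++              public static void Error (int code, string message)
++              {
++                      Console.Error.WriteLine ("error CS{0:0000}: {1}", code, message);
++                      Errors++;
++
++                      // Throwing at the report site gives a stack trace that
++                      // shows which part of the compiler raised the error.
++                      if (Fatal)
++                              throw new Exception (message);
++              }
++      }
++
++      static class DriverSketch {
++              // Runs the phases in order and stops after the first phase
++              // that reported errors, so that later phases can assume
++              // consistent data structures.
++              public static bool Compile (params Action [] phases)
++              {
++                      foreach (Action phase in phases) {
++                              phase ();
++                              if (ReportSketch.Errors > 0)
++                                      return false;
++                      }
++                      return true;
++              }
++      }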
++
++      Warnings can be turned into errors by using the `--werror'
++      flag to the compiler. 
++
++      The report class also ignores warnings that have been
++      specified on the command line with the `--nowarn' flag.
++
++      Finally, code in the compiler uses the global variable
++      RootContext.WarningLevel in a few places to decide whether a
++      warning is worth reporting to the user or not.  
++
++* Debugging the compiler
++
++      Sometimes it is convenient to find out *how* a particular
++      error message is being reported.  To do that, you might want
++      to use the --fatal flag to mcs.  The flag will instruct the
++      compiler to abort execution with a stack trace when the error
++      is reported.
++
++      You can use this with -warnaserror to obtain the same effect
++      with warnings. 
++
++* Editing the compiler sources
++
++      The compiler sources are intended to be edited in a window
++      134 columns wide.
++