The Internals of the Mono C# Compiler

The Mono C# compiler is a C# compiler written in C# itself.
Its goal is to provide a free, alternative implementation
of the C# language.  The Mono C# compiler generates ECMA CIL
images through the System.Reflection.Emit API, which
enables the compiler to be platform independent.
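As an illustration of the API mentioned above, the following is a
minimal sketch (not taken from the compiler sources) that uses
System.Reflection.Emit to build a type with one static method and
then calls it.  The names `EmitSketch', `Adder' and `Add' are made up
for this example, and the assembly is kept in memory instead of being
written out as an image.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class EmitSketch
{
    // Builds a dynamic type with a static Add(int, int) method and returns that method.
    public static MethodInfo BuildAdd()
    {
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("EmitSketchAssembly"), AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("EmitSketchModule");
        TypeBuilder type = module.DefineType("Adder", TypeAttributes.Public);

        MethodBuilder method = type.DefineMethod(
            "Add", MethodAttributes.Public | MethodAttributes.Static,
            typeof(int), new[] { typeof(int), typeof(int) });

        // CIL body: push both arguments, add them, return the result.
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Ret);

        Type created = type.CreateType();
        return created.GetMethod("Add");
    }

    public static void Main()
    {
        object result = BuildAdd().Invoke(null, new object[] { 2, 3 });
        Console.WriteLine(result);   // prints 5
    }
}
```

The compiler itself drives the same kind of builder objects from its
internal representation rather than from hand-written Emit calls.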
* How the compiler fits together

The driver, the tokenizer, the parser, the internal
representation, the resolution process, the code generation

All the input files that make up a program need to be read in
advance, because C# allows declarations to happen after an
entity is used.  For example, the following is a valid program:

    a = "hello"; b = "world";
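A complete program in the same spirit might look as follows.  This is
only a sketch: the class names, and the choice of making one name a
field and the other a property of a parent class, are assumptions made
for illustration.

```csharp
using System;

public class DeclareAfterUse : Parent
{
    public static string Run()
    {
        // Both names are used before the compiler has seen their declarations.
        a = "hello"; b = "world";
        return a + " " + b;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}

// Declared after its first use, further down in the same file.
public class Parent
{
    protected static string a;                // a field of the parent class
    protected static string b { get; set; }   // a property, not a field
}
```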
At the time the assignment expression `a = "hello"' is parsed,
it is not known whether `a' is a class field from this class or
its parents, or whether it is a property access or a variable

Hence, the compiler driver has to parse all the input files.
Once all the input files have been parsed, and an internal
representation of the input program exists, the following
steps are taken:

    * The interface hierarchy is resolved first.
      As the interface hierarchy is constructed,
      TypeBuilder objects are created for each one of
      them.

    * The class and structure hierarchy is resolved next;
      TypeBuilder objects are created for them.

    * Constants and enumerations are resolved.

    * Method, indexer, property, delegate and event
      definitions are now entered into the TypeBuilders.

    * Elements that contain code are now invoked to
      perform semantic analysis and code generation.
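The ordering above can be sketched directly against
System.Reflection.Emit.  The example below is an illustration under
stated assumptions, not the compiler's code: the names `IGreeter',
`Greeter' and `Greet' are invented, and the constants and enumerations
step is left out.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class PhaseOrderSketch
{
    public static string Run()
    {
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("PhaseOrderAssembly"), AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("PhaseOrderModule");

        // Step 1: interfaces get their TypeBuilder first.
        TypeBuilder iface = module.DefineType("IGreeter",
            TypeAttributes.Public | TypeAttributes.Interface | TypeAttributes.Abstract);

        // Step 2: classes next; they can now refer to the interface builder.
        TypeBuilder cls = module.DefineType("Greeter",
            TypeAttributes.Public | TypeAttributes.Class,
            typeof(object), new Type[] { iface });

        // Step 3: member definitions are entered into the TypeBuilders.
        iface.DefineMethod("Greet",
            MethodAttributes.Public | MethodAttributes.Abstract | MethodAttributes.Virtual |
            MethodAttributes.HideBySig | MethodAttributes.NewSlot,
            typeof(string), Type.EmptyTypes);
        MethodBuilder greet = cls.DefineMethod("Greet",
            MethodAttributes.Public | MethodAttributes.Virtual | MethodAttributes.Final |
            MethodAttributes.HideBySig | MethodAttributes.NewSlot,
            typeof(string), Type.EmptyTypes);

        // Step 4: only now is code generated for the members that have bodies.
        ILGenerator il = greet.GetILGenerator();
        il.Emit(OpCodes.Ldstr, "hello");
        il.Emit(OpCodes.Ret);

        // Finish the types and call the generated method through reflection.
        iface.CreateType();
        Type greeterType = cls.CreateType();
        object instance = Activator.CreateInstance(greeterType);
        return (string) greeterType.GetMethod("Greet").Invoke(instance, null);
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Defining every type before any member, and every member before any
method body, is what lets a body refer to declarations that appear
later in the source.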
** The Tokenizer and the pre-processor

The tokenizer is contained in the file `cs-tokenizer.cs', and
the main entry point is the `token ()' method.  The tokenizer
implements the `yyParser.yyInput' interface, which is what the
Yacc/Jay parser will use when fetching tokens.

Token definitions are generated by jay during the compilation
process, and those can be referenced from the tokenizer class
with the `Token.' prefix.

Each time a token is returned, the location for the token is
recorded into the `Location' property, which can be accessed by
the parser.  The parser retrieves the Location properties as
it builds its internal representation to allow the semantic
analysis phase to produce error messages that can pinpoint
the location of the problem.
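The shape of such a tokenizer can be sketched as follows.  Everything
here is a toy under stated assumptions: `ITokenInput' is a
hypothetical stand-in for the interface the generated parser reads
from (its member names are assumed), the `Token' constants are
invented rather than generated by jay, and `Location' is a plain line
number.

```csharp
using System;

// Hypothetical stand-in for the interface the Jay-generated parser reads from.
public interface ITokenInput
{
    bool advance();   // is there more input?
    int token();      // code of the next token
    object value();   // semantic value of the token just returned
}

// Hypothetical token codes; in the compiler these are generated by jay.
public static class Token
{
    public const int EOF = 0;
    public const int IDENTIFIER = 1;
    public const int NUMBER = 2;
}

public sealed class ToyTokenizer : ITokenInput
{
    private readonly string text;
    private int pos;
    private int line = 1;
    private object val;

    // Line of the most recently returned token (a plain int in this sketch).
    public int Location { get; private set; }

    public ToyTokenizer(string text) { this.text = text; }

    private void SkipWhitespace()
    {
        while (pos < text.Length && char.IsWhiteSpace(text[pos])) {
            if (text[pos] == '\n') line++;
            pos++;
        }
    }

    public bool advance()
    {
        SkipWhitespace();
        return pos < text.Length;
    }

    public int token()
    {
        SkipWhitespace();
        Location = line;   // recorded every time a token is returned
        if (pos >= text.Length) { val = null; return Token.EOF; }

        int start = pos;
        bool number = char.IsDigit(text[pos]);
        while (pos < text.Length && !char.IsWhiteSpace(text[pos])) pos++;
        val = text.Substring(start, pos - start);
        return number ? Token.NUMBER : Token.IDENTIFIER;
    }

    public object value() { return val; }

    public static void Main()
    {
        var lexer = new ToyTokenizer("total\n42");
        while (lexer.advance()) {
            int code = lexer.token();
            Console.WriteLine("line " + lexer.Location + ": token " + code +
                              " (" + lexer.value() + ")");
        }
    }
}
```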
C# has a limited pre-processor that allows conditional
compilation, but it is not as fully featured as the C
pre-processor; most notably, macros are missing.  This
makes it simple to implement in very few lines and mesh it
with the tokenizer.

The `handle_preprocessing_directive' method in the tokenizer
handles all the pre-processing, and it is invoked when the '#'
symbol is found as the first token in a line.  The state of
the pre-processor is contained in a Stack called `ifstack';
this state is used to track the if/elif/else/endif states.
The state is encoded in the top of the stack as a number of
values `TAKING', `TAKEN_BEFORE', `ELSE_SEEN',
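A stack of flag values is enough to track nested conditionals.  The
sketch below is a simplified illustration, not the tokenizer's code:
the flag names mirror the three mentioned above, their numeric values
are assumed, conditions are limited to a single symbol name, and
whether the enclosing blocks are being taken is found by scanning the
stack.

```csharp
using System;
using System.Collections.Generic;

public static class IfStackSketch
{
    [Flags]
    private enum State
    {
        None = 0,
        Taking = 1,        // lines in the current branch are kept
        TakenBefore = 2,   // some branch of this #if has already been kept
        ElseSeen = 4       // #else was seen; a later #elif or #else is an error
    }

    // True when every enclosing conditional is currently being taken.
    private static bool AllTaking(Stack<State> ifstack)
    {
        foreach (State s in ifstack)
            if ((s & State.Taking) == 0) return false;
        return true;
    }

    // Returns the lines that survive conditional compilation.
    public static List<string> Filter(IEnumerable<string> lines, ISet<string> defined)
    {
        var ifstack = new Stack<State>();
        var kept = new List<string>();

        foreach (string raw in lines) {
            string line = raw.Trim();
            if (line.StartsWith("#if ")) {
                bool take = AllTaking(ifstack) && defined.Contains(line.Substring(4).Trim());
                ifstack.Push(take ? State.Taking | State.TakenBefore : State.None);
            } else if (line.StartsWith("#elif ") || line == "#else") {
                if (ifstack.Count == 0) throw new FormatException("no matching #if");
                State state = ifstack.Pop();
                if ((state & State.ElseSeen) != 0) throw new FormatException("directive after #else");
                bool isElse = line == "#else";
                bool condition = isElse || defined.Contains(line.Substring(6).Trim());
                bool take = AllTaking(ifstack) && (state & State.TakenBefore) == 0 && condition;
                state &= ~State.Taking;
                if (take) state |= State.Taking | State.TakenBefore;
                if (isElse) state |= State.ElseSeen;
                ifstack.Push(state);
            } else if (line == "#endif") {
                if (ifstack.Count == 0) throw new FormatException("no matching #if");
                ifstack.Pop();
            } else if (AllTaking(ifstack)) {
                kept.Add(raw);
            }
        }
        if (ifstack.Count != 0) throw new FormatException("missing #endif");
        return kept;
    }

    public static void Main()
    {
        string[] source = { "a", "#if DEBUG", "b", "#else", "c", "#endif", "d" };
        var symbols = new HashSet<string> { "DEBUG" };
        Console.WriteLine(string.Join(",", Filter(source, symbols)));   // prints a,b,d
    }
}
```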
Locations are encoded as a 32-bit number (the Location
struct) that maps each input source line to a linear number.
As new files are parsed, the Location manager is informed of
the new file, to allow it to map back from an int constant to
a file + line number.
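One way to picture this mapping is a table that remembers, for each
file, the linear number at which its lines start.  The sketch below is
hypothetical (`LocationTable', `BeginFile', `Encode' and `Decode' are
invented names) and assumes files are parsed one after another.

```csharp
using System;
using System.Collections.Generic;

public sealed class LocationTable
{
    private readonly List<string> names = new List<string>();
    private readonly List<int> bases = new List<int>();
    private int highest;   // highest linear number handed out so far

    // Called when parsing of a new file starts.
    public void BeginFile(string name)
    {
        names.Add(name);
        bases.Add(highest);
    }

    // Maps a 1-based line of the current file to a linear number.
    public int Encode(int line)
    {
        if (bases.Count == 0) throw new InvalidOperationException("no file registered");
        if (line < 1) throw new ArgumentOutOfRangeException(nameof(line));
        int value = bases[bases.Count - 1] + line;
        if (value > highest) highest = value;
        return value;
    }

    // Maps a linear number back to a file name and line.
    public (string File, int Line) Decode(int location)
    {
        for (int i = bases.Count - 1; i >= 0; i--)
            if (location > bases[i]) return (names[i], location - bases[i]);
        throw new ArgumentOutOfRangeException(nameof(location));
    }

    public static void Main()
    {
        var table = new LocationTable();
        table.BeginFile("a.cs");
        table.Encode(10);
        table.BeginFile("b.cs");
        int loc = table.Encode(3);
        var where = table.Decode(loc);
        Console.WriteLine(where.File + ":" + where.Line);   // prints b.cs:3
    }
}
```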
The tokenizer also tracks the column number for a token, but
this is currently not being used or encoded.  It could
probably be encoded in the low 9 bits, allowing for columns
from 1 to 512 to be encoded.
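To make the arithmetic concrete: 9 bits hold 512 distinct values (0 to
511), so columns 1 to 512 fit if the column is stored minus one.  The
packing below is a hypothetical sketch of that idea, not something the
compiler does; it assumes the remaining high bits hold the linear line
number.

```csharp
using System;

public static class ColumnPacking
{
    private const int ColumnBits = 9;
    private const int ColumnMask = (1 << ColumnBits) - 1;   // 0x1FF

    // Packs a linear line number and a 1-based column into one int.
    public static int Pack(int line, int column)
    {
        if (column < 1 || column > 512) throw new ArgumentOutOfRangeException(nameof(column));
        if (line < 0 || line >= (1 << 22)) throw new ArgumentOutOfRangeException(nameof(line));
        return (line << ColumnBits) | (column - 1);
    }

    public static int Line(int packed) { return packed >> ColumnBits; }

    public static int Column(int packed) { return (packed & ColumnMask) + 1; }

    public static void Main()
    {
        int packed = Pack(3, 512);
        Console.WriteLine(Line(packed) + ":" + Column(packed));   // prints 3:512
    }
}
```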
* Internal Representation

*** The Expression Class

The utility functions that can be called by all children of

Constants in the Mono C# compiler are represented by the
abstract class `Constant'.  Constant is in turn derived from
Expression.  The base constructor for `Constant' just sets the
expression class to be an `ExprClass.Value'.  Constants are
born in a fully resolved state, so the `DoResolve' method
only returns a reference to itself.

Each Constant should implement the `GetValue' method, which
returns an object with the actual contents of this constant.  A
utility virtual method called `AsString' is used to render a
diagnostic message.  The output of AsString is shown to the
developer when an error or a warning is triggered.
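A stripped-down model of this arrangement is sketched below.  It is
not the compiler's code: `Expression' is reduced to the two members
needed here, the member signatures are assumed, and `IntConstant' and
`StringConstant' are written only for the example.

```csharp
using System;

public enum ExprClass { Invalid, Value }

// Reduced stand-in for the compiler's Expression class.
public abstract class Expression
{
    public ExprClass eclass = ExprClass.Invalid;
    public abstract Expression DoResolve();
}

public abstract class Constant : Expression
{
    // Constants are born fully resolved, as values.
    protected Constant() { eclass = ExprClass.Value; }

    public override Expression DoResolve() { return this; }

    // The actual contents of the constant.
    public abstract object GetValue();

    // Text used when the constant appears in a diagnostic.
    public virtual string AsString() { return GetValue().ToString(); }
}

public sealed class IntConstant : Constant
{
    public readonly int Value;
    public IntConstant(int value) { Value = value; }
    public override object GetValue() { return Value; }
}

public sealed class StringConstant : Constant
{
    public readonly string Value;
    public StringConstant(string value) { Value = value; }
    public override object GetValue() { return Value; }
    public override string AsString() { return "\"" + Value + "\""; }
}

public static class ConstantSketch
{
    public static void Main()
    {
        Constant c = new StringConstant("hi");
        Console.WriteLine(ReferenceEquals(c.DoResolve(), c));   // prints True
        Console.WriteLine(c.AsString());
    }
}
```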
Constant classes also participate in the constant folding
process.  Expressions that can be constant folded do so by
invoking the functionality provided by the ConstantFold
class (cfold.cs).
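The idea of constant folding can be shown on a toy expression tree.
The sketch below is not the ConstantFold class; all type and method
names are hypothetical, and only integer addition is handled.

```csharp
using System;

public abstract class Expression { }

public sealed class IntConstant : Expression
{
    public readonly int Value;
    public IntConstant(int value) { Value = value; }
}

public sealed class Variable : Expression
{
    public readonly string Name;
    public Variable(string name) { Name = name; }
}

public sealed class Add : Expression
{
    public readonly Expression Left, Right;
    public Add(Expression left, Expression right) { Left = left; Right = right; }
}

public static class FoldSketch
{
    // Replaces additions whose operands are both constants by a single constant.
    public static Expression Fold(Expression e)
    {
        if (e is Add add) {
            Expression left = Fold(add.Left);
            Expression right = Fold(add.Right);
            if (left is IntConstant a && right is IntConstant b)
                return new IntConstant(checked(a.Value + b.Value));   // overflow throws
            return new Add(left, right);
        }
        return e;
    }

    public static void Main()
    {
        Expression tree = new Add(new IntConstant(1),
                                  new Add(new IntConstant(2), new IntConstant(3)));
        var folded = (IntConstant) Fold(tree);
        Console.WriteLine(folded.Value);   // prints 6
    }
}
```

An addition with a non-constant operand is kept as an addition, but
its constant subtrees are still folded.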
Each Constant has to implement a number of methods to convert
itself into a Constant of a different type.  These methods are
called `ConvertToXXXX' and they are invoked by the wrapper
functions `ToXXXX'.  These methods only perform implicit
numeric conversions.  Explicit conversions are handled by the
`Cast' expression class.

The `ToXXXX' methods are the entry point, and provide error
reporting in case a conversion cannot be performed.
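The split between the two families of methods can be sketched as
follows.  This is an illustration only: `ConvertToInt', `ConvertToLong',
`ToInt' and `ToLong' are hypothetical instances of the XXXX pattern,
and errors are collected in a list instead of going through the
compiler's own reporting.

```csharp
using System;
using System.Collections.Generic;

public abstract class Constant
{
    public abstract object GetValue();

    // ConvertToXXXX: return null when no implicit numeric conversion exists.
    public virtual IntConstant ConvertToInt() { return null; }
    public virtual LongConstant ConvertToLong() { return null; }

    // ToXXXX: entry points that add error reporting on top of ConvertToXXXX.
    public IntConstant ToInt(List<string> errors)
    {
        IntConstant c = ConvertToInt();
        if (c == null) errors.Add("cannot implicitly convert " + GetValue() + " to int");
        return c;
    }

    public LongConstant ToLong(List<string> errors)
    {
        LongConstant c = ConvertToLong();
        if (c == null) errors.Add("cannot implicitly convert " + GetValue() + " to long");
        return c;
    }
}

public sealed class IntConstant : Constant
{
    public readonly int Value;
    public IntConstant(int value) { Value = value; }
    public override object GetValue() { return Value; }
    public override IntConstant ConvertToInt() { return this; }
    // int widens to long implicitly.
    public override LongConstant ConvertToLong() { return new LongConstant(Value); }
}

public sealed class LongConstant : Constant
{
    public readonly long Value;
    public LongConstant(long value) { Value = value; }
    public override object GetValue() { return Value; }
    public override LongConstant ConvertToLong() { return this; }
    // ConvertToInt is not overridden: long to int needs an explicit cast.
}

public static class ConversionSketch
{
    public static void Main()
    {
        var errors = new List<string>();
        Console.WriteLine(new IntConstant(7).ToLong(errors).Value);    // prints 7
        Console.WriteLine(new LongConstant(7).ToInt(errors) == null);  // prints True
        Console.WriteLine(errors.Count);                               // prints 1
    }
}
```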