Exception Handling In the Mono Runtime
--------------------------------------

There are many types of exceptions which the runtime needs to
handle:

- exceptions thrown from managed code using the 'throw' or 'rethrow' CIL
  instructions

- exceptions thrown by some IL instructions, like the InvalidCastException
  thrown by the 'castclass' CIL instruction

- exceptions thrown by runtime code

- synchronous signals received while in managed code

- synchronous signals received while in native code

- asynchronous signals

Since exception handling is very arch dependent, parts of the
exception handling code reside in the arch specific
exceptions-<ARCH>.c files. The architecture independent parts
are in mini-exceptions.c. The different exception types listed
above are generated in different parts of the runtime, but
ultimately, they all end up in the mono_handle_exception ()
function in mini-exceptions.c.

* Exceptions thrown programmatically from managed code
------------------------------------------------------

These exceptions are thrown from managed code using the 'throw' or
'rethrow' CIL instructions. The JIT compiler translates them
into a call to a helper function named
mono_arch_throw_exception or mono_arch_rethrow_exception.

These helper functions do not exist at compile time. They are
created dynamically at run time by the code in the
exceptions-<ARCH>.c files.

They perform various stack manipulations, then call a
helper function usually named throw_exception (), which does
further processing in C code and then calls
mono_handle_exception () to do the rest.

* Exceptions thrown implicitly from managed code
------------------------------------------------

These exceptions are thrown by some IL instructions when
something goes wrong. When the JIT needs to throw such an
exception, it emits a forward conditional branch and remembers
its position, along with the exception which needs to be
thrown. This is usually done in macros named
EMIT_COND_SYSTEM_EXCEPTION in the mini-<ARCH>.c files.

After the machine code for the method is emitted, the JIT
calls the arch dependent mono_arch_emit_exceptions () function,
which adds the exception throwing code to the end of the
method and patches the earlier forward branches so that they
point to this code.

This has the advantage that the rarely-executed exception
throwing code is kept separate from the method body, leading
to better icache performance.

The exception throwing code branches to the dynamically
generated mono_arch_throw_corlib_exception helper function,
which creates the proper exception object, does some stack
manipulation, then calls throw_exception ().
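
The branch-then-patch scheme described in this section can be
illustrated with a toy emitter. This is only a sketch and not the
runtime's code: the Emitter type, the one-byte opcodes and all
function names below are hypothetical, and real back ends patch
machine-specific branch encodings rather than a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction set: every name here is illustrative only. */
enum { OP_NOP = 0, OP_BRANCH_IF = 1, OP_THROW = 2, OP_RET = 3 };

#define MAX_CODE 64
#define MAX_PATCHES 8

typedef struct {
    size_t branch_pos;   /* index of the branch's target operand */
    uint8_t exc_id;      /* which exception the out-of-line code throws */
} PendingThrow;

typedef struct {
    uint8_t code[MAX_CODE];
    size_t len;
    PendingThrow pending[MAX_PATCHES];
    size_t n_pending;
} Emitter;

static void emit_byte(Emitter *e, uint8_t b)
{
    assert(e->len < MAX_CODE);
    e->code[e->len++] = b;
}

/* Emit a conditional branch with a placeholder target and remember it. */
static void emit_cond_exception(Emitter *e, uint8_t exc_id)
{
    assert(e->n_pending < MAX_PATCHES);
    emit_byte(e, OP_BRANCH_IF);
    e->pending[e->n_pending].branch_pos = e->len;
    e->pending[e->n_pending].exc_id = exc_id;
    e->n_pending++;
    emit_byte(e, 0xFF);  /* placeholder target, patched later */
}

/* After the method body: append throw code and patch each recorded branch. */
static void emit_exceptions(Emitter *e)
{
    for (size_t i = 0; i < e->n_pending; i++) {
        e->code[e->pending[i].branch_pos] = (uint8_t)e->len;
        emit_byte(e, OP_THROW);
        emit_byte(e, e->pending[i].exc_id);
    }
    e->n_pending = 0;
}
```

In this sketch the method body stays contiguous, and the throw
sequences are only reached through the patched branches.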

* Exceptions thrown by runtime code
-----------------------------------

These exceptions are usually thrown by the implementations of
InternalCalls (icalls). First, an appropriate exception object
is created with the help of the helper functions in
metadata/exception.c, which has a separate helper function for
allocating each kind of exception object used by the runtime
code. Then the mono_raise_exception () function is called to
actually throw the exception. That function never returns.
For example:

    if (something_is_wrong)
        mono_raise_exception (mono_get_exception_index_out_of_range ());

mono_raise_exception () simply passes the exception to the JIT
side through an API, where it is received by the helper
created by mono_arch_throw_exception (). From then on, it is
treated as an exception thrown from managed code.

For performance reasons, the runtime does not itself perform some
of the checks required by the CLI spec. Instead, it relies on the
CPU to do them. The two main checks which are omitted are
null-pointer checks and arithmetic checks. When a null pointer is
dereferenced by JITted code, the CPU notifies the kernel
through an interrupt, and the kernel sends a SIGSEGV
signal to the process. The runtime installs a signal handler
for SIGSEGV, which is sigsegv_signal_handler () in mini.c. The
signal handler creates the appropriate exception object and
calls mono_handle_exception () with it. Arithmetic exceptions
like division by zero are handled similarly.
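
The general idea of turning a synchronous signal into an exception
can be sketched in plain C. This is an illustration under stated
assumptions, not the runtime's handler: all names below are
hypothetical, raise () stands in for a faulting instruction in
JITted code, and siglongjmp () stands in for the transfer of
control that mono_handle_exception () performs by unwinding.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-ins: the names below are illustrative only. */
typedef enum { EXC_NONE = 0, EXC_NULL_REFERENCE, EXC_DIVIDE_BY_ZERO } ExcKind;

static sigjmp_buf handler_frame;            /* where the "catch" lives */
static volatile sig_atomic_t pending_kind;  /* exception chosen by the handler */

/* Map the signal to an exception kind, then jump to the "catch" site. */
static void sync_signal_handler(int signo)
{
    pending_kind = (signo == SIGSEGV) ? EXC_NULL_REFERENCE : EXC_DIVIDE_BY_ZERO;
    siglongjmp(handler_frame, 1);
}

/* Install the same handler for SIGSEGV and SIGFPE. Returns 0 on success. */
static int install_sync_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sync_signal_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGFPE, &sa, NULL);
}

/* Run "code" that faults and report which exception the fault became. */
static ExcKind run_guarded(int signo_to_simulate)
{
    pending_kind = EXC_NONE;
    /* Saving the signal mask lets the handler run again on a later fault. */
    if (sigsetjmp(handler_frame, 1) == 0) {
        raise(signo_to_simulate);  /* stands in for the faulting instruction */
        return EXC_NONE;           /* not reached when the handler jumps away */
    }
    return (ExcKind)pending_kind;
}
```

The fast path contains no explicit check at all; the cost is paid
only when the fault actually happens.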

* Synchronous signals in native code
------------------------------------

Receiving a signal such as SIGSEGV while in native code means
something very bad has happened. Because of this, the runtime
aborts after trying to print a managed and a native stack
trace. The logic is in the mono_handle_native_sigsegv ()
function.

Note that there are two kinds of native code which can be the
source of the signal:

- code inside the runtime

- code inside a native library loaded by an application, e.g. libgtk+

* Stack overflow checking
-------------------------

Stack overflow exceptions need special handling. When a thread
overflows its stack, the kernel sends it a normal SIGSEGV
signal, but the signal handler tries to execute on the same
stack as the thread, leading to a further SIGSEGV which
terminates the thread. A solution is to use an alternate signal
stack, supported by UNIX operating systems through the
sigaltstack (2) system call. When a thread starts up, the runtime
installs an altstack using the mono_setup_altstack () function
in mini-exceptions.c. When a SIGSEGV is received, the signal
handler checks whether the fault address is near the bottom
of the thread's normal stack. If it is, a
StackOverflowException is created instead of a
NullReferenceException. This exception is handled like any other
exception, with some minor differences.
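
The two ingredients described above, installing an alternate
signal stack and classifying the fault address, might look roughly
like this. It is a sketch, not the code in mini-exceptions.c: the
function names and the GUARD_BYTES threshold are assumptions, and
the stack is assumed to grow downward.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical threshold: faults this close to the limit count as overflow. */
#define GUARD_BYTES 4096u

/*
 * Give the calling thread an alternate signal stack, so that a SIGSEGV
 * handler registered with SA_ONSTACK can still run after the normal
 * stack is exhausted. The allocation is kept for the thread's lifetime.
 * Returns 0 on success, -1 on failure.
 */
static int setup_altstack(void)
{
    stack_t ss;
    ss.ss_size = (size_t)SIGSTKSZ;
    ss.ss_sp = malloc(ss.ss_size);
    ss.ss_flags = 0;
    if (ss.ss_sp == NULL)
        return -1;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }
    return 0;
}

/*
 * Classify a fault address. stack_limit is the lowest usable address of
 * the thread's normal stack. Returns 1 for a stack overflow, 0 otherwise
 * (for example a null dereference at a low address).
 */
static int is_stack_overflow(uintptr_t fault_addr, uintptr_t stack_limit)
{
    uintptr_t lo = stack_limit > GUARD_BYTES ? stack_limit - GUARD_BYTES : 0;
    uintptr_t hi = stack_limit + GUARD_BYTES;
    return fault_addr >= lo && fault_addr < hi;
}
```

A handler would obtain the fault address from the si_addr field of
the siginfo_t it receives when registered with SA_SIGINFO.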

There are two reasons why sigaltstack is disabled by default:

* The stack employed by sigaltstack () is not visible to the GC,
  so it is possible that the GC will miss it.

* Working sigaltstack support is very much os/kernel/libc
  dependent.

* Asynchronous signals
----------------------

Async signals are used by the runtime to notify a thread that
it needs to change its state somehow. Currently, they are used
for implementing thread abort/suspend/resume.

Handling async signals correctly is a very hard problem,
since the receiving thread can be in basically any state upon
receipt of the signal. It can be executing managed code or native
code, it can hold various managed/native locks or be in the
process of acquiring them, it can be starting up, shutting
down, etc. Most of the C APIs used by the runtime are
not async-signal-safe, meaning it is not safe to call them
from an async signal handler. In particular, the pthread
locking functions are not async-signal-safe, so if a signal
handler interrupts code which is in the process of acquiring a
lock, and the signal handler then tries to acquire a lock, the
thread can deadlock. Unfortunately, the current signal handling
code does acquire locks, so sometimes it does deadlock.

When receiving an async signal, the signal handler first tries
to determine whether the thread was executing managed code
when it was interrupted. If it was, then it is safe to
interrupt it, so a ThreadAbortException is constructed and
thrown. If the thread was executing native code, then it is
generally not safe to interrupt it. In this case, the runtime
sets a flag and returns from the signal handler. That flag is
checked every time the runtime returns from native code to
managed code, and the exception is thrown then. Also, a
platform specific mechanism is used to cause the thread to
interrupt any blocking operation it might be doing.
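
The deferred-interruption logic can be sketched as follows. This
is a simplified illustration, not the runtime's handler: every name
is hypothetical, the per-thread state is shown as globals, and
incrementing a counter stands in for constructing and throwing a
ThreadAbortException. The handler touches only sig_atomic_t flags,
so it takes no locks.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical per-thread state, shown as globals to keep the sketch short. */
static volatile sig_atomic_t in_managed_code;       /* 1 while in managed code */
static volatile sig_atomic_t interruption_pending;  /* set when deferred */
static volatile sig_atomic_t aborts_delivered;      /* stands in for throwing */

/* Async handler: interrupt now if in managed code, otherwise defer. */
static void async_signal_handler(int signo)
{
    (void)signo;
    if (in_managed_code)
        aborts_delivered++;        /* safe to interrupt: "throw" right away */
    else
        interruption_pending = 1;  /* native code: only set the flag */
}

/* Called by a (hypothetical) wrapper when native code returns to managed. */
static void on_return_to_managed(void)
{
    in_managed_code = 1;
    if (interruption_pending) {
        interruption_pending = 0;
        aborts_delivered++;        /* the deferred exception is thrown here */
    }
}

/* Install the handler for SIGUSR1. Returns 0 on success. */
static int install_async_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = async_signal_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}
```

The sketch leaves out the platform specific part that wakes a
thread blocked in native code.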

The async signal handler is sigusr1_signal_handler () in
mini.c, while the logic which determines whether an exception
is safe to be thrown is in mono_thread_request_interruption ().

* Stack unwinding during exception handling
-------------------------------------------

The execution state of a thread during exception handling is
stored in an arch-specific structure called MonoContext. This
structure contains the values of all the CPU registers
relevant during exception handling, which usually means:

- IP (instruction pointer)

- callee saved registers

Callee saved registers are the registers which any procedure is
required to save before using them and restore afterwards.
They are usually defined by each platform's ABI
(Application Binary Interface). For example, on x86, they are
EBX, ESI, EDI and EBP.

The code which calls mono_handle_exception () is required to
construct the initial MonoContext. How this is done depends on
the caller. For exceptions thrown from managed code, the
mono_arch_throw_exception helper function saves the values of
the required registers and passes them to throw_exception (),
which saves them in the MonoContext structure. For
exceptions thrown from signal handlers, the MonoContext
structure is initialized from the signal info received from the
kernel.

During exception handling, the runtime needs to 'unwind' the
stack, i.e. given the state of the thread at a stack frame,
construct the state at its caller. Since this is platform
specific, it is done by a platform specific function called
mono_arch_find_jit_info ().

Two kinds of stack frames need handling:

- Managed frames are easier. The JIT stores some
  information about each managed method, like which
  callee-saved registers it uses. Based on this information,
  mono_arch_find_jit_info () can find the values of the
  registers on the thread stack, and restore them.

- Native frames are problematic, since we have no information
  about how to unwind through them. Some compilers generate
  unwind information for code, some don't. Also, there is no
  general purpose library to obtain and decode this unwind
  information. So the runtime uses a different solution. When
  managed code needs to call into native code, it does so through
  a managed->native wrapper function, which is generated by
  the JIT. This function is responsible for saving the machine
  state into a per-thread structure called MonoLMF (Last
  Managed Frame). These LMF structures are stored on the
  thread's stack, and are linked together using one of their
  fields. When the unwinder encounters a native frame, it
  simply pops one entry off the LMF 'stack', and uses it to
  restore the frame state to the moment before control passed
  to native code. In effect, all successive native frames are
  skipped together.
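
The LMF mechanism amounts to a linked list of saved states that
the unwinder pops when it meets a native frame. A minimal sketch,
assuming a simplified layout: the real MonoLMF and MonoContext are
arch-specific, and the Lmf and Context types and the function
names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the real structure is arch-specific. */
typedef struct Lmf {
    struct Lmf *previous;  /* link to the next-older LMF entry */
    uintptr_t ip;          /* managed instruction pointer to resume from */
    uintptr_t sp;          /* managed stack pointer */
} Lmf;

typedef struct {
    uintptr_t ip;
    uintptr_t sp;
} Context;

/* Head of the per-thread LMF list (a plain global in this sketch). */
static Lmf *lmf_top = NULL;

/* Wrapper prologue: record the managed state before entering native code. */
static void lmf_push(Lmf *lmf, uintptr_t ip, uintptr_t sp)
{
    lmf->ip = ip;
    lmf->sp = sp;
    lmf->previous = lmf_top;
    lmf_top = lmf;
}

/* Wrapper epilogue: native code returned normally, drop the entry. */
static void lmf_pop(void)
{
    assert(lmf_top != NULL);
    lmf_top = lmf_top->previous;
}

/*
 * Unwinder step for a native frame: pop one LMF entry and restore the
 * context saved in it, which skips every native frame above it at once.
 * Returns 0 on success, -1 if no LMF entry is left.
 */
static int unwind_native(Context *ctx)
{
    if (lmf_top == NULL)
        return -1;
    ctx->ip = lmf_top->ip;
    ctx->sp = lmf_top->sp;
    lmf_top = lmf_top->previous;
    return 0;
}
```

Because each entry lives in the wrapper's own stack frame, pushing
and popping need no allocation, but the save still costs time on
every managed->native transition.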

1. Async signal safety
----------------------

The current async signal handling code is not async-signal-safe,
so it can and does deadlock in practice. It needs to be rewritten
to avoid taking locks at least until it can determine that it
was interrupting managed code.

Another problem is the managed stack frame unwinding code. It
blindly assumes that if the IP points into a managed frame,
then all the callee saved registers and the stack pointer are
saved on the stack. This is not true if the thread was
interrupted while executing the method prolog/epilog.

2. Raising exceptions from native code
--------------------------------------

Currently, exceptions are raised by calling
mono_raise_exception () in the middle of runtime code. This
has two problems:

- No cleanup is done, i.e. if the caller of the function which
  throws an exception has taken locks or allocated memory,
  those are not cleaned up. For this reason, it is only safe to
  call mono_raise_exception () 'very close' to managed code,
  i.e. in the icall functions themselves.

- To allow mono_raise_exception () to unwind through native
  code, we need to save the LMF structures, which can add a lot
  of overhead even in the common case when no exception is
  thrown. So this is not zero-cost exception handling.

An alternative might be to use a JNI style
set-pending-exception API. Runtime code could call
mono_set_pending_exception (), then return to its caller with
an error indication, allowing the caller to clean up. When
execution returns to managed code, the managed->native
wrapper could check whether there is a pending exception and
throw it if necessary. Since we already check for pending
thread interruption, this would have no overhead, allowing us
to drop the LMF saving/restoring code, or significant parts of
it.
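
A set-pending-exception scheme of this kind could be sketched as
below. This is an illustration of the proposal only, not an
existing runtime API: the Exception type, the function names and
the array_get () example are hypothetical, and the per-thread slot
is shown as a global.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exception object and per-thread slot (a global here). */
typedef struct { const char *name; } Exception;

static Exception *pending_exception = NULL;
static Exception index_out_of_range = { "IndexOutOfRangeException" };

/* Runtime code records the exception instead of unwinding on the spot. */
static void set_pending_exception(Exception *exc)
{
    pending_exception = exc;
}

/* Example runtime function: it reports failure through its return value. */
static int array_get(const int *items, size_t len, size_t idx, int *out)
{
    if (idx >= len) {
        set_pending_exception(&index_out_of_range);
        return -1;  /* callers can release locks and memory before returning */
    }
    *out = items[idx];
    return 0;
}

/*
 * What a managed->native wrapper would do after the native call returns:
 * take the pending exception, if any, so it can be thrown in managed code.
 */
static Exception *take_pending_exception(void)
{
    Exception *exc = pending_exception;
    pending_exception = NULL;
    return exc;
}
```

Every native frame returns normally in this design, so no frame
has to be unwound from the outside.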

There is an OSS project called libunwind which is a standalone
stack unwinding library. It is currently in development, but
it is used by default by gcc on ia64 for its stack
unwinding. The mono runtime also uses it on ia64. It has
several advantages in relation to our current unwinding code:

- it has a platform independent API, i.e. the same unwinding
  code can be used on multiple platforms.

- it can generate unwind tables which are correct at every
  instruction, i.e. can be used for unwinding from async
  signal handlers.

- given sufficient unwind info generated by a C compiler, it
  can unwind through C code.

- most of its API is async-safe

- it implements the gcc C++ exception handling API, so in
  theory it can be used to implement mixed-language exception
  handling (i.e. C++ exception caught in mono, mono exception
  caught in C++).

The biggest problem with libunwind is its platform support. ia64 support is
complete/well tested, while support for other platforms is missing/incomplete.

http://www.hpl.hp.com/research/linux/libunwind/