docs/exception-handling.txt

   1
   2                  Exception Handling In the Mono Runtime
   3                  --------------------------------------
   4
   5 Introduction
   6 ------------
   7
   8   There are many types of exceptions which the runtime needs to handle. These
   9 are:
  10 - exceptions thrown from managed code using the 'throw' or 'rethrow' CIL
  11   instructions.
  12 - exceptions thrown by some IL instructions like InvalidCastException thrown
  13   by the 'castclass' CIL instruction.
  14 - exceptions thrown by runtime code
  15 - synchronous signals received while in managed code
  16 - synchronous signals received while in native code
  17 - asynchronous signals
  18
  19 Since exception handling is very arch dependent, parts of the exception
  20 handling code reside in the arch specific exceptions-<ARCH>.c files. The
  21 architecture independent parts are in mini-exceptions.c. The different
  22 exception types listed above are generated in different parts of the runtime,
  23 but ultimately, they all end up in the mono_handle_exception () function in
  24 mini-exceptions.c.
  25
   26 Exceptions thrown programmatically from managed code
   27 ----------------------------------------------------
  28
   29 These exceptions are thrown from managed code using the 'throw' or 'rethrow'
   30 CIL instructions. The JIT compiler translates them into a call to a helper
   31 function, mono_arch_throw_exception or mono_arch_rethrow_exception. These
   32 helpers do not exist at compile time; they are generated at run time by the
   33 code in the exceptions-<ARCH>.c files. They manipulate the stack, then call a
   34 C helper usually named throw_exception (), which does further processing and
   35 then calls mono_handle_exception () to do the rest.
  36
  37 Exceptions thrown implicitly from managed code
  38 ----------------------------------------------
  39
  40 These exceptions are thrown by some IL instructions when something goes wrong.
  41 When the JIT needs to throw such an exception, it emits a forward conditional
  42 branch and remembers its position, along with the exception which needs to
  43 be emitted. This is usually done in macros named EMIT_COND_SYSTEM_EXCEPTION in
  44 the mini-<ARCH>.c files. After the machine code for the method is emitted, the
   45 JIT calls the arch dependent mono_arch_emit_exceptions () function, which
   46 adds the exception throwing code to the end of the method and patches the
   47 earlier forward branches so that they point to this code. This has the
   48 advantage that the rarely-executed exception throwing code is kept separate
   49 from the method body, leading to better icache performance.
   50 The exception throwing code branches to the dynamically generated
   51 mono_arch_throw_corlib_exception helper function, which creates the
   52 proper exception object, does some stack manipulation, then calls
   53 throw_exception ().
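
The record-then-patch scheme described above can be sketched in C as follows.
This is a minimal illustration, not the code in the mini-<ARCH>.c files: the
byte encoding, the CodeBuffer and PatchSite types and all function names are
hypothetical and chosen only to show how forward branches are remembered and
later pointed at throw stubs appended after the method body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy encoding: a conditional branch is one opcode byte followed
 * by a 32-bit displacement relative to the end of the branch instruction. */
enum { OP_BRANCH_COND = 0x0F, OP_THROW = 0xCC, MAX_PATCHES = 16 };

typedef struct {
    size_t disp_offset;   /* where the 32-bit displacement lives in the buffer */
    int    exc_id;        /* which exception this branch should raise */
} PatchSite;

typedef struct {
    uint8_t   code[256];
    size_t    len;
    PatchSite patches[MAX_PATCHES];
    size_t    n_patches;
} CodeBuffer;

/* Emit a forward conditional branch with a placeholder displacement and
 * remember its position together with the exception to throw. */
static void emit_cond_exception(CodeBuffer *cb, int exc_id)
{
    assert(cb->n_patches < MAX_PATCHES && cb->len + 5 <= sizeof cb->code);
    cb->code[cb->len++] = OP_BRANCH_COND;
    cb->patches[cb->n_patches].disp_offset = cb->len;
    cb->patches[cb->n_patches].exc_id = exc_id;
    cb->n_patches++;
    memset(cb->code + cb->len, 0, 4);
    cb->len += 4;
}

/* After the method body is emitted: append one throw stub per recorded
 * branch and patch each branch so that it targets its stub. */
static void emit_exceptions(CodeBuffer *cb)
{
    for (size_t i = 0; i < cb->n_patches; i++) {
        size_t target = cb->len;
        size_t branch_end = cb->patches[i].disp_offset + 4;
        int32_t disp = (int32_t)(target - branch_end);
        memcpy(cb->code + cb->patches[i].disp_offset, &disp, sizeof disp);
        assert(cb->len + 2 <= sizeof cb->code);
        cb->code[cb->len++] = OP_THROW;
        cb->code[cb->len++] = (uint8_t)cb->patches[i].exc_id;
    }
}

/* Decode the buffer offset that the i-th recorded branch jumps to. */
static size_t branch_target(const CodeBuffer *cb, size_t i)
{
    int32_t disp;
    memcpy(&disp, cb->code + cb->patches[i].disp_offset, sizeof disp);
    return cb->patches[i].disp_offset + 4 + (size_t)disp;
}
```

All throw stubs end up after the last instruction of the method body, so the
common path never executes or even fetches them.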
  54
  55 Exceptions thrown by runtime code
  56 ---------------------------------
  57
  58 These exceptions are usually thrown by the implementations of InternalCalls
  59 (icalls). First an appropriate exception object is created with the help of
  60 various helper functions in metadata/exception.c, which has a separate helper
  61 function for allocating each kind of exception object used by the runtime code.
  62 Then the mono_raise_exception () function is called to actually throw the
  63 exception. That function never returns.
  64
  65 An example:
  66    if (something_is_wrong)
  67           mono_raise_exception (mono_get_exception_index_out_of_range ());
  68
   69 mono_raise_exception () simply passes the exception to the JIT side through
   70 an API, where it is received by the helper created by mono_arch_throw_exception (). From then on, it is treated as an exception thrown from managed code.
  71
  72 Synchronous signals
  73 -------------------
  74
   75 For performance reasons, the runtime does not do some checks required by the
  76 CLI spec. Instead, it relies on the CPU to do them. The two main checks which
  77 are omitted are null-pointer checks, and arithmetic checks. When a null
  78 pointer is dereferenced by JITted code, the CPU will notify the kernel through
  79 an interrupt, and the kernel will send a SIGSEGV signal to the process. The
  80 runtime installs a signal handler for SIGSEGV, which is
  81 sigsegv_signal_handler () in mini.c. The signal handler creates the appropriate
  82 exception object and calls mono_handle_exception () with it. Arithmetic
  83 exceptions like division by zero are handled similarly.
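
The general idea of turning a synchronous signal into a recoverable error can
be sketched as below. This is a simplified illustration under stated
assumptions, not the runtime's sigsegv_signal_handler (): the fault is
simulated with raise (), recovery uses sigsetjmp/siglongjmp in place of
building an exception object and calling mono_handle_exception (), and all
function names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recovery_point;          /* stands in for the nearest handler */
static volatile sig_atomic_t last_signal;  /* which signal was caught, 0 if none */

/* Record the signal and transfer control back to the recovery point. */
static void fault_handler(int signo)
{
    last_signal = signo;
    siglongjmp(recovery_point, 1);
}

/* Run fn () with SIGSEGV and SIGFPE handlers installed. Returns 0 if fn ()
 * completed normally, or the number of the signal that interrupted it. */
static int run_guarded(void (*fn)(void))
{
    struct sigaction sa, old_segv, old_fpe;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fault_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGFPE, &sa, &old_fpe);

    last_signal = 0;
    /* The second argument asks sigsetjmp to save the signal mask, so the
     * signal is unblocked again after siglongjmp leaves the handler. */
    if (sigsetjmp(recovery_point, 1) == 0)
        fn();

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGFPE, &old_fpe, NULL);
    return last_signal;
}

/* Simulated faults: a real null dereference or division by zero would make
 * the kernel deliver the same signals. */
static void faulting_code(void) { raise(SIGSEGV); }
static void dividing_code(void) { raise(SIGFPE); }
static void healthy_code(void) { }
```

A caller would map the returned signal number to the matching exception type,
for example SIGFPE to a division-by-zero error.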
  84
  85 Synchronous signals in native code
  86 ----------------------------------
  87
  88 Receiving a signal such as SIGSEGV while in native code means something very
  89 bad has happened. Despite this, the current runtime code treats such signals
   90 the same as when they are received while in managed code, i.e. it constructs
   91 a NullReferenceException and attempts to handle it. Note that there are two
   92 kinds of native code which can be the source of the signal:
   93 - code inside the runtime
   94 - code inside a native library loaded by an application, e.g. libgtk+
  95
  96 Stack overflow checking
  97 -----------------------
  98
   99   Stack overflow exceptions need special handling. When a thread overflows its
  100 stack, the kernel sends it a normal SIGSEGV signal, but the signal handler
  101 tries to execute on the same stack as the thread, leading to a further SIGSEGV
  102 which will terminate the thread. A solution is to use an alternative signal
  103 stack, supported by UNIX operating systems through the sigaltstack (2) system
  104 call. When a thread starts up, the runtime installs an altstack using the
  105 mono_setup_altstack () function in mini-exceptions.c. When a SIGSEGV is
  106 received, the signal handler checks whether the fault address is near the
  107 bottom of the thread's normal stack. If it is, a StackOverflowException is
  108 created instead of a NullReferenceException. This exception is handled like
  109 any other exception, with some minor differences.
 110   Working sigaltstack support is very much os/kernel/libc dependent, so it is
 111 disabled by default.
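
A minimal sketch of the two pieces involved, registering an alternate signal
stack and classifying a fault address, might look like this. It is not
mono_setup_altstack (); the stack size, the guard size and every function name
are assumptions made for the illustration, and the stack is assumed to grow
downward.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { ALTSTACK_SIZE = 64 * 1024 };   /* assumed size for the sketch */

/* Register an alternate signal stack for the calling thread.
 * Returns 0 on success, -1 on failure. A SIGSEGV handler only runs on this
 * stack if it is installed with the SA_ONSTACK flag of sigaction (2). */
static int install_altstack(void)
{
    stack_t ss;

    ss.ss_sp = malloc(ALTSTACK_SIZE);
    if (ss.ss_sp == NULL)
        return -1;
    ss.ss_size = ALTSTACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }
    return 0;
}

/* Query whether the calling thread currently has an alternate stack. */
static int altstack_is_installed(void)
{
    stack_t cur;

    if (sigaltstack(NULL, &cur) != 0)
        return 0;
    return (cur.ss_flags & SS_DISABLE) == 0;
}

/* Decide whether a fault looks like a stack overflow: the faulting address
 * lies within guard_size bytes of the lowest address of the thread's stack. */
static int is_stack_overflow(uintptr_t fault_addr, uintptr_t stack_low,
                             size_t guard_size)
{
    assert(stack_low >= guard_size);
    return fault_addr >= stack_low - guard_size &&
           fault_addr < stack_low + guard_size;
}
```

A fault at an address close to zero fails this test, so it would still be
reported as a null reference rather than a stack overflow.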
 112
 113 Asynchronous signals
 114 --------------------
 115
  116   Async signals are used by the runtime to notify a thread that it needs to
  117 change its state somehow. Currently, they are used for implementing
  118 thread abort/suspend/resume.
 119
 120   Handling async signals correctly is a very hard problem, since the receiving
  121 thread can be in basically any state upon receipt of the signal. It can be
  122 executing managed or native code, it can hold various managed/native locks or
  123 be in the process of acquiring them, it can be starting up, shutting down,
  124 etc. Most of the C APIs used by the runtime are not async-signal safe,
  125 meaning it is not safe to call them from an async signal handler. In
  126 particular, the pthread locking functions are not async-safe, so if a
 127 signal handler interrupted code which was in the process of acquiring a lock,
 128 and the signal handler tries to acquire a lock, the thread will deadlock.
 129 Unfortunately, the current signal handling code does acquire locks, so
 130 sometimes it does deadlock.
 131
 132 When receiving an async signal, the signal handler first tries to determine
  133 whether the thread was executing managed code when it was interrupted. If
  134 it was, then it is safe to interrupt it, so a ThreadAbortException is
 135 constructed and thrown. If the thread was executing native code, then it is
 136 generally not safe to interrupt it. In this case, the runtime sets a flag
 137 then returns from the signal handler. That flag is checked every time the
 138 runtime returns from native code to managed code, and the exception is thrown
 139 then. Also, a platform specific mechanism is used to cause the thread to
 140 interrupt any blocking operation it might be doing.
 141
 142 The async signal handler is in sigusr1_signal_handler () in mini.c, while
  143 the logic which determines whether an exception is safe to be thrown is in
 144 mono_thread_request_interruption ().
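
The deferral logic described above can be sketched as follows. This is an
illustration only, not sigusr1_signal_handler () or
mono_thread_request_interruption (): the flags are process-wide instead of
per-thread, "throwing" is reduced to setting a flag, and all names are
hypothetical. The handler itself takes no locks and only writes sig_atomic_t
variables.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical thread state; a real runtime keeps this per thread. */
static volatile sig_atomic_t in_managed_code;
static volatile sig_atomic_t interruption_pending;
static volatile sig_atomic_t abort_delivered;

/* Async signal handler: act immediately only when managed code was
 * interrupted, otherwise just leave a note for later. */
static void abort_signal_handler(int signo)
{
    (void)signo;
    if (in_managed_code)
        abort_delivered = 1;       /* stands in for throwing the exception */
    else
        interruption_pending = 1;  /* deferred until native code returns */
}

/* Install the handler for SIGUSR1. Returns 0 on success. */
static int install_abort_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = abort_signal_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called on every native-to-managed transition. Returns 1 if a deferred
 * interruption has to be turned into an exception now. */
static int check_pending_interruption(void)
{
    if (!interruption_pending)
        return 0;
    interruption_pending = 0;
    return 1;
}
```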
 145
 146 Stack unwinding during exception handling
 147 -----------------------------------------
 148
 149 The execution state of a thread during exception handling is stored in an
 150 arch-specific structure called MonoContext. This structure contains the values
 151 of all the CPU registers relevant during exception handling, which
 152 usually means:
 153 - IP (instruction pointer)
 154 - SP (stack pointer)
 155 - FP (frame pointer)
 156 - callee saved registers
 157
  158 Callee saved registers are the registers which any procedure must save before
  159 using them and restore afterwards. They are usually defined by
  160 each platform's ABI (Application Binary Interface). For example, on x86, they
  161 are EBX, ESI and EDI.
 162
 163 The code which calls mono_handle_exception () is required to construct the
 164 initial MonoContext. How this is done depends on the caller. For exceptions
  165 thrown from managed code, the mono_arch_throw_exception helper saves the
  166 required registers and passes them to throw_exception (), which stores them in
  167 the MonoContext structure. For exceptions thrown from signal handlers, the
  168 MonoContext is initialized from the signal info received from the kernel.
 169
 170 During exception handling, the runtime needs to 'unwind' the stack, i.e.
 171 given the state of the thread at a stack frame, construct the state at its
 172 callers. Since this is platform specific, it is done by a platform specific
 173 function called mono_arch_find_jit_info ().
 174
 175 Two kinds of stack frames need handling:
 176 - Managed frames are easier. The JIT will store some information about each
 177   managed method, like which callee-saved registers it uses. Based on this
 178   information, mono_arch_find_jit_info () can find the values of the registers
 179   on the thread stack, and restore them.
 180 - Native frames are problematic, since we have no information about how to
 181   unwind through them. Some compilers generate unwind information for code,
 182   some don't. Also, there is no general purpose library to obtain and decode
 183   this unwind information. So the runtime uses a different solution. When
  184   managed code needs to call into native code, it does so through a
  185   managed->native wrapper function, which is generated by the JIT. This
  186   function is responsible for saving the machine state into a per-thread
  187   structure called MonoLMF (Last Managed Frame). These LMF structures are
  188   stored on the thread's stack, and are linked together using one of their
  189   fields. When the unwinder encounters a native frame, it simply pops
  190   one entry off the LMF 'stack', and uses it to restore the frame state to the
 191   moment before control passed to native code. In effect, all successive native
 192   frames are skipped together.
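
The LMF mechanism can be pictured with the following sketch. The Context and
Lmf types here are simplified, hypothetical stand-ins for MonoContext and
MonoLMF (whose real layouts are arch-specific), and the list head would be
per-thread in a real runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified register state: only the fields needed for the illustration. */
typedef struct {
    uintptr_t ip, sp, fp;
} Context;

/* One entry per managed->native transition, linked to the previous one. */
typedef struct Lmf {
    struct Lmf *previous;
    Context     saved;     /* state of the last managed frame */
} Lmf;

static Lmf *lmf_top;       /* most recent entry */

/* Done by the managed->native wrapper before calling native code. The Lmf
 * itself lives in the wrapper's stack frame. */
static void lmf_push(Lmf *lmf, const Context *managed_state)
{
    lmf->saved = *managed_state;
    lmf->previous = lmf_top;
    lmf_top = lmf;
}

/* Done by the wrapper after the native call returns. */
static void lmf_pop(void)
{
    assert(lmf_top != NULL);
    lmf_top = lmf_top->previous;
}

/* Unwinder step for native frames: restore ctx from the entry at *cursor and
 * advance the cursor. All native frames up to the last managed frame are
 * skipped at once. Returns 0 when no entries are left. */
static int unwind_native_frames(Lmf **cursor, Context *ctx)
{
    if (*cursor == NULL)
        return 0;
    *ctx = (*cursor)->saved;
    *cursor = (*cursor)->previous;
    return 1;
}
```

The push and pop happen on every managed->native call, whether or not an
exception is ever thrown.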
 193
 194 Problems/future work
 195 --------------------
 196
 197 1. Async signal safety
 198 ----------------------
 199
 200 The current async signal handling code is not async safe, so it can and does
 201 deadlock in practice. It needs to be rewritten to avoid taking locks at least
 202 until it can determine that it was interrupting managed code.
 203
 204 Another problem is the managed stack frame unwinding code. It blindly assumes
 205 that if the IP points into a managed frame, then all the callee saved
 206 registers + the stack pointer are saved on the stack. This is not true if
 207 the thread was interrupted while executing the method prolog/epilog.
 208
 209 2. Raising exceptions from native code
 210 --------------------------------------
 211
 212 Currently, exceptions are raised by calling mono_raise_exception () in
 213 the middle of runtime code. This has two problems:
  214 - No cleanup is done, i.e. if the caller of the function which throws an
  215   exception has taken locks, or allocated memory, that is not cleaned up. For
  216   this reason, it is only safe to call mono_raise_exception () 'very close' to
  217   managed code, i.e. in the icall functions themselves.
 218 - To allow mono_raise_exception () to unwind through native code, we need to
 219   save the LMF structures which can add a lot of overhead even in the common
 220   case when no exception is thrown. So this is not zero-cost exception handling.
 221
 222   An alternative might be to use a JNI style set-pending-exception API.
 223 Runtime code could call mono_set_pending_exception (), then return to its
 224 caller with an error indication allowing the caller to clean up. When execution
  225 returns to managed code, the managed->native wrapper could check whether
  226 there is a pending exception and throw it if necessary. Since we already check
 227 for pending thread interruption, this would have no overhead, allowing us
 228 to drop the LMF saving/restoring code, or significant parts of it.
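
A set-pending-exception scheme of this kind could look roughly like the sketch
below. It only illustrates the proposal: the Exception type, the function
names and the error-code convention are all hypothetical, and the pending slot
would be per-thread in a real runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a managed exception object. */
typedef struct {
    const char *name;
} Exception;

static Exception *pending_exception;

/* Record an exception instead of unwinding. The first one recorded wins. */
static void set_pending_exception(Exception *exc)
{
    if (pending_exception == NULL)
        pending_exception = exc;
}

/* Called by the managed->native wrapper when native code returns: fetch and
 * clear the pending exception, or return NULL if there is none. */
static Exception *take_pending_exception(void)
{
    Exception *exc = pending_exception;
    pending_exception = NULL;
    return exc;
}

/* Example of runtime code using the scheme: it reports failure through its
 * return value, so its callers get a chance to release locks and memory. */
static int checked_get(const int *array, size_t len, size_t index, int *out)
{
    static Exception index_out_of_range = { "IndexOutOfRangeException" };

    if (index >= len) {
        set_pending_exception(&index_out_of_range);
        return -1;
    }
    *out = array[index];
    return 0;
}
```

Because control returns normally through every native frame, no unwinding
through native code is needed to deliver the exception.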
 229
 230 3. Signals received while in native code
 231 ----------------------------------------
 232
 233 Receiving a SIGSEGV while in native code should be seen as a catastrophic
 234 error, and the runtime should abort the process immediately after trying to
 235 print some diagnostics. This is how SIGSEGVs are handled in all other
 236 production VMs. This is also a requirement for dropping LMF support.
 237
 238 4. libunwind
 239 ------------
 240
 241 There is an OSS project called libunwind which is a standalone stack unwinding
 242 library. It is currently in development, but it is used by default by gcc on
 243 ia64 for its stack unwinding. The mono runtime also uses it on ia64. It has
 244 several advantages in relation to our current unwinding code:
 245 - it has a platform independent API, i.e. the same unwinding code can be used
 246   on multiple platforms.
 247 - it can generate unwind tables which are correct at every instruction, i.e.
 248   can be used for unwinding from async signals.
 249 - given sufficient unwind info generated by a C compiler, it can unwind through
 250   C code.
 251 - most of its API is async-safe
 252 - it implements the gcc C++ exception handling API, so in theory it can
 253   be used to implement mixed-language exception handling (i.e. C++ exception
 254   caught in mono, mono exception caught in C++).
 255 - it is MIT licensed
 256
  257 The biggest problem with libunwind is its platform support. ia64 support is
 258 complete/well tested, while support for other platforms is missing/incomplete.
 259
 260 http://www.hpl.hp.com/research/linux/libunwind/
 261