docs/exception-handling.txt


                 Exception Handling In the Mono Runtime
                 --------------------------------------

* Introduction
--------------

        There are many types of exceptions which the runtime needs to
        handle. These are:

        - exceptions thrown from managed code using the 'throw' or 'rethrow' CIL
          instructions.

        - exceptions thrown by some IL instructions, such as the
          InvalidCastException thrown by the 'castclass' CIL instruction.

        - exceptions thrown by runtime code

        - synchronous signals received while in managed code

        - synchronous signals received while in native code

        - asynchronous signals

        Since exception handling is very arch dependent, parts of the
        exception handling code reside in the arch specific
        exceptions-<ARCH>.c files. The architecture independent parts
        are in mini-exceptions.c. The different exception types listed
        above are generated in different parts of the runtime, but
        ultimately, they all end up in the mono_handle_exception ()
        function in mini-exceptions.c.

* Exceptions thrown programmatically from managed code
------------------------------------------------------

        These exceptions are thrown from managed code using 'throw' or
        'rethrow' CIL instructions. The JIT compiler will translate
        them to a call to a helper function called
        'mono_arch_throw/rethrow_exception'.

        These helper functions do not exist at compile time; they are
        created dynamically at run time by the code in the
        exceptions-<ARCH>.c files.

        They perform various stack manipulation magic, then call a
        helper function usually named throw_exception (), which does
        further processing in C code, then calls
        mono_handle_exception () to do the rest.

* Exceptions thrown implicitly from managed code
------------------------------------------------

        These exceptions are thrown by some IL instructions when
        something goes wrong.  When the JIT needs to throw such an
        exception, it emits a forward conditional branch and remembers
        its position, along with the exception which needs to be
        thrown. This is usually done in macros named
        EMIT_COND_SYSTEM_EXCEPTION in the mini-<ARCH>.c files.

        After the machine code for the method is emitted, the JIT
        calls the arch dependent mono_arch_emit_exceptions () function
        which adds the exception throwing code to the end of the
        method, and patches up the previous forward branches so they
        point to this code.

        This has the advantage that the rarely-executed exception
        throwing code is kept separate from the method body, leading
        to better icache performance.

        The exception throwing code branches to the dynamically
        generated mono_arch_throw_corlib_exception helper function,
        which creates the proper exception object, does some stack
        manipulation, then calls throw_exception ().

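        The record-then-patch scheme described above can be sketched
        on a toy instruction buffer. This is only an illustration:
        the ToyInsn encoding, the opcode names and all toy_* style
        helpers below are invented for the example and are not the
        JIT's real data structures. emit_cond_exception () plays the
        role of EMIT_COND_SYSTEM_EXCEPTION and emit_exceptions () the
        role of mono_arch_emit_exceptions ().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy instruction encoding, invented for this sketch. */
enum { OP_NOP = 0, OP_BRANCH_IF_ERROR = 1, OP_THROW_CORLIB = 2, OP_RET = 3 };

typedef struct {
    uint8_t op;
    int32_t arg;   /* branch target index, or exception id for OP_THROW_CORLIB */
} ToyInsn;

typedef struct {
    size_t branch_pos;   /* index of the forward branch that must be patched */
    int exc_id;          /* exception the out-of-line code will throw */
} PendingThrow;

#define MAX_INSNS 64
#define MAX_PENDING 16

typedef struct {
    ToyInsn code[MAX_INSNS];
    size_t len;
    PendingThrow pending[MAX_PENDING];
    size_t n_pending;
} ToyMethod;

/* Append one instruction and return its index. */
static size_t emit(ToyMethod *m, uint8_t op, int32_t arg) {
    assert(m->len < MAX_INSNS);
    m->code[m->len].op = op;
    m->code[m->len].arg = arg;
    return m->len++;
}

/* Emit a forward branch with an unknown target (-1) and remember it. */
static void emit_cond_exception(ToyMethod *m, int exc_id) {
    assert(m->n_pending < MAX_PENDING);
    size_t pos = emit(m, OP_BRANCH_IF_ERROR, -1);
    m->pending[m->n_pending].branch_pos = pos;
    m->pending[m->n_pending].exc_id = exc_id;
    m->n_pending++;
}

/* After the method body: append the throw code and patch the branches. */
static void emit_exceptions(ToyMethod *m) {
    for (size_t i = 0; i < m->n_pending; i++) {
        size_t target = emit(m, OP_THROW_CORLIB, m->pending[i].exc_id);
        m->code[m->pending[i].branch_pos].arg = (int32_t)target;
    }
    m->n_pending = 0;
}
```

        A caller would emit the method body, calling
        emit_cond_exception () wherever a check is needed, and call
        emit_exceptions () once at the end, so all throw sequences
        sit after the last instruction of the body.
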
* Exceptions thrown by runtime code
-----------------------------------

        These exceptions are usually thrown by the implementations of
        InternalCalls (icalls). First an appropriate exception object
        is created with the help of various helper functions in
        metadata/exception.c, which has a separate helper function for
        allocating each kind of exception object used by the runtime
        code.  Then the mono_raise_exception () function is called to
        actually throw the exception. That function never returns.

        An example:

           if (something_is_wrong)
                  mono_raise_exception (mono_get_exception_index_out_of_range ());

        mono_raise_exception () simply passes the exception to the JIT
        side through an API, where it will be received by the helper
        created by mono_arch_throw_exception (). From then on, it is
        treated as an exception thrown from managed code.

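        The "never returns" behaviour can be modelled in portable C
        with setjmp/longjmp. This is a rough analogy only, not how
        the runtime is implemented: the real helper unwinds managed
        frames rather than jumping to a saved buffer, and every
        toy_* name below is hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

typedef struct { const char *name; } ToyException;

static jmp_buf toy_handler;         /* stands in for the nearest managed handler */
static ToyException *toy_current;   /* exception currently in flight */

static ToyException toy_index_out_of_range = { "IndexOutOfRangeException" };

/* Stand-in for mono_raise_exception (): control never comes back here. */
static void toy_raise_exception(ToyException *exc) {
    assert(exc != NULL);
    toy_current = exc;
    longjmp(toy_handler, 1);
}

/* An icall-like function that raises instead of returning an error. */
static int toy_icall_get_element(const int *array, int length, int index) {
    if (index < 0 || index >= length)
        toy_raise_exception(&toy_index_out_of_range);
    return array[index];
}

/* Calls the icall the way a managed caller with a catch block might.
 * Returns NULL on success, or the name of the caught exception. */
static const char *toy_call_protected(const int *array, int length,
                                      int index, int *result) {
    if (setjmp(toy_handler) != 0)
        return toy_current->name;   /* reached via longjmp */
    *result = toy_icall_get_element(array, length, index);
    return NULL;
}
```

        Note that nothing between the raise and the handler gets a
        chance to run cleanup code, which is the same limitation
        mono_raise_exception () has when called from runtime code.
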
* Synchronous signals
---------------------

        For performance reasons, the runtime does not do some checks
        required by the CLI spec. Instead, it relies on the CPU to do
        them. The two main checks which are omitted are null-pointer
        checks, and arithmetic checks. When a null pointer is
        dereferenced by JITted code, the CPU will notify the kernel
        through an interrupt, and the kernel will send a SIGSEGV
        signal to the process. The runtime installs a signal handler
        for SIGSEGV, which is sigsegv_signal_handler () in mini.c. The
        signal handler creates the appropriate exception object and
        calls mono_handle_exception () with it. Arithmetic exceptions
        like division by zero are handled similarly.

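        A minimal POSIX sketch of turning a SIGSEGV into a catchable
        condition is shown below. It makes several simplifying
        assumptions: the fault is simulated with raise (SIGSEGV) so
        the example is deterministic, a single flag stands in for
        the "was the thread in managed code?" test, and
        sigsetjmp/siglongjmp replace the real unwinder. All toy_*
        names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

static sigjmp_buf toy_catch_point;
static volatile sig_atomic_t toy_in_managed_code;

/* SIGSEGV handler: "throw" if the fault came from managed code,
 * otherwise give up, as a fault in native code is not recoverable. */
static void toy_sigsegv_handler(int signo) {
    (void)signo;
    if (toy_in_managed_code)
        siglongjmp(toy_catch_point, 1);
    abort();
}

static void toy_install_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = toy_sigsegv_handler;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGSEGV, &sa, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Simulated "managed" bodies: one faults, one does not. */
static void toy_null_dereference(void) { raise(SIGSEGV); }
static void toy_no_fault(void) { }

/* Runs body as if it were managed code.
 * Returns 1 if a fault was converted into an exception, else 0. */
static int toy_run_managed(void (*body)(void)) {
    /* savemask = 1 so the signal mask is restored after siglongjmp */
    if (sigsetjmp(toy_catch_point, 1) != 0) {
        toy_in_managed_code = 0;
        return 1;
    }
    toy_in_managed_code = 1;
    body();
    toy_in_managed_code = 0;
    return 0;
}
```

        After toy_install_handler (), calling
        toy_run_managed (toy_null_dereference) reports the fault to
        the caller instead of terminating the process.
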
* Synchronous signals in native code
------------------------------------

        Receiving a signal such as SIGSEGV while in native code means
        something very bad has happened. Because of this, the runtime
        will abort after trying to print a managed plus a native stack
        trace. The logic is in the mono_handle_native_sigsegv ()
        function.

        Note that there are two kinds of native code which can be the
        source of the signal:

        - code inside the runtime
        - code inside a native library loaded by an application, e.g. libgtk+

* Stack overflow checking
-------------------------

        Stack overflow exceptions need special handling. When a thread
        overflows its stack, the kernel sends it a normal SIGSEGV
        signal, but the signal handler tries to execute on the same
        stack as the thread, leading to a further SIGSEGV which will
        terminate the thread. A solution is to use an alternative
        signal stack supported by UNIX operating systems through the
        sigaltstack (2) system call.  When a thread starts up, the
        runtime will install an altstack using the
        mono_setup_altstack () function in mini-exceptions.c. When a
        SIGSEGV is received, the signal handler checks whether the
        fault address is near the bottom of the thread's normal
        stack. If it is, a StackOverflowException is created instead
        of a NullReferenceException. This exception is handled like
        any other exception, with some minor differences.

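        The two pieces described above, installing an alternate
        signal stack and classifying a fault by its address, might
        look roughly like this. It is a sketch under assumptions:
        the stack is taken to grow downward, "near the bottom" is
        modelled as a fixed guard_size window around the stack
        limit, and ToyStackInfo and the toy_* functions are
        hypothetical names, not the runtime's own.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TOY_NULL_REFERENCE, TOY_STACK_OVERFLOW } ToyFaultKind;

typedef struct {
    uintptr_t stack_limit;   /* lowest address of the thread's normal stack */
    size_t guard_size;       /* how close to the limit counts as "near" */
} ToyStackInfo;

#define TOY_ALTSTACK_SIZE (64 * 1024)
static char toy_altstack[TOY_ALTSTACK_SIZE];

/* Register an alternate stack for signal handlers of this thread.
 * A handler only runs on it if installed with the SA_ONSTACK flag.
 * Returns 0 on success, like sigaltstack () itself. */
static int toy_setup_altstack(void) {
    stack_t ss;
    ss.ss_sp = toy_altstack;
    ss.ss_size = sizeof toy_altstack;
    ss.ss_flags = 0;
    return sigaltstack(&ss, NULL);
}

/* Decide which exception a SIGSEGV at fault_addr should become. */
static ToyFaultKind toy_classify_fault(const ToyStackInfo *info,
                                       uintptr_t fault_addr) {
    assert(info != NULL);
    uintptr_t low = info->stack_limit > info->guard_size
                        ? info->stack_limit - info->guard_size
                        : 0;
    uintptr_t high = info->stack_limit + info->guard_size;
    if (fault_addr >= low && fault_addr < high)
        return TOY_STACK_OVERFLOW;
    return TOY_NULL_REFERENCE;
}
```

        In a real handler the fault address would come from the
        si_addr field of the siginfo_t passed to an SA_SIGINFO
        handler.
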
        There are two reasons why sigaltstack is disabled by default:

        * The main problem is that the stack employed by
        sigaltstack () is not visible to the GC, so it is possible
        that the GC will miss it.

        * Working sigaltstack support is very much os/kernel/libc
        dependent.


* Asynchronous signals
----------------------

        Async signals are used by the runtime to notify a thread that
        it needs to change its state somehow. Currently, they are used
        for implementing thread abort/suspend/resume.

        Handling async signals correctly is a very hard problem,
        since the receiving thread can be in basically any state upon
        receipt of the signal. It can be executing managed code or
        native code, it can hold various managed/native locks, or it
        can be in the process of acquiring them, it can be starting
        up, shutting down etc. Most of the C APIs used by the runtime
        are not async-signal safe, meaning it is not safe to call them
        from an async signal handler. In particular, the pthread
        locking functions are not async-signal safe, so if a signal
        handler interrupted code which was in the process of acquiring
        a lock, and the signal handler tries to acquire a lock, the
        thread will deadlock.  Unfortunately, the current signal
        handling code does acquire locks, so sometimes it does
        deadlock.

        When receiving an async signal, the signal handler first tries
        to determine whether the thread was executing managed code
        when it was interrupted. If it was, then it is safe to
        interrupt it, so a ThreadAbortException is constructed and
        thrown. If the thread was executing native code, then it is
        generally not safe to interrupt it. In this case, the runtime
        sets a flag then returns from the signal handler. That flag is
        checked every time the runtime returns from native code to
        managed code, and the exception is thrown then. Also, a
        platform specific mechanism is used to cause the thread to
        interrupt any blocking operation it might be doing.

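        The deferred-interruption scheme can be sketched with two
        sig_atomic_t flags. This is a single-threaded model under
        stated assumptions: a counter stands in for actually
        throwing ThreadAbortException, the managed/native test is a
        plain flag, and all toy_* names are hypothetical. The
        handler only touches sig_atomic_t variables, so unlike the
        code criticised above it takes no locks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t toy_in_managed_code = 1;
static volatile sig_atomic_t toy_interruption_requested;
static volatile sig_atomic_t toy_aborts_thrown;   /* counts simulated throws */

/* Async handler: interrupt managed code at once, defer for native code. */
static void toy_sigusr1_handler(int signo) {
    (void)signo;
    if (toy_in_managed_code)
        toy_aborts_thrown++;              /* would throw ThreadAbortException */
    else
        toy_interruption_requested = 1;   /* picked up on return to managed */
}

static void toy_install_async_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = toy_sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Called by a managed->native transition. */
static void toy_enter_native(void) {
    toy_in_managed_code = 0;
}

/* Called when native code returns to managed code.
 * Returns 1 if a deferred abort was delivered at this point. */
static int toy_leave_native(void) {
    toy_in_managed_code = 1;
    if (toy_interruption_requested) {
        toy_interruption_requested = 0;
        toy_aborts_thrown++;
        return 1;
    }
    return 0;
}
```

        A signal that arrives between toy_enter_native () and
        toy_leave_native () is therefore not lost; it is delivered
        at the transition back to managed code.
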
        The async signal handler is sigusr1_signal_handler () in
        mini.c, while the logic which determines whether an exception
        is safe to be thrown is in
        mono_thread_request_interruption ().

* Stack unwinding during exception handling
-------------------------------------------

        The execution state of a thread during exception handling is
        stored in an arch-specific structure called MonoContext. This
        structure contains the values of all the CPU registers
        relevant during exception handling, which usually means:

        - IP (instruction pointer)
        - SP (stack pointer)
        - FP (frame pointer)
        - callee saved registers

        Callee saved registers are the registers which are required by
        any procedure to be saved/restored before/after using
        them. They are usually defined by each platform's ABI
        (Application Binary Interface). For example, on x86, they are
        EBX, ESI and EDI.

        The code which calls mono_handle_exception () is required to
        construct the initial MonoContext. How this is done depends on
        the caller. For exceptions thrown from managed code, the
        mono_arch_throw_exception helper function saves the values of
        the required registers and passes them to throw_exception (),
        which will save them in the MonoContext structure. For
        exceptions thrown from signal handlers, the MonoContext
        structure is initialized from the signal info received from
        the kernel.

        During exception handling, the runtime needs to 'unwind' the
        stack, i.e.  given the state of the thread at a stack frame,
        construct the state at its caller. Since this is platform
        specific, it is done by a platform specific function called
        mono_arch_find_jit_info ().

        Two kinds of stack frames need handling:

        - Managed frames are easier. The JIT will store some
          information about each managed method, like which
          callee-saved registers it uses. Based on this information,
          mono_arch_find_jit_info () can find the values of the
          registers on the thread stack, and restore them.

        - Native frames are problematic, since we have no information
          about how to unwind through them. Some compilers generate
          unwind information for code, some don't. Also, there is no
          general purpose library to obtain and decode this unwind
          information. So the runtime uses a different solution. When
          managed code needs to call into native code, it does so
          through a managed->native wrapper function, which is
          generated by the JIT. This function is responsible for
          saving the machine state into a per-thread structure called
          MonoLMF (Last Managed Frame). These LMF structures are
          stored on the thread's stack, and are linked together using
          one of their fields. When the unwinder encounters a native
          frame, it simply pops one entry off the LMF 'stack', and
          uses it to restore the frame state to the moment before
          control passed to native code. In effect, all successive
          native frames are skipped together.

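        The LMF chain can be pictured as an intrusive linked list
        whose nodes live in the stack frames of the wrappers. The
        sketch below is a simplified model: ToyContext and ToyLMF
        are invented layouts (the real MonoContext and MonoLMF are
        arch-specific), and the list head is a plain global where
        the runtime would use a per-thread variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified register context: IP, SP, FP and three callee saved registers. */
typedef struct {
    uintptr_t ip, sp, fp;
    uintptr_t callee_saved[3];
} ToyContext;

/* One entry of the LMF 'stack', stored in the wrapper's own frame. */
typedef struct ToyLMF {
    struct ToyLMF *previous;   /* link to the entry pushed before this one */
    ToyContext saved;          /* managed state just before the native call */
} ToyLMF;

static ToyLMF *toy_lmf_top;    /* per-thread in a real runtime */

/* Done by the managed->native wrapper before calling native code. */
static void toy_lmf_push(ToyLMF *lmf, const ToyContext *managed_state) {
    assert(lmf != NULL && managed_state != NULL);
    lmf->saved = *managed_state;
    lmf->previous = toy_lmf_top;
    toy_lmf_top = lmf;
}

/* Done by the unwinder when it meets a native frame: pop one entry and
 * restore the context saved in it, skipping all native frames above.
 * Returns 0 if there is no entry left to pop. */
static int toy_unwind_native(ToyContext *ctx) {
    assert(ctx != NULL);
    if (toy_lmf_top == NULL)
        return 0;
    *ctx = toy_lmf_top->saved;
    toy_lmf_top = toy_lmf_top->previous;
    return 1;
}
```

        Each successful toy_unwind_native () call lands the context
        back in the managed frame that made the native call, no
        matter how many native frames were stacked above it.
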
Problems/future work
--------------------

1. Async signal safety
----------------------

        The current async signal handling code is not async safe, so
        it can and does deadlock in practice. It needs to be rewritten
        to avoid taking locks at least until it can determine that it
        was interrupting managed code.

        Another problem is the managed stack frame unwinding code. It
        blindly assumes that if the IP points into a managed frame,
        then all the callee saved registers + the stack pointer are
        saved on the stack. This is not true if the thread was
        interrupted while executing the method prolog/epilog.

2. Raising exceptions from native code
--------------------------------------

        Currently, exceptions are raised by calling
        mono_raise_exception () in the middle of runtime code. This
        has two problems:

        - No cleanup is done, i.e. if the caller of the function which
          throws an exception has taken locks, or allocated memory,
          that is not cleaned up. For this reason, it is only safe to
          call mono_raise_exception () 'very close' to managed code,
          i.e. in the icall functions themselves.

        - To allow mono_raise_exception () to unwind through native
          code, we need to save the LMF structures which can add a lot
          of overhead even in the common case when no exception is
          thrown. So this is not zero-cost exception handling.

        An alternative might be to use a JNI style
        set-pending-exception API.  Runtime code could call
        mono_set_pending_exception (), then return to its caller with
        an error indication allowing the caller to clean up. When
        execution returns to managed code, the managed->native
        wrapper could check whether there is a pending exception and
        throw it if necessary. Since we already check for pending
        thread interruption, this would have no additional overhead,
        allowing us to drop the LMF saving/restoring code, or
        significant parts of it.

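        A possible shape for such a set-pending-exception API is
        sketched below. It is speculative by nature, since the text
        above only proposes the idea: the toy_* names, the
        first-exception-wins policy and the lock counter used to
        show that cleanup runs are all assumptions made for the
        example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const char *name; } ToyException;

static ToyException *toy_pending;   /* per-thread in a real runtime */
static int toy_locks_held;          /* lets the example observe cleanup */

static ToyException toy_argument_null = { "ArgumentNullException" };

/* Record an exception instead of throwing it. The first one wins. */
static void toy_set_pending_exception(ToyException *exc) {
    assert(exc != NULL);
    if (toy_pending == NULL)
        toy_pending = exc;
}

/* Runtime function: on error it records the exception, still runs its
 * cleanup, and reports failure through the return value (0 or -1). */
static int toy_runtime_strlen(const char *s, size_t *out) {
    int ok = (s != NULL);
    toy_locks_held++;                   /* pretend to take a lock */
    if (ok) {
        size_t n = 0;
        while (s[n] != '\0')
            n++;
        *out = n;
    } else {
        toy_set_pending_exception(&toy_argument_null);
    }
    toy_locks_held--;                   /* cleanup runs on both paths */
    return ok ? 0 : -1;
}

/* What the managed->native wrapper would do on return to managed code:
 * fetch and clear the pending exception, or return NULL if none. */
static ToyException *toy_wrapper_check_pending(void) {
    ToyException *exc = toy_pending;
    toy_pending = NULL;
    return exc;
}
```

        The wrapper would throw whatever toy_wrapper_check_pending ()
        returns; by that point every native frame has already
        returned normally, so no native unwinding is needed.
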
3. libunwind
------------

        There is an OSS project called libunwind which is a standalone
        stack unwinding library. It is currently in development, but
        it is used by default by gcc on ia64 for its stack
        unwinding. The mono runtime also uses it on ia64. It has
        several advantages in relation to our current unwinding code:

        - it has a platform independent API, i.e. the same unwinding
          code can be used on multiple platforms.

        - it can generate unwind tables which are correct at every
          instruction, i.e. they can be used for unwinding from async
          signals.

        - given sufficient unwind info generated by a C compiler, it
          can unwind through C code.

        - most of its API is async-safe

        - it implements the gcc C++ exception handling API, so in
          theory it can be used to implement mixed-language exception
          handling (i.e. C++ exception caught in mono, mono exception
          caught in C++).

        - it is MIT licensed

        The biggest problem with libunwind is its platform support.
        ia64 support is complete/well tested, while support for other
        platforms is missing/incomplete.

        http://www.hpl.hp.com/research/linux/libunwind/