                 Exception Handling In the Mono Runtime
                 --------------------------------------

* Introduction
--------------

	There are many types of exceptions which the runtime needs to
	handle. These are:

	- exceptions thrown from managed code using the 'throw' or 'rethrow' CIL
	  instructions.

	- exceptions thrown by some IL instructions like InvalidCastException thrown
	  by the 'castclass' CIL instruction.

	- exceptions thrown by runtime code

	- synchronous signals received while in managed code

	- synchronous signals received while in native code

	- asynchronous signals
	
	Since exception handling is very arch dependent, parts of the
	exception handling code reside in the arch specific
	exceptions-<ARCH>.c files. The architecture independent parts
	are in mini-exceptions.c. The different exception types listed
	above are generated in different parts of the runtime, but
	ultimately, they all end up in the mono_handle_exception ()
	function in mini-exceptions.c.
	
* Exceptions thrown programmatically from managed code
------------------------------------------------------

	These exceptions are thrown from managed code using 'throw' or
	'rethrow' CIL instructions. The JIT compiler will translate
	them to a call to a helper function called
	'mono_arch_throw/rethrow_exception'. 

	These helper functions do not exist at compile time; they are
	created dynamically at run time by the code in the
	exceptions-<ARCH>.c files. 

	They perform various stack manipulation magic, then call a
	helper function usually named throw_exception (), which does
	further processing in C code, then calls
	mono_handle_exception() to do the rest.
	
* Exceptions thrown implicitly from managed code
------------------------------------------------

	These exceptions are thrown by some IL instructions when
	something goes wrong.  When the JIT needs to throw such an
	exception, it emits a forward conditional branch and remembers
	its position, along with the exception which needs to be
	emitted. This is usually done in macros named
	EMIT_COND_SYSTEM_EXCEPTION in the mini-<ARCH>.c files.

	After the machine code for the method is emitted, the JIT
	calls the arch dependent mono_arch_emit_exceptions () function
	which will add the exception throwing code to the end of the
	method, and patches up the previous forward branches so they
	will point to this code.

	This has the advantage that the rarely-executed exception
	throwing code is kept separate from the method body, leading
	to better icache performance.  

	The exception throwing code branches to the dynamically
	generated mono_arch_throw_corlib_exception helper function,
	which creates the proper exception object, does some stack
	manipulation, then calls throw_exception ().
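
	The forward-branch bookkeeping described above can be sketched
	in portable C. This is only an illustration under simplifying
	assumptions: the 'machine code' is an array of made-up opcodes,
	and every name in it (Emitter, emit_cond_exception,
	emit_exceptions, the OP_* values) is hypothetical and is not
	the JIT's real API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy instruction stream. */
enum { OP_WORK = 1, OP_BRANCH_IF_FAIL = 2, OP_RET = 3, OP_THROW = 4 };

#define MAX_CODE 64
#define MAX_PATCHES 8

typedef struct {
	size_t branch_pos;   /* index of the branch's target operand */
	int exc_id;          /* which exception the branch should raise */
} PatchSite;

typedef struct {
	int code [MAX_CODE];
	size_t len;
	PatchSite patches [MAX_PATCHES];
	size_t npatches;
} Emitter;

void
emit (Emitter *e, int word)
{
	assert (e->len < MAX_CODE);
	e->code [e->len++] = word;
}

/* Emit a conditional branch with a placeholder target and remember it. */
void
emit_cond_exception (Emitter *e, int exc_id)
{
	assert (e->npatches < MAX_PATCHES);
	emit (e, OP_BRANCH_IF_FAIL);
	e->patches [e->npatches].branch_pos = e->len;
	e->patches [e->npatches].exc_id = exc_id;
	e->npatches++;
	emit (e, -1); /* target unknown until the method body is finished */
}

/* Append the throw sequences after the body and patch the branches. */
void
emit_exceptions (Emitter *e)
{
	for (size_t i = 0; i < e->npatches; i++) {
		e->code [e->patches [i].branch_pos] = (int) e->len;
		emit (e, OP_THROW);
		emit (e, e->patches [i].exc_id);
	}
}

/* Build a tiny method: work, check, work, check, return. */
void
build_example (Emitter *e)
{
	e->len = 0;
	e->npatches = 0;
	emit (e, OP_WORK);
	emit_cond_exception (e, 7);
	emit (e, OP_WORK);
	emit_cond_exception (e, 9);
	emit (e, OP_RET);
	emit_exceptions (e);
}
```

	After build_example () runs, the method body ends with OP_RET
	and both throw sequences sit behind it, which is the layout
	that keeps the rarely-executed code out of the hot path.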
	
* Exceptions thrown by runtime code
-----------------------------------

	These exceptions are usually thrown by the implementations of
	InternalCalls (icalls). First an appropriate exception object
	is created with the help of various helper functions in
	metadata/exception.c, which has a separate helper function for
	allocating each kind of exception object used by the runtime
	code.  Then the mono_raise_exception () function is called to
	actually throw the exception. That function never returns.
	
	An example:

	   if (something_is_wrong)
		  mono_raise_exception (mono_get_exception_index_out_of_range ());
	
	mono_raise_exception () simply passes the exception to the JIT
	side through an API, where it will be received by the helper
	created by mono_arch_throw_exception (). From then on, it is
	treated as an exception thrown from managed code.
	
* Synchronous signals
---------------------
	
	For performance reasons, the runtime omits some of the checks
	required by the CLI spec. Instead, it relies on the CPU to do
	them. The two main checks which are omitted are null-pointer
	checks, and arithmetic checks. When a null pointer is
	dereferenced by JITted code, the CPU will notify the kernel
	through an interrupt, and the kernel will send a SIGSEGV
	signal to the process. The runtime installs a signal handler
	for SIGSEGV, which is sigsegv_signal_handler () in mini.c. The
	signal handler creates the appropriate exception object and
	calls mono_handle_exception () with it. Arithmetic exceptions
	like division by zero are handled similarly.
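
	The idea of letting the CPU perform the null check can be
	sketched with plain POSIX calls. This is a minimal sketch, not
	the runtime's handler: guarded_load (), load_unchecked () and
	fault_handler () are hypothetical names, and the sketch
	recovers with siglongjmp () and an error code where the
	runtime would create an exception object and call
	mono_handle_exception (). It assumes a platform where a load
	through a null pointer raises SIGSEGV (or SIGBUS).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Where the handler resumes execution after a fault. */
static sigjmp_buf recover_point;

/* Runs when the kernel reports the fault; jumps back to the guarded call. */
static void
fault_handler (int signo, siginfo_t *info, void *ucontext)
{
	(void) signo;
	(void) info;
	(void) ucontext;
	siglongjmp (recover_point, 1);
}

/* No explicit null check: the load itself faults if p is NULL. */
int
load_unchecked (int *volatile p)
{
	return *p;
}

/*
 * Call load_unchecked () with a fault handler installed.
 * Returns 0 and stores the value on success, -1 if the load faulted.
 */
int
guarded_load (int *p, int *result)
{
	struct sigaction sa, old_segv, old_bus;
	int status;

	assert (result != NULL);
	memset (&sa, 0, sizeof (sa));
	sa.sa_sigaction = fault_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset (&sa.sa_mask);
	sigaction (SIGSEGV, &sa, &old_segv);
	sigaction (SIGBUS, &sa, &old_bus);

	/* savemask = 1 so the signal is unblocked again after the jump */
	if (sigsetjmp (recover_point, 1) == 0) {
		*result = load_unchecked (p);
		status = 0;
	} else {
		status = -1;
	}

	/* Put the previous handlers back. */
	sigaction (SIGSEGV, &old_segv, NULL);
	sigaction (SIGBUS, &old_bus, NULL);
	return status;
}
```

	The fast path pays nothing for the check; only the faulting
	case takes the detour through the kernel and the handler.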
	
* Synchronous signals in native code
------------------------------------

	Receiving a signal such as SIGSEGV while in native code means
	something very bad has happened. Because of this, the runtime
	will abort after trying to print a managed plus a native stack
	trace. The logic is in the mono_handle_native_sigsegv ()
	function.

	Note that there are two kinds of native code which can be the
	source of the signal:

	- code inside the runtime
	- code inside a native library loaded by an application, e.g. libgtk+
	
* Stack overflow checking
-------------------------

	Stack overflow exceptions need special handling. When a thread
	overflows its stack, the kernel sends it a normal SIGSEGV
	signal, but the signal handler tries to execute on the same
	stack as the thread, leading to a further SIGSEGV which will
	terminate the thread. A solution is to use an alternative
	signal stack, supported by UNIX operating systems through the
	sigaltstack (2) system call.  When a thread starts up, the
	runtime will install an altstack using the
	mono_setup_altstack () function in mini-exceptions.c. When a
	SIGSEGV is received, the signal handler checks whether the
	fault address is near the bottom of the thread's normal
	stack. If it is, a StackOverflowException is created instead
	of a NullReferenceException. This exception is handled like
	any other exception, with some minor differences.
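
	The two ingredients described above, installing an alternate
	signal stack and classifying the fault address, might look
	roughly like this. It is a sketch only: install_altstack (),
	altstack_is_enabled () and is_stack_overflow () are
	hypothetical names, not the contents of mono_setup_altstack
	(), and the caller is assumed to know the lowest valid address
	of the thread's normal stack.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Give the calling thread an alternate signal stack, so that a handler
 * installed with SA_ONSTACK can run even when the normal stack is full.
 * The memory stays allocated for as long as the stack is in use.
 * Returns 0 on success, -1 on failure.
 */
int
install_altstack (void)
{
	stack_t ss;

	ss.ss_size = SIGSTKSZ;
	ss.ss_sp = malloc (ss.ss_size);
	ss.ss_flags = 0;
	if (ss.ss_sp == NULL)
		return -1;
	if (sigaltstack (&ss, NULL) != 0) {
		free (ss.ss_sp);
		return -1;
	}
	return 0;
}

/* Returns 1 if the calling thread has an alternate stack enabled. */
int
altstack_is_enabled (void)
{
	stack_t cur;

	if (sigaltstack (NULL, &cur) != 0)
		return 0;
	return (cur.ss_flags & SS_DISABLE) == 0;
}

/*
 * Classify a fault address. stack_limit is the lowest valid address of
 * the thread's normal stack and guard_size is how close to that limit
 * a fault must be to count as a stack overflow.
 */
int
is_stack_overflow (uintptr_t fault_addr, uintptr_t stack_limit, size_t guard_size)
{
	uintptr_t distance = fault_addr > stack_limit
		? fault_addr - stack_limit
		: stack_limit - fault_addr;

	return distance < guard_size;
}
```

	A fault at address 0 is far from any stack limit and would be
	reported as a null reference, while a fault a few bytes below
	the limit would be reported as a stack overflow.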

	There are two reasons why sigaltstack is disabled by default:

	* The stack employed by sigaltstack () is not visible to the
	GC, so it is possible that the GC will miss it.

	* Working sigaltstack support is very much OS/kernel/libc
	dependent.

	
* Asynchronous signals
----------------------

	Async signals are used by the runtime to notify a thread that
	it needs to change its state somehow. Currently, they are used
	for implementing thread abort/suspend/resume.
	
	Handling async signals correctly is a very hard problem,
	since the receiving thread can be in basically any state upon
	receipt of the signal. It can execute managed code, native
	code, it can hold various managed/native locks, or it can be
	in a process of acquiring them, it can be starting up,
	shutting down etc. Most of the C APIs used by the runtime are
	not async-signal safe, meaning it is not safe to call them
	from an async signal handler. In particular, the pthread
	locking functions are not async-safe, so if a signal handler
	interrupted code which was in the process of acquiring a lock,
	and the signal handler tries to acquire a lock, the thread
	will deadlock.  Unfortunately, the current signal handling
	code does acquire locks, so sometimes it does deadlock.
	
	When receiving an async signal, the signal handler first tries
	to determine whether the thread was executing managed code
	when it was interrupted. If it was, then it is safe to
	interrupt it, so a ThreadAbortException is constructed and
	thrown. If the thread was executing native code, then it is
	generally not safe to interrupt it. In this case, the runtime
	sets a flag then returns from the signal handler. That flag is
	checked every time the runtime returns from native code to
	managed code, and the exception is thrown then. Also, a
	platform specific mechanism is used to cause the thread to
	interrupt any blocking operation it might be doing.
	
	The async signal handler is sigusr1_signal_handler () in
	mini.c, while the logic which determines whether an exception
	is safe to be thrown is in mono_thread_request_interruption
	().
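
	The deferred path, where the handler only sets a flag that is
	checked on the way back to managed code, can be sketched as
	follows. The names install_async_handler () and
	check_pending_interruption () are hypothetical; the sketch
	covers only the 'thread was in native code' case and does no
	more in the handler than store to a sig_atomic_t, which is
	async-signal safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, consumed at the native->managed transition. */
static volatile sig_atomic_t interruption_requested;

static void
async_handler (int signo)
{
	(void) signo;
	/* No locks and no allocation here: just record the request. */
	interruption_requested = 1;
}

/* Install the handler for SIGUSR1. Returns 0 on success, -1 on error. */
int
install_async_handler (void)
{
	struct sigaction sa;

	memset (&sa, 0, sizeof (sa));
	sa.sa_handler = async_handler;
	sigemptyset (&sa.sa_mask);
	return sigaction (SIGUSR1, &sa, NULL);
}

/*
 * Called when native code is about to return to managed code.
 * Returns 1 if a pending interruption must now be turned into an
 * exception, 0 otherwise. The flag is cleared when it is consumed.
 */
int
check_pending_interruption (void)
{
	if (!interruption_requested)
		return 0;
	interruption_requested = 0;
	return 1;
}
```

	Because the handler takes no locks, it cannot deadlock against
	the code it interrupted; the cost is that the exception is
	delayed until the next transition.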
	
* Stack unwinding during exception handling
-------------------------------------------

	The execution state of a thread during exception handling is
	stored in an arch-specific structure called MonoContext. This
	structure contains the values of all the CPU registers
	relevant during exception handling, which usually means:
	
	- IP (instruction pointer)
	- SP (stack pointer)
	- FP (frame pointer)
	- callee saved registers
	
	Callee saved registers are the registers which any procedure
	must save before using them and restore afterwards. They are
	usually defined by each platform's ABI (Application Binary
	Interface). For example, on x86, they are EBX, ESI and EDI.
	
	The code which calls mono_handle_exception () is required to
	construct the initial MonoContext. How this is done depends on
	the caller. For exceptions thrown from managed code, the
	mono_arch_throw_exception helper function saves the values of
	the required registers and passes them to throw_exception (),
	which will save them in the MonoContext structure. For
	exceptions thrown from signal handlers, the MonoContext
	structure is initialized from the signal info received from the
	kernel.
	
	During exception handling, the runtime needs to 'unwind' the
	stack, i.e.  given the state of the thread at a stack frame,
	construct the state at its caller. Since this is platform
	specific, it is done by a platform specific function called
	mono_arch_find_jit_info ().
	
	Two kinds of stack frames need handling:

	- Managed frames are easier. The JIT will store some
	  information about each managed method, like which
	  callee-saved registers it uses. Based on this information,
	  mono_arch_find_jit_info () can find the values of the
	  registers on the thread stack, and restore them.

	- Native frames are problematic, since we have no information
	  about how to unwind through them. Some compilers generate
	  unwind information for code, some don't. Also, there is no
	  general purpose library to obtain and decode this unwind
	  information. So the runtime uses a different solution. When
	  managed code needs to call into native code, it does so through
	  a managed->native wrapper function, which is generated by
	  the JIT. This function is responsible for saving the machine
	  state into a per-thread structure called MonoLMF (Last
	  Managed Frame). These LMF structures are stored on the
	  thread's stack, and are linked together using one of their
	  fields. When the unwinder encounters a native frame, it
	  simply pops one entry off the LMF 'stack', and uses it to
	  restore the frame state to the moment before control passed
	  to native code. In effect, all successive native frames are
	  skipped together.
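
	The LMF chain behaves like an intrusive linked stack. The
	following sketch shows the push done by the wrapper and the
	pop done by the unwinder. SketchContext and SketchLMF are
	simplified stand-ins for MonoContext and MonoLMF, whose real
	layouts are arch-specific, and the chain head is a plain
	global here rather than per-thread data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified register state; the real structure is arch-specific. */
typedef struct {
	uintptr_t ip, sp, fp;
} SketchContext;

/* One saved "last managed frame", linked to the previous one. */
typedef struct SketchLMF {
	struct SketchLMF *previous;
	SketchContext managed_state;
} SketchLMF;

/* Head of the LMF chain (would be per-thread in a real runtime). */
SketchLMF *lmf_head = NULL;

/* Done by the managed->native wrapper before calling native code. */
void
lmf_push (SketchLMF *lmf, SketchContext state)
{
	lmf->managed_state = state;
	lmf->previous = lmf_head;
	lmf_head = lmf;
}

/* Done by the wrapper when the native call returns normally. */
void
lmf_pop (void)
{
	assert (lmf_head != NULL);
	lmf_head = lmf_head->previous;
}

/*
 * Unwinder step for a native frame: pop one LMF entry and continue
 * unwinding from the managed state saved in it. Returns 0 on success,
 * -1 if no entry is left.
 */
int
unwind_native_frames (SketchContext *ctx)
{
	if (lmf_head == NULL)
		return -1;
	*ctx = lmf_head->managed_state;
	lmf_head = lmf_head->previous;
	return 0;
}
```

	Each SketchLMF would live in the wrapper's own stack frame, so
	pushing and popping costs a few stores per native call, which
	is the overhead discussed later in this document.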
	
Problems/future work
--------------------

1. Async signal safety
----------------------

	The current async signal handling code is not async safe, so
	it can and does deadlock in practice. It needs to be rewritten
	to avoid taking locks at least until it can determine that it
	was interrupting managed code.
	
	Another problem is the managed stack frame unwinding code. It
	blindly assumes that if the IP points into a managed frame,
	then all the callee saved registers + the stack pointer are
	saved on the stack. This is not true if the thread was
	interrupted while executing the method prolog/epilog.
	
2. Raising exceptions from native code
--------------------------------------

	Currently, exceptions are raised by calling
	mono_raise_exception () in the middle of runtime code. This
	has two problems:

	- No cleanup is done, i.e. if the caller of the function which
	  throws an exception has taken locks, or allocated memory,
	  that is not cleaned up. For this reason, it is only safe to
	  call mono_raise_exception () 'very close' to managed code,
	  i.e. in the icall functions themselves.

	- To allow mono_raise_exception () to unwind through native
	  code, we need to save the LMF structures which can add a lot
	  of overhead even in the common case when no exception is
	  thrown. So this is not zero-cost exception handling.
	
	An alternative might be to use a JNI style
	set-pending-exception API.  Runtime code could call
	mono_set_pending_exception (), then return to its caller with
	an error indication allowing the caller to clean up. When
	execution returns to managed code, the managed->native
	wrapper could check whether there is a pending exception and
	throw it if necessary. Since we already check for pending
	thread interruption, this would have no overhead, allowing us
	to drop the LMF saving/restoring code, or significant parts of
	it.
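
	A set-pending-exception scheme of this kind could be shaped
	roughly as below. Everything here is hypothetical, including
	the sketch_ prefixed functions and the SketchException type;
	it only illustrates the control flow proposed above, with the
	pending slot kept in a plain global instead of per-thread
	storage.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a managed exception object. */
typedef struct {
	const char *type_name;
} SketchException;

/* Would be per-thread in a real runtime; a plain global here. */
static SketchException *pending_exception = NULL;

static SketchException index_out_of_range = { "IndexOutOfRangeException" };

void
sketch_set_pending_exception (SketchException *exc)
{
	pending_exception = exc;
}

/* Returns the pending exception, if any, and clears the slot. */
SketchException *
sketch_take_pending_exception (void)
{
	SketchException *exc = pending_exception;

	pending_exception = NULL;
	return exc;
}

/* A native helper: reports failure by return value, never unwinds. */
int
sketch_array_get (const int *array, size_t length, size_t index, int *out)
{
	assert (out != NULL);
	if (index >= length) {
		sketch_set_pending_exception (&index_out_of_range);
		return -1; /* callers can release locks and memory first */
	}
	*out = array [index];
	return 0;
}

/*
 * Role of the managed->native wrapper: call the native code, then
 * check for a pending exception on the way back. Returns the exception
 * the wrapper would throw into managed code, or NULL if there is none.
 */
SketchException *
sketch_wrapper_call (const int *array, size_t length, size_t index, int *out)
{
	(void) sketch_array_get (array, length, index, out);
	return sketch_take_pending_exception ();
}
```

	Native code never unwinds in this scheme, so every caller
	between the failure and the wrapper gets a chance to clean up
	before the exception becomes visible to managed code.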
	
3. libunwind
------------

	There is an OSS project called libunwind which is a standalone
	stack unwinding library. It is currently in development, but
	it is used by default by gcc on ia64 for its stack
	unwinding. The mono runtime also uses it on ia64. It has
	several advantages in relation to our current unwinding code:

	- it has a platform independent API, i.e. the same unwinding
	  code can be used on multiple platforms.

	- it can generate unwind tables which are correct at every
	  instruction, i.e.  can be used for unwinding from async
	  signals.

	- given sufficient unwind info generated by a C compiler, it
	  can unwind through C code.

	- most of its API is async-safe

	- it implements the gcc C++ exception handling API, so in
	  theory it can be used to implement mixed-language exception
	  handling (i.e. C++ exception caught in mono, mono exception
	  caught in C++).

	- it is MIT licensed
	
	The biggest problem with libunwind is its platform support. ia64 support is
	complete/well tested, while support for other platforms is missing/incomplete. 
	
	http://www.hpl.hp.com/research/linux/libunwind/