* use a pool of MBState structures to speed up monoburg instead of using a
  mempool.

* the decode tables in the burg-generated code could use short instead of
  int (this should save about 1 KB).

* track the use of ESP, so that we can avoid the x86_lea in the epilog.

Other Ideas:

* the ORP people avoid optimizations inside catch handlers, just to save
  memory. For example, they do not allocate strings at compile time;
  instead they allocate them when the code is executed (like the --shared
  option). But there are only a few functions using catch handlers, so I
  consider this a minor issue.

* some performance-critical functions should be inlined. These include:
	- mono_mempool_alloc and mono_mempool_alloc0
	- EnterCriticalSection and LeaveCriticalSection
	- TlsSetValue
	- mono_metadata_row_col
	- mono_g_hash_table_lookup
	- mono_domain_get

* if a function which involves locking is called from another function
  which acquires the same lock, it might be useful to create a separate
  _inner version of the function which does not re-acquire the lock.
  This is a perf win only if the function is called many times, like
  mono_get_method.

* we can avoid calls to class init trampolines if there are multiple calls
  to the same trampoline in the same basic block. See:
  http://bugzilla.ximian.com/show_bug.cgi?id=51096

Usability
---------

* Remove the list of optimization flag descriptions from the main help
  output; have an extra --help-optimizations flag for it.

* Remove the various graph options from the main help output; have a
  separate --help-graph flag for that list.
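
A minimal sketch of the _inner function idea listed under Other Ideas, in
plain C with a pthread mutex. All names here (loader_lock, get_method,
get_method_inner, sum_methods, method_table) are hypothetical and are not
taken from the Mono sources; the counter exists only to make the saved
lock acquisitions visible.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names throughout; this is not Mono's actual code. */
static pthread_mutex_t loader_lock = PTHREAD_MUTEX_INITIALIZER;
static int lock_acquisitions = 0;  /* instrumentation for this example only */
static int method_table[8] = { 0, 10, 20, 30, 40, 50, 60, 70 };

static void loader_lock_enter(void)
{
	pthread_mutex_lock(&loader_lock);
	lock_acquisitions++;
}

static void loader_lock_leave(void)
{
	pthread_mutex_unlock(&loader_lock);
}

/* _inner version: does no locking; the caller must hold loader_lock. */
static int get_method_inner(int token)
{
	assert(token >= 0 && token < 8);
	return method_table[token];
}

/* Public version: takes the lock, then delegates to the _inner version. */
int get_method(int token)
{
	int method;

	loader_lock_enter();
	method = get_method_inner(token);
	loader_lock_leave();
	return method;
}

/*
 * A caller that already holds the lock and needs many lookups. It calls
 * the _inner version directly, so the lock is taken once for the whole
 * loop instead of once per lookup.
 */
int sum_methods(const int *tokens, int n)
{
	int i, sum = 0;

	loader_lock_enter();
	for (i = 0; i < n; i++)
		sum += get_method_inner(tokens[i]);
	loader_lock_leave();
	return sum;
}
```

With a non-recursive mutex as used here, the _inner version is also needed
for correctness, since calling get_method from sum_methods would deadlock.
With a recursive lock the split only saves the repeated acquire and
release, which is the case the note above is concerned with.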