docs/precise-gc

   1 While the Boehm GC is quite good, we need to move to a
   2 precise, generational GC for better performance and smaller
   3 memory usage (no false-positives memory retentions with big
   4 allocations).
   5
   6 This is a large task, but it can be done in steps.
   7
   8 1) use the GCJ support to mark reference fields in objects, so
   9 scanning the heap is faster. This is mostly done already, needs
  10 checking that it is always used correctly (big objects, arrays).
  11 There are also some data structures we use in the runtime that are
  12 currently untyped that are allocated in the Gc heap and used to
  13 keep references to GC objects. We need to make them typed as to
  14 precisely track GC references or make them non-GC memory,
  15 by using more the GC hnadle support code (MonoGHashTable, MonoDomain,
  16 etc).
  17
  18 2) don't include in the static roots the .bss and .data segments
  19 to save in scanning time and limit false-positives. This is mostly
  20 done already.
  21
  22 3) keep track precisely of stack locations and registers in native
  23 code generation. This basically requires the regalloc rewrite code
  24 first, if we don't want to duplicate much of it. This is the hardest
  25 task of all, since proving it's correctness is very hard. Some tricks,
  26 like having a build that injects GC.Collect() after every few simple
  27 operations may help. We also need to decide if we want to handle safe
  28 points at calls and back jumps only or at every instruction. The latter
  29 case is harder to implement and requires we keep around much more data
  30 (it potentially makes for faster stop-the-world phases).
  31 The first case requires us to be able to advance a thread until it
  32 reaches the next safe point: this can be done with the same techniques
  33 used by a debugger. We already need something like this to handle
  34 safely aborts happening in the middle of a prolog in managed code,
  35 for example, so this could be an additional sub-task that can be done
  36 separately from the GC work.
  37 Note that we can adapt the libgc code to use the info we collect
  38 when scanning the stack in managed methods and still use the conservative
  39 approach for the unmanaged stack, until we have our own collector,
  40 which requires we define a proper icall interface to switch from managed
  41 to unmanaged code (hwo to we handle object references in the icall
  42 implementations, for example).
  43
  44 4) we could make use of the generational capabilities of the
  45 Boehm GC, but not with the current method involving signals which
  46 may create incompatibilities and is not supported on all platforms.
  47 We need to start using write barriers: they will be required anyway
  48 for the generational GC we'll use. When a field holding a reference
  49 is changed in an object (or an item in an array), we mark the card
  50 or page where the field is stored as dirty. Later, when a collection
  51 is run, only objects in pages marked as dirty are scanned for
  52 references instead of the whole heap. This could take a few days to
  53 implement and probably much more time to debug if all the cases were
  54 not catched:-)
  55
  56 5) actually write the new generational and precise collector. There are
  57 several examples out there as open source projects, though the CLR
  58 needs some specific semantics so the code needs to be written from
  59 scratch anyway. Compared to item 3 this is relatively easer and it can
  60 be tested outside of mono, too, until mono is ready to use it.
  61 The important features needed:
  62 *) precise, so there is no false positive memory retention
  63 *) generational to reduce collection times
  64 *) pointer-hopping allocation to reduce alloc time
  65 *) possibly per-thread lock-free allocation
  66 *) handle weakrefs and finalizers with the CLR semantics
  67
  68 Note: some GC engines use a single mmap area, because it makes
  69 handling generations and the implementation much easier, but this also
  70 limits the expension of the heap, so people may need to use a command-line
  71 option to set the max heap size etc. It would be better to have a design
  72 that allows mmapping a few megabytes chunks at a time.
  73
  74 The different tasks can be done in parallel. 1, 2 and 4 can be done in time
  75 for the mono 1.2 release. Parts of 3 and 5 could be done as well.
  76 The complete switch is supposed to happen with the mono 2.0 release.
  77