gc-7.2/doc/gcdescr.html

   1 <HTML>
   2 <HEAD>
   3     <TITLE> Conservative GC Algorithmic Overview </TITLE>
   4     <AUTHOR> Hans-J. Boehm, HP Labs (Some of this was written at SGI)</author>
   5 </HEAD>
   6 <BODY>
   7 <H1> <I>This is under construction, and may always be.</i> </h1>
   8 <H1> Conservative GC Algorithmic Overview </h1>
   9 <P>
  10 This is a description of the algorithms and data structures used in our
  11 conservative garbage collector.  I expect the level of detail to increase
  12 with time.  For a survey of GC algorithms, see for example
  13 <A HREF="ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps"> Paul Wilson's
  14 excellent paper</a>.  For an overview of the collector interface,
  15 see <A HREF="gcinterface.html">here</a>.
  16 <P>
  17 This description is targeted primarily at someone trying to understand the
  18 source code.  It specifically refers to variable and function names.
  19 It may also be useful for understanding the algorithms at a higher level.
  20 <P>
  21 The description here assumes that the collector is used in default mode.
  22 In particular, we assume that it used as a garbage collector, and not just
  23 a leak detector.  We initially assume that it is used in stop-the-world,
  24 non-incremental mode, though the presence of the incremental collector
  25 will be apparent in the design.
  26 We assume the default finalization model, but the code affected by that
  27 is very localized.
  28 <H2> Introduction </h2>
  29 The garbage collector uses a modified mark-sweep algorithm.  Conceptually
  30 it operates roughly in four phases, which are performed occasionally
  31 as part of a memory allocation:
  32
  33 <OL>
  34
  35 <LI>
  36 <I>Preparation</i> Each object has an associated mark bit.
  37 Clear all mark bits, indicating that all objects
  38 are potentially unreachable.
  39
  40 <LI>
  41 <I>Mark phase</i> Marks all objects that can be reachable via chains of
  42 pointers from variables.  Often the collector has no real information
  43 about the location of pointer variables in the heap, so it
  44 views all static data areas, stacks and registers as potentially containing
  45 pointers.  Any bit patterns that represent addresses inside
  46 heap objects managed by the collector are viewed as pointers.
  47 Unless the client program has made heap object layout information
  48 available to the collector, any heap objects found to be reachable from
  49 variables are again scanned similarly.
  50
  51 <LI>
  52 <I>Sweep phase</i> Scans the heap for inaccessible, and hence unmarked,
  53 objects, and returns them to an appropriate free list for reuse.  This is
  54 not really a separate phase; even in non incremental mode this is operation
  55 is usually performed on demand during an allocation that discovers an empty
  56 free list.  Thus the sweep phase is very unlikely to touch a page that
  57 would not have been touched shortly thereafter anyway.
  58
  59 <LI>
  60 <I>Finalization phase</i> Unreachable objects which had been registered
  61 for finalization are enqueued for finalization outside the collector.
  62
  63 </ol>
  64
  65 <P>
  66 The remaining sections describe the memory allocation data structures,
  67 and then the last 3 collection phases in more detail. We conclude by
  68 outlining some of the additional features implemented in the collector.
  69
  70 <H2>Allocation</h2>
  71 The collector includes its own memory allocator.  The allocator obtains
  72 memory from the system in a platform-dependent way.  Under UNIX, it
  73 uses either <TT>malloc</tt>, <TT>sbrk</tt>, or <TT>mmap</tt>.
  74 <P>
  75 Most static data used by the allocator, as well as that needed by the
  76 rest of the garbage collector is stored inside the
  77 <TT>_GC_arrays</tt> structure.
  78 This allows the garbage collector to easily ignore the collectors own
  79 data structures when it searches for root pointers.  Other allocator
  80 and collector internal data structures are allocated dynamically
  81 with <TT>GC_scratch_alloc</tt>. <TT>GC_scratch_alloc</tt> does not
  82 allow for deallocation, and is therefore used only for permanent data
  83 structures.
  84 <P>
  85 The allocator allocates objects of different <I>kinds</i>.
  86 Different kinds are handled somewhat differently by certain parts
  87 of the garbage collector.  Certain kinds are scanned for pointers,
  88 others are not.  Some may have per-object type descriptors that
  89 determine pointer locations.  Or a specific kind may correspond
  90 to one specific object layout.  Two built-in kinds are uncollectable.
  91 One (<TT>STUBBORN</tt>) is immutable without special precautions.
  92 In spite of that, it is very likely that most C clients of the
  93 collector currently
  94 use at most two kinds: <TT>NORMAL</tt> and <TT>PTRFREE</tt> objects.
  95 The <A HREF="http://gcc.gnu.org/java">gcj</a> runtime also makes
  96 heavy use of a kind (allocated with GC_gcj_malloc) that stores
  97 type information at a known offset in method tables.
  98 <P>
  99 The collector uses a two level allocator.  A large block is defined to
 100 be one larger than half of <TT>HBLKSIZE</tt>, which is a power of 2,
 101 typically on the order of the page size.
 102 <P>
 103 Large block sizes are rounded up to
 104 the next multiple of <TT>HBLKSIZE</tt> and then allocated by
 105 <TT>GC_allochblk</tt>.  Recent versions of the collector
 106 use an approximate best fit algorithm by keeping free lists for
 107 several large block sizes.
 108 The actual
 109 implementation of <TT>GC_allochblk</tt>
 110 is significantly complicated by black-listing issues
 111 (see below).
 112 <P>
 113 Small blocks are allocated in chunks of size <TT>HBLKSIZE</tt>.
 114 Each chunk is
 115 dedicated to only one object size and kind.
 116 <P>
 117 The allocator maintains
 118 separate free lists for each size and kind of object.
 119 Associated with each kind is an array of free list pointers,
 120 with entry <TT>freelist[</tt><I>i</i><TT>]</tt> pointing to
 121 a free list of size <I>i</i> objects.
 122 In recent versions of the
 123 collector, index <TT>i</tt> is expressed in granules, which are the
 124 minimum allocatable unit, typically 8 or 16 bytes.
 125 The free lists themselves are
 126 linked through the first word in each object (see <TT>obj_link()</tt>
 127 macro).
 128 <P>
 129 Once a large block is split for use in smaller objects, it can only
 130 be used for objects of that size, unless the collector discovers a completely
 131 empty chunk.  Completely empty chunks are restored to the appropriate
 132 large block free list.
 133 <P>
 134 In order to avoid allocating blocks for too many distinct object sizes,
 135 the collector normally does not directly allocate objects of every possible
 136 request size.  Instead request are rounded up to one of a smaller number
 137 of allocated sizes, for which free lists are maintained.  The exact
 138 allocated sizes are computed on demand, but subject to the constraint
 139 that they increase roughly in geometric progression.  Thus objects
 140 requested early in the execution are likely to be allocated with exactly
 141 the requested size, subject to alignment constraints.
 142 See <TT>GC_init_size_map</tt> for details.
 143 <P>
 144 The actual size rounding operation during small object allocation is
 145 implemented as a table lookup in <TT>GC_size_map</tt> which maps
 146 a requested allocation size in bytes to a number of granules.
 147 <P>
 148 Both collector initialization and computation of allocated sizes are
 149 handled carefully so that they do not slow down the small object fast
 150 allocation path.  An attempt to allocate before the collector is initialized,
 151 or before the appropriate <TT>GC_size_map</tt> entry is computed,
 152 will take the same path as an allocation attempt with an empty free list.
 153 This results in a call to the slow path code (<TT>GC_generic_malloc_inner</tt>)
 154 which performs the appropriate initialization checks.
 155 <P>
 156 In non-incremental mode, we make a decision about whether to garbage collect
 157 whenever an allocation would otherwise have failed with the current heap size.
 158 If the total amount of allocation since the last collection is less than
 159 the heap size divided by <TT>GC_free_space_divisor</tt>, we try to
 160 expand the heap.  Otherwise, we initiate a garbage collection.  This ensures
 161 that the amount of garbage collection work per allocated byte remains
 162 constant.
 163 <P>
 164 The above is in fact an oversimplification of the real heap expansion
 165 and GC triggering heuristic, which adjusts slightly for root size
 166 and certain kinds of
 167 fragmentation.  In particular:
 168 <UL>
 169 <LI> Programs with a large root set size and
 170 little live heap memory will expand the heap to amortize the cost of
 171 scanning the roots.
 172 <LI> Versions 5.x of the collector actually collect more frequently in
 173 nonincremental mode.  The large block allocator usually refuses to split
 174 large heap blocks once the garbage collection threshold is
 175 reached.  This often has the effect of collecting well before the
 176 heap fills up, thus reducing fragmentation and working set size at the
 177 expense of GC time.  Versions 6.x choose an intermediate strategy depending
 178 on how much large object allocation has taken place in the past.
 179 (If the collector is configured to unmap unused pages, versions 6.x
 180 use the 5.x strategy.)
 181 <LI> In calculating the amount of allocation since the last collection we
 182 give partial credit for objects we expect to be explicitly deallocated.
 183 Even if all objects are explicitly managed, it is often desirable to collect
 184 on rare occasion, since that is our only mechanism for coalescing completely
 185 empty chunks.
 186 </ul>
 187 <P>
 188 It has been suggested that this should be adjusted so that we favor
 189 expansion if the resulting heap still fits into physical memory.
 190 In many cases, that would no doubt help.  But it is tricky to do this
 191 in a way that remains robust if multiple application are contending
 192 for a single pool of physical memory.
 193
 194 <H2>Mark phase</h2>
 195
 196 At each collection, the collector marks all objects that are
 197 possibly reachable from pointer variables.  Since it cannot generally
 198 tell where pointer variables are located, it scans the following
 199 <I>root segments</i> for pointers:
 200 <UL>
 201 <LI>The registers.  Depending on the architecture, this may be done using
 202 assembly code, or by calling a <TT>setjmp</tt>-like function which saves
 203 register contents on the stack.
 204 <LI>The stack(s).  In the case of a single-threaded application,
 205 on most platforms this
 206 is done by scanning the memory between (an approximation of) the current
 207 stack pointer and <TT>GC_stackbottom</tt>.  (For Itanium, the register stack
 208 scanned separately.)  The <TT>GC_stackbottom</tt> variable is set in
 209 a highly platform-specific way depending on the appropriate configuration
 210 information in <TT>gcconfig.h</tt>.  Note that the currently active
 211 stack needs to be scanned carefully, since callee-save registers of
 212 client code may appear inside collector stack frames, which may
 213 change during the mark process.  This is addressed by scanning
 214 some sections of the stack "eagerly", effectively capturing a snapshot
 215 at one point in time.
 216 <LI>Static data region(s).  In the simplest case, this is the region
 217 between <TT>DATASTART</tt> and <TT>DATAEND</tt>, as defined in
 218 <TT>gcconfig.h</tt>.  However, in most cases, this will also involve
 219 static data regions associated with dynamic libraries.  These are
 220 identified by the mostly platform-specific code in <TT>dyn_load.c</tt>.
 221 </ul>
 222 The marker maintains an explicit stack of memory regions that are known
 223 to be accessible, but that have not yet been searched for contained pointers.
 224 Each stack entry contains the starting address of the block to be scanned,
 225 as well as a descriptor of the block.  If no layout information is
 226 available for the block, then the descriptor is simply a length.
 227 (For other possibilities, see <TT>gc_mark.h</tt>.)
 228 <P>
 229 At the beginning of the mark phase, all root segments
 230 (as described above) are pushed on the
 231 stack by <TT>GC_push_roots</tt>.  (Registers and eagerly processed
 232 stack sections are processed by pushing the referenced objects instead
 233 of the stack section itself.)  If <TT>ALL_INTERIOR_PTRS</tt> is not
 234 defined, then stack roots require special treatment.  In this case, the
 235 normal marking code ignores interior pointers, but <TT>GC_push_all_stack</tt>
 236 explicitly checks for interior pointers and pushes descriptors for target
 237 objects.
 238 <P>
 239 The marker is structured to allow incremental marking.
 240 Each call to <TT>GC_mark_some</tt> performs a small amount of
 241 work towards marking the heap.
 242 It maintains
 243 explicit state in the form of <TT>GC_mark_state</tt>, which
 244 identifies a particular sub-phase.  Some other pieces of state, most
 245 notably the mark stack, identify how much work remains to be done
 246 in each sub-phase.  The normal progression of mark states for
 247 a stop-the-world collection is:
 248 <OL>
 249 <LI> <TT>MS_INVALID</tt> indicating that there may be accessible unmarked
 250 objects.  In this case <TT>GC_objects_are_marked</tt> will simultaneously
 251 be false, so the mark state is advanced to
 252 <LI> <TT>MS_PUSH_UNCOLLECTABLE</tt> indicating that it suffices to push
 253 uncollectable objects, roots, and then mark everything reachable from them.
 254 <TT>Scan_ptr</tt> is advanced through the heap until all uncollectable
 255 objects are pushed, and objects reachable from them are marked.
 256 At that point, the next call to <TT>GC_mark_some</tt> calls
 257 <TT>GC_push_roots</tt> to push the roots.  It the advances the
 258 mark state to
 259 <LI> <TT>MS_ROOTS_PUSHED</tt> asserting that once the mark stack is
 260 empty, all reachable objects are marked.  Once in this state, we work
 261 only on emptying the mark stack.  Once this is completed, the state
 262 changes to
 263 <LI> <TT>MS_NONE</tt> indicating that reachable objects are marked.
 264 </ol>
 265
 266 The core mark routine <TT>GC_mark_from</tt>, is called
 267 repeatedly by several of the sub-phases when the mark stack starts to fill
 268 up.  It is also called repeatedly in <TT>MS_ROOTS_PUSHED</tt> state
 269 to empty the mark stack.
 270 The routine is designed to only perform a limited amount of marking at
 271 each call, so that it can also be used by the incremental collector.
 272 It is fairly carefully tuned, since it usually consumes a large majority
 273 of the garbage collection time.
 274 <P>
 275 The fact that it perform a only a small amount of work per call also
 276 allows it to be used as the core routine of the parallel marker.  In that
 277 case it is normally invoked on thread-private mark stacks instead of the
 278 global mark stack.  More details can be found in
 279 <A HREF="scale.html">scale.html</a>
 280 <P>
 281 The marker correctly handles mark stack overflows.  Whenever the mark stack
 282 overflows, the mark state is reset to <TT>MS_INVALID</tt>.
 283 Since there are already marked objects in the heap,
 284 this eventually forces a complete
 285 scan of the heap, searching for pointers, during which any unmarked objects
 286 referenced by marked objects are again pushed on the mark stack.  This
 287 process is repeated until the mark phase completes without a stack overflow.
 288 Each time the stack overflows, an attempt is made to grow the mark stack.
 289 All pieces of the collector that push regions onto the mark stack have to be
 290 careful to ensure forward progress, even in case of repeated mark stack
 291 overflows.  Every mark attempt results in additional marked objects.
 292 <P>
 293 Each mark stack entry is processed by examining all candidate pointers
 294 in the range described by the entry.  If the region has no associated
 295 type information, then this typically requires that each 4-byte aligned
 296 quantity (8-byte aligned with 64-bit pointers) be considered a candidate
 297 pointer.
 298 <P>
 299 We determine whether a candidate pointer is actually the address of
 300 a heap block.  This is done in the following steps:
 301 <NL>
 302 <LI> The candidate pointer is checked against rough heap bounds.
 303 These heap bounds are maintained such that all actual heap objects
 304 fall between them.  In order to facilitate black-listing (see below)
 305 we also include address regions that the heap is likely to expand into.
 306 Most non-pointers fail this initial test.
 307 <LI> The candidate pointer is divided into two pieces; the most significant
 308 bits identify a <TT>HBLKSIZE</tt>-sized page in the address space, and
 309 the least significant bits specify an offset within that page.
 310 (A hardware page may actually consist of multiple such pages.
 311 HBLKSIZE is usually the page size divided by a small power of two.)
 312 <LI>
 313 The page address part of the candidate pointer is looked up in a
 314 <A HREF="tree.html">table</a>.
 315 Each table entry contains either 0, indicating that the page is not part
 316 of the garbage collected heap, a small integer <I>n</i>, indicating
 317 that the page is part of large object, starting at least <I>n</i> pages
 318 back, or a pointer to a descriptor for the page.  In the first case,
 319 the candidate pointer i not a true pointer and can be safely ignored.
 320 In the last two cases, we can obtain a descriptor for the page containing
 321 the beginning of the object.
 322 <LI>
 323 The starting address of the referenced object is computed.
 324 The page descriptor contains the size of the object(s)
 325 in that page, the object kind, and the necessary mark bits for those
 326 objects.  The size information can be used to map the candidate pointer
 327 to the object starting address.  To accelerate this process, the page header
 328 also contains a pointer to a precomputed map of page offsets to displacements
 329 from the beginning of an object.  The use of this map avoids a
 330 potentially slow integer remainder operation in computing the object
 331 start address.
 332 <LI>
 333 The mark bit for the target object is checked and set.  If the object
 334 was previously unmarked, the object is pushed on the mark stack.
 335 The descriptor is read from the page descriptor.  (This is computed
 336 from information <TT>GC_obj_kinds</tt> when the page is first allocated.)
 337 </nl>
 338 <P>
 339 At the end of the mark phase, mark bits for left-over free lists are cleared,
 340 in case a free list was accidentally marked due to a stray pointer.
 341
 342 <H2>Sweep phase</h2>
 343
 344 At the end of the mark phase, all blocks in the heap are examined.
 345 Unmarked large objects are immediately returned to the large object free list.
 346 Each small object page is checked to see if all mark bits are clear.
 347 If so, the entire page is returned to the large object free list.
 348 Small object pages containing some reachable object are queued for later
 349 sweeping, unless we determine that the page contains very little free
 350 space, in which case it is not examined further.
 351 <P>
 352 This initial sweep pass touches only block headers, not
 353 the blocks themselves.  Thus it does not require significant paging, even
 354 if large sections of the heap are not in physical memory.
 355 <P>
 356 Nonempty small object pages are swept when an allocation attempt
 357 encounters an empty free list for that object size and kind.
 358 Pages for the correct size and kind are repeatedly swept until at
 359 least one empty block is found.  Sweeping such a page involves
 360 scanning the mark bit array in the page header, and building a free
 361 list linked through the first words in the objects themselves.
 362 This does involve touching the appropriate data page, but in most cases
 363 it will be touched only just before it is used for allocation.
 364 Hence any paging is essentially unavoidable.
 365 <P>
 366 Except in the case of pointer-free objects, we maintain the invariant
 367 that any object in a small object free list is cleared (except possibly
 368 for the link field).  Thus it becomes the burden of the small object
 369 sweep routine to clear objects.  This has the advantage that we can
 370 easily recover from accidentally marking a free list, though that could
 371 also be handled by other means.  The collector currently spends a fair
 372 amount of time clearing objects, and this approach should probably be
 373 revisited.
 374 <P>
 375 In most configurations, we use specialized sweep routines to handle common
 376 small object sizes.  Since we allocate one mark bit per word, it becomes
 377 easier to examine the relevant mark bits if the object size divides
 378 the word length evenly.  We also suitably unroll the inner sweep loop
 379 in each case.  (It is conceivable that profile-based procedure cloning
 380 in the compiler could make this unnecessary and counterproductive.  I
 381 know of no existing compiler to which this applies.)
 382 <P>
 383 The sweeping of small object pages could be avoided completely at the expense
 384 of examining mark bits directly in the allocator.  This would probably
 385 be more expensive, since each allocation call would have to reload
 386 a large amount of state (e.g. next object address to be swept, position
 387 in mark bit table) before it could do its work.  The current scheme
 388 keeps the allocator simple and allows useful optimizations in the sweeper.
 389
 390 <H2>Finalization</h2>
 391 Both <TT>GC_register_disappearing_link</tt> and
 392 <TT>GC_register_finalizer</tt> add the request to a corresponding hash
 393 table.  The hash table is allocated out of collected memory, but
 394 the reference to the finalizable object is hidden from the collector.
 395 Currently finalization requests are processed non-incrementally at the
 396 end of a mark cycle.
 397 <P>
 398 The collector makes an initial pass over the table of finalizable objects,
 399 pushing the contents of unmarked objects onto the mark stack.
 400 After pushing each object, the marker is invoked to mark all objects
 401 reachable from it.  The object itself is not explicitly marked.
 402 This assures that objects on which a finalizer depends are neither
 403 collected nor finalized.
 404 <P>
 405 If in the process of marking from an object the
 406 object itself becomes marked, we have uncovered
 407 a cycle involving the object.  This usually results in a warning from the
 408 collector.  Such objects are not finalized, since it may be
 409 unsafe to do so.  See the more detailed
 410 <A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html"> discussion of finalization semantics</a>.
 411 <P>
 412 Any objects remaining unmarked at the end of this process are added to
 413 a queue of objects whose finalizers can be run.  Depending on collector
 414 configuration, finalizers are dequeued and run either implicitly during
 415 allocation calls, or explicitly in response to a user request.
 416 (Note that the former is unfortunately both the default and not generally safe.
 417 If finalizers perform synchronization, it may result in deadlocks.
 418 Nontrivial finalizers generally need to perform synchronization, and
 419 thus require a different collector configuration.)
 420 <P>
 421 The collector provides a mechanism for replacing the procedure that is
 422 used to mark through objects.  This is used both to provide support for
 423 Java-style unordered finalization, and to ignore certain kinds of cycles,
 424 <I>e.g.</i> those arising from C++ implementations of virtual inheritance.
 425
 426 <H2>Generational Collection and Dirty Bits</h2>
 427 We basically use the concurrent and generational GC algorithm described in
 428 <A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/pldi91.ps.Z">"Mostly Parallel Garbage Collection"</a>,
 429 by Boehm, Demers, and Shenker.
 430 <P>
 431 The most significant modification is that
 432 the collector always starts running in the allocating thread.
 433 There is no separate garbage collector thread.  (If parallel GC is
 434 enabled, helper threads may also be woken up.)
 435 If an allocation attempt either requests a large object, or encounters
 436 an empty small object free list, and notices that there is a collection
 437 in progress, it immediately performs a small amount of marking work
 438 as described above.
 439 <P>
 440 This change was made both because we wanted to easily accommodate
 441 single-threaded environments, and because a separate GC thread requires
 442 very careful control over the scheduler to prevent the mutator from
 443 out-running the collector, and hence provoking unneeded heap growth.
 444 <P>
 445 In incremental mode, the heap is always expanded when we encounter
 446 insufficient space for an allocation.  Garbage collection is triggered
 447 whenever we notice that more than
 448 <TT>GC_heap_size</tt>/2 * <TT>GC_free_space_divisor</tt>
 449 bytes of allocation have taken place.
 450 After <TT>GC_full_freq</tt> minor collections a major collection
 451 is started.
 452 <P>
 453 All collections initially run uninterrupted until a predetermined
 454 amount of time (50 msecs by default) has expired.  If this allows
 455 the collection to complete entirely, we can avoid correcting
 456 for data structure modifications during the collection.  If it does
 457 not complete, we return control to the mutator, and perform small
 458 amounts of additional GC work during those later allocations that
 459 cannot be satisfied from small object free lists. When marking completes,
 460 the set of modified pages is retrieved, and we mark once again from
 461 marked objects on those pages, this time with the mutator stopped.
 462 <P>
 463 We keep track of modified pages using one of several distinct mechanisms:
 464 <OL>
 465 <LI>
 466 Through explicit mutator cooperation.  Currently this requires
 467 the use of <TT>GC_malloc_stubborn</tt>, and is rarely used.
 468 <LI>
 469 (<TT>MPROTECT_VDB</tt>) By write-protecting physical pages and
 470 catching write faults.  This is
 471 implemented for many Unix-like systems and for win32.  It is not possible
 472 in a few environments.
 473 <LI>
 474 (<TT>PROC_VDB</tt>) By retrieving dirty bit information from /proc.
 475 (Currently only Sun's
 476 Solaris supports this.  Though this is considerably cleaner, performance
 477 may actually be better with mprotect and signals.)
 478 <LI>
 479 (<TT>PCR_VDB</tt>) By relying on an external dirty bit implementation, in this
 480 case the one in Xerox PCR.
 481 <LI>
 482 (<TT>DEFAULT_VDB</tt>) By treating all pages as dirty.  This is the default if
 483 none of the other techniques is known to be usable, and
 484 <TT>GC_malloc_stubborn</tt> is not used.  Practical only for testing, or if
 485 the vast majority of objects use <TT>GC_malloc_stubborn</tt>.
 486 </ol>
 487
 488 <H2>Black-listing</h2>
 489
 490 The collector implements <I>black-listing</i> of pages, as described
 491 in
 492 <A HREF="http://www.acm.org/pubs/citations/proceedings/pldi/155090/p197-boehm/">
 493 Boehm, ``Space Efficient Conservative Collection'', PLDI '93</a>, also available
 494 <A HREF="papers/pldi93.ps.Z">here</a>.
 495 <P>
 496 During the mark phase, the collector tracks ``near misses'', i.e. attempts
 497 to follow a ``pointer'' to just outside the garbage-collected heap, or
 498 to a currently unallocated page inside the heap.  Pages that have been
 499 the targets of such near misses are likely to be the targets of
 500 misidentified ``pointers'' in the future.  To minimize the future
 501 damage caused by such misidentifications they will be allocated only to
 502 small pointerfree objects.
 503 <P>
 504 The collector understands two different kinds of black-listing.  A
 505 page may be black listed for interior pointer references
 506 (<TT>GC_add_to_black_list_stack</tt>), if it was the target of a near
 507 miss from a location that requires interior pointer recognition,
 508 <I>e.g.</i> the stack, or the heap if <TT>GC_all_interior_pointers</tt>
 509 is set.  In this case, we also avoid allocating large blocks that include
 510 this page.
 511 <P>
 512 If the near miss came from a source that did not require interior
 513 pointer recognition, it is black-listed with
 514 <TT>GC_add_to_black_list_normal</tt>.
 515 A page black-listed in this way may appear inside a large object,
 516 so long as it is not the first page of a large object.
 517 <P>
 518 The <TT>GC_allochblk</tt> routine respects black-listing when assigning
 519 a block to a particular object kind and size.  It occasionally
 520 drops (i.e. allocates and forgets) blocks that are completely black-listed
 521 in order to avoid excessively long large block free lists containing
 522 only unusable blocks.  This would otherwise become an issue
 523 if there is low demand for small pointerfree objects.
 524
 525 <H2>Thread support</h2>
 526 We support several different threading models.  Unfortunately Pthreads,
 527 the only reasonably well standardized thread model, supports too narrow
 528 an interface for conservative garbage collection.  There appears to be
 529 no completely portable way to allow the collector
 530 to coexist with various Pthreads
 531 implementations.  Hence we currently support only the more
 532 common Pthreads implementations.
 533 <P>
 534 In particular, it is very difficult for the collector to stop all other
 535 threads in the system and examine the register contents.  This is currently
 536 accomplished with very different mechanisms for some Pthreads
 537 implementations.  The Solaris implementation temporarily disables much
 538 of the user-level threads implementation by stopping kernel-level threads
 539 ("lwp"s).  The Linux/HPUX/OSF1 and Irix implementations sends signals to
 540 individual Pthreads and has them wait in the signal handler.
 541 <P>
 542 The Linux and Irix implementations use
 543 only documented Pthreads calls, but rely on extensions to their semantics.
 544 The Linux implementation <TT>linux_threads.c</tt> relies on only very
 545 mild extensions to the pthreads semantics, and already supports a large number
 546 of other Unix-like pthreads implementations.  Our goal is to make this the
 547 only pthread support in the collector.
 548 <P>
 549 (The Irix implementation is separate only for historical reasons and should
 550 clearly be merged.  The current Solaris implementation probably performs
 551 better in the uniprocessor case, but does not support thread operations in the
 552 collector.  Hence it cannot support the parallel marker.)
 553 <P>
 554 All implementations must
 555 intercept thread creation and a few other thread-specific calls to allow
 556 enumeration of threads and location of thread stacks.  This is current
 557 accomplished with <TT># define</tt>'s in <TT>gc.h</tt>
 558 (really <TT>gc_pthread_redirects.h</tt>), or optionally
 559 by using ld's function call wrapping mechanism under Linux.
 560 <P>
 561 Recent versions of the collector support several facilities to enhance
 562 the processor-scalability and thread performance of the collector.
 563 These are discussed in more detail <A HREF="scale.html">here</a>.
 564 We briefly outline the data approach to thread-local allocation in the
 565 next section.
 566 <H2>Thread-local allocation</h2>
 567 If thread-local allocation is enabled, the collector keeps separate
 568 arrays of free lists for each thread.  Thread-local allocation
 569 is currently only supported on a few platforms.
 570 <P>
 571 The free list arrays associated
 572 with each thread are only used to satisfy requests for objects that
 573 are  both very small, and belong to one of a small number of well-known
 574 kinds.  These currently include "normal" and pointer-free objects.
 575 Depending on the configuration, "gcj" objects may also be included.
 576 <P>
 577 Thread-local free list entries contain either a pointer to the first
 578 element of a free list, or they contain a counter of the number of
 579 allocation granules, corresponding to objects of this size,
 580 allocated so far.  Initially they contain the
 581 value one, i.e. a small counter value.
 582 <P>
 583 Thread-local allocation allocates directly through the global
 584 allocator, if the object is of a size or kind not covered by the
 585 local free lists.
 586 <P>
 587 If there is an appropriate local free list, the allocator checks whether it
 588 contains a sufficiently small counter value.  If so, the counter is simply
 589 incremented by the counter value, and the global allocator is used.
 590 In this way, the initial few allocations of a given size bypass the local
 591 allocator.  A thread that only allocates a handful of objects of a given
 592 size will not build up its own free list for that size.  This avoids
 593 wasting space for unpopular objects sizes or kinds.
 594 <P>
 595 Once the counter passes a threshold, <TT>GC_malloc_many</tt> is called
 596 to allocate roughly <TT>HBLKSIZE</tt> space and put it on the corresponding
 597 local free list.  Further allocations of that size and kind then use
 598 this free list, and no longer need to acquire the allocation lock.
 599 The allocation procedure is otherwise similar to the global free lists.
 600 The local free lists are also linked using the first word in the object.
 601 In most cases this means they require considerably less time.
 602 <P>
 603 Local free lists are treated buy most of the rest of the collector
 604 as though they were in-use reachable data.  This requires some care,
 605 since pointer-free objects are not normally traced, and hence a special
 606 tracing procedure is required to mark all objects on pointer-free and
 607 gcj local free lists.
 608 <P>
 609 On thread exit, any remaining thread-local free list entries are
 610 transferred back to the global free list.
 611 <P>
 612 Note that if the collector is configured for thread-local allocation,
 613 GC versions before 7 do not invoke the thread-local allocator by default.
 614 <TT>GC_malloc</tt> only uses thread-local allocation in version 7 and later.
 615 <P>
 616 For some more details see <A HREF="scale.html">here</a>, and the
 617 technical report entitled
 618 <A HREF="http://www.hpl.hp.com/techreports/2000/HPL-2000-165.html">
 619 ``Fast Multiprocessor Memory Allocation and Garbage Collection''
 620 </a>
 621 <P>
 622 <HR>
 623 <P>
 624 Comments are appreciated.  Please send mail to
 625 <A HREF="mailto:gc@linux.hpl.hp.com"><tt>gc@linux.hpl.hp.com</tt></a>
 626 (GC mailing list) or
 627 <A HREF="mailto:Hans.Boehm@hp.com"><TT>Hans.Boehm@hp.com</tt></a>
 628 <P>
 629 This is a modified copy of a page written while the author was at SGI.
 630 The original was <A HREF="http://reality.sgi.com/boehm/gcdescr.html">here</a>.
 631 </body>
 632 </html>