The collector uses a large amount of conditional compilation in order to
deal with platform dependencies. This violates a number of known coding
standards. On the other hand, it seems to be the only practical way to
support this many platforms without excessive code duplication.

A few guidelines have mostly been followed in order to keep this manageable:

1) #if and #ifdef directives are properly indented whenever easily possible.
All known C compilers allow whitespace between the "#" and the "if" to make
this possible. ANSI C also allows white space before the "#", though we
avoid that. This indentation has the known disadvantages that it differs
from the normal GNU conventions, and that it makes patches larger than
otherwise necessary. In my opinion, it's still well worth it, for the same
reason that we indent ordinary "if" statements.

2) Whenever possible, tests are performed on the macros defined in gcconfig.h
instead of directly testing platform-specific predefined macros. This makes
it relatively easy to adapt to new compilers with a different set of
predefined macros. Currently these macros generally identify platforms
instead of features. In many cases, this is a mistake.

Many of the tested configuration macros are at least somewhat defined in
either include/private/gcconfig.h or in Makefile.direct. Here is an attempt
at documenting these macros: (Thanks to Walter Bright for suggesting
this. This is a work in progress.)

MACRO           EXPLANATION
-----           -----------

GC_DEBUG        Tested by gc.h. Causes all-upper-case macros to
  expand to calls to debug versions of collector routines.

GC_DEBUG_REPLACEMENT    Tested by gc.h. Causes GC_MALLOC/REALLOC() to be
  defined as GC_debug_malloc/realloc_replacement().

GC_NO_THREAD_REDIRECTS  Tested by gc.h. Prevents redirection of thread
  creation routines etc. to GC_ versions. Requires the programmer
  to explicitly handle thread registration.

GC_NO_THREAD_DECLS      Tested by gc.h. MS Windows only.
  Do not declare Windows thread creation routines and do not include
  windows.h.

GC_UNDERSCORE_STDCALL   Tested by gc.h. Explicitly prefix exported/imported
  WINAPI (__stdcall) symbols with '_' (underscore). Could be used with
  the MinGW (for x86) compiler (in conjunction with GC_DLL) to follow MS
  conventions for __stdcall symbol naming.

_ENABLE_ARRAYNEW
  #define'd by the Digital Mars C++ compiler when
  operator new[] and delete[] are separately
  overloadable. Used in gc_cpp.h.

_DLL    Tested by gc_config_macros.h. Defined by Visual C++ if runtime
  dynamic libraries are in use. Used (only if none of GC_DLL,
  GC_NOT_DLL, __GNUC__ are defined) to test whether
  __declspec(dllimport) needs to be added to declarations
  to support the case in which the collector is in a DLL.

GC_DLL  Defined by user if dynamic libraries are being built
  or used. Also set by gc.h if _DLL is defined (except for
  mingw) while GC_NOT_DLL and __GNUC__ are both undefined.
  This is the macro that is tested internally to determine
  whether the GC is in its own dynamic library. May need
  to be set by clients before including gc.h. Note that
  inside the GC implementation it indicates that the
  collector is in its own dynamic library, should export
  its symbols, etc. But in clients it indicates that the
  GC resides in a different DLL, its entry points should
  be referenced accordingly, and precautions may need to
  be taken to properly deal with statically allocated
  variables in the main program. Used for MS Windows.
  Also used by GCC v4+ (only when the dynamic shared library
  is being built) to hide internally used symbols.

GC_NOT_DLL      User-settable macro that overrides _DLL, e.g. if runtime
  dynamic libraries are used, but the collector is in a static library.
  Tested by gc_config_macros.h.

GC_REQUIRE_WCSDUP       Force GC to export GC_wcsdup() (the Unicode version
  of GC_strdup); could be useful in the leak-finding mode.


These define arguments influence the collector configuration:

FIND_LEAK       Causes GC_find_leak to be initially set.
  This causes the collector to assume that all inaccessible
  objects should have been explicitly deallocated, and reports exceptions.
  Finalization and the test program are not usable in this mode.

GC_FINDLEAK_DELAY_FREE  Turns on deferred freeing of objects in the
  leak-finding mode, letting the collector detect alter-object-after-free
  errors as well as detect leaked objects sooner (instead of only when
  the program terminates). Has no effect if SHORT_DBG_HDRS.

GC_ABORT_ON_LEAK        Causes the application to be terminated once leaked
  or smashed (corrupted on use-after-free) objects are found (after printing
  the information about those objects).

SUNOS5SIGS      Solaris-like signal handling. This is probably misnamed,
  since it really doesn't guarantee much more than POSIX.
  Currently set only for Solaris2.X, HPUX, and DRSNX. Should
  probably be set for some other platforms.

PCR     Set if the collector is being built as part of the Xerox Portable
  Common Runtime.

USE_COMPILER_TLS        Assume the existence of __thread-style thread-local
  storage. Set automatically for thread-local allocation with the HP/UX
  vendor compiler. Usable with gcc on sufficiently up-to-date ELF platforms.

IMPORTANT: Any of the _THREADS options must normally also be defined in
  the client before including gc.h. This redefines thread primitives to
  invoke the GC_ versions instead. Alternatively, linker-based symbol
  interception can be used on a few platforms.

GC_THREADS      Should set the appropriate one of the macros below,
  except GC_WIN32_PTHREADS, which must be set explicitly. Tested by gc.h.

GC_SOLARIS_THREADS      Enables support for Solaris pthreads.
  Must also define _REENTRANT.

GC_IRIX_THREADS Enables support for Irix pthreads. See README.sgi.

GC_HPUX_THREADS Enables support for HP/UX 11 pthreads.
  Also requires _REENTRANT or _POSIX_C_SOURCE=199506L. See README.hp.

GC_LINUX_THREADS        Enables support for Xavier Leroy's Linux threads
  or NPTL threads. See README.linux. _REENTRANT may also be required.

GC_OSF1_THREADS Enables support for Tru64 pthreads.

GC_FREEBSD_THREADS      Enables support for FreeBSD pthreads.
  Appeared to run into some underlying thread problems.

GC_NETBSD_THREADS       Enables support for NetBSD pthreads.

GC_OPENBSD_THREADS      Enables support for OpenBSD pthreads.

GC_DARWIN_THREADS       Enables support for Mac OS X pthreads.

GC_AIX_THREADS  Enables support for IBM AIX threads.

GC_DGUX386_THREADS      Enables support for DG/UX on I386 threads.
  See README.DGUX386. (Probably has not been tested recently.)

GC_WIN32_THREADS        Enables support for Win32 threads. That makes sense
  for this Makefile only under Cygwin.

GC_WIN32_PTHREADS       Enables support for MinGW32 pthreads. This cannot be
  enabled automatically by GC_THREADS, which would assume Win32 native
  threads.

PTW32_STATIC_LIB        Causes the static version of the Mingw pthreads
  library to be used. Requires GC_WIN32_PTHREADS.

GC_PTHREADS_PARAMARK    Causes pthread-based parallel mark implementation
  to be used even if GC_WIN32_PTHREADS is undefined. (Useful for WinCE.)

ALL_INTERIOR_POINTERS   Allows all pointers to the interior of objects to be
  recognized. (See gc_priv.h for consequences.) Alternatively,
  GC_all_interior_pointers can be set at process initialization time.

SMALL_CONFIG    Tries to tune the collector for small heap sizes,
  usually causing it to use less space in such situations.
  Incremental collection no longer works in this case. Also, removes
  some statistic-printing code. Turns off some optimization algorithms
  (like data prefetching in the mark routine).

GC_DISABLE_INCREMENTAL  Turn off the incremental collection support.

NO_INCREMENTAL  Causes the gctest program to not invoke the incremental
  collector. This has no impact on the generated library, only on the test
  program. (This is often useful for debugging failures unrelated to
  incremental GC.)

LARGE_CONFIG    Tunes the collector for unusually large heaps.
  Necessary for heaps larger than about 4 GiB on most (64-bit) machines.
  Recommended for heaps larger than about 500 MiB.
  Not recommended for embedded systems. Could be used in conjunction
  with SMALL_CONFIG to generate smaller code (by disabling incremental
  collection support, statistic printing and some optimization algorithms).

DONT_ADD_BYTE_AT_END    Meaningful only with ALL_INTERIOR_POINTERS or
  GC_all_interior_pointers = 1. Normally ALL_INTERIOR_POINTERS
  causes all objects to be padded so that pointers just past the end of
  an object can be recognized. This can be expensive. (The padding
  is normally more than one byte due to alignment constraints.)
  DONT_ADD_BYTE_AT_END disables the padding.

NO_EXECUTE_PERMISSION   May cause some or all of the heap to not
  have execute permission, i.e. it may be impossible to execute
  code from the heap. Currently this only affects the incremental
  collector on UNIX machines. It may greatly improve its performance,
  since this may avoid some expensive cache synchronization.
  Alternatively, GC_set_pages_executable can be called at the process
  initialization time.

GC_NO_OPERATOR_NEW_ARRAY        Declares that the C++ compiler does not
  support the new syntax "operator new[]" for allocating and deleting
  arrays. See gc_cpp.h for details. No effect on the C part of the
  collector. This is defined implicitly in a few environments. Must also
  be defined by clients that use gc_cpp.h.

REDIRECT_MALLOC=<X>     Causes malloc to be defined as an alias for X.
  Unless the following macros are defined, realloc is also redirected
  to GC_realloc, and free is redirected to GC_free.
  Calloc and str[n]dup are redefined in terms of the new malloc. X should
  be either GC_malloc or GC_malloc_uncollectable, or
  GC_debug_malloc_replacement. (The latter invokes GC_debug_malloc
  with dummy source location information, but still results in
  properly remembered call stacks on Linux/X86 and Solaris/SPARC.
  It requires that the following two macros also be used.)
  The former is occasionally useful for working around leaks in code
  you don't want to (or can't) look at.
  It may not work for existing code, but it often does. Neither works on
  all platforms, since some ports use malloc or calloc to obtain system
  memory. (Probably works for UNIX, and Win32.) If you build with
  DBG_HDRS_ALL, you should only use GC_debug_malloc_replacement as a
  malloc replacement.

REDIRECT_REALLOC=<X>    Causes realloc to be redirected to X.
  The canonical use is REDIRECT_REALLOC=GC_debug_realloc_replacement,
  together with REDIRECT_MALLOC=GC_debug_malloc_replacement to
  generate leak reports with call stacks for both malloc and realloc.
  This also requires REDIRECT_FREE.

REDIRECT_FREE=<X>       Causes free to be redirected to X. The canonical
  use is REDIRECT_FREE=GC_debug_free.

IGNORE_FREE     Turns calls to free into a no-op. Only useful with
  REDIRECT_MALLOC.

NO_DEBUGGING    Removes GC_dump and the debugging routines it calls.
  Reduces code size slightly at the expense of debuggability.

DEBUG_THREADS   Turn on printing additional thread-support debugging
  information.

JAVA_FINALIZATION       Makes it somewhat safer to finalize objects out of
  order by specifying a nonstandard finalization mark procedure (see
  finalize.c). Objects reachable from finalizable objects will be marked
  in a separate post-pass, and hence their memory won't be reclaimed.
  Not recommended unless you are implementing a language that specifies
  these semantics. Since 5.0, determines only the initial value
  of GC_java_finalization variable.

FINALIZE_ON_DEMAND      Causes finalizers to be run only in response
  to explicit GC_invoke_finalizers() calls.
  In 5.0 this became runtime adjustable, and this only determines the
  initial value of GC_finalize_on_demand.

ATOMIC_UNCOLLECTABLE    Includes code for GC_malloc_atomic_uncollectable.
  This is useful if either the vendor malloc implementation is poor,
  or if REDIRECT_MALLOC is used.

MARK_BIT_PER_GRANULE    Requests that a mark bit (or often byte)
  be allocated for each allocation granule, as opposed to each object.
  This often improves speed, possibly at some cost in space and/or
  cache footprint. Normally it is best to let this decision be
  made automatically depending on platform.

MARK_BIT_PER_OBJ        Requests that a mark bit be allocated for each
  object instead of allocation granule. The opposite of
  MARK_BIT_PER_GRANULE.

HBLKSIZE=<ddd>  Explicitly sets the heap block size (where ddd is a power
  of 2 between 512 and 16384). Each heap block is devoted to a single size
  and kind of object. For the incremental collector it makes sense to
  match the most likely page size. Otherwise large values result in more
  fragmentation, but generally better performance for large heaps.

USE_MMAP        Use MMAP instead of sbrk to get new memory.
  Works for Linux, FreeBSD, Cygwin, Solaris and Irix.

USE_MUNMAP      Causes memory to be returned to the OS under the right
  circumstances. This currently disables VM-based incremental collection
  (except for Win32 with GetWriteWatch() available).
  Works under some Unix, Linux and Windows versions.
  Requires USE_MMAP except for Windows.

MUNMAP_THRESHOLD=       Set the desired memory blocks unmapping
  threshold (the number of sequential garbage collections for which
  a candidate block for unmapping should remain free).

GC_FORCE_UNMAP_ON_GCOLLECT      Set "unmap as much as possible on explicit
  GC" mode on by default. The mode could be changed at run-time. Has no
  effect unless unmapping is turned on. Has no effect on
  implicitly-initiated garbage collections.

MMAP_STACKS     (for Solaris threads) Use mmap from /dev/zero rather than
  GC_scratch_alloc() to get stack memory.

PRINT_BLACK_LIST        Whenever a black list entry is added, i.e. whenever
  the garbage collector detects a value that looks almost, but not quite,
  like a pointer, print both the address containing the value, and the
  value of the near-bogus-pointer. Can be used to identify regions of
  memory that are likely to contribute misidentified pointers.

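
The HBLKSIZE constraint described above (a power of 2 between 512 and
16384) can be checked mechanically. The following is only a sketch of
such a check, not code taken from the collector: EXAMPLE_HBLKSIZE and
example_hblksize_is_valid() are hypothetical names introduced for
illustration, and the value 4096 is an assumed sample value.

```c
/* Sketch only: EXAMPLE_HBLKSIZE is a hypothetical stand-in for a      */
/* value passed as -DHBLKSIZE=<ddd>; 4096 is an assumed sample value.  */
#ifndef EXAMPLE_HBLKSIZE
# define EXAMPLE_HBLKSIZE 4096
#endif

/* Compile-time form of the documented constraint. */
#if EXAMPLE_HBLKSIZE < 512 || EXAMPLE_HBLKSIZE > 16384 \
    || (EXAMPLE_HBLKSIZE & (EXAMPLE_HBLKSIZE - 1)) != 0
# error "block size must be a power of 2 between 512 and 16384"
#endif

/* Run-time form of the same check (hypothetical helper).              */
/* Returns 1 if n is a power of 2 in the range [512, 16384], else 0.   */
static int example_hblksize_is_valid(unsigned long n)
{
    int is_power_of_two = n != 0 && (n & (n - 1)) == 0;

    return is_power_of_two && n >= 512 && n <= 16384;
}
```

With these definitions, example_hblksize_is_valid(4096) yields 1, while
3000 (not a power of 2) and 32768 (out of range) both yield 0.
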
KEEP_BACK_PTRS  Add code to save back pointers in debugging headers
  for objects allocated with the debugging allocator. If all objects
  are allocated through GC_MALLOC with GC_DEBUG defined, this allows the
  client to determine how particular or randomly chosen objects are
  reachable for debugging/profiling purposes. The gc_backptr.h interface
  is implemented only if this is defined.

GC_ASSERTIONS   Enable some internal GC assertion checking. Currently
  this facility is only used in a few places. It is intended primarily
  for debugging of the garbage collector itself, but could also...

DBG_HDRS_ALL    Make sure that all objects have debug headers. Increases
  the reliability (from 99.9999% to 100% mod. bugs) of some of the
  debugging code (especially KEEP_BACK_PTRS). Makes SHORT_DBG_HDRS
  possible. Assumes that all client allocation is done through
  debugging allocators.

SHORT_DBG_HDRS  Assume that all objects have debug headers. Shorten
  the headers to minimize object size, at the expense of checking for
  writes past the end of an object. This is intended for environments
  in which most client code is written in a "safe" language, such as
  Scheme or Java. Assumes that all client allocation is done using
  the GC_debug_ functions, or through the macros that expand to these,
  or by redirecting malloc to GC_debug_malloc_replacement.
  (Also eliminates the field for the requested object size.)
  Occasionally could be useful for debugging of client code. Slows down
  the collector somewhat, but not drastically.

SAVE_CALL_COUNT=        Set the number of call frames saved with objects
  allocated through the debugging interface. Affects the amount of
  information generated in leak reports. Only matters on platforms
  on which we can quickly generate call stacks, currently Linux/(X86 &
  SPARC) and Solaris/SPARC and platforms that provide execinfo.h.
  Default is zero. On X86, client code should NOT be compiled with
  -fomit-frame-pointer.

SAVE_CALL_NARGS=        Set the number of function arguments to be saved
  with each call frame.
  Default is zero. Ignored if we don't know how to
  retrieve arguments on the platform.

CHECKSUMS       Reports on erroneously cleared dirty bits, and unexpectedly
  altered stubborn objects, at substantial performance cost. Use only for
  debugging of the incremental collector. Not compatible with USE_MUNMAP
  or threads.

GC_GCJ_SUPPORT  Includes support for gcj (and possibly other systems
  that include a pointer to a type descriptor in each allocated object).
  Building this way requires an ANSI C compiler.

USE_I686_PREFETCH       Causes the collector to issue Pentium III style
  prefetch instructions. No effect except on X86 Linux platforms.
  Assumes a very recent gcc-compatible compiler and assembler.
  (Gas prefetcht0 support was added around May 1999.)
  Empirically the code appears to still run correctly on Pentium II
  processors, though with no performance benefit. May not run on other
  X86 processors? In some cases this improves performance by 15% or so.

USE_3DNOW_PREFETCH      Causes the collector to issue AMD 3DNow style
  prefetch instructions. Same restrictions as USE_I686_PREFETCH.
  Minimally tested. Didn't appear to be an obvious win on a K6-2/500.

USE_PPC_PREFETCH        Causes the collector to issue PowerPC style
  prefetch instructions. No effect except on PowerPC OS X platforms.
  Performance impact untested.

GC_USE_LD_WRAP  In combination with the old flags listed in README.linux
  causes the collector to intercept some system and pthread calls in
  a more transparent fashion than the usual macro-based approach.
  Requires GNU ld, and currently probably works only with Linux.

GC_USE_DLOPEN_WRAP      Causes the collector to redefine malloc and
  intercepted pthread routines with their real names, and causes it to use
  dlopen and dlsym to refer to the original versions. This makes it
  possible to build an LD_PRELOADable malloc replacement library.

THREAD_LOCAL_ALLOC      Defines GC_malloc(), GC_malloc_atomic() and
  GC_gcj_malloc() to use a per-thread set of free-lists.
  These then allocate in a way that usually does not involve acquisition
  of a global lock. Recommended for multiprocessors. Requires explicit
  GC_INIT() call, unless REDIRECT_MALLOC is defined and GC_malloc is used
  first.

USE_COMPILER_TLS        Causes thread local allocation to use
  the compiler-supported "__thread" thread-local variables. This is the
  default in HP/UX. It may help performance on recent Linux installations.
  (It failed for me on RedHat 8, but appears to work on RedHat 9.)

PARALLEL_MARK   Allows the marker to run in multiple threads. Recommended
  for multiprocessors.

DONT_USE_SIGNALANDWAIT  (Win32 only) Use an alternate implementation of
  the marker-thread synchronization routines (if PARALLEL_MARK is
  defined). It is based on InterlockedExchange() (instead of
  AO_fetch_and_add()) and on multiple event objects, one per marker,
  instead of the implementation based on Win32 SignalObjectAndWait()
  with a single event object. This is the default for WinCE.

GC_WINMAIN_REDIRECT     (Win32 only) Redirect (rename) an application
  WinMain to GC_WinMain; implement the "real" WinMain which starts a new
  thread to call GC_WinMain after initializing the GC. Useful for WinCE.
  Incompatible with GC_DLL.

GC_REGISTER_MEM_PRIVATE (Win32 only) Force to register MEM_PRIVATE R/W
  sections as data roots. Might be needed for some WinCE 6.0+ custom
  builds. (May result in numerous "Data Abort" messages logged to WinCE
  debugging console.) Incompatible with GCC toolchains for WinCE.

NO_GETENV       Prevents the collector from looking at environment
  variables. These may otherwise alter its configuration, or turn off GC
  altogether. I don't know of a reason to disable this, except possibly if
  the resulting process runs as a privileged user. (This is on by default
  for WinCE.)

EMPTY_GETENV_RESULTS    Define to work around a reputed Wine bug in getenv
  (getenv() may return an empty string instead of NULL for a missing
  entry).

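
The kind of workaround that EMPTY_GETENV_RESULTS describes can be
sketched as a thin wrapper around getenv() that maps an empty result to
NULL. This is an illustration only, not the collector's actual code;
example_getenv() is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical wrapper (not a collector routine): treat an empty      */
/* string returned by getenv() as if the variable were missing.        */
static const char *example_getenv(const char *name)
{
    const char *value = getenv(name);

    if (value != NULL && *value == '\0')
        return NULL;    /* empty result is reported as "not set" */
    return value;
}
```

One consequence of this design is that a variable deliberately set to
the empty string can no longer be told apart from an unset one.
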
GC_READ_ENV_FILE        (Win32 only) Read environment variables from the GC
  "env" file (named as the program name plus ".gc.env" extension). Useful
  for WinCE targets (which have no getenv()). In the file, every variable
  is specified on a separate line in the format "<name>=<value>" (without
  spaces). A comment line may start with any character except for the
  Latin letters, the digits and the underscore ('_'). The file encoding
  is Latin-1.

USE_GLOBAL_ALLOC        (Win32 only) Use GlobalAlloc() instead of
  VirtualAlloc() to allocate the heap. May be needed to work around
  a Windows NT/2000 issue. Incompatible with USE_MUNMAP.
  See README.win32 for details.

MAKE_BACK_GRAPH Enable GC_PRINT_BACK_HEIGHT environment variable.
  See README.environment for details. Experimental. Limited platform
  support. Implies DBG_HDRS_ALL. All allocation should be done using
  the debug interface.

GC_PRINT_BACK_HEIGHT    Permanently turn on back-height printing mode
  (useful when NO_GETENV). See the similar environment variable
  description in README.environment. Requires MAKE_BACK_GRAPH defined.

STUBBORN_ALLOC  Allows allocation of "hard to change" objects, and thus
  makes incremental collection easier. Was enabled by default until 6.0.
  Rarely used, to my knowledge.

HANDLE_FORK     (Unix and Cygwin only) Attempt by default to make
  GC_malloc() work in a child process fork()'ed from a multi-threaded
  parent. Not fully POSIX-compliant and could be disabled at runtime
  (before GC_INIT).

TEST_WITH_SYSTEM_MALLOC Causes gctest to allocate (and leak) large
  chunks of memory with the standard system malloc. This will cause the
  root set and collected heap to grow significantly if malloc'ed memory
  is somehow getting traced by the collector. This has no impact on the
  generated library; it only affects the test.

POINTER_MASK=<0x...>    Causes candidate pointers to be AND'ed with the
  given mask before being considered.
  If either this or the following macro is defined, it will be assumed
  that all pointers stored in the heap need to be processed this way.
  Stack and register pointers will be considered both with and without
  processing. These macros are normally needed only to support systems
  that use high-order pointer tags. EXPERIMENTAL.

POINTER_SHIFT=  Causes the collector to left shift candidate pointers
  by the indicated amount before trying to interpret them. Applied
  after POINTER_MASK. EXPERIMENTAL. See also the preceding macro.

ENABLE_TRACE    Enables the GC_TRACE=addr environment setting to do its
  job. By default this is not supported in order to keep the marker as
  fast as possible.

DARWIN_DONT_PARSE_STACK Causes the Darwin port to discover thread
  stack bounds in the same way as other pthread ports, without trying to
  walk the frames on the stack. This is recommended only as a fall-back
  for applications that don't support proper stack unwinding.

GC_NO_THREADS_DISCOVERY (Darwin and Win32+DLL only) Exclude DllMain-based
  (on Windows) and task-threads-based (on Darwin) thread registration
  support.

GC_DISCOVER_TASK_THREADS        (Darwin and Win32+DLL only) Compile the
  collector with the task-threads-based (on Darwin) or DllMain-based
  (on Windows) approach to thread registration implicitly turned on.
  Only for compatibility and for the case when it is not possible to call
  GC_use_threads_discovery() early (before other GC calls).

USE_PROC_FOR_LIBRARIES  Causes the Linux collector to treat writable
  memory mappings (as reported by /proc) as roots, if it doesn't have
  other information about them. It no longer traverses dynamic loader
  data structures to find dynamic library static data. This may be
  required for applications that store pointers in mmapped segments
  without informing the collector. But it typically performs poorly,
  especially since it will scan inactive but cached NPTL thread stacks
  completely.

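
The order of operations described for POINTER_MASK and POINTER_SHIFT
above (AND with the mask first, then shift left) can be sketched as
follows. This is not the collector's implementation: the EXAMPLE_*
macros, their values, and example_decode_candidate() are hypothetical
and chosen only to make the arithmetic visible.

```c
#include <stdint.h>

/* Assumed sample values, standing in for -DPOINTER_MASK=<0x...> and   */
/* -DPOINTER_SHIFT=...; they are not defaults of the collector.        */
#define EXAMPLE_POINTER_MASK  UINT64_C(0x0000ffffffffffff)
#define EXAMPLE_POINTER_SHIFT 3

/* Hypothetical helper: apply the mask, then the left shift, to a word */
/* read from the heap before treating it as a candidate pointer.       */
static uint64_t example_decode_candidate(uint64_t word)
{
    uint64_t masked = word & EXAMPLE_POINTER_MASK;  /* strip tag bits  */

    return masked << EXAMPLE_POINTER_SHIFT;         /* then shift left */
}
```

For a word read from the heap only the processed value would be
considered, whereas, per the description above, a word found on the
stack or in a register would be considered both as-is and after this
processing.
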
IGNORE_DYNAMIC_LOADING  Don't define DYNAMIC_LOADING even if supported by
  the platform (that is, build the collector with disabled tracing of
  dynamic library data roots).

NO_PROC_STAT    Causes the collector to avoid relying on Linux
  "/proc/self/stat".

NO_GETCONTEXT   Causes the collector to not assume the existence of the
  getcontext() function on linux-like platforms. This currently happens
  implicitly on Darwin, Hurd, or ARM or MIPS hardware. It is explicitly
  needed for some old versions of FreeBSD.

STATIC=static   Causes various GC_ symbols that could logically be
  declared static to be declared static (this is the default if
  NO_DEBUGGING is specified). Reduces the number of visible symbols
  (letting the optimizer do its work better), which is probably cleaner,
  but may make some kinds of debugging and profiling harder.

GC_DLL  Build dynamic-link library (or dynamic shared object). For Unix
  this causes the exported symbols to have 'default' visibility (ignored
  unless GCC v4+) and the internal ones to have 'hidden' visibility.

DONT_USE_USER32_DLL     (Win32 only) Don't use "user32" DLL import library
  (containing MessageBox() entry); useful for a static GC library.

GC_PREFER_MPROTECT_VDB  Choose MPROTECT_VDB manually in case multiple
  virtual dirty bit strategies are implemented (at present useful on Win32
  and Solaris to force MPROTECT_VDB strategy instead of the default
  GWW_VDB or PROC_VDB ones).

GC_IGNORE_GCJ_INFO      Disable GCJ-style type information (useful for
  debugging on WinCE).

GC_PRINT_VERBOSE_STATS  Permanently turn on verbose logging (useful for
  debugging and profiling on WinCE).

GC_ONLY_LOG_TO_FILE     Don't redirect GC stdout and stderr to the log file
  specified by GC_LOG_FILE environment variable. Has an effect only when
  the variable is set (to anything other than "0").

GC_DONT_EXPAND  Don't expand the heap unless explicitly requested or
  forced to.

GC_USE_ENTIRE_HEAP      Causes the non-incremental collector to use the
  entire heap before collecting.
  This sometimes results in more large block fragmentation, since very
  large blocks will tend to get broken up during each GC cycle. It is
  likely to result in a larger working set, but lower collection
  frequencies, and hence fewer instructions executed in the collector.
  This macro controls only the default GC_use_entire_heap value.

GC_INITIAL_HEAP_SIZE=   Set the desired default initial heap size in bytes.

GC_FREE_SPACE_DIVISOR=  Set alternate default GC_free_space_divisor value.

GC_TIME_LIMIT=  Set alternate default GC_time_limit value
  (setting this to GC_TIME_UNLIMITED will essentially disable incremental
  collection while leaving generational collection enabled).

GC_FULL_FREQ=   Set alternate default number of partial collections
  between full collections (matters only if incremental collection is on).

NO_CANCEL_SAFE  (Posix platforms with threads only) Don't bother trying
  to make the collector safe for thread cancellation; cancellation is not
  used. (Note that if cancellation is used anyway, threads may end up
  getting cancelled in unexpected places.) Even without this option,
  PTHREAD_CANCEL_ASYNCHRONOUS is never safe with the collector. (We could
  argue about its safety without the collector.)

UNICODE (Win32 only) Use the Unicode variant ('W') of the Win32 API
  instead of ANSI/ASCII one ('A'). Useful for WinCE.

PLATFORM_ANDROID        Compile for Android NDK platform.

SN_TARGET_PS3   Compile for Sony PS/3.

USE_GET_STACKBASE_FOR_MAIN      (Linux only) Use pthread_attr_getstack()
  instead of __libc_stack_end (or instead of any hard-coded value) for
  getting the primordial thread stack base (useful if the client modifies
  the program's address space).
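
As an illustration of the two guidelines given at the start of this
file (indented directives, and testing macros defined in one
configuration header rather than compiler-predefined ones), here is
a small sketch. It is not taken from the collector: every EXAMPLE_*
name and example_platform_name() are hypothetical, and the first part
merely stands in for the role that gcconfig.h plays.

```c
/* Stand-in for a gcconfig.h-style header: the only place where        */
/* compiler-predefined macros are tested directly.                     */
#if defined(__linux__)
# define EXAMPLE_LINUX
#elif defined(_WIN32)
# define EXAMPLE_MSWIN32
#endif

/* Client code tests only the EXAMPLE_* macros.  Nested directives are */
/* indented after the "#", which itself stays in the first column.     */
#if defined(EXAMPLE_LINUX) || defined(EXAMPLE_MSWIN32)
# ifdef EXAMPLE_LINUX
#   define EXAMPLE_PLATFORM_NAME "linux"
# else
#   define EXAMPLE_PLATFORM_NAME "mswin32"
# endif
#else
# define EXAMPLE_PLATFORM_NAME "other"
#endif

/* Hypothetical accessor returning the name selected above. */
static const char *example_platform_name(void)
{
    return EXAMPLE_PLATFORM_NAME;
}
```

Adapting to a compiler with different predefined macros then means
changing only the first block; the code that tests EXAMPLE_LINUX and
EXAMPLE_MSWIN32 stays untouched.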