0) If possible, do this on a multiprocessor, especially if you are planning
on modifying or enhancing the package. It will work on a uniprocessor,
but the tests are much more likely to pass even in the presence of serious
problems.
1) Type ./configure --prefix=<install dir>; make; make check
in the directory containing unpacked source. The usual GNU build machinery
is used, except that normally only static, position-independent libraries
are built. On Windows, read README_win32.txt instead.
2) Applications should include atomic_ops.h. Nearly all operations
are implemented by header files included from it. It is sometimes
necessary, and always recommended, to also link against libatomic_ops.a.
To use the almost-non-blocking stack or malloc implementations,
see the corresponding README files, and also link against libatomic_gpl.a
before linking against libatomic_ops.a.
OVERVIEW:

Atomic_ops.h defines a large collection of operations, each one of which is
a combination of an (optional) atomic memory operation and a memory barrier.
It also defines associated feature-test macros that determine whether a
particular operation is available on the current target hardware (either
directly or by synthesis). This is an attempt to replace various existing
files with similar goals, since they usually do not handle differences in
memory barrier styles with sufficient generality.
If atomic_ops.h is included after defining AO_REQUIRE_CAS, then the package
will make an attempt to emulate compare-and-swap in a way that (at least
on Linux) should still be async-signal-safe. As a result, most other
atomic operations will then be defined using the compare-and-swap
emulation. This emulation is slow, since it needs to disable signals,
and it needs to block in case of contention. If you care about performance
on a platform that cannot directly provide compare-and-swap, there are
probably better alternatives. But this allows easy ports to some such
platforms (e.g. PA-RISC). The option is ignored if compare-and-swap
can be implemented directly.
If atomic_ops.h is included after defining AO_USE_PTHREAD_DEFS, then all
atomic operations will be emulated with pthread locking. This is NOT
async-signal-safe, and it is slow. It is intended primarily for debugging
of the atomic_ops package itself.
Note that the implementation reflects our understanding of real processor
behavior. This occasionally diverges from the documented behavior. (E.g.
the documented X86 behavior seems to be weak enough that it is impractical
to use. Current real implementations appear to be much better behaved.)
We of course are in no position to guarantee that future processors
(even HPs) will continue to behave this way, though we hope they will.
This is a work in progress. Corrections/additions for other platforms are
greatly appreciated. It passes rudimentary tests on X86, Itanium, and
Alpha.
OPERATIONS:

Most operations operate on values of type AO_t, which are unsigned integers
whose size matches that of pointers on the given architecture. Exceptions
are:
- AO_test_and_set operates on AO_TS_t, which is whatever size the hardware
supports with good performance. In some cases this is the length of a cache
line. In some cases it is a byte. In many cases it is equivalent to AO_t.
- A few operations are implemented on smaller or larger size integers.
Such operations are indicated by the appropriate prefix:

AO_char_...  Operates on unsigned char values.
AO_short_... Operates on unsigned short values.
AO_int_...   Operates on unsigned int values.
(Currently only a very limited selection of these is implemented.)
The defined operations are all of the form AO_[<size>_]<op><barrier>(<args>).

The <op> component specifies an atomic memory operation. It may be
one of the following, where the corresponding argument and result types
are also specified here:

void nop()
        No atomic operation. The barrier may still be useful.
AO_t load(const volatile AO_t * addr)
        Atomically load and return the value stored at *addr.
void store(volatile AO_t * addr, AO_t new_val)
        Atomically store new_val to *addr.
AO_t fetch_and_add(volatile AO_t *addr, AO_t incr)
        Atomically add incr to *addr, and return the original value of *addr.
AO_t fetch_and_add1(volatile AO_t *addr)
        Equivalent to AO_fetch_and_add(addr, 1).
AO_t fetch_and_sub1(volatile AO_t *addr)
        Equivalent to AO_fetch_and_add(addr, (AO_t)(-1)).
void or(volatile AO_t *addr, AO_t incr)
        Atomically or incr into *addr.
int compare_and_swap(volatile AO_t * addr, AO_t old_val, AO_t new_val)
        Atomically compare *addr to old_val, and replace *addr by new_val
        if the comparison succeeds. Returns nonzero if the comparison
        succeeded and *addr was updated.
AO_TS_VAL_t test_and_set(volatile AO_TS_t * addr)
        Atomically read the binary value at *addr, and set it. AO_TS_VAL_t
        is an enumeration type which includes the two values AO_TS_SET and
        AO_TS_CLEAR. An AO_TS_t location is capable of holding an
        AO_TS_VAL_t, but may be much larger, as dictated by hardware
        constraints. Test_and_set logically sets the value to AO_TS_SET.
        It may be reset to AO_TS_CLEAR with the AO_CLEAR(AO_TS_t *) macro.
        AO_TS_t locations should be initialized to AO_TS_INITIALIZER.
        The values of AO_TS_SET and AO_TS_CLEAR are hardware dependent.
        (On PA-RISC, AO_TS_SET is zero!)
Test_and_set is a more limited version of compare_and_swap. Its only
advantage is that it is more easily implementable on some hardware. It
should thus be used if only binary test-and-set functionality is needed.
If available, we also provide compare_and_swap operations that operate
on wider values. Since standard data types for double-width values
may not be available, these explicitly take pairs of arguments for the
new and/or old value. Unfortunately, there are two common variants,
neither of which can easily and efficiently emulate the other.
The first performs a comparison against the entire value being replaced,
while the second writes a double-width replacement, but performs
only a single-width comparison:
int compare_double_and_swap_double(volatile AO_double_t * addr,
                                   AO_t old_val1, AO_t old_val2,
                                   AO_t new_val1, AO_t new_val2);

int compare_and_swap_double(volatile AO_double_t * addr,
                            AO_t old_val1,
                            AO_t new_val1, AO_t new_val2);
where AO_double_t is a structure containing AO_val1 and AO_val2 fields,
both of type AO_t. For compare_and_swap_double, we compare against
the val1 field. AO_double_t exists only if AO_HAVE_double_t is defined.
ORDERING CONSTRAINTS:
Each operation name also includes a suffix that specifies the associated
ordering semantics. The ordering constraint limits reordering of this
operation with respect to other atomic operations and ordinary memory
references. The current implementation assumes that all memory references
are to ordinary cacheable memory; the ordering guarantee is with respect
to other threads or processes, not I/O devices. (Whether or not this
distinction is important is platform-dependent.)
Ordering suffixes are one of the following:

<none>: No memory barrier. A plain AO_nop() really does nothing.
_release: Earlier operations must become visible to other threads
        before the atomic operation.
_acquire: Later operations must become visible after this operation.
_read: Subsequent reads must become visible after reads included in
        the atomic operation or preceding it. Rarely useful for clients?
_write: Earlier writes become visible before writes during or after
        the atomic operation. Rarely useful for clients?
_full: Ordered with respect to both earlier and later memory operations.
        AO_store_full or AO_nop_full are the normal ways to force a store
        to be ordered with respect to a later load.
_release_write: Ordered with respect to earlier writes. This is
        normally implemented as either a _write or _release barrier.
_dd_acquire_read: Ordered with respect to later reads that are data
        dependent on this one. This is needed on
        a pointer read, which is later dereferenced to read a
        second value, with the expectation that the second
        read is ordered after the first one. On most architectures,
        this is equivalent to no barrier. (This is very
        hard to define precisely. It should probably be avoided.
        A major problem is that optimizers tend to try to
        eliminate dependencies from the generated code, since
        dependencies force the hardware to execute the code in order.)
_release_read: Ordered with respect to earlier reads. Useful for
        implementing read locks. Can be implemented as _release,
        but not as _read, since _read groups the current operation
        with the earlier ones.
We assume that if a store is data-dependent on a previous load, then
the two are always implicitly ordered.
It is possible to test whether AO_<op><barrier> is available on the
current platform by checking whether AO_HAVE_<op><barrier> is defined
as a macro.
Note that we generally don't implement operations that are either
meaningless (e.g. AO_nop_acquire, AO_nop_release) or which appear to
have no clear use (e.g. AO_load_release, AO_store_acquire, AO_load_write,
AO_store_read). On some platforms (e.g. PA-RISC) many operations
will remain undefined unless AO_REQUIRE_CAS is defined before including
atomic_ops.h.
When typed in the package build directory, the following command
will print operations that are unimplemented on the platform:

make test_atomic; ./test_atomic
The following command generates a file "list_atomic.i" containing the
macro expansions of all implemented operations on the platform:

make list_atomic.i
It currently appears that something roughly analogous to this is very likely
to become part of the C++0x standard. That effort has pointed out a number
of issues that we expect to address there. Since some of the solutions
really require compiler support, they may not be completely addressed here.
Known issues include:
We should be more precise in defining the semantics of the ordering
constraints, and about if and how we can guarantee sequential consistency.
Dd_acquire_read is very hard or impossible to define in a way that cannot
be invalidated by reasonably standard compiler transformations.
There is probably no good reason to provide operations on standard
integer types, since those may have the wrong alignment constraints.
Example:

If you want to initialize an object, and then "publish" a pointer to it
in a global location p, such that other threads reading the new value of
p are guaranteed to see an initialized object, it suffices to use
AO_release_write(p, ...) to write the pointer to the object, and to
retrieve it in other threads with AO_acquire_read(p).
PLATFORM NOTES:

All X86: We quietly assume 486 or better.
Microsoft compilers:
Define AO_ASSUME_WINDOWS98 to get access to hardware compare-and-swap
functionality. This relies on the InterlockedCompareExchange() function,
which was apparently not supported in Windows95. (There may be a better
way to get access to this.)
Gcc on x86:
Define AO_USE_PENTIUM4_INSTRS to use the Pentium 4 mfence instruction.
Currently this appears to be of marginal benefit.