[arm64] Add sequential memory constraint to swap, cas and add
Previously we wrongly tried to achieve this with a load-acquire followed by a store-release. Assume there are memory accesses to address A before the atomic operation and to address C after it, while the atomic operation itself works on address B. The instruction order then looks like this:
[A], [B]load, load-acq, store-rel, [B]store, [C]
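In this case the following reordering of memory accesses is allowed, which clearly breaks the desired constraints. An acquire only keeps later accesses from moving before it, and a release only keeps earlier accesses from moving after it. So [A] can sink below the load-acquire and [C] can rise above the store-release, and then nothing orders them relative to each other: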
[B]load, load-acq, [C], [A], store-rel, [B]store
To provide sequential consistency we could instead emit full barriers around the operation:
[A], membar, [B]load, [B]store, membar, [C]
We can save an instruction by emitting the barriers like this instead (this is also what gcc does):
[A], [B]load, store-rel, [B]store, membar, [C]
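To make the ordering argument concrete, here is a toy checker. It is a sketch, not the ARMv8 memory model, and it assumes only four rules: a full barrier orders everything across it, nothing after an acquire load may move above it, nothing before a release store may move below it, and two accesses to the same address keep their program order. It reads the original sequence as an acquire load of B followed by a release store of B (ldaxr/stlxr-like). A later access may be performed before an earlier one exactly when no chain of these rules forces the earlier one first. The names `Op`, `must_order` and `may_precede` are hypothetical.

```python
from typing import NamedTuple, Optional

# Toy operation kinds (assumed model); the ARM mnemonics are only a hint.
PLAIN, LOAD, LOAD_ACQ, STORE, STORE_REL, MEMBAR = (
    "plain", "ldxr", "ldaxr", "stxr", "stlxr", "dmb")


class Op(NamedTuple):
    name: str
    kind: str
    addr: Optional[str] = None  # None for barriers


def must_order(a: Op, b: Op) -> bool:
    """True if `a` (earlier in program order) must be performed before `b`."""
    if a.kind == MEMBAR or b.kind == MEMBAR:
        return True  # a full barrier orders everything across it
    if a.kind == LOAD_ACQ:
        return True  # later accesses cannot move above an acquire
    if b.kind == STORE_REL:
        return True  # earlier accesses cannot move below a release
    return a.addr is not None and a.addr == b.addr  # same-address order


def may_precede(seq, early: int, late: int) -> bool:
    """Can seq[late] be performed before seq[early], with early < late?

    Not if a chain of must_order edges leads from early to late. Edges only
    point forward in program order, so a single forward pass finds the chain.
    """
    reach = {early}
    for j in range(early + 1, late + 1):
        if any(i in reach and must_order(seq[i], seq[j]) for i in range(early, j)):
            reach.add(j)
    return late not in reach


A, C = Op("[A]", PLAIN, "A"), Op("[C]", PLAIN, "C")
ORIGINAL = [A, Op("[B]load", LOAD_ACQ, "B"), Op("[B]store", STORE_REL, "B"), C]
FULL_BARRIERS = [A, Op("membar", MEMBAR), Op("[B]load", LOAD, "B"),
                 Op("[B]store", STORE, "B"), Op("membar", MEMBAR), C]
GCC_STYLE = [A, Op("[B]load", LOAD, "B"), Op("[B]store", STORE_REL, "B"),
             Op("membar", MEMBAR), C]

for label, seq in [("original", ORIGINAL), ("full barriers", FULL_BARRIERS),
                   ("gcc style", GCC_STYLE)]:
    print(f"{label}: [C] may precede [A]: {may_precede(seq, 0, len(seq) - 1)}")
print("gcc style: [B]load may precede [A]:", may_precede(GCC_STYLE, 0, 1))
```

Under these rules only the original sequence lets [C] overtake [A]. In the gcc-style sequence the plain load of B can still be performed before [A], which is the one remaining case to reason about.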
In this case we only need to worry about the relation between the access to A and the load of B: the store-release keeps [A] before the store to B, and the trailing membar keeps [C] after it. We need to consider what happens if B is loaded before the access to A is performed. This only matters if that load returns an older value of B than the one visible to other threads. In that case the store to B will fail, because it is an exclusive store and B has been written since the exclusive load. On the retry, the value read from B will be the current one.
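The retry argument can be illustrated with a simplified load-exclusive/store-exclusive model. This is an assumption for illustration, not how the hardware exclusive monitor is implemented: every store bumps a version, and an exclusive store succeeds only if the version is unchanged since the matching exclusive load. Real exclusive stores may also fail spuriously, which the same retry loop absorbs. `Cell`, `fetch_and_add` and the `before_store` hook are hypothetical names; the hook injects a write from another thread between our load and our store.

```python
class Cell:
    """A memory location with a toy exclusive monitor (hypothetical model)."""

    def __init__(self, value):
        self.value = value
        self.version = 0  # bumped on every successful store

    def load_exclusive(self):
        # Return the value plus a reservation tag (the version observed).
        return self.value, self.version

    def store_exclusive(self, reservation, new):
        # Succeed only if nobody stored to the cell since the reservation.
        if reservation != self.version:
            return False
        self.value = new
        self.version += 1
        return True

    def store(self, new):
        # A plain store from another thread; invalidates our reservation.
        self.value = new
        self.version += 1


def fetch_and_add(cell, delta, before_store=None):
    """LL/SC retry loop; returns (value the add was applied to, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        old, reservation = cell.load_exclusive()
        if before_store is not None:
            before_store(attempts)  # test hook: simulate a racing writer
        if cell.store_exclusive(reservation, old + delta):
            return old, attempts


b = Cell(10)


def racing_write(attempt):
    if attempt == 1:
        b.store(20)  # our first (early) load of B is now stale


old, attempts = fetch_and_add(b, 1, racing_write)
print(old, attempts, b.value)
```

The first exclusive store fails because B was written after the stale load; the second attempt reads the current value 20 and stores 21.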