I’ve been looking over the atomic operations library in SDL and I was
trying to grasp the exact semantics of SDL_AtomicSet() and SDL_AtomicGet().
On almost every system other than x86, a word-sized (32-bit) load or
store to an unaligned address triggers a SIGBUS, and generally means a
bug in the program or poor implementation of some byte fiddling. If the
load/store is aligned, then it is atomic, that is, if 2+ CPUs try to
read/write to the same address, there will never be a half old / half
new mix of bytes read from the same address. Sure, if two CPUs write
0xAAAAAAAA & 0xBBBBBBBB to the same location and then immediately read,
you don’t know if you get all A’s or all B’s, but you never will get
0xAABBBBAA or some other permutation.
Given that, it looks like SDL_AtomicSet() and SDL_AtomicGet() are misnomers.
SDL_AtomicSet() actually performs an atomic exchange, which guarantees
that the old value will be read, replaced with the new one, and the old
one returned, as one single interruptable action. This is much
different than atomically storing a value, which is actually as simple
as “value = 5;”
(When that change is visible to the other CPUs, eh, depends CPU
architecture or, more deterministically, the programmer to use a memory
barrier). An atomic exchange though differs from an atomic set in that
you assume the programmer cares about the return value (which can be
slower on other architectures). If they didn’t, there wouldn’t be a need
to return the original value, you could just clobber it, which is fast
enough. Maybe AtomicSet() should be a store + memory barrier rather than
an atomic exchange, and just have a function like SDL_AtomicExchange()
to actually perform an exchange.
SDL_AtomicGet()… not really sure what its purpose is. Reading from an
aligned memory location is an atomic operation, but that doesn’t look
like what it is trying to do. Most notably, it uses a function called
SDL_CompilerBarrier() which (on MSVC) tells the compiler not to move
reads/writes across that barrier, but out of order CPUs can reorder the
reads in any way they see fit, so any kind of subtly based on that is
likely going to fail on anything other than x86 (x86 has reads ordered
wrt to other reads, writes wrt writes)
int flag = AtomicGet(&sharedFlag);
Produces this psuedo assembly
load sharedFlag, r3 //Compiler must generate load sharedFlag first
load a, r1
load b, r2
add r1, r2
compare r3, 0
jump if not equal "done"
set r2, 0
If we depend on ‘sharedFlag’ being non-zero to mean “a and b can be read
from safely”, then it is possible that the hardware will reorder this so
that ‘a’ and ‘b’ are read before ‘sharedFlag’.
If another thread performed:
a = 10; b= 20;
Then we might have already read stale values for ‘a’ and ‘b’ by the time
the changes to ‘sharedFlag’ are visible to other processors, in effect
computing the wrong value for (a+b). What is needed is a load fence that
guarantees that ‘sharedFlag’ will be read absolutely first before any
more loads will be processed.
My whole point is that it is misleading to say it atomically fetches
something from memory when in fact, all aligned reads are atomic, and it
doesn’t really guarantee anything about how the hardware actually reads it.
Any comments and/or flames?