Hi Bob,
I think there is a race condition in the mutex-based version of
privateWaitLock that is used by the emulation code in the dummy module (and
copies thereof). This is the code currently in SVN
(/trunk/SDL/src/atomic/dummy/SDL_atomic.c, Rev. 4611):
static inline void
privateWaitLock()
{
? if(NULL == lock)
? {
? ? ?lock = SDL_CreateMutex();
? ? ?if (NULL == lock)
? ? ?{
? ? ? ? SDL_SetError(“SDL_atomic.c: can’t create a mutex”);
? ? ? ? return;
? ? ?}
? }
? if (-1 == SDL_LockMutex(lock))
? {
? ? ?SDL_SetError(“SDL_atomic.c: can’t lock mutex”);
? }
}
The problem is that the mutex is initialized on demand, i. e. on the first
call of the function. Since the purpose of atomic operations is to be used
by several threads concurrently, it is perfectly possible to have several
simultaneous “first calls” of privateWaitLock (through any of the emulated
operations).
You are absolutely correct. Thank you for pointing this out!
A scenario showing the race condition in detail, with the worst possible
outcome:
No thread has called an emulated atomic operation yet, so the lock variable
still is NULL.
Threads A and B have each just reached their first call to, for example,
SDL_AtomicIncrementThenFetch8.
A enters privateWaitLock.
A tests if(NULL == lock), which is true, so it enters the respective block.
– The following actions of B either happen in parallel to the actions of A
(on a multi-core system) or after A has been preempted (on any type of
system) right here. –
B enters privateWaitLock.
B tests if(NULL == lock), which still is true, so it enters the same block.
B executes SDL_CreateMutex.
B assigns the result to the lock pointer variable.
– If B is preempted here, we can get away with just a resource leak. –
B executes SDL_LockMutex, which succeeds immediately (B is not blocked).
B returns from privateWaitLock and now reaches the statement (*ptr)+= 1; in
SDL_AtomicIncrementThenFetch8.
– Meanwhile, on a different core, or after preemption… –
A executes SDL_CreateMutex. Another mutex object is created.
A assigns the result to the lock pointer variable.
– The mutex object that B created is now garbage. –
A executes SDL_LockMutex, which succeeds immediately (A is not blocked),
because the mutex locked is a different one than that locked by B.
A returns from privateWaitLock and now reaches the statement (*ptr)+= 1;.
A and B could now mess up (*ptr). Both could get the same result from the
addition, because as we all know, += is not atomic, which is why we need the
mutex.
A executes privateUnlock, which unlocks the mutex pointed to by lock, that A
had created and locked before.
A returns from SDL_AtomicIncrementThenFetch8.
– Time for B again… –
B also executes privateUnlock, which attempts to unlock the mutex pointed to
by lock. This is, however, the mutex created by A, which B had never locked.
The operation will give an error - or maybe worse.
Possible solutions:
- Bad workaround: Require the user to call each atomic operation he intends
to use at least once before the first call to SDL_CreateThread or whatever
else he uses to create additional threads.
- Initialize the mutex on the first call of SDL_CreateThread. With atomic
operations (ha ha), you might even be able to manage a reference count for
it, to destroy it when all additional threads have ended. But that could be
overkill.
- Include the mutex initialization in SDL_Init. This is what I’d prefer - if
you stick with the mutex -, because it allows using the atomic ops even with
non-SDL threads. You never know.
Now for my long-awaited opinion. 
- Use the mutex emulation only as a very last resort.
I agree completely. I currently working on getting rid of emulation
completely. Not sure that I can do it, but I am trying.
I’m quite sure that
spinlocks (like in the current linux version of SDL_atomic.c) will perform
much better for the use cases of SDL_atomic.
Here’s the problem, spin locks are not currently part of SDL. IIRC
there is a spin lock in pthreads, so it might be possible to include
them as a regular part of SDL. But, I’m not convinced I should add
them as part of SDL_atomic.
Of course, the mutex may also
be implemented to spin some time before actually blocking.
The fact that the mutex operations might fail is also a problem. The
SDL_atomic functions don’t return error codes, and they aren’t really needed
either. SDL_SetError alone won’t be “heard”. Abort?
Tough call, I’d like to hear from more people on that one.
- Make it possible to test at compile time, whether a particular operation
is supported natively by the platform. This will allow to pick an
atomic-optimized algorithm on platforms where it will really boost
performance. Other platforms may be better off with an algorithm that is
taylored directly to using a mutex or whatever else.
Yes, I agree. Not simple to accomplish. It feeds into another problem.On Sun, Aug 23, 2009 at 7:30 PM, Martin<name.changed.by.editors at online.de> wrote:
On some platforms the atomic ops are provided as compiler intrinsics and you would like to have those translate into #defines so that you get around the need for a function call to invoke an intrinsic. The trouble is that you might find yourself having to do some thing horrible like doing the equivalent of including windows.h a the top level of people code to do what you want. I would love to see more input on that problem. Pointers: You should definitely have distinct functions for pointers and ints. You might still cast internally, if possible, but at least the user won’t have to. I just went through four major use cases, reference counting, spin locks, fixed length queues, and the readers/writers problem and did not find a single instance where pointer arithmetic was worth having. I was sure that I would find a need for atomic ops on pointers for implementing queues, but no, I did not. The trouble is that you have to wrap the indexes around at the end of the queue. So, unless someone can come up with a use case that benefits from them, I’m not going to put atomic ops on pointers in the library. Signedness: There are certainly many more use cases for atomic ops than I know, but so far I guess that signedness will rarely matter. You could have your own set of typedefs for SDL_atomic (which could include the volatile modifier for convenience, BTW): Type ? ? ? Minimal range atomic7 ? ?0…127 (0x7f) atomic15 ? 0…32767 (0x7fff) atomic31 ? 0…2147483647 (0x7fffffff) atomic63 ? 0…9223372036854775807 (0x7fffffffffffffff) Whether those types have more (i. e. negative) values is unspecified, and the user must not rely on it. For the cases where signedness does matter, you could again make it possible to check support at compile time. Other idea: Make the interface signed, but internally cast or transform the value where the implementation is unsigned. A transformation would require that all simple reads and writes (including initialization) are done through SDL_atomic functions as well. After looking in detail at the atomic operations provided by GCC, Windows, Mac OS X, and QNX, I am dropping support for anything but 32 and 64 bit unsigned values. The majority of platforms only support 32 and 64 bit unsigned values. If you support 8 and 16 bit operations you wind up with half the library being emulated on the most common platforms. Hey, thanks for the input! Please, if you, or anyone else, has anything to add. Please do. Martin Bob Pendleton wrote:
I have a couple of major question that I would like folks to get
opinionated
about. The main question is about atomic operations on pointers. I
originally wrote the atomic ops so that they only worked on unsigned ints.
I
figured people would deal with pointers by casting them to unsigned of the
right size. Mac OS X, and Windows provide atomic ops on signed ints and
provided operations for pointers. On windows there is very little support
for atomic ops on pointers, but there is a little.
Having looked at what the other guys do I rather like providing separate
ops
for ints and pointers. BUT, that would mean casting pointers to signed
ints
on some platforms. Casting pointers to signed ints makes me very very
nervous. I spent too much of my youth working on machines where int and
(void *) were not the same size.
The other thing is also a casting question. How do you feel about casting
signed to unsigned, doing doing addition or subtraction, and then casting
it
back? Or, the reverse. We have the problem that some of the platforms
provide ops for only signed ints and some for only unsigned ints.
Bob Pendleton
SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org
–
±----------------------------------------------------------