SDL Semaphore implementation broken on Windows?

Hi,

Over the past couple of days, I’ve been battling with SDL, SDL_Mixer and
SMPEG to try to
find an audio hang bug. I believe I’ve found the problem, which I think
is a race
condition inside SDL’s semaphore implementation (at least the Windows
implementation).

The semaphore code uses Windows’ built in semaphore functions, but it
also maintains
a separate count value. This count value is updated with bare increment
and decrement
operations in SemPost and SemWaitTimeout - no locking primitives to
protect them.

In tracking down the apparent audio bug, I found that at some point a
semaphore’s count
value was being decremented to -1, which is clearly not a valid value
for it to take.

I’m still not certain exactly what sequence of operations is occuring
for this to happen,
but I believe that overall it’s a race condition between a thread
calling SemPost (which
increments the count) and the thread on the other end calling SemWait
(which decrements it).

I will try to make a test case to verify this, but I’m not sure if I’ll
be able to
(threading errors being difficult to reproduce even in the best
circumstances).

However, assuming this is the cause of my problems, there is a very
simple fix:
Windows provides InterlockedIncrement() and InterlockedDecrement()
functions to perform
increments and decrements which are guaranteed to be atomic.
So the fix is in thread/win32/SDL_syssem.c:
replace occurrences of --sem->count with InterlockedDecrement(&sem->count);
and replace occurrences of ++sem->count with
InterlockedIncrement(&sem->count);

If anyone can take a look and confirm that there is a race condition in
the code, then
I’d appreciate it (since the audio bug is not 100% reliably
reproducable, I can’t be
certain that I’ve nailed it with this fix, although I am fairly sure).

This is using SDL v1.2.12, built with VC++ 2008 Express, running on a
Core 2 duo processor.
I admit I haven’t checked subversion to see if this has already been
fixed, so I
apologise if this is old news for everyone.

Thanks,

John B

However, assuming this is the cause of my problems, there is a very
simple fix:
Windows provides InterlockedIncrement() and InterlockedDecrement()
functions to perform
increments and decrements which are guaranteed to be atomic.
So the fix is in thread/win32/SDL_syssem.c:
replace occurrences of --sem->count with InterlockedDecrement(&sem->count);
and replace occurrences of ++sem->count with
InterlockedIncrement(&sem->count);

I’m pretty sure the 32-bit ++ and – operators are atomic on x86 architectures.
Are you sure this fixes the problem?

Thanks!
-Sam Lantinga, Lead Software Engineer, Blizzard Entertainment

Sam Lantinga wrote:

However, assuming this is the cause of my problems, there is a very
simple fix:
Windows provides InterlockedIncrement() and InterlockedDecrement()
functions to perform
increments and decrements which are guaranteed to be atomic.
So the fix is in thread/win32/SDL_syssem.c:
replace occurrences of --sem->count with InterlockedDecrement(&sem->count);
and replace occurrences of ++sem->count with
InterlockedIncrement(&sem->count);

I’m pretty sure the 32-bit ++ and – operators are atomic on x86 architectures.
Are you sure this fixes the problem?
I wouldn’t bet my life on it, but I’m 99% sure. Certainly the problem
hasn’t recurred at all since I made that change, and although it isn’t
100% repeatable, it was happening regularly enough (say, 80 to 90% of
the times I attempted to reproduce) that I believe it would have
surfaced again if this hadn’t fixed it.
Of course, it’s always possible that the change just managed to push the
timings off and make it far, far rarer rather than actually eliminating
the problem.

As for ++ and – being atomic, at the assembly code level the ‘INC’ and
’DEC’ instructions are only atomic operations when the ‘LOCK’ prefix is
used (which is presumably how InterlockedIncrement/InterlockedDecrement
are implemented under the hood). I don’t know whether C gives any
guarantees about their atomicity though. I don’t think any compiler
would emit a lock prefix for them under all circumstances, but possibly
it does if the variable being modified is marked as volatile?

Regards,

John B

Atomic operations aren’t enough on multi core systems with multiple
caches, I thought. Something to do with memory barriers, which I doubt
a compiler would mess with for a simple increment/decrement operation.
I don’t pretend to be a multi processor guru though, but from what
little I have read I thought such unprotected updates were a bad idea.
Of course I could be wrong, and I apologise if so…On Dec 28, 2007 3:22 AM, Sam Lantinga wrote:

However, assuming this is the cause of my problems, there is a very
simple fix:
Windows provides InterlockedIncrement() and InterlockedDecrement()
functions to perform
increments and decrements which are guaranteed to be atomic.
So the fix is in thread/win32/SDL_syssem.c:
replace occurrences of --sem->count with InterlockedDecrement(&sem->count);
and replace occurrences of ++sem->count with
InterlockedIncrement(&sem->count);

I’m pretty sure the 32-bit ++ and – operators are atomic on x86 architectures.
Are you sure this fixes the problem?

Over the past couple of days, I’ve been battling with SDL, SDL_Mixer and
SMPEG to try to
find an audio hang bug. I believe I’ve found the problem, which I think
is a race
condition inside SDL’s semaphore implementation (at least the Windows
implementation).

Okay, this fix is checked into subversion with revision 3470. Thanks!

See ya!
-Sam Lantinga, Lead Software Engineer, Blizzard Entertainment

I wouldn’t bet my life on it, but I’m 99% sure. Certainly the problem
hasn’t recurred at all since I made that change, and although it isn’t
100% repeatable, it was happening regularly enough (say, 80 to 90% of
the times I attempted to reproduce) that I believe it would have
surfaced again if this hadn’t fixed it.

Of course, it’s always possible that the change just managed to push the
timings off and make it far, far rarer rather than actually eliminating
the problem.

Yep, that’s certainly possible. Can you grab the SDL snapshot from:
http://www.libsdl.org/tmp/SDL-1.2.zip
and see if it’s fixed in there?

Thanks!
-Sam Lantinga, Lead Software Engineer, Blizzard Entertainment

Hi, here are my 2 cents,

x86 ASM INC and DEC are only atomic on mono-x86, if they are used on
aligned memory.

An alternative (very x86 specific) would be "__asm LOCK INC DWORD PTR
[g_dwSemaphore];"
provided that the memory is aligned on 32 bits boundary.

Anyway, one has to use the InterlockedXXXXX functions under Windows if
he wants to provide guaranteed atomic operation on any system running
Windows, ( XBox 360 included ) without assembly langage and cpu-specific
code

regards,
William.

Sam Lantinga a ?crit :>> However, assuming this is the cause of my problems, there is a very

simple fix:
Windows provides InterlockedIncrement() and InterlockedDecrement()
functions to perform
increments and decrements which are guaranteed to be atomic.
So the fix is in thread/win32/SDL_syssem.c:
replace occurrences of --sem->count with InterlockedDecrement(&sem->count);
and replace occurrences of ++sem->count with
InterlockedIncrement(&sem->count);

I’m pretty sure the 32-bit ++ and – operators are atomic on x86 architectures.
Are you sure this fixes the problem?

Thanks!
-Sam Lantinga, Lead Software Engineer, Blizzard Entertainment


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Sam Lantinga wrote:

Can you grab the SDL snapshot from:
http://www.libsdl.org/tmp/SDL-1.2.zip
and see if it’s fixed in there?
Yep, that snapshot seems to work fine.

Thanks for responding so fast,

John B

Yep, that snapshot seems to work fine.

Great! :slight_smile:

Thanks for responding so fast,

No problem. :slight_smile:

See ya,
-Sam Lantinga, Lead Software Engineer, Blizzard Entertainment