Race condition in SDL_ThreadedTimerCheck?

I have an application that uses SDL_SetTimer that dies on
Hyperthreaded P4s running Windows XP. Looking at
SDL_ThreadedTimerCheck, I see this code:

		SDL_mutexV(SDL_timer_mutex);
		ms = t->cb(t->interval, t->param);
		SDL_mutexP(SDL_timer_mutex);

But without SDL_timer_mutex being held, can’t the memory pointed to by
t be freed? Shouldn’t a copy of *t be made on the stack before the
lock is released and then that copy’s cb, interval and param field be
used?

–Cliff Matthews <@Clifford_T_Matthews>

Last night I mentioned an apparent race condition in SDL. This
morning I’ve built SDL.dll using the mingw cross-compiler running on
Linux. I tried the obvious following fix, but my application died the
same way as before.

Old:

		SDL_mutexV(SDL_timer_mutex);
		ms = t->cb(t->interval, t->param);
		SDL_mutexP(SDL_timer_mutex);

New:
struct _SDL_TimerID timer;

		...

		timer = *t;
		SDL_mutexV(SDL_timer_mutex);
		ms = timer.cb(timer.interval, timer.param);
		SDL_mutexP(SDL_timer_mutex);

After that failed, I tried removing the calls to SDL_mutexV and
SDL_MutexP that immediately bracket the callback invocation. That
got rid of the problem my application was having.

My application only touches memory from within the callback, then does
an SDL_CondSignal. It doesn’t make any SDL timer calls from within
the callback.

My guess is the SDL_mutexV and SDL_mutexP surrounding the callback in
SDL_ThreadedTimerCheck are so that applications can call the SDL timer
routines from within the callback. As such, merely getting rid of the
SDL_mutexV and SDL_mutexP calls is likely to break existing code.
However, since there’s already one subtle race-condition in the code
that apparently isn’t getting hit enough to attract attention, I’m
reluctant to attempt to fix this problem on my own, since my
application is clearly not similar to other applications using SDL
timers.

I’ll poke around a bit more on this, but if there’s anyone
particularly intimate with this section of code who wants to eyeball
it and look for other problems, that might lead to a robust solution
sooner than anything I can do.

–Cliff Matthews <@Clifford_T_Matthews>

Hi Cliff!

I have an application that uses SDL_SetTimer that dies on
Hyperthreaded P4s running Windows XP. Looking at
SDL_ThreadedTimerCheck, I see this code:

  	SDL_mutexV(SDL_timer_mutex);
  	ms = t->cb(t->interval, t->param);
  	SDL_mutexP(SDL_timer_mutex);

But without SDL_timer_mutex being held, can’t the memory pointed to by
t be freed? Shouldn’t a copy of *t be made on the stack before the
lock is released and then that copy’s cb, interval and param field be
used?

Yep, thanks for catching this.

Please try the attached file, to see if it’s fixed.

Thanks,
-Sam Lantinga, Software Engineer, Blizzard Entertainment
-------------- next part --------------
/*
SDL - Simple DirectMedia Layer
Copyright © 1997, 1998 Sam Lantinga

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Library General Public License for more details.

You should have received a copy of the GNU Library General Public
License along with this library; if not, write to the Free
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

Sam Lantinga
5635-34 Springhouse Dr.
Pleasanton, CA 94588 (USA)
@slouken

*/

#ifdef SAVE_RCSID
static char rcsid =
"@(#) $Id: SDL_timer.c,v 1.3 2002/03/06 11:23:02 slouken Exp $";
#endif

#include <stdlib.h>
#include <stdio.h> /* For the definition of NULL */

#include “SDL_error.h”
#include “SDL_timer.h”
#include “SDL_timer_c.h”
#include “SDL_mutex.h”
#include “SDL_systimer.h”

/* #define DEBUG_TIMERS */

int SDL_timer_started = 0;
int SDL_timer_running = 0;

/* Data to handle a single periodic alarm */
Uint32 SDL_alarm_interval = 0;
SDL_TimerCallback SDL_alarm_callback;

static volatile SDL_bool list_changed = SDL_FALSE;

/* Data used for a thread-based timer */
static int SDL_timer_threaded = 0;

struct _SDL_TimerID {
Uint32 interval;
SDL_NewTimerCallback cb;
void *param;
Uint32 last_alarm;
struct _SDL_TimerID *next;
};

static SDL_TimerID SDL_timers = NULL;
static Uint32 num_timers = 0;
static SDL_mutex *SDL_timer_mutex;

/* Set whether or not the timer should use a thread.
This should not be called while the timer subsystem is running.
*/
int SDL_SetTimerThreaded(int value)
{
int retval;

if ( SDL_timer_started ) {
	SDL_SetError("Timer already initialized");
	retval = -1;
} else {
	retval = 0;
	SDL_timer_threaded = value;
}
return retval;

}

int SDL_TimerInit(void)
{
int retval;

SDL_timer_running = 0;
SDL_SetTimer(0, NULL);
retval = 0;
if ( ! SDL_timer_threaded ) {
	retval = SDL_SYS_TimerInit();
}
if ( SDL_timer_threaded ) {
	SDL_timer_mutex = SDL_CreateMutex();
}
SDL_timer_started = 1;
return(retval);

}

void SDL_TimerQuit(void)
{
SDL_SetTimer(0, NULL);
if ( SDL_timer_threaded < 2 ) {
SDL_SYS_TimerQuit();
}
if ( SDL_timer_threaded ) {
SDL_DestroyMutex(SDL_timer_mutex);
}
SDL_timer_started = 0;
SDL_timer_threaded = 0;
}

void SDL_ThreadedTimerCheck(void)
{
Uint32 now, ms;
SDL_TimerID t, prev, next;
int removed;
SDL_NewTimerCallback callback;
Uint32 interval;
void *param;

now = SDL_GetTicks();

SDL_mutexP(SDL_timer_mutex);
for ( prev = NULL, t = SDL_timers; t; t = next ) {
	removed = 0;
	ms = t->interval - SDL_TIMESLICE;
	next = t->next;
	if ( (t->last_alarm < now) && ((now - t->last_alarm) > ms) ) {
		if ( (now - t->last_alarm) < t->interval ) {
			t->last_alarm += t->interval;
		} else {
			t->last_alarm = now;
		}
		list_changed = SDL_FALSE;

#ifdef DEBUG_TIMERS
printf(“Executing timer %p (thread = %d)\n”,
t, SDL_ThreadID());
#endif
callback = t->cb;
interval = t->interval;
param = t->param;
SDL_mutexV(SDL_timer_mutex);
ms = callback(interval, param);
SDL_mutexP(SDL_timer_mutex);
if ( list_changed ) {
/* Abort, list of timers has been modified /
break;
}
if ( ms != t->interval ) {
if ( ms ) {
t->interval = ROUND_RESOLUTION(ms);
} else { /
Remove the timer from the linked list /
#ifdef DEBUG_TIMERS
printf(“SDL: Removing timer %p\n”, t);
#endif
if ( prev ) {
prev->next = next;
} else {
SDL_timers = next;
}
free(t);
– num_timers;
removed = 1;
}
}
}
/
Don’t update prev if the timer has disappeared */
if ( ! removed ) {
prev = t;
}
}
SDL_mutexV(SDL_timer_mutex);
}

SDL_TimerID SDL_AddTimer(Uint32 interval, SDL_NewTimerCallback callback, void *param)
{
SDL_TimerID t;
if ( ! SDL_timer_mutex ) {
if ( SDL_timer_started ) {
SDL_SetError(“This platform doesn’t support multiple timers”);
} else {
SDL_SetError(“You must call SDL_Init(SDL_INIT_TIMER) first”);
}
return NULL;
}
if ( ! SDL_timer_threaded ) {
SDL_SetError(“Multiple timers require threaded events!”);
return NULL;
}
SDL_mutexP(SDL_timer_mutex);
t = (SDL_TimerID) malloc(sizeof(struct _SDL_TimerID));
if ( t ) {
t->interval = ROUND_RESOLUTION(interval);
t->cb = callback;
t->param = param;
t->last_alarm = SDL_GetTicks();
t->next = SDL_timers;
SDL_timers = t;
++ num_timers;
list_changed = SDL_TRUE;
SDL_timer_running = 1;
}
#ifdef DEBUG_TIMERS
printf(“SDL_AddTimer(%d) = %08x num_timers = %d\n”, interval, (Uint32)t, num_timers);
#endif
SDL_mutexV(SDL_timer_mutex);
return t;
}

SDL_bool SDL_RemoveTimer(SDL_TimerID id)
{
SDL_TimerID t, prev = NULL;
SDL_bool removed;

removed = SDL_FALSE;
SDL_mutexP(SDL_timer_mutex);
/* Look for id in the linked list of timers */
for (t = SDL_timers; t; prev=t, t = t->next ) {
	if ( t == id ) {
		if(prev) {
			prev->next = t->next;
		} else {
			SDL_timers = t->next;
		}
		free(t);
		-- num_timers;
		removed = SDL_TRUE;
		list_changed = SDL_TRUE;
		break;
	}
}

#ifdef DEBUG_TIMERS
printf(“SDL_RemoveTimer(%08x) = %d num_timers = %d thread = %d\n”, (Uint32)id, removed, num_timers, SDL_ThreadID());
#endif
SDL_mutexV(SDL_timer_mutex);
return removed;
}

static void SDL_RemoveAllTimers(SDL_TimerID t)
{
SDL_TimerID freeme;

/* Changed to non-recursive implementation.
   The recursive implementation is elegant, but subject to 
   stack overflow if there are lots and lots of timers.
 */
while ( t ) {
	freeme = t;
	t = t->next;
	free(freeme);
}

}

/* Old style callback functions are wrapped through this */
static Uint32 callback_wrapper(Uint32 ms, void *param)
{
SDL_TimerCallback func = (SDL_TimerCallback) param;
return (*func)(ms);
}

int SDL_SetTimer(Uint32 ms, SDL_TimerCallback callback)
{
int retval;

#ifdef DEBUG_TIMERS
printf(“SDL_SetTimer(%d)\n”, ms);
#endif
retval = 0;
if ( SDL_timer_running ) { /* Stop any currently running timer */
SDL_timer_running = 0;
if ( SDL_timer_threaded ) {
SDL_mutexP(SDL_timer_mutex);
SDL_RemoveAllTimers(SDL_timers);
SDL_timers = NULL;
SDL_mutexV(SDL_timer_mutex);
} else {
SDL_SYS_StopTimer();
}
}
if ( ms ) {
if ( SDL_timer_threaded ) {
retval = (SDL_AddTimer(ms, callback_wrapper,
(void *)callback) != NULL);
} else {
SDL_timer_running = 1;
SDL_alarm_interval = ms;
SDL_alarm_callback = callback;
retval = SDL_SYS_StartTimer();
}
}
return retval;
}

“Sam” == Sam Lantinga writes:

Sam> Hi Cliff!
>> I have an application that uses SDL_SetTimer that dies on
>> Hyperthreaded P4s running Windows XP.  Looking at
>> SDL_ThreadedTimerCheck, I see this code:

>> SDL_mutexV(SDL_timer_mutex); ms = t->cb(t->interval, t->param);
>> SDL_mutexP(SDL_timer_mutex);

>> But without SDL_timer_mutex being held, can't the memory
>> pointed to by t be freed?  Shouldn't a copy of *t be made on
>> the stack before the lock is released and then that copy's cb,
>> interval and param field be used?

Sam> Yep, thanks for catching this.

You’re welcome. The problem actually showed up about a year ago and
I’ve only now had time to track it down. I thought it was an Executor
bug until I peered into the timer code. There still may be Executor
timer bugs, but there are some problems with the SDL timer code that
need to be cleaned up.

Sam> Please try the attached file, to see if it's fixed.

That doesn’t fix it.

I have a fix I’ve been testing that definitely improves things. I
think I saw one mystery crash with my new fix in though, so I’m
reluctant to send anything out until I investigate further.

Other problems in the timer code, I beleive, include not setting
list_changed to SDL_TRUE if the callback returns 0, and the setting of
list_changed to SDL_FALSE unconditionally in the loop.

–Cliff Matthews <@Clifford_T_Matthews>
+1 505 363 5754 Cell

Attached is a diff from the 1.2.8 SDL_timer.c to a version that
includes my patches. I did a horrible job indenting, and my comment
may be a bit longwinded, but my application seems to run fine with
this patch.

I’ve also included a patched SDL_timer.c in case anyone wants to look
carefully at the code and doesn’t have the 1.2.8 source in front of
him. This stuff is tricky and having more eyes look at it is a good
thing.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed…
Name: timer_fix.diff
URL: http://lists.libsdl.org/pipermail/sdl-libsdl.org/attachments/20050104/4b1bc65c/attachment.asc
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed…
Name: SDL_timer.c
URL: http://lists.libsdl.org/pipermail/sdl-libsdl.org/attachments/20050104/4b1bc65c/attachment.txt
-------------- next part --------------

–Cliff Matthews <@Clifford_T_Matthews>
+1 505 363 5754 Cell

My patch didn’t solve the problem I’m seeing after all. Without the
patch, the problem occurs within seconds of me starting my program.
With it, the program runs for minutes, but left running during lunch,
it died. I’ll pore over the SDL code some more and also look at my
applications use of the SDL timers and try to figure out what’s going
on.

–Cliff Matthews <@Clifford_T_Matthews>

My patch didn’t solve the problem I’m seeing after all. Without the
patch, the problem occurs within seconds of me starting my program.
With it, the program runs for minutes, but left running during lunch,
it died. I’ll pore over the SDL code some more and also look at my
applications use of the SDL timers and try to figure out what’s going
on.

Thanks Cliff!

See ya,
-Sam Lantinga, Software Engineer, Blizzard Entertainment

OK, with this patch applied against 1.2.8, I haven’t seen any problems
with my code running for hours. I think SDL_timer.c still has some
problems caused by the “ms” return value from the callback being
ignored if list_changed is set. That won’t happen with my code, and
since nobody else is reporting timer errors, perhaps it won’t happen
with anyone else’s code, either. I have an idea how that bug can be
fixed, but I’m reluctant to make a change to fix a problem I can’t
easily see.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed…
Name: timer.diff
URL: http://lists.libsdl.org/pipermail/sdl-libsdl.org/attachments/20050104/00cd676f/attachment.asc
-------------- next part --------------

–Cliff Matthews <@Clifford_T_Matthews>
+1 505 363 5754 Cell