SDL_CondWait not properly locking mutex?

Jonas_Kulla · September 10, 2013, 4:18pm

Hi list,

I have recently stumbled upon some weird behavior of SDL_CondWait under
Linux:

while (!predicate)
SDL_CondWait(cond, mutex);

SDL_UnlockMutex(mutex);
SDL_LockMutex(mutex);

I had assumed that the SDL sync primitives behaved exactly like their
pthread counterparts,
and looking at the source code, this ought to be true, however, the thread
seems to get stuck
on “LockMutex” even though the mutex is supposed to be unlocked, and I
cannot for the life of
me figure out what is happening.

I have attached a sample C++ program that demonstrates this and also
contains the exact
same code rewritten with pthreads, which works perfectly as expected.

Thanks,
Jonas
-------------- next part --------------
A non-text attachment was scrubbed…
Name: sdlCondBug.c
Type: text/x-csrc
Size: 2008 bytes
Desc: not available
URL: http://lists.libsdl.org/pipermail/sdl-libsdl.org/attachments/20130910/c6fce9cc/attachment.c

Philip_Bennefall · September 10, 2013, 5:35pm

Hello,

Correct me if I am wrong, but isn’t the usual pattern of condition variables that you lock the mutex first, and the condition variable then unlocks that mutex when you call CondWait and then reacquires it before returning? That is my understanding, anyway. So something like this:

int threadFun(void *data)
{
struct Context ctx = (struct Context) data;

SDL_LockMutex(ctx->mutex);

while (!ctx->pred)
SDL_CondWait(ctx->cond, ctx->mutex); // The mutex is locked while this call is in progress and then automatically reacquired upon return.

SDL_UnlockMutex(ctx->mutex);

// …
}

Kind regards,

Philip Bennefall----- Original Message -----
From: Jonas Kulla
To: SDL Development List
Sent: Tuesday, September 10, 2013 6:18 PM
Subject: [SDL] SDL_CondWait not properly locking mutex?

Hi list,

I have recently stumbled upon some weird behavior of SDL_CondWait under Linux:

while (!predicate)
SDL_CondWait(cond, mutex);

SDL_UnlockMutex(mutex);
SDL_LockMutex(mutex);

I had assumed that the SDL sync primitives behaved exactly like their pthread counterparts,
and looking at the source code, this ought to be true, however, the thread seems to get stuck
on “LockMutex” even though the mutex is supposed to be unlocked, and I cannot for the life of
me figure out what is happening.

I have attached a sample C++ program that demonstrates this and also contains the exact
same code rewritten with pthreads, which works perfectly as expected.

Thanks,
Jonas

Jonas_Kulla · September 10, 2013, 5:57pm

2013/9/10 Philip Bennefall

**
Hello,

Correct me if I am wrong, but isn’t the usual pattern of condition
variables that you lock the mutex first, and the condition variable then
unlocks that mutex when you call CondWait and then reacquires it before
returning? That is my understanding, anyway. So something like this:

Yes you are correct. In my example, I lock the mutex in main():

SDL_LockMutex(ctx.mutex);
SDL_Thread *thread = SDL_CreateThread(threadFun, 0, &ctx);

Philip_Bennefall · September 10, 2013, 6:01pm

Hi Jonas,

Yes, you lock it in the main thread but not in the secondary thread where the wait happens. You need to lock it there or wait is unable to release it, since it is not the secondary thread that owns the mutex when wait is invoked. In short, if I have read the scenario correctly, wait will try to unacquire a mutex that is not acquired in the current thread (which is to say the secondary one).

Kind regards,

Philip Bennefall----- Original Message -----
From: Jonas Kulla
To: @Philip_Bennefall ; SDL Development List
Sent: Tuesday, September 10, 2013 7:57 PM
Subject: Re: [SDL] SDL_CondWait not properly locking mutex?

2013/9/10 Philip Bennefall <@Philip_Bennefall>

Hello,

Correct me if I am wrong, but isn't the usual pattern of condition variables that you lock the mutex first, and the condition variable then unlocks that mutex when you call CondWait and then reacquires it before returning? That is my understanding, anyway. So something like this:

Yes you are correct. In my example, I lock the mutex in main():

SDL_LockMutex(ctx.mutex);
SDL_Thread *thread = SDL_CreateThread(threadFun, 0, &ctx);

Ian_Norton · September 10, 2013, 6:36pm

Am I the only one who gets the shudders at one thread locking but a
different one doing condwait?On 10 Sep 2013 18:57, “Jonas Kulla” wrote:

2013/9/10 Philip Bennefall

**
Hello,

Correct me if I am wrong, but isn’t the usual pattern of condition
variables that you lock the mutex first, and the condition variable then
unlocks that mutex when you call CondWait and then reacquires it before
returning? That is my understanding, anyway. So something like this:

Yes you are correct. In my example, I lock the mutex in main():

SDL_LockMutex(ctx.mutex);
SDL_Thread *thread = SDL_CreateThread(threadFun, 0, &ctx);

SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Jonas_Kulla · September 10, 2013, 6:37pm

2013/9/10 Philip Bennefall

**
Hi Jonas,

Yes, you lock it in the main thread but not in the secondary thread where
the wait happens. You need to lock it there or wait is unable to release
it, since it is not the secondary thread that owns the mutex when wait is
invoked. In short, if I have read the scenario correctly, wait will try to
unacquire a mutex that is not acquired in the current thread (which is to
say the secondary one).

Kind regards,

Oh, I see =/ I haven’t even considered mutexes this way before, but it
makes sense. It sucks for me though because
the way I was doing it in the example allowed me to prevent access on
shared data structures in between the thread
starting up and the condition blocking. I guess I’ll just have to accept
this tiny racy-ness.
It’s still weird though that it works the way I expected it with pure
pthreads… but that possibly falls under the
"undefined behavior" that I caused by misusing the mutex.

Anyway, thanks a lot for your help!

Jonas

Ian_Norton · September 10, 2013, 6:41pm

Startup race?

Surely your startup function can acquire the lock?On 10 Sep 2013 19:37, “Jonas Kulla” wrote:

2013/9/10 Philip Bennefall

**
Hi Jonas,

Yes, you lock it in the main thread but not in the secondary thread where
the wait happens. You need to lock it there or wait is unable to release
it, since it is not the secondary thread that owns the mutex when wait is
invoked. In short, if I have read the scenario correctly, wait will try to
unacquire a mutex that is not acquired in the current thread (which is to
say the secondary one).

Kind regards,

Oh, I see =/ I haven’t even considered mutexes this way before, but it
makes sense. It sucks for me though because
the way I was doing it in the example allowed me to prevent access on
shared data structures in between the thread
starting up and the condition blocking. I guess I’ll just have to accept
this tiny racy-ness.
It’s still weird though that it works the way I expected it with pure
pthreads… but that possibly falls under the
"undefined behavior" that I caused by misusing the mutex.

Anyway, thanks a lot for your help!

Jonas

SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Jonas_Kulla · September 10, 2013, 7:03pm

2013/9/10 Ian Norton

Startup race?

Surely your startup function can acquire the lock?

Yes it can, I was just talking about the period between creating the thread
in the main thread, and this side thread locking the mutex.
I thought I would miss events that would signal the cond in this small time
frame. Looking at my code again though, I realize that it’s
a non-issue because then CondWait() wouldn’t be called at all, as the
predicate would already be set to TRUE. So never mind that ^^"

Jonas

Jonas_Kulla · September 10, 2013, 7:07pm

2013/9/10 Ian Norton

Am I the only one who gets the shudders at one thread locking but a
different one doing condwait?

I’m kinda new to threaded programming. I never thought of mutexes in terms
of thread ownership, but
simply “atomic locks” that any thread can unlock. That’s where my
misunderstanding stems from.

Jonas

Philip_Bennefall · September 10, 2013, 7:13pm

Hi Jonas,

Leaving a race condition in your code is a bad idea; it will almost certainly come back and bother you later and in very unexpected ways. To prevent these sorts of races, observe the following general rule pattern:

In the thread that’s waiting you first acquire the mutex, check the condition, and then wait. When wait returns, check the condition again and if it still is false, call wait again. That, you are already doing. The idea is to check the condition both before and after wait is called, with the lock acquired in both cases. Keep in mind that wait always releases the lock on entry and reacquires it again upon return. So the mutex must always have been acquired previously by the same thread before you invoke wait.
Whenever you change the condition to be true in another thread, acquire the same lock first, make the change and release the lock again. This ensures that the waiting thread never fails to see that the condition is true, since it only checks it when the lock is acquired and the other threads only modify the state of the condition when the same lock is acquired by them.

Does that make sense? In short, always have the lock acquired when you check as well as when you modify the condition. That will prevent notification races.

As for why it works in pthreads, I agree with you that it is undefined behavior since the mutex being locked is a precondition of wait.

Kind regards,

Philip Bennefall----- Original Message -----
From: Jonas Kulla
To: @Philip_Bennefall ; SDL Development List
Sent: Tuesday, September 10, 2013 8:37 PM
Subject: Re: [SDL] SDL_CondWait not properly locking mutex?

2013/9/10 Philip Bennefall <@Philip_Bennefall>

Hi Jonas,

Yes, you lock it in the main thread but not in the secondary thread where the wait happens. You need to lock it there or wait is unable to release it, since it is not the secondary thread that owns the mutex when wait is invoked. In short, if I have read the scenario correctly, wait will try to unacquire a mutex that is not acquired in the current thread (which is to say the secondary one).

Kind regards,

Oh, I see =/ I haven’t even considered mutexes this way before, but it makes sense. It sucks for me though because
the way I was doing it in the example allowed me to prevent access on shared data structures in between the thread
starting up and the condition blocking. I guess I’ll just have to accept this tiny racy-ness.
It’s still weird though that it works the way I expected it with pure pthreads… but that possibly falls under the
"undefined behavior" that I caused by misusing the mutex.

Anyway, thanks a lot for your help!

Jonas

Ian_Norton · September 10, 2013, 7:38pm

Pthread has several muted modes, SDL looks like it is setup to bind
ownership on pthtread platformsOn 10 Sep 2013 20:07, “Jonas Kulla” wrote:

2013/9/10 Ian Norton <@Ian_Norton>

Am I the only one who gets the shudders at one thread locking but a
different one doing condwait?

I’m kinda new to threaded programming. I never thought of mutexes in terms
of thread ownership, but
simply “atomic locks” that any thread can unlock. That’s where my
misunderstanding stems from.

Jonas

SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Jonas_Kulla · September 10, 2013, 7:47pm

2013/9/10 Philip Bennefall

**
Hi Jonas,

Leaving a race condition in your code is a bad idea; it will almost
certainly come back and bother you later and in very unexpected ways. To
prevent these sorts of races, observe the following general rule pattern:

In the thread that’s waiting you first acquire the mutex, check the
condition, and then wait. When wait returns, check the condition again and
if it still is false, call wait again. That, you are already doing. The
idea is to check the condition both before and after wait is called, with
the lock acquired in both cases. Keep in mind that wait always releases the
lock on entry and reacquires it again upon return. So the mutex must always
have been acquired previously by the same thread before you invoke wait.

Whenever you change the condition to be true in another thread, acquire
the same lock first, make the change and release the lock again. This
ensures that the waiting thread never fails to see that the condition is
true, since it only checks it when the lock is acquired and the other
threads only modify the state of the condition when the same lock is
acquired by them.

Yes, I didn’t reflect it in my example code, but in my actual code, I am
only ever modifying shared state
(which I count the predicate/condition as part of) with the mutex locked.

Does that make sense? In short, always have the lock acquired when you
check as well as when you modify the condition. That will prevent
notification races.

Yep. The “concern” (it was actually just stupid perfectionism) I had was
that in between the side thread
starting up and the waitcond being called, I could miss a signal, but as
you explained, the condition check
is already protected by the mutex, so waitcond wouldn’t get called in the
first place if an event snuck in.
Ie., it’s a non issue after all.

As for why it works in pthreads, I agree with you that it is undefined

behavior since the mutex being locked is a precondition of wait.

Kind regards,

Philip Bennefall

Thanks again!

Jonas