Lock-free

I had been reading of lock-free data sharing between threads and have my OpenGL render, state control (game loop), animation, and physics threads designed to share data with lock-free ring buffers.? I just came across Valve’s GDC2007 presentation at http://www.valvesoftware.com/publications/2007/GDC2007_SourceMulticore.pdf which stresses that they are using lock-free for nearly everything, and that convinced me even more that it’s a worthy goal.

I’m using SDL to give me a full-screen OpenGL setup (by the way, when will I be able to start an OpenGL 3.0 context with SDL?), for getting timing with SDL_GetTicks(), input handling with SDL_PollEvent(), and SDL_Wait() to make a thread that finishes its loop iteration much faster than a frame to yield processing time to say physics or whatever.? I’m also considering using the sound subsystem and SDL_net.

The initialization stuff doesn’t matter, but in the others, how much locking is there?? This is my main question, and the corollary is how to minimize it.? So again my concern is with input event polling (and I’ll be using a multi-touch touchscreen as input), the wait function, and potentially sound and net.

One significant overhead of locks I’ve been reading about is that they cause a context switch and cost about 4000 cycles on an x86 system (which I assume isn’t just latency and is actually using the CPU so other threads can’t work on the core at that time–correct me if I’m worng), and so I’d want to avoid stuff that would cause a lock more often than once every couple of frames (and only polling input every x frames is unacceptable as a responsive interface is important in my application).? The only explicit blocking I’m doing is vsync (forced in the graphics driver).

I welcome even comments that don’t give a direct answer but are relevant.? Thanks in advance

One significant overhead of locks I’ve been reading about is that they
cause a context switch and cost about 4000 cycles on an x86 system
(which I assume isn’t just latency and is actually using the CPU so
other threads can’t work on the core at that time–correct me if I’m
worng), and so I’d want to avoid stuff that would cause a lock more
often than once every couple of frames (and only polling input every x
frames is unacceptable as a responsive interface is important in my
application). The only explicit blocking I’m doing is vsync (forced in
the graphics driver).

There’s a mutex on the event queue; you can minimize this by using
SDL_PeepEvents() and getting as many events per call as possible.

SDL_PollEvent() involves a call to SDL_PumpEvents() and maybe multiple
locks for each event, so it’s arguably less efficient.

The audio tends to grab a lock every X milliseconds for the duration of
your audio callback. This isn’t tied to framerate, but rather how much
audio data you buffer. This only blocks another thread if that thread is
calling SDL_LockAudio() while the audio callback is running.

I’m sure there are others, but really…you’d be hard-pressed to see any
of this show up in a CPU profile. It could always be faster, but getting
stuck on this part would really be a micro-optimization at best.

(it might be interesting to use atomic operations instead of mutexes,
though, if the CPU supports them, and I’m sure some things could
probably be more reasonable in its locking behaviour, but really…these
things aren’t really big bottlenecks in SDL…the real wins will come
from threading your game’s logic across cores and pushing as much work
to the video hardware as possible).

–ryan.

One significant overhead of locks I’ve been reading about is that they
cause a context switch and cost about 4000 cycles on an x86 system (which I
assume isn’t just latency and is actually using the CPU so other threads
can’t work on the core at that time–correct me if I’m worng), and so I’d
want to avoid stuff that would cause a lock more often than once every
couple of frames (and only polling input every x frames is unacceptable as a
responsive interface is important in my application). The only explicit
blocking I’m doing is vsync (forced in the graphics driver).

The context switch only happens if the mutex blocks. You must do a
context switch if the mutex blocks. If you use atomic operations you
still have to do something when you are blocked so that the processor
can go off an do useful work. The alternative is to busy wait in a
spin lock.

Also, that 4000 cycles is how long? At 1ghz that is 4 microseconds at
2ghz it is 2 nanoseconds. And that is only if it blocks, which it must
do only when another thread is accessing the queue.

Basically you are worrying about something that almost never happens
and has almost no effect when it does. The blocking time is only
important when there are many threads and blocks are happening
frequently.

Bob PendletonOn Tue, Oct 28, 2008 at 4:30 AM, Ryan C. Gordon wrote:

There’s a mutex on the event queue; you can minimize this by using
SDL_PeepEvents() and getting as many events per call as possible.

SDL_PollEvent() involves a call to SDL_PumpEvents() and maybe multiple locks
for each event, so it’s arguably less efficient.

The audio tends to grab a lock every X milliseconds for the duration of your
audio callback. This isn’t tied to framerate, but rather how much audio data
you buffer. This only blocks another thread if that thread is calling
SDL_LockAudio() while the audio callback is running.

I’m sure there are others, but really…you’d be hard-pressed to see any of
this show up in a CPU profile. It could always be faster, but getting stuck
on this part would really be a micro-optimization at best.

(it might be interesting to use atomic operations instead of mutexes,
though, if the CPU supports them, and I’m sure some things could probably be
more reasonable in its locking behaviour, but really…these things aren’t
really big bottlenecks in SDL…the real wins will come from threading your
game’s logic across cores and pushing as much work to the video hardware as
possible).

–ryan.


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

±-------------------------------------+

Basically you are worrying about something that almost never happens
and has almost no effect when it does. The blocking time is only
important when there are many threads and blocks are happening
frequently.

Or, of course, if it happens to be in an inner loop of your program. In that case, optimize the heck out of it, just on principle. If it happens multiple times per second, over a significant fraction of all the seconds while your program is active, shave off every cycle you can, especially if your program is not yet complete. That leaves you space to grow.>----- Original Message ----

From: Bob Pendleton
Subject: Re: [SDL] Lock-free

Basically you are worrying about something that almost never happens
and has almost no effect when it does. The blocking time is only
important when there are many threads and blocks are happening
frequently.

Or, of course, if it happens to be in an inner loop of your program. In that case, optimize the heck out of it, just on principle. If it happens multiple times per second, over a significant fraction of all the seconds while your program is active, shave off every cycle you can, especially if your program is not yet complete. That leaves you space to grow.

The topic was reading input events from the keyboard and mouse.
Something that happens so infrequently that it almost has no effect on
actual performance.

Bob PendletonOn Tue, Oct 28, 2008 at 11:38 AM, Mason Wheeler wrote:

----- Original Message ----
From: Bob Pendleton <@Bob_Pendleton>
Subject: Re: [SDL] Lock-free


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

±-------------------------------------+

The topic was reading input events from the keyboard and mouse.
Something that happens so infrequently that it almost has no effect on
actual performance.

…unless, of course, it’s an FPS or similar game, in which the user is constantly making subtle adjustments with the mouse. But yeah, if that’s not the case, I’d agree with you that there’s no need to optimize it.>----- Original Message ----

From: Bob Pendleton
Subject: Re: [SDL] Lock-free

Mason Wheeler wrote:

The topic was reading input events from the keyboard and mouse.
Something that happens so infrequently that it almost has no effect on
actual performance.

…unless, of course, it’s an FPS or similar game, in which the user is constantly making subtle adjustments with the mouse. But yeah, if that’s not the case, I’d agree with you that there’s no need to optimize it.

Given Bob’s analysis that a lock incurs a penalty of 2usec at 2GHz, and
that you’ll be processing (let’s say, but probably less than) 500 mouse
events per second, then you’ll spend at most 1msec blocking on mouse
events, or 0.1% of your processing time. In other words, if you
optimize the heck out of this, the most that you can expect is a 0.1%
speedup (and probably less… as Bob notes, you only pay the 4000-cycle
penalty when you actually block).

Josh>> ----- Original Message ----

From: Bob Pendleton
Subject: Re: [SDL] Lock-free


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Mason Wheeler wrote:

The topic was reading input events from the keyboard and mouse.
Something that happens so infrequently that it almost has no effect on
actual performance.

…unless, of course, it’s an FPS or similar game, in which the user is
constantly making subtle adjustments with the mouse. But yeah, if that’s
not the case, I’d agree with you that there’s no need to optimize it.

Given Bob’s analysis that a lock incurs a penalty of 2usec at 2GHz, and that
you’ll be processing (let’s say, but probably less than) 500 mouse events
per second, then you’ll spend at most 1msec blocking on mouse events, or
0.1% of your processing time. In other words, if you optimize the heck out
of this, the most that you can expect is a 0.1% speedup (and probably
less… as Bob notes, you only pay the 4000-cycle penalty when you actually
block).

Josh

Thanks Josh, you made that much clearer than I did. I appreciate the help :slight_smile:

Bob PendletonOn Tue, Oct 28, 2008 at 1:35 PM, Joshua Gargus wrote:

----- Original Message ----
From: Bob Pendleton <@Bob_Pendleton>
Subject: Re: [SDL] Lock-free


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

±-------------------------------------+

Given Bob’s analysis that a lock incurs a penalty of 2usec at 2GHz, and
that you’ll be processing (let’s say, but probably less than) 500 mouse
events per second, then you’ll spend at most 1msec blocking on mouse

Assume 60 of them, if your framerate is good, not 500. And if you use
SDL_PeepEvents and an array instead of SDL_PollEvent(), you end up with
two locks per frame instead of potentially hundreds.

(…and, with Bob’s context switching commentary: you aren’t ever
competing for that lock, unless you are also using SDL_PushEvent() for
your own app-generated events in another thread.)

There, it’s optimized. :slight_smile:

–ryan.