SDL audiospec thread priority should be higher than the main app thread

It is common knowledge that hardcore players refuse to use most 60hz
LCD monitors, but many find a 120hz LCD monitor quite acceptable. Such
120hz LCD monitors (and a few 60hz ones) lack "scaler" hardware and thus
can only run at a single native resolution (direct scanout); scaler
hardware adds input latency (a 2-3 refresh delay), which hardcore
players consider unacceptable.

Three cheers for hardware vendors capitalizing on crappy software. When the
software loop caps at 60 FPS due to vsync, sell a monitor with a higher
refresh rate to combat that! Does anyone here think that is an
appropriate solution? People are tricked into thinking more Hz = better
games = win. Again, facepalm, for all parties. You know why I would buy a
120 Hz CRT/LCD? Quad-buffered stereoscopic rendering at 60
Hz per eye + shutter glasses. It certainly isn't because I can see so much
more rich detail at 120Hz than at 60Hz. /rant

Actually this has more to do with the fact that these monitors are designed
for use with 3D shutter glasses (and are branded as such); that they can
also play games at 120hz in non-stereo rendering is a very nice side
effect, however.

Worth noting that while there is not much more detail at 120fps/120hz than
at 60fps/60hz, the motion is far more fluid, which I chalk up to "real
motion blur" (just because we cannot perceive the frames in their entirety
does not mean we do not perceive the blur trails as motion).

In general I favor outputting to a display at 120hz because it gives a
genuine feeling of fluid “real” motion, rather than attempting to mimic this
effect in software processing (everyone perceives differently, after all).

This is a good point that I hadn't considered. I don't know how effective
the real motion blur is, since I've never really tried using 120Hz as just
a higher framerate instead of stereo rendering, but it is probably less
annoying than the software solutions I've seen done with shaders.

On Thu, Apr 21, 2011 at 8:44 PM, Forest Hale wrote:


This can be more trouble than it is worth on OSX (Cocoa is NOT designed for threading in any way shape or form and mutex locks can be required in places), but yes this technique works adequately on
Windows and Linux, and can work on OSX if one is careful not to make any Cocoa calls in the rendering thread.

While it is certainly a good technique to encourage where possible, it is "somewhat beyond" many indie game developers who are more concerned with gameplay design than with platform-specific tricks that
can be fragile in terms of portability, so I find it strange to voice harsh criticism over not using a technique which you have stated is "not easy".

I should also mention that Quake3 used this technique (at least on Windows), but I am not sure whether it updated player movement on events, or waited for a rendering frame to hit.

A bigger question is why ioquake3 is not using this technique - I bet the answer is developer laziness.

I respect developer laziness. :)

On 04/21/2011 06:45 PM, Patrick Baggett wrote:

Trivially not true. I’ve already got code that handles OpenGL on one thread, raw input messages on another. Same window.


LordHavoc
Author of DarkPlaces Quake1 engine - http://icculus.org/twilight/darkplaces
Co-designer of Nexuiz - http://alientrap.org/nexuiz
"War does not prove who is right, it proves who is left." - Unknown
"Any sufficiently advanced technology is indistinguishable from a rigged demo." - James Klass
"A game is a series of interesting choices." - Sid Meier

Cocoa is NOT designed for threading in any way shape or form

You get extra credit for using IOKit for raw mouse input. :)

–ryan.


While it is certainly a good technique to encourage where possible, it is
"somewhat beyond" many indie game developers who are more concerned with
gameplay design than with platform-specific tricks that can be fragile in
terms of portability, so I find it strange to voice harsh criticism over
not using a technique which you have stated is "not easy".

Given X hours of time, I wouldn't want to spend many (if any) on
platform-specific foolery, true. It is criticism of the "Windows = fail"
attitude people take without any real experience, specifically the "This is
more or less a problem with Windows" remark, when ironically it IS possible
to do this correctly on Windows and NOT possible on X11, at least not using
Xlib in a thread-safe manner without synchronization.


I mention Windows because I am aware of a problem on Windows. I have
less experience programming with X11, so I am familiar with fewer of
its quirks. I have no experience at all on Apples newer than the Apple
][e, so you won't see me commenting on that.

You can use raw input on Windows to get input independent of the
window you're drawing to, which works fine when running in full
screen. It doesn't work so well in windowed mode though, since you need
to account for keyboard and mouse focus, screen vs. window coordinates,
mouse scaling, overlapping windows and so on. It's possible, but not
a good solution. Hence the "more or less" in the comment mentioned
above.

On 21 April 2011 22:10, Patrick Baggett <baggett.patrick at gmail.com> wrote:


This is because Quake3, like many games, ties the input update speed
directly to the graphics rendering.

This is more or less a problem with Windows. For the most part, both
video output and input have to be handled by the same thread. You can
queue up your input and deal with it later, but that introduces a
delay between receiving an event and processing it.

Trivially not true. I've already got code that handles OpenGL on one thread,
raw input messages on another. Same window.
The magic is that SwapBuffers() on Win32 requires the HDC of the window, and
doesn't require the thread that called wglMakeCurrent() using that HDC to be
the same thread that created it. So in effect, as long as I never do any
drawing calls that would use that HDC from the message handling thread
(easy), there isn't any critical section to synchronize on. Even if you use
good old Win32 messages instead of raw input, AttachThreadInput() allows you
to effectively detour message handling to a different thread. Compare that
to X11, which uses a Display* in XNextEvent(). That same Display* must be
used in glXSwapBuffers(), meaning that rendering must synchronize with
message handling.
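
For concreteness, a minimal sketch of that split (Win32 only, error
handling omitted; names like RenderThread and g_hwnd are illustrative,
not taken from the actual code being discussed). The window and its
message pump, including raw input, stay on the thread that created the
window, while a second thread owns the GL context and calls
SwapBuffers() on the window's HDC:

    #include <windows.h>

    static HWND g_hwnd;                  /* created on the message thread */
    static volatile LONG g_running = 1;

    /* Runs on its own thread: owns the GL context and only ever touches
       the HDC for drawing and SwapBuffers(). */
    static DWORD WINAPI RenderThread(LPVOID unused)
    {
        PIXELFORMATDESCRIPTOR pfd = { sizeof(pfd), 1,
            PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER,
            PFD_TYPE_RGBA, 32 };
        HDC dc = GetDC(g_hwnd);
        HGLRC rc;

        SetPixelFormat(dc, ChoosePixelFormat(dc, &pfd), &pfd);
        rc = wglCreateContext(dc);
        wglMakeCurrent(dc, rc);          /* context is current on THIS thread */
        while (g_running) {
            /* glClear(), draw calls, etc. */
            SwapBuffers(dc);             /* needs only the HDC, not the
                                            window-owning thread */
        }
        wglMakeCurrent(NULL, NULL);
        wglDeleteContext(rc);
        ReleaseDC(g_hwnd, dc);
        return 0;
    }

    /* Runs on the window-owning thread: registers for raw mouse input
       and pumps messages. WM_INPUT is delivered to the window procedure
       here, never blocking the renderer. */
    static void PumpMessages(void)
    {
        RAWINPUTDEVICE rid = { 0x01, 0x02, 0, g_hwnd }; /* usage page 1,
                                                           usage 2 = mouse */
        MSG msg;

        RegisterRawInputDevices(&rid, 1, sizeof(rid));
        while (GetMessage(&msg, NULL, 0, 0) > 0) {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
        InterlockedExchange(&g_running, 0); /* window closed: stop rendering */
    }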

Not following you here. I assume you are saying that you take a risk
of blocking in XNextEvent(), which can keep you from being able to call
glXSwapBuffers() when you need to. Is that correct? If that is the
problem, then the usual solution is to use ConnectionNumber() to get
the fd of the connection from the Display* and then play games with
the equivalent of select() to create a main loop that will react to
input as soon as it is available without ever blocking in XNextEvent(),
and allow you to swap the buffers when you want to. Add in
XEventsQueued() and you get to process all available events without
ever blocking.
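
A minimal sketch of that loop, assuming an already-open Display* and
leaving the rendering itself as a placeholder:

    #include <X11/Xlib.h>
    #include <sys/select.h>

    void main_loop(Display *dpy)
    {
        int xfd = ConnectionNumber(dpy);   /* fd of the X connection */
        fd_set fds;
        struct timeval tv;
        XEvent ev;

        for (;;) {
            FD_ZERO(&fds);
            FD_SET(xfd, &fds);
            tv.tv_sec = 0;
            tv.tv_usec = 16000;            /* cap the wait at roughly one frame */
            select(xfd + 1, &fds, NULL, NULL, &tv);

            /* QueuedAfterFlush flushes output and reads whatever is
               already available, so XNextEvent() below can never block. */
            while (XEventsQueued(dpy, QueuedAfterFlush) > 0) {
                XNextEvent(dpy, &ev);
                /* ... handle ev ... */
            }
            /* ... run game logic, render, glXSwapBuffers(dpy, win) ... */
        }
    }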

OTOH, if I am completely missing what you are talking about, please
let me know.

Bob Pendleton

On Thu, Apr 21, 2011 at 8:45 PM, Patrick Baggett <baggett.patrick at gmail.com> wrote:


It isn't hard, but it isn't easy, and the not being easy is why it isn't done
more often. Queuing input inserts a delay "sort of", but in the case of > 60
FPS, the delay is literally unnoticeable. Consider if a person moves their
mouse after frame A but before frame B. I get sent a raw input message that
is immediately handled on another thread, but is buffered. Here's the
killer: the current time is recorded on that message. Then when I update the
game logic, I process all of the messages in order, but I can use the
timestamp to differentiate actions if they have an implicit time order. I've
found that it makes zero difference whether you take the individual times
into account or just simply process in order. So yes, there is a delay, but
it is generally on the order of < 1 frametime, and when the FPS is > 60, you
literally can't perceive it. For < 60, yes, you barely can, but it is
generally preferable that you don't have to move the mouse MORE to get the
same amount of movement at lower FPS. At < 10 FPS, there is a very
noticeable gap between when you move and when you see it, but if you didn't
buffer the input, you might not even be able to move the mouse enough to
reduce the graphics settings in the options menu to the point where the game
is playable.
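
A rough sketch of that buffering scheme using SDL's own primitives (the
struct layout, queue size, and function names are illustrative, not the
actual code being described):

    #include "SDL.h"
    #include "SDL_thread.h"

    typedef struct {
        Uint32 timestamp;      /* SDL_GetTicks() at arrival */
        int dx, dy;            /* e.g. raw mouse deltas */
    } InputEvent;

    #define QUEUE_MAX 256
    static InputEvent queue[QUEUE_MAX];
    static int queue_len = 0;
    static SDL_mutex *queue_lock;  /* = SDL_CreateMutex(), once at startup */

    /* Input thread: stamp each raw message with the current time and
       buffer it. */
    void push_input(int dx, int dy)
    {
        SDL_LockMutex(queue_lock);
        if (queue_len < QUEUE_MAX) {
            queue[queue_len].timestamp = SDL_GetTicks();
            queue[queue_len].dx = dx;
            queue[queue_len].dy = dy;
            queue_len++;
        }
        SDL_UnlockMutex(queue_lock);
    }

    /* Game thread, once per logic update: drain the whole queue, in
       order. The timestamps preserve the implicit time order across
       the frame. */
    void drain_input(void (*apply)(const InputEvent *))
    {
        int i;
        SDL_LockMutex(queue_lock);
        for (i = 0; i < queue_len; i++)
            apply(&queue[i]);
        queue_len = 0;
        SDL_UnlockMutex(queue_lock);
    }

Draining the whole queue on every update is also what avoids the
pile-up described next.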

You can make it worse through bad programming though, of course: if 10
events are queued up while blitting a frame to the screen and you only
process one event per frame instead of clearing the queue, the
remaining events have to wait up to 10 frames to be processed, with
other events queuing up behind them.

Indeed.




+-----------------------------------------------------------


I'm referring to the fact that the Display* is a pointer to a structure
allocated by X11, and it can be corrupted when two threads try to modify it.
If you use XNextEvent(), or XPending() + XNextEvent(), then you are
accessing that structure. Almost every X function requires the Display*.
glXSwapBuffers() also requires use of that Display*. Calling both functions
simultaneously allows the Display* to be corrupted. Thus, while you could
check for events using select() [agreed], you could not dequeue them without
synchronizing with the renderer. XInitThreads() allows X calls to be
re-entrant, solving that problem; alternatively, you can just provide your
own synchronization.

The WGL / GLX difference here: GLX can use remote X servers, thus the
Display* is needed to specify which connection, while on Win32/WGL a handle
to the window's DC is all that is necessary, and that is independent of any
message handling because the HDC is treated more or less as a buffer handle
rather than a logical connection.

Would the user notice this synchronization? I guess that depends on whether
you are polling (XQueryPointer()) or actually doing buffered event handling
(XNextEvent()). Supposedly a "hardcore gamer" can tell the difference.
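
For reference, the XInitThreads() route looks roughly like this sketch
(the function name is illustrative; the one hard rule is that
XInitThreads() must be the first Xlib call the process makes):

    #include <X11/Xlib.h>

    Display *open_shared_display(void)
    {
        if (!XInitThreads())        /* MUST precede every other Xlib call */
            return NULL;            /* threaded Xlib not available */
        return XOpenDisplay(NULL);
        /* From here, one thread can sit in XNextEvent(dpy, ...) while
           another calls glXSwapBuffers(dpy, win); Xlib serializes access
           to the Display internally instead of letting it be corrupted. */
    }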

+-----------------------------------------------------------

So, we agree that this is a non-issue. So what is the issue?

Would the user notice this synchronization? I guess that depends on whether
you are polling (XQueryPointer()) or actually doing buffered event handling
(XNextEvent()). Supposedly a "hardcore gamer" can tell the difference.

I think the noticeable difference between using XNextEvent() and
XQueryPointer() is due to the round trip to and from the X server that
XQueryPointer() does. It really does query the server and blocks until
it gets a reply. That can easily induce a "randomness" into updates
that the user will notice.

XEventsQueued() lets the developer decide if he wants to look only at
events in the input queue, if he wants to include events queued at the
OS level that have not yet been sent to the application, or it can be
used to force a flush and wait for events to come back. Basically, it
can be used as a substitute for XPending() or for XQLength(). By
removing the chance of a system call or, much worse, a round trip to
the server, you get a more consistent response from the whole X system.

Most of what you are describing is actually bugs in the physics
simulation in the game. It has to do with how the simulated time is
accounted for, how the length of simulated time affects things like
round-off errors, and the actual simulated time at which events are
applied. X events have a time stamp. While not always accurate, they
can be used to get approximately correct times for events into the
simulation. What often happens is that the events are all processed at
one real time, but they are also treated as having happened at the
same simulated time.

Because events are processed in batches for each frame, at low frame
rates there is a long pause in both simulated and real time between
when events are allowed to affect the simulation. At high frame rates
the batches that are processed are smaller, but the time between (both
real and simulated) processing the batches is also shorter.

As the old saying goes, "time is nature's way of keeping everything
from happening at once." In most games all the events that happen
during a frame are treated as happening at the end of the frame, not
at the times they actually happened. That will throw any simulation
off. If events have time stamps, then the current simulation time can
be set to the time of each event in order, and the physics simulation
can be made to be consistent no matter what the frame rate is. This is
nothing new; it is the basic way that discrete event simulations (like
games) have always been done. But it doesn't seem to have ever made
its way into popular books on game development. Most likely because
the people writing the books didn't know that decades of prior
knowledge existed on the subject. :)
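
In code, that scheme comes out to something like this sketch (all names
illustrative; events assumed sorted by timestamp): advance the
simulation to each event's own time before applying it, instead of
applying the whole batch at frame-end.

    typedef struct { double time; /* plus whatever the input was */ } Event;

    static double sim_time = 0.0;

    static void physics_step(double dt) { /* integrate the simulation by dt */ }
    static void apply_event(const Event *ev) { /* feed one input into the sim */ }

    void simulate_frame(double frame_end, const Event *events, int n)
    {
        int i;
        for (i = 0; i < n; i++) {
            physics_step(events[i].time - sim_time); /* run up to the event */
            sim_time = events[i].time;
            apply_event(&events[i]);     /* applied at the time it "happened" */
        }
        physics_step(frame_end - sim_time);  /* run out the rest of the frame */
        sim_time = frame_end;
    }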

Oh, yeah, it looks like XCB is a better substitute for Xlib. It
handles a lot of the buffering and delay problems very nicely.

Bob Pendleton



+-----------------------------------------------------------

I enjoy the interesting technical discussion, but it really is quite
sidetracked at this point.
Weren't we supposed to be discussing audio or something? ;)

Bob - I agree that XQueryPointer() is inferior because it kills remote X
performance due to the roundtrip, but also just because it is polling, and
it is quite easy to do polling poorly. I can't imagine the "roundtrip
latency" being a serious issue for a local X server + local X client, but I
don't have any data otherwise. In fact, my original point was that buffered
events are superior to polling. It happens to appear that way on X11-based
systems and on Win32; I don't find that to be a coincidence. I was merely
lamenting the "higher (> 60) refresh rate = better input response" mindset,
which to me is a sure sign of bad application design, not an "X11/Win32 is
bad" argument. I disagreed strongly with Forest's advice of disabling vsync
for hardcore gamers to improve input responsiveness, claiming it was a
hack. The number of events that a person can generate by moving a mouse in
less than 1/60 of a second is tiny (maybe 2-7 last time I checked), so while
yes, there is a latency due to buffering, it isn't really noticeable, and
the alternative of polling loses those extra events that would have been
generated, which, to me, as a design, is a loss.

To say something like "Windows has this problem with it…" is just BS, which
is the original remark I took exception to. I went on to show that X11
actually suffers from the problem more than Win32 does, but that it doesn't
especially cause a problem for well-written code. Kenneth qualified the
statement so it was more of "it's hard to do right", which isn't provably
wrong or right, though I agree with him.

I don't know enough about Quake3's physics code to authoritatively explain
why a higher frame rate would cause inaccuracies or even exploits. I DO know
that modern physics packages typically suggest, and sometimes require, fixed
time steps, which can and should be independent of the frame rate. This is
exactly to stop those behaviors where the physics become increasingly
"non-deterministic"; a sketch of such a loop follows.

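For reference, the usual shape of a fixed-time-step loop (a sketch; the
10 ms step and the function names are illustrative):

    #include "SDL.h"

    void game_loop(void)
    {
        const Uint32 STEP_MS = 10;   /* 100 Hz physics, frame-rate independent */
        Uint32 accumulator = 0;
        Uint32 last = SDL_GetTicks();

        for (;;) {
            Uint32 now = SDL_GetTicks();
            accumulator += now - last;
            last = now;

            while (accumulator >= STEP_MS) { /* catch up in fixed increments */
                /* update_physics(STEP_MS / 1000.0); */
                accumulator -= STEP_MS;
            }
            /* render(); -- as often as vsync allows */
        }
    }
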
Anyways Bob, I think at this point I'm just agreeing with you, so hurry up
and say something we can all flame you for. Just kidding; it was
interesting, but I think we've derailed and drained this thread. Happy
hacking.

Patrick

SDL doesn't provide the required API in 1.2, but it will be possible
with 1.3. Generally speaking, though, the performance improvement involved
is relatively minor in practice.

On 22/04/2011 02:59, Forest Hale wrote:

A bigger question is why ioquake3 is not using this technique - I bet
the answer is developer laziness.

I respect developer laziness. :)