SDL_Flip() slow on X11 in windowed mode while scrolling in Firefox

Ulf_Magnusson · October 5, 2013, 6:45pm

Hi,

Scrolling in Firefox (but not Chromium) causes
SDL_Flip()/SDL_UpdateRect() to take much longer to complete in
windowed mode on my machine, making my emulator drop below 60 FPS and
causing sound skips. Any ideas what might be causing this and how it
could be mitigated (besides dropping SDL_Flip() calls or doing work in
a separate thread)? Is this something that might have improved in SDL
2.0?

Specs:
SDL 1.2.15
Ubuntu 13.04, 64-bit
Unity/Compiz
nouveau 1.0.7
GeForce GT 430

/Ulf

David_Olofson · October 5, 2013, 7:54pm

You’re running an OpenGL accelerated desktop…? (I haven’t been
keeping up to date with Compiz etc.)

Generally speaking, interference between applications in windowed mode
is to be expected, unless you’re on a windowing system with double
buffered windowing implemented in hardware - and I’m not sure you’ll
find that on anything but high end workstations, even today. The
typical situation is that all applications are rendering into the same
physical frame buffer, meaning that the whole desktop is double
buffered and page flipped; not separate windows. As a result, flip
requests need to be synchronized across all applications, which causes
full desktop performance issues if one or more applications aren’t
keeping up.

I’m not sure there is much you (or SDL) can do about this from the
context of an application…

That said, you should never rely on a fixed refresh rate or
rendering frame rate, except possibly if you’re developing for an
arcade machine, console or similarly completely controlled and closed
environment. This is, unsurprisingly, even more true when running in
windowed mode on any operating system. All you can do is try to keep
up with the display refresh rate, and if you can’t, handle it
gracefully by minimizing the impact of dropped frames.

As to sound, well, I know emulators are a bit special in this regard,
but basically, audio code has no business in the graphics rendering
loop! Audio is hard realtime to a much greater extent than anything
else games and the like are doing, and the “text book” solution to
that is to perform the actual audio processing in a dedicated thread.
That way, the worst that can happen is that sound effects started
right at a dropped frame are subtly delayed. It’s a lot harder to do
with an emulator though, as the CPU/soundchip interaction is much
lower level and not at all designed to cope with latency issues… You
could use a buffered stream of timestamped register writes, but that’s
almost as bad as skips whenever the emulated program is doing anything
non-trivial sound-wise.
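
One common way to keep the audio callback independent of the render loop is a lock-free single-producer/single-consumer FIFO. The emulator thread pushes samples into it and the callback pops them, so a stalled flip can starve the FIFO but never blocks the audio thread. This is a minimal sketch. The names and the 4096-sample capacity are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SPSC sample FIFO: the emulator thread pushes, the audio
 * callback pops. Neither side blocks. Capacity must be a power of two. */
#define RING_CAPACITY 4096u

typedef struct {
    int16_t data[RING_CAPACITY];
    atomic_size_t head; /* total samples ever written (producer only) */
    atomic_size_t tail; /* total samples ever read (consumer only) */
} SampleRing;

static void ring_init(SampleRing *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Samples currently buffered. Loading tail before head guarantees that
 * head >= tail, so the difference never underflows. */
static size_t ring_fill(SampleRing *r) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    return head - tail;
}

/* Producer side: stores up to n samples, drops what does not fit,
 * and returns how many were stored. */
static size_t ring_push(SampleRing *r, const int16_t *src, size_t n) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t space = RING_CAPACITY - (head - tail);
    if (n > space) n = space;
    for (size_t i = 0; i < n; i++)
        r->data[(head + i) & (RING_CAPACITY - 1)] = src[i];
    /* Release: the consumer sees the samples before the new head. */
    atomic_store_explicit(&r->head, head + n, memory_order_release);
    return n;
}

/* Consumer side (audio callback): copies up to n samples, pads the rest
 * with silence on underrun, and returns how many real samples were read. */
static size_t ring_pop(SampleRing *r, int16_t *dst, size_t n) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t avail = head - tail;
    size_t got = n < avail ? n : avail;
    for (size_t i = 0; i < got; i++)
        dst[i] = r->data[(tail + i) & (RING_CAPACITY - 1)];
    for (size_t i = got; i < n; i++)
        dst[i] = 0;
    atomic_store_explicit(&r->tail, tail + got, memory_order_release);
    return got;
}
```

Assume an SDL 2 device opened with AUDIO_S16SYS. Its callback, `void cb(void *userdata, Uint8 *stream, int len)`, could then just call `ring_pop(userdata, (int16_t *)stream, len / sizeof(int16_t))`.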

If all else fails, use extra buffering for the audio stream. You could
set it up so that the audio thread detects when buffers from the
emulator core arrive late, and increases buffering as needed. That
way, you’ll have minimal latency by default, but if there are issues,
there’ll just be a few initial glitches until buffering has been
increased.
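
The "increase buffering as needed" idea can be sketched as a small state machine run at the top of each audio callback. It starts with a small target fill level. Whenever the emulator core's data arrives late (an underrun), it raises the target and refills before playing again. All names, and the step and cap values in the usage below, are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical adaptive-latency controller for an audio callback.
 * The target should start at one callback's worth of samples or more. */
typedef struct {
    size_t target;      /* fill level required before (re)starting playback */
    size_t step;        /* growth of the target after each underrun */
    size_t max_target;  /* hard cap on the added latency */
    bool playing;       /* false while refilling after an underrun */
    unsigned underruns; /* number of late deliveries seen so far */
} LatencyCtl;

/* fill: samples currently queued; wanted: samples this callback needs.
 * Returns true to play from the FIFO, false to output silence instead. */
static bool latency_should_play(LatencyCtl *c, size_t fill, size_t wanted) {
    if (c->playing && fill < wanted) {
        /* Data arrived late: count it, grow the target, and refill. */
        c->underruns++;
        c->target += c->step;
        if (c->target > c->max_target)
            c->target = c->max_target;
        c->playing = false;
    }
    if (!c->playing && fill >= c->target)
        c->playing = true;
    return c->playing;
}
```

By default the latency stays at the initial target. Each glitch permanently buys a little more headroom, up to `max_target`.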

SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

–
//David Olofson - Consultant, Developer, Artist, Open Source Advocate

.— Games, examples, libraries, scripting, sound, music, graphics —.
| http://consulting.olofson.net http://olofsonarcade.com |
’---------------------------------------------------------------------’

Ulf_Magnusson · October 5, 2013, 9:32pm

Thanks for the reply!

> You’re running an OpenGL accelerated desktop…? (I haven’t been
> keeping up to date with Compiz etc.)

Yup, it uses 3D hardware acceleration - I would assume OpenGL-based.

> Generally speaking, interference between applications in windowed mode
> is to be expected, unless you’re on a windowing system with double
> buffered windowing implemented in hardware - and I’m not sure you’ll
> find that on anything but high end workstations, even today. The
> typical situation is that all applications are rendering into the same
> physical frame buffer, meaning that the whole desktop is double
> buffered and page flipped; not separate windows. As a result, flip
> requests need to be synchronized across all applications, which causes
> full desktop performance issues if one or more applications aren’t
> keeping up.
>
> I’m not sure there is much you (or SDL) can do about this from the
> context of an application…

Neither my emulator nor Firefox when it’s scrolling uses very much CPU,
so from that perspective it wouldn’t have any trouble keeping up. I guess
Firefox might be drawing in some fashion that hogs the render queue
(or whatever the term would be) for a long time though, so that SDL has
to wait to do its flip. Going on vague memories here, but I think Firefox
renders to and draws from a buffer, while Chromium uses higher-level
drawing commands - or did in the past at least. Made Firefox very slow
over X11 forwarding.

> That said, you should never rely on a fixed refresh rate or
> rendering frame rate, except possibly if you’re developing for an
> arcade machine, console or similarly completely controlled and closed
> environment. This is, unsurprisingly, even more true when running in
> windowed mode on any operating system. All you can do is try to keep
> up with the display refresh rate, and if you can’t, handle it
> gracefully by minimizing the impact of dropped frames.

I’m syncing on a timer, though currently in a single thread. The basic
structure of the main loop is

[emulate a single frame] -> [delay until 1/60th of a second has passed] -> [render frame]

The render step is basically just an SDL_Flip(). Samples are generated
into an internal buffer during the emulation step (with some fudging of
the speed during resampling to avoid buffer underruns/overruns). The
SDL callback gets its data from that buffer.
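
"Delay until 1/60th of a second has passed" is easiest to get right with absolute deadlines. 1/60 s is not a whole number of milliseconds, so adding a rounded 16 or 17 ms per frame drifts, and so does measuring each delay from "now". The sketch below keeps deadlines in integer microseconds. The names are hypothetical, and it only paces frames: when the loop is already late it returns 0 rather than trying to catch up.

```c
#include <stdint.h>

/* Hypothetical frame pacer using absolute deadlines in microseconds. */
typedef struct {
    uint64_t frame;    /* frames completed since start */
    uint64_t start_us; /* timestamp of frame 0 */
} Pacer;

/* Exact deadline of the current frame: start + frame * 1000000 / 60. */
static uint64_t pacer_deadline_us(const Pacer *p) {
    return p->start_us + (p->frame * 1000000u) / 60u;
}

/* Advances to the next frame and returns how long to sleep before
 * presenting it, or 0 if that deadline has already passed. */
static uint64_t pacer_next_delay_us(Pacer *p, uint64_t now_us) {
    p->frame++;
    uint64_t deadline = pacer_deadline_us(p);
    return deadline > now_us ? deadline - now_us : 0;
}
```

With SDL 2, `now_us` could be derived from `SDL_GetPerformanceCounter()` and `SDL_GetPerformanceFrequency()`. The delay would then be rounded down to milliseconds for `SDL_Delay()`.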

> As to sound, well, I know emulators are a bit special in this regard,
> but basically, audio code has no business in the graphics rendering
> loop! Audio is hard realtime to a much greater extent than anything
> else games and the like are doing, and the “text book” solution to
> that is to perform the actual audio processing in a dedicated thread.
> That way, the worst that can happen is that sound effects started
> right at a dropped frame are subtly delayed. It’s a lot harder to do
> with an emulator though, as the CPU/soundchip interaction is much
> lower level and not at all designed to cope with latency issues… You
> could use a buffered stream of timestamped register writes, but that’s
> almost as bad as skips whenever the emulated program is doing anything
> non-trivial sound-wise.
>
> If all else fails, use extra buffering for the audio stream. You could
> set it up so that the audio thread detects when buffers from the
> emulator core arrive late, and increases buffering as needed. That
> way, you’ll have minimal latency by default, but if there are issues,
> there’ll just be a few initial glitches until buffering has been
> increased.

In this case Firefox makes SDL_Flip() so slow that I can’t maintain
60 emulated frames per second, which will always mess up audio
since I’m not interpolating samples (unless I stretch/slow down
the audio a lot, but that sounds really bad too).

I guess I could use a separate rendering thread as well instead of
messing with audio buffering. Once a frame is ready, the buffer
could be handed off to the rendering thread. If the rendering thread
has not finished displaying the previous frame at that point (e.g.
because it’s stalled in SDL_Flip()), the new frame could be dropped.
That would mean dropped frames while doing stuff in Firefox, which
is probably better than bad audio at least.

The drawback is I might have to do manual double-buffering, since
the emulation step wouldn’t be able to write new pixels into the buffer
that’s being drawn.
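
The hand-off described above (copy the frame in only if the renderer is idle, otherwise drop it) fits in a one-slot mailbox guarded by an atomic flag. This is a sketch under assumptions: the 256x240 32-bit framebuffer and all names are hypothetical. The render thread would upload and flip `m->pixels` between `mailbox_has_frame()` and `mailbox_done()`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FB_W 256 /* hypothetical emulated screen size */
#define FB_H 240

typedef struct {
    uint32_t pixels[FB_W * FB_H];
    atomic_bool busy;  /* true from submit until the renderer is done */
    unsigned dropped;  /* touched by the emulation thread only */
} FrameMailbox;

/* Emulation thread: returns false (frame dropped) if the renderer is
 * still presenting the previous frame, e.g. stalled in SDL_Flip(). */
static bool mailbox_try_submit(FrameMailbox *m, const uint32_t *frame) {
    /* Acquire pairs with mailbox_done(): the renderer has finished
     * reading pixels before we overwrite them. */
    if (atomic_load_explicit(&m->busy, memory_order_acquire)) {
        m->dropped++;
        return false;
    }
    memcpy(m->pixels, frame, sizeof m->pixels);
    /* Release: the pixel writes become visible before busy == true. */
    atomic_store_explicit(&m->busy, true, memory_order_release);
    return true;
}

/* Render thread: true if a complete frame is waiting in m->pixels. */
static bool mailbox_has_frame(FrameMailbox *m) {
    return atomic_load_explicit(&m->busy, memory_order_acquire);
}

/* Render thread: call once the frame has been uploaded and flipped. */
static void mailbox_done(FrameMailbox *m) {
    atomic_store_explicit(&m->busy, false, memory_order_release);
}
```

Polling `mailbox_has_frame()` in a loop would spin. A real render thread would more likely block on a semaphore (for example `SDL_SemWait()`) that the emulation thread posts after a successful submit.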

/Ulf

David_Olofson · October 5, 2013, 10:01pm

[…]

> Neither my emulator nor Firefox when it’s scrolling uses very much CPU,
> so from that perspective it wouldn’t have any trouble keeping up. I guess
> Firefox might be drawing in some fashion that hogs the render queue
> (or whatever the term would be) for a long time though, so that SDL has
> to wait to do its flip. Going on vague memories here, but I think Firefox
> renders to and draws from a buffer, while Chromium uses higher-level
> drawing commands - or did in the past at least. Made Firefox very slow
> over X11 forwarding.

That might explain it… If Firefox isn’t smart about what it uploads,
the GPU is going to have to spend quite a bit of time DMAing texture
data from those buffers.

[…]

> In this case Firefox makes SDL_Flip() so slow that I can’t maintain
> 60 emulated frames per second,

This is exactly why you would add buffering in one place or another.
Of course, this adds latency - but that’s what you get unless you’re
running on a dedicated realtime system. You simply need to have enough
buffering that you’re able to meet the deadlines.

[…]

> which will always mess up audio
> since I’m not interpolating samples (unless I stretch/slow down
> the audio a lot, but that sounds really bad too).

Stretching isn’t going to solve this problem; only make it worse…
Stretching is only for when the output sample rate differs from what
you expect, and it should only be a matter of a fraction of a sample
per buffer normally.

> I guess I could use a separate rendering thread as well instead of
> messing with audio buffering. Once a frame is ready, the buffer
> could be handed off to the rendering thread. If the rendering thread
> has not finished displaying the previous frame at that point (e.g.
> because it’s stalled in SDL_Flip()), the new frame could be dropped.
> That would mean dropped frames while doing stuff in Firefox, which
> is probably better than bad audio at least.

Yes, that’s one way to do it. Probably better than audio buffering, as
it doesn’t add extra latency.

> The drawback is I might have to do manual double-buffering, since
> the emulation step wouldn’t be able to write new pixels into the buffer
> that’s being drawn.

Well, on the upside, this would allow emulator rendering and buffer
rendering/uploading to happen asynchronously, theoretically allowing
you to maintain 60 fps even if those operations take close to 16.7 ms
each. So, you could basically double your rendering throughput on a
slow/problematic system.
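
One way to get that asynchrony without any copying is triple buffering, named plainly here as a swapped-in technique rather than the manual double buffering mentioned above. The emulator always has a buffer to draw into, the renderer always takes the newest complete frame, and neither waits for the other. Frames the renderer never picks up are overwritten, which is the same as dropping them. This is a minimal lock-free sketch. The buffer size and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TB_PIXELS (256 * 240) /* hypothetical framebuffer size */

typedef struct {
    uint32_t buf[3][TB_PIXELS];
    atomic_int middle; /* spare buffer index; TB_FRESH bit = unread frame */
    int back;          /* owned by the emulation thread */
    int front;         /* owned by the render thread */
} TripleBuffer;

enum { TB_INDEX = 3, TB_FRESH = 4 };

static void tb_init(TripleBuffer *t) {
    t->back = 0;
    t->front = 1;
    atomic_init(&t->middle, 2);
}

/* Emulation thread: buffer to draw the next frame into. */
static uint32_t *tb_back(TripleBuffer *t) { return t->buf[t->back]; }

/* Emulation thread: publish the finished frame, take the spare buffer. */
static void tb_publish(TripleBuffer *t) {
    int old = atomic_exchange_explicit(&t->middle, t->back | TB_FRESH,
                                       memory_order_acq_rel);
    t->back = old & TB_INDEX;
}

/* Render thread: if a new frame exists, swap it into front, return true.
 * Only this function clears TB_FRESH, so the exchange below always
 * receives a fresh buffer once the check has passed. */
static bool tb_acquire(TripleBuffer *t) {
    if (!(atomic_load_explicit(&t->middle, memory_order_acquire) & TB_FRESH))
        return false;
    int old = atomic_exchange_explicit(&t->middle, t->front,
                                       memory_order_acq_rel);
    t->front = old & TB_INDEX;
    return true;
}

/* Render thread: pixels of the most recently acquired frame. */
static const uint32_t *tb_front(const TripleBuffer *t) {
    return t->buf[t->front];
}
```

The cost is one extra frame of memory. In return, a slow upload or flip on the render side never blocks emulation, and emulation never blocks presentation.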

Ulf_Magnusson · October 5, 2013, 10:28pm

[…]

>> Neither my emulator nor Firefox when it’s scrolling uses very much CPU,
>> so from that perspective it wouldn’t have any trouble keeping up. I guess
>> Firefox might be drawing in some fashion that hogs the render queue
>> (or whatever the term would be) for a long time though, so that SDL has
>> to wait to do its flip. Going on vague memories here, but I think Firefox
>> renders to and draws from a buffer, while Chromium uses higher-level
>> drawing commands - or did in the past at least. Made Firefox very slow
>> over X11 forwarding.

> That might explain it… If Firefox isn’t smart about what it uploads,
> the GPU is going to have to spend quite a bit of time DMAing texture
> data from those buffers.

[…]

>> In this case Firefox makes SDL_Flip() so slow that I can’t maintain
>> 60 emulated frames per second,

> This is exactly why you would add buffering in one place or another.
> Of course, this adds latency - but that’s what you get unless you’re
> running on a dedicated realtime system. You simply need to have enough
> buffering that you’re able to meet the deadlines.

If the period of not being able to emulate at full speed lasts long enough, no
amount of buffering is going to help though, since audio is consumed at a
faster rate than it is generated. There’s also the problem that you wouldn’t
know that you need extra buffering until it’s already too late (when you’re
already emulating too slowly).

[…]

>> which will always mess up audio
>> since I’m not interpolating samples (unless I stretch/slow down
>> the audio a lot, but that sounds really bad too).

> Stretching isn’t going to solve this problem; only make it worse…
> Stretching is only for when the output sample rate differs from what
> you expect, and it should only be a matter of a fraction of a sample
> per buffer normally.

Stretching would help if you e.g. generate twice as many samples
(slowing the sound down to half speed) if you’re only able to
maintain 30 FPS. Sounds wonky though.

Adjusting the playback rate very slightly (hopefully inaudibly) is used
in some modern emulators to make sure you never under- or overflow
(keep the audio buffer fill level consistent). Low-level sound emulation
gets messy since you basically have a single infinite "sound effect"
playing at all times, which needs to be perfectly synchronized to video.
If you try to simply generate samples at the backend’s sample rate
without doing any monitoring or adjustments, you’ll eventually
under- or overflow purely due to jitter. Stuff like that is why many
emulators like to have crappy sound.
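
The "adjust the playback rate very slightly" approach can be written as a proportional controller on the FIFO fill level. At half full the resampler runs at exactly the nominal rate. Emptier than that, it produces slightly more samples per emulated second; fuller, slightly fewer. This is a sketch of the general idea. The function name and the 0.5% maximum deviation in the usage below are assumptions, not figures from any particular emulator.

```c
#include <stddef.h>

/* Hypothetical rate controller: returns the output sample rate the
 * resampler should target for the next block. max_dev bounds the
 * adjustment (e.g. 0.005 = +/-0.5%, assumed small enough to be
 * inaudible) and pulls the FIFO toward half full. */
static double drc_output_rate(double nominal_rate, size_t fill,
                              size_t capacity, double max_dev) {
    double fill_ratio = (double)fill / (double)capacity; /* 0.0 .. 1.0 */
    double adjust = 1.0 + max_dev * (1.0 - 2.0 * fill_ratio);
    return nominal_rate * adjust;
}
```

Because the correction is continuous and tiny, it absorbs both clock mismatch and jitter without the underruns and overruns that plain fixed-rate generation eventually hits.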

>> I guess I could use a separate rendering thread as well instead of
>> messing with audio buffering. Once a frame is ready, the buffer
>> could be handed off to the rendering thread. If the rendering thread
>> has not finished displaying the previous frame at that point (e.g.
>> because it’s stalled in SDL_Flip()), the new frame could be dropped.
>> That would mean dropped frames while doing stuff in Firefox, which
>> is probably better than bad audio at least.

> Yes, that’s one way to do it. Probably better than audio buffering, as
> it doesn’t add extra latency.

>> The drawback is I might have to do manual double-buffering, since
>> the emulation step wouldn’t be able to write new pixels into the buffer
>> that’s being drawn.

> Well, on the upside, this would allow emulator rendering and buffer
> rendering/uploading to happen asynchronously, theoretically allowing
> you to maintain 60 fps even if those operations take close to 16.7 ms
> each. So, you could basically double your rendering throughput on a
> slow/problematic system.

Yeah, I think I’ll try this approach (after upgrading to SDL 2 just in case
that helps). Would be nice if there was a non-blocking version of SDL_Flip()
that returned a status and a new buffer pointer or something, but maybe it’s
a bit specific…

/Ulf

David_Olofson · October 5, 2013, 11:11pm

[…]

> If the period of not being able to emulate at full speed lasts long enough, no
> amount of buffering is going to help though, since audio is consumed at a
> faster rate than it is generated. There’s also the problem that you wouldn’t
> know that you need extra buffering until it’s already too late (when you’re
> already emulating too slowly).

Oh, I was thinking in terms of hard limits here; what it takes to deal
with not being able to run at an actual steady 60 fps.

The emulator will have to catch up by dropping frames one way or
another, in order to maintain the nominal 60 fps internally. There’s
just no way around this, whether it’s an emulator or a native video
game.
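
Catching up usually means emulating every frame that is due by the wall clock but presenting only the last of them. A cap keeps a long stall from turning into a burst of emulation that never catches up: frames beyond the cap are simply given up, and emulated time slips a little. This is a minimal sketch of that policy. The names and the cap value in the usage below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical catch-up bookkeeping for a nominal 60 fps emulator. */
typedef struct {
    uint64_t frames_done; /* emulated frames accounted for so far */
} CatchUp;

/* Returns how many frames to emulate now, given microseconds elapsed
 * since start. The caller emulates all of them but renders only the
 * last. If more than max_catch_up frames are due, the excess is skipped
 * entirely so the loop resynchronizes with the wall clock. */
static unsigned catch_up(CatchUp *c, uint64_t elapsed_us,
                         unsigned max_catch_up) {
    uint64_t due = elapsed_us * 60u / 1000000u;
    if (due <= c->frames_done)
        return 0; /* ahead of schedule: nothing to do yet */
    uint64_t behind = due - c->frames_done;
    if (behind > max_catch_up)
        behind = max_catch_up;
    c->frames_done = due;
    return (unsigned)behind;
}
```

Audio is generated for every emulated frame, skipped or not, so the sound stream stays continuous even while video frames are dropped.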

[…]

> Stretching would help if you e.g. generate twice as many samples
> (slowing the sound down to half speed) if you’re only able to
> maintain 30 FPS. Sounds wonky though.

But, what’s the point in that? I would think the top priority in most
cases is to run the applications in real time, rather than rendering
every single frame.

> Adjusting the playback rate very slightly (hopefully inaudibly) is used
> in some modern emulators to make sure you never under- or overflow
> (keep the audio buffer fill level consistent).

You’ll definitely need to do that one way or another in any real time
audio application, as sample rates and refresh rates are only so
accurate, and not synchronized on the hardware side.

The only exception would be oddball hardware that actually drives the
audio CODEC from the same clock as the video subsystem - and I believe
those days ended shortly after the Amiga, as far as computers go.

> Low-level sound emulation
> gets messy since you basically have a single infinite "sound effect"
> playing at all times, which needs to be perfectly synchronized to video.

Well, you could always separate control register writes from the
actual audio rendering, but that’s going to result in all sorts of
interesting artifacts with anything but the most trivial sounds…
(Stalling arpeggios, rumble/noise that occasionally turns into square
waves, etc.) The interface is too low-level for that kind of stuff.
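
To make the trade-off concrete, here is a sketch of the decoupled approach being warned against. The CPU side queues register writes stamped with the output sample index at which they happened. The audio side renders a block and applies each write exactly at its timestamp. The toy one-channel square-wave "chip" and all names are hypothetical. The result is sample-accurate only if writes reach the audio side before their timestamps are rendered. Writes that arrive late can only be applied after their intended point, which is where artifacts like stalling arpeggios come from.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register write, stamped with its output sample index. */
typedef struct {
    uint64_t sample_time;
    uint8_t reg;
    uint8_t value;
} RegWrite;

/* Hypothetical toy chip: regs[0] = square period in samples (0 = off),
 * regs[1] = volume. */
typedef struct {
    uint8_t regs[4];
    uint32_t phase;
} Chip;

/* Renders n samples starting at output sample t0, applying each write
 * (sorted by time) when its timestamp is reached. Returns the number of
 * writes consumed. */
static size_t render_with_writes(Chip *chip, const RegWrite *w, size_t nw,
                                 uint64_t t0, int16_t *out, size_t n) {
    size_t wi = 0;
    for (size_t i = 0; i < n; i++) {
        while (wi < nw && w[wi].sample_time <= t0 + i) {
            chip->regs[w[wi].reg & 3] = w[wi].value;
            wi++;
        }
        uint8_t period = chip->regs[0];
        int16_t amp = (int16_t)(chip->regs[1] * 64);
        if (period == 0) {
            out[i] = 0; /* channel off */
            continue;
        }
        /* First half of each period high, second half low. */
        out[i] = (chip->phase % period) < period / 2u ? amp : (int16_t)-amp;
        chip->phase++;
    }
    return wi;
}
```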

> If you try to simply generate samples at the backend’s sample rate
> without doing any monitoring or adjustments, you’ll eventually
> under- or overflow purely due to jitter. Stuff like that is why many
> emulators like to have crappy sound.

Of course. Same problem as video players, soft synths and anything
else that needs to deal with multiple streams driven by independently
clocked hardware. Rates are only approximate, not stable, and not
synchronized.
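
The clock mismatch has two layers. First, the nominal ratio of sample rate to frame rate is often not an integer. Second, the real hardware rates drift on top of that. The first layer can be handled exactly with integer bookkeeping that carries the remainder from frame to frame, so there is no long-run rounding drift. The second layer still needs a feedback correction such as slight rate adjustment. In this sketch the names are hypothetical, and the 60000/1001 (about 59.94 Hz) frame rate in the usage below is just an example.

```c
#include <stdint.h>

/* Hypothetical per-frame sample budget for a frame rate given as the
 * fraction fps_num / fps_den. */
typedef struct {
    uint64_t sample_rate;      /* e.g. 44100 */
    uint64_t fps_num, fps_den; /* e.g. 60000 / 1001 */
    uint64_t remainder;        /* carried numerator, always < fps_num */
} SampleClock;

/* Samples to generate for the next emulated frame. Over fps_num frames
 * the total is exactly sample_rate * fps_den. */
static uint32_t samples_for_next_frame(SampleClock *c) {
    uint64_t total = c->sample_rate * c->fps_den + c->remainder;
    c->remainder = total % c->fps_num;
    return (uint32_t)(total / c->fps_num);
}
```

At 44100 Hz and 60000/1001 fps this yields 735, 736, 736, 735, ... samples per frame, averaging exactly 44100 * 1001 / 60000, about 735.7.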

[…]

> Yeah, I think I’ll try this approach (after upgrading to SDL 2 just in case
> that helps). Would be nice if there was a non-blocking version of SDL_Flip()
> that returned a status and a new buffer pointer or something, but maybe it’s
> a bit specific…

I’m not sure that’s possible to implement over all platforms and APIs,
so it would probably be one of those bonus features you can’t rely
entirely on. However, if you’re doing the software rendering in
another thread and only uploading and rendering in the main thread,
you can sort of implement that anyway.

Ulf_Magnusson · October 6, 2013, 10:59am

[…]

>> If the period of not being able to emulate at full speed lasts long enough, no
>> amount of buffering is going to help though, since audio is consumed at a
>> faster rate than it is generated. There’s also the problem that you wouldn’t
>> know that you need extra buffering until it’s already too late (when you’re
>> already emulating too slowly).

> Oh, I was thinking in terms of hard limits here; what it takes to deal
> with not being able to run at an actual steady 60 fps.
>
> The emulator will have to catch up by dropping frames one way or
> another, in order to maintain the nominal 60 fps internally. There’s
> just no way around this, whether it’s an emulator or a native video
> game.

[…]

>> Stretching would help if you e.g. generate twice as many samples
>> (slowing the sound down to half speed) if you’re only able to
>> maintain 30 FPS. Sounds wonky though.

> But, what’s the point in that? I would think the top priority in most
> cases is to run the applications in real time, rather than rendering
> every single frame.

Thought you meant increasing buffering as a way to avoid sound stuttering
when you’re not able to emulate fast enough. Stretching would work there,
but would probably sound just as bad as stuttering.

>> Low-level sound emulation
>> gets messy since you basically have a single infinite "sound effect"
>> playing at all times, which needs to be perfectly synchronized to video.

> Well, you could always separate control register writes from the
> actual audio rendering, but that’s going to result in all sorts of
> interesting artifacts with anything but the most trivial sounds…
> (Stalling arpeggios, rumble/noise that occasionally turns into square
> waves, etc.) The interface is too low-level for that kind of stuff.

Yeah, don’t think I’ll go down that road. Would make the implementation
messier too with little gain, as audio is already relatively speedy.

[…]

>> Yeah, I think I’ll try this approach (after upgrading to SDL 2 just in case
>> that helps). Would be nice if there was a non-blocking version of SDL_Flip()
>> that returned a status and a new buffer pointer or something, but maybe it’s
>> a bit specific…

> I’m not sure that’s possible to implement over all platforms and APIs,
> so it would probably be one of those bonus features you can’t rely
> entirely on. However, if you’re doing the software rendering in
> another thread and only uploading and rendering in the main thread,
> you can sort of implement that anyway.

Since you can’t change which surface is the screen, I guess you might
have to do something like

Emulator thread:

<If rendering thread is idle, copy RAM buffer to screen surface>

Render thread:

SDL_Flip()

With double buffering you could move the copying to the render thread
as well. Maybe there’s a copy-less approach possible with SDL 2.0 and
textures… I haven’t looked at the SDL 2.0 API yet.

/Ulf

Ulf_Magnusson · October 8, 2013, 9:41pm

I’ve now split things up into an emulation thread and a rendering/SDL thread
(the latter being the main thread), and it seems to work well enough. I’ve
also switched to SDL2. It has raised a few other questions, though:

(1) I’m still doing sample generation and SDL_Lock/UnlockAudio() in the
emulation thread. Are those functions safe to use outside the main
thread on all platforms?

(2) I’m also doing SDL_PumpEvents() in the emulation thread, right as the
controllers are read (to minimize input lag, though I’m not sure if it makes
any perceptible difference in practice). From reading around a bit this seems
to be more sketchy, and could be changed to doing SDL_PumpEvents()
once per frame in the rendering/SDL thread. Any other options?
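
One option that keeps all event handling on the main thread is to let that thread pump events and publish the controller state as a single atomic bitmask. The emulation thread then samples the bitmask whenever the emulated program reads the controller port. This fits SDL 2's documentation, which advises calling `SDL_PumpEvents()` only from the thread that initialized the video subsystem. It is a sketch under assumptions: the button layout and all names are hypothetical. Input latency is bounded by how often the main thread pumps events.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical button bits for an emulated controller. */
enum {
    BTN_A = 1u << 0, BTN_B = 1u << 1, BTN_SELECT = 1u << 2, BTN_START = 1u << 3,
    BTN_UP = 1u << 4, BTN_DOWN = 1u << 5, BTN_LEFT = 1u << 6, BTN_RIGHT = 1u << 7
};

/* Shared state: written by the main thread, read by the emulation thread.
 * A single atomic word needs no lock. */
static _Atomic uint32_t g_buttons;

/* Main thread: call from the event loop when a key mapped to `button`
 * is pressed or released (e.g. while handling SDL_PollEvent results). */
static void input_set(uint32_t button, int pressed) {
    if (pressed)
        atomic_fetch_or_explicit(&g_buttons, button, memory_order_relaxed);
    else
        atomic_fetch_and_explicit(&g_buttons, ~button, memory_order_relaxed);
}

/* Emulation thread: snapshot taken when the controller port is read. */
static uint32_t input_read(void) {
    return atomic_load_explicit(&g_buttons, memory_order_relaxed);
}
```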

/Ulf

On Sun, Oct 6, 2013 at 1:11 AM, David Olofson wrote:

On Sun, Oct 6, 2013 at 12:28 AM, Ulf Magnusson <@Ulf_Magnusson> wrote:
[…]

If the period of not being able to emulate at full speed lasts long enough no
amount of buffering is going to help though, since audio is consumed at a
faster rate than it is generated. There’s also the problem that you wouldn’t
know that you need extra buffering until it’s already too late (when you’re
already emulating too slowly).

Oh, I was thinking in terms of hard limits here; what it takes to deal
with not being able to run at an actual steady 60 fps.

The emulator will have to catch up by dropping frames on way or
another, in order to maintain the nominal 60 fps internally. There’s
just no way around this, whether it’s an emulator or a native video
game.

[…]

Stretching would help if you e.g. generate twice as many samples
(slowing the sound down to half speed) if you’re only able to
maintain 30 FPS. Sounds wonky though.

But, what’s the point in that? I would think the top priority in most
cases is to run the applications in real time, rather than rendering
every single frame.

Thought you meant increasing buffering as a way to avoid sound stuttering
when you’re not able to emulate fast enough. Stretching would work there,
but would probably sound just as bad as stuttering.

Low-level sound emulation
gets messy since you basically have a single infinite "sound effect"
playing at all times, which needs to be perfectly synchronized to video.

Well, you could always separate control register writes from the
actual audio rendering, but that’s going to result in all sorts of
interesting artifacts with anything but the most trivial sounds…
(Stalling arpeggios, rumble/noise that occasionally turns into square
waves etc.) Too low level interface for that kind of stuff.

Yeah, don’t think I’ll go down that road. Would make the implementation
messier too with little gain, as audio is already relatively speedy.

[…]

Yeah, I think I’ll try this approach (after upgrading to SDL 2 just in case
that helps). Would be nice if there was a non-blocking version of SDL_Flip()
that returned a status and a new buffer pointer or something, but maybe it’s
a bit specific…

I’m not sure that’s possible to implement over all platforms and APIs,
so it would probably be one of those bonus features you can’t rely
entirely on. However, if you’re doing the software rendering in
another thread and only uploading and rendering in the main thread,
you can sort of implement that anyway.

Since you can’t change which surface is the screen, I guess you might
have to do something like

Emulator thread:

<If rendering thread is idle, copy RAM buffer to screen surface>

Render thread:

SDL_Flip()

With double buffering you could move the copying to the render thread
as well. Maybe there’s some copy-less you could do with SDL 2.0 and
textures… haven’t looked at the SDL 2.0 API yet.

/Ulf