Double-buffering *and* dirty rectangle lists?

Has anyone tried doing animation in SDL with both techniques, or is this a
disaster waiting to happen? I’m trying to maximise animation speed for
potentially very slow machines (albeit ones with 8MB graphics cards) so what
I’d like to do is use a double-buffered hardware display surface, and also
keep lists of dirty rectangles for each buffer, so I can avoid a whole-screen
redraw each frame. The dirty list is a bit more complicated as a result, I’d
imagine, but otherwise I expect it should give me a bit more speed than using
software buffers and SDL_Updating (to get a smooth single-buffered display)
or doing full-screen redraws. Any thoughts on this?

cheers,

--
Matthew Bloch
Bytemark Computer Consulting Limited
http://www.bytemark.co.uk/
tel. +44 (0) 8707 455026

I guess if you use hardware surfaces (fullscreen on Windows)
with double buffering, you don’t have to use dirty rectangles,
since the benefit isn’t as obvious. If you use software surfaces
(windowed mode on most platforms), then you don’t have to specify
double buffering.

It also depends on the tradeoff brought by your dirty rectangle
algorithm. You’ll find my RectangleSet implementation at:

http://www.gime.org/twiki/bin/view/Gime/WebDownload

Normally, when you have more than a few hundred dirty rectangles
per frame update, it is better to stick with a full-screen update.
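
For illustration, a minimal sketch of that cutoff against the SDL 1.2
API (the threshold constant is an assumed number to tune per target,
not a measured one):

#include "SDL.h"

#define MAX_DIRTY_RECTS 256  /* assumed cutoff; tune for your hardware */

/* Push this frame's dirty rectangles, falling back to one full-screen
   update when the list is too long to be worth it. */
void present_frame(SDL_Surface *screen, SDL_Rect *dirty, int ndirty)
{
    if (ndirty > MAX_DIRTY_RECTS)
        SDL_UpdateRect(screen, 0, 0, 0, 0);  /* w = h = 0 means whole screen */
    else
        SDL_UpdateRects(screen, ndirty, dirty);
}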

Regards,
.paul.

On Sat, Aug 24, 2002 at 01:52:10AM +0100, Matthew Bloch wrote:

[…]

Well my concern is speed because running at 800x600x32 on a slow machine
(Gamesnet cabinets) has been provably painful. Dirty rectangles will be in
the order of tens rather than hundreds, so I’m sure it’s the right way.
Anyhow I’ve worked out a basic algorithm and will see how it goes; thanks
for the dirty rectangle code, it looks a little better optimised than mine.

On Saturday 24 August 2002 05:52, paul at theV.net wrote:

On Sat, Aug 24, 2002 at 01:52:10AM +0100, Matthew Bloch wrote:

[…]

I guess if you use hardware surfaces (fullscreen on Windows)
with double buffering, you don’t have to use dirty rectangles,
since the benefit isn’t as obvious. If you use software surfaces
(windowed mode on most platforms), then you don’t have to specify
double buffering.



I AM familiar with the technique, and was planning on suggesting it to
Sam (along with multiple back-buffers) for the 1.3 tree (as the current
goal for 1.2 --as I understand it-- is to remain API AND ABI
compatible).

For those of you that don’t understand the issue, I will present it
briefly:

System RAM tends to be much faster than video RAM for most operations
(the primary exception is blitting between 2 buffers BOTH in video
RAM)… As a result, it is often fastest to construct a composite frame
in system memory, and only update the parts of the screen that changed
in the last N frames [where N = 1 + total number of video buffers]. Even
though the whole screen is being drawn (into system memory) every frame,
only the parts that change need to be updated via the relatively slow
PCI bus. This can then be combined with traditional back-buffering
techniques, assuming that you keep a dirty-rect list for each back
buffer. It can be argued that it might be SDL’s job to hide this from
the user when the user uses a software surface and SDL_UpdateRects(),
but that’s what group debate is for… right?
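
For illustration, a rough sketch of that technique in C for the
single-buffered software case (render_scene() and collect_dirty_rects()
are hypothetical application helpers, not SDL calls):

#include "SDL.h"

/* Hypothetical application-side helpers. */
void render_scene(SDL_Surface *sysram_frame);
int  collect_dirty_rects(SDL_Rect *out, int max);

void update_screen(SDL_Surface *screen, SDL_Surface *sysram_frame)
{
    SDL_Rect dirty[64];
    int i, ndirty;

    render_scene(sysram_frame);               /* full frame, but in fast system RAM */
    ndirty = collect_dirty_rects(dirty, 64);  /* only what changed since last frame */

    for (i = 0; i < ndirty; ++i) {
        SDL_Rect dst = dirty[i];              /* SDL may clip/modify the dst rect */
        SDL_BlitSurface(sysram_frame, &dirty[i], screen, &dst);
    }
    SDL_UpdateRects(screen, ndirty, dirty);   /* push only those areas to the display */
}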

I will save discussing the merits of multiple back-buffers for another
email… (FYI, I see almost no reason to have more than 2 back-buffers)

As far as doing this with SDL in its current state, I think it SHOULD
work for a REAL hardware double-buffered surface (I would ask Sam his
opinion)… but I doubt it will work correctly when SDL is emulating
hardware double-buffering. I have not tested this myself, so YMMV.
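
For what it’s worth, SDL 1.2 filters the flags of the returned surface
down to what it actually managed to obtain, so a runtime check is
possible; a minimal sketch:

#include "SDL.h"

/* Returns non-zero if we appear to have a genuine hardware page-flipped
   display surface, rather than SDL's emulated (shadow-buffer) flipping. */
int have_real_hw_flip(SDL_Surface *screen)
{
    Uint32 wanted = SDL_HWSURFACE | SDL_DOUBLEBUF;
    return (screen->flags & wanted) == wanted;
}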

Best of luck,

-Loren

On Fri, 2002-08-23 at 17:52, Matthew Bloch wrote:

[…]

On 24 Aug 2002 02:42:26 -0700, Loren Osborn <linux_dr at yahoo.com> wrote:

System RAM tends to be much faster than video RAM for most operations
(the primary exception is blitting between 2 buffers BOTH in video
RAM)

I think you can get a better overall performance by using DMA, as DirectX does.

DMA transfers are somewhat slower than CPU transfers, but that way you can
do other things with the CPU in the meantime.

Regards,
Wizord.

Has anyone tried doing animation in SDL with both techniques,

Yes, several times on different platforms. (Amiga and VGA Mode-X mostly; just hacking an implementation for SDL and OpenGL now.)

or is this a
disaster waiting to happen?

No, but it’s a bit hairy…

I’m trying to maximise animation speed for potentially very slow machines […] Any thoughts on this?

Well, the major advantage with the method you’re proposing is that it makes it possible to take advantage of retrace sync, if available. (That’s the only way to completely avoid tearing.)

One thing you must keep in mind though, is that most APIs guarantee nothing as to what happens to the contents of the old display page after flipping.

Some implementations (some OpenGL drivers and SDL in software double buffering mode for example) will actually use the same back buffer all the time, and blit from that to the screen when you “flip”.

Others might give you three buffers instead of two, which would mean that you get into trouble if you assume double buffering. (This is probably most likely to happen with OpenGL.)

The first case isn’t a serious problem as things will still render correctly. It’s just that you lose speed due to performing every update twice when you only need to do it once. (You’re rendering into the same buffer all the time.)

The second case is a serious issue - especially if you can’t get
reliable info from the driver. This method is really out of spec for
most targets, so you’re basically on your own. Giving the user the
option to select between “Full Redraw”, “Assume Double Buffered” and
“Assume Triple Buffered” should allow things to run as fast as possible
on virtually any target.

(I have yet to find something that actually thrashes pages during
flipping, so as long as you have the right idea about the number of
pages used you should be safe. The only realistic problem scenario I
can think of would be a target using three or more pages and a “pick
any unused” flipping scheme, rather than the usual circular queue
scheme. Never seen anything like that, except for the weird double +
“low frequency third buffer” thing I hacked for Project Spitfire/DOS.)
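
For illustration, a rough sketch of that user-selectable scheme against
the SDL 1.2 API. Only SDL_Flip() is real SDL here; repair_page() and the
bookkeeping are hypothetical, and a simple circular page chain is
assumed:

#include "SDL.h"

#define MAX_PAGES 3
#define MAX_RECTS 128

/* Hypothetical helper: re-blit/redraw the given areas from the master
   frame so the page catches up with everything it missed. */
void repair_page(SDL_Surface *screen, SDL_Rect *rects, int n);

static SDL_Rect page_dirty[MAX_PAGES][MAX_RECTS];
static int page_ndirty[MAX_PAGES];
static int num_pages = 2;  /* 1 = Full Redraw, 2 = Assume Double, 3 = Assume Triple */
static int cur_page = 0;

/* Record an area drawn into the current page: every OTHER page is now
   stale there and must repair it before it is shown again. */
void mark_dirty(SDL_Rect r)
{
    int p;
    for (p = 0; p < num_pages; ++p)
        if (p != cur_page && page_ndirty[p] < MAX_RECTS)
            page_dirty[p][page_ndirty[p]++] = r;
}

void flip_and_repair(SDL_Surface *screen)
{
    SDL_Flip(screen);                       /* show the page we just finished */
    cur_page = (cur_page + 1) % num_pages;  /* assume a circular page chain */

    /* Catch the new back page up on everything drawn since it was
       last current, then start it a fresh list. */
    repair_page(screen, page_dirty[cur_page], page_ndirty[cur_page]);
    page_ndirty[cur_page] = 0;
}

With num_pages = 1 this degenerates to repainting every dirty area each
frame, i.e. the “Full Redraw” fallback.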

//David

.---------------------------------------
| David Olofson
| Programmer

david.olofson at reologica.se
Address:
REOLOGICA Instruments AB
Scheelevägen 30
223 63 LUND
Sweden
---------------------------------------
Phone: 046-12 77 60
Fax: 046-12 50 57
Mobil:
E-mail: david.olofson at reologica.se
WWW: http://www.reologica.se

`-----> We Make Rheology Real

On Sat, 24/08/2002 01:52:10 , Matthew Bloch wrote:

[…smart double buffering…]

As far as doing this with SDL in its current state, I think it SHOULD
work for a REAL hardware double-buffered surface (I would ask Sam his
opinion)…

Well, at least things behave as expected on DirectX and OpenGL on Win32… (Can’t see why it shouldn’t. You just need to watch out for triple buffering when not asking for it.)

but I doubt it will work correctly when SDL is emulating
hardware double-buffering. I have not tested this myself, so YMMV.

Emulation doesn’t make a difference, except of course that the whole double dirty rect list thing is wasted on it.

//David

On 24/08/2002 02:42:26 , Loren Osborn <linux_dr at yahoo.com> wrote:

My actual concern was the one you illustrated in your last email: that
(generally in emulated back buffer scenarios), instead of alternating
between two distinct buffers (as you do in hardware double buffering),
the same buffer is used repeatedly as the back buffer (which could
confuse a program expecting the opposite behaviour).

-Loren

On Mon, 2002-08-26 at 04:49, David Olofson wrote:


On 24 Aug 2002 02:42:26 -0700, Loren Osborn <linux_dr at yahoo.com> wrote:

System RAM tends to be much faster than video RAM for most operations
(the primary exception is blitting between 2 buffers BOTH in video
RAM)

I think you can get a better overall performance by using DMA, as DirectX does.

It’s just that most targets don’t support sysRAM->VRAM DMA at all,
and there’s nothing SDL can do about that.

DMA transfers are somewhat slower than CPU transfers, but that way you can
do other things with the CPU in the meantime.

In my experience, DMA is generally a lot faster than CPU transfers,
at least on newer cards (or we wouldn’t have this major problem with
dog slow software rendering on Linux…) - but your point is valid
nevertheless.

//David

On Mon, 26/08/2002 01:24:29 , José Luis Sánchez wrote:

Yeah, but that’s only a problem with effects that take data from the frame buffer as input. (Recursive blending and blurring effects and such.) Such effects implemented directly towards VRAM (i.e. VRAM reading) would be painfully slow anyway, so I’m not even considering them.

As long as you stick to plain opaque blits to the video buffer, things should work as expected. The only thing that happens if you assume “real” double buffering but get a single shadow back surface, is that you’ll waste time updating everything twice.

//David

On 26/08/2002 05:01:06 , Loren Osborn <linux_dr at yahoo.com> wrote:

[…]

My actual concern was the one you illustrated in your last email: that
(generally in emulated back buffer scenarios), instead of alternating
between two distinct buffers (as you do in hardware double buffering),
the same buffer is used repeatedly as the back buffer (which could
confuse a program expecting the opposite behaviour).

Thanks, an enlightening warning. My display code happily falls back to
SDL_UpdateRects in a single-buffer situation, so I think I can write such
"risky" code and see how it goes on my G200.

On Monday 26 August 2002 12:41, David Olofson wrote:

[…]

Well, the major advantage with the method you’re proposing is that it makes
it possible to take advantage of retrace sync, if available. (That’s the
only way to completely avoid tearing.)

One thing you must keep in mind though, is that most APIs guarantee
nothing as to what happens to the contents of the old display page after
flipping.



I disagree… If you have an object moving from position A to B to C to
D in consecutive frames: when the application goes to draw the object at
location C, it should erase the object from location A, but NOT from
location B, if it assumes real double buffering… Likewise, if it is
dealing with TRIPLE buffering instead, when it is drawing the object at
location D, it should erase the object at location A, but NOT at B or
C. Basically, to do this with utmost efficiency, you need to know
EXACTLY how many buffers there are (and exactly what order they are
being given to you)…
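
For illustration, a sketch of that bookkeeping in C (erase_sprite() and
draw_sprite() are hypothetical helpers; the point is the index
arithmetic: the stale image in the buffer you are handed is num_buffers
frames old):

#include "SDL.h"

#define HISTORY 4   /* enough history for up to triple buffering */

void erase_sprite(SDL_Surface *s, const SDL_Rect *r);  /* hypothetical */
void draw_sprite(SDL_Surface *s, const SDL_Rect *r);   /* hypothetical */

static SDL_Rect old_pos[HISTORY];   /* ring buffer of past positions */
static int frame_no = 0;
static int num_buffers = 2;         /* 2 = double, 3 = triple buffered */

void move_sprite(SDL_Surface *back, SDL_Rect new_pos)
{
    /* Erase where the sprite was the last time THIS buffer was drawn:
       drawing C erases A (double buffering); drawing D erases A (triple). */
    if (frame_no >= num_buffers) {
        SDL_Rect stale = old_pos[(frame_no - num_buffers) % HISTORY];
        erase_sprite(back, &stale);
    }
    draw_sprite(back, &new_pos);
    old_pos[frame_no % HISTORY] = new_pos;
    frame_no++;
}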

SAM PLEASE READ THE FOLLOWING, I would like your feedback on such an
API change:

Perhaps that is the solution. SDL could add a function to query the
number of buffers, assign each buffer an index number, and add a
function to query the index number of the current buffer. That way you
could develop a general solution that supports both emulation and
triple buffering. (You should actually keep 2 dirty lists for each
buffer: a list of all the dirtying operations you are performing on it
this frame, and a list of the dirtying operations you performed on it
the last time you drew on that buffer.) From experience with DirectX, I
know the system (Windows at least) will clear the frame behind your
back. SDL should probably detect and report this also. (We detected it
by keeping a non-zero pixel in one known corner, and reporting when it
got zeroed.)
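
The sentinel-pixel trick might look something like this in SDL 1.2 terms
(a 32-bit surface is assumed, and this is an illustrative reconstruction,
not the original code):

#include "SDL.h"

#define SENTINEL 0x00000001u   /* visually near-black, but non-zero */

/* Returns non-zero if the known corner pixel was zeroed since we last
   armed it, i.e. something cleared the buffer behind our back. */
int buffer_was_cleared(SDL_Surface *s)
{
    Uint32 *corner;
    int cleared;

    if (SDL_MUSTLOCK(s) && SDL_LockSurface(s) < 0)
        return 0;                   /* can't inspect it; assume intact */

    corner = (Uint32 *)s->pixels;   /* top-left pixel */
    cleared = (*corner == 0);
    *corner = SENTINEL;             /* re-arm for next frame */

    if (SDL_MUSTLOCK(s))
        SDL_UnlockSurface(s);
    return cleared;
}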

All suggestions welcome,

-Loren

On Mon, 2002-08-26 at 05:49, David Olofson wrote:

[…]

My actual concern was the one you illustrated in your last email: that
(generally in emulated back buffer scenarios), instead of alternating
between two distinct buffers (as you do in hardware double buffering),
the same buffer is used repeatedly as the back buffer (which could
confuse a program expecting the opposite behaviour).

Yeah, but that’s only a problem with effects that take data from the
frame buffer as input. (Recursive blending and blurring effects and such.)
Such effects implemented directly towards VRAM (i.e. VRAM reading) would be
painfully slow anyway, so I’m not even considering them.

As long as you stick to plain opaque blits to the video buffer, things
should work as expected. The only thing that happens if you assume
"real" double buffering but get a single shadow back surface, is that
you’ll waste time updating everything twice.


I disagree… If you have an object moving from position A to B to C to
D in consecutive frames: when the application goes to draw the object at
location C, it should erase the object from location A, but NOT from
location B, if it assumes real double buffering… Likewise, if it is
dealing with TRIPLE buffering instead, when it is drawing the object at
location D, it should erase the object at location A, but NOT at B or
C.

Right, but it doesn’t matter if you’re doing more work than required. (Except that you’re wasting time, obviously.)

The whole point with one set of dirty rects per buffer is to reliably
ensure that everything that needs updating in each buffer is updated.
If there in fact is only one real buffer, the extra “history” dirty
rects will just cause areas that are already up to date to be updated
again, for no visible effect.

Basically to do this with utmost efficiency, you need to know
EXACTLY how many buffers there are,

Of course - but I’m not talking about efficiency, but rather about
whether or not you will get correct results despite the fact that
many drivers and targets either refuse to tell you what’s going on,
or lie about it.

(and exactly what order they are
being given to you)…

Yes, but as most targets don’t even support anything but a simple
circular chain (even if other schemes could be of use in some cases
:-/ ), this doesn’t seem to be a real problem.

SAM PLEASE READ THE FOLLOWING, I would like your feedback on such an
API change:

Perhaps that is the solution. SDL could add a function to query the
number of buffers, assign each buffer an index number, and add a
function to query the index number of the current buffer.

This is not possible to implement on any target I know of. (Well,
unless programming directly to the metal like in the DOS days counts,
of course. However, SDL doesn’t run on DOS, AFAIK.)

That way you
could develop a general solution that supports both emulation and
triple buffering.

I think SDL can generally tell (or rather, dictate) whether there
will be double buffering or emulation. Triple buffering is not
supported, so the only time you could possibly get that would be
when the driver or system does it behind your back. (Can’t think
of any drivers other than OpenGL and Direct3D that would do such
a thing.)

[…]

From experience with DirectX, I
know the system (Windows at least) will clear the frame behind your
back.

That’s interesting… Why would a driver waste blitting time clearing
buffers automatically? (If DX does that all the time, that certainly
explains why plain “frame pumping” is much slower than expected on
some cards…)

SDL should probably detect and report this also. (We detected it
by keeping a non-zero pixel in one known corner, and reporting when it
got zeroed).

Heh. Well, there’s no end to what application programmers have to do
to work around the traps set up by the guys in Redmond… ;-)

//David

On 26/08/2002 14:24:12 , Loren Osborn <linux_dr at yahoo.com> wrote:

On Mon, 2002-08-26 at 05:49, David Olofson wrote:

On Mon, 26 Aug 2002 14:43:06 +0200, David Olofson <david.olofson at reologica.se> wrote:

On Mon, 26/08/2002 01:24:29 , José Luis Sánchez wrote:

On 24 Aug 2002 02:42:26 -0700, Loren Osborn <linux_dr at yahoo.com> wrote:

System RAM tends to be much faster than video RAM for most operations
(the primary exception is blitting between 2 buffers BOTH in video
RAM)

I think you can get a better overall performance by using DMA, as DirectX
does.

It’s just that most targets don’t support sysRAM->VRAM DMA at all,
and there’s nothing SDL can do about that.

Isn’t it possible to implement this kind of transfer wherever it’s
available? Using DirectX, for example, you can simply do all your blits
asynchronously, and the underlying system uses DMA (or the video card’s
blitter) for the transfers.

Regards,
Wizord.

On 26 Aug 2002 14:24:12 -0700, Loren Osborn <linux_dr at yahoo.com> wrote:

SAM PLEASE READ THE FOLLOWING, I would like your feedback on such an
API change:

[snip]

Speaking of API changes: it would be wonderful if a way were provided
for selecting triple buffering (for example, adding a new flag for
SDL_SetVideoMode).

Regards,
Wizord.

David Olofson wrote:

On 26/08/2002 14:24:12 , Loren Osborn <linux_dr at yahoo.com> wrote:

From experience with DirectX, I
know the system (Windows at least) will clear the frame behind your
back.

That’s interesting… Why would a driver waste blitting time clearing
buffers automatically? (If DX does that all the time, that certainly
explains why plain “frame pumping” is much slower than expected on
some cards…)

It doesn’t do this automatically on any of the graphics cards, versions
of Windows, or versions of DirectX that I have used. If you want a
buffer cleared, you have to tell it to do so.


Kylotan
http://pages.eidosnet.co.uk/kylotan

From experience with DirectX, I
know the system (Windows at least) will clear the frame behind your
back.

That’s interesting… Why would a driver waste blitting time clearing
buffers automatically? (If DX does that all the time, that certainly
explains why plain “frame pumping” is much slower than expected on
some cards…)

It generally only happened after system events (that weren’t readily
detectable) such as another program getting focus, etc.

SDL should probably detect and report this also. (We detected it
by keeping a non-zero pixel in one known corner, and reporting when it
got zeroed).

Heh. Well, there’s no end to what application programmers have to do
to work around the traps set up by the guys in Redmond… ;-)

Too true.

-Loren

On Tue, 2002-08-27 at 06:42, David Olofson wrote:

On 26/08/2002 14:24:12 , Loren Osborn <linux_dr at yahoo.com> wrote:

But isn’t that supposed to result in the surface being lost? Or is DX
making an exception for display surfaces…?

[…]

//David

On 27/08/2002 09:41:37 , Loren Osborn <linux_dr at yahoo.com> wrote:

On Tue, 2002-08-27 at 06:42, David Olofson wrote:

[…]

I didn’t work on this code personally, I’m just telling you what I
remember… They detected it with a non-black pixel in a corner… I
don’t know why they didn’t detect it some other way…

-Loren

On Tue, 2002-08-27 at 11:48, David Olofson wrote:

[…]

Perhaps that is the solution. SDL could add a function to query the
number of buffers, assign each buffer an index number, and add a
function to query the index number of the current buffer.

[snipped]

These things will be considered for SDL 2.0.

See ya,
-Sam Lantinga, Software Engineer, Blizzard Entertainment