Flipping?

I want to do just a 2D game.

Initially I thought I would create a double buffered screen surface,
completely redraw the entire screen each frame, and then use the
flipping function (I didn’t exactly mean that as it sounds :-) ). I
thought surely all modern targets would handle that with ease.

So I headed off in this direction on Win32. And everything is working
fine whether full screen or not. I am getting pretty good frames per
second even on relatively old hardware.

Now, from what I read on this forum, not all targets will handle that,
one of them being X Windows. Is this true? Which targets do/don’t support
this?

One of the workarounds seems to be triple buffering. Where you render
your scene to a software surface. When you are finished, blit that to a
hardware surface. Then blit that to the screen surface. Does this work?
Is it faster? And if so, why doesn’t SDL support something like this
itself when I ask for double buffering on a system which doesn’t support
hardware flipping?

> Now from what I read on this forum not all targets will handle that.
> One of them being X Windows. Is this true? Which targets do/don’t
> support this?

All targets support double buffering. The only difference is that most of
them can’t do hardware page flipping, so what you get for a screen
surface is actually a software back buffer, that is blitted to the video
surface when you SDL_Flip().

> One of the workarounds seems to be triple buffering.

Not really. (See below.)

> Where you render
> your scene to a software surface. When you are finished, blit that to a
> hardware surface. Then blit that to the screen surface. Does this work?

Yeah, but it’s totally pointless, unless it serves as a workaround for
some silly driver bug.

> Is it faster?

No, it’s slower.

First, blitting from system RAM to VRAM has to be done with the CPU on
many targets, which makes it dog slow.

Second, if you’re on a target without accelerated VRAM->VRAM blits, BANG!
You’re dead. Blitting from VRAM->VRAM is probably slower than streaming
the uncompressed graphics from the hard drive… heh (If you have a
modern 7200+ rpm DMA/66 or DMA/100 drive, it is slower. Much slower,
in fact.)

Third, it doesn’t even eliminate flicker or tearing, as you’re still not
flipping, and thus cannot get retrace sync on many targets. (Several
targets offer retrace sync only for the flip operation; not as a separate
blocking or polling call.)
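
To put a rough number on the copying involved, here is a minimal C sketch. The function name and the example figures (640x480, 16 bits per pixel, 60 frames per second) are illustrative assumptions, not values from this thread. It only computes how many bytes per second have to be moved when every frame is copied in full a given number of times.

```c
#include <assert.h>

/* Bytes per second needed to copy one full frame `copies` times per
   video frame. All inputs are illustrative assumptions. */
unsigned long long copy_bandwidth(unsigned width, unsigned height,
                                  unsigned bytes_per_pixel, unsigned fps,
                                  unsigned copies)
{
    assert(bytes_per_pixel > 0 && fps > 0);
    return (unsigned long long)width * height * bytes_per_pixel
           * fps * copies;
}
```

With those example figures, one full-screen copy per frame is 36,864,000 bytes per second. The scheme described in the question (software surface to hardware surface, then hardware surface to screen) copies every frame twice, so it needs 73,728,000 bytes per second before any actual rendering is counted.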

> And if so why doesn’t SDL support something like this
> itself when I ask for double buffering on a system which doesn’t
> support hardware flipping?

There are many forms of triple buffering:

1) "True" triple buffering:
	Three VRAM buffers are arranged in a chain. On each
	flip, the chain is rotated, so that the updated back
	buffer gets displayed. The *previous* display buffer
	becomes the new rendering back buffer, while the
	current display buffer ends up being unused.

   Disadvantages:
	Uses more VRAM, and adds one video frame of latency.
	You're still rendering directly into VRAM, which
	means software rendering, and in particular alpha
	blending, is very slow, even on targets that support
	DMA blits from system RAM.

   Advantages:
	The extra back buffer makes it possible to use
	up to 200% of the frame rendering time without
	dropping a video frame. (Of course, you have to
	pay for that the next frame, but 200% is just a
	theoretical value. Sustained full frame rate at
	90% average CPU usage with occasional 150% peaks
	is entirely possible.)
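
The chain rotation described for "true" triple buffering can be sketched in plain C. This is only a model of the bookkeeping, under the assumption that the three VRAM pages are identified by the indices 0 to 2; the type and function names are hypothetical and no real video API is involved.

```c
#include <assert.h>

/* Roles of the three VRAM pages, each stored as a page index 0..2. */
typedef struct {
    int render;   /* page the application is drawing into */
    int display;  /* page most recently handed to the display */
    int idle;     /* page that was on screen before that; unused for now */
} TripleChain;

void triple_chain_init(TripleChain *c)
{
    assert(c != 0);
    c->render  = 0;
    c->display = 1;
    c->idle    = 2;
}

/* Rotate the chain: the finished back buffer goes to the display, the
   oldest page becomes the new render target, and the page that was on
   screen becomes idle. */
void triple_chain_flip(TripleChain *c)
{
    int finished = c->render;

    c->render  = c->idle;
    c->idle    = c->display;
    c->display = finished;
}
```

Because the new render target is never the page that was just handed to the display, the application can start on the next frame without waiting for the flip to take effect. After three flips every page is back in its original role.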


2) "SemiTriple" (as it's called in Kobo Deluxe):
	Two VRAM pages are set up in a normal double
	buffering chain, and a software or VRAM back buffer
	is set up for rendering. All rendering is done into
	the "extra" buffer. When one frame is ready, any
	modified areas are blitted from the "extra" back
	buffer into the real back buffer, and then the back
	and front VRAM pages are flipped.

   Disadvantages:
	Not a real triple buffering setup; it cannot improve
	the frame rate by relaxing the demands on CPU use and
	timing, the way "true" triple buffering does. "Dirty
	areas" need to be managed for both VRAM pages. Not
	possible without hardware pageflipping.

   Advantages:
	Flicker free, thanks to double buffering. Efficient,
	as a result of only updating the changed areas, and
	doing so directly into the back buffer. Can take
	advantage of DMA blitting from system RAM to VRAM,
	even when software alpha blending and pixel effects
	are used.
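
The remark that "dirty areas" must be managed for both VRAM pages can be sketched as follows. This is a model in plain C with hypothetical names and an arbitrary list capacity, not code from Kobo Deluxe or SDL. The idea is that an area redrawn in the extra buffer is stale on both pages, but only the current back page is brought up to date before each flip.

```c
#include <assert.h>
#include <string.h>

#define MAX_DIRTY 64   /* arbitrary capacity for this sketch */

typedef struct { int x, y, w, h; } Rect;

typedef struct {
    Rect rects[MAX_DIRTY];
    int  count;
} DirtyList;

typedef struct {
    DirtyList stale[2];  /* per VRAM page: areas that lag the extra buffer */
    int       back;      /* index of the page that is currently the back page */
} SemiTriple;

void st_init(SemiTriple *st)
{
    memset(st, 0, sizeof *st);
}

/* Record an area redrawn in the extra buffer; both pages are now out of
   date there. Returns 0 on success, -1 if either list is full. */
int st_mark_dirty(SemiTriple *st, Rect area)
{
    if (st->stale[0].count >= MAX_DIRTY || st->stale[1].count >= MAX_DIRTY)
        return -1;
    for (int page = 0; page < 2; page++)
        st->stale[page].rects[st->stale[page].count++] = area;
    return 0;
}

/* Frame finished: copy out the areas that must be blitted from the extra
   buffer to the back page, mark that page clean, and swap page roles
   (a real program would blit and flip here). Returns the rect count. */
int st_present(SemiTriple *st, Rect out[MAX_DIRTY])
{
    DirtyList *list = &st->stale[st->back];
    int n = list->count;

    assert(n <= MAX_DIRTY);
    memcpy(out, list->rects, (size_t)n * sizeof(Rect));
    list->count = 0;
    st->back ^= 1;
    return n;
}
```

In this model an area drawn during one frame is blitted twice: once to the page that is the back page at the time, and once more on the following frame to the other page. A real implementation would probably also merge overlapping rectangles, which this sketch leaves out.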


3) Triple buffered scrolling:
	Two VRAM pages are set up in a normal double
	buffered pair. Rendering of sprites and other
	animated effects is done into the back buffer.
	A third "map buffer" is set up, also in VRAM.
	Scrolling background graphics is rendered into this
	buffer, rendering just a small area during each
	video/game frame. When a full frame is completed
	in the "map buffer", this buffer and the "oldest"
	of the other two VRAM buffers are swapped, and the
	background graphics rendering starts over in the
	new map buffer.

   Disadvantages:
	Requires support for non-chained triple buffering.
	(That is, the way DirectX does it is not sufficient.)
	Requires hardware scrolling with accurate per-page
	offset control. (While scrolling the background, the
	front and back pages will contain graphics from
	slightly different map positions.) Figuring out the
	map position before starting to render a new map page
	is tricky, to say the least, especially in games with
	direct "follow mode" scrolling.

   Advantages:
	*Very* efficient full screen scrolling - CPU time
	corresponding to one or two sprites per frame is
	usually enough. (Even a C64 could do it in multicolor
	highres mode, although the buffers alone would consume
	half the memory. :-)
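
The "small area during each video/game frame" part of the scrolling scheme can be sketched as a simple progress tracker. The names are hypothetical, and splitting the map buffer into tile columns with a fixed per-frame budget is an assumption made for illustration. The buffer swap and the map position calculation, which the text above calls the tricky part, are left to the caller.

```c
#include <assert.h>

typedef struct {
    int total_cols;      /* tile columns in one full map buffer */
    int cols_per_frame;  /* background rendering budget per video frame */
    int next_col;        /* first column not yet rendered */
} MapBuilder;

void map_builder_start(MapBuilder *mb, int total_cols, int cols_per_frame)
{
    assert(total_cols > 0 && cols_per_frame > 0);
    mb->total_cols     = total_cols;
    mb->cols_per_frame = cols_per_frame;
    mb->next_col       = 0;
}

/* Pick the slice to render this frame. Stores the first column in *first
   and returns how many columns to render (0 once the buffer is full). */
int map_builder_step(MapBuilder *mb, int *first)
{
    int remaining = mb->total_cols - mb->next_col;
    int count = remaining < mb->cols_per_frame ? remaining
                                               : mb->cols_per_frame;

    *first = mb->next_col;
    mb->next_col += count;
    return count;
}

/* Non-zero when the map buffer is complete and ready to be swapped in. */
int map_builder_done(const MapBuilder *mb)
{
    return mb->next_col >= mb->total_cols;
}
```

With 40 columns and a budget of 16 per frame, the buffer is finished after three frames (16, 16, then 8 columns). At that point the caller would swap it with the oldest of the two display pages and call `map_builder_start` again for the new map buffer.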

The only new buffering modes of interest, as far as I can see, would be
"true" triple buffering (for speed and, when there’s retrace sync, for
fewer dropped frames, better CPU utilization and more realistic scheduling
and timing requirements), and “SemiTriple” buffering, for fast software
rendering (in particular alpha blitting) in double buffered hardware
pageflipping modes.

//David Olofson — Programmer, Reologica Instruments AB

.- M A I A -------------------------------------------------.
| Multimedia Application Integration Architecture           |
| A Free/Open Source Plugin API for Professional Multimedia |
`----------------------------> http://www.linuxdj.com/maia -'
.- David Olofson -------------------------------------------.
| Audio Hacker - Open Source Advocate - Singer - Songwriter |
`-------------------------------------> http://olofson.net -'

(In reply to Adam Gates, Wednesday 31 October 2001 07:23.)

Thanks for that. A lot to consider there.

I don’t want to make my work too difficult so I am not interested in
getting it running fast on all targets, only the important ones. So what
I want to know is which targets support the easy options.

You say:

Not all targets support hardware flipping, so which ones do?

RAM->VRAM is slow unless you have DMA transfers, so which ones do?

VRAM->VRAM is slow unless it is h/w accelerated, so which ones are?

Do these depend on the SDL target or the video card?

> Not all targets support hardware flipping, so which ones do?

fbdev, GGI, and svgalib. (Support for svgalib h/w flipping was
added lately.) I’ve heard XFree86 DGA 2.0 should support it, but I
don’t think I’ve seen it in action.

> RAM->VRAM is slow unless you have DMA transfers, so which ones do?

No Linux drivers, except possibly DirectFB.

It seems like some of the OpenGL drivers can use busmaster DMA for
texture downloading, but that isn’t of much use to SDL 2D.

> VRAM->VRAM is slow unless it is h/w accelerated, so which ones are?
>
> Do these depend on the SDL target or the video card?

Practically all video cards with 3D acceleration, and probably some
without, support busmaster DMA, so in most cases, the problem is that
there’s no busmaster support in the drivers.

I think SDL supports DMA system RAM->VRAM and VRAM->VRAM blitting on
nearly all targets that may have them, but you’ll have to ask someone
else for a specific list of targets, platforms, drivers etc.

//David Olofson — Programmer, Reologica Instruments AB

(In reply to Adam Gates, Wednesday 31 October 2001 23:38.)

> Not all targets support hardware flipping, so which ones do?

X11 doesn’t. DirectX does. WinDIB doesn’t. fbcon (I think) does.

Don’t count on ever having it. Lots of games get good framerates without
it (see Loki’s 2D titles; none of them have hardware flipping under
X11). SDL_Flip() becomes SDL_UpdateRect(screen, 0, 0, 0, 0) on
unsupported platforms.

> RAM->VRAM is slow unless you have DMA transfers, so which ones do?

The real lesson here is “don’t write to video ram directly”. Most (all?)
targets have accelerated blits if you use BlitSurface or UpdateRect(s) to
get the bits to the screen.

> VRAM->VRAM is slow unless it is h/w accelerated, so which ones are?

Again, use blits.

> Do these depend on the SDL target or the video card?

Both, but usually you should be more concerned with the target.

–ryan.

>> Not all targets support hardware flipping, so which ones do?
>
> fbdev, GGI, and svgalib. (Support for svgalib h/w flipping was
> added lately.) I’ve heard XFree86 DGA 2.0 should support it, but I
> don’t think I’ve seen it in action.

Yes, SDL’s DGA driver does support it.

-Sam Lantinga, Software Engineer, Blizzard Entertainment

[…]

> The real lesson here is “don’t write to video ram directly”. Most
> (all?) targets have accelerated blits if you use BlitSurface or
> UpdateRect(s) to get the bits to the screen.

Well, that depends on what you mean by “accelerated” - the very problem
here is that SDL actually does write to VRAM directly with the CPU, as
the underlying targets don’t support DMA blits from system RAM. (That’s
entirely different from VRAM->VRAM blits, which are supported on many
targets.)

Anyway, not much to do about it in application or SDL code - it’s an all
too common driver “bug”.

//David Olofson — Programmer, Reologica Instruments AB

(In reply to Ryan C. Gordon, Thursday 01 November 2001 19:56.)