Optimized scrolling

David_Olofson · November 14, 2000, 1:48pm

Hi!

I’ve spent a few hours porting (old crappy) Object Pascal code into C"++" code,
and studying the SDL API. I have to dig deeper into the inner workings of SDL
before I can contribute any real code, but there are some ideas I’d like to
discuss.

First, I’d like to know if there’s any existing subsystem that could be used
(possibly extended), instead of implementing something new. What I have in mind
is basically a more flexible description of the screen than the current screen
surface. As an example, I’ll use the game I’m porting:

   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  |  ___________________________ 
    |                           | |
  | |                           |
    |                           | |
  | |                           |
    |         Main view         | |
  | |                           |
    |                           | |
  | |                           |
    |                           | |
  | |___________________________|
   _|_ _ _ Splitscreen view_ _ _|_|
    |___________________________|

The splitscreen gets its data from a single buffer of the same size as the
splitscreen window - no scrolling or double buffering.

The main view has 3 buffers that are 32 pixels wider and 32 pixels taller than
the main window. Hardware scrolling is used to avoid refreshing or scrolling
the entire buffers, and two buffers at a time are used to double buffer the
sprite rendering for flicker and tearing free animation. (However, this doesn’t
work on many modern, broken “VGA” cards - but that’s another story…)

(The next paragraph is not really important here - I describe it just to
explain how a screen with just two tiles of scroll margin can deal with maps
of unlimited size.)

The third buffer of the main view is used for preparing new buffers for the
scrolling. While the display/back pair of buffers swap away, scrolling for 16
pixels (ie 16 video frames), the third buffer is refreshed a few tiles at a
time, so that it’s ready to replace the oldest buffer in the display/back pair
before its hardware scrolling margin is exhausted. Then the cycle starts
over, and another 16 frames later, the second buffer in the pair is replaced.
This way, the entire background scroller takes only about half the time of
rendering one 32x32 sprite, in the original VGA Mode-X implementation.

Now, to reimplement this efficiently, with or w/o hardware scrolling, one would
need

1a) A way to use VRAM buffers bigger than the screen.

2a) Hardware scrolling support

	or

1b) A way to use buffers (mem or VRAM) bigger than the screen.

2b) Optimized VRAM->VRAM scrolling + partial updates.

	and

3) A more evolved abstraction of the display than a single display
   window, to deal with splitscreens and windows.

1) and 2) are (probably) quite easy; allow buffers to be bigger than the actual
screen, and add XOffset and YOffset in some suitable place. (My game stores h/w
scroll offsets for each buffer, which makes sense considering that they are to
be applied when the buffers are flipped in, NOT when the offsets are changed.
That would correspond to having the scroll offset fields in the SDL_Surface
struct, but then again, the game doesn’t use its corresponding struct for
anything but display surfaces…)
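To make the "offsets are stored per buffer and applied at flip time" idea concrete, here is a minimal sketch in C. All names (ScrollBuffer, Display, set_scroll, flip_to) are hypothetical and are not part of the SDL API; the hw_x/hw_y fields just stand in for whatever the real hardware scroll registers would be.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types; nothing here is existing SDL API. */
typedef struct {
    int x_offset;   /* scroll offset stored together with this buffer */
    int y_offset;
} ScrollBuffer;

typedef struct {
    ScrollBuffer *buffers;
    size_t count;
    size_t visible;   /* index of the buffer currently displayed */
    int hw_x, hw_y;   /* stand-in for the hardware scroll registers */
} Display;

/* Changing an offset only updates the buffer record, not the display. */
static void set_scroll(ScrollBuffer *b, int x, int y)
{
    b->x_offset = x;
    b->y_offset = y;
}

/* The stored offsets take effect here, when the buffer is flipped in. */
static void flip_to(Display *d, size_t index)
{
    assert(index < d->count);
    d->visible = index;
    d->hw_x = d->buffers[index].x_offset;
    d->hw_y = d->buffers[index].y_offset;
}
```

Keeping the offsets with the buffer means a flip routine has everything it needs in one place, which is the argument made above for putting such fields near the surface struct.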

As to hardware scrolling; when it’s supported, it’s just a matter of doing it -
the multi-buffering and updating works as usual.

When there is no hardware scrolling, it gets more complicated, though. The
simplest solution is probably to set up VRAM buffers of the same size as the
display, and then do in-place blit scrolls - with VRAM->VRAM DMA when
available. There will also have to be one oversized buffer for each real
display buffer (in VRAM or system memory): These are the buffers that the
application sees, and the buffers where SDL gets the areas that are exposed
when scrolling.

Ok; I can hardly follow the above myself, so here’s some more ASCII art,
describing how an oversized buffer is scrolled and “flipped” in on a target
without hardware scrolling support (only double buffering here):

   ____________        ____________ 
  |            |      |            |
  | Currently  |      |    Back    |
  |  visible   |      |   buffer   |  <== Real display buffers
  |   buffer   |      |            |
  |            |      |            |
  |____________|      |____________|
 ________________    ________________ 
|                |  |                |
|                |  |                |
|   Currently    |  |    Oversize    |
|   "visible"    |  |      back      | <== "virtual" display buffers
|    oversize    |  |     buffer     |
|     buffer     |  |                |
|                |  |                |
|________________|  |________________|

Now, we scroll the back buffer full right. This means that the scroll offset
for the corresponding oversize buffer changes, and that the real display buffer
has to be blit scrolled accordingly:
   ____________        ____________ 
  |            |      |##          |
  | Currently  |      |##  Back    |
  |  visible   |      |## buffer   |  <== Real display buffers
  |   buffer   |      |##          |
  |            |      |##          |
  |____________|      |##__________|
 ________________    ________________ 
|                |  |                |
|                |  |++              |
|   Currently    |  |++  Oversize    |
|   "visible"    |  |++    back      | <== "virtual" display buffers
|    oversize    |  |++   buffer     |
|     buffer     |  |++              |
|                |  |++              |
|________________|  |________________|

The ## area is now invalid (as the data that’s supposed to show up there isn’t
in the display buffer), and has to be reconstructed by blitting it in from the
oversize back buffer. (The ++ area is the source area.)

Now, we just do the usual double buffered flip, and there it is!
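The scroll-and-repair step described above can be sketched in C on plain memory buffers. This is only an illustration under simplifying assumptions: 8-bit pixels, pitch equal to width, horizontal scrolling only, and memmove/memcpy standing in for a VRAM->VRAM blit. Buffer and scroll_x are hypothetical names, not SDL functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *pixels;   /* 8-bit pixels, pitch == w for simplicity */
    int w, h;
} Buffer;

/*
 * Move the display window of `real` over the oversize buffer `virt`
 * from x offset old_ox to new_ox (y offset oy stays the same).
 * The part of the image that is still valid is shifted in place,
 * and the exposed strip is copied in from the oversize buffer.
 */
static void scroll_x(Buffer *real, const Buffer *virt,
                     int old_ox, int new_ox, int oy)
{
    int dx = new_ox - old_ox;   /* > 0: window moves right, image moves left */
    int shift = abs(dx);

    assert(new_ox >= 0 && new_ox + real->w <= virt->w);
    assert(oy >= 0 && oy + real->h <= virt->h);
    if (shift > real->w)
        shift = real->w;        /* nothing reusable: repair the whole row */

    for (int y = 0; y < real->h; ++y) {
        unsigned char *row = real->pixels + (size_t)y * real->w;
        const unsigned char *src =
            virt->pixels + (size_t)(y + oy) * virt->w + new_ox;
        int keep = real->w - shift;

        if (dx > 0) {
            memmove(row, row + shift, (size_t)keep);        /* blit scroll */
            memcpy(row + keep, src + keep, (size_t)shift);  /* repair right strip */
        } else if (dx < 0) {
            memmove(row + shift, row, (size_t)keep);        /* blit scroll */
            memcpy(row, src, (size_t)shift);                /* repair left strip */
        }
    }
}
```

After the call, the real buffer holds exactly the window of the oversize buffer at the new offset, so the usual flip can follow.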

Finally, this splitscreen thing… Theoretically, the best abstraction I can
see right now is to make every part of the screen a separate Display Surface,
and then use some new structure to define how these relate to the physical
display.

As an example, my game display could be implemented using two SDL (Extended)
Display Surfaces set up like this:

SDL_Surface screen:
w = 352 //Surface size
h = 240

display_w = 320	//Display window size
display_h = 208

display_x = 16	//Offsets for the display window
display_y = 16

buffers = 3	//3 buffers. Does SDL support more than 2, actually...?

SDL_Surface dashboard:
w = 352 //(Must be same - VGA splitscreen does not change pitch)
h = 24

display_w = 320	//Display window size
display_h = 24

display_x = 0	//Offsets for the display window
display_y = 0	//(Cannot scroll, as VGA can only start a split screen at
		// VRAM address 0.)
buffers = 1	//(VGA splitscreen can't be double buffered; See above.)
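For illustration, the two descriptions above could be written as a C struct with a couple of sanity checks. ExtSurface and the helper functions are hypothetical names invented for this sketch; they are not fields or functions of the real SDL_Surface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical "extended display surface"; not the SDL_Surface struct. */
typedef struct {
    int w, h;                   /* surface size */
    int display_w, display_h;   /* display window size */
    int display_x, display_y;   /* window offset inside the surface */
    int buffers;                /* number of buffers */
} ExtSurface;

/* The display window must lie completely inside the surface. */
static bool window_fits(const ExtSurface *s)
{
    return s->display_x >= 0 && s->display_y >= 0 &&
           s->display_x + s->display_w <= s->w &&
           s->display_y + s->display_h <= s->h;
}

/* Pixels the window can still move right/down before leaving the surface. */
static int margin_right(const ExtSurface *s)
{
    assert(window_fits(s));
    return s->w - (s->display_x + s->display_w);
}

static int margin_down(const ExtSurface *s)
{
    assert(window_fits(s));
    return s->h - (s->display_y + s->display_h);
}

/* The VGA splitscreen constraint noted above: both parts share one width. */
static bool can_share_screen(const ExtSurface *a, const ExtSurface *b)
{
    return a->w == b->w && window_fits(a) && window_fits(b);
}
```

With the numbers from the example (352x240 surface, 320x208 window at 16,16), the main view has 16 pixels of scroll margin on every side.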

It may or may not be a good idea to attach the display size and offset stuff
to the surface struct; not sure. I do think that it belongs with the display
buffers though, as the timing is very critical for real hardware scrolling on
some hardware, so the Flip() function must have all info available.

Any ideas?

//David

Mattias_Engdegard · November 15, 2000, 11:36am

I’ve spent a few hours porting (old crappy) Object Pascal code into C"++" code,
and studying the SDL API. I have to dig deeper into the inner workings of SDL
before I can contribute any real code, but there are some ideas I’d like to
discuss.
[snip]

What about simply keeping your oversized buffer(s) in VRAM and blit
the displayed part to the actual screen buffer? I don’t know how long
a full-screen vidmem->vidmem copy takes, but it may be fast enough.

Oversized display buffers could probably be covered in a fairly general
manner; the client should be able to set the size, and ask for them to be
simulated (with either software or hardware surfaces) if they can’t be done
in hardware. Split-screen sounds trickier; does VGA have built-in
support for changing the base address at a certain scan line, or are you
doing it by hand?

I’d like a survey of video capabilities (both hardware and OS drivers) on
SDL’s likely targets — Macs, PCs, workstations, portables, bitmapped
consoles, web pads — before finalizing the design.

We need asynchronous page flips (or offset changes, in case of oversized
screen buffers), and a way to detect if they have occurred (either
semi-synchronously or by polling). Wasting time waiting for vsync isn’t
a lot of fun.

Darrell_Walisser · November 15, 2000, 3:23pm

Mattias Engdegård wrote:

I’ve spent a few hours porting (old crappy) Object Pascal code into C"++" code,
and studying the SDL API. I have to dig deeper into the inner workings of SDL
before I can contribute any real code, but there are some ideas I’d like to
discuss.
[snip]

What about simply keeping your oversized buffer(s) in VRAM and blit
the displayed part to the actual screen buffer? I don’t know how long
a full-screen vidmem->vidmem copy takes, but it may be fast enough.

Oversized display buffers could probably be covered in a fairly general
manner; the client should be able to set the size, and ask for them to be
simulated (with either software or hardware surfaces) if they can’t be done
in hardware. Split-screen sounds trickier; does VGA have built-in
support for changing the base address at a certain scan line, or are you
doing it by hand?

I’d like a survey of video capabilities (both hardware and OS drivers) on
SDL’s likely targets — Macs, PCs, workstations, portables, bitmapped
consoles, web pads — before finalizing the design.

If a mac video card vendor claims he has “QuickDraw acceleration under OS 9”, then
we can probably:

create hardware surfaces in vram and/or agp
hardware blit vram->vram, and/or agp->vram
hardware color key for vram/agp surfaces

I also know that all OEM video cards can do these things on G3 and later systems
with ATI hardware.

Macs can’t do hardware page flipping at the moment (unless you are using OpenGL).
However, we can create surfaces in VRAM and blit VRAM -> VRAM, which is what I am
using to simulate an actual page flip in the DrawSprocket driver. Since the VRAM
-> VRAM blit is accelerated in hardware, it occurs very fast. I don’t have any
numbers in front of me, but on a crappy RagePro, I can probably do 300 fps at
640x480x32 just doing SDL_Flip() over and over (assuming I don’t wait for vsync).

On AGP equipped Macs, we can allocate in AGP memory, which means we can get just
about any size hardware surface we want. On older macs with Mach64 cards I know we
can get at least as big as the screen, but I don’t know if we can get any bigger.

We need asynchronous page flips (or offset changes, in case of oversized
screen buffers), and a way to detect if they have occurred (either
semi-synchronously or by polling). Wasting time waiting for vsync isn’t
a lot of fun

Tell me about it. I am forced to do this to simulate SDL_Flip() with a vram->vram
copy. Unfortunately, no way to get around this.

David_Olofson · November 17, 2000, 5:39am

As long as the blit is fast enough to emulate a flip (ie do its job
fast enough to outrun the raster beam), that should work - and it’s
the easiest way.

You could double buffer the screen, of course (and probably should in
any case), but then the whole point is lost anyway - the frame rate
drops, and the hardware scrolling emulation becomes more of a major
PITA than a performance hack… (Hardware scrolling is not nice,
fun, easy or anything like it; it’s just fast when you finally get it
to work. Not a goal in itself, that is. Smooth scrolling is.)

One could argue that there’s not much point in using the oversized
buffer, “hardware scroll emulation” model at all when the blitting is
fast enough for the full screen blit to work reliably… If you can
do that, you might as well blit from something more structured than
an oversized buffer.

How about a set of big VRAM “tiles”, into which you render the
backgrounds? (If at all required - you might just blit normal tiles
directly from a VRAM graphics bank, as long as the overhead per blit
isn’t too high.)

On Wed, 15 Nov 2000, Mattias Engdegård wrote:

I’ve spent a few hours porting (old crappy) Object Pascal code into C"++" code,
and studying the SDL API. I have to dig deeper into the inner workings of SDL
before I can contribute any real code, but there are some ideas I’d like to
discuss.
[snip]

What about simply keeping your oversized buffer(s) in VRAM and blit
the displayed part to the actual screen buffer? I don’t know how long
a full-screen vidmem->vidmem copy takes, but it may be fast enough.

____________
|+++----+++|----
|+++----+++|----
|+++----+++|----
|---++++---|++++
|---++++---|++++
|__________|++++
++++----++++----
++++----++++----
++++----++++----

+++/--- = tiles
square = screen size

No wrapping problems with that method - when the screen moves over
the right edge, just have the tiles on the left side ready to be used
there. Add more “big tiles” as needed for extra background rendering
time. (The tiles are blitted one by one, so they can be rearranged on
the screen as required.)
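A rough sketch of the bookkeeping for the "big tiles" method, in C: given a scroll position, work out which VRAM tile slots cover the screen and where each one lands. The ring of cols x rows slots, and the names TileBlit and visible_tiles, are assumptions made for this example; the actual blit calls are left out.

```c
#include <assert.h>

typedef struct {
    int slot_col, slot_row;   /* which VRAM "big tile" to blit from */
    int dst_x, dst_y;         /* where it lands on screen (may be negative) */
} TileBlit;

/* Floor division for b > 0, so negative scroll positions also work. */
static int floor_div(int a, int b)
{
    int q = a / b;
    return (a % b < 0) ? q - 1 : q;
}

/* Wrap a world tile index into the ring of n slots. */
static int wrap(int a, int n)
{
    int m = a % n;
    return m < 0 ? m + n : m;
}

/*
 * List the tile blits needed to cover a screen_w x screen_h screen whose
 * top-left corner is at world position (sx, sy). Returns the blit count.
 */
static int visible_tiles(int sx, int sy, int screen_w, int screen_h,
                         int tile_w, int tile_h, int cols, int rows,
                         TileBlit *out, int max_out)
{
    int c0 = floor_div(sx, tile_w);
    int c1 = floor_div(sx + screen_w - 1, tile_w);
    int r0 = floor_div(sy, tile_h);
    int r1 = floor_div(sy + screen_h - 1, tile_h);
    int n = 0;

    /* The ring must hold at least everything that is visible at once. */
    assert(c1 - c0 + 1 <= cols && r1 - r0 + 1 <= rows);

    for (int r = r0; r <= r1; ++r) {
        for (int c = c0; c <= c1; ++c) {
            assert(n < max_out);
            out[n].slot_col = wrap(c, cols);
            out[n].slot_row = wrap(r, rows);
            out[n].dst_x = c * tile_w - sx;
            out[n].dst_y = r * tile_h - sy;
            ++n;
        }
    }
    return n;
}
```

Any slot that is not in the returned list is free to be re-rendered with upcoming background, which is where the extra rendering time comes from.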

Oversized display buffers could probably be covered in a fairly general
manner; the client should be able to set the size, and ask for them to be
simulated (with either software or hardware surfaces) if they can’t be done
in hardware.

Split-screen sounds trickier; does VGA have built-in
support for changing the base address at a certain scan line, or are you
doing it by hand?

It’s a very simple hardware feature of the kind “reset VRAM pointer
and fine scroll registers at raster line X.” Not very flexible, but
at least it saves you from emulating a C64 style raster IRQ. (You
could program the VIC to IRQ at a specific raster line - very handy.)

Anyway, if it’s not supported, make the smaller “screen” a normal
blit overlay on top of the bigger one. (In my example, that would
mean that the dashboard is basically a big sprite that is kept in a
fixed position in relation to the display window - pretty much like
when billboards are used as dashboards under OpenGL and Direct3D.)

If we’re actually dealing with a real splitscreen (ie two equally
sized scrolling playfields), we’re in trouble… Using hardware
scrolling for one if supported is faster, but in all other cases, I
think blitting is the only option that makes sense.

I’d like a survey of video capabilities (both hardware and OS drivers) on
SDL’s likely targets — Macs, PCs, workstations, portables, bitmapped
consoles, web pads — before finalizing the design.

That would be very interesting. It’s hard enough to design even if
you can rule out the most exotic configurations…

We need asynchronous page flips (or offset changes, in case of oversized
screen buffers), and a way to detect if they have occurred (either
semi-synchronously or by polling). Wasting time waiting for vsync isn’t
a lot of fun

No, there are better things to do, indeed… In theory (and if you can
hack the underlying drivers), it’s just a matter of using an IRQ.
(Although I’ve heard scary rumors about modern cards not having vsync
IRQ - do they have other ways to do it, such as a "command completed"
IRQ that can be used with a pageflip command…?)

For cards w/o IRQs, it’s still very simple - if you’re running
RTLinux… You basically need a timer interrupt with timing accurate
enough to hit the blanking period, and adjust the timer rate to it.
(Or rather, keep track of the video frame rate, so that you can
program the timer to interrupt slightly before the next blanking
interval you want to measure, and use that measurement to refine the
frame rate estimate, etc.)

Note: You don’t have to hit all blanking intervals; just a few per
second, to keep track of the raster in relation to the timer used as
reference. That is, it might work with a standard OS, an accurate
timer (TSC on Pentium+) and some smart filtering, but I haven’t tried
it.
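The arithmetic behind "a few hits per second is enough" could look something like this in C. It is an untested idea turned into a sketch: timestamps are plain microsecond counts from some reference timer, only the first and last hit are used (no smart filtering), and usec_t, estimate_period and next_vblank are made-up names.

```c
#include <assert.h>

typedef long long usec_t;   /* microseconds from some reference timer */

/*
 * Estimate the refresh period from sparse blanking "hits". The hits need
 * not be consecutive frames; the nominal period is used only to work out
 * how many whole frames passed between the first and the last hit.
 */
static usec_t estimate_period(const usec_t *hits, int n, usec_t nominal)
{
    assert(n >= 2 && nominal > 0);
    usec_t span = hits[n - 1] - hits[0];
    usec_t frames = (span + nominal / 2) / nominal;   /* round to nearest */
    assert(frames >= 1);
    return span / frames;
}

/* Predict the first blanking interval strictly after `now`. */
static usec_t next_vblank(usec_t last_hit, usec_t period, usec_t now)
{
    assert(period > 0 && now >= last_hit);
    usec_t k = (now - last_hit) / period + 1;
    return last_hit + k * period;
}
```

For example, hits at 0, 50100 and 150300 microseconds with a nominal period of 16667 span nine frames, giving an estimated period of 16700 microseconds.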

Anyway… Sure, it would be possible to implement hardware scrolling
support + emulation by simply supporting oversized buffers + offsets,
but the more I think about it, the more it looks like an incredible
mess for the game programmer, just to enable a significant speed-up
on a few targets (how many actually have a usable h/w scrolling
implementation nowadays), and a very slight speed-up for most other
targets. (It might even slow some targets down, but I doubt that.)

Maybe h/w scrolling support is just too low a level to make much
sense? Great feature if

* it's supported in hardware (perhaps even with
   just "partial" support + emulation), AND

* you need only single level scrolling WITH

* "a few" sprites,

but if you need

* parallax scrolling OR

* loads of sprites OR

* don't have hardware scrolling at all

you’re better off doing it in other ways. That is, games will have to
support both the hardware scrolling API, and other models when this
isn’t implemented efficiently enough.

I’m not saying that hardware scrolling should not be supported; just
that it doesn’t solve all problems, maybe not even a great deal of
them. I’m still thinking…

Would a higher level scrolling engine (with parallax support, sprites
and other stuff, perhaps; not too high level to be generic, though)
be too high level for SDL? Actually I’m thinking of something more
like a framework than a full engine; a system that figures out how
and where to render graphics to perform the scrolling in the most
efficient way, but leaves the actual rendering to callbacks. The
"engine" would ask the game callbacks to fill areas with graphics,
specifying world coordinates according to the display model it has
been given, and then puts these on-screen, using hardware scrolling +
CPU, h/w blitting, accelerated OpenGL or whatever works best on the
current platform.
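One possible shape for such a callback framework, sketched in C. Everything here (WorldRect, FillFn, Scroller, scroller_move) is hypothetical, and the sketch covers only one piece: when the view moves, the "engine" works out the newly exposed strips in world coordinates and asks the game to fill just those. How the result reaches the screen is left out entirely. count_area is only a sample callback that adds up how many pixels it was asked to render.

```c
#include <assert.h>

typedef struct { int x, y, w, h; } WorldRect;   /* world coordinates */

/* The game implements this: render `area` of the world somewhere. */
typedef void (*FillFn)(void *user, WorldRect area);

typedef struct {
    FillFn fill;
    void *user;
    int view_x, view_y;   /* top-left corner of the view in the world */
    int view_w, view_h;
} Scroller;

/* Sample callback: just accumulates the number of pixels requested. */
static void count_area(void *user, WorldRect area)
{
    *(long *)user += (long)area.w * area.h;
}

/* Move the view by (dx, dy); returns the number of fill requests made. */
static int scroller_move(Scroller *s, int dx, int dy)
{
    int adx = dx < 0 ? -dx : dx;
    int ady = dy < 0 ? -dy : dy;
    int calls = 0;

    assert(s->fill != 0);
    s->view_x += dx;
    s->view_y += dy;

    if (adx >= s->view_w || ady >= s->view_h) {   /* nothing reusable */
        WorldRect all = { s->view_x, s->view_y, s->view_w, s->view_h };
        s->fill(s->user, all);
        return 1;
    }
    if (adx > 0) {   /* vertical strip on the leading edge, full height */
        WorldRect r = { dx > 0 ? s->view_x + s->view_w - adx : s->view_x,
                        s->view_y, adx, s->view_h };
        s->fill(s->user, r);
        ++calls;
    }
    if (ady > 0) {   /* horizontal strip, minus the columns covered above */
        WorldRect r = { dx < 0 ? s->view_x + adx : s->view_x,
                        dy > 0 ? s->view_y + s->view_h - ady : s->view_y,
                        s->view_w - adx, ady };
        s->fill(s->user, r);
        ++calls;
    }
    return calls;
}
```

The game never sees buffers or offsets in this model, only requests to fill world areas, so the backend is free to use hardware scrolling, blitting or anything else.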

Sounds both simple and complicated - haven’t made up my mind yet!

//David

Mattias_Engdegard · November 17, 2000, 1:33pm

You could double buffer the screen, of course (and probably should in
any case), but then the whole point is lost anyway - the frame rate
drops, and the hardware scrolling emulation becomes more of a major
PITA than a performance hack…

Of course, but you were talking about porting a game already written
to SDL. It might be the best way of doing so.

It’s a very simple hardware feature of the kind “reset VRAM pointer
and fine scroll registers at raster line X.” Not very flexible, but
at least it saves you from emulating a C64 style raster IRQ. (You
could program the VIC to IRQ at a specific raster line - very handy.)

Yes, I know about the VIC-II. Hardware split screen seems to be
restricted to certain PC video boards, and unless we can make an
abstraction that’s good enough it’s doubtful that it’s worth the
trouble.

Anyway, if it’s not supported, make the smaller “screen” a normal
blit overlay on top of the bigger one. (In my example, that would
mean that the dashboard is basically a big sprite that is kept in a
fixed position in relation to the display window - pretty much like
when billboards are used as dashboards under OpenGL and Direct3D.)

Yes, but doing it transparently to the client would require copying
the underlying stuff to an offscreen buffer first… It might be
feasible, but I’m not quite convinced yet. Doing it manually has
the advantage of allowing a split along any dimension, not just
vertically.

No, there are better things to do, indeed… In theory (and if you can
hack the underlying drivers), it’s just a matter of using an IRQ.
(Although I’ve heard scary rumors about modern cards not having vsync
IRQ - do they have other ways to do it, such as a "command completed"
IRQ that can be used with a pageflip command…?)

Note that we don’t need accurate or low-latency vsync IRQs as long as
we can tell the hardware to “wait until vsync, flip, and then please
send an IRQ when you are done”. Since any interrupt will have to go
through a kernel driver (in Unix; similar mechanisms in other
systems), we can’t rely on it for synchronized flipping anyway. But
that doesn’t matter, since we only need the vsync notification for
more efficient game timing (i.e. to schedule slop work in the time
slot, audio mixing and AI comes to mind), and for triple buffering.

In fact, it might be possible to get away without vsync notification
in many circumstances as long as we can poll.
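As a toy model of "request a flip, then poll for completion", consider the following C sketch. It is purely a simulation: FlipQueue, request_flip, on_vblank and flip_done are invented names, and on_vblank stands in for whatever the hardware or driver does at vertical refresh.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int front;                  /* buffer currently displayed */
    int pending;                /* buffer waiting for vsync, or -1 */
    unsigned long flips_done;   /* completed flips so far */
} FlipQueue;

/* Queue a flip without waiting. Fails if one is already pending. */
static bool request_flip(FlipQueue *q, int buffer)
{
    assert(buffer >= 0);
    if (q->pending >= 0)
        return false;
    q->pending = buffer;
    return true;
}

/* Stand-in for the hardware: the queued flip happens at vertical refresh. */
static void on_vblank(FlipQueue *q)
{
    if (q->pending >= 0) {
        q->front = q->pending;
        q->pending = -1;
        ++q->flips_done;
    }
}

/* Polling check the client can call between other work. */
static bool flip_done(const FlipQueue *q)
{
    return q->pending < 0;
}
```

A client would call request_flip(), carry on with audio mixing or AI, and check flip_done() before touching the buffer that was on screen.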

I’m not saying that hardware scrolling should not be supported; just
that it doesn’t solve all problems, maybe not even a great deal of
them. I’m still thinking…

I think allowing the client to request a virtual screen buffer larger
than the visible size, and changing the window offset synchronized
with vertical refresh, is all we need. The API would be simple,
it can be done on many targets (i.e. it’s not necessarily PC/VGA
specific), and for software screen surfaces. When and how to use the
facility is up to the programmer.

Would a higher level scrolling engine (with parallax support, sprites
and other stuff, perhaps; not too high level to be generic, though)
be too high level for SDL?

Whether something should be in SDL or not is not a question of
subjective “level” per se, but whether it is needed to enable the
programmer to use the capabilities of the hardware in a
platform-independent way.

Actually I’m thinking of something more
like a framework than a full engine; a system that figures out how
and where to render graphics to perform the scrolling in the most
efficient way, but leaves the actual rendering to callbacks.

I don’t think this kind of DWIMmery is within the scope of SDL.

David_Olofson · November 19, 2000, 12:18pm

You could double buffer the screen, of course (and probably should in
any case), but then the whole point is lost anyway - the frame rate
drops, and the hardware scrolling emulation becomes more of a major
PITA than a performance hack…

Of course, but you were talking about porting a game already written
to SDL. It might be the best way of doing so

Possibly - but in that particular case, I might just as well redesign
the lower levels of the engine anyway, as Object Pascal, Intel
syntax asm, VGA Mode-X and other nasty things are involved.

The idea of keeping the original mode of operation was that it might
actually speed things up on some modern hardware as well, but now I’m
becoming more and more convinced that it would make more sense to
build any new API on a higher level, such as extending the sprite
stuff into a full blown parallax scrolling + sprites engine, with
pixel effect hooks etc.

It’s a very simple hardware feature of the kind “reset VRAM pointer
and fine scroll registers at raster line X.” Not very flexible, but
at least it saves you from emulating a C64 style raster IRQ. (You
could program the VIC to IRQ at a specific raster line - very handy.)

Yes, I know about the VIC-II. Hardware split screen seems to be
restricted to certain PC video boards, and unless we can make an
abstraction that’s good enough it’s doubtful that it’s worth the
trouble.

It should work on any VGA compatible card, but it might not work
in SVGA modes, and it might be partially broken on some cards.

The chance of getting hardware scrolling to work properly is much
better, but even that feature is somewhat broken on many modern
cards. (They have different scroll + address latch timing from the
real VGA, and thus, hardware scrolling games will flicker and jerk.
Meanwhile, it seems to work fine for oversized desktops and the
like…)

Either way - splitscreen is probably not worth the effort anyway, as
most people probably want more advanced stuff (translucent panels or
text + icon overlays) where splitscreens were used in the 8 and 16
bit eras.

As to real splitscreens of the kind seen on C64, Amiga and game
consoles, VGA can’t do that properly in hardware anyway. No hardware
scrolling for the lower half of the screen…

Anyway, if it’s not supported, make the smaller “screen” a normal
blit overlay on top of the bigger one. (In my example, that would
mean that the dashboard is basically a big sprite that is kept in a
fixed position in relation to the display window - pretty much like
when billboards are used as dashboards under OpenGL and Direct3D.)

Yes, but doing it transparently to the client would require copying
the underlying stuff to an offscreen buffer first… It might be
feasible, but I’m not quite convinced yet. Doing it manually has
the advantage of allowing a split along any dimension, not just
vertically

Yes, and that’s probably the way to go here. Real “hardware compatible
hardware emulation” doesn’t make much sense except when it actually
results in a nice API, and good ways of optimizing software/different
hardware implementations.

Then again, if we were talking about display windows with scrolling
support, then the backsaving and whatnot would fit logically into the
implementations, in similar ways to the “smart refresh” that some
windowing environments employ when moving windows around… (AmigaDOS
had it; not sure about modern systems.)

No, there are better things to do, indeed… In theory (and if you can
hack the underlying drivers), it’s just a matter of using an IRQ.
(Although I’ve heard scary rumors about modern cards not having vsync
IRQ - do they have other ways to do it, such as a "command completed"
IRQ that can be used with a pageflip command…?)

Note that we don’t need accurate or low-latency vsync IRQs as long as
we can tell the hardware to “wait until vsync, flip, and then please
send an IRQ when you are done”. Since any interrupt will have to go
through a kernel driver (in Unix; similar mechanisms in other
systems), we can’t rely on it for synchronized flipping anyway. But
that doesn’t matter, since we only need the vsync notification for
more efficient game timing (i.e. to schedule slop work in the time
slot, audio mixing and AI comes to mind), and for triple buffering.

In fact, it might be possible to get away without vsync notification
in many circumstances as long as we can poll

Yes, but that’s where we need to either burn some CPU time
busy-waiting, or use a timer driven low-latency thread to do the
polling just around the time when the blanking bit is expected to
change.

Then again, we only need enough “hits” to correctly detect and lock
to the exact refresh rate - then we can use the timebase used for
the measurement to find out exactly when “now” is in relation to the
raster. Add measuring the average time it takes the engine to produce
one frame, and we can easily calculate for what world time the next
frame should be rendered.
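The calculation hinted at above might be sketched like this in C, assuming the refresh period and the time of one blanking interval are already known in microseconds. The names usec_t, update_avg and frame_display_time, and the 1/8 smoothing factor, are arbitrary choices for the example.

```c
#include <assert.h>

typedef long long usec_t;   /* microseconds on the measurement timebase */

/* Running average of the time the engine needs to produce one frame. */
static usec_t update_avg(usec_t avg, usec_t sample)
{
    return avg + (sample - avg) / 8;
}

/*
 * If rendering starts at `now` and takes about `avg_render`, the frame
 * can first be shown at the next blanking interval at or after it is
 * ready. That display time is the world time to render the frame for.
 */
static usec_t frame_display_time(usec_t last_vblank, usec_t period,
                                 usec_t now, usec_t avg_render)
{
    assert(period > 0 && now >= last_vblank && avg_render >= 0);
    usec_t ready = now + avg_render;
    usec_t k = (ready - last_vblank + period - 1) / period;   /* round up */
    return last_vblank + k * period;
}
```

With a 16700 microsecond period, a frame started 1000 microseconds after a blanking interval and taking about 10000 microseconds should be rendered for time 16700; if it takes about 20000, it should be rendered for 33400 instead.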

I’m not saying that hardware scrolling should not be supported; just
that it doesn’t solve all problems, maybe not even a great deal of
them. I’m still thinking…

I think allowing the client to request a virtual screen buffer larger
than the visible size, and changing the window offset synchronized
with vertical refresh, is all we need. The API would be simple,
it can be done on many targets (i.e. it’s not necessarily PC/VGA
specific), and for software screen surfaces. When and how to use the
facility is up to the programmer

Yep.

Would a higher level scrolling engine (with parallax support, sprites
and other stuff, perhaps; not too high level to be generic, though)
be too high level for SDL?

Whether something should be in SDL or not is not a question of
subjective “level” per se, but whether it is needed to enable the
programmer to use the capabilities of the hardware in a platform-
independent way

Well, that’s the question here. Hardware scrolling is (was) one
feature that many games use to do full screen scrolling, but it’s not
the key to scrolling. Actually, its usefulness is anything from
“low” to “nonexistent” with parallax scrolling and/or lots of
sprites.

Further, I know many game consoles and arcade machines have h/w
parallax scrolling, but no workstations, so it’s probably a better
idea to optimize the API mainly for the method the latter are using,
which is blitting… And that kind of rendering is just too different
from what you do in a system with h/w scrolling, so you probably have
to go to a much higher level to get a useful API together.

Actually I’m thinking of something more
like a framework than a full engine; a system that figures out how
and where to render graphics to perform the scrolling in the most
efficient way, but leaves the actual rendering to callbacks.

I don’t think this kind of DWIMmery is within the scope of SDL

Especially not if it doesn’t make these features available to at
least two major kinds of hardware.

That is, it’s OpenGL for anything but the simplest 2D games, and
hardware scrolling is just a feature that may speed up some of the
slightly more advanced games on some hardware.

Putting it that way, there’s only one thing that could motivate the
existence of hardware scrolling support/emulation: The lack of a more
efficient way to implement it on the majority of hardware.

That is, if there’s no other way to have VRAM surfaces bigger than
the screen, and emulate h/w scrolling that way, doing partial blit
updates (+ pageflipping if available), this feature is needed.

//David