Fast scrolling in SDL

A while back I came across a fast scroll trick on a VGA game
programming site. Here's the gist of it:

  1. Draw your tiles to screen

  2. Save the spot on screen where each sprite is to be drawn. In other words
    make an eraser for each sprite.

  3. Draw the sprites.

  4. Update the screen.

  5. Erase the sprites

  6. Shift the tiles on screen around to simulate scrolling. Draw newly visible
    tiles where needed.

  7. Repeat.

The point of all this is to avoid the huge number of blits needed to fill a
screen with tiles (imagine two layers of 20x20 tiles or 800 blits to draw the
screen). This way at most you need to blit two sides of the screen with tiles
or about 20+20 or 40 blits (plus the blits to make and use sprite erasers). The
real problem is step number 6, where the tiles already on screen are moved to
where they would be after a scroll.
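
Just to make the eraser idea concrete, here is roughly how steps 2, 3 and 5 could look in SDL. This is only a sketch; the Eraser struct and all the names (screen, sprite, sx, sy) are mine, not from the original article:

    /* Rough SDL 1.x sketch of steps 2, 3 and 5.  All names here are
     * placeholders, not from the original article. */
    #include "SDL.h"

    typedef struct {
        SDL_Surface *saved;   /* copy of the background under the sprite */
        SDL_Rect     pos;     /* where the sprite was drawn */
    } Eraser;

    /* Step 2: save the background the sprite is about to cover. */
    static void save_under(SDL_Surface *screen, Eraser *e,
                           SDL_Surface *sprite, int sx, int sy)
    {
        SDL_Rect src;
        src.x = (Sint16)sx;
        src.y = (Sint16)sy;
        src.w = (Uint16)sprite->w;
        src.h = (Uint16)sprite->h;
        if (!e->saved)
            e->saved = SDL_CreateRGBSurface(SDL_SWSURFACE, sprite->w, sprite->h,
                                            screen->format->BitsPerPixel,
                                            screen->format->Rmask,
                                            screen->format->Gmask,
                                            screen->format->Bmask, 0);
        e->pos = src;
        SDL_BlitSurface(screen, &src, e->saved, NULL);
    }

    /* Step 3: draw the sprite where the background was saved. */
    static void draw_sprite(SDL_Surface *screen, SDL_Surface *sprite, Eraser *e)
    {
        SDL_Rect dst = e->pos;
        SDL_BlitSurface(sprite, NULL, screen, &dst);
    }

    /* Step 5: erase the sprite by putting the saved background back. */
    static void erase_sprite(SDL_Surface *screen, Eraser *e)
    {
        SDL_Rect dst = e->pos;
        SDL_BlitSurface(e->saved, NULL, screen, &dst);
    }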

First off, there are two basic roads to take: use SDL_BlitSurface() to move
the data, or lock the surface and use some sort of per-pixel or byte-based
method. In Win32 DirectX when I was playing around with this algorithm I used a
blit because I was practically guaranteed a fast hardware blit would take place.
Under SDL I'm much less sure of that, so I'm thinking the lock-surface
approach might be best. What's great about that is I can use fast memory
functions; I'm thinking I could use a memcpy(). I've seen the shift operator
used in old VGA programming, and if I could figure out how to do that it'd be
really fast at moving the data. But then again that means a lot of
SurfaceLock/Unlocks in my code - one for every scroll operation, in fact.
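
For what it's worth, here is roughly what I have in mind for the lock-and-shift route (just a sketch; 'screen' and 'dx' are made-up names, and the freed strip on the right would still have to be repainted with newly visible tiles afterwards):

    #include <string.h>
    #include "SDL.h"

    /* Shift the already-drawn tiles left by dx pixels (0 < dx < screen->w)
     * to simulate scrolling on a software surface. */
    static void shift_left(SDL_Surface *screen, int dx)
    {
        int y;
        int bpp = screen->format->BytesPerPixel;
        size_t row_bytes = (size_t)(screen->w - dx) * bpp;

        if (SDL_MUSTLOCK(screen) && SDL_LockSurface(screen) < 0)
            return;

        for (y = 0; y < screen->h; y++) {
            Uint8 *row = (Uint8 *)screen->pixels + y * screen->pitch;
            /* Source and destination overlap, so memmove, not memcpy. */
            memmove(row, row + dx * bpp, row_bytes);
        }

        if (SDL_MUSTLOCK(screen))
            SDL_UnlockSurface(screen);
    }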

I guess what I'm really asking for is advice on which route I should
take (or else a better scroll algorithm; I'm always open to suggestions). I'm
really curious about the fastest way to shift data around the pixel array
I get from my call to SDL_LockSurface.

Anyway, if you made it this far, thanks for your time and thanks in
advance for any suggestions. :-)

Jeremy Gregorio
jgreg at azstarnet.com

  1. Draw your tiles to screen
  2. Save the spot on screen where each sprite is to be drawn. In other words
    make an eraser for each sprite.
  3. Draw the sprites.
  4. Update the screen.
  5. Erase the sprites
  6. Shift the tiles on screen around to simulate scrolling. Draw newly visible
    tiles where needed.
  7. Repeat.

I have done almost exactly the same thing in Xlib, and it works quite
well. The key was custom-written blitter functions that save the
background on a stack, and using a back-buffer slightly larger than
the visible screen.

SDL cannot (yet) use the oversized back-buffer strategy, although I’m
mulling over a way to squeeze it in.

If you have hardware surfaces, you want to store all sprites and background
tiles in video memory. Scrolling can then be accomplished by re-painting
the entire screen each frame - since the blits are all accelerated, this
is fast. This allows page flipping to be used (gets rid of tearing
artifacts), and gives you as many layers as your video card can handle.
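
Roughly like this (only a sketch; TILE, tile_at() and the scroll offsets are made-up names, and the tiles are assumed to already be hardware surfaces in a display-compatible format):

    #include "SDL.h"

    #define TILE 32

    extern SDL_Surface *tile_at(int tx, int ty);   /* hypothetical map lookup */

    static void draw_frame(SDL_Surface *screen, int scroll_x, int scroll_y)
    {
        int first_tx = scroll_x / TILE, first_ty = scroll_y / TILE;
        int ox = -(scroll_x % TILE),   oy = -(scroll_y % TILE);
        int tx, ty;

        /* Repaint every visible tile, every frame - cheap when the blits
         * are accelerated and everything lives in video memory. */
        for (ty = 0; oy + ty * TILE < screen->h; ty++) {
            for (tx = 0; ox + tx * TILE < screen->w; tx++) {
                SDL_Rect dst;
                dst.x = (Sint16)(ox + tx * TILE);
                dst.y = (Sint16)(oy + ty * TILE);
                SDL_BlitSurface(tile_at(first_tx + tx, first_ty + ty),
                                NULL, screen, &dst);
            }
        }
        /* ...draw sprites here... */
        SDL_Flip(screen);   /* real page flip with SDL_DOUBLEBUF|SDL_HWSURFACE */
    }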

If all you have is software surfaces (or if you are doing enough pixel
and/or alpha magic to want your screen buffer in software), then you
need to copy the entire scrolling window to the screen each time, and
this is likely to dominate the update costs. There is a variety of scrolling
schemes that can be used here (see recent thread about it)

Mattias Engdegård wrote:

  1. Draw your tiles to screen
  2. Save the spot on screen where each sprite is to be drawn. In other words
    make an eraser for each sprite.
  3. Draw the sprites.
  4. Update the screen.
  5. Erase the sprites
  6. Shift the tiles on screen around to simulate scrolling. Draw newly visible
    tiles where needed.
  7. Repeat.

I have done almost exactly the same thing in Xlib, and it works quite
well. The key was custom-written blitter functions that save the
background on a stack, and using a back-buffer slightly larger than
the visible screen.

I think I did something similar in “Rocks’n’Diamonds”, which uses
soft-scrolling with 50 fps in Xlib by just putting all tiles into
Pixmaps, building the playfield (which is one tile larger in each
direction than the visible area) and just blit the whole thing to
the X11 window. I simply used XCopyArea for blitting, and did not
need any XImages, so everything could be done at the X11 server.

This works quite fast (50 fps on AMD K2-200 + ATI RagePro).

After porting the game to SDL, I found that I only got about
20-30 fps. The main reason is the additional full-window blit:
I cannot blit the oversized back-buffer (the window-sized visible
area in it) directly to the X11 window, but first have to blit to
the SDL screen surface (one full-window blit) and then have to use
UpdateRect to copy the SDL screen surface to the actual X11 window
(second full-window blit).

This is of course much slower than the direct blit.

This bothered me so much that I looked into the SDL code to try to
add a modified UpdateRect function that could blit a window-sized
rectangle from a back-buffer that is larger than the window itself,
but it looked like it would not be that easy… ;-) :-(

SDL cannot (yet) use the oversized back-buffer strategy, although I’m
mulling over a way to squeeze it in.

That would really be great!!!

This way it should be possible to get the full Xlib speed out of SDL
when only using fixed tiles that did not change (so it could use
Pixmaps for every surface). The only thing besides plain XCopyArea
I would need is blitting through masks, which could be done with the
XSetClipMask/XSetClipOrigin functions when calling SDL_SetColorKey.
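
On the application side that would just be the usual colorkey setup; a minimal sketch (the magenta key colour is an arbitrary example):

    #include "SDL.h"

    static void make_masked(SDL_Surface *sprite)
    {
        Uint32 key = SDL_MapRGB(sprite->format, 255, 0, 255);
        SDL_SetColorKey(sprite, SDL_SRCCOLORKEY | SDL_RLEACCEL, key);
        /* Subsequent SDL_BlitSurface() calls skip key-coloured pixels -
         * which is what the X11 backend could map to
         * XSetClipMask/XSetClipOrigin internally. */
    }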

If you have hardware surfaces, you want to store all sprites and background
tiles in video memory. Scrolling can then be accomplished by re-painting
the entire screen each frame - since the blits are all accelerated, this
is fast. This allows page flipping to be used (gets rid of tearing
artifacts), and gives you as many layers as your video card can handle.

This would be the alternative now – if hardware surfaces are available.
But if I use window mode instead of fullscreen/page-flipping, I would
still have the additional blit compared to the oversized back-buffer
way.

Best regards,
Holger
P.S.:
I think I should release the SDL versions of “Rocks’n’Diamonds” and
"Mirror Magic" soon on the SDL games page, although there are still
some quirks with sound (SDL_mixer). The native Xlib versions can be
found on http://www.artsoft.org, if anybody is interested.

holger.schemel at mediaways.net ++49 +5241 80 1438

SDL cannot (yet) use the oversized back-buffer strategy, although I’m
mulling over a way to squeeze it in.

That would really be great!!!

We need
a) a way to tell SDL that we want a bigger buffer, and how big,
b) a way to set the offset within the buffer to be used when updating,
c) a new interface that conveys these tidbits of information to the drivers
d) think of a reasonable way to implement it for other targets than X11

For a), we can either add a new call to be used in place of SetVideoMode()
with the buffer size as extra parameters, or (perhaps better) a call
that just sets the buffer size, preferably to be called before SetVideoMode().

For b), either add a call to set the current offset, or one that augments
SDL_UpdateRects() with two more arguments.

Oversized buffers can be used for any SWSURFACE display target, not just
X11, in a straightforward way. For hardware screen surfaces we could
allocate one out of video memory, but it is perhaps better to disallow it
for hardware surfaces for the time being, since other update strategies
are often more appropriate there than for software surfaces.

Suggestions:

SDL_SetBackBuffer(int w, int h) - sets the back buffer size
SDL_SetBufferOffset(int x, int y) - sets the current buffer offset
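
A hypothetical usage sketch, just to show the intended flow (neither call exists in SDL today; the prototypes below only mirror the suggestion, and the sizes are arbitrary):

    #include "SDL.h"

    /* Proposed calls only - nothing like them exists in SDL yet. */
    extern int SDL_SetBackBuffer(int w, int h);
    extern int SDL_SetBufferOffset(int x, int y);

    static void scroll_demo(void)
    {
        SDL_Surface *screen;
        int scroll_x = 0;

        SDL_SetBackBuffer(640 + 32, 480 + 32);          /* one tile of margin */
        screen = SDL_SetVideoMode(640, 480, 16, SDL_SWSURFACE);

        for (;;) {
            /* ...paint newly exposed tiles into the margin as needed... */
            SDL_SetBufferOffset(scroll_x % 32, 0);
            SDL_UpdateRect(screen, 0, 0, 0, 0);         /* whole window */
            scroll_x++;
        }
    }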

This way it should be possible to get the full Xlib speed out of SDL
when only using fixed tiles that did not change (so it could use
Pixmaps for every surface). The only thing besides plain XCopyArea
I would need is blitting through masks, which could be done with the
XSetClipMask/XSetClipOrigin functions when calling SDL_SetColorKey.

Masked blits have historically been slow in X11, but I know there exists
consumer-level hardware that can accelerate them, so they might become
faster in the future.

I think I should release the SDL versions of “Rocks’n’Diamonds” and
"Mirror Magic" soon on the SDL games page, although there are still
some quirks with sound (SDL_mixer). The native Xlib versions can be
found on http://www.artsoft.org, if anybody is interested.

Do so! I remember having a lot of fun playing Mirror Magic, but some of
the levels were quite tricky. I did finish them all, though :-)

You probably want to use a time difference in your scrolling to make
your scrolling smoother and look faster, rather than moving a
constant number of pixels when the user presses an arrow key or something.
So you would do something like this: take the difference between the time
when the last frame was drawn and the current time, and multiply that by
some constant. This way, if one frame takes a long time to draw, there will
be a longer gap between frames and the screen will scroll a larger amount.
The shorter the time, the less it will scroll. This way the user will scroll
around the map at the same speed no matter what framerate you are getting.
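
A rough sketch of that with SDL_GetTicks() (the speed constant and the variable names here are just made up):

    #include "SDL.h"

    #define SCROLL_SPEED 120.0f          /* pixels per second - arbitrary */

    static float  scroll_x = 0.0f;       /* current scroll position */
    static Uint32 last_ticks = 0;        /* set to SDL_GetTicks() before the loop */

    static void update_scroll(void)
    {
        Uint32 now = SDL_GetTicks();
        float dt = (now - last_ticks) / 1000.0f;   /* seconds since last frame */
        last_ticks = now;

        /* A slow frame gives a bigger dt and therefore a bigger step, so
         * the map moves at the same real-world speed at any framerate. */
        scroll_x += SCROLL_SPEED * dt;
    }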

http://www.mongeese.org

On 3 Nov 2000, Holger Schemel wrote:

Mattias Engdegård wrote:

  1. Draw your tiles to screen
  2. Save the spot on screen where each sprite is to be drawn. In other words
    make an eraser for each sprite.
  3. Draw the sprites.
  4. Update the screen.
  5. Erase the sprites
  6. Shift the tiles on screen around to simulate scrolling. Draw newly visible
    tiles where needed.
  7. Repeat.

You probably want to use a time difference in your scrolling to make
your scrolling smoother and look faster, rather than moving a
constant number of pixels when the user presses an arrow key or something.
So you would do something like this: take the difference between the time
when the last frame was drawn and the current time, and multiply that by
some constant. This way, if one frame takes a long time to draw, there will
be a longer gap between frames and the screen will scroll a larger amount.
The shorter the time, the less it will scroll. This way the user will scroll
around the map at the same speed no matter what framerate you are getting.
Whoa, but beware: this can result in rather unsmooth scrolling. This is OK
if you have a strategy-like game, such as Rock'n'Diamonds, C&C or Baldur's
Gate. But if you have a constantly scrolling surface, like in action games,
this can tire your eyes heavily…

Regards,
Andreas Podgurski

Thu, 02 Nov 2000 Jeremy Gregorio wrote:

A while back I came across a fast scroll trick on a VGA game
programming site. Here's the gist of it:

  1. Draw your tiles to screen

  2. Save the spot on screen where each sprite is to be drawn. In other words
    make an eraser for each sprite.

Don’t do this with the CPU on modern cards! Reading from VRAM is usually
painfully slow…

  3. Draw the sprites.

  4. Update the screen.

  5. Erase the sprites

…by overdrawing the areas with tiles from the map. This works with any method
(OpenGL, Direct3D DM, CPU direct access, 2D acceleration…), even if it
doesn’t support copying screen areas into off-screen buffers. (Some 3D
accelerators won’t do that, and mixing 2D and 3D acceleration is usually a
very bad idea.)
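
Something like this is all the "eraser" you need when reconstructing from the map (sketch only; TILE, tile_at() and the scroll offsets are invented names):

    #include "SDL.h"

    #define TILE 32
    extern SDL_Surface *tile_at(int tx, int ty);    /* hypothetical map lookup */

    /* Erase a sprite by re-blitting the map tiles it covered, instead of
     * reading the old background back from VRAM.  'old' is the sprite's
     * previous screen rectangle. */
    static void erase_by_redraw(SDL_Surface *screen, SDL_Rect old,
                                int scroll_x, int scroll_y)
    {
        int tx0 = (scroll_x + old.x) / TILE;
        int ty0 = (scroll_y + old.y) / TILE;
        int tx1 = (scroll_x + old.x + old.w - 1) / TILE;
        int ty1 = (scroll_y + old.y + old.h - 1) / TILE;
        int tx, ty;

        for (ty = ty0; ty <= ty1; ty++) {
            for (tx = tx0; tx <= tx1; tx++) {
                SDL_Rect dst;
                dst.x = (Sint16)(tx * TILE - scroll_x);
                dst.y = (Sint16)(ty * TILE - scroll_y);
                SDL_BlitSurface(tile_at(tx, ty), NULL, screen, &dst);
            }
        }
    }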

  6. Shift the tiles on screen around to simulate scrolling. Draw newly visible
    tiles where needed.

Once upon a time, there was a fairly standard feature called “hardware
scrolling”… sigh

Anyway, I used that in a VGA Mode-X game engine with a special form of triple
buffering; while two of the buffers were used as display and sprite rendering
surfaces (to eliminate flicker), the third buffer was updated in the
background, a few tiles at a time. The two buffers in the display loop were
horizontally offset by 16 pixels (total scrolling stroke: 32 pixels), and when
the oldest one hit the border, it was replaced with the third buffer, which by
that time contained a new background, scrolled 32 pixels in relation to the old
one. After scrolling another 16 pixels, the other “in-loop” buffer was
replaced, and so on.

The CPU usage per video frame when scrolling 60 pixels/s (full frame rate at
320x232) was about the same as for painting a single 32x32 pixel sprite.

(Note: This may seem like an awfully complicated way to do it, but my absolute
requirement was that the game should run at full frame rate on “any” machine -
this was back when a 486-66 was a pretty hot machine, and some machines came
with VGA cards that couldn’t transfer more than some 40 full frames over the
bus, no matter what. Besides, I came right from the Amiga, where this kind of
solution was the only way to do full frame rate scrolling. The blitter just
wasn't fast enough for full screen blitting with acceptable color depth…)

  7. Repeat.

The point of all this is to avoid the huge number of blits needed to fill a
screen with tiles (imagine two layers of 20x20 tiles or 800 blits to draw the
screen).

(If the two layers are parallax scrolling, you're out of luck! ;-)

This way at most you need to blit two sides of the screen with tiles
or about 20+20 or 40 blits (plus the blits to make and use sprite erasers).

Only the "use" part of the sprite erasers takes significant time when
reconstructing from the map. OTOH, if there are at least three layers of tiles
on average, or some other factors that make map rendering heavy, the background
copying method might pay off - but do remember that modern video cards aren’t
designed for video->sysram transfers…

The
real problem is step number 6, where the tiles already on screen are moved to
where they would be after a scroll.

First off, there are two basic roads to take: use SDL_BlitSurface() to
move the data or lock the surface and use some sort of per pixel or byte based
method. In Win32 DirectX when I was playing around with this algorithm I used a
blit because I was practically guaranteed a fast hardware blit would take place.
Under SDL I’m much less sure of that so I’m thinking using the lock surface
approach might be best. What’s great about that is I can use fast memory
functions. I’m thinking I could use a memcpy().

I’ve actually tried reading from various “modern” cards (S3, Permedia 2 and
other common chipsets), and I can only confirm what I’ve heard from quite a
few game programmers. All of them are several times faster on CPU writes than
they are on reads, regardless of word size and access pattern. Now, having
experienced that the write speed is already a problem, this doesn’t look like
a good idea to me…

I’ve seen the shift operator
used in old VGA programming and if I could figure out how to do that it’d be
really fast moving the data.

One may think that shifting is pointless with the modern packed pixel modes,
but unfortunately, CPUs are not that good at grouping sub word sized accesses
at all times. MMX is an example of that, so dusting off those old shifting
tricks might be a good idea.

Anyway, the problem with reading VRAM indicates that you should be
copying/shifting from sysram to VRAM if you're using the CPU at all.

But then again that means a lot of
SurfaceLock/Unlocks in my code. One for every scroll operation in fact.

Well, you can’t ever get lower than that without hardware scrolling… (And you
probably want at least some of the sprites to update every frame anyway.) One
lock/unlock cycle per video frame should hardly be a performance problem.

I guess what I’m really asking for is advice on which route I should
take (or else a better scroll algorithm, I'm always open to suggestions). I'm
really curious about the fastest way to shift data around the pixel array from
my call to SDL_LockSurface.

The route I’m going to take next time I get around to actually hack something
games related is using 3D acceleration for 2D graphics. This seems to be the
most reliable way to get hardware acceleration, and besides, you get some
bonuses, such as alpha blending and interpolated scaling + rotating, almost for
free. “Everyone’s doing it now,” and after looking closer at some 3D games
running on different 3D cards, I’m becoming less worried about the lack of
detailed control you get compared to software rendering. The interpolation
blurring effect is hardly visible on anything like decent resolutions on a PC
screen, and sharp, pixel size details are not going to look good in a shooter
at 1024x768 anyway, so that argument for software rendering is pretty much void
by now.

So, next I’ll just get this G400 MAX to accelerate 3D on XFree86 4.0.1… heh

David Olofson
Programmer
Reologica Instruments AB
david.olofson at reologica.se


Fri, 03 Nov 2000 Mattias Engdegård wrote:

  1. Draw your tiles to screen
  2. Save the spot on screen where each sprite is to be drawn. In other words
    make an eraser for each sprite.
  3. Draw the sprites.
  4. Update the screen.
  5. Erase the sprites
  6. Shift the tiles on screen around to simulate scrolling. Draw newly visible
    tiles where needed.
  7. Repeat.

I have done almost exactly the same thing in Xlib, and it works quite
well. The key was custom-written blitter functions that save the
background on a stack, and using a back-buffer slightly larger than
the visible screen.

SDL cannot (yet) use the oversized back-buffer strategy, although I’m
mulling over a way to squeeze it in.

How about an abstraction that looks somewhat like what you see on displays with
hardware scrolling support; ie all video buffers are bigger than the display
window? Depending on what's available and what gives the best performance, one
could choose between

1) Keeping all buffers that are visible to the application in memory or
   VRAM, and blitting the currently visible part of the currently
   flipped in buffer to the physical display window. That is, emulated
   hardware scrolling + multiple buffers.

2) Actually setting up multiple VRAM buffers that are bigger than the
   display window, and then use real hardware scrolling + pageflipping.

3) Set up one oversized, hardware scrolled buffer in VRAM, and then
   keep the buffers (same size) that are visible to the application in
   system memory, doing real hardware scrolling + "edge updates" when
   the application changes the hardware scrolling offsets. Flipping is
   done the usual (painful) way; by blitting from sysram to VRAM...
   (You don't have to blit the parts that are outside the display
   window, though! Just make sure that applications don't assume
   anything about this.)

4) Like 3, but give the VRAM buffer the same size as the display
   window (or as close as possible - some video cards have strange row
   length limitations). Now, do the hardware scrolling as usual (by
   changing the VRAM display start address + shift regs settings on
   some hardware), and fix the wrapped/rotated areas by doing "edge
   updates", as in method 3). Flipping is like 3), although obviously
   you *can't* copy an entire buffer to VRAM, as the VRAM buffer is
   smaller! :-)

5) Like 4), but use a real VRAM buffer for each application accessible
   (sysRAM) buffer, so that you can do real hardware page flipping.
   Hardware scrolling is still done exactly as in 4). (Wrapping
   hardware scrolling + edge updates.)

NOTE: When doing wrapping hardware scrolling (ie without an oversized video
buffer), you have to keep in mind that some kinds of hardware (including VGA
cards) don’t have a standard way of dealing with the video output pointer
running off the end of the VRAM area! Some wrap to VRAM address 0x00000000,
while others just stop, show garbage, or start doing other weird things.

This suggests that the safest and easiest way is to simply avoid "virtual video
buffers" bigger than methods 3), 4) and 5) can deal with (application decision -
it should be possible to ask SDL which method will be used with certain buffer
sizes and counts), and/or automatically use the best alternative that doesn't
have this limitation. (Try method 4), 3) and finally 1) if there still isn't
enough VRAM for the requested buffer height.)

There are probably a few more ways to do it, but this should probably indicate
what the API should look like in order to allow the best implementation of
"hardware scrolling style" optimizations on any hardware.

If you have hardware surfaces, you want to store all sprites and background
tiles in video memory. Scrolling can then be accomplished by re-painting
the entire screen each frame - since the blits are all accelerated, this
is fast. This allows page flipping to be used (gets rid of tearing
artifacts), and gives you as many layers as your video card can handle.

Yep. I think any video card that you would consider putting in a game machine
(probably more like “any machine”) should be fast enough to do this. Even old
"el cheapo" S3 cards are amazingly fast as long as you stick with VRAM and
blitting operations that they actually do accelerate.

Heck, the original Amiga was fast enough for fullscreen scrolling with lots
of sprites at full frame rate, using some hardware scrolling tricks, and its
blitter was dog slow compared to any recent PC graphics accelerator, even
considering that the latter ones have to deal with several times heavier
graphics data. (The Amiga games used only 4-6 bits per pixel, and usually a
resolution of 320x200 @ 60 Hz [NTSC] or 320x256 @ 50 Hz [PAL]. The blitter was
barely fast enough to blit a full 320x200 x 4 bit buffer per frame, IIRC.)

If all you have is software surfaces (or if you are doing enough pixel
and/or alpha magic to want your screen buffer in software), then you
need to copy the entire scrolling window to the screen each time, and
this is likely to dominate the update costs.

Indeed… That’s the major problem I’ve been having so far; not being able to
blit a full 640x480x16 bit screen and stay above the standard 60 Hz refresh
rate. No problem with DirectX, though…

Anyway, I’m playing with the MTRRs on the P-II 400, and with svgalib. I’ll
report any interesting results.

There is a variety of scrolling
schemes that can be used here (see recent thread about it)

Unless I’m missing something, no trick can help if the final blit/flip to screen
is the problem.

David Olofson
Programmer
Reologica Instruments AB
david.olofson at reologica.se


[ several possible or less possible scrolling methods elided ]

The challenge is devising an API general enough to encompass reasonable
solutions on each hardware platform. Scrolling games are badly hit by the
lack of synchronized screen updates, and since scrolling shootemups are the
only true category of software really worth optimizing for, this is quite
important.

Since useful hardware-accelerated 2d alpha-combining operations seem to be
very rare, it will also be important to make the transport to video memory
as fast as possible.

Unless I’m missing something, no trick can help if the final blit/flip to screen
is the problem.

I believe a DMA solution would be able to transfer at least one measly
screenful each frame, and this is why I would like to solve that problem.

Hi!
Besides all other problems, my project now faces the task of rendering
its first OpenGL primitives to the screen. In the German documentation
I read that it is not possible to use normal blitting functions with
OpenGL, but in the library docs there is a flag called SDL_OPENGLBLIT,
which should make exactly that possible. But after setting this flag,
my application quits at the very beginning of the program.

  1. Can I use OpenGL with normal surface operations or not?
  2. What are the consequences of using OpenGL?
  3. Is OpenGL "simply" rendered onto the screen surface, or am I
    limited to drawing that stuff first?
  4. Why isn't OpenGL output to an offscreen surface possible? (Asked
    that once before, but got no sufficient answer…)

Thanks in advance,
Andreas Podgurski

Fri, 03 Nov 2000 Holger Schemel wrote:

Mattias Engdegård wrote:

  1. Draw your tiles to screen
  2. Save the spot on screen where each sprite is to be drawn. In other words
    make an eraser for each sprite.
  3. Draw the sprites.
  4. Update the screen.
  5. Erase the sprites
  6. Shift the tiles on screen around to simulate scrolling. Draw newly visible
    tiles where needed.
  7. Repeat.

I have done almost exactly the same thing in Xlib, and it works quite
well. The key was custom-written blitter functions that save the
background on a stack, and using a back-buffer slightly larger than
the visible screen.

I think I did something similar in “Rocks’n’Diamonds”, which uses
soft-scrolling with 50 fps in Xlib by just putting all tiles into
Pixmaps, building the playfield (which is one tile larger in each
direction than the visible area) and just blit the whole thing to
the X11 window. I simply used XCopyArea for blitting, and did not
need any XImages, so everything could be done at the X11 server.

This works quite fast (50 fps on AMD K2-200 + ATI RagePro).

After porting the game to SDL, I found that I only got about
20-30 fps. The main reason is the additional full-window blit:
I cannot blit the oversized back-buffer (the window-sized visible
area in it) directly to the X11 window, but first have to blit to
the SDL screen surface (one full-window blit) and then have to use
UpdateRect to copy the SDL screen surface to the actual X11 window
(second full-window blit).

This is of course much slower than the direct blit.

Another reason to abstract this kind of game display system in the API, for
automatic optimization for different targets, perhaps? Or rather, this
illustrates one of the aspects that such an extension has to deal with.

David Olofson
Programmer
Reologica Instruments AB
david.olofson at reologica.se


Fri, 03 Nov 2000 Andreas Podgurski wrote:

You probably want to use a time difference in your scrolling to make
your scrolling smoother and look faster, rather than moving a
constant number of pixels when the user presses an arrow key or something.
So you would do something like this: take the difference between the time
when the last frame was drawn and the current time, and multiply that by
some constant. This way, if one frame takes a long time to draw, there will
be a longer gap between frames and the screen will scroll a larger amount.
The shorter the time, the less it will scroll. This way the user will scroll
around the map at the same speed no matter what framerate you are getting.
Whoa, but beware: this can result in rather unsmooth scrolling. This is OK
if you have a strategy-like game, such as Rock'n'Diamonds, C&C or Baldur's
Gate. But if you have a constantly scrolling surface, like in action games,
this can tire your eyes heavily…

Exactly. So, it's all about very careful utilization of the actually achieved
frame rate. (Ie calculate positions as exactly as you can for every object and
frame; no way you'll get away with a non-interpolated physics engine running
"somewhere around the video frame rate"!)

Sub pixel accurate object positioning and scrolling would practically
eliminate any remaining unsmoothness regardless of scrolling speed/video
refresh rate ratio, but that’s pretty hard to do without HW acceleration…
(That’s why I’m becoming more and more convinced that 3D acceleration is the
acceleration method to use for games, regardless of projection style.)
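
For the positioning part, here's a 16.16 fixed point sketch of what I mean by careful utilization of the achieved frame rate (names are illustrative only; true sub-pixel rendering of the blits themselves still needs filtering, which is where the hardware comes in):

    #include "SDL.h"

    typedef Sint32 fixed;                      /* 16.16 fixed point */
    #define FIX(x)     ((fixed)((x) * 65536))
    #define FIX2INT(x) ((int)((x) >> 16))

    static fixed pos_x = 0;                    /* exact object position */

    /* e.g. move_object(FIX(0.12), frame_ms) for 0.12 pixels per ms */
    static void move_object(fixed speed_per_ms, Uint32 frame_ms)
    {
        pos_x += speed_per_ms * (fixed)frame_ms;   /* no rounding here */
    }

    static void blit_object(SDL_Surface *screen, SDL_Surface *img, int y)
    {
        SDL_Rect dst;
        dst.x = (Sint16)FIX2INT(pos_x);        /* round only when blitting */
        dst.y = (Sint16)y;
        SDL_BlitSurface(img, NULL, screen, &dst);
    }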

David Olofson
Programmer
Reologica Instruments AB
david.olofson at reologica.se


Fri, 03 Nov 2000 Mattias Engdegård wrote:

SDL cannot (yet) use the oversized back-buffer strategy, although I’m
mulling over a way to squeeze it in.

That would really be great!!!

We need
a) a way to tell SDL that we want a bigger buffer, and how big,

Back in the Amiga days, we called the big buffers "Playfields", and the screen
display window the "Viewport". The important difference from most systems
used on PCs (and other systems that generally don't support hardware scrolling)
is that these two are separated, and that the Viewport is actually not a
surface, but just a logical description of a part of a playfield to display.
AmigaDOS had structures for these things, and used the info to construct
COPPER lists (code for the raster-synced slave processor) to set up the display.

b) a way to set the offset within the buffer to be used when updating,

Position of the top-left corner of the Viewport relative to the top-left corner
of the Playfield. (“Playfield” corresponds to “video surface”.)

c) a new interface that conveys these tidbits of information to the drivers

Well, that’s where it gets complicated; this interface has to be different for
different targets in order to achieve maximum performance in all situations. An
object oriented polymorphic design should deal with that easily - the question
is, what should the interface look like? Some fundamental ideas…

                  Playfield:          Viewport:

    Alt. names:   Surface:            Display:
    Data:         width               width
                  height              height
                  *pixels             *Playfield
                                      xOffset
                                      yOffset
    Actions:      Lock()              Flip()
                  Release()
                  Invalidate(rect)

(Note: It's not entirely obvious where the scroll offsets should be. In a
triple buffered design like the scrolling game I described earlier, the two
pages that are flipped after every sprite update have individual offsets…)

Playfield.Lock()/.Release() would do basically what the current surface
versions are doing. Playfield.Invalidate(rect) is an interface that some
implementations may use for smart refresh when the application accessible
surfaces are not in VRAM, and/or when “flipping” is performed by blitting.
Invalidate() basically allows the driver to build a list of rects that need to
be ready and updated when/before the Playfield is flipped into display.

Playfield.Flip() will in some implementations simply blit the entire visible
part of the new surface into the display hardware surface, while on others, it
will implement the x/yOffsets of the Playfield through hardware scrolling and
do a real double buffer flip. In some cases it might do hardware scrolling, but
not having enough VRAM, it’ll have to refresh the invalidated areas of the VRAM
surface from the new Playfield.
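
In C it might boil down to something like this (nothing of the sort exists in SDL; the structs below only restate the sketch above):

    /* Hypothetical types - they only restate the Playfield/Viewport idea. */
    typedef struct Playfield {
        int    width, height;        /* full buffer size (bigger than screen) */
        void  *pixels;
        /* Lock(), Release() and Invalidate(rect) would live here as
         * driver-supplied function pointers. */
    } Playfield;

    typedef struct Viewport {
        int        width, height;    /* size of the visible window */
        Playfield *playfield;        /* which buffer is shown */
        int        xOffset, yOffset; /* scroll position within the playfield */
        /* Flip() goes here: a blit on some targets, a real page flip plus
         * hardware scroll register update on others. */
    } Viewport;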

d) think of a reasonable way to implement it for other targets than X11

For a), we can either add a new call to be used in place of SetVideoMode()
with the buffer size as extra parameters, or (perhaps better) a call
that just sets the buffer size, preferably to be called before SetVideoMode().

Isn’t this basically related to the “pitch” field (as DirectX calls it)? That
is, there should actually already be support for buffers that are wider than
the display window… This buffer size API extension would basically be about
turning this into something more useful than a hardware support hack.

For b), either add a call to set the current offset, or one that augments
SDL_UpdateRects() with two more arguments.

I think it needs to be more structured than that to deal with targets that can
actually do real hardware scrolling…

Oversized buffers can be used for any SWSURFACE display target, not just
X11, in a straightforward way. For hardware screen surfaces we could
allocate one out of video memory, but it is perhaps better to disallow it
for hardware surfaces for the time being, since other update strategies
are often more appropriate there than for software surfaces.

The “update” strategy for a double buffered, hardware scrolling display is
basically "Do nothing but flip the VRAM pointers." :-)

Suggestions:

SDL_SetBackBuffer(int w, int h) - sets the back buffer size
SDL_SetBufferOffset(int x, int y) - sets the current buffer offset

Well, I'd prefer having the latter associated with the actual buffers (ie one
set per buffer), as that makes it possible to hide the rather critical timing
of setting the hardware scrolling registers on some (broken, I'd say; although
there are lots of those around!) cards. If the offset can be set directly by
applications, so that SDL cannot control the flip/scroll timing, there will be
serious trouble.

David Olofson
Programmer
Reologica Instruments AB
david.olofson at reologica.se


[ catching up a batch of mail, sorry for the late reply ]

c) a new interface that conveys these tidbits of information to the drivers

Well, that’s where it gets complicated; this interface has to be different for
different targets in order to achieve maximum performance in all situations.

No, this is the easy part. It’s an internal interface that isn’t exported
so we can change it whenever we like without breaking anything, since
all the drivers are in the same source tree. We can afford to make mistakes
here.

d) think of a reasonable way to implement it for other targets than X11

For a), we can either add a new call to be used in place of SetVideoMode()
with the buffer size as extra parameters, or (perhaps better) a call
that just sets the buffer size, preferably to be called before SetVideoMode().

Isn’t this basically related to the “pitch” field (as DirectX calls it)? That
is, there should actually already be support for buffers that are wider than
the display window… This buffer size API extension would basically be about
turning this into something more useful than a hardware support hack.

The pitch field isn't a hack. It's a convenient way to represent
several common image formats, with alignment restrictions. It also
works well for subimage extraction, and is necessary for some drivers when
the chosen screen size is not one natively available.

Maybe it can be augmented to be used for a big “playfield” (as you
call it; not always accurate since most practical uses of it is for
including scroll margins, not for representing an entire playfield).
On the other hand, the user must be able to access the entire buffer,
not just the current viewport.

And while we’re at it, if we include an offsettable SDL_UpdateRects(),
it should either be a per-rectangle offset, or it should be possible
to inhibit the implicit XSync() (in case different rectangles are
updated with different offsets, which is common if you have a scrolling
and a static area).

I think it needs to be more structured than that to deal with targets that can
actually do real hardware scrolling…

As I said, I didn’t concern myself overly with hardware scrolling; partly
because I have no access to any hardware that can do it, but mostly because
it is less of a problem with hardware surfaces — repainting the entire
viewport each frame is not a big deal, and easily allows for multiple
layers.

That said, if you can design a general model that encompasses a large
set of obsolete, contemporary and future hardware in that regard, that
would be a very valuable contribution.

Tue, 07 Nov 2000 Mattias Engdegård wrote:

[ catching up a batch of mail, sorry for the late reply ]

c) a new interface that conveys these tidbits of information to the drivers

Well, that’s where it gets complicated; this interface has to be different for
different targets in order to achieve maximum performance in all situations.

No, this is the easy part. It’s an internal interface that isn’t exported
so we can change it whenever we like without breaking anything, since
all the drivers are in the same source tree. We can afford to make mistakes
here.

The problem I'm thinking about is making this flexible enough, and making sure
it doesn't break the API semantics when adding new driver implementations.

d) think of a reasonable way to implement it for other targets than X11

For a), we can either add a new call to be used in place of SetVideoMode()
with the buffer size as extra parameters, or (perhaps better) a call
that just sets the buffer size, preferably to be called before SetVideoMode().

Isn’t this basically related to the “pitch” field (as DirectX calls it)? That
is, there should actually already be support for buffers that are wider than
the display window… This buffer size API extension would basically be about
turning this into something more useful than a hardware support hack.

The pitch field isn’t a hack. It’s a convenient way to represent
several common image formats, with alignment restrictions. It also
works well for subimage extraction, and necessary for some drivers when
the chosen screen size is not one natively available.

Well, that’s not what I meant really; the point is just that it could be viewed
as more than just a way of describing how to address the buffer.

Maybe it can be augmented to be used for a big “playfield” (as you
call it; not always accurate since most practical uses of it are for
including scroll margins, not for representing an entire playfield).

True, although it was rather common to use it as an actual playfield in games
that didn't need bigger "maps" than could fit in the available chip RAM -
that's probably why it was a very popular term back then. (The Amiga didn't have
any dedicated VRAM, except that only the low 512 kB-2 MB, depending on chipset,
was DMAable.)

On the other hand, the user must be able to access the entire buffer,
not just the current viewport.

Yes. That’s why it might be a good idea to implement the viewport as a separate
object, more like an abstraction of a hardware display. It doesn’t really have
anything to do with the playfield/buffer when it comes to rendering. (Except
that some engines will like to manipulate the buffer’s scroll position along
with its contents.)

And while we’re at it, if we include an offsettable SDL_UpdateRects(),
it should either be a per-rectangle offset, or it should be possible
to inhibit the implicit XSync() (in case different rectangles are
updated with different offsets, which is common if you have a scrolling
and a static area).

Yeah, “split screen” emulation… Very few machines actually have hardware
support for that, except for simple versions such as the VGA split, which
simply resets the VRAM pointer and scroll regs at a specified scan line. (To
hardware scrolling of the bottom window, that is.)

Anyway, are the rectangles you’re referring to here actually persistent
objects? What I’m getting at is that it’s rather messy (at all possible?) to
make use of hardware scrolling if there is no persistent abstraction of the
current display.

I think it needs to be more structured than that to deal with targets that can
actually do real hardware scrolling…

As I said, I didn’t concern myself overly with hardware scrolling; partly
because I have no access to any hardware that can do it, but mostly because
it is less of a problem with hardware surfaces — repainting the entire
viewport each frame is not a big deal, and easily allows for multiple
layers.

Most modern video cards are capable of hardware scrolling (and not only in
VGA emulation mode), and it’s actually used by some Windows GDI drivers.
However, I’ve only seen it used for trivial things such as scrolling around a
big “virtual” desktop. AFAIK, those features are not accessible to applications
using any standard API.

That said, if you can design a general model that encompasses a large
set of obsolete, contemporary and future hardware in that regard, that
would be a very valuable contribution.

Well, I think I could design something that would do that to a great extent at
least, but what should it support, actually? The problem is that hardware
scrolling (in the form seen in most hardware at least) has some inherent
problems, such as not being able to deal with parallax effects (exception:
Amiga, and some consoles, like SNES), and enforcing rather complicated
scrolling algorithms on any engines that want to make use of it.

So, games that could make use of it will run full speed without it on anything
like decent hardware (well, in theory - I have yet to see it on Linux), and the
games that need more speed won’t have much use for single layer hardware
scrolling anyway.

It’s probably better (and perhaps even easier!) to design a general API for 2D
graphics engines with parallax scrolling and sprites, and then hack optimized
implementations of that for hardware that supports it. As of now, I bet the
most popular and usable implementations would use plain VRAM->VRAM blitting with
3D acceleration…

I’m planning to do that eventually, but doubt anything like that will ever
become a popular API for games. It kind of worked for 3D, but 2D seems to
require too much control down to the pixel level to allow generic APIs and
hardware acceleration.

OTOH, do any of the existing 2D APIs/packages have the power and features
required for a real arcade quality 2D shooter? Everything I’ve seen so far has
been more or less mediocre.

//David


Yeah, “split screen” emulation… Very few machines actually have hardware
support for that, except for simple versions such as the VGA split, which
simply resets the VRAM pointer and scroll regs at a specified scan line. (To
hardware scrolling of the bottom window, that is.)

Anyway, are the rectangles you’re referring to here actually persistent
objects? What I’m getting at is that it’s rather messy (at all possible?) to
make use of hardware scrolling if there is no persistent abstraction of the
current display.

Sure, but you’d have a rather different implementation strategy for
hardware surfaces anyway, probably by hardware-blitting the static
part onto the target screen. But updating rectangles with different
buffer offsets is necessary for doing it in software.

OTOH, do any of the existing 2D APIs/packages have the power and features
required for a real arcade quality 2D shooter? Everything I’ve seen so far has
been more or less mediocre.

Since you can run MAME and SNES9x emulating real arcade quality 2d games,
which do all their rendering in software, it’s definitely possible.

Actually I think that software rendering can be quite workable, at
least if we can bring the RAM->video memory transport cost down to
manageable levels, perhaps by using DMA.

CPUs are often fast enough for this to work, and by using optimized
routines that exploit SIMD instructions, a great variety of effects
should be possible. I’ve been thinking about the design of a blit
system that allows optimized inner loops written in assembler to be
plugged in for various architectures. It won’t work for everything
but it is definitely an alternative.
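
Roughly what I have in mind, sketched in C (the names are illustrative, and the MMX routine is only hinted at in a comment):

    #include <string.h>
    #include <stddef.h>

    /* A pluggable blit system: the portable C loop is the fallback, and a
     * SIMD/assembler version can be dropped into the slot at runtime. */
    typedef void (*blit_row_fn)(void *dst, const void *src, size_t bytes);

    static void blit_row_c(void *dst, const void *src, size_t bytes)
    {
        memcpy(dst, src, bytes);            /* plain C fallback */
    }

    /* Chosen at init time after CPU feature detection, e.g.
     * blit_row = have_mmx ? blit_row_mmx : blit_row_c;  (blit_row_mmx
     * being a hand-written assembler routine, not shown here). */
    static blit_row_fn blit_row = blit_row_c;

    static void blit_rect(void *dst, size_t dst_pitch,
                          const void *src, size_t src_pitch,
                          size_t row_bytes, int rows)
    {
        int y;
        for (y = 0; y < rows; y++)
            blit_row((char *)dst + y * dst_pitch,
                     (const char *)src + y * src_pitch, row_bytes);
    }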

Wed, 08 Nov 2000 Mattias Engdegård wrote:

Yeah, “split screen” emulation… Very few machines actually have hardware
support for that, except for simple versions such as the VGA split, which
simply resets the VRAM pointer and scroll regs at a specified scan line. (To
hardware scrolling of the bottom window, that is.)

Anyway, are the rectangles you’re referring to here actually persistent
objects? What I’m getting at is that it’s rather messy (at all possible?) to
make use of hardware scrolling if there is no persistent abstraction of the
current display.

Sure, but you’d have a rather different implementation strategy for
hardware surfaces anyway, probably by hardware-blitting the static
part onto the target screen. But updating rectangles with different
buffer offsets is necessary for doing it in software.

I'd prefer to see the offsets kept in the "Viewports", and the update
rects attached to these, for use when it's time to actually perform the
update. (When this will happen depends on the actual display setup…)

Anyway, I’m not giving up just yet.

OTOH, do any of the existing 2D APIs/packages have the power and features
required for a real arcade quality 2D shooter? Everything I’ve seen so far has
been more or less mediocre.

Since you can run MAME and SNES9x emulating real arcade quality 2d games,
which do all their rendering in software, it’s definitely possible.

Well, yes; at least I’ve seen an SNES emulator do 60 FPS, although this is with
rather low resolutions. It might be that I’m getting too used to 19" and 21"
monitors, but 320x240 doesn't look all that sexy any more… :-)

Actually I think that software rendering can be quite workable, at
least if we can bring the RAM->video memory transport cost down to
manageable levels, perhaps by using DMA.

It requires heavy optimization, but getting rid of this sysRAM->VRAM bottleneck
(*) is paramount - all optimizations are quite pointless as long as that
problem remains, at least if the game is supposed to run at full video frame
rate.

(*) Which eats 80-200% of the frame time on the Linux setups I’ve seen so far -
except for my P-III 933 which can actually do some 120 FPS in 1024x768x8
using fbdev; 105 Hz with XFree 4.0.1 IIRC. I think that’s a very, very low
figure for such a machine, doing nothing but a memset() on the buffer
before flipping it in…

CPUs are often fast enough for this to work, and by using optimized
routines that exploit SIMD instructions, a great variety of effects
should be possible.

Yeah, that’s what I thought, and was planning to do from the beginning.
However, this getting-the-output-displayed story has been, to say the least,
depressing so far. Triple buffering + DMA blits that don't kill the
CPU<->system RAM bandwidth should solve this. (Even more interesting for video
playback and editing apps, as those have to do software image processing.)

I’ve been thinking about the design of a blit
system that allows optimized inner loops written in assembler to be
plugged in for various architectures. It won’t work for everything
but it is definitely an alternative

I have some ideas in this area as well. The biggest problem is the usual
antagonism between control and speed… Either you have a high level API that
allows a wide range of low level implementations, or you have a low level API
that forces applications to bother with countless different scenarios to
achieve maximum speed on all targets. In the latter case, it’s probably better
not to try to abstract things in the first place, as that will only make the
API more incomprehensible than doing it all directly on the underlying APIs
and/or hardware.

The former alternative seems more appealing, but it only transforms the problem
into balancing API size and complexity against features. And, a complex, hard
to extend engine with few features won’t be used by anyone.

Conclusion: The only ways to design an API that allows fast implementations
over a wide range of targets are either to make it big, feature-rich and
flexible, or to make it small, low level and simple.

If I’m understanding the philosophy behind SDL correctly, the last alternative
would be the way to go. (It also happens to be the alternative that’s easier
to implement than a full, heavily optimized OpenGL implementation…)

Then again, there's always more than one dimension to (most) things… :-) (I'm
thinking in terms of turning the engine around, and requesting exactly the data
it needs for every update, using callbacks to the application - somewhat
similar to a carefully optimized windowing system.)

I have an old sideways scrolling DOS PM16 game that I’ve “been going to” port to
Win32 and Linux for ages now, and I think it’s a perfect example of a game that
can make use of the kind of "hardware scrolling emulation optimizations"
discussed earlier in this thread. It could serve as a pilot project for my shot
at coming up with an API proposal.

It has single level, unlimited distance, fixed rate sideways scrolling (no
parallax), 32 pixels up/down scrolling, a split-screen dash-board on the bottom
of the screen, reasonable numbers of small to medium sprites, and it’s designed
to run at a fixed 60 FPS rate. It’s using a 320x238 hardware scrolled Mode-X
display. (Oh, and it has terrible PC speaker sound effects, and very stupid
enemies as well. :-)

I already have an unoptimized version that renders to a GDI Window through a
VGA Mode-X “emulator” (could start out by porting that to SDL), but most of
this will probably be replaced, due to the very different (and hopefully more
flexible and easier to use) engine design I’m figuring out right now.

I’ll be back.

//David


I’ll be back.

Sounds good. I’m looking forward to seeing what you come up with.
We can experiment with this sort of thing in the 1.3 development series.

See ya!
-Sam Lantinga, Lead Programmer, Loki Entertainment Software