Thu, 02 Nov 2000 Jeremy Gregorio wrote:
A while back I came across a fast scroll trick on a vga game
programming site. Here’s the gist of it:
- Draw your tiles to screen
- Save the spot on screen where each sprite is to be drawn. In other words,
make an eraser for each sprite.
Don’t do this with the CPU on modern cards! Reading from VRAM is usually
painfully slow…
- Draw the sprites.
- Update the screen.
- Erase the sprites
…by overdrawing the areas with tiles from the map. This works with any method
(OpenGL, Direct3D DM, CPU direct access, 2D acceleration…), even if it
doesn’t support copying screen areas into off-screen buffers. (Some 3D
accelerators won’t do that, and mixing 2D and 3D acceleration is usually a
very bad idea.)
- Shift the tiles on screen around to simulate scrolling. Draw newly visible
tiles where needed.
Once upon a time, there was a fairly standard feature called “hardware
scrolling”… sigh
Anyway, I used that in a VGA Mode-X game engine with a special form of triple
buffering; while two of the buffers were used as display and sprite rendering
surfaces (to eliminate flicker), the third buffer was updated in the
background, a few tiles at a time. The two buffers in the display loop were
horizontally offset by 16 pixels (total scrolling stroke: 32 pixels), and when
the oldest one hit the border, it was replaced with the third buffer, which by
that time contained a new background, scrolled 32 pixels in relation to the old
one. After scrolling another 16 pixels, the other “in-loop” buffer was
replaced, and so on.
The CPU usage per video frame when scrolling 60 pixels/s (full frame rate at
320x232) was about the same as for painting a single 32x32 pixel sprite.
(Note: This may seem like an awfully complicated way to do it, but my absolute
requirement was that the game should run at full frame rate on “any” machine -
this was back when a 486-66 was a pretty hot machine, and some machines came
with VGA cards that couldn’t transfer more than some 40 full frames over the
bus, no matter what. Besides, I came right from the Amiga, where this kind of
solution was the only way to do full frame rate scrolling. The blitter just
wasn’t fast enough for full screen blitting with acceptable color depth…)
- Repeat.
The point of all this is to avoid the huge number of blits needed to fill a
screen with tiles (imagine two layers of 20x20 tiles or 800 blits to draw the
screen).
(If the two layers are parallax scrolling, you’re out of luck!)
This way at most you need to blit two sides of the screen with tiles
or about 20+20 or 40 blits (plus the blits to make and use sprite erasers).
Only the “use” part of the sprite erasers takes significant time when
reconstructing from the map. OTOH, if there are at least three layers of tiles
on average, or some other factors that make map rendering heavy, the background
copying method might pay off - but do remember that modern video cards aren’t
designed for video->sysram transfers…
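As a rough illustration of erasing by reconstructing from the map, here is a minimal sketch in plain C. It is not code from the engine discussed here: the `Surface` struct, the tile and map sizes, `draw_tile()` and `erase_sprite()` are all hypothetical names made up for the example, the surface is assumed to cover the whole map (no scroll offset), and the "blit" is just a solid fill. The point is only that the eraser never reads from the surface; it works out which tiles the sprite rectangle touched and writes them again.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: 16x16 tiles, 20x20 tile map (320x320 pixels). */
enum { TILE = 16, MAP_W = 20, MAP_H = 20 };

/* Hypothetical 8-bit software surface; pitch is bytes per row. */
typedef struct {
    uint8_t *pixels;
    int pitch;
} Surface;

/* Stand-in for a real tile blit: fills the tile with its map value. */
static void draw_tile(Surface *s, const uint8_t *map, int tx, int ty)
{
    int y;
    for (y = 0; y < TILE; ++y)
        memset(s->pixels + (size_t)(ty * TILE + y) * s->pitch + tx * TILE,
               map[ty * MAP_W + tx], TILE);
}

/* Erase the rectangle a sprite covered by redrawing every map tile it
   touches. Only writes to the surface. Returns the tiles redrawn. */
static int erase_sprite(Surface *s, const uint8_t *map,
                        int x, int y, int w, int h)
{
    /* Clip the rectangle to the map area; x1/y1 are exclusive. */
    int x0 = x < 0 ? 0 : x;
    int y0 = y < 0 ? 0 : y;
    int x1 = x + w > MAP_W * TILE ? MAP_W * TILE : x + w;
    int y1 = y + h > MAP_H * TILE ? MAP_H * TILE : y + h;
    int tx, ty, count = 0;

    assert(s != NULL && map != NULL);
    if (x0 >= x1 || y0 >= y1)
        return 0;               /* empty or entirely off the map */

    for (ty = y0 / TILE; ty <= (y1 - 1) / TILE; ++ty)
        for (tx = x0 / TILE; tx <= (x1 - 1) / TILE; ++tx) {
            draw_tile(s, map, tx, ty);
            ++count;
        }
    return count;
}
```

A 32x32 sprite that is not aligned to the tile grid touches 3x3 tiles here, so the erase costs nine small tile writes and no reads. With several map layers, each of those tiles would have to be redrawn once per layer, which is where the trade-off mentioned above comes from.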
The real problem is step number 6, where the tiles already on screen are moved
to where they would be after a scroll.
First off, there are two basic roads to take: use SDL_BlitSurface() to
move the data or lock the surface and use some sort of per pixel or byte based
method. In Win32 DirectX when I was playing around with this algorithm I used a
blit because I was practically guaranteed a fast hardware blit would take place.
Under SDL I’m much less sure of that so I’m thinking using the lock surface
approach might be best. What’s great about that is I can use fast memory
functions. I’m thinking I could use a memcpy().
I’ve actually tried reading from various “modern” cards (S3, Permedia 2 and
other common chipsets), and I can only confirm what I’ve heard from quite a
few game programmers. All of them are several times faster on CPU writes than
they are on reads, regardless of word size and access pattern. Now, having
experienced that the write speed is already a problem, this doesn’t look like
a good idea to me…
I’ve seen the shift operator
used in old VGA programming and if I could figure out how to do that it’d be
really fast moving the data.
One may think that shifting is pointless with the modern packed pixel modes,
but unfortunately, CPUs are not that good at grouping sub word sized accesses
at all times. MMX is an example of that, so dusting off those old shifting
tricks might be a good idea.
Anyway, the problem with reading VRAM indicates that you should be
copying/shifting from sysram to VRAM if you’re using the CPU at all.
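A minimal sketch of what that could look like in plain C, assuming an 8-bit shadow copy of the screen kept in system RAM. The function names `scroll_left()` and `present()` are made up for the example, and `dst` stands in for whatever pointer you get from locking the real surface. Note that the in-place shift uses memmove() rather than memcpy(), because source and destination overlap within each row and memcpy() is undefined for overlapping regions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scroll a w x h, 8-bit image left by dx pixels, in place, in system
   RAM. The rightmost dx columns are left stale; the caller is expected
   to redraw them from the map. pitch is bytes per row. */
static void scroll_left(uint8_t *buf, int w, int h, int pitch, int dx)
{
    int y;
    assert(dx >= 0 && dx <= w && pitch >= w);
    for (y = 0; y < h; ++y) {
        uint8_t *row = buf + (size_t)y * pitch;
        /* Source and destination overlap, so memmove, not memcpy. */
        memmove(row, row + dx, (size_t)(w - dx));
    }
}

/* Copy the shadow buffer to the destination, row by row, honouring
   both pitches. The destination is only ever written, never read. */
static void present(uint8_t *dst, int dst_pitch,
                    const uint8_t *src, int src_pitch, int w, int h)
{
    int y;
    assert(dst_pitch >= w && src_pitch >= w);
    for (y = 0; y < h; ++y)
        memcpy(dst + (size_t)y * dst_pitch,
               src + (size_t)y * src_pitch, (size_t)w);
}
```

This trades the VRAM reads for a full-screen write every frame, so whether it is fast enough still depends on the bus write speed discussed above.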
But then again that means a lot of
SurfaceLock/Unlocks in my code. One for every scroll operation in fact.
Well, you can’t ever get lower than that without hardware scrolling… (And you
probably want at least some of the sprites to update every frame anyway.) One
lock/unlock cycle per video frame should hardly be a performance problem.
I guess what I’m really asking for is advice on which route I should
take (or else a better scroll algorithm, I’m always open to suggestions). I’m
really curious about the fastest way to shift data around the pixel array from
my call to SDL_LockSurface.
The route I’m going to take next time I get around to actually hacking something
games related is using 3D acceleration for 2D graphics. This seems to be the
most reliable way to get hardware acceleration, and besides, you get some
bonuses, such as alpha blending and interpolated scaling + rotating, almost for
free. “Everyone’s doing it now,” and after looking closer at some 3D games
running on different 3D cards, I’m becoming less worried about the lack of
detailed control you get compared to software rendering. The interpolation
blurring effect is hardly visible on anything like decent resolutions on a PC
screen, and sharp, pixel size details are not going to look good in a shooter
at 1024x768 anyway, so that argument for software rendering is pretty much void
by now.
So, next I’ll just get this G400 MAX to accelerate 3D on XFree86 4.0.1… heh
David Olofson
Programmer
Reologica Instruments AB
david.olofson at reologica.se
.- M u C o S --------------------------------. .- David Olofson ------.
|           A Free/Open Multimedia           | |     Audio Hacker     |
|      Plugin and Integration Standard       | |    Linux Advocate    |
`------------> http://www.linuxdj.com/mucos -' | Open Source Advocate |
.- A u d i a l i t y ------------------------. |        Singer        |
|  Rock Solid Low Latency Signal Processing  | |      Songwriter      |
`---> http://www.angelfire.com/or/audiality -' `-> david at linuxdj.com -'