Optimized drawing methods

I was wondering if anybody could explain this to me. I’m running under
X11, and I don’t know if Win32 would show much difference, but here goes:

I’ve been trying to get my application to draw as fast as possible. I’m
a hobbyist, learn-as-I-go programmer, and I’ve had to rewrite some
pretty crappy code from the past; I’m sure I’ll rewrite it again. But
I’ve been going through various methods of blitting, and I was wondering
if somebody could explain the results to me:

ORIGINAL METHOD:
My original method consisted of a simple double buffer. I had a
software surface the size of the window, and I’d blit and scratch pixels
there. It was divided into tiles (about 30x25 pixels or so), and if any
pixel within a tile was drawn to, that tile became dirty. When all the
drawing was done, my flip function would go through and flip all the
tiles that were dirtied. gprof showed that function to be slow, so I
tried speeding it up, and to reduce blits I wrote some code to detect
blocks of adjacent dirty tiles and such. Okay, that’s my original
method, a simple double buffer. It took about 50% CPU during normal
gameplay and 85% during heavy action.
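
For reference, a minimal sketch of this kind of dirty-tile double buffer
(the 640x480 window, tile size, and helper names are illustrative
assumptions, not the actual code):

    #include "SDL.h"

    #define TILE_W 32
    #define TILE_H 24
    #define GRID_W (640 / TILE_W)
    #define GRID_H (480 / TILE_H)

    static SDL_Surface *screen;            /* the window surface  */
    static SDL_Surface *back;              /* software backbuffer */
    static int dirty[GRID_H][GRID_W];      /* one flag per tile   */

    /* Mark every tile a blit touches as dirty (assumes r lies
       inside the 640x480 window). */
    static void mark_dirty(const SDL_Rect *r)
    {
        int tx, ty;
        for (ty = r->y / TILE_H; ty <= (r->y + r->h - 1) / TILE_H; ty++)
            for (tx = r->x / TILE_W; tx <= (r->x + r->w - 1) / TILE_W; tx++)
                dirty[ty][tx] = 1;
    }

    /* Copy each dirtied tile from the backbuffer to the screen
       surface, then present.  Without real hardware page flipping,
       SDL_Flip() updates the whole window no matter how few tiles
       changed. */
    static void flip_dirty_tiles(void)
    {
        int tx, ty;
        for (ty = 0; ty < GRID_H; ty++)
            for (tx = 0; tx < GRID_W; tx++)
                if (dirty[ty][tx]) {
                    SDL_Rect src, dst;
                    src.x = (Sint16)(tx * TILE_W);
                    src.y = (Sint16)(ty * TILE_H);
                    src.w = TILE_W;
                    src.h = TILE_H;
                    dst = src;
                    SDL_BlitSurface(back, &src, screen, &dst);
                    dirty[ty][tx] = 0;
                }
        SDL_Flip(screen);
    }

The telling part is the last line: on a plain software screen surface,
SDL_Flip() amounts to updating the entire window, which is what gprof
was flagging.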

SECOND METHOD:
My second method was taken from the programmer of LGames. I asked the
maker of lbreakout2 how he got his application to draw, took his advice,
and avoided SDL_Flip(). I simply drew directly to the window surface,
writing code to wait for a lock when scratching pixels, and got all that
working. I then kept a large array of SDL_Rects in which I tracked every
blit, and when it came time to flip, I used SDL_UpdateRects() to update
them all. This was a major performance increase: 20-30% CPU during
normal gameplay and 45% during heavy action. All these benchmarks are
under X11.
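
Roughly, that scheme looks like this (a sketch, not lbreakout2’s actual
code; the array size and names are made up):

    #include "SDL.h"

    #define MAX_RECTS 256                  /* illustrative size */

    static SDL_Surface *screen;            /* the window surface */
    static SDL_Rect updates[MAX_RECTS];
    static int n_updates = 0;

    /* Blit straight to the screen surface and remember exactly where. */
    static void blit(SDL_Surface *img, SDL_Rect *srcr, Sint16 x, Sint16 y)
    {
        SDL_Rect dst;
        dst.x = x;
        dst.y = y;
        SDL_BlitSurface(img, srcr, screen, &dst); /* fills dst with the clipped rect */
        if (n_updates < MAX_RECTS)
            updates[n_updates++] = dst;
    }

    /* Once per frame: no SDL_Flip(), just one SDL_UpdateRects() call
       covering exactly the pixels that changed. */
    static void present(void)
    {
        if (n_updates > 0)
            SDL_UpdateRects(screen, n_updates, updates);
        n_updates = 0;
    }

Direct pixel scratching works the same way: wrap the pokes in
SDL_LockSurface()/SDL_UnlockSurface() when SDL_MUSTLOCK(screen) says so,
and remember those rects too.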

THIRD METHOD:
I thought I’d try to be smart and combine both methods, so I could
blit and scratch around in memory, arguably faster, right? The only
difference between this and my original method is that I’d use the large
array of SDL_Rects from my second method. So I’d blit/scratch to the
backbuffer, remember the exact rects I did this in, then call
SDL_UpdateRects() on all of those. This method proved to be slightly
faster than my original method, but nowhere near as fast as the second
method. It was about 20% slower than the second method across the board.
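
Per frame that amounts to something like this (a sketch reusing back,
updates, and n_updates from the fragments above; all drawing now targets
back instead of screen):

    /* Every changed pixel is now copied twice: once into the software
       backbuffer when drawn, and once more here on its way to the
       window, before the same SDL_UpdateRects() as in method two. */
    static void present_via_backbuffer(void)
    {
        int i;
        for (i = 0; i < n_updates; i++) {
            SDL_Rect src = updates[i];  /* SDL_BlitSurface rewrites its dst rect */
            SDL_BlitSurface(back, &src, screen, &updates[i]);
        }
        if (n_updates > 0)
            SDL_UpdateRects(screen, n_updates, updates);
        n_updates = 0;
    }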

Can anybody explain why the third method wasn’t faster? I’m not entirely
sure. I think maybe there’s a huge slowdown in calling SDL_BlitSurface()
a bazillion times, but shouldn’t the original backbuffer method be the
fastest anyway? Don’t most games use a simple double buffer? Why is the
second method, literally just blitting as I need to draw/erase and then
calling SDL_UpdateRects() on a large array, the fastest way to do it?
Does anybody know why the double buffer method wasn’t the fastest?
Thanks for your thoughts.

--
Chris Thielen <@Christopher_Thielen>

Chris Thielen wrote:

I was wondering if anybody could explain this to me. I’m running under
X11, and I don’t know if Win32 would show much difference, but here goes:

I’ve been trying to get my application to draw as fast as possible. I’m
a hobbyist, learn-as-I-go programmer, and I’ve had to rewrite some
pretty crappy code from the past; I’m sure I’ll rewrite it again. But
I’ve been going through various methods of blitting, and I was wondering
if somebody could explain the results to me:

ORIGINAL METHOD:
My original method consisted of a simple double buffer. I had a
software surface the size of the window, and I’d blit and scratch pixels
there. It was divided into tiles (about 30x25 pixels or so), and if any
pixel within a tile was drawn to, that tile became dirty. When all the
drawing was done, my flip function would go through and flip all the
tiles that were dirtied. gprof showed that function to be slow, so I
tried speeding it up, and to reduce blits I wrote some code to detect
blocks of adjacent dirty tiles and such. Okay, that’s my original
method, a simple double buffer. It took about 50% CPU during normal
gameplay and 85% during heavy action.

Not surprising, if you were using SDL_Flip… unless you’ve got actual hardware
double-buffering, it had to blit the ENTIRE surface to the screen every time…
an 800x600x32bpp window, for example, would need 90MB/s of throughput to update
at even 50FPS. :)
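
For the record, the arithmetic behind that figure (assuming 4 bytes per
pixel at 32bpp):

    800 * 600 pixels * 4 bytes = 1,920,000 bytes  (~1.9MB per frame)
    1.9MB/frame * 50 frames/s  = 96,000,000 B/s   (~92MB/s, all copied by the CPU)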

SECOND METHOD:
My second method was taken from the programmer of LGames. I asked the
maker of lbreakout2 how he got his application to draw, took his advice,
and avoided SDL_Flip(). I simply drew directly to the window surface,
writing code to wait for a lock when scratching pixels, and got all that
working. I then kept a large array of SDL_Rects in which I tracked every
blit, and when it came time to flip, I used SDL_UpdateRects() to update
them all. This was a major performance increase: 20-30% CPU during
normal gameplay and 45% during heavy action. All these benchmarks are
under X11.

OK, that’s a pretty good and normal method. Might be able to do some merging of
rectangles and up the efficiency even more, too. :)

THIRD METHOD:
I thought I’d try to be smart and combine both methods, so I could
blit and scratch around in memory, arguably faster, right? The only
difference between this and my original method is that I’d use the large
array of SDL_Rects from my second method. So I’d blit/scratch to the
backbuffer, remember the exact rects I did this in, then call
SDL_UpdateRects() on all of those. This method proved to be slightly
faster than my original method, but nowhere near as fast as the second
method. It was about 20% slower than the second method across the board.

Can anybody explain why the third method wasn’t faster? I’m not entirely
sure. I think maybe there’s a huge slowdown in calling SDL_BlitSurface()
a bazillion times, but shouldn’t the original backbuffer method be the
fastest anyway? Don’t most games use a simple double buffer? Why is the
second method, literally just blitting as I need to draw/erase and then
calling SDL_UpdateRects() on a large array, the fastest way to do it?
Does anybody know why the double buffer method wasn’t the fastest?
Thanks for your thoughts.

The theory of double-buffering, afaik, is you’ve got two backbuffers that get
swapped in their totality every screen update. When done in hardware, this is
extremely fast - the video card simply changes which video surface it’s
displaying - no blitting at all. In software, however, it’s excruciatingly
slow… it has to copy one of the backbuffers in its entirety to the actual
buffer on every SDL_Flip() call.

Not exactly sure what SDL_UpdateRects() does on double-buffered surfaces -
you’re supposed to use SDL_Flip() for that… plain UpdateRects doesn’t work at
all with hardware double-buffering; you just keep writing stuff to your
backbuffer without the changes ever being displayed, 'cause the surfaces don’t
get swapped. And if you do flip every frame, the buffer you’re blitting to is
two frames out of date, not just one.
In short, double-buffering and traditional dirty rects don’t mix.
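
One way to keep the two models straight is to branch on what SDL
actually gave you (a sketch; n and rects stand for whatever dirty-rect
list you keep):

    if (screen->flags & SDL_DOUBLEBUF)
        SDL_Flip(screen);                  /* page flip (hw) or full-window copy (sw) */
    else
        SDL_UpdateRects(screen, n, rects); /* push only the areas that changed */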

So: The sequence of events with method #1:
-Blit dirtied items to current backbuffer
-Blit entire backbuffer to screen

Sequence of events with #2:
-Blit dirtied items to backbuffer
-Blit only changed rectangles to screen

Sequence of events with #3:
-Blit dirtied items to current backbuffer
-Copy stuff to screen somehow (suspect a total-screen blit, possibly several
times)

It’s closer to two buffers, and which one is front and which is back changes
on flip. (Semantics, yes, I know. But it’s important to get something like this
right.) And afaik, software isn’t that much slower, but I wouldn’t want to be
doing 640x480x32bit blitting…

On 16-Jan-2003, Corona688 wrote:

The theory of double-buffering, afaik, is you’ve got two backbuffers that get
swapped in their totality every screen update. When done in hardware, this is
extremely fast - the video card simply changes which video surface it’s
displaying - no blitting at all. In software, however, it’s excruciatingly
slow… it has to copy one of the backbuffers in its entirety to the actual
buffer on every SDL_Flip() call.


Patrick “Diablo-D3” McFarland || unknown at panax.com
"Computer games don’t affect kids; I mean if Pac-Man affected us as kids, we’d
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." – Kristian Wilson, Nintendo, Inc, 1989

Patrick McFarland wrote:

It’s closer to two buffers, and which one is front and which is back changes
on flip. (Semantics, yes, I know. But it’s important to get something like this
right.) And afaik, software isn’t that much slower, but I wouldn’t want to be
doing 640x480x32bit blitting…

Having witnessed the difference between fullscreen double-buffering in Win32
(hardware) and windowed double-buffering in Win32 (software), I humbly beg to
differ. There’s a difference, on the order of maxing out the framerate vs.
getting 35FPS. The reason is simple…

With hardware acceleration, no blitting at all happens during SDL_Flip(); the
video card just changes which memory address it starts scanning from, and SDL
swaps two pointers. That’s pretty well it. Under software, however, the video
card can’t change contexts like that. You have to blit the entire darn surface
over into video memory before it’s displayed. Under 640x480x32, that means
about a meg per frame. :) A couple of I/O operations and a pointer swap beats
a 1MB block transfer in my book.

This isn’t quite accurate either. If rendering is done to main memory but a
hardware-accelerated blit to the graphics memory is available that uses DMA
over AGP and runs in the background, then the CPU will scarcely notice the
massive blits taking place at all. However, getting all those things to come
true at once has proved to be something of a nightmare, as there are many
ways for subsystems to drop the ball due to configuration mistakes,
unimplemented features, or even design errors.

For example, AGP 8X provides 2.1 GB/sec bandwidth, while blitting 70 FPS,
1600x1200, 32 bit color requires only a little over half a GB/sec. So if
software blitting isn’t fast as heck on modern hardware, do blame the system
or the configuration, don’t blame the hardware.
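
Spelled out, with 4 bytes per pixel:

    1600 * 1200 pixels * 4 bytes * 70 frames/s = 537,600,000 B/s  (~0.54GB/sec)
    AGP 8x peak bandwidth                      = ~2.1GB/sec       (roughly 4x headroom)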

Daniel

On Thursday 16 January 2003 20:07, Corona688 wrote:

With hardware acceleration, no blitting at all happens during SDL_Flip();
the video card just changes which memory address it starts scanning from,
and SDL swaps two pointers. That’s pretty well it. Under software,
however, the video card can’t change contexts like that. You have to blit
the entire darn surface over into video memory before it’s displayed.
Under 640x480x32, that means about a meg per frame. :) A couple of I/O
operations and a pointer swap beats a 1MB block transfer in my book.
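
For completeness: you can request that hardware path explicitly and
check whether you actually got it, since SDL silently falls back to a
software surface when the driver can’t page-flip (as in a plain X11
window). A minimal sketch:

    #include <stdio.h>
    #include "SDL.h"

    /* inside main(), after SDL_Init(SDL_INIT_VIDEO): */
    SDL_Surface *screen = SDL_SetVideoMode(640, 480, 32,
                              SDL_FULLSCREEN | SDL_HWSURFACE | SDL_DOUBLEBUF);
    if (screen != NULL && !(screen->flags & SDL_DOUBLEBUF))
        fprintf(stderr, "no hardware page flipping; SDL_Flip() will blit\n");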