Comparing cpu usage

Hey SDLers,
I’m wondering if perhaps I’m doing something wrong. I have a game which
runs great, and I’ve been using SDL for months, but when I look at how
much CPU the game is using on my P3/500 with nothing else running (this
is under X11, no DGA, btw), it’s taking up 60-70% of the CPU. Is this
normal for games that are not hardware accelerated? It’s a simple 2D
space combat game.
Chris Thielen <@Christopher_Thielen>


Many questions:

  • What resolution?
  • What bit depth does the game natively run in?
  • Are there any 2D primitives that are being accelerated by the hardware?

X11 runs like a dog anyway, so I wouldn’t be surprised if that was your
bottleneck, but without knowing the above, it’s hard to make an accurate
assessment of where the problem could be.

–>Neil

Neil Bradley
Synthcom Systems, Inc.
ICQ #29402898

In the land of the blind, the one-eyed man is not king - he’s a prisoner.

On Wed, 2002-12-25 at 17:34, Neil Bradley wrote:

Many questions:

  • What resolution?
  • What bit depth does the game natively run in?
  • Are there any 2D primitives that are being accelerated by the hardware?

800x600, running at 32bpp, and no 2D primitives are being accelerated.
So, does 60% on a P3/500 under X11 sound normal? Would going to 640x480x16
make a large difference?



SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl

Chris Thielen <@Christopher_Thielen>

Does your game use a fixed framerate? If so, how do you wait until
it is time to process the next frame? If you use a busy loop (i.e.
while (SDL_GetTicks() < next_time) { }), then that is what is taking
up most of the CPU time. Even if you use a busy loop in combination
with SDL_Delay(), you’ll see pretty heavy CPU usage, although
the delay helps to reduce it.

Matthijs Hollemans
www.allyoursoftware.com

Yes, I wrote a simple “desired framerate” scheme: if the computer is
able to execute the loop 25 times per second, it sleeps until the second
is up. I implement it by calculating the time to sleep in ms and then
calling SDL_Delay(). When SDL_Delay() returns, I run the game loop again
for the new second and repeat the pattern.

Do I need smarter sleeping code? Or is it probably the 800x600x32 under
X11 taking the CPU?

On Thu, 2002-12-26 at 01:17, Matthijs Hollemans wrote:

Does your game use a fixed framerate? If so, how do you wait until
it is time to process the next frame?


Chris Thielen <@Christopher_Thielen>

  • What resolution?
  • What bit depth does the game natively run in?
  • Are there any 2D primitives that are being accelerated by the hardware?

800x600, running at 32bpp, and no 2D primitives are being accelerated.
So, does 60% on a P3/500 under X11 sound normal? Would going to 640x480x16
make a large difference?

Actually, it sounds fairly reasonable! 60-70% of a 500 MHz CPU is roughly
300-350 MHz worth of work, which means you could run a game at
800x600 @ 32bpp on a Pentium 350. Not bad!

Going to 640x480x16 may make a big difference. Certainly the drop in data
on a per-frame basis (1.9MB vs. 614K) will lessen the amount of data being
blitted to the screen and will have an impact. But it’ll also depend upon
how much hardware acceleration (if any) is being used in 32bpp mode that
you’re not aware of.

So I guess the answer is that, in theory, dropping to 640x480x16 should
yield a decent speed increase, but without lots of specifics, there’s no
way to know.
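Neil’s per-frame figures are simply width * height * bytes per pixel. A small C helper reproduces them; the name frame_bytes is made up for this sketch, and it ignores any row padding in the surface pitch, so treat the result as a lower bound on what a full-screen update has to move.

```c
#include <stddef.h>

/* Bytes in one full frame: width * height * bytes per pixel.
 * Row padding (a surface pitch wider than width * bytes per pixel) is
 * ignored, so this is a lower bound on the data in a full-screen update. */
static size_t frame_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    return width * height * (bits_per_pixel / 8);
}
```

frame_bytes(800, 600, 32) gives 1,920,000 bytes and frame_bytes(640, 480, 16) gives 614,400, so the smaller mode moves about 3.1 times less data per frame. Whether that turns into a proportional speedup depends on how much of the frame time is actually spent moving pixels.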

–>Neil

Yes, I wrote a simple “desired framerate”, such that
if the computer is able to execute the loop 25 times
per second, it sleeps until the second is up. I implement
it by calculating the time to sleep in ms and then calling
SDL_Delay(). When SDL_Delay() returns, I carry out
another loop for the game loop in the new second and
repeat the pattern.

Did I understand correctly that you perform the game loop 25 times
in a row as fast as you can, then you sleep a bit, then you perform
another 25 loops in a row, and so on? Or, as seems more reasonable,
do you try to perform one iteration of the game loop every 1/25th of
a second? For example, this gives you 40ms per iteration, so if the
computer only needs 32ms, you wait another 8 before you start the
next iteration.
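The per-iteration reading (one game-loop iteration every 1/25 s, i.e. a 40 ms budget) comes down to a small piece of arithmetic. Here it is as a plain C function; ms_until_next_frame is a hypothetical name, not an SDL call. In an SDL program the tick values would come from SDL_GetTicks() and the result would be passed to SDL_Delay().

```c
#include <stdint.h>

/* Given the tick count (ms) at the start of the current frame, the current
 * tick count, and the frame budget in ms, return how long to sleep before
 * starting the next frame. Unsigned subtraction keeps working when the
 * millisecond counter wraps around. */
static uint32_t ms_until_next_frame(uint32_t frame_start, uint32_t now,
                                    uint32_t frame_ms)
{
    uint32_t elapsed = now - frame_start;

    return (elapsed >= frame_ms) ? 0 : frame_ms - elapsed;
}
```

With the numbers from the paragraph above, a frame that took 32 ms out of a 40 ms budget yields 8. A frame that overruns its budget yields 0, so the loop starts the next frame immediately instead of sleeping.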

Do I need smarter sleeping code?

Since you are using SDL_Delay(), it wasn’t the timing code that
caused the CPU load. (If you had used a busy loop, the load would be
more like 100%.)

However, you could make your sleeping code a little smarter, to
improve the responsiveness of your game. SDL_Delay() has a
granularity of about 10ms (on most systems), which means that if you
ask it to sleep 17ms, it will probably sleep 20. If your computer is
fast enough and the desired framerate slow enough, then this should
not be a problem.

But you can get more accurate timing using a combination of
SDL_Delay() and a busy loop. If you want your game to sleep 17ms,
you’d SDL_Delay() for 10ms and busy loop the remaining 7 (or to be
safe, SDL_Delay() only 9ms). Be advised that the busy loop will hog
your CPU even more, though…
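The SDL_Delay() plus busy-loop combination can be planned up front by splitting the wait into two numbers. This sketch assumes a fixed sleep granularity (about 10 ms, as described above) and an optional safety margin; split_wait is a hypothetical helper, not part of SDL. The caller would SDL_Delay(sleep_ms) and then spin on SDL_GetTicks() until the target time is reached.

```c
#include <stdint.h>

/* Split a wait of wait_ms into a coarse part to hand to SDL_Delay() and a
 * remainder to busy-wait. granularity_ms is the assumed sleep granularity
 * and must be non-zero; margin_ms is taken off the coarse part so that an
 * oversleeping delay is less likely to overshoot the target. */
static void split_wait(uint32_t wait_ms, uint32_t granularity_ms,
                       uint32_t margin_ms,
                       uint32_t *sleep_ms, uint32_t *spin_ms)
{
    uint32_t coarse = (wait_ms / granularity_ms) * granularity_ms;

    coarse = (coarse > margin_ms) ? coarse - margin_ms : 0;
    *sleep_ms = coarse;
    *spin_ms = wait_ms - coarse;
}
```

For a 17 ms wait with 10 ms granularity this gives 10 ms of sleep and 7 ms of spinning; with a 1 ms margin it gives 9 and 8, the “to be safe” variant described above. Waits shorter than the granularity are spun entirely.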

Matthijs Hollemans
www.allyoursoftware.com

On Wed, 2002-12-25 at 17:52, Chris Thielen wrote:

Hey SDLers,
I’m wondering if perhaps I’m doing something wrong. I have a game which
runs great and I’ve been using SDL for months but when I look at how
much cpu the game is using, on my P3/500 with nothing else running (this
is under X11, no DGA btw), it’s taking up 60-70% of the cpu. Is this
normal for games that are not hardware accelerated? It’s a simple 2D
space combat game.

Depends on what you are doing and the resolution you are doing it in.
For a “simple 2D game” running on a P3/500, 1 or 2 percent CPU usage is
more like what I would expect to see. Using X may add a couple of
percent to that, but not much more. (The 2D operations in X are about as
well optimized as they can get. They represent the work of hundreds, if
not thousands, of programmers working over that code for the last 20
years.)

The only way to find out where you are spending all your time is by
profiling the game.

The things to look for are page flipping when you don’t need it and
unsuspected pixel conversion. Most simple games benefit from using a
"dirty pixels" approach to updating the screen. Swapping the whole
screen is really expensive. And you always want to let SDL pick the
display pixel depth and then convert your sprites to match it. If you
don’t do that, then you will be paying for converting each pixel you blit
from the format it is in to the native format of the screen on each and
every blit.

Those two problems can cause dramatic slowdowns in your game.

	Bob Pendleton

P.S.

If you want to learn how to write clean software rendering code, you
should read the low-level parts of the X server. Not only is it
excellent code, but it is released under the X license, which lets you use
it any way you want to. IMHO you can learn more about graphics
programming by reading and understanding the X server than by spending a
year in computer graphics classes.
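Bob’s “dirty pixels” suggestion is commonly implemented by recording the rectangles that changed each frame and pushing only those to the display. Below is a minimal sketch under stated assumptions: Rect is a stand-in with the same x/y/w/h fields as SDL 1.2’s SDL_Rect so the code compiles without SDL, and the fixed capacity of 256 is arbitrary.

```c
#include <stddef.h>

/* Stand-in for SDL_Rect (same field names) so the sketch builds without SDL. */
typedef struct { int x, y, w, h; } Rect;

#define MAX_DIRTY 256

typedef struct {
    Rect rects[MAX_DIRTY];
    size_t count;
    int overflowed;   /* set when the list fills up: update the whole screen */
} DirtyList;

/* Start a new frame with no dirty regions. */
static void dirty_clear(DirtyList *d)
{
    d->count = 0;
    d->overflowed = 0;
}

/* Record a region that was drawn into or erased this frame. */
static void dirty_add(DirtyList *d, int x, int y, int w, int h)
{
    if (d->count == MAX_DIRTY) {
        d->overflowed = 1;
        return;
    }
    d->rects[d->count].x = x;
    d->rects[d->count].y = y;
    d->rects[d->count].w = w;
    d->rects[d->count].h = h;
    d->count++;
}
```

In an SDL 1.2 program the collected rectangles, clipped to the screen and copied into SDL_Rect, would be passed to SDL_UpdateRects(), or the whole screen refreshed with SDL_UpdateRect(screen, 0, 0, 0, 0) when the list overflowed.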




I ran the gprof profiler on my code and did a sample play, and I found
the areas of the code that use the most time:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 15.58      0.12     0.12      581   206.54   208.81  erase_particles
 12.99      0.22     0.10      950   105.26   105.26  flip
 12.99      0.32     0.10      582   171.82   284.21  update_universe
  7.79      0.38     0.06        5 12000.00 12000.00  ep_render_window
  5.19      0.42     0.04      684    58.48    58.48  rotate
  5.19      0.46     0.04      582    68.73    68.73  update_starfield
  5.19      0.50     0.04      581    68.85    71.12  draw_particles
  3.90      0.53     0.03   820751     0.04     0.04  putpixel
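The per-call columns are derived from the others: gprof divides a function’s self seconds by its call count and reports microseconds. A quick check in C (self_us_per_call is just a helper for this sketch) shows how to read the putpixel row: each call costs only about 0.04 microseconds inside putpixel itself, and the row adds up to 0.03 s only because it was called 820,751 times. gprof’s seconds come from sampling and are rounded, so treat all of these figures as approximate.

```c
/* Reproduce gprof's "self us/call" column: self seconds divided by the
 * number of calls, expressed in microseconds. Returns 0 for functions
 * with no recorded calls. */
static double self_us_per_call(double self_seconds, long calls)
{
    return (calls > 0) ? self_seconds * 1e6 / (double)calls : 0.0;
}
```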

These are mostly functions that call putpixel() to draw a pixel onto a
software surface; erase_particles() and draw_particles(), for example, both
call putpixel(). Would it be best to integrate the putpixel code for all
the various bit depths straight into the functions that use it? Is that a
major slowdown? I remember reading that putpixel() is extremely slow,
even to software surfaces.

On Fri, 2002-12-27 at 11:36, Bob Pendleton wrote:

The only way to find out where you are spending all your time is by
profiling the game.

Chris Thielen <@Christopher_Thielen>

These are mostly functions that call putpixel() to draw a pixel onto a
software surface; erase_particles() and draw_particles(), for example, both
call putpixel(). Would it be best to integrate the putpixel code for all
the various bit depths straight into the functions that use it?

Depending upon the code, absolutely. Eliminate run-time comparisons on a
per-pixel basis. That’s a lot of pixels! You might also try inlining
those functions if your compiler supports it.

Pixel-at-a-time access through putpixel() is PAINFULLY slow. It might be
tempting to do something like this to draw a blue box:

for (y = 0; y < 200; y++)
{
	for (x = 0; x < 200; x++)
	{
		putpixel(surface, x, y, 0x1f);
	}
}

But that’s HORRIBLY slow. Go look at the putpixel() function in the SDL
documentation. It’s big and general purpose, but it ain’t fast.
Instead, get a pointer to the surface and do something like this
(assuming 16bpp):

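/* Lock the surface first if SDL_MUSTLOCK(surface) is true. */
/* Address of the box's first pixel: row starty, column startx. */
pu16SurfacePtr = &((UINT16 *) surface->pixels)[startx + (starty *
	(surface->pitch / sizeof(UINT16)))];
/* Pixels to skip from the end of one box row to the start of the next. */
u32PointerAdvance = (surface->pitch / sizeof(UINT16)) - xcount;

for (y = 0; y < ycount; y++)
{
	for (x = 0; x < xcount; x++)
	{
		*pu16SurfacePtr++ = 0x1e;
	}

	pu16SurfacePtr += u32PointerAdvance;
}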

In short, it’s better to get a pointer to the starting location of the
surface and scribble on it directly. NEVER use putpixel() to write large
amounts of data to a surface.

Of course the above routine is designed for 16bpp. You’ll have to make one
for 32bpp and one for 8bpp if your game supports it.
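Here is the same pointer-walking idea as a self-contained C function on a plain 16-bit buffer, so it can be tried without SDL. The name fill_rect16 and its parameters are made up for this sketch: pitch_bytes stands in for surface->pitch, and the caller is assumed to have clipped the box and, for a real SDL surface, locked it with SDL_LockSurface() when SDL_MUSTLOCK() says so.

```c
#include <stdint.h>

/* Fill an xcount-by-ycount box starting at (startx, starty) with color.
 * The row offset is computed once, and the inner loop is just a store
 * and a pointer increment per pixel. */
static void fill_rect16(uint16_t *pixels, int pitch_bytes,
                        int startx, int starty, int xcount, int ycount,
                        uint16_t color)
{
    int stride = pitch_bytes / (int)sizeof(uint16_t);   /* pixels per row */
    uint16_t *p = pixels + starty * stride + startx;    /* first pixel */
    int advance = stride - xcount;                      /* skip to next row */

    for (int y = 0; y < ycount; y++) {
        for (int x = 0; x < xcount; x++)
            *p++ = color;
        p += advance;
    }
}
```

A 32bpp version is identical with uint32_t in place of uint16_t.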

Is that a major slowdown? I remember reading that putpixel() is extremely
slow, even to software surfaces.

Yes it is, considering that you’re doing a multiply, an add, and an offset
for every single pixel you write, on top of the function call and the
pixel-depth check.

–>Neil

You could also do this to get rid of the inner loop, which would speed things
up quite a bit!

pu16SurfacePtr = &((UINT16 *) surface->pixels)[startx + (starty *
	(surface->pitch / sizeof(UINT16)))];
u32PointerAdvance = surface->pitch / sizeof(UINT16);
int BoxWidth = xcount * sizeof(UINT16);

for (y = 0; y < ycount; y++)
{
	memset(pu16SurfacePtr, 0x1e, BoxWidth);
	pu16SurfacePtr += u32PointerAdvance;
}

Of course, you could also write it in assembly and use CPU registers instead
of variables (much faster as well!) and rep stosb instead of memset, but then
you’d lose the portability of SDL, so it might not be worth it for you.

Hope this helps!
-Atrix

----- Original Message -----
From: nb@synthcom.com (Neil Bradley)
Sent: Sunday, December 29, 2002 11:27 AM
Subject: Re: [SDL] comparing cpu usage

In short, it’s better to get a pointer to the starting location of the
surface and scribble on it directly. NEVER Use putpixel() to write large
amounts of data to a surface.

On Sun, Dec 29, 2002 at 11:51:55AM -0800, Atrix Wolfe wrote:

you could also do this to get rid of the inner loop which would speed things
up quite a bit!

pu16SurfacePtr = &((UINT16 *) surface->pixels)[startx + (starty *
	(surface->pitch / sizeof(UINT16)))];
u32PointerAdvance = surface->pitch / sizeof(UINT16);
int BoxWidth = xcount * sizeof(UINT16);

SDL code should use Uint16 (if it doesn’t use its own name); "UINT16"
isn’t portable.

of course you could also write it in assembly and use CPU registers instead
of variables (much faster as well!) and rep stosb instead of memset but then
you’d lose the portability of SDL so it might not be worth it for you.

memset already does better optimizations than that on any sane
platform, and the optimizer will already put variables in registers
intelligently. (Actually, the precalculation of BoxWidth and
u32PointerAdvance is probably unnecessary, too.)

This is roving off-topic again, so I’ll leave it at this: Don’t make
the naive assumption that ASM is the only way to make code fast.
Optimizers are better than most people think.


Glenn Maynard

you could also do this to get rid of the inner loop which would speed things
up quite a bit!

pu16SurfacePtr = &((UINT16 *) surface->pixels)[startx + (starty *
	(surface->pitch / sizeof(UINT16)))];
u32PointerAdvance = surface->pitch / sizeof(UINT16);
int BoxWidth = xcount * sizeof(UINT16);

for (y = 0; y < ycount; y++)
{
	memset(pu16SurfacePtr, 0x1e, BoxWidth);
	pu16SurfacePtr += u32PointerAdvance;
}

Actually, no, I can’t, because this would set the value of each pixel to
0x1e1e, not 0x001e: memset() writes the same byte value into every byte,
and each 16-bit pixel is two bytes. But you’re right that this would work
great for 8bpp, or if one wanted to set the value to 0xffff (all white).
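Neil’s point is easy to check: memset() stores one byte value into every byte, so each 16-bit pixel receives that byte twice. The sketch below (the function names are invented for the example) shows the 0x1e1e result and a loop that stores whole 16-bit pixels instead.

```c
#include <stdint.h>
#include <string.h>

/* memset() fills bytes, so every 16-bit pixel gets the byte 0x1e in both
 * halves: 0x1e1e, not 0x001e. Returns the first pixel to show this. */
static uint16_t first_pixel_after_memset(uint16_t *row, size_t npixels)
{
    memset(row, 0x1e, npixels * sizeof(uint16_t));
    return row[0];
}

/* A correct 16-bit fill has to store whole pixels. */
static void fill_row16(uint16_t *row, size_t npixels, uint16_t color)
{
    for (size_t i = 0; i < npixels; i++)
        row[i] = color;
}
```

As noted above, memset() remains fine when both bytes of the pixel value are equal: on 8bpp surfaces, or for values such as 0xffff and 0x0000.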

of course you could also write it in assembly and use CPU registers instead
of variables (much faster as well!) and rep stosb instead of memset but then
you’d lose the portability of SDL so it might not be worth it for you.

rep stosd would of course be better. :wink:

–>Neil

pu16SurfacePtr = &((UINT16 *) surface->pixels)[startx + (starty *
	(surface->pitch / sizeof(UINT16)))];
u32PointerAdvance = surface->pitch / sizeof(UINT16);
int BoxWidth = xcount * sizeof(UINT16);

SDL code should use Uint16 (if it doesn’t use its own name); "UINT16"
isn’t portable.

Whatever works. UINT16 is, of course, portable as well, provided it’s
defined. But it doesn’t matter; the point I was getting across is that
it’s a 16-bit pointer.
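If you want to keep the UINT16 spelling from these snippets while staying portable, one option (an assumption of this sketch, not something proposed in the thread) is to alias it to C99’s exact-width uint16_t from <stdint.h>; in SDL code, SDL’s own Uint16 serves the same purpose. The array-size trick below refuses to compile if the type is not exactly two bytes.

```c
#include <stdint.h>

/* Alias the UINT16 name to C99's exact-width unsigned 16-bit type. */
typedef uint16_t UINT16;

/* Compile-time check: the array size becomes -1 (an error) unless
 * UINT16 is exactly 2 bytes. */
typedef char uint16_is_two_bytes[(sizeof(UINT16) == 2) ? 1 : -1];
```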

of variables (much faster as well!) and rep stosb instead of memset but then
you’d lose the portability of SDL so it might not be worth it for you.

memset already does better optimizations than that on any sane
platform, and the optimizer will already put variables in registers
intelligently. (Actually, the precalculation of BoxWidth and
u32PointerAdvance is probably unnecessary, too.)

Well, you need to know how wide the box is. :wink: That’s why BoxWidth needs
to be set. There’s no harm in calculating u32PointerAdvance in advance to
help the compiler make much better decisions.

This is roving off-topic again, so I’ll leave it at this: Don’t make
the naive assumption that ASM is the only way to make code fast.

That statement is true.

Optimizers are better than most people think.

For machines with many registers, I’d absolutely agree; however, my personal
experience with many, many graphics apps on the PC yields completely
different results. Even with Watcom, Borland, MSVC, and gcc (and pgcc) the
results aren’t anywhere near as good as you think they are.

The x86 is register-starved. I’ve seen boosts of >30% when coding tight
loops in assembly after all C optimizations had been exhausted, and in
some cases it was more than double the speed. The compiler can’t predict
how many times a given loop is going to execute when it’s passed
automatic variables. Only the programmer knows that, and spending a bit of
time up front to put things in registers so they aren’t recomputed,
regardless of how good the compiler’s common subexpression elimination is,
will still yield the same or better results on any platform.

–>Neil

On Sat, 2002-12-28 at 14:21, Chris Thielen wrote:

These are mostly functions that call putpixel() to draw a pixel onto a
software surface; erase_particles() and draw_particles(), for example, both
call putpixel(). Would it be best to integrate the putpixel code for all
the various bit depths straight into the functions that use it? Is that a
major slowdown? I remember reading that putpixel() is extremely slow,
even to software surfaces.

Yes, getting rid of calls to putpixel() is a good idea. If you look at
highly optimized graphics code, like that in SDL for example, you will
find that first the code checks the pixel depth and then goes into code
optimized for that depth. The depth check is always done outside of the
inner loop of the graphics code.

But, having said that, the time-honored approach is to look at the
routine where you are spending the most time and optimize that code.
Then do the same thing for the next most time-consuming routine, and so
on. Spend your programming time on the code that needs it the most. You
might find that there are only a couple of places in your code where
getting rid of putpixel() will save you 50% or more of the total time.
In that case it isn’t worth it to rewrite all the other routines. And
you only need to optimize until the program is as fast as you need it to
be. If you want it to run on a 100 MHz Pentium you will need to do more
work than if you want it to run on a 2 GHz P4. If your target machine is a
500 MHz P3 then you don’t need to do anything.

Programmers have a tendency to want to optimize beyond what is needed to
accomplish their goals. This wastes a LOT of time. The thing to do is
pick a target machine, optimize until the code runs well on the
target machine, and then quit. Have a written statement of the target
machine and the desired performance, and when you meet that goal, stop
developing. You are done.

Of course, your goal may not be to write a game, but rather to learn
how to optimize graphics code. In that case keep going until you are
sick of it and then quit. :slight_smile:

Always know what your goals are, and how to tell when you have met them.
Otherwise you will waste a lot of time doing things that don’t get you
anywhere near your real goals.

	Bob Pendleton
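Bob’s pattern of checking the pixel depth once and then running a loop specialised for that depth can be sketched like this. The function name and parameters are hypothetical: pitch_bytes stands in for surface->pitch, only 2- and 4-byte pixels are handled (the same choice Chris makes in his draw_starfield()), and the rectangle is assumed to be clipped already.

```c
#include <stdint.h>

/* One switch per call selects a tight loop for the pixel size, instead of
 * a putpixel()-style depth check for every pixel. Returns 0 on success and
 * -1 for an unsupported depth. */
static int fill_rect_any(void *pixels, int pitch_bytes, int bytes_per_pixel,
                         int x0, int y0, int w, int h, uint32_t color)
{
    switch (bytes_per_pixel) {
    case 2:
        for (int y = y0; y < y0 + h; y++) {
            uint16_t *row = (uint16_t *)((uint8_t *)pixels + y * pitch_bytes);
            for (int x = x0; x < x0 + w; x++)
                row[x] = (uint16_t)color;
        }
        return 0;
    case 4:
        for (int y = y0; y < y0 + h; y++) {
            uint32_t *row = (uint32_t *)((uint8_t *)pixels + y * pitch_bytes);
            for (int x = x0; x < x0 + w; x++)
                row[x] = color;
        }
        return 0;
    default:
        return -1;
    }
}
```

The depth test runs once per rectangle, so its cost no longer scales with the number of pixels written.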



Alright, I think I’ve sped up pixel drawing here, supporting 16 and 32
bpp:

static void draw_starfield(void) {
	int i;

	switch (screen_bpp) {
	case 32:
	{
		int pitch_adjust = backbuffer->pitch / sizeof(Uint32);
		for (i = 0; i < NUM_STAR; i++) {
			((Uint32 *) backbuffer->pixels)[(int)starfield[i].x +
				((int)starfield[i].y * pitch_adjust)] = starfield[i].color;
			dirty_pixel(starfield[i].x, starfield[i].y);
		}
		break;
	}
	case 16:
	{
		int pitch_adjust = backbuffer->pitch / sizeof(Uint16);
		for (i = 0; i < NUM_STAR; i++) {
			((Uint16 *) backbuffer->pixels)[(int)starfield[i].x +
				((int)starfield[i].y * pitch_adjust)] = starfield[i].color;
			dirty_pixel(starfield[i].x, starfield[i].y);
		}
		break;
	}
	default:
	{
		break;
	}
	}
}

Thanks again for the help!

On Sun, 2002-12-29 at 11:27, Neil Bradley wrote:

Instead, get a pointer to the surface, and do something like this
(assuming 16bpp):

pu16SurfacePtr = &((UINT16 *) surface->pixels)[startx + (starty *
	(surface->pitch / sizeof(UINT16)))];
u32PointerAdvance = (surface->pitch / sizeof(UINT16)) - xcount;

for (y = 0; y < ycount; y++)
{
	for (x = 0; x < xcount; x++)
	{
		*pu16SurfacePtr++ = 0x1e;
	}

	pu16SurfacePtr += u32PointerAdvance;
}


Chris Thielen <@Christopher_Thielen>

On Mon, 2002-12-30 at 15:40, Bob Pendleton wrote:

If you want it to run on a 100 MHz Pentium you will need to do more
work than if you want it to run on a 2 GHz P4. If your target machine is a
500 MHz P3 then you don’t need to do anything.

Programmers have a tendency to want to optimize beyond what is needed to
accomplish their goals. This wastes a LOT of time. The thing to do is
pick a target machine, optimize until the code runs well on the
target machine, and then quit. Have a written statement of the target
machine and the desired performance, and when you meet that goal, stop
developing. You are done.

Of course, your goal may not be to write a game, but rather to learn
how to optimize graphics code. In that case keep going until you are
sick of it and then quit. :slight_smile:

Always know what your goals are, and how to tell when you have met them.
Otherwise you will waste a lot of time doing things that don’t get you
anywhere near your real goals.

  Bob Pendleton

Sounds like good advice to me, and I’ll take it. I did do some
optimizing: I got rid of putpixel(), limited the support to 16 and 32
bpp (I think most people can handle that), and optimized a function that
was taking the large majority of the time, a double-buffering flip
technique. I optimized it to flip even less, and everything seems nice
now: 30-40% CPU usage on my P3/500 with the same framerate. Thanks guys!


Chris Thielen <@Christopher_Thielen>

On Tuesday 31 December 2002 10:51, Chris Thielen wrote:

for (i = 0; i < NUM_STAR; i++) {
	((Uint16 *) backbuffer->pixels)[(int)starfield[i].x +
		((int)starfield[i].y * pitch_adjust)] = starfield[i].color;
	dirty_pixel(starfield[i].x, starfield[i].y);
}

One more thing you should change, too.

Get rid of the dirty_pixel() function call, i.e. put the function’s code
directly at the calling place. A function call first has to save
registers to memory, then execute the function, and then restore the
register contents from memory. And because it’s done inside a loop, it
will really be worth it.
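Instead of pasting the body of dirty_pixel() into every loop by hand, a similar effect can usually be had by making it a small static inline function in the same file, which allows the compiler to substitute the body at each call site. (inline is only a hint in C99, and optimizing compilers often inline small static functions on their own.) The thread does not show dirty_pixel()’s body, so the tile-flag scheme below, its names, and the 800x600 screen size are assumptions for the sketch.

```c
#include <stdint.h>

#define SCREEN_W 800
#define SCREEN_H 600
#define TILE_SHIFT 4                                          /* 16x16 tiles */
#define TILE_SIZE (1 << TILE_SHIFT)
#define TILES_X ((SCREEN_W + TILE_SIZE - 1) >> TILE_SHIFT)    /* 50 */
#define TILES_Y ((SCREEN_H + TILE_SIZE - 1) >> TILE_SHIFT)    /* 38 */

/* One flag per tile; a non-zero flag means the tile must be redrawn. */
static uint8_t dirty_tiles[TILES_Y][TILES_X];

/* Mark the tile containing (x, y); the caller keeps x and y on screen.
 * Being static inline, this can be expanded inside the star-drawing loop
 * instead of being called once per star. */
static inline void dirty_pixel_inline(int x, int y)
{
    dirty_tiles[y >> TILE_SHIFT][x >> TILE_SHIFT] = 1;
}
```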

Oh, oops. You’re right. Thanks!

On Tue, 2002-12-31 at 04:47, Sami Näätänen wrote:


One more thing you should change, too.

Get rid of the dirty_pixel() function call, i.e. put the function’s code
directly at the calling place. A function call first has to save
registers to memory, then execute the function, and then restore the
register contents from memory. And because it’s done inside a loop, it
will really be worth it.



Chris Thielen <@Christopher_Thielen>