Request

Could you please look at the program at gj.pointblue.com.pl/gr.tar.gz and tell me
how it can be optimized for rendering speed (it’s just a test version, so
… ).

Grzegorz Jaskiewicz
C/C++/PERL/PHP/SQL Programmer

Wednesday 05 June 2002 15:33:

Can you please look at gj.pointblue.com.pl/gr.tar.gz program and tell me,
how can it be optimized for rendering speed (it’s just test version, so
… ).
Your program runs at about 50fps for me, which I think is a reasonable
frame rate (PIII 933, GF256). In your program the blits and updates take
19.7ms on average, of which 12.2ms is spent in the final fullscreen redraw
(SDL_UpdateRect around line 120). You can’t really avoid this last
redraw because your whole screen really does change, so at most you can gain
some of the roughly 7ms spent in the blitting. Looking a bit further, you will
see that blitting the background tiles takes most of those 7ms. You
might gain some speed there, but it won’t be much.
The problem is that with a SW surface as your display, SDL_UpdateRect
does the slow system-memory -> VRAM blit, over which you don’t have any
control.
Regards,
Jacek
--
+-------------------------------------+
|from: J.C.Wojdel |
| J.C.Wojdel at cs.tudelft.nl |
+-------------------------------------+

Oops, in my last post I said that it runs at about 50fps… wrong; I just
trusted your program. Unfortunately its measurement is wrong: clock() gives
you the amount of CPU time spent in your process. Use SDL_GetTicks instead and
you’ll see that the frame rate is 35 (or 30% lower than whatever rate you
had). What your program measures is the fps it could potentially reach if the
CPU did nothing else. The difference of 15fps suggests that you lose as much
as 30% of the time to the system. Looking at your program, I would say that
you lose it switching contexts between threads, but I might be wrong.
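
To make the clock() point concrete, here is a minimal sketch that measures the same interval with both kinds of clock. It assumes a POSIX system and uses clock_gettime(CLOCK_MONOTONIC) as a stand-in for SDL_GetTicks so that it builds without SDL; the helper names (cpu_ms, wall_ms, measure_sleep, fps) are made up for this illustration and are not from the program under discussion.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* CPU time consumed by this process so far, in ms (what clock() reports). */
double cpu_ms(void) {
    return 1000.0 * (double)clock() / CLOCKS_PER_SEC;
}

/* Real (wall-clock) time in ms from a monotonic clock; this plays the
   role of SDL_GetTicks in this sketch. */
double wall_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return 1000.0 * (double)ts.tv_sec + (double)ts.tv_nsec / 1e6;
}

/* Sleep for sleep_ms and report how much CPU time and wall time passed. */
void measure_sleep(long sleep_ms, double *cpu_used, double *wall_used) {
    assert(sleep_ms >= 0 && sleep_ms < 1000);
    struct timespec req = { 0, sleep_ms * 1000000L };
    double c0 = cpu_ms();
    double w0 = wall_ms();
    nanosleep(&req, NULL);          /* the process is idle here */
    *cpu_used = cpu_ms() - c0;
    *wall_used = wall_ms() - w0;
}

/* Frames per second for a frame count and an elapsed time in ms. */
double fps(long frames, double elapsed_ms) {
    return elapsed_ms > 0.0 ? 1000.0 * (double)frames / elapsed_ms : 0.0;
}
```

Calling measure_sleep(100, &c, &w) should report a wall time of about 100ms and a CPU time close to zero, so an fps figure computed from the CPU time comes out far too high. Any time the process spends waiting (for the X server, for other threads, in a sleep) is invisible to clock() in the same way.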
Regards,
Jacek
--
+-------------------------------------+
|from: J.C.Wojdel |
| J.C.Wojdel at cs.tudelft.nl |
+-------------------------------------+

At 12:20 PM 6/6/02 +0200, you wrote:

Your stuff runs for me with about 50fps, which is I think reasonable
framerate (PIII 933, GF256). In your program the blits and updates take up on
average 19.7ms, out of which 12.2ms is spent in the last fullscreen redraw
The problem is that having a SW surface as your display the SDL_UpdateRect
does the slow system-memory -> VRAM blit over which you don’t have any

I cannot believe that an AGP card needs 12ms to do a memcpy to the gfxcard,
I fear the problem is that the SDL_UpdateRect has to do a conversion from
the bpp format used in the SW surface and the REAL bpp format of the screen.

You CAN gain a lot by allocating a SW surface with the same format as the
HW one if this is the case (if you are windowed, it’s enough to set bpp = 0
in SDL_SetVideoMode()).

Bye,
Gabry (gabrielegreco at tin.it)

Yes I know that, but it still takes a lot of time. In the program in
question Grzegorz allocates the surfaces properly and then updates almost
full screen - that really takes 12ms. Just to check whether it isn’t related
to some other things that he does in the code I wrote the following:

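#include <stdio.h>
#include <stdlib.h>
#include "SDL.h"

int main(int argc, char *argv[]) {
    SDL_Surface *s;
    Uint32 stime, etime;
    int i;

    if (SDL_Init(SDL_INIT_VIDEO) < 0) {
        fprintf(stderr, "SDL_Init failed: %s\n", SDL_GetError());
        return 1;
    }
    atexit(SDL_Quit);

    /* bpp = 0 asks SDL to use the current display depth */
    s = SDL_SetVideoMode(800, 600, 0, SDL_SWSURFACE);
    if (s == NULL) {
        fprintf(stderr, "SDL_SetVideoMode failed: %s\n", SDL_GetError());
        return 1;
    }

    /* Time 1000 full-screen updates using wall-clock ticks (ms) */
    stime = SDL_GetTicks();
    for (i = 0; i < 1000; i++)
        SDL_UpdateRect(s, 0, 0, s->w, s->h);
    etime = SDL_GetTicks();

    printf("Average update time: %.1fms\n", (etime - stime) / 1000.);

    return 0;
}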

The output:
Average update time: 14.2ms

Do you really think it should be faster? If so, what’s wrong?
More info on my system:
PIII 933
nVidia GeForce256 SDR
Linux kernel 2.4.18
nVidia drivers 1.0-2880
XFree86 4.1
1280x1024 24bpp

+-------------------------------------+
|from: J.C.Wojdel |
| J.C.Wojdel at cs.tudelft.nl |
+-------------------------------------+

At 03:39 PM 6/6/02 +0200, you wrote:

Average update time: 14.2ms
Do you really think it should be faster ? If so, what’s wrong ?
PIII 933
nVidia GeForce256 SDR
Linux kernel 2.4.18
nVidia drivers 1.0-2880
XFree86 4,1
1280x1024 24bpp

I’ve run some tests on my system here at work, and I get even worse
results:

Average update time: 25ms

Then again, my system is older than yours:

K6-3 450
Matrox G400 (AGP 1x for MB limitations)
Linux 2.4.10
XFree86 4.2
1280x1024 24bpp (32 real)

Anyway, a quick calculation shows that this amount of time is not that strange…

800x600x4 = 1920000 bytes per frame

1920000 x 50 = 96000000 bytes per second

So you need 96MB/sec of bandwidth to deliver a frame every 20ms (50fps). That’s
a lot of data, since you also have to read it from memory, and the cache
doesn’t help with this kind of operation…

I suggest you try running the game in 16-bit; you could get
almost double the frame rate, and the quality will be almost the same.
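
The arithmetic above can be written as a small helper, which makes it easy to compare pixel depths. This is only a sketch of the calculation in this message; the function name bytes_per_second is made up for the illustration.

```c
#include <assert.h>

/* Bytes that must reach the display each second when the whole
   w x h screen is updated fps times per second. */
unsigned long bytes_per_second(unsigned w, unsigned h,
                               unsigned bytes_per_pixel, unsigned fps) {
    assert(bytes_per_pixel >= 1 && bytes_per_pixel <= 4);
    return (unsigned long)w * h * bytes_per_pixel * fps;
}
```

For 800x600 at 50fps this gives 96000000 bytes per second with 4 bytes per pixel and 48000000 with 2 bytes per pixel, which is where the "almost double" estimate for 16-bit comes from.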

Bye,
Gabry (gabrielegreco at tin.it)

The original program only uses 800x600, so I guess that isn’t really a problem.
After I removed the sleep and thread stuff and put paint() directly into
the main loop, it ran twice as fast as before. I got about 46fps on my laptop,
a PIII 500 with an 8M S3 Savage IX card.

A whole-screen update can certainly be avoided as long as you don’t need
the background to scroll at full speed. 50fps looks like too much
to me if it is just for the background.

So let’s say your background only needs to scroll every 2 frames: you
then save half of the full-screen update time, which can be spent
on things other than drawing. Of course, you need to calculate dirty
rectangles and use SDL_UpdateRects() for the sprites on the frames where
your background does not change.
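
A minimal sketch of the dirty-rectangle bookkeeping follows. It is plain C so that it builds without SDL: Rect is a stand-in with the same fields as SDL_Rect, and the function names and the 48x48 sprite size used in the note below are assumptions for the illustration, not taken from the original program.

```c
#include <assert.h>

/* Stand-in for SDL_Rect (same field names, plain int fields). */
typedef struct { int x, y, w, h; } Rect;

/* Smallest rectangle that covers both a and b. */
Rect rect_union(Rect a, Rect b) {
    int x1 = a.x < b.x ? a.x : b.x;
    int y1 = a.y < b.y ? a.y : b.y;
    int x2 = (a.x + a.w > b.x + b.w) ? a.x + a.w : b.x + b.w;
    int y2 = (a.y + a.h > b.y + b.h) ? a.y + a.h : b.y + b.h;
    Rect r = { x1, y1, x2 - x1, y2 - y1 };
    return r;
}

/* One dirty rectangle per sprite: the area it left plus the area it
   now occupies. Returns the number of rectangles written to out. */
int collect_dirty(const Rect *old_pos, const Rect *new_pos, int n, Rect *out) {
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        out[i] = rect_union(old_pos[i], new_pos[i]);
    return n;
}

/* Total pixels covered by the rectangles (overlaps are counted twice). */
long dirty_area(const Rect *rects, int n) {
    long total = 0;
    for (int i = 0; i < n; i++)
        total += (long)rects[i].w * rects[i].h;
    return total;
}
```

With SDL the resulting array would be handed to SDL_UpdateRects() on the frames where the background stays put. As a rough feel for the saving: a 48x48 sprite that moves down 2 pixels dirties 48x50 = 2400 pixels, compared with 480000 pixels for a full 800x600 update.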

I’ll test it again using my own GUI code and see whether I can get a better result :0

Regards,
.paul.

Wednesday 05 June 2002 15:33:
The problem is that having a SW surface as your display the
SDL_UpdateRect
does the slow system-memory -> VRAM blit over which you don’t have any
control.

Remember that this is a Linux-only game; I thought there were no hardware
surfaces under X11 for a normal user. And yes, I am using bpp==0 to be sure
the system uses the one right bpp.

I didn’t know that clock() only measures process time (I guess this is
because I used it under watcom/dos4gw/m$dos, where process
time = (almost) the whole time).

I have a much older system: an SMP 2x200MHz PPro with 64 MB of RAM and a
Voodoo III on PCI. My test program gives me around 20fps (without your change).

About the thread: I am using it to make rendering + event
management possible (in the future, of course). My tests show that it
doesn’t make any difference.

I still want to check whether drawing the whole surface from pieces in two
loops is the best solution:

for(x){
    for(y){
        paintlittle(x*48, y*48, 48, 48);
    }
}

Maybe swapping the x loop with the y loop will speed it up?
Maybe I should prerender the whole background and blit it once (which IMO
kills the cache).

If somebody has any other ideas, I will be glad to hear them. I was
trying to start a discussion about such simple tasks, so if somebody has
any ideas on how to speed up this very simple program - go ahead ;)
I am open (but not in source).

GJ.

On Thu, 2002-06-06 at 10:20, Jacek Wojdel wrote:

At 12:20 PM 6/6/02 +0200, you wrote:

Your stuff runs for me with about 50fps, which is I think reasonable
framerate (PIII 933, GF256). In your program the blits and updates take up on
average 19.7ms, out of which 12.2ms is spent in the last fullscreen redraw
The problem is that having a SW surface as your display the SDL_UpdateRect
does the slow system-memory -> VRAM blit over which you don’t have any

I cannot believe that an AGP card needs 12ms to do a memcpy to the gfxcard,

You don’t? Try it. You don’t even need a particularly high resolution to get 12 ms.

That “memcpy” must be a busmaster DMA transfer (ie done by the video card), or you’ll get nowhere near the theoretical transfer rate of the AGP bus.

I fear the problem is that the SDL_UpdateRect has to do a conversion from
the bpp format used in the SW surface and the REAL bpp format of the screen.

Yeah, that can make things even worse, whether or not there is DMA.

//David

.---------------------------------------
| David Olofson
| Programmer

david.olofson at reologica.se
Address:
REOLOGICA Instruments AB
Scheelevägen 30
223 63 LUND
Sweden
---------------------------------------
Phone: 046-12 77 60
Fax: 046-12 50 57
Mobil:
E-mail: david.olofson at reologica.se
WWW: http://www.reologica.se

`-----> We Make Rheology Real

OK, I rewrote the same program as follows:

  1. SDL_Flip() with a hardware surface, or dirty rectangle detection
    with a software surface;

  2. no SDL_Delay() in the code; the main loop just loops as fast as it can,
    while all scrolling/animation is calculated based on the time passed
    instead of the frame count;

  3. the background scrolls 1 pixel at 25Hz, and the sprite moves down 2 pixels
    at 25Hz too. The sprite animates to the next frame at 10Hz;

  4. sprites appear randomly from the top, with about 17 sprites on average
    on screen at the same time;

  5. the FPS is counted every 2 seconds.
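
Points 2 and 3 above can be sketched as follows: every position is derived from the elapsed time, so the result is the same whether the loop runs at 20fps or 200fps. This is not the actual test program; the names are made up, and wrapping the background offset at the tile height is an assumption of this sketch.

```c
#include <assert.h>

/* Whole steps completed after elapsed_ms at rate_hz steps per second. */
unsigned long steps_elapsed(unsigned long elapsed_ms, unsigned rate_hz) {
    return elapsed_ms * rate_hz / 1000UL;
}

typedef struct {
    int background_offset;  /* scroll offset in pixels, wrapped at tile height */
    int sprite_y;           /* pixels the sprite has moved down since it appeared */
    int sprite_frame;       /* current animation frame index */
} SceneState;

/* Scene state after elapsed_ms, using the rates from the list above:
   background 1 pixel at 25Hz, sprite 2 pixels at 25Hz, animation at 10Hz. */
SceneState scene_at(unsigned long elapsed_ms, int tile_h, int frame_count) {
    assert(tile_h > 0 && frame_count > 0);
    SceneState s;
    s.background_offset =
        (int)(steps_elapsed(elapsed_ms, 25) % (unsigned long)tile_h);
    s.sprite_y = (int)(2 * steps_elapsed(elapsed_ms, 25));
    s.sprite_frame =
        (int)(steps_elapsed(elapsed_ms, 10) % (unsigned long)frame_count);
    return s;
}
```

In the main loop, elapsed_ms would come from SDL_GetTicks() minus the tick value recorded at start-up (or at the moment the sprite appeared), and the loop redraws only when the computed state differs from the previous one.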

I’ve tested both under Linux and Windows, fullscreen and window mode. The
result looks interesting:

On Linux (PIII 450, nVidia TNT2 16M) XFree 4.2 screen resolution 1152x864

800x600x16 window mode FPS from 50 to 60
800x600x16 fullscreen mode FPS from 50 to 60
800x600x32 window mode FPS at 20
800x600x32 fullscreen mode FPS at 20

On Windows (PIII 450, ATI 3D Rage IIc 4M) screen resolution 1024x768,
monitor at 75Hz.

800x600x16 window mode FPS anywhere between 150 to 250
800x600x16 fullscreen mode FPS at 37.5
800x600x32 window mode FPS at 27
800x600x32 fullscreen mode FPS at 25

The conclusions are:

  1. SDL on Linux does not support hardware surfaces in fullscreen;

  2. 16-bit color is about twice as fast as 32-bit with a software surface,
    but only slightly faster with a hardware surface.

It is strange that on Windows the test performs worse in fullscreen mode
than in window mode. Normally, using SDL_Flip() in fullscreen mode, the best
you can get is the screen’s vertical refresh rate. Am I correct to say that
if it can’t reach 75 Hz, it falls to half of that?

Regards,
.paul.

  1. SDL on Linux does not support hardware surface in fullscreen;

That’s documented; if you want them, you should use SDL_VIDEODRIVER=dga and
run your app as root.

  2. 16-bit color is about twice as fast as 32-bit under software surface,
    but only slightly faster under hardware surface.

That’s quite obvious too: if you blit by hand (software surface), 32-bit means
copying twice the data; HW blits, on the other hand, are always done in the
internal format of the graphics card, which is usually 32-bit on modern cards.

It is strange that on Windows the test performs poor under fullscreen mode
than window mode. Normally using SDL_Flip() under fullscreen mode you can
get best at screen’s vertical refresh rate. Am I correct to say that
if it couldn’t get up to 75 Hz, it falls at half of it?

Because you wait for the retrace with SDL_Flip() on Windows!

So if you finish the blit when the retrace beam is just past the top of
the image, you have to wait almost a complete frame (10ms at 100Hz) before
SDL_Flip() returns!
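
The effect on the frame rate can be sketched with a little arithmetic. The function below assumes that every flip waits for the next retrace and that nothing overlaps with the wait (no triple buffering); the name vsync_fps is made up for the illustration.

```c
#include <assert.h>

/* Frame rate when each frame takes render_us microseconds to draw and
   the flip then waits for the next vertical retrace at refresh_hz. */
double vsync_fps(unsigned long render_us, unsigned refresh_hz) {
    assert(refresh_hz > 0);
    /* Refresh periods used per frame, rounded up:
       ceil(render_us * refresh_hz / 1000000). */
    unsigned long periods = (render_us * refresh_hz + 999999UL) / 1000000UL;
    if (periods < 1)
        periods = 1;    /* a frame can never take less than one period */
    return (double)refresh_hz / (double)periods;
}
```

Under these assumptions a 75Hz monitor allows only 75, 37.5, 25, ... fps: a frame that takes 14ms to draw just misses the 13.3ms period and drops to 37.5fps, and one that takes 30ms drops to 25fps. That matches the 37.5 and 25 figures reported for fullscreen mode on Windows.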

Bye,
Gabry (gabrielegreco at tin.it)