Slooow

I’ve made a small program that blits a 720x576 (24 bpp) image into an 800x600
SDL window. The problem is that it’s painfully slow. It takes over 200 ms to
blit one image, much slower than doing it with gdk_draw_rgb_image (~110 ms)
from gtk+. Is it supposed to be like this? I’m looking for a way to do it a
lot faster. This is for a real-time video application, and it requires much
faster video than this. The code can be found below. Oh, the system is a
pretty ordinary Pentium II running at 400 MHz. The video card is an ATI
3D Rage Pro, the X server is XFree86 3.3.6, and the current depth is 24 bpp.

Thanks in advance
Mattias Blomqvist

if ( SDL_Init(SDL_INIT_VIDEO) < 0 ) {
    return -1;
}

screen = SDL_SetVideoMode(800, 600, 24, SDL_SWSURFACE);
if ( screen == NULL ) {
    ComplainAndExit();
}
fread(ind, 1, 120000 + PAL*24000, f);
screen2 = SDL_CreateRGBSurface(SDL_HWSURFACE, FRAME_WIDTH, FRAME_HEIGHT,
                               24, 0xff, 0xff00, 0xff0000, 0);
dstrect.x = (screen->w - screen2->w) / 2;
dstrect.y = (screen->h - screen2->h) / 2;
dstrect.w = screen2->w;
dstrect.h = screen2->h;
screen2->pixels = frame;

while (!done && (getnextframe(frame) != 0)) {
    if ( SDL_BlitSurface(screen2, NULL, screen, &dstrect) < 0 ) {
        SDL_FreeSurface(screen2);
        return -1;
    }
    SDL_UpdateRects(screen, 1, &dstrect);
    while ( SDL_PollEvent(&event) ) {
        switch (event.type) {
            case SDL_KEYDOWN:
                if ( event.key.keysym.sym == SDLK_ESCAPE ) {
                    done = 1;
                }
                break;
            case SDL_QUIT:
                done = 1;
                break;
            default:
                break;
        }
    }
}
SDL_Quit();

Mattias.Blomqvist at nokia.com wrote:

I’ve made a small program that blits a 720x576 (24 bpp) image into an 800x600
SDL window. The problem is that it’s painfully slow. It takes over 200 ms to
blit one image, much slower than doing it with gdk_draw_rgb_image (~110 ms)
from gtk+. [...]

[code snipped]

I don’t know too much about SDL, but I do know that 24 bpp modes tend to be
slower than 16 or 32… You could try switching it and see what it does… (=

    David

I’ve made a small program that blits a 720x576 (24 bpp) image into an 800x600
SDL window. The problem is that it’s painfully slow. It takes over 200 ms to
blit one image, much slower than doing it with gdk_draw_rgb_image (~110 ms)
from gtk+.

I doubt you will get it fast enough with MIT-SHM, if that is what you are
trying to do. Try x11perf -shmput500 to get a feeling for what is possible.
Make sure you are not trying to blit a 24bpp image onto a 32bpp
surface; that conversion will take some time. I don’t think XFree 3.x
supports 24bpp visuals (depth 24 will give you 32bpp).

Remember that you are shoving 1.6 megs from memory, through the CPU and to
the frame buffer each time. If 16bpp is possible, blits will be around
twice as fast.

– Mattias

I doubt you will get it fast enough with MIT-SHM, if that is what you are
trying to do. Try x11perf -shmput500 to get a feeling for what is possible.
Make sure you are not trying to blit a 24bpp image onto a 32bpp surface;
that conversion will take some time. I don’t think XFree 3.x supports 24bpp
visuals (depth 24 will give you 32bpp).

Remember that you are shoving 1.6 megs from memory, through the CPU and to
the frame buffer each time. If 16bpp is possible, blits will be around
twice as fast.

I tried using a 32 bpp image and it is faster now.
SDL_BlitSurface takes 50 ms and the SDL_UpdateRects takes
another 30 ms. Still not fast enough though.
x11perf -shmput500 says it takes 22.9 ms to put a 500x500 image.

Mattias Blomqvist

I tried using a 32 bpp image and it is faster now.
SDL_BlitSurface takes 50 ms and the SDL_UpdateRects takes
another 30 ms. Still not fast enough though.
x11perf -shmput500 says it takes 22.9 ms to put a 500x500 image.

That seems very wrong. What depth is your display set to?
SDL_BlitSurface() shouldn’t take 50 ms for even a 1024x768 blit, since
it’s using an inline-assembly version of memcpy(), unless it’s doing a
complicated unoptimized conversion blit (which almost never happens).

What is the output of the testvidinfo program in the test subdirectory?

What’s the output of cat /proc/cpuinfo?

See ya,
-Sam Lantinga, Lead Programmer, Loki Entertainment Software

I tried using a 32 bpp image and it is faster now.
SDL_BlitSurface takes 50 ms and the SDL_UpdateRects takes
another 30 ms. Still not fast enough though.
x11perf -shmput500 says it takes 22.9 ms to put a 500x500 image.

This would translate to about (720*576)/(500*500) * 22.9 ≈ 38 ms in your
case, so with 30 ms, you are lucky. What do you really need SDL_BlitSurface
for? Can’t you blast your bits directly into the screen buffer, run
SDL_UpdateRects and be done with it? (Be sure to check the RGB masks first,
though.)

– Mattias

This would translate to about (720*576)/(500*500) * 22.9 ≈ 38 ms in your
case, so with 30 ms, you are lucky. What do you really need SDL_BlitSurface
for? Can’t you blast your bits directly into the screen buffer, run
SDL_UpdateRects and be done with it? (Be sure to check the RGB masks first,
though.)

How would a person “blast the bits”? What is the difference between
that and SDL_BlitSurface?

Dave

I tried using a 32 bpp image and it is faster now.
SDL_BlitSurface takes 50 ms and the SDL_UpdateRects takes
another 30 ms. Still not fast enough though.
x11perf -shmput500 says it takes 22.9 ms to put a 500x500 image.

This would translate to about (720*576)/(500*500) * 22.9 ≈ 38 ms in your
case, so with 30 ms, you are lucky. What do you really need SDL_BlitSurface
for? Can’t you blast your bits directly into the screen buffer, run
SDL_UpdateRects and be done with it? (Be sure to check the RGB masks first,
though.)

Hmmm… Never thought of that. I’ll try it tomorrow. Even if I get it
down to 30 ms, it’s still too slow. What do I have to use to get
hardware-accelerated blits? Something other than X11, I presume, but what?

/Mattias

How would a person “blast the bits?” . What is the difference between
that and SDL_BlitSurface???

I got the impression that he was reading frames from a file to an intermediate
surface, then doing SDL_BlitSurface to the screen buffer. Eliminating the
middle-man could be a win.

I tried using a 32 bpp image and it is faster now.
SDL_BlitSurface takes 50 ms and the SDL_UpdateRects takes
another 30 ms. Still not fast enough though.
x11perf -shmput500 says it takes 22.9 ms to put a 500x500 image.

That seems very wrong. What depth is your display set to?
SDL_BlitSurface() shouldn’t take 50 ms for even a 1024x768 blit, since
it’s using an inline-assembly version of memcpy(), unless it’s doing a
complicated unoptimized conversion blit (which almost never happens).

Depth is 32.

What is the output of the testvidinfo program in the test
subdirectory?

A window manager is available
Current display: 32 bits-per-pixel
Red Mask = 0x00ff0000
Green Mask = 0x0000ff00
Blue Mask = 0x000000ff
No special fullscreen video modes

What’s the output of cat /proc/cpuinfo?

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 5
model name : Pentium II (Deschutes)
stepping : 2
cpu MHz : 397.956046
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov
pat pse36 mmx fxsr
bogomips : 397.31

/Mattias

Mattias.Blomqvist at nokia.com wrote:

That seems very wrong. What depth is your display set to?
SDL_BlitSurface() shouldn’t take 50 ms for even a 1024x768 blit, since
it’s using an inline-assembly version of memcpy(), unless it’s doing a
complicated unoptimized conversion blit (which almost never happens).

Depth is 32.

Just FYI, as this is a programming forum and this is an often confused
thing: in this case, the color depth is 24 (the number of bits actually
used) and the bits per pixel is 32 (the number of bits a single pixel
occupies in the frame buffer memory).

Color depth <= bits per pixel, all the time. There are some uncommon
video boards that can do a 30-bit color depth, apparently (according to
Dirk Hohndel).

--
Pierre Phaneuf
http://ludusdesign.com/

Yep. But that wasn’t at all what we discussed. Using 32 bits per pixel is
faster than using 24, although only 24 of the bits are used for color
information, since that is the way the X server does it. Anyway, it was
still much slower than expected.

/Mattias

-----Original Message-----

From: EXT Pierre Phaneuf [mailto:pp at ludusdesign.com]
Sent: 13. April 2000 15:54
To: sdl at lokigames.com
Subject: [SDL] Re: slooow

Mattias.Blomqvist at nokia.com wrote:

That seems very wrong. What depth is your display set to?
SDL_BlitSurface() shouldn’t take 50 ms for even a 1024x768 blit, since
it’s using an inline-assembly version of memcpy(), unless it’s doing a
complicated unoptimized conversion blit (which almost never happens).

Depth is 32.

Just FYI, as this is a programming forum and this is an often confused
thing: in this case, the color depth is 24 (the number of bits actually
used) and the bits per pixel is 32 (the number of bits a single pixel
occupies in the frame buffer memory).

Color depth <= bits per pixel, all the time. There are some uncommon
video boards that can do a 30-bit color depth, apparently (according to
Dirk Hohndel).

Mattias.Blomqvist at nokia.com wrote:

Yep. But that wasn’t at all what we discussed. Using 32 bits per pixel is
faster than using 24, although only 24 of the bits are used for color
information, since that is the way the X server does it. Anyway, it was
still much slower than expected.

Apparently, 24 bits per pixel is actually faster, but it makes
computation on the pixel values more complicated and difficult. But
transferring 24 bpp pixmaps to the video board takes 75% of the time it
takes to transfer the same pixmap in 32 bpp. 25% is not negligible!

--
Pierre Phaneuf
Systems Exorcist

Apparently, 24 bits per pixel is actually faster, but it makes
computation on the pixel values more complicated and difficult. But
transferring 24 bpp pixmaps to the video board takes 75% of the time it
takes to transfer the same pixmap in 32 bpp. 25% is not negligible!

For moving huge blocks this is true, I guess…

But I believe (and I’m unfortunately no expert) that the reason 32-bit is
usually considered faster than 24-bit is the alignment of bytes:
CPUs are MUCH better at moving 1, 2, 4, etc. bytes at a time than they
are at moving 3, 5, etc. bytes at a time.

-bill!

Also correct, if the pixmap were actually transferred at 24 bpp. As it
is now, it is transferred at 32 bpp, since the X server doesn’t support 24
bpp and therefore converts it to 32 bpp in software, making it slower.
It’s more of an academic question than a practical one anyway. It got
a lot faster using 32 bpp than 24 bpp.

/Mattias

Pierre Phaneuf wrote:

Mattias.Blomqvist at nokia.com wrote:

Yep. But that wasn’t at all what we discussed. Using 32 bits per pixel is
faster than using 24, although only 24 of the bits are used for color
information, since that is the way the X server does it. Anyway, it was
still much slower than expected.

Apparently, 24 bits per pixel is actually faster, but it makes
computation on the pixel values more complicated and difficult. But
transferring 24 bpp pixmaps to the video board takes 75% of the time it
takes to transfer the same pixmap in 32 bpp. 25% is not negligible!

--
Pierre Phaneuf
Systems Exorcist

CPUs are MUCH better at moving 1, 2, 4, etc. bytes at a time than they
are at moving 3, 5, etc. bytes at a time.

True, but RAM bandwidth is also a factor, and 24bpp buys you a 25%
advantage. The fastest transfer method probably depends on how much data
you’re moving, your bus speed, etc.

Dan