Slowdown of alpha blits when colorkey is enabled

i experience strange behavior when doing blits with per-surface alpha
enabled: also setting a colorkey decreases speed dramatically.

i have written a short program to demonstrate this:

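    #include <stdio.h>
    #include <SDL.h>

    int main(int argc, char **argv) {
        unsigned int rgb, bpp, flg = SDL_SWSURFACE | SDL_ANYFORMAT;
        SDL_Surface *screen, *sfc1, *sfc2;
        SDL_Rect r1 = { 240, 200, 0, 0 };
        SDL_Rect r2 = r1;
        int x;

        if (SDL_Init(SDL_INIT_VIDEO) < 0) {
            fprintf(stderr, "SDL_Init: %s\n", SDL_GetError());
            return 1;
        }
        screen = SDL_SetVideoMode(800, 600, 0, flg);
        if (screen == NULL) {
            fprintf(stderr, "SDL_SetVideoMode: %s\n", SDL_GetError());
            SDL_Quit();
            return 1;
        }

        /* two source surfaces with the same depth as the screen */
        bpp = screen->format->BitsPerPixel;
        sfc1 = SDL_CreateRGBSurface(flg, 320, 200, bpp, 0, 0, 0, 0);
        sfc2 = SDL_CreateRGBSurface(flg, 320, 200, bpp, 0, 0, 0, 0);
        if (sfc1 == NULL || sfc2 == NULL) {
            fprintf(stderr, "SDL_CreateRGBSurface: %s\n", SDL_GetError());
            SDL_Quit();
            return 1;
        }

        /* any command line argument enables the colorkey (magenta) */
        rgb = SDL_MapRGB(sfc1->format, 0xff, 0x00, 0xff);
        if (argc > 1) {
            SDL_SetColorKey(sfc1, SDL_SRCCOLORKEY, rgb);
            SDL_SetColorKey(sfc2, SDL_SRCCOLORKEY, rgb);
        }

        /* per-surface alpha of 128 on both surfaces */
        SDL_SetAlpha(sfc1, SDL_SRCALPHA, 128);
        SDL_SetAlpha(sfc2, SDL_SRCALPHA, 128);

        /* red and blue sources, white screen */
        rgb = SDL_MapRGB(sfc1->format, 0xdd, 0x11, 0x11);
        SDL_FillRect(sfc1, NULL, rgb);
        rgb = SDL_MapRGB(sfc2->format, 0x11, 0x11, 0xdd);
        SDL_FillRect(sfc2, NULL, rgb);
        rgb = SDL_MapRGB(screen->format, 0xff, 0xff, 0xff);
        SDL_FillRect(screen, NULL, rgb);

        /* slide sfc2 across the screen, erasing both rects after each flip */
        for (x = 0; x < 480; x++) {
            r2.x = x;
            SDL_BlitSurface(sfc1, NULL, screen, &r1);
            SDL_BlitSurface(sfc2, NULL, screen, &r2);
            SDL_Flip(screen);
            SDL_FillRect(screen, &r1, rgb);
            SDL_FillRect(screen, &r2, rgb);
        }

        SDL_FreeSurface(sfc1);
        SDL_FreeSurface(sfc2);
        SDL_Quit();
        return 0;
    }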

gcc -Wall $(sdl-config --cflags --libs) -o bla bla.c

time ./bla
real 0m10.134s

time ./bla 1
real 0m14.180s

i understand that enabling colorkey means an additional branch in the
blitter, but i wouldn’t expect that much of a slowdown. (note: the
measured difference understates the slowdown, because the time covers
not only the blits but also initializing SDL, clearing the screen and
flipping it.)
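One way to keep start-up and teardown out of the numbers is to time only the loop itself. The following is a minimal sketch in plain C, not part of the program above: `time_loop` and `count_call` are hypothetical helpers, and the standard `clock()` function is used, which reports CPU time rather than wall-clock time. In the SDL program the same idea could be applied by reading SDL_GetTicks() directly before and after the blit loop.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: calls fn(ctx) `iterations` times and returns the
 * CPU time spent in the loop, in seconds. Setup done before the call
 * and cleanup done after it are not included in the result. */
double time_loop(void (*fn)(void *), void *ctx, int iterations)
{
    clock_t start;
    int i;

    assert(fn != NULL && iterations >= 0);
    start = clock();
    for (i = 0; i < iterations; i++)
        fn(ctx);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Stand-in workload for illustration only: counts how often it ran. */
void count_call(void *ctx)
{
    (*(int *)ctx)++;
}
```

Passing a function that performs one pair of blits instead of `count_call` would give a figure for the blits alone, which can then be compared with and without the colorkey.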

is there something i can do about this? can SDL be improved in that
direction or is it even a bug in my program?

many thx …
clemens

The additional branching is very expensive, especially on recent CPUs like the P4 (as it cannot be accurately predicted). Also, there are lots of blitting functions for the various pixel formats, so it’s possible that one of them isn’t optimized very well.

Now some questions:
What version of SDL are you using?
What is the value of bpp?
Did you try converting your surfaces with SDL_DisplayFormat()? You’ll get faster blits that way, as no on-the-fly conversion will take place.

Stephane

Stephane Marchesin <stephane.marchesin at wanadoo.fr> wrote:

> The additional branching is very expensive, especially on recent CPUs
> like the P4 (as it cannot be accurately predicted). Also, there are
> lots of blitting functions for the various pixel formats, so it’s
> possible that one of them isn’t optimized very well.

in the latter case, there would be room for improvement within SDL.

> Now some questions:
> What version of SDL are you using?

CVS from today. i wanted to be sure. sorry, i thought i had said so.

> What is the value of bpp?

16 in my case.

> Did you try converting your surfaces with SDL_DisplayFormat()?

i don’t think this is needed, because i create the surfaces with
SDL_CreateRGBSurface() and the pixel depth of the screen.

> You’ll get faster blits that way, as no on-the-fly conversion will
> take place.

i know. in real applications i always do it.

thx for your comments …
clemens

Clemens Kirchgatterer wrote:

> in the latter case, there would be room for improvement within SDL.

There’s always room for improvement :)

I just looked and there are no optimized functions for colorkeyed blitting.
Plus, the branch is hard for the CPU to predict, and a misprediction
costs at least 15 CPU cycles on any modern CPU. Compare this to the
number of cycles needed to blit a pixel (far fewer than 5, depending on
the pixel format) and you have a good explanation for the slower speed.
i686 and later CPUs have a conditional move instruction (cmov) that
avoids branches, but the blitting code doesn’t lend itself well to this
optimization, so gcc won’t be able to apply it.

The function you’re interested in is called BlitNtoNSurfaceAlphaKey and
is in SDL12/src/video/SDL_blit_A.c. You might want to see what you can
do to speed it up.
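To illustrate the difference between a per-pixel branch and a branch-free select, here is a rough sketch in plain C for 16-bit R5G6B5 pixels. It is not SDL's code: `blend565`, `blit_branchy` and `blit_branchless` are hypothetical names, the blend is a simple per-channel one, and whether the mask variant is actually faster depends on the CPU, the compiler and the image data, since it always pays for the blend arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blend one R5G6B5 source pixel over a destination pixel, alpha 0..255. */
uint16_t blend565(uint16_t s, uint16_t d, unsigned alpha)
{
    unsigned sr = s >> 11, sg = (s >> 5) & 0x3f, sb = s & 0x1f;
    unsigned dr = d >> 11, dg = (d >> 5) & 0x3f, db = d & 0x1f;
    unsigned r, g, b;

    assert(alpha <= 255);
    r = (sr * alpha + dr * (255 - alpha)) / 255;
    g = (sg * alpha + dg * (255 - alpha)) / 255;
    b = (sb * alpha + db * (255 - alpha)) / 255;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Colorkey test as a branch: pixels equal to `key` are skipped. */
void blit_branchy(const uint16_t *src, uint16_t *dst, size_t n,
                  uint16_t key, unsigned alpha)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (src[i] != key)
            dst[i] = blend565(src[i], dst[i], alpha);
    }
}

/* Same result without a data-dependent branch: always blend, then pick
 * the blended or the old pixel with a bit mask. */
void blit_branchless(const uint16_t *src, uint16_t *dst, size_t n,
                     uint16_t key, unsigned alpha)
{
    size_t i;
    for (i = 0; i < n; i++) {
        uint16_t blended = blend565(src[i], dst[i], alpha);
        /* 0xffff if the pixel is visible, 0x0000 if it matches the key */
        uint16_t mask = (uint16_t)-(src[i] != key);
        dst[i] = (uint16_t)((blended & mask) | (dst[i] & ~mask));
    }
}
```

Both functions leave keyed pixels untouched and produce identical output; only the way the decision is made per pixel differs.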

[…]

> > Did you try converting your surfaces with SDL_DisplayFormat()?
>
> i don’t think this is needed, because i create the surfaces with
> SDL_CreateRGBSurface() and the pixel depth of the screen.

That’s not the problem here anyway, but still:

  • that can be a problem if you use, for example, two different pixel
    formats with the same bpp
  • backends might do other transformations than just changing the pixel
    format of the surface

So I think calling SDL_DisplayFormat() is the way to go, to be on the safe side.

Stephane

Stephane Marchesin <stephane.marchesin at wanadoo.fr> wrote:

> The function you’re interested in is called BlitNtoNSurfaceAlphaKey
> and is in SDL12/src/video/SDL_blit_A.c. You might want to see what you
> can do to speed it up.

thank you very much. though, i fear my 6510 assembly knowledge will not
be too helpful. :-/
anyway, i will have a look …

>   • that can be a problem if you use, for example, two different pixel
>     formats with the same bpp

how could i end up with this? only when i create surfaces with
different mask values (Rmask, Gmask, Bmask). correct?

>   • backends might do other transformations than just changing the pixel
>     format of the surface, so I think calling SDL_DisplayFormat() is the
>     way to go, to be on the safe side.

so you call SDL_DisplayFormat() after each SDL_CreateRGBSurface()?

thank you very much for your suggestions…
clemens

Hello, Clemens!

CK> i have written a short program to demonstrate this:

CK> SDL_SetColorKey(sfc1, SDL_SRCCOLORKEY, rgb);
CK> SDL_SetColorKey(sfc2, SDL_SRCCOLORKEY, rgb);

Looks like your video driver (the real one or the SDL driver wrapper) or
hardware doesn’t support colorkey+alpha blitting. By adding SDL_RLEACCEL you
can enable RLE acceleration, which should at least be faster.
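The idea behind run-length encoding a colorkeyed surface can be sketched as follows: the transparent and opaque spans of each row are computed once, so the blit loop walks over runs instead of testing every pixel against the key. This is an illustrative plain C sketch, not SDL's actual RLE format; `struct run`, `encode_row` and `blit_runs` are hypothetical names, and the opaque pixels are simply copied here where an alpha blit would blend them. In SDL 1.2 the flag itself is OR-ed into the flags argument, for example `SDL_SetColorKey(sfc1, SDL_SRCCOLORKEY | SDL_RLEACCEL, rgb)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One run: `skip` transparent pixels followed by `copy` opaque pixels. */
struct run {
    size_t skip;
    size_t copy;
};

/* Encode one row against the colorkey. Returns the number of runs
 * written to `runs`, which must hold at least `max_runs` entries. */
size_t encode_row(const uint16_t *row, size_t n, uint16_t key,
                  struct run *runs, size_t max_runs)
{
    size_t i = 0, count = 0;

    while (i < n) {
        size_t skip = 0, copy = 0;
        while (i < n && row[i] == key) { skip++; i++; }
        while (i < n && row[i] != key) { copy++; i++; }
        assert(count < max_runs);
        runs[count].skip = skip;
        runs[count].copy = copy;
        count++;
    }
    return count;
}

/* Copy only the opaque spans; no per-pixel colorkey test is needed. */
void blit_runs(const uint16_t *row, uint16_t *dst,
               const struct run *runs, size_t count)
{
    size_t r, pos = 0;

    for (r = 0; r < count; r++) {
        pos += runs[r].skip;
        memcpy(dst + pos, row + pos, runs[r].copy * sizeof *row);
        pos += runs[r].copy;
    }
}
```

The encoding cost is paid once per surface, so this only helps when the same surface is blitted many times without being modified in between.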

With best regards, Mike Gorchak.

Clemens Kirchgatterer wrote:

> >   • that can be a problem if you use, for example, two different pixel
> >     formats with the same bpp
>
> how could i end up with this? only when i create surfaces with
> different mask values (Rmask, Gmask, Bmask). correct?

Yes, if you create all your surfaces with SDL_CreateRGBSurface and the same
bpp and flags, they’ll have the same format.

But the formats created by SDL_CreateRGBSurface are constant and depend on
the bpp only:
16 bpp -> R5G6B5
24 bpp -> R8G8B8
So if some weird backend uses RGB444 or some other strange format,
on-the-fly conversion will happen.
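To make the "same bpp, different format" case concrete, here is a small plain C sketch that packs the same color into two 16-bit layouts. `struct rgb_layout`, `LAYOUT_565`, `LAYOUT_444` and `pack_rgb` are hypothetical names for illustration, not SDL types, and the 4-4-4 layout is only an assumed example of an unusual backend format.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative description of a packed RGB layout: number of bits per
 * channel and the position of each channel inside the pixel. */
struct rgb_layout {
    unsigned rbits, gbits, bbits;
    unsigned rshift, gshift, bshift;
};

const struct rgb_layout LAYOUT_565 = { 5, 6, 5, 11, 5, 0 };
const struct rgb_layout LAYOUT_444 = { 4, 4, 4, 8, 4, 0 };

/* Pack 8-bit channels by dropping the low bits of each channel. */
uint16_t pack_rgb(const struct rgb_layout *f,
                  unsigned r, unsigned g, unsigned b)
{
    assert(r <= 255 && g <= 255 && b <= 255);
    return (uint16_t)(((r >> (8 - f->rbits)) << f->rshift) |
                      ((g >> (8 - f->gbits)) << f->gshift) |
                      ((b >> (8 - f->bbits)) << f->bshift));
}
```

With these definitions the magenta colorkey (0xff, 0x00, 0xff) packs to 0xF81F in the 5-6-5 layout and to 0x0F0F in the 4-4-4 one, so a blit between two such surfaces has to convert every pixel even though both use 16 bits.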

(A bit off topic: if you set the colorkey or the alpha flag, the bpp
of the surface can be changed in favor of the video bpp if the video
driver supports colorkey or alpha blits in hardware. That’s not documented,
but it causes bugs in some programs.)

> >   • backends might do other transformations than just changing the pixel
> >     format of the surface, so I think calling SDL_DisplayFormat() is the
> >     way to go, to be on the safe side.
>
> so you call SDL_DisplayFormat() after each SDL_CreateRGBSurface()?

For surfaces that will subsequently be blitted to the screen, yes.
Inter-surface blitting is something totally different from a performance
viewpoint (as it is mostly done by the CPU, and depends on the surface
size too).

Stephane