Alpha blit with alpha == 0 check

Hi

The current code only checks for alpha == 255. This patch speeds up the
alpha blit by adding an alpha == 0 check. This helps when you have a lot
of pixels with A == 0 (fully transparent).

Rafal
-------------- next part --------------
diff -u -r SDL12/src/video/SDL_blit_A.c SDL12a/src/video/SDL_blit_A.c
--- SDL12/src/video/SDL_blit_A.c Wed Mar 6 12:23:02 2002
+++ SDL12a/src/video/SDL_blit_A.c Sat Feb 15 19:26:01 2003
@@ -278,9 +278,10 @@
            compositioning used (>>8 instead of /255) doesn't handle
            it correctly. Also special-case alpha=0 for speed?
            Benchmark this! */
-        if(alpha == SDL_ALPHA_OPAQUE) {
+        if(alpha) {
+          if(alpha == SDL_ALPHA_OPAQUE) {
             *dstp = (s & 0x00ffffff) | (*dstp & 0xff000000);
-        } else {
+          } else {
             /*
              * take out the middle component (green), and process
              * the other two in parallel. One multiply less.
@@ -294,6 +295,7 @@
             d &= 0xff00;
             d = (d + ((s - d) * alpha >> 8)) & 0xff00;
             *dstp = d1 | d | dalpha;
+          }
         }
         ++srcp;
         ++dstp;
@@ -500,10 +502,11 @@
            compositioning used (>>8 instead of /255) doesn't handle
            it correctly. Also special-case alpha=0 for speed?
            Benchmark this! */
-        if(alpha == (SDL_ALPHA_OPAQUE >> 3)) {
+        if(alpha) {
+          if(alpha == (SDL_ALPHA_OPAQUE >> 3)) {
             *dstp = (s >> 8 & 0xf800) + (s >> 5 & 0x7e0)
               + (s >> 3  & 0x1f);
-        } else {
+          } else {
             Uint32 d = *dstp;
             /*
              * convert source and destination to G0RAB65565
@@ -515,6 +518,7 @@
             d += (s - d) * alpha >> 5;
             d &= 0x07e0f81f;
             *dstp = d | d >> 16;
+          }
         }
         srcp++;
         dstp++;
@@ -543,10 +547,11 @@
            compositioning used (>>8 instead of /255) doesn't handle
            it correctly. Also special-case alpha=0 for speed?
            Benchmark this! */
-        if(alpha == (SDL_ALPHA_OPAQUE >> 3)) {
+        if(alpha) {
+          if(alpha == (SDL_ALPHA_OPAQUE >> 3)) {
             *dstp = (s >> 9 & 0x7c00) + (s >> 6 & 0x3e0)
               + (s >> 3  & 0x1f);
-        } else {
+          } else {
             Uint32 d = *dstp;
             /*
              * convert source and destination to G0RAB65565
@@ -558,6 +563,7 @@
             d += (s - d) * alpha >> 5;
             d &= 0x03e07c1f;
             *dstp = d | d >> 16;
+          }
         }
         srcp++;
         dstp++;
@@ -583,7 +589,8 @@
     unsigned sA = srcfmt->alpha;
     unsigned dA = dstfmt->Amask ? SDL_ALPHA_OPAQUE : 0;
 
-    while ( height-- ) {
+    if(sA) {
+      while ( height-- ) {
         DUFFS_LOOP4(
         {
         Uint32 pixel;
@@ -603,6 +610,7 @@
         width);
         src += srcskip;
         dst += dstskip;
+      }
     }
 }
 
@@ -634,7 +642,7 @@
         unsigned dG;
         unsigned dB;
         RETRIEVE_RGB_PIXEL(src, srcbpp, pixel);
-        if(pixel != ckey) {
+        if(sA && pixel != ckey) {
             RGB_FROM_PIXEL(pixel, srcfmt, sR, sG, sB);
             DISEMBLE_RGB(dst, dstbpp, dstfmt, pixel, dR, dG, dB);
             ALPHA_BLEND(sR, sG, sB, sA, dR, dG, dB);
@@ -686,9 +694,11 @@
         unsigned sA;
         unsigned dA;
         DISEMBLE_RGBA(src, srcbpp, srcfmt, pixel, sR, sG, sB, sA);
-        DISEMBLE_RGBA(dst, dstbpp, dstfmt, pixel, dR, dG, dB, dA);
-        ALPHA_BLEND(sR, sG, sB, sA, dR, dG, dB);
-        ASSEMBLE_RGBA(dst, dstbpp, dstfmt, dR, dG, dB, dA);
+        if(sA) {
+          DISEMBLE_RGBA(dst, dstbpp, dstfmt, pixel, dR, dG, dB, dA);
+          ALPHA_BLEND(sR, sG, sB, sA, dR, dG, dB);
+          ASSEMBLE_RGBA(dst, dstbpp, dstfmt, dR, dG, dB, dA);
+        }
         src += srcbpp;
         dst += dstbpp;
         },

On Sat, Feb 15, 2003 at 07:54:25PM +0100, Rafal Bursig wrote:

> The current code only checks for alpha == 255. This patch speeds up the
> alpha blit by adding an alpha == 0 check. This helps when you have a lot
> of pixels with A == 0 (fully transparent).

When posting patches with optimizations, it's a good idea to accompany
them with benchmarks. This may speed up alpha==0, but it's extra code
inside a blitter loop, so you need to be careful it doesn't slow down
the more important case of arbitrary alpha values.

Note that if you're using alpha to mask things off, which I suspect
is where this optimization is most important, you're probably better
off with 1-bit alpha.

> diff -u -r SDL12/src/video/SDL_blit_A.c SDL12a/src/video/SDL_blit_A.c
> --- SDL12/src/video/SDL_blit_A.c Wed Mar 6 12:23:02 2002
> +++ SDL12a/src/video/SDL_blit_A.c Sat Feb 15 19:26:01 2003
> @@ -278,9 +278,10 @@
> compositioning used (>>8 instead of /255) doesn't handle
> it correctly. Also special-case alpha=0 for speed?
> Benchmark this! */

Even the code says so. Benchmark it! :)

Glenn Maynard

Rafal Bursig wrote:

> Hi
>
> The current code only checks for alpha == 255. This patch speeds up the
> alpha blit by adding an alpha == 0 check. This helps when you have a lot
> of pixels with A == 0 (fully transparent).
>
> Rafal

All of the lines that check whether alpha is nonzero and then check the
value of alpha are redundant and will slow things down slightly. Fix
the patch, benchmark, re-post.

On 2003.02.15 20:35 Calvin Spealman wrote:

> Rafal Bursig wrote:
>
> > Hi
> >
> > The current code only checks for alpha == 255. This patch speeds up
> > the alpha blit by adding an alpha == 0 check. This helps when you have
> > a lot of pixels with A == 0 (fully transparent).
> >
> > Rafal
>
> All of the lines that check whether alpha is nonzero and then check the
> value of alpha are redundant and will slow things down slightly. Fix
> the patch, benchmark, re-post.

Hi

Sorry to all that I posted this code without saying more about it.

I use three buffer surfaces, each the size of the screen. The first (I),
the map buffer, uses the screen pixel format; the other two (II, III) are
32-bit RGBA/ARGB. The second layer holds text descriptions and the last
one holds GUI widgets. When I draw something to the screen, I "flush"
all layers.

Layers II and III have a lot of pixels with A == 0.

On my system (P200 MMX), flushing the entire 640x480 screen (3 blits)
takes ~160 ms; with this patch it takes ~60 ms.
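A comparison like the one above could be set up with a small timing harness. The following is only a sketch: the two blit variants, `blend_pixel`, and `time_blit_ms` are hypothetical names that do not exist in SDL, pixels are assumed to be packed as 0xAARRGGBB, and the blend uses a plain per-channel formula rather than SDL's own arithmetic.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Signature shared by both variants so the harness can time either one. */
typedef void (*blit_fn)(uint32_t *dst, const uint32_t *src, size_t n);

/* Blend source over destination with alpha a (0..255); keeps dst alpha. */
static uint32_t blend_pixel(uint32_t s, uint32_t d, uint32_t a)
{
    uint32_t r = (((s >> 16) & 0xff) * a + ((d >> 16) & 0xff) * (255 - a)) / 255;
    uint32_t g = (((s >> 8) & 0xff) * a + ((d >> 8) & 0xff) * (255 - a)) / 255;
    uint32_t b = ((s & 0xff) * a + (d & 0xff) * (255 - a)) / 255;
    return (d & 0xff000000u) | (r << 16) | (g << 8) | b;
}

/* Variant A: only the alpha == 255 special case. */
static void blit_opaque_check_only(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t s = src[i], a = s >> 24;
        if (a == 255)
            dst[i] = (s & 0x00ffffffu) | (dst[i] & 0xff000000u);
        else
            dst[i] = blend_pixel(s, dst[i], a);
    }
}

/* Variant B: additionally skips pixels with alpha == 0. */
static void blit_with_zero_check(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t s = src[i], a = s >> 24;
        if (a == 0)
            continue;
        if (a == 255)
            dst[i] = (s & 0x00ffffffu) | (dst[i] & 0xff000000u);
        else
            dst[i] = blend_pixel(s, dst[i], a);
    }
}

/* Run fn reps times and return the CPU time used, in milliseconds. */
static double time_blit_ms(blit_fn fn, uint32_t *dst, const uint32_t *src,
                           size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(dst, src, n);
    return 1000.0 * (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To get meaningful numbers, both variants would need to be timed on the same 640x480 buffers, once with mostly A == 0 source pixels and once with arbitrary alpha values, since the second case is where the extra branch could cost time.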

About the check:

You must distinguish three states:

  1. Alpha == 0

  2. Alpha == 255

  3. 0 < Alpha < 255

With per-surface alpha (SurfaceAlpha) and no per-pixel alpha, you can
create a new function that calls the normal blit function when
SurfaceAlpha == 255, calls the current AlphaBlit function (with the alpha
check removed) when 0 < SurfaceAlpha < 255, and returns immediately when
SurfaceAlpha == 0.
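The per-surface dispatch described above might look roughly like this. It is a sketch under assumptions: `copy_blit`, `blend_blit`, and `surface_alpha_blit` are illustrative names, not SDL functions, pixels are assumed to be packed as 0x00RRGGBB, and the blend is a simple per-channel formula.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the normal (non-alpha) blit: a straight copy. */
static void copy_blit(uint32_t *dst, const uint32_t *src, size_t n)
{
    memcpy(dst, src, n * sizeof *dst);
}

/* Blend one 8-bit channel; the result moves from d to s as a goes 0..255. */
static uint32_t blend_channel(uint32_t s, uint32_t d, uint32_t a)
{
    return (s * a + d * (255 - a)) / 255;
}

/* Blend loop with no alpha test inside: the caller guarantees 0 < a < 255. */
static void blend_blit(uint32_t *dst, const uint32_t *src, size_t n, uint8_t a)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t s = src[i], d = dst[i];
        uint32_t r = blend_channel((s >> 16) & 0xff, (d >> 16) & 0xff, a);
        uint32_t g = blend_channel((s >> 8) & 0xff, (d >> 8) & 0xff, a);
        uint32_t b = blend_channel(s & 0xff, d & 0xff, a);
        dst[i] = (d & 0xff000000u) | (r << 16) | (g << 8) | b;
    }
}

/* Decide once per blit, outside any pixel loop, which path to take. */
static void surface_alpha_blit(uint32_t *dst, const uint32_t *src, size_t n,
                               uint8_t surface_alpha)
{
    if (surface_alpha == 0)
        return;                      /* fully transparent: nothing to draw */
    if (surface_alpha == 255) {
        copy_blit(dst, src, n);      /* fully opaque: normal blit */
        return;
    }
    blend_blit(dst, src, n, surface_alpha);
}
```

Because the alpha value is the same for every pixel, the three-way test runs once per call and the inner loop stays branch-free.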

But with a per-pixel alpha surface you must do this check inside the
blit loop. The current code distinguishes 2) and 3) with a single "if"
instruction; to also handle 1) you must add another "if".
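For the per-pixel case, the three states could be handled inside the loop roughly as follows. This is a simplified sketch, not the SDL blitter: `pixel_alpha_blit` and `blend_channel` are hypothetical names, source pixels are assumed to be 0xAARRGGBB, and the blend is a plain per-channel formula instead of the parallel red/blue trick used in SDL_blit_A.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend one 8-bit channel; the result moves from d to s as a goes 0..255. */
static uint32_t blend_channel(uint32_t s, uint32_t d, uint32_t a)
{
    return (s * a + d * (255 - a)) / 255;
}

/* Per-pixel alpha blit; the destination keeps its own alpha byte. */
static void pixel_alpha_blit(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t s = src[i];
        uint32_t a = s >> 24;

        if (a == 0)
            continue;                /* state 1: transparent, skip the pixel */

        if (a == 255) {
            /* state 2: opaque, copy the colour and keep dst alpha */
            dst[i] = (s & 0x00ffffffu) | (dst[i] & 0xff000000u);
        } else {
            /* state 3: 0 < a < 255, blend each channel */
            uint32_t d = dst[i];
            uint32_t r = blend_channel((s >> 16) & 0xff, (d >> 16) & 0xff, a);
            uint32_t g = blend_channel((s >> 8) & 0xff, (d >> 8) & 0xff, a);
            uint32_t b = blend_channel(s & 0xff, d & 0xff, a);
            dst[i] = (d & 0xff000000u) | (r << 16) | (g << 8) | b;
        }
    }
}
```

The extra test for state 1 is paid on every pixel, which is the cost the replies in this thread ask to have benchmarked.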

Yes, it is true that this slows down the blit if you don't have pixels
with A == 0, but the same can be said about the A == 255 check.

Rafal

On Sat, Feb 15, 2003 at 09:20:33PM +0100, Rafal Bursig wrote:

> Yes, it is true that this slows down the blit if you don't have pixels
> with A == 0, but the same can be said about the A == 255 check.

Yes, but the 255 check needs to be done for other reasons anyway (read
the comments).

Glenn Maynard

On Sat 15 Feb 03 22:09, Glenn Maynard wrote:

> On Sat, Feb 15, 2003 at 09:20:33PM +0100, Rafal Bursig wrote:
>
> > Yes, it is true that this slows down the blit if you don't have pixels
> > with A == 0, but the same can be said about the A == 255 check.
>
> Yes, but the 255 check needs to be done for other reasons anyway (read
> the comments).

Instead of testing for zero in the inner loop, how about stripping off
leading and trailing zero-alpha pixels before entering the main loop? I
suppose there would only need to be one leading or trailing pixel with
zero alpha on each row for this to be a win.

Regards,

Daniel
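Daniel's suggestion of stripping leading and trailing zero-alpha pixels from each row could be sketched like this. The helper `trim_transparent` is a hypothetical name, and pixels are assumed to be 32-bit with alpha in the top byte (0xAARRGGBB).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Find the part of a row that remains after dropping leading and trailing
 * pixels whose alpha is 0. Stores the index of the first remaining pixel in
 * *first and returns how many pixels remain (0 if the row is all transparent).
 */
static size_t trim_transparent(const uint32_t *row, size_t width, size_t *first)
{
    size_t begin = 0;
    size_t end = width;

    while (begin < end && (row[begin] >> 24) == 0)
        begin++;                     /* skip leading transparent pixels */
    while (end > begin && (row[end - 1] >> 24) == 0)
        end--;                       /* skip trailing transparent pixels */

    *first = begin;
    return end - begin;
}
```

The main blend loop would then run only over `row + first` for the returned length. Zero-alpha pixels in the interior of the span still pass through that loop, so this keeps the inner loop unchanged rather than replacing the per-pixel test entirely.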

> On Sat 15 Feb 03 22:09, Glenn Maynard wrote:
>
> > On Sat, Feb 15, 2003 at 09:20:33PM +0100, Rafal Bursig wrote:
> >
> > > Yes, it is true that this slows down the blit if you don't have pixels
> > > with A == 0, but the same can be said about the A == 255 check.
> >
> > Yes, but the 255 check needs to be done for other reasons anyway (read
> > the comments).
>
> Instead of testing for zero in the inner loop, how about stripping off
> leading and trailing zero-alpha pixels before entering the main loop? I
> suppose there would only need to be one leading or trailing pixel with
> zero alpha on each row for this to be a win.

Actually SDL_RLEACCEL on alpha surfaces does exactly this. Have you tried it?

See ya,
-Sam Lantinga, Software Engineer, Blizzard Entertainment