Bug in Alpha blitting?

I’ve been looking into writing a mostly-software (Ouch!)
blitter to support full alpha-channeling. In so doing,
I have been looking a lot at the SDL blitting code.
I haven’t tested it, but the following around line 225 of
SDL-1.1.2/src/video/SDL_blit.h looks wrong:

#define ALPHA_BLEND(sR, sG, sB, A, Adiv, dR, dG, dB) \
{ \
	dR = (((Uint16)(sR-dR)*(Adiv-A))>>8)+dR; \
	dG = (((Uint16)(sG-dG)*(Adiv-A))>>8)+dG; \
	dB = (((Uint16)(sB-dB)*(Adiv-A))>>8)+dB; \
}

I’m fairly confident that the three Uint16’s should
be Sint16’s… In fact, the cast makes slightly more
sense BEFORE the subtraction. How does this look to
the rest of you? Sam? Comments?

Also… not to make a buffoon out of myself, but I was
looking at the SDL_SetAlpha function, and I didn’t see
any place where the SDL_HWACCEL flag was restored when
turning off alpha-ing on a surface. Is this handled
elsewhere or is this a genuine issue?

All comments welcome.

Thanks,

-Loren



I’ve been looking into writing a mostly-software (Ouch!)
blitter to support full alpha-channeling. In so doing,
I have been looking a lot at the SDL blitting code.
I haven’t tested it, but the following around line 225 of
SDL-1.1.2/src/video/SDL_blit.h looks wrong:

#define ALPHA_BLEND(sR, sG, sB, A, Adiv, dR, dG, dB) \
{ \
	dR = (((Uint16)(sR-dR)*(Adiv-A))>>8)+dR; \
	dG = (((Uint16)(sG-dG)*(Adiv-A))>>8)+dG; \
	dB = (((Uint16)(sB-dB)*(Adiv-A))>>8)+dB; \
}

I’m fairly confident that the three Uint16’s should
be Sint16’s… In fact, the cast makes slightly more
sense BEFORE the subtraction. How does this look to
the rest of you? Sam? Comments?

Can any of the Adonthell folks comment?

Also… not to make a buffoon out of myself, but I was
looking at the SDL_SetAlpha function, and I didn’t see
any place where the SDL_HWACCEL flag was restored when
turning off alpha-ing on a surface. Is this handled
elsewhere or is this a genuine issue?

It’s probably a genuine issue. :slight_smile:
Can you submit a patch after testing it on a driver that
has hardware acceleration?

Thanks!
-Sam Lantinga, Lead Programmer, Loki Entertainment Software

I’ve been looking into writing a mostly-software (Ouch!)
blitter to support full alpha-channeling. In so doing,
I have been looking a lot at the SDL blitting code.
I haven’t tested it, but the following around line 225 of
SDL-1.1.2/src/video/SDL_blit.h looks wrong:

#define ALPHA_BLEND(sR, sG, sB, A, Adiv, dR, dG, dB) \
{ \
	dR = (((Uint16)(sR-dR)*(Adiv-A))>>8)+dR; \
	dG = (((Uint16)(sG-dG)*(Adiv-A))>>8)+dG; \
	dB = (((Uint16)(sB-dB)*(Adiv-A))>>8)+dB; \
}

I’m fairly confident that the three Uint16’s should
be Sint16’s… In fact, the cast makes slightly more
sense BEFORE the subtraction. How does this look to
the rest of you? Sam? Comments?

No… I think Uint16, because you don’t want a sign bit mucking anything
up, so that you can use the whole 16 bits (or whatever). On that note, I
spent about five days arguing about alpha blitting and the best way to do
it; I’ll send you the messages that are relevant.

-Loren

Cheers,

Nicholas

----- Original Message -----
From: sondheim at knxmail.com (sondheim@knxmail.com)
To: sdl at lokigames.com
Date: Tuesday, May 02, 2000 8:21 PM
Subject: [SDL] Bug in Alpha blitting???

I’ve been looking into writing a mostly-software (Ouch!)
blitter to support full alpha-channeling.

First the good news: Looks like the functionality I need
is already in SDL (except for the clipping thing I
mentioned last week), so, although I might have found a
bug in the alpha stuff, I probably won’t need to exercise
it to the extent I was planning, and most of the effects
of these bugs will go unnoticed.

In so doing,
I have been looking a lot at the SDL blitting code.
I haven’t tested it, but the following around line 225 of
SDL-1.1.2/src/video/SDL_blit.h looks wrong:

#define ALPHA_BLEND(sR, sG, sB, A, Adiv, dR, dG, dB) \
{ \
	dR = (((Uint16)(sR-dR)*(Adiv-A))>>8)+dR; \
	dG = (((Uint16)(sG-dG)*(Adiv-A))>>8)+dG; \
	dB = (((Uint16)(sB-dB)*(Adiv-A))>>8)+dB; \
}

I’m fairly confident that the three Uint16’s should
be Sint16’s… In fact, the cast makes slightly more
sense BEFORE the subtraction. How does this look to
the rest of you? Sam? Comments?

Can any of the Adonthell folks comment?

Okay, a few things on this note: I’m not sure whether
Sints are needed. Nicholas says I’m wrong (and he’s
probably right), but this should be looked into. The reason
is that you want the result of the subtraction (sX-dX)
to be a signed number, so you can add to or subtract from dX.
What I don’t know is how the signed quantity will behave
in the multiply or in the right-shift:

If I multiply -3 (0xfffd) times 3 (0x0003), I should get
-9 (0xfff7), then when I divide by 255 (or shift right 8)
I should get 0…
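To check the -3 times 3 case above, here is a minimal sketch, assuming a two's-complement int of at least 32 bits; the helper names are made up for illustration. C99 division truncates toward zero, so -9/255 is 0, but right-shifting a negative value is implementation-defined, and compilers that use an arithmetic shift typically give -1 instead. Dividing and shifting are therefore not interchangeable for negative differences.

```c
/* Divide the product by Adiv (255); C99 integer division truncates
   toward zero, so small negative products become 0. */
static int scale_by_division(int diff, int weight)
{
    return (diff * weight) / 255;
}

/* Shift the product right by 8. For a negative product the result is
   implementation-defined; an arithmetic shift rounds toward negative
   infinity, so small negative products become -1, not 0. */
static int scale_by_shift(int diff, int weight)
{
    return (diff * weight) >> 8;
}
```

Calling both helpers with a difference of -3 and a weight of 3 shows the two roundings side by side; with a positive difference they agree.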

This brings me to my SECOND POINT. The algorithm is
supposed to divide by Adiv (usually 255), not 256
(same as shifting right 8)… The way things are
implemented now, an alpha value of 0 should be totally
opaque (this is the reverse of what I’m used to, but
that is irrelevant). This means that the resulting
dX value should equal the original sX value. Also,
this fast alpha as it is now only approximately works
for Adiv values of 255.

A plausible solution that Stephane suggested, which should
optimize to a shift right in the compiler (unless
Adiv is a variable), is something like:
dX = (((Uint16)(sX-dX)*(Adiv+1-A))/(Adiv+1))+dX;
Because of integer rounding, this yields
much more accurate results where Adiv is (2^n)-1.
Perhaps you should add an Ashift argument to the
macro to allow you to optimize for variable Adivs, and
compute the Ashift on a per SDL_PixelFormat basis:
for( Ashift=0; ((Adiv+1)>>Ashift)!=1; Ashift++ ) ;
Then just use:
dX = (((Uint16)(sX-dX)*(Adiv+1-A))>>Ashift)+dX;
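A small sketch of the two formulas, restricted to the case sX >= dX so that signedness does not enter into it. The function names are hypothetical, and the shift loop is written with != so that it keeps counting until only a single bit is left. With A = 0 (fully opaque in this convention), the original weight of Adiv gives 199 for sX = 200, dX = 100, while the Adiv+1 weight gives exactly 200.

```c
/* Number of bits to shift so that >>shift divides by (adiv + 1).
   Assumes adiv has the form 2^n - 1 (for example 15, 31 or 255). */
static unsigned alpha_shift(unsigned adiv)
{
    unsigned shift = 0;
    while (((adiv + 1) >> shift) != 1)
        shift++;
    return shift;
}

/* Original fast blend: weight (adiv - a), always shifted by 8.
   Only valid here for s >= d. */
static unsigned blend_shift8(unsigned s, unsigned d, unsigned a, unsigned adiv)
{
    return (((s - d) * (adiv - a)) >> 8) + d;
}

/* Suggested variant: weight (adiv + 1 - a), shifted by log2(adiv + 1).
   Only valid here for s >= d. */
static unsigned blend_adiv_plus_1(unsigned s, unsigned d, unsigned a, unsigned adiv)
{
    return (((s - d) * (adiv + 1 - a)) >> alpha_shift(adiv)) + d;
}
```

Computing the shift once per pixel format, rather than per pixel as this sketch does, is what the Ashift argument suggested above would allow.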

As for the sign-bit, I’m going to leave that for sharper
minds than mine to tackle, although I think I adequately
explained the potential problem.

Also… not to make a buffoon out of myself, but I was
looking at the SDL_SetAlpha function, and I didn’t see
any place where the SDL_HWACCEL flag was restored when
turning off alpha-ing on a surface. Is this handled
elsewhere or is this a genuine issue?

It’s probably a genuine issue. :slight_smile:
Can you submit a patch after testing it on a driver that
has hardware acceleration?

I’ve spent a few hours now looking into this. As a
stop-gap measure, I can recommend developers keep a temporary
copy of the flags register while doing any temporary alpha
stuff with SDL_Surfaces. As far as fixing SDL, this looks
like it may need a small design change in how hardware
acceleration is determined… This looks like it could
result in a few hundred lines of diffs scattered throughout
the surface infrastructure code. Not something I’m prepared
to do without a bit of planning right now. My easiest
suggestion would be to have a single function (per video
device type) to determine if a surface’s current alpha
configuration is supported in hardware. You currently
support 3 different alpha techniques, and any of them
can be used in concert. I’m not prepared to embark on
such changes without a bit more input from the software
architect. Sam? Comments?
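The stop-gap can be pictured with a self-contained mock. Everything in it (the struct, the flag names, set_alpha, with_temporary_alpha) is a hypothetical stand-in and not the SDL API, and it assumes that turning alpha on clears the acceleration flag, which is what the discussion above implies. Whether writing a saved flags word back into a real SDL_Surface is safe is not something this sketch establishes.

```c
#include <stdint.h>

/* Hypothetical stand-ins, NOT the SDL API. */
enum { FLAG_HWACCEL = 0x1, FLAG_SRCALPHA = 0x2 };

struct surface { uint32_t flags; };

/* Mimics the behaviour under discussion (an assumption): enabling alpha
   drops the acceleration flag, and disabling it does not bring it back. */
static void set_alpha(struct surface *s, int enable)
{
    if (enable) {
        s->flags |= FLAG_SRCALPHA;
        s->flags &= ~(uint32_t)FLAG_HWACCEL;
    } else {
        s->flags &= ~(uint32_t)FLAG_SRCALPHA;
    }
}

/* Stop-gap: remember the flags, do the temporary alpha work, restore. */
static void with_temporary_alpha(struct surface *s,
                                 void (*work)(struct surface *))
{
    uint32_t saved = s->flags;
    set_alpha(s, 1);
    if (work)
        work(s);
    set_alpha(s, 0);
    s->flags = saved;
}
```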

Regards,

-Loren



Okay, a few things on this note: I’m not sure whether
Sint’s are needed. Nicholas says I’m wrong (and he’s
probably right), but this should be looked into: Reason
being, that you want the result of the subtraction (sX-dX)
to be a signed number, so you can add or subtract from dX.
What I don’t know is how the signed quantity will behave
in the multiply or in the right-shift:

IMO C/C++ will treat the sX-dX expression as a signed number, then convert
to an unsigned number for the final component.

As for the sign-bit, I’m going to leave that for sharper
minds than mine to tackle, although I think I adequately
explained the potential problem.

See above. Don’t worry about it — you don’t want the sign bit preserved. C
should (if memory serves) handle it as a Sint until it needs to/wants to
stop. You might want to look at throwing an abs() around it, although I
don’t know how fast it’ll go.

Regards,

-Loren

Cheers,

Nicholas

----- Original Message -----
From: sondheim at knxmail.com (sondheim@knxmail.com)
To: sdl at lokigames.com
Date: Wednesday, May 03, 2000 12:51 PM
Subject: Re: [SDL] Bug in Alpha blitting???

What I don’t know is how the signed quantity will behave
in the multiply or in the right-shift:

The result of right-shifting a negative number is implementation-defined in C.
Some processors have a distinct “arithmetic shift” operation, which replicates
the sign bit, whereas others only have “logical shift” which feeds zero bits
from the left. In the former case, right-shifting a negative number behaves
as “expected” (division by two, rounded towards negative infinity), but
with logical shift, you’re screwed.

So the code fragment

 dR = (((sR-dR)*(Adiv-A))>>8)+dR;

is not strictly correct, but if sR etc are ints, then we’ll have lots of
spare ones to the left of the interesting 8 least significant bits, so it
works anyway!

The Uint16 cast was of course wrong and is not present in the CVS.
With two’s-complement ints (all we care about), signedness does not matter for
+ and -, but for *, / and % it’s important. That’s why many processors have
separate “multiply signed” and “multiply unsigned” opcodes.
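A hedged illustration of what the Uint16 cast does, assuming a 32-bit int; the helper names are invented for the example. A negative difference x becomes 65536 + x after the cast, and (65536 + x) * m shifted right by 8 equals 256 * m plus floor(x * m / 256). The surplus is a multiple of 256, so it disappears when the result is stored in an 8-bit variable and shows up as garbage in anything wider.

```c
#include <stdint.h>

/* floor(x / 256), written without right-shifting a negative value. */
static int floor_div256(int x)
{
    return (x >= 0) ? x / 256 : -((-x + 255) / 256);
}

/* The macro's expression with the Uint16 cast, evaluated in int. */
static int blend_with_cast(int s, int d, int a)
{
    return (((uint16_t)(s - d) * (255 - a)) >> 8) + d;
}

/* Reference: plain signed arithmetic with floor rounding. */
static int blend_signed(int s, int d, int a)
{
    return floor_div256((s - d) * (255 - a)) + d;
}
```

For s = 10, d = 13, a = 252 the cast version returns 780 where the signed reference returns 12; truncating 780 to 8 bits also gives 12.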

Hack away,

 Mattias.

Okay, a few things on this note: I’m not sure whether
Sint’s are needed. Nicholas says I’m wrong (and he’s
probably right), but this should be looked into: Reason
being, that you want the result of the subtraction (sX-dX)
to be a signed number, so you can add or subtract from dX.
What I don’t know is how the signed quantity will behave
in the multiply or in the right-shift:

IMO C/C++ will treat the sX-dX expression as a signed number, then convert
to an unsigned number for the final component.

Well… I figured that until someone came up with an
authoritative answer, we would keep going back and forth
on this. There were two issues that seemed important:
1) Determining what behavior was desired.
2) Determining how to specify it unambiguously in
C.
I looked into both of these today. First, once you have
computed the multiply and divide (or shift), the sign is
irrelevant. All that is important is whether the number is
greater than 255-dX, and all the two’s-complement math
works by itself. The divide turns out to be similarly
benign. I did some work with bc today, and found that
there is a catch with regard to the multiply, however:
For the math to work out properly, the high byte of the
16-bit integer must be 0xff if the number is negative,
or the multiply will give a bogus result. As Nic said
earlier, the actual signedness of the type at multiply
time doesn’t matter.
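The high-byte observation can be reproduced with a short sketch; the function names are hypothetical and the arithmetic is kept to 16 bits on purpose. A difference of -3 that has been sign-extended to 0xfffd yields a high byte of 0xff after multiplying by 3, which acts as -1 when added to an 8-bit destination. The same difference left as 0x00fd yields 0x02, which is bogus.

```c
#include <stdint.h>

/* Multiply a 16-bit difference by a weight, keep only 16 bits of the
   product, and return its high byte (the equivalent of >>8). */
static uint8_t high_byte_of_product(uint16_t diff16, unsigned weight)
{
    uint16_t product = (uint16_t)(diff16 * weight);
    return (uint8_t)(product >> 8);
}

/* Add the scaled difference to the destination, wrapping at 8 bits. */
static uint8_t apply_to_dest(uint8_t d, uint16_t diff16, unsigned weight)
{
    return (uint8_t)(d + high_byte_of_product(diff16, weight));
}
```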

I looked this up in the ANSI C spec (both sections 6.3.4 and
6.3.6) and found nothing specified as to the return-type
of unsigned integer subtraction. Although gcc and other
compilers may handle this okay, I thought an explicit cast
would be best. (Casts between int types don’t burn any real
cycles anyway.)

I also thought, since I was proposing adding 1 to an
8-bit representation of 255, it would probably be best
to cast that to 16 bits also.

ANYWAY, here’s my proposed patch to fix the problems
with the alpha-blitting. Again, my apologies for the
patch last week. This patch has been tested… I haven’t
gone over an actual running program too closely to verify
the numerics, but I have tested that it doesn’t break
anything… The most visible difference you’ll see in
test/testalpha is that the sprite won’t completely disappear
for as long as it did in the old version. It should also be
more opaque at its darkest point, although this isn’t very
visible.

Thanks again, All comments welcome,

-Loren

=====BEGIN=====
diff -ruN src/SDL-1.1.2/src/video/SDL_blit.h src/SDL-1.1.2-new/src/video/SDL_blit.h
--- src/SDL-1.1.2/src/video/SDL_blit.h Thu Mar 16 07:20:38 2000
+++ src/SDL-1.1.2-new/src/video/SDL_blit.h Wed May 3 17:47:51 2000
@@ -222,14 +222,14 @@
 /* Blend the RGB values of two pixels based on a source alpha value */
 #define FAST_ALPHA_BLEND
 #ifdef FAST_ALPHA_BLEND
-#define ALPHA_BLEND(sR, sG, sB, A, Adiv, dR, dG, dB) \
+#define ALPHA_BLEND(sR, sG, sB, A, Adiv, Abits, dR, dG, dB) \
 { \
-	dR = (((Uint16)(sR-dR)*(Adiv-A))>>8)+dR; \
-	dG = (((Uint16)(sG-dG)*(Adiv-A))>>8)+dG; \
-	dB = (((Uint16)(sB-dB)*(Adiv-A))>>8)+dB; \
+	dR = (((Uint16)(((Sint16)sR)-((Sint16)dR))*(((Uint16)Adiv)+1-A))>>Abits)+dR; \
+	dG = (((Uint16)(((Sint16)sG)-((Sint16)dG))*(((Uint16)Adiv)+1-A))>>Abits)+dG; \
+	dB = (((Uint16)(((Sint16)sB)-((Sint16)dB))*(((Uint16)Adiv)+1-A))>>Abits)+dB; \
 }
 #else
-#define ALPHA_BLEND(sR, sG, sB, A, Adiv, dR, dG, dB) \
+#define ALPHA_BLEND(sR, sG, sB, A, Adiv, Abits, dR, dG, dB) \
 { \
 	dR = ((Uint16)sR*(Adiv-A) + (Uint16)dR*A) / Adiv; \
 	dG = ((Uint16)sG*(Adiv-A) + (Uint16)dG*A) / Adiv; \
diff -ruN src/SDL-1.1.2/src/video/SDL_blit_A.c src/SDL-1.1.2-new/src/video/SDL_blit_A.c
--- src/SDL-1.1.2/src/video/SDL_blit_A.c Thu Mar 16 07:20:38 2000
+++ src/SDL-1.1.2-new/src/video/SDL_blit_A.c Wed May 3 17:18:59 2000
@@ -64,7 +64,7 @@
 				sB = srcpal[bit].b;
 				DISEMBLE_RGB(dst, dstbpp, dstfmt,
 							pixel, dR, dG, dB);
-				ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 			  	ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 			}
 			byte <<= 1;
@@ -96,7 +96,7 @@
 				sB = srcpal[*src].b;
 				DISEMBLE_RGB(dst, dstbpp, dstfmt,
 							pixel, dR, dG, dB);
-				ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 			  	ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 			}
 			src++;
@@ -129,7 +129,7 @@
 				dR = dstfmt->palette->colors[*dst].r;
 				dG = dstfmt->palette->colors[*dst].g;
 				dB = dstfmt->palette->colors[*dst].b;
-				ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 				/* Pack RGB into 8bit pixel */
 				if ( palmap == NULL ) {
 					*dst =((dR>>5)<<(3+2))|
@@ -151,6 +151,7 @@
 		Uint32 pixel;
 		Uint8 sA;
 		const Uint8 Adiv = (srcfmt->Amask>>srcfmt->Ashift);
+		const Uint8  Abits = (8-srcfmt->Aloss);
 
 		while ( height-- ) {
 			DUFFS_LOOP(
@@ -159,7 +160,7 @@
 				dR = dstfmt->palette->colors[*dst].r;
 				dG = dstfmt->palette->colors[*dst].g;
 				dB = dstfmt->palette->colors[*dst].b;
-				ALPHA_BLEND(sR, sG, sB, sA, Adiv, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, sA, Adiv, Abits, dR, dG, dB);
 				/* Pack RGB into 8bit pixel */
 				if ( palmap == NULL ) {
 					*dst =((dR>>5)<<(3+2))|
@@ -206,7 +207,7 @@
 			if ( 1 ) {
 				pixel16 = *dstp;
 				RGB_FROM_PIXEL(pixel16, dstfmt, dR, dG, dB);
-				ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 				PIXEL_FROM_RGB(*dstp, dstfmt, dR, dG, dB);
 			}
 			++srcp;
@@ -218,6 +219,7 @@
 	} else {
 		Uint8 sA;
 		const Uint8 Adiv = (srcfmt->Amask>>srcfmt->Ashift);
+		const Uint8  Abits = (8-srcfmt->Aloss);
 
 		while ( height-- ) {
 			DUFFS_LOOP(
@@ -226,7 +228,7 @@
 			if ( 1 ) {
 				pixel16 = *dstp;
 				RGB_FROM_PIXEL(pixel16, dstfmt, dR, dG, dB);
-				ALPHA_BLEND(sR, sG, sB, sA, Adiv, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, sA, Adiv, Abits, dR, dG, dB);
 				PIXEL_FROM_RGB(*dstp, dstfmt, dR, dG, dB);
 			}
 			++srcp;
@@ -260,7 +262,7 @@
 			if ( 1 ) {
 				DISEMBLE_RGB(dst, dstbpp, dstfmt,
 							pixel, dR, dG, dB);
-				ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 				ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 			}
 			src += srcbpp;
@@ -272,6 +274,7 @@
 	} else {
 		Uint8 sA;
 		const Uint8 Adiv = (srcfmt->Amask>>srcfmt->Ashift);
+		const Uint8  Abits = (8-srcfmt->Aloss);
 
 		while ( height-- ) {
 			DUFFS_LOOP(
@@ -279,7 +282,7 @@
 			if ( 1 ) {
 				DISEMBLE_RGB(dst, dstbpp, dstfmt,
 							pixel, dR, dG, dB);
-				ALPHA_BLEND(sR, sG, sB, sA, Adiv, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, sA, Adiv, Abits, dR, dG, dB);
 				ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 			}
 			src += srcbpp;
diff -ruN src/SDL-1.1.2/src/video/SDL_blit_AK.c src/SDL-1.1.2-new/src/video/SDL_blit_AK.c
--- src/SDL-1.1.2/src/video/SDL_blit_AK.c Thu Mar 16 07:20:38 2000
+++ src/SDL-1.1.2-new/src/video/SDL_blit_AK.c Wed May 3 17:21:54 2000
@@ -64,7 +64,7 @@
 				sB = srcpal[bit].b;
 				DISEMBLE_RGB(dst, dstbpp, dstfmt,
 							pixel, dR, dG, dB);
-				ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 			  	ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 			}
 			byte <<= 1;
@@ -96,7 +96,7 @@
 				sB = srcpal[*src].b;
 				DISEMBLE_RGB(dst, dstbpp, dstfmt,
 							pixel, dR, dG, dB);
-				ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 			  	ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 			}
 			src++;
@@ -129,7 +129,7 @@
 				dR = dstfmt->palette->colors[*dst].r;
 				dG = dstfmt->palette->colors[*dst].g;
 				dB = dstfmt->palette->colors[*dst].b;
-				ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 				/* Pack RGB into 8bit pixel */
 				if ( palmap == NULL ) {
 					*dst =((dR>>5)<<(3+2))|
@@ -151,6 +151,7 @@
 		Uint32 pixel;
 		Uint8 sA;
 		const Uint8 Adiv = (srcfmt->Amask>>srcfmt->Ashift);
+		const Uint8  Abits = (8-srcfmt->Aloss);
 
 		while ( height-- ) {
 			DUFFS_LOOP(
@@ -159,7 +160,7 @@
 				dR = dstfmt->palette->colors[*dst].r;
 				dG = dstfmt->palette->colors[*dst].g;
 				dB = dstfmt->palette->colors[*dst].b;
-				ALPHA_BLEND(sR, sG, sB, sA, Adiv, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, sA, Adiv, Abits, dR, dG, dB);
 				/* Pack RGB into 8bit pixel */
 				if ( palmap == NULL ) {
 					*dst =((dR>>5)<<(3+2))|
@@ -206,7 +207,7 @@
 			if ( pixel32 != srcfmt->colorkey ) {
 				pixel16 = *dstp;
 				RGB_FROM_PIXEL(pixel16, dstfmt, dR, dG, dB);
-				ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 				PIXEL_FROM_RGB(*dstp, dstfmt, dR, dG, dB);
 			}
 			++srcp;
@@ -218,6 +219,7 @@
 	} else {
 		Uint8 sA;
 		const Uint8 Adiv = (srcfmt->Amask>>srcfmt->Ashift);
+		const Uint8  Abits = (8-srcfmt->Aloss);
 
 		while ( height-- ) {
 			DUFFS_LOOP(
@@ -226,7 +228,7 @@
 			if ( pixel32 != srcfmt->colorkey ) {
 				pixel16 = *dstp;
 				RGB_FROM_PIXEL(pixel16, dstfmt, dR, dG, dB);
-				ALPHA_BLEND(sR, sG, sB, sA, Adiv, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, sA, Adiv, Abits, dR, dG, dB);
 				PIXEL_FROM_RGB(*dstp, dstfmt, dR, dG, dB);
 			}
 			++srcp;
@@ -260,7 +262,7 @@
 			if ( pixel != srcfmt->colorkey ) {
 				DISEMBLE_RGB(dst, dstbpp, dstfmt,
 							pixel, dR, dG, dB);
-				ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 				ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 			}
 			src += srcbpp;
@@ -272,6 +274,7 @@
 	} else {
 		Uint8 sA;
 		const Uint8 Adiv = (srcfmt->Amask>>srcfmt->Ashift);
+		const Uint8  Abits = (8-srcfmt->Aloss);
 
 		while ( height-- ) {
 			DUFFS_LOOP(
@@ -279,7 +282,7 @@
 			if ( pixel != srcfmt->colorkey ) {
 				DISEMBLE_RGB(dst, dstbpp, dstfmt,
 							pixel, dR, dG, dB);
-				ALPHA_BLEND(sR, sG, sB, sA, Adiv, dR, dG, dB);
+				ALPHA_BLEND(sR, sG, sB, sA, Adiv, Abits, dR, dG, dB);
 				ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 			}
 			src += srcbpp;
======END======



I looked this up in the ANSI C spec (both sections 6.3.4 and
6.3.6) and found nothing specified as to the return-type
of unsigned integer subtraction. Although gcc and other
compilers may handle this okay, I thought an explicit cast
would be best. (Casts between int types don’t burn any real
cycles anyway.)

They actually may do that (sign extension). And look into integral promotion
rules and unsigned contagion. The rules are slightly hairy, but every C(++)
programmer doing this sort of stuff must know them.

+ dR = (((Uint16)(((Sint16)sR)-((Sint16)dR))*(((Uint16)Adiv)+1-A))>>Abits)+dR; \

If sR, dR, A are of type int, how is this different from

dR = ((sR - dR) * (Adiv + 1 - A) >> Abits) + dR

? Don’t cast to integers of exact sizes without a good reason. The name of
the fastest type is “int”, almost by definition.
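The promotion rules mentioned above can be seen in a two-function sketch (the function names are illustrative only). Under the integer promotions, two Uint8-sized operands are both converted to int before the subtraction, so the result type is int and the value can be negative without any cast. As soon as one operand is an unsigned int, the other is converted to unsigned as well and the difference wraps around instead.

```c
#include <stdint.h>

/* Both operands are promoted to int before the subtraction, so the
   result is a signed int and may be negative. */
static int diff_u8(uint8_t s, uint8_t d)
{
    return s - d;
}

/* "Unsigned contagion": with one unsigned int operand, the other is
   converted to unsigned and the difference wraps modulo UINT_MAX + 1. */
static unsigned diff_mixed(uint8_t s, unsigned d)
{
    return s - d;
}
```

This is why the plain `(sR - dR) * (Adiv + 1 - A)` form behaves as signed arithmetic when every operand is an int or narrower.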

They actually may do that (sign extension). And look into integral promotion
rules and unsigned contagion. The rules are slightly hairy, but every C(++)
programmer doing this sort of stuff must know them.

+ dR = (((Uint16)(((Sint16)sR)-((Sint16)dR))*(((Uint16)Adiv)+1-A))>>Abits)+dR; \

If sR, dR, A are of type int, how is this different from

dR = ((sR - dR) * (Adiv + 1 - A) >> Abits) + dR

? Don’t cast to integers of exact sizes without a good reason. The name of
the fastest type is “int”, almost by definition.

Hmm… I actually hadn’t thought of that… Mostly because
everything came in as a Uint8…

Okay, anyway, I did some more testing after my email last
night… Turns out that my integer rounding trick only
works with positive numbers. Doh!!! If you run the
test/testalpha demo, and click the mouse repeatedly in
the same spot you’ll start to see a dark rectangle around
the circle it’s drawing… I had to add a conditional to
detect the sign of what was being multiplied.
I know that this is a slight performance hit. I did get
the bright idea that if I do the conditionals, and made
everything so that it was positive arithmetic, I would
have 3 8-bit numbers multiplied by one of two 8-bit numbers
(or more correctly one of two numbers in the range of 0-256
… 9-bits really… but even the maximum 8-bit number, 255,
times 256 won’t overflow 16-bits) So I know I have at least
2 numbers multiplied by the same value, each needing 16
bits, so I can do 2 multiplies at once in a 32-bit int,
and on a processor with a 64-bit int if I am multiplying
all three numbers by the same value (1 in 4 chance) I
can actually do all three multiplies with one 64-bit
multiply.

…So…

I added 3 conditionals, but eliminated 1 multiply, and
sometimes, on a 64-bit processor, eliminated 2.
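A minimal sketch of the packing idea for the 32-bit case, using unsigned arithmetic and a hypothetical helper name rather than the macro itself. Two non-negative 8-bit values sit in separate 16-bit lanes; because each value times a weight of at most 256 stays below 65536, nothing carries from the low lane into the high one, and a single multiply scales both.

```c
#include <stdint.h>

/* Scale x and y by m/256 with one 32-bit multiply.
   Assumes m is in 0..256, so x*m <= 255*256 = 65280 fits in 16 bits. */
static void scale_pair(uint8_t x, uint8_t y, unsigned m,
                       uint8_t *out_x, uint8_t *out_y)
{
    uint32_t packed = (uint32_t)x | ((uint32_t)y << 16);
    packed *= m;                               /* both lanes at once   */
    *out_x = (uint8_t)((packed >> 8) & 0xff);  /* low lane, divided by 256  */
    *out_y = (uint8_t)((packed >> 24) & 0xff); /* high lane, divided by 256 */
}
```

The trick only holds while both packed values are non-negative, which is why the differences have to be sorted by sign first.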

UNFORTUNATELY… I’ve now replaced a 6 line macro with a
180+ line macro (Ouch!) I have tested this. It looks fine
and doesn’t have the darkening bug I saw last night, and is
AT LEAST as fast as the original code, although I would like
to see some benchmarks, and I had no means to test the
64-bit code… (Anyone have an alpha handy?)
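The darkening can be reproduced in a few lines, assuming the single-formula version effectively rounds negative products toward negative infinity (as a shift-based version would on most compilers). The names are hypothetical. With A = 255 (fully transparent in this convention) and a source darker than the destination, the floor-rounded formula still subtracts 1 on every blit, whereas splitting on the sign, as the conditional version does, leaves the destination alone.

```c
/* floor(x / 256), written without right-shifting a negative value. */
static int floor_div256(int x)
{
    return (x >= 0) ? x / 256 : -((-x + 255) / 256);
}

/* Single formula with weight (256 - a) and floor rounding. */
static int blend_floor(int s, int d, int a)
{
    return floor_div256((s - d) * (256 - a)) + d;
}

/* Sign-split version: only non-negative products are ever shifted. */
static int blend_by_sign(int s, int d, int a)
{
    return (s < d) ? (((d - s) * (a + 1)) >> 8) + s
                   : (((s - d) * (256 - a)) >> 8) + d;
}

/* Blend the same source onto the destination `times` times in a row. */
static int repeat_blend(int (*blend)(int, int, int),
                        int s, int d, int a, int times)
{
    while (times-- > 0)
        d = blend(s, d, a);
    return d;
}
```

Fifty fully transparent blits of a black source onto a destination of 100 bring the floor-rounded version down to 50, while the sign-split version stays at 100.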

Anyway, this is the last patch I plan on making to the
alpha-blitting code for a while, although I welcome any
questions or comments

Thanks again for all the feedback.

Regards,

-Loren

P.S. This patch is against the original SDL-1.1.2 code
base… not against the patch I sent out yesterday.

======BEGIN======
diff -ruN src/SDL-1.1.2/src/video/SDL_blit.h src/SDL-1.1.2-new/src/video/SDL_blit.h
— src/SDL-1.1.2/src/video/SDL_blit.h Thu Mar 16 07:20:38 2000
+++ src/SDL-1.1.2-new/src/video/SDL_blit.h Thu May 4 11:35:11 2000
@@ -221,15 +221,211 @@

/* Blend the RGB values of two pixels based on a source alpha value */
#define FAST_ALPHA_BLEND
+#define CONDENSE_MULTIPLIES_ALPHA_BLEND
#ifdef FAST_ALPHA_BLEND
-#define ALPHA_BLEND(sR, sG, sB, A, Adiv, dR, dG, dB)
+#ifdef CONDENSE_MULTIPLIES_ALPHA_BLEND
+#define ALPHA_BLEND(sR, sG, sB, A, Adiv, Abits, dR, dG, dB) \ { \

  • dR = (((Uint16)(sR-dR)*(Adiv-A))>>8)+dR; \
  • dG = (((Uint16)(sG-dG)*(Adiv-A))>>8)+dG; \
  • dB = (((Uint16)(sB-dB)*(Adiv-A))>>8)+dB; \
  • /* We have 3 8-bit numbers that we are multiplying by one of 2 \
  •  numbers... at least 2 will be the same. */	\
    
  • int multiplier = 0; \
  • if ( sizeof(int) >= 6 ) { /* Optimize for 64-bit processors */ \
  •   if ( sR<dR ) {	\
    
  •   	if ( sG<dG ) {	\
    
  •   		if ( sB<dB ) {	\
    
  •   			/* Same sign... use one multiply */	\+					multiplier = ((int)(dR-sR)) | (((int)(dG-sG))<<16) |	\
    
  •   				(((int)(dB-sB))<<32) ;	\
    
  •   			multiplier *= (((int)A)+1) ;	\
    
  •   			multiplier >>= Abits ;	\
    
  •   			multiplier &= 0x00ff00ff00ff ;	\
    
  •   			dR = ((multiplier & 0xff) + sR) & 0xff ; \
    
  •   			dG = (((multiplier & 0xff0000)>>16) + sG) & 0xff ; \
    
  •   			dB = (((multiplier & 0xff00000000)>>32) + sB) & 0xff ; \
    
  •   		} else { \
    
  •   			/* Same sign */	\
    
  •   			multiplier = ((int)(dR-sR)) | (((int)(dG-sG))<<16) ;	\
    
  •   			multiplier *= (((int)A)+1) ;	\
    
  •   			multiplier >>= Abits ;	\
    
  •   			multiplier &= 0x00ff00ff ;	\
    
  •   			dR = ((multiplier & 0xff) + sR) & 0xff ; \
    
  •   			dG = (((multiplier & 0xff0000)>>16) + sG) & 0xff ; \
    
  •   			/* Odd man out */	\
    
  •   			dB = 0xff & ((((int)(sB-dB)*(((int)Adiv)+1-A))>>Abits)+dB) ;    \
    
  •   		}	\
    
  •   	} else { \
    
  •   		if ( sB<dB ) {	\
    
  •   			/* Same sign */	\
    
  •   			multiplier = ((int)(dR-sR)) | (((int)(dB-sB))<<16) ;	\
    
  •   			multiplier *= (((int)A)+1) ;	\
    
  •   			multiplier >>= Abits ;	\
    
  •   			multiplier &= 0x00ff00ff ;	\
    
  •   			dR = ((multiplier & 0xff) + sR) & 0xff ; \
    
  •   			dB = (((multiplier & 0xff0000)>>16) + sB) & 0xff ; \
    
  •   			/* Odd man out */	\
    
  •   			dG = 0xff & ((((int)(sG-dG)*(((int)Adiv)+1-A))>>Abits)+dG) ;    \
    
  •   		} else { \
    
  •   			/* Same sign */	\
    
  •   			multiplier = ((int)(sG-dG)) | (((int)(sB-dB))<<16) ;	\
    
  •   			multiplier *= (((int)Adiv)+1-A) ;	\+					multiplier >>= Abits ;	\
    
  •   			multiplier &= 0x00ff00ff ;	\
    
  •   			dG = ((multiplier & 0xff) + dG) & 0xff ; \
    
  •   			dB = (((multiplier & 0xff0000)>>16) + dB) & 0xff ; \
    
  •   			/* Odd man out */	\
    
  •   			dR = 0xff & ((((int)(dR-sR)*(((int)A)+1))>>Abits)+sR) ;    \
    
  •   		}	\
    
  •   	}	\
    
  •   } else { \
    
  •   	if ( sG<dG ) {	\
    
  •   		if ( sB<dB ) {	\
    
  •   			/* Same sign */	\
    
  •   			multiplier = ((int)(dG-sG)) | (((int)(dB-sB))<<16) ;	\
    
  •   			multiplier *= (((int)A)+1) ;	\
    
  •   			multiplier >>= Abits ;	\
    
  •   			multiplier &= 0x00ff00ff ;	\
    
  •   			dG = ((multiplier & 0xff) + sG) & 0xff ; \
    
  •   			dB = (((multiplier & 0xff0000)>>16) + sB) & 0xff ; \
    
  •   			/* Odd man out */	\
    
  •   			dR = 0xff & ((((int)(sR-dR)*(((int)Adiv)+1-A))>>Abits)+dR) ;    \
    
  •   		} else { \
    
  •   			/* Same sign */	\
    
  •   			multiplier = ((int)(sR-dR)) | (((int)(sB-dB))<<16) ;	\
    
  •   			multiplier *= (((int)Adiv)+1-A) ;	\+					multiplier >>= Abits ;	\
    
  •   			multiplier &= 0x00ff00ff ;	\
    
  •   			dR = ((multiplier & 0xff) + dR) & 0xff ; \
    
  •   			dB = (((multiplier & 0xff0000)>>16) + dB) & 0xff ; \
    
  •   			/* Odd man out */	\
    
  •   			dG = 0xff & ((((int)(dG-sG)*(((int)A)+1))>>Abits)+sG) ;    \
    
  •   		}	\
    
  •   	} else { \
    
  •   		if ( sB<dB ) {	\
    
  •   			/* Same sign */	\
    
  •   			multiplier = ((int)(sR-dR)) | (((int)(sG-dG))<<16) ;	\
    
  •   			multiplier *= (((int)Adiv)+1-A) ;	\+					multiplier >>= Abits ;	\
    
  •   			multiplier &= 0x00ff00ff ;	\
    
  •   			dR = ((multiplier & 0xff) + dR) & 0xff ; \
    
  •   			dG = (((multiplier & 0xff0000)>>16) + dG) & 0xff ; \
    
  •   			/* Odd man out */	\
    
  •   			dB = 0xff & ((((int)(dB-sB)*(((int)A)+1))>>Abits)+sB) ;    \
    
  •   		} else { \
    
  •   			/* Same sign... use one multiply */	\+					multiplier = ((int)(sR-dR)) | (((int)(sG-dG))<<16) |	\
    
  •   				(((int)(sB-dB))<<32) ;	\
    
  •   			multiplier *= (((int)Adiv)+1-A) ;	\+					multiplier >>= Abits ;	\
    
  •   			multiplier &= 0x00ff00ff00ff ;	\
    
  •   			dR = ((multiplier & 0xff) + dR) & 0xff ; \
    
  •   			dG = (((multiplier & 0xff0000)>>16) + dG) & 0xff ; \
    
  •   			dB = (((multiplier & 0xff00000000)>>32) + dB) & 0xff ; \
    
  •   		}	\
    
  •   	}	\
    
  •   }	\
    
  • } else if ( sizeof(int) >= 4 ) { \
  •   if ( sR<dR ) {	\
    
  •   	if ( sG<dG ) {	\
    
  •   		/* Same sign */	\
    
  •   		multiplier = ((int)(dR-sR)) | (((int)(dG-sG))<<16) ;	\
    
  •   		multiplier *= (((int)A)+1) ;	\
    
  •   		multiplier >>= Abits ;	\
    
  •   		multiplier &= 0x00ff00ff ;	\
    
  •   		dR = ((multiplier & 0xff) + sR) & 0xff ; \
    
  •   		dG = (((multiplier & 0xff0000)>>16) + sG) & 0xff ; \
    
  •   		/* Odd man out */	\
    
  •   		dB = 0xff & ((sB<dB)?    \
    
  •    			((((int)(dB-sB)*(((int)A)+1))>>Abits)+sB):    \
    
  •    			((((int)(sB-dB)*(((int)Adiv)+1-A))>>Abits)+dB) );    \
    
  •   	} else { \
    
  •   		if ( sB<dB ) {	\
    
  •   			/* Same sign */	\
    
  •   			multiplier = ((int)(dR-sR)) | (((int)(dB-sB))<<16) ;	\
    
  •   			multiplier *= (((int)A)+1) ;	\
    
  •   			multiplier >>= Abits ;	\
    
  •   			multiplier &= 0x00ff00ff ;	\
    
  •   			dR = ((multiplier & 0xff) + sR) & 0xff ; \
    
  •   			dB = (((multiplier & 0xff0000)>>16) + sB) & 0xff ; \
    
  •   			/* Odd man out */	\
    
  •   			dG = 0xff & ((((int)(sG-dG)*(((int)Adiv)+1-A))>>Abits)+dG) ;    \
    
  •   		} else { \
    
  •   			/* Same sign */	\
    
  •   			multiplier = ((int)(sG-dG)) | (((int)(sB-dB))<<16) ;	\
    
  •   			multiplier *= (((int)Adiv)+1-A) ;	\+					multiplier >>= Abits ;	\
    
  •   			multiplier &= 0x00ff00ff ;	\
    
  •   			dG = ((multiplier & 0xff) + dG) & 0xff ; \
    
  •   			dB = (((multiplier & 0xff0000)>>16) + dB) & 0xff ; \
    
  •   			/* Odd man out */	\
    
  •   			dR = 0xff & ((((int)(dR-sR)*(((int)A)+1))>>Abits)+sR) ;    \
    
  •   		}	\
    
  •   	}	\
    
  •   } else { \
    
  •   	if ( sG<dG ) {	\
    
  •   		if ( sB<dB ) {	\
    
  •   			/* Same sign */	\
    
  •   			multiplier = ((int)(dG-sG)) | (((int)(dB-sB))<<16) ;	\
    
  •   			multiplier *= (((int)A)+1) ;	\
    
  •   			multiplier >>= Abits ;	\
    
  •   			multiplier &= 0x00ff00ff ;	\
    
  •   			dG = ((multiplier & 0xff) + sG) & 0xff ; \
    
  •   			dB = (((multiplier & 0xff0000)>>16) + sB) & 0xff ; \
    
  •   			/* Odd man out */	\
    
  •   			dR = 0xff & ((((int)(sR-dR)*(((int)Adiv)+1-A))>>Abits)+dR) ;    \
    
  •   		} else { \
    
  •   			/* Same sign */	\
    
  •   			multiplier = ((int)(sR-dR)) | (((int)(sB-dB))<<16) ;	\
    
  •   			multiplier *= (((int)Adiv)+1-A) ;	\+					multiplier >>= Abits ;	\
    
  •   			multiplier &= 0x00ff00ff ;	\
    
  •   			dR = ((multiplier & 0xff) + dR) & 0xff ; \
    
  •   			dB = (((multiplier & 0xff0000)>>16) + dB) & 0xff ; \
    
  •   			/* Odd man out */	\
    
  •   			dG = 0xff & ((((int)(dG-sG)*(((int)A)+1))>>Abits)+sG) ;    \
    
  •   		}	\
    
  •   	} else { \
    
  •   		/* Same sign */	\
    
  •   		multiplier = ((int)(sR-dR)) | (((int)(sG-dG))<<16) ;	\
    
  •   		multiplier *= (((int)Adiv)+1-A) ;	\
    
  •   		multiplier >>= Abits ;	\
    
  •   		multiplier &= 0x00ff00ff ;	\
    
  •   		dR = ((multiplier & 0xff) + dR) & 0xff ; \
    
  •   		dG = (((multiplier & 0xff0000)>>16) + dG) & 0xff ; \
    
  •   		/* Odd man out */	\
    
  •   		dB = 0xff & ((sB<dB)?    \
    
  •    			((((int)(dB-sB)*(((int)A)+1))>>Abits)+sB):    \
    
  •    			((((int)(sB-dB)*(((int)Adiv)+1-A))>>Abits)+dB) );    \
    
  •   	}	\
    
  •   }	\
    
  • } else { /* need a 32-bit int to do any special multiplication \
  •   	    condensing... otherwise fall-back */	\
    
  •   dR = 0xff & ((sR<dR)?    \
    
  •    	((((int)(dR-sR)*(((int)A)+1))>>Abits)+sR):    \
    
  •    	((((int)(sR-dR)*(((int)Adiv)+1-A))>>Abits)+dR) );    \
    
  •   dG = 0xff & ((sG<dG)?    \
    
  •    	((((int)(dG-sG)*(((int)A)+1))>>Abits)+sG):    \
    
  •    	((((int)(sG-dG)*(((int)Adiv)+1-A))>>Abits)+dG) );    \
    
  •   dB = 0xff & ((sB<dB)?    \
    
  •    	((((int)(dB-sB)*(((int)A)+1))>>Abits)+sB):    \
    
  •    	((((int)(sB-dB)*(((int)Adiv)+1-A))>>Abits)+dB) );    \
    
  • }
    }
    -#else
    -#define ALPHA_BLEND(sR, sG, sB, A, Adiv, dR, dG, dB)
    +#else /* CONDENSE_MULTIPLIES_ALPHA_BLEND */
    +#define ALPHA_BLEND(sR, sG, sB, A, Adiv, Abits, dR, dG, dB) \
    +{ \
  • dR = 0xff & ((sR<dR)? \
  •    ((((int)(dR-sR)*(((int)A)+1))>>Abits)+sR):    \
    
  •    ((((int)(sR-dR)*(((int)Adiv)+1-A))>>Abits)+dR) );    \
    
  • dG = 0xff & ((sG<dG)? \
  •    ((((int)(dG-sG)*(((int)A)+1))>>Abits)+sG):    \
    
  •    ((((int)(sG-dG)*(((int)Adiv)+1-A))>>Abits)+dG) );    \
    
  • dB = 0xff & ((sB<dB)? \
  •    ((((int)(dB-sB)*(((int)A)+1))>>Abits)+sB):    \
    
  •    ((((int)(sB-dB)*(((int)Adiv)+1-A))>>Abits)+dB) );    \
    

+}
+#endif /* CONDENSE_MULTIPLIES_ALPHA_BLEND */
+#else /* FAST_ALPHA_BLEND */
+#define ALPHA_BLEND(sR, sG, sB, A, Adiv, Abits, dR, dG, dB) \
 { \
 dR = ((Uint16)sR*(Adiv-A) + (Uint16)dR*A) / Adiv; \
 dG = ((Uint16)sG*(Adiv-A) + (Uint16)dG*A) / Adiv; \
diff -ruN src/SDL-1.1.2/src/video/SDL_blit_A.c src/SDL-1.1.2-new/src/video/SDL_blit_A.c
--- src/SDL-1.1.2/src/video/SDL_blit_A.c Thu Mar 16 07:20:38 2000
+++ src/SDL-1.1.2-new/src/video/SDL_blit_A.c Wed May 3 23:10:04 2000
@@ -64,7 +64,7 @@
 sB = srcpal[bit].b;
 DISEMBLE_RGB(dst, dstbpp, dstfmt,
 pixel, dR, dG, dB);
-		ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 	  	ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 	}
 	byte <<= 1;
@@ -96,7 +96,7 @@
 sB = srcpal[*src].b;
 DISEMBLE_RGB(dst, dstbpp, dstfmt,
 pixel, dR, dG, dB);
-		ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 	  	ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 	}
 	src++;
@@ -129,7 +129,7 @@
 dR = dstfmt->palette->colors[*dst].r;
 dG = dstfmt->palette->colors[*dst].g;
 dB = dstfmt->palette->colors[*dst].b;
-		ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 		/* Pack RGB into 8bit pixel */
 		if ( palmap == NULL ) {
 			*dst =((dR>>5)<<(3+2))|
@@ -151,6 +151,7 @@
 Uint32 pixel;
 Uint8 sA;
 const Uint8 Adiv = (srcfmt->Amask>>srcfmt->Ashift);
+const Uint8  Abits = (8-srcfmt->Aloss);
 
 while ( height-- ) {
 DUFFS_LOOP(
@@ -159,7 +160,7 @@
 dR = dstfmt->palette->colors[*dst].r;
 dG = dstfmt->palette->colors[*dst].g;
 dB = dstfmt->palette->colors[*dst].b;
-		ALPHA_BLEND(sR, sG, sB, sA, Adiv, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, sA, Adiv, Abits, dR, dG, dB);
 		/* Pack RGB into 8bit pixel */
 		if ( palmap == NULL ) {
 			*dst =((dR>>5)<<(3+2))|
@@ -206,7 +207,7 @@
 if ( 1 ) {
 pixel16 = *dstp;
 RGB_FROM_PIXEL(pixel16, dstfmt, dR, dG, dB);
-		ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 		PIXEL_FROM_RGB(*dstp, dstfmt, dR, dG, dB);
 	}
 	++srcp;
@@ -218,6 +219,7 @@
 } else {
 Uint8 sA;
 const Uint8 Adiv = (srcfmt->Amask>>srcfmt->Ashift);
+const Uint8  Abits = (8-srcfmt->Aloss);
 
 while ( height-- ) {
 DUFFS_LOOP(
@@ -226,7 +228,7 @@
 if ( 1 ) {
 pixel16 = *dstp;
 RGB_FROM_PIXEL(pixel16, dstfmt, dR, dG, dB);
-		ALPHA_BLEND(sR, sG, sB, sA, Adiv, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, sA, Adiv, Abits, dR, dG, dB);
 		PIXEL_FROM_RGB(*dstp, dstfmt, dR, dG, dB);
 	}
 	++srcp;
@@ -260,7 +262,7 @@
 if ( 1 ) {
 DISEMBLE_RGB(dst, dstbpp, dstfmt,
 pixel, dR, dG, dB);
-		ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 		ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 	}
 	src += srcbpp;
@@ -272,6 +274,7 @@
 } else {
 Uint8 sA;
 const Uint8 Adiv = (srcfmt->Amask>>srcfmt->Ashift);
+const Uint8  Abits = (8-srcfmt->Aloss);
 
 while ( height-- ) {
 DUFFS_LOOP(
@@ -279,7 +282,7 @@
 if ( 1 ) {
 DISEMBLE_RGB(dst, dstbpp, dstfmt,
 pixel, dR, dG, dB);
-		ALPHA_BLEND(sR, sG, sB, sA, Adiv, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, sA, Adiv, Abits, dR, dG, dB);
 		ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 	}
 	src += srcbpp;
diff -ruN src/SDL-1.1.2/src/video/SDL_blit_AK.c src/SDL-1.1.2-new/src/video/SDL_blit_AK.c
--- src/SDL-1.1.2/src/video/SDL_blit_AK.c Thu Mar 16 07:20:38 2000
+++ src/SDL-1.1.2-new/src/video/SDL_blit_AK.c Wed May 3 23:10:04 2000
@@ -64,7 +64,7 @@
 sB = srcpal[bit].b;
 DISEMBLE_RGB(dst, dstbpp, dstfmt,
 pixel, dR, dG, dB);
-		ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 	  	ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 	}
 	byte <<= 1;
@@ -96,7 +96,7 @@
 sB = srcpal[*src].b;
 DISEMBLE_RGB(dst, dstbpp, dstfmt,
 pixel, dR, dG, dB);
-		ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 	  	ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 	}
 	src++;
@@ -129,7 +129,7 @@
 dR = dstfmt->palette->colors[*dst].r;
 dG = dstfmt->palette->colors[*dst].g;
 dB = dstfmt->palette->colors[*dst].b;
-		ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 		/* Pack RGB into 8bit pixel */
 		if ( palmap == NULL ) {
 			*dst =((dR>>5)<<(3+2))|
@@ -151,6 +151,7 @@
 Uint32 pixel;
 Uint8 sA;
 const Uint8 Adiv = (srcfmt->Amask>>srcfmt->Ashift);
+const Uint8  Abits = (8-srcfmt->Aloss);
 
 while ( height-- ) {
 DUFFS_LOOP(
@@ -159,7 +160,7 @@
 dR = dstfmt->palette->colors[*dst].r;
 dG = dstfmt->palette->colors[*dst].g;
 dB = dstfmt->palette->colors[*dst].b;
-		ALPHA_BLEND(sR, sG, sB, sA, Adiv, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, sA, Adiv, Abits, dR, dG, dB);
 		/* Pack RGB into 8bit pixel */
 		if ( palmap == NULL ) {
 			*dst =((dR>>5)<<(3+2))|
@@ -206,7 +207,7 @@
 if ( pixel32 != srcfmt->colorkey ) {
 pixel16 = *dstp;
 RGB_FROM_PIXEL(pixel16, dstfmt, dR, dG, dB);
-		ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 		PIXEL_FROM_RGB(*dstp, dstfmt, dR, dG, dB);
 	}
 	++srcp;
@@ -218,6 +219,7 @@
 } else {
 Uint8 sA;
 const Uint8 Adiv = (srcfmt->Amask>>srcfmt->Ashift);
+const Uint8  Abits = (8-srcfmt->Aloss);
 
 while ( height-- ) {
 DUFFS_LOOP(
@@ -226,7 +228,7 @@
 if ( pixel32 != srcfmt->colorkey ) {
 pixel16 = *dstp;
 RGB_FROM_PIXEL(pixel16, dstfmt, dR, dG, dB);
-		ALPHA_BLEND(sR, sG, sB, sA, Adiv, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, sA, Adiv, Abits, dR, dG, dB);
 		PIXEL_FROM_RGB(*dstp, dstfmt, dR, dG, dB);
 	}
 	++srcp;
@@ -260,7 +262,7 @@
 if ( pixel != srcfmt->colorkey ) {
 DISEMBLE_RGB(dst, dstbpp, dstfmt,
 pixel, dR, dG, dB);
-		ALPHA_BLEND(sR, sG, sB, A, 255, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, A, 255, 8, dR, dG, dB);
 		ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 	}
 	src += srcbpp;
@@ -272,6 +274,7 @@
 } else {
 Uint8 sA;
 const Uint8 Adiv = (srcfmt->Amask>>srcfmt->Ashift);
+const Uint8  Abits = (8-srcfmt->Aloss);
 
 while ( height-- ) {
 DUFFS_LOOP(
@@ -279,7 +282,7 @@
 if ( pixel != srcfmt->colorkey ) {
 DISEMBLE_RGB(dst, dstbpp, dstfmt,
 pixel, dR, dG, dB);
-		ALPHA_BLEND(sR, sG, sB, sA, Adiv, dR, dG, dB);
+		ALPHA_BLEND(sR, sG, sB, sA, Adiv, Abits, dR, dG, dB);
 		ASSEMBLE_RGB(dst, dstbpp, dstfmt, dR, dG, dB);
 	}
 	src += srcbpp;

=======END=======



Anyway, this is the last patch I plan on making to the
alpha-blitting code for a while, although I welcome any
questions or comments.

It looks like there were a few problems in the cut-n-paste
from my term window to the email web-interface. (Look
for any line with a backslash in the middle of it.)
Below, please find the exact same patch as an email
attachment (hopefully without the line-break errors
from before.)

Sorry,

-Loren

-------------- next part --------------
A non-text attachment was scrubbed…
Name: alpha-blit.patch
Type: application/octet-stream
Size: 50 bytes
Desc: not available
URL: http://lists.libsdl.org/pipermail/sdl-libsdl.org/attachments/20000504/543f7c87/attachment.obj

UNFORTUNATELY… I’ve now replaced a 6 line macro with a
180+ line macro (Ouch!)

Argh. This cannot be the answer, too many branches that stall your
pipeline (not being regular enough to be predicted).

+ if ( sizeof(int) >= 6 ) { /* Optimize for 64-bit processors */ \

You just optimized for a Cray. Alpha, MIPS, IA-64, etc. have 32-bit ints and
64-bit longs (under Unix; longs are 32-bit under Windows).

UNFORTUNATELY… I’ve now replaced a 6 line macro with a
180+ line macro (Ouch!)

Argh. This cannot be the answer, too many branches that stall your
pipeline (not being regular enough to be predicted).

Hmm… Well, if you look carefully, I’ve simply done three
conditional branches, followed by what was there before.
The sizeof(int) test should optimize out at compile-time.
I could probably combine some of these (namely the 64-bit
and 32-bit versions have a good deal of overlap), but
a lot of this (as I said above) should be optimized
out at compile-time. My basic thinking in breaking
out all 8 cases was to have to perform each comparison only
once… therefore saving cycles.

I’m open to suggestions if you can think of any… but this
is what came to mind.
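
For anyone following along, here is a minimal sketch of the
per-channel fallback case from the patch, written as a plain C
function instead of a macro. The name blend_channel and its
parameter names are hypothetical (they are not in SDL); it assumes
the convention the patch uses, where alpha 0 means an opaque source
and alpha == adiv means a fully transparent one.

```c
#include <assert.h>

/* Hypothetical helper (not part of SDL): blend one colour channel the
 * way the fallback branch of the patched ALPHA_BLEND does.
 *   s, d  : source and destination channel values, 0..255
 *   a     : alpha, 0 = opaque source, adiv = fully transparent source
 *   adiv  : largest alpha value (255 for 8-bit alpha)
 *   abits : number of alpha bits (8 for 8-bit alpha)
 */
static int blend_channel(int s, int d, int a, int adiv, int abits)
{
    int out;

    assert(a >= 0 && a <= adiv);

    /* Always multiply a non-negative difference, so no signed/unsigned
     * cast trouble can arise from (s - d) being negative. */
    if (s < d) {
        out = (((d - s) * (a + 1)) >> abits) + s;
    } else {
        out = (((s - d) * (adiv + 1 - a)) >> abits) + d;
    }
    return out & 0xff;
}
```

With 8-bit alpha, blend_channel(200, 50, 0, 255, 8) gives back the
source value and blend_channel(200, 50, 255, 255, 8) gives back the
destination value, whichever of the two is larger.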

  • if ( sizeof(int) >= 6 ) { /* Optimize for 64-bit processors */ \

You just optimized for a Cray. Alpha, Mips, IA-64 etc have 32-bit ints,
64-bit longs (under Unix, 32-bit longs under Windows).

Keep in mind this is only buying me 1 extra multiply 25% of
the time. This really isn’t that much of a time savings if
a 64-bit multiply isn’t as fast as a 32-bit multiply…
Like you said: The fastest type, almost by definition, is
called “int”.
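
To make the "condensed multiply" idea concrete, here is a rough
sketch of scaling two channel differences with a single 32-bit
multiply, which is what the "same sign" branches of the patch do.
The function scale_pair is hypothetical, it fixes the shift at 8
bits, and it uses uint32_t where the patch uses int, because the
upper product can exceed INT_MAX when the factor is 256. Treat it
as an illustration under those assumptions, not as the patch's
exact code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of SDL): compute (x*factor)>>8 and
 * (y*factor)>>8 with one multiply. Requires x, y in 0..255 and factor
 * in 1..256, so each product fits in 16 bits and the low product
 * cannot carry into the high half. */
static void scale_pair(uint32_t x, uint32_t y, uint32_t factor,
                       uint32_t *out_x, uint32_t *out_y)
{
    uint32_t packed;

    assert(x <= 255 && y <= 255 && factor >= 1 && factor <= 256);

    packed = x | (y << 16);   /* x in bits 0..15, y in bits 16..31 */
    packed *= factor;         /* one multiply scales both halves   */
    packed >>= 8;             /* drop the 8 fractional bits        */
    packed &= 0x00ff00ffu;    /* discard bits that leaked across   */

    *out_x = packed & 0xff;
    *out_y = (packed >> 16) & 0xff;
}
```

A third value would need 16 more bits, which is why packing all
three channels only becomes possible with a 64-bit integer type.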

Thanks for your comments.

Regards,

-Loren

