Alpha blending bug - possible fix?

Daniel_F_Moisset · January 12, 2006, 9:29pm

Recently I posted about a problem with alpha blending that turned out to
be a problem with the generic blit function when there is no
acceleration, BlitNtoNPixelAlpha. Alex Volkov pointed me to the
following comment:

/* FIXME: for 8bpp source alpha, this doesn't get opaque values
   quite right. for <8bpp source alpha, it gets them very wrong
   (check all macros!)
   It is unclear whether there is a good general solution that doesn't
   need a branch (or a divide). */

The problem is a precision bug in the ALPHA_BLEND macro:

#define ALPHA_BLEND(sR, sG, sB, A, dR, dG, dB) \
do { \
    dR = (((sR-dR)*(A))>>8)+dR; \
    dG = (((sG-dG)*(A))>>8)+dG; \
    dB = (((sB-dB)*(A))>>8)+dB; \
} while(0)
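
To make the precision loss concrete, here is a minimal sketch of the same arithmetic as a plain function for a single channel. The name blend_shift and the use of int channels are assumptions for illustration, not SDL code. It also assumes that >> on a negative int is an arithmetic shift, which is implementation-defined in C but holds on common compilers.

```c
#include <assert.h>

/* Hypothetical single-channel version of the ALPHA_BLEND arithmetic.
 * s = source, d = destination, a = alpha, all expected in 0..255.
 * Assumes >> on a negative int is an arithmetic shift. */
static int blend_shift(int s, int d, int a)
{
    assert(s >= 0 && s <= 255);
    assert(d >= 0 && d <= 255);
    assert(a >= 0 && a <= 255);
    /* The shift divides by 256, while full alpha is 255. */
    return (((s - d) * a) >> 8) + d;
}
```

With a fully opaque source over black, blend_shift(255, 0, 255) evaluates to (65025 >> 8) + 0 = 254 rather than 255, because the shift divides by 256 instead of 255.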

You can make a slight correction by changing this to:

#define ALPHA_BLEND(sR, sG, sB, A, dR, dG, dB) \
do { \
    int premultR = (sR-dR)*(A); \
    int premultG = (sG-dG)*(A); \
    int premultB = (sB-dB)*(A); \
    dR += ((premultR>>8)+((A)>>7)+(premultR>>16)); \
    dG += ((premultG>>8)+((A)>>7)+(premultG>>16)); \
    dB += ((premultB>>8)+((A)>>7)+(premultB>>16)); \
} while(0)

That incurs some extra shifts and adds, but no division or branch.
The correction is not better on average (it is slightly worse,
although the maximum error of the function is the same), but it gives
equal or much better results at the usual alpha values (0, 128, 255).
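
As a rough check of the claim about the usual alpha values, the corrected arithmetic can be written as a single-channel function. This is only a sketch: the name blend_fixed and the int channel type are assumptions, and it relies on >> of a negative int being an arithmetic shift (implementation-defined in C, true on common compilers).

```c
#include <assert.h>

/* Hypothetical single-channel version of the corrected macro.
 * s = source, d = destination, a = alpha, all expected in 0..255.
 * Assumes >> on a negative int is an arithmetic shift. */
static int blend_fixed(int s, int d, int a)
{
    int premult;

    assert(s >= 0 && s <= 255);
    assert(d >= 0 && d <= 255);
    assert(a >= 0 && a <= 255);

    premult = (s - d) * a;
    /* Shift by 8, then add two small correction terms. */
    return d + (premult >> 8) + (a >> 7) + (premult >> 16);
}
```

For white over black this gives 0, 128 and 255 at alpha 0, 128 and 255. One thing that may need fixing before merging: the (a >> 7) term adds 1 even when source and destination are equal, so blend_fixed(255, 255, 255) yields 256, which would have to be clamped or otherwise handled.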

Could this change be introduced? Thanks a lot,

D.

PS: I tested the formula separately and it works, but I have not tried
merging the above into SDL. Perhaps something needs to be fixed first.

Cheers,
D.
--
Except - Free Software developers for hire - http://except.com.ar

slouken · January 19, 2006, 2:09pm

Did you profile your code as opposed to simply dividing by 255?

-Sam Lantinga, Senior Software Engineer, Blizzard Entertainment
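
For reference, the plain division the question refers to might look like the sketch below for a single channel. The name blend_div255 is hypothetical and this is not taken from SDL; it uses C's truncating integer division rather than rounding to nearest.

```c
#include <assert.h>

/* Hypothetical single-channel blend that divides by 255 directly.
 * s = source, d = destination, a = alpha, all expected in 0..255.
 * Integer division in C truncates toward zero. */
static int blend_div255(int s, int d, int a)
{
    assert(s >= 0 && s <= 255);
    assert(d >= 0 && d <= 255);
    assert(a >= 0 && a <= 255);
    return d + ((s - d) * a) / 255;
}
```

With this version alpha 255 always returns the source value and alpha 0 always returns the destination value, at the cost of one division per channel unless the compiler replaces it with something cheaper.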

Stephane_Marchesin · January 21, 2006, 12:56am

Is changing (even slightly) the alpha behaviour for one among many alpha
blitting functions a good idea?

Stephane

Mattias_Karlsson · January 21, 2006, 12:58pm

On Thu, 19 Jan 2006, Sam Lantinga wrote:

Did you profile your code as opposed to simply dividing by 255?

I have done some quick-and-dirty testing on both UltraSparc3 and
Xeon by blending two arrays. Some preliminary results:

* gcc 3.4 replaces /255 with a multiply+shift on both processors.
* The suggested replacement is on average faster than division, but
  slower than the current shifts.
* On UltraSparc3 there is hardly any difference in speed between the
  suggested replacement and using division, unless the arrays grow really,
  really large.
* The difference in time between cache-hit and cache-miss is larger than
  the difference between shift and division; division + cache-hit is
  faster than shift + cache-miss.

Note that this is not tested using SDL blitter code, but a separate
implementation using the three different blending algorithms.

More tests are in progress...
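
To illustrate what replacing /255 with a multiply+shift can look like, here is a minimal sketch. The exact instruction sequence gcc 3.4 emitted is not shown in this thread; the constant 0x8081 and the shift by 23 are one well-known form that is exact for inputs up to 65535, and the function name div255_mulshift is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes x / 255 without a divide instruction.
 * Exact for 0 <= x <= 65535, which covers any 8-bit * 8-bit product.
 * The 32-bit product cannot overflow for inputs in that range. */
static uint32_t div255_mulshift(uint32_t x)
{
    assert(x <= 65535u);
    return (x * 0x8081u) >> 23;
}
```

For example, div255_mulshift(65025) returns 255 and div255_mulshift(254) returns 0, the same as the truncating division it stands in for.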

icculus · January 22, 2006, 8:27am

I moved this to Bugzilla so it doesn’t get forgotten.
https://bugzilla.libsdl.org/show_bug.cgi?id=63

--ryan.