Combining per-pixel and per-surface alpha

I was going to write a simple fireworks demo, using small RGBA sparks
with decreasing per-surface alpha, but “per-pixel and per-surface alpha
cannot be combined; the per-pixel alpha is always used if available”
(from http://sdldoc.csn.ul.ie/sdlsetalpha.php).

So I think I could write some code for SDL to support combination of
per-pixel with per-surface alpha. For per-surface alpha 0, there’d be no
blitting, for 255 normal alpha blitting would be used, and for
everything between 1 and 254 every pixel’s alpha value would be
multiplied by per-surface alpha and shifted 8 bits right (yes, it should
be divided by 255, not 256 but it’d be much faster that way), except
maybe 128 and other powers of two where single right shift could be
used.
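
To put a number on the shift shortcut, here is a minimal C sketch that compares the exact divide-by-255 with the 8-bit right shift. The function names are made up for illustration and plain `uint8_t` stands in for SDL's `Uint8`; none of this is SDL API.

```c
#include <stdint.h>

/* Exact (truncating) combination of per-pixel and per-surface alpha. */
uint8_t combine_alpha_exact(uint8_t pixel_alpha, uint8_t surface_alpha)
{
    return (uint8_t)((pixel_alpha * surface_alpha) / 255);
}

/* Fast approximation: divide by 256 with a shift instead of by 255. */
uint8_t combine_alpha_shift(uint8_t pixel_alpha, uint8_t surface_alpha)
{
    return (uint8_t)((pixel_alpha * surface_alpha) >> 8);
}

/* Largest difference between the two over all 256 x 256 input pairs. */
int max_shift_error(void)
{
    int worst = 0;
    for (int p = 0; p < 256; p++) {
        for (int s = 0; s < 256; s++) {
            int diff = combine_alpha_exact((uint8_t)p, (uint8_t)s)
                     - combine_alpha_shift((uint8_t)p, (uint8_t)s);
            if (diff < 0)
                diff = -diff;
            if (diff > worst)
                worst = diff;
        }
    }
    return worst;
}
```

The shift is never off by more than one, but 255 times 255 comes out as 254 rather than 255, which is one reason to special-case a per-surface alpha of 255 as described above.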

I’m also thinking about other “modes” of blitting surfaces, similar to
the non-normal layer modes in Gimp, of which the addition mode would be
best for fireworks.

What do you think? I know it’d be slower and hardware acceleration
couldn’t be used (especially for such addition mode etc.), but it still
could be useful. Maybe there’s already someone working on it?

Merry X-mas! Happy New Year!

I was going to write a simple fireworks demo, using small RGBA sparks
with decreasing per-surface alpha, but “per-pixel and per-surface alpha
cannot be combined; the per-pixel alpha is always used if available”
(from http://sdldoc.csn.ul.ie/sdlsetalpha.php).

So I think I could write some code for SDL to support combination of
per-pixel with per-surface alpha. For per-surface alpha 0, there’d be no
blitting, for 255 normal alpha blitting would be used, and for
everything between 1 and 254 every pixel’s alpha value would be
multiplied by per-surface alpha and shifted 8 bits right (yes, it should
be divided by 255, not 256 but it’d be much faster that way), except
maybe 128 and other powers of two where single right shift could be
used.

I just had to do this a little while ago. I found by far the fastest
way to do this was to pre-generate the entire list of results ahead of time
in a two-dimensional array:

Uint8 newAlpha[256][256];

Just step through the surface reassigning everything instead of doing
any math. It uses up a bit of memory, but the improvement in frame rate
was not trivial.

On Wed, 2001-12-26 at 12:00, RaFaL Pocztarski wrote:

I’m also thinking about other “modes” of blitting surfaces, similar to
the non-normal layer modes in Gimp, of which the addition mode would be
best for fireworks.

What do you think? I know it’d be slower and hardware acceleration
couldn’t be used (especially for such addition mode etc.), but it still
could be useful. Maybe there’s already someone working on it?

Merry X-mas! Happy New Year!

  • RaFaL Pocztarski, admin at rfl.pl

SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl

End of Rant.

Jimmy
JimmysWorld.org

Jimmy wrote:

I just had to do this a little while ago. I found by far the fastest
way to do this was to pre-generate the entire list of results ahead of time
in a two-dimensional array:

Uint8 newAlpha[256][256];

Just step through the surface reassigning everything instead of doing
any math. It uses up a bit of memory, but the improvement in frame rate
was not trivial.

Good idea. 64kB is not a lot of memory today, especially for
applications using megabytes of graphics anyway. I almost forgot that I
used to optimize most calculations that way. I remember when, a few years
ago, I eliminated square roots by filling a whole segment with the results
of sqrt from 0 to 65535… I wasn’t even using sqrt to fill that segment!
Those were the good ol’ days! 500 lines of assembly for a simple animation…
:)
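
The lookup-table approach discussed above could look roughly like this in C. This is only a sketch under assumptions: the helper names are invented, `uint8_t` stands in for SDL's `Uint8`, and the pixels are assumed to be 32-bit with alpha in the top 8 bits, which a real surface may not use.

```c
#include <stddef.h>
#include <stdint.h>

/* new_alpha[surface_alpha][pixel_alpha] = combined alpha (64 kB total). */
uint8_t new_alpha[256][256];

/* Fill the table once at startup so the blit loop needs no arithmetic. */
void init_alpha_table(void)
{
    for (int s = 0; s < 256; s++)
        for (int p = 0; p < 256; p++)
            new_alpha[s][p] = (uint8_t)((s * p) / 255);
}

/* Rewrite the alpha of every pixel in a buffer.
 * Assumed layout: 0xAARRGGBB, alpha in bits 24..31. */
void scale_alpha_rgba(uint32_t *pixels, size_t count, uint8_t surface_alpha)
{
    /* For one surface alpha only this 256-byte row is ever read. */
    const uint8_t *row = new_alpha[surface_alpha];

    for (size_t i = 0; i < count; i++) {
        uint32_t p = pixels[i];
        uint8_t a = row[p >> 24];
        pixels[i] = (p & 0x00FFFFFFu) | ((uint32_t)a << 24);
    }
}
```

Since a single blit uses one fixed per-surface alpha, the inner loop touches only one 256-byte row of the table at a time rather than the whole 64 kB.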

Anyway, I think it’d be better to write an additional library for such
fancy blits, instead of messing with SDL. I’ll write code for that
anyway, so why not make it a library. I can name it SDL_slowblit or
something…

RaFaL Pocztarski wrote:

Good idea. 64kB is not a lot of memory today

64k can be a lot of L1 dcache

I can name it SDL_slowblit or
something…

please avoid using the “SDL” prefix for your own stuff

Mattias Engdegård wrote:

RaFaL Pocztarski wrote:

Good idea. 64kB is not a lot of memory today

64k can be a lot of L1 dcache

If it’s going to be used with 10MB of graphics, it’s < 1%.

I can name it SDL_slowblit or
something…

please avoid using the “SDL” prefix for your own stuff

It’s a joke. I’m not going to use the “SDL” prefix and I’m not going to use
the “slow” infix. Don’t worry.

RaFaL Pocztarski wrote:

64k can be a lot of L1 dcache

If it’s going to be used with 10MB of graphics, it’s < 1%.

which isn’t the point — locality of reference and working set size is

I was going to write a simple fireworks demo, using small RGBA sparks
with decreasing per-surface alpha, but “per-pixel and per-surface alpha
cannot be combined; the per-pixel alpha is always used if available”
(from http://sdldoc.csn.ul.ie/sdlsetalpha.php).

So I think I could write some code for SDL to support combination of
per-pixel with per-surface alpha. For per-surface alpha 0, there’d be
no blitting, for 255 normal alpha blitting would be used, and for
everything between 1 and 254 every pixel’s alpha value would be
multiplied by per-surface alpha and shifted 8 bits right (yes, it
should be divided by 255, not 256 but it’d be much faster that way),
except maybe 128 and other powers of two where single right shift could
be used.

Well, it would be a nice feature to have, but 3 or 4 multiplications and
the same number of shifts extra per pixel (plus the extra shuffling
around between registers for crap CPUs like x86) will result in quite a
performance hit - and alpha blending is slow enough as it is.

However, if it got in, I could remove the “RGBA XOR surface alpha” logic
in glSDL without breaking SDL compatibility. ;)

(Actually, I can probably remove it anyway, as any code that would be
affected is broken.)

I’m also thinking about other “modes” of blitting surfaces, similar to
the non-normal layer modes in Gimp, of which the addition mode would be
best for fireworks.

Additive blending is indeed useful, but unfortunately the required
saturation checks could be even slower than multiplications! (On Pentium
MMX and later, with 3 cycle MUL, that is.)

An MMX implementation could easily be very, very fast, though, as MMX
has both saturating operations and test instructions that generate masks
rather than change the instruction flow.
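
For reference, the saturation step in plain C might look like the sketch below, once with a compare and once with a mask in the spirit of the MMX approach. The names and the packed 0x00RRGGBB layout are assumptions made for illustration; this is not MMX code and not part of SDL.

```c
#include <stdint.h>

/* Saturating add of one 8-bit channel using a compare. */
uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Same result without a conditional: build a mask from the carry bit. */
uint8_t add_sat_u8_mask(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;       /* 0..510 */
    unsigned mask = 0u - (sum >> 8);      /* all ones on overflow, else 0 */
    return (uint8_t)((sum | mask) & 0xFFu);
}

/* Additive blend of two pixels, assumed packed as 0x00RRGGBB. */
uint32_t add_blend_rgb888(uint32_t dst, uint32_t src)
{
    uint32_t r = add_sat_u8_mask((uint8_t)(dst >> 16), (uint8_t)(src >> 16));
    uint32_t g = add_sat_u8_mask((uint8_t)(dst >> 8), (uint8_t)(src >> 8));
    uint32_t b = add_sat_u8_mask((uint8_t)dst, (uint8_t)src);
    return (r << 16) | (g << 8) | b;
}
```

Whether the mask version actually beats the compare depends on the compiler and CPU, so it would need measuring before relying on it.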

What do you think? I know it’d be slower and hardware acceleration
couldn’t be used (especially for such addition mode etc.),

Currently, h/w acceleration cannot be used even for plain alpha, so
that’s not really an argument.

Anyway, h/w acceleration can be used for various blending operations
other than alpha. All you need is a driver and an API that support it.
Look at the OpenGL blending control - most 3D cards accelerate most of
the modes you can set up with it. (At least I know for sure that both
alpha and additive work on the G400. :)

but it still could be useful. Maybe there’s already someone working on
it?

Well… I can just remove two or three lines on glSDL, and voila! ;)

//David Olofson — Programmer, Reologica Instruments AB

.- M A I A -------------------------------------------------.
| Multimedia Application Integration Architecture |
| A Free/Open Source Plugin API for Professional Multimedia |
`----------------------------> http://www.linuxdj.com/maia -'
.- David Olofson -------------------------------------------.
| Audio Hacker - Open Source Advocate - Singer - Songwriter |
`-------------------------------------> http://olofson.net -'

On Wednesday 26 December 2001 18:00, RaFaL Pocztarski wrote:

It’s just that many CPUs don’t even have 64k of L1 cache, and only 128 or
256 kB of L2 cache, so that doesn’t matter.

While “streaming” tons of data through the CPU, there’s a great risk that
the data you’re processing results in only a small part of the table
being used at a time. The rest will get flushed, and you end up getting
cache misses when you eventually need those parts.

//David Olofson — Programmer, Reologica Instruments AB

On Thursday 27 December 2001 19:09, RaFaL Pocztarski wrote:

Mattias Engdegård wrote:

RaFaL Pocztarski wrote:

Good idea. 64kB is not a lot of memory today

64k can be a lot of L1 dcache

If it’s going to be used with 10MB of graphics, it’s < 1%.

“David Olofson” <david.olofson at reologica.se> wrote in message
news:mailman.1010436783.18891.sdl at libsdl.org
| Well, it would be a nice feature to have, but 3 or 4 multiplications and
| the same number of shifts extra per pixel (plus the extra shuffling
| around between registers for crap CPUs like x86) will result in quite a
| performance hit - and alpha blending is slow enough as it is.

Actually only one multiplication and one divide is required (where the
divide can be replaced by a shift with some loss of quality):

combined_alpha = (surface_alpha * pixel_alpha) / 255;

--
Rainer Deyke (root at rainerdeyke.com)
Shareware computer games - http://rainerdeyke.com
"In ihren Reihen zu stehen heisst unter Feinden zu kaempfen" - Abigor

DOH! Of course - thanks for being awake there. :)

//David Olofson — Programmer, Reologica Instruments AB

On Tuesday 08 January 2002 04:59, Rainer Deyke wrote:

“David Olofson” <david.olofson at reologica.se> wrote in message
news:mailman.1010436783.18891.sdl at libsdl.org

| Well, it would be a nice feature to have, but 3 or 4 multiplications and
| the same number of shifts extra per pixel (plus the extra shuffling
| around between registers for crap CPUs like x86) will result in quite a
| performance hit - and alpha blending is slow enough as it is.

Actually only one multiplication and one divide is required (where the
divide can be replaced by a shift with some loss of quality):

combined_alpha = (surface_alpha * pixel_alpha) / 255;
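
As a sketch of how that one-multiply formula could feed an ordinary blend without a divide instruction and without the quality loss of a plain shift: the add-128-and-fold step below is a well-known integer trick that yields the rounded (not truncated) quotient for products of two 8-bit values. It is swapped in here as an alternative, not something proposed in the thread, and the function names are made up.

```c
#include <stdint.h>

/* round(a * b / 255) for 8-bit a and b, using only adds and shifts. */
uint8_t mul_div255(uint8_t a, uint8_t b)
{
    unsigned t = (unsigned)a * b + 128u;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Count inputs where the trick differs from rounded integer division. */
int count_rounding_mismatches(void)
{
    int mismatches = 0;
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            int expected = (a * b + 127) / 255;   /* rounded quotient */
            if (mul_div255((uint8_t)a, (uint8_t)b) != expected)
                mismatches++;
        }
    }
    return mismatches;
}

/* Blend one channel: combine the two alphas once, then blend as usual. */
uint8_t blend_channel(uint8_t src, uint8_t dst,
                      uint8_t pixel_alpha, uint8_t surface_alpha)
{
    uint8_t a = mul_div255(pixel_alpha, surface_alpha);
    return (uint8_t)(mul_div255(src, a) + mul_div255(dst, (uint8_t)(255 - a)));
}
```

The combined alpha is computed once per pixel and reused for every channel, so the extra cost over a normal alpha blit is the single multiplication mentioned above.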