Fastest way to add two 32bit pixels together

Piotr_Dubla · November 13, 2000, 1:24pm

Hi, I have been working with single pixels and was wondering what the fastest
way is to display a 32-bit pixel on a surface and, more importantly, how to get
a pixel from a surface. Also, adding two 32-bit pixels and being able to
clamp the RGB values.

Piotr Dubla

Mattias_Engdegard · November 13, 2000, 4:07pm

Also, adding two 32-bit pixels and being able to
clamp the RGB values.

Adding two pixels with clamping is an interesting exercise. It can be done
quite easily with the “multimedia” instructions that have appeared as an
extension (or intrinsic part) of most modern architectures, but that
requires assembler (or a very clever compiler). The challenge is in writing
it in a portable high-level language, without making it too slow.

I had some fun exploring this, and found some interesting results.
The trivial way is something like:

    r = (src & 0xff0000) + (dst & 0xff0000);
    if (r > 0xff0000)
        r = 0xff0000;
    g = (src & 0x00ff00) + (dst & 0x00ff00);
    if (g > 0x00ff00)
        g = 0x00ff00;
    b = (src & 0x0000ff) + (dst & 0x0000ff);
    if (b > 0x0000ff)
        b = 0x0000ff;
    dst = r | g | b;

but there are ways to write it without branches which are usually
quite a bit faster. If you like, I can send you some code, but I don’t
want to contaminate your thinking (I’m hoping that someone will do
better than I have).
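
For readers who want a starting point, here is a minimal sketch of one commonly used branch-free formulation. It is not necessarily the code Mattias refers to. The function name add_pixels_saturated is made up for illustration, and the sketch assumes four 8-bit channels packed into a 32-bit word, so a fourth (alpha or padding) byte is clamped just like R, G and B.

```c
#include <stdint.h>

/* Per-byte saturating add of two packed 32-bit pixels, no branches.
 * Assumption: four independent 8-bit channels per pixel. */
uint32_t add_pixels_saturated(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every byte. The largest per-byte result is
     * 0x7f + 0x7f = 0xfe, so nothing spills into the neighbouring byte.
     * Bit 7 of each byte now holds the carry into the top bit. */
    uint32_t low = (a & 0x7f7f7f7fu) + (b & 0x7f7f7f7fu);

    /* Fold in the top bits with XOR to get the wrapped per-byte sum. */
    uint32_t sum = low ^ ((a ^ b) & 0x80808080u);

    /* Carry out of each byte: majority of (a7, b7, carry into bit 7). */
    uint32_t carry = ((a & b) | ((a | b) & low)) & 0x80808080u;

    /* Turn each carry bit into a full 0xff byte mask and saturate. */
    uint32_t mask = (carry >> 7) * 0xffu;
    return sum | mask;
}
```

Called as dst = add_pixels_saturated(src, dst), this handles all channels in one pass. Whether it actually beats the conditional version depends on the compiler and CPU, so it is worth measuring.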

David_Olofson · November 13, 2000, 3:35pm

Mon, 13 Nov 2000 Piotr Dubla wrote:

Hi, I have been working with single pixels and was wondering what the fastest
way is to display a 32-bit pixel on a surface and, more importantly, how to get
a pixel from a surface.

Make sure you write correctly aligned words of the bus size, and DON’T
read from VRAM at all, even on AGP cards. DMA would be the fastest way, but
AFAIK, no driver arch on Linux supports system<->VRAM DMA. (Except for some
3D drivers supporting cards that can blit textures from the part of the sysram
that’s in the AGP aperture, I think.)
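
A rough sketch of what that advice can look like in practice: do all per-pixel reads and writes in a system-memory copy, then push whole rows to the real target with writes only. The struct surface32 type and the function names below are hypothetical, invented for this example, and are not SDL types or calls. The sketch assumes 32-bit pixels, 4-byte-aligned rows, and a pitch that is a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical surface description (NOT an SDL type). */
struct surface32 {
    uint32_t *pixels; /* first pixel of the first row, 4-byte aligned */
    int width;        /* pixels per row */
    int height;       /* number of rows */
    int pitch;        /* bytes per row, assumed to be a multiple of 4 */
};

/* Read one pixel. Intended for the system-memory copy only. */
uint32_t get_pixel(const struct surface32 *s, int x, int y)
{
    return s->pixels[(size_t)y * (size_t)(s->pitch / 4) + (size_t)x];
}

/* Write one pixel as a single aligned 32-bit store. */
void put_pixel(struct surface32 *s, int x, int y, uint32_t colour)
{
    s->pixels[(size_t)y * (size_t)(s->pitch / 4) + (size_t)x] = colour;
}

/* Copy the shadow buffer to the target row by row. The target is only
 * written, never read. Both surfaces are assumed to have the same size. */
void flush_surface(const struct surface32 *shadow, struct surface32 *target)
{
    for (int y = 0; y < shadow->height; y++) {
        memcpy((uint8_t *)target->pixels + (size_t)y * (size_t)target->pitch,
               (const uint8_t *)shadow->pixels + (size_t)y * (size_t)shadow->pitch,
               (size_t)shadow->width * 4);
    }
}
```

Blending code would then call get_pixel and put_pixel on the shadow surface and call flush_surface once per frame.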

As to other platforms and SDL, I’m not sure what is supported and not - I’m
pretty new around here.

Also, adding two 32-bit pixels and being able to clamp the RGB values.

Adding: Mask away the LSBs and shift right first, then add - there will be no
overflows between the channels (the result is half the true sum, i.e. the
average of the two pixels). You’ll lose half a bit in rounding errors, but
that’s probably not the end of the world with 24 bit modes.
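
The mask-and-shift trick can be sketched in one line of C. The name add_pixels_halved is made up for this example, and the sketch assumes four 8-bit channels packed into a 32-bit word.

```c
#include <stdint.h>

/* Halve both pixels per channel, then add. Masking with 0xfefefefe
 * clears the lowest bit of every byte so the shift cannot leak a bit
 * into the neighbouring channel. Each halved channel is at most 0x7f,
 * so the per-byte sum is at most 0xfe and never overflows. */
uint32_t add_pixels_halved(uint32_t a, uint32_t b)
{
    return ((a & 0xfefefefeu) >> 1) + ((b & 0xfefefefeu) >> 1);
}
```

Note that the brightest possible result per channel is 0xfe rather than 0xff, which is the rounding loss mentioned above.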

Clamping: Also called saturation, and supported in hardware by MMX for 8, 16
and 32 bit word sizes, all packed to 64 bits and processed in parallel. Other
methods include the usual conditional code or look-up tables, but both of these
are painfully expensive, at least on current workstation type CPUs, like the
P-II, P-III and K7.
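
For comparison, here is a minimal sketch of the look-up table method. The names clamp_table, init_clamp_table and add_pixels_lut are invented for this example, and the 0x00RRGGBB channel layout is an assumption. It costs three table reads per pixel, which is the expense referred to above.

```c
#include <stdint.h>

/* clamp_table[i] = min(i, 255) for every possible sum of two 8-bit
 * channel values (0 .. 510). */
static uint8_t clamp_table[511];

/* Must be called once before add_pixels_lut is used. */
void init_clamp_table(void)
{
    for (int i = 0; i < 511; i++)
        clamp_table[i] = (uint8_t)(i > 255 ? 255 : i);
}

/* Saturating add of two pixels, assuming a 0x00RRGGBB layout.
 * The top byte of the result is always zero. */
uint32_t add_pixels_lut(uint32_t a, uint32_t b)
{
    uint32_t r = clamp_table[((a >> 16) & 0xff) + ((b >> 16) & 0xff)];
    uint32_t g = clamp_table[((a >> 8) & 0xff) + ((b >> 8) & 0xff)];
    uint32_t bl = clamp_table[(a & 0xff) + (b & 0xff)];
    return (r << 16) | (g << 8) | bl;
}
```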

//David