Alpha Blitting with Tables (was: Re: Bug in Alpha blitting?)

Mattias Engdeg?rd wrote:

static unsigned char table[256][256] = {{ (00)>>8, (01)>>8, (0*2)>>8,

Good point though; table-based optimization shouldn’t be underestimated,
even in these days of memory bandwidth worship.

Yes. Reminds me of a question Microsoft used to (and maybe still does)
give to interviewees: Write a routine to count the number of 1’s in a
byte as quickly as possible. The answer they wanted was to use a
256-byte table containing the precomputed values.

(At the time, I thought that was the most useless routine possible; but
10 years later, I ended up using it in the EDA world.)

Yes. Reminds me of a question Microsoft used to (and maybe still does)
give to interviewees: Write a routine to count the number of 1’s in a
byte as quickly as possible. The answer they wanted was to use a
256-byte table containing the precomputed values.

As always, the answer depends. Yes, it seems that table-driven methods
(unfortunately) still outperform the cool tricks on today’s platforms, but
it might not always be the case. The speed gap between CPU and memory is
increasing after all.

An answer I’d prefer, had I given the interview: “If it is really that
important, benchmark a couple of different methods, after searching
the literature.”

(At the time, I thought that was the most useless routine possible; but
10 years later, I ended up using it in the EDA world.)

For some reason, many supercomputer architectures have bit count
instructions in hardware. The story goes that it was on the insistence
of a particular (and very large) customer. No Such Agency?

If you are interested (perhaps it is needed in SDL one day :-), I have
a cool program that compares different methods of bit counting, posted by
Chris Torek to Usenet many years ago:
ftp://ptah.lnf.kth.se/pub/misc/bitcount.c.gz