Generalised BitBlit proposal

Dear all,

I have been thinking about the problem of optimising bitblit operations, and I
believe that I have found a good general solution, though it is far from
complete.

The following is a general description of my proposed solution. I would be
grateful for any criticism, comments, suggestions, etc. I have a prototype
mostly implemented, which currently interprets a parsed tree structure,
though I am in the process of writing a simple compiler. Though my background
is in ARM coding and optimisation, I’m tackling x86 as best I can.

The correct model to use, I believe, is that of regex facilities in many
libraries, including GNU libc. In essence, the intended result is specified
in a suitable language. This is compiled, and may then be executed as and
when needed.

A blit is specified using a language vaguely inspired by LISP. The destination
pixel is specified in terms of the coordinates of that pixel. For example,
here is a ‘zero fill’ operation:

zerofill16bpp(x y)=(setwidth 0 16)

And here is a ‘fill with interesting colours’ operation:

myblit(x y)=(packbits (setwidth x 5) (setwidth y 6) (setwidth x 5))
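To make the semantics concrete, here is a rough Python sketch of how a per-pixel interpreter might evaluate myblit. The behaviour of setwidth and packbits is inferred from the examples; everything else (the function names, the (value, bits) representation) is invented for illustration:

```python
def setwidth(value, bits):
    # (setwidth v n): truncate v to n bits, remembering the width.
    return (value & ((1 << bits) - 1), bits)

def packbits(*fields):
    # (packbits a b ...): concatenate the fields, first field in the
    # high bits, producing a single packed pixel value.
    result = 0
    for value, bits in fields:
        result = (result << bits) | value
    return result

def myblit(x, y):
    # myblit(x y) = (packbits (setwidth x 5) (setwidth y 6) (setwidth x 5))
    return packbits(setwidth(x, 5), setwidth(y, 6), setwidth(x, 5))

print(hex(myblit(3, 7)))  # prints 0x18e3
```

For a 16bpp RGB565 destination this puts x in the 5-bit fields and y in the 6-bit field, which is presumably what makes the colours ‘interesting’.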

General characteristics of the destination surface (bpp, etc.) and of the blit
rect are supplied to the compiler. Flags allow optimisation for special cases:
for example, a flag can indicate that the blit rect is fixed.

A blit can have zero or more sources, each with its own characteristics and
flags.

Here is a simple ‘copy’ blit. It copies from source 0:

copyblit16bpp(x y)=(fetch 0 x y)

More interestingly, here is a simple 1-bit mask blit. Source 2 is the same as
the destination, source 1 is the mask, and source 0 is the image.

maskblit16bpp(x y)=(if (eq (fetch 1 x y) 1) (fetch 0 x y) (fetch 2 x y))
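As a rough Python sketch of the per-pixel meaning — the list-of-rows source representation and the function names are invented for illustration; the proposal does not fix a representation:

```python
def fetch(sources, n, x, y):
    # (fetch n x y): read pixel (x, y) from source surface n.
    return sources[n][y][x]

def maskblit16bpp(sources, x, y):
    # (if (eq (fetch 1 x y) 1) (fetch 0 x y) (fetch 2 x y))
    if fetch(sources, 1, x, y) == 1:
        return fetch(sources, 0, x, y)
    return fetch(sources, 2, x, y)

image = [[0x1111, 0x2222]]   # source 0
mask  = [[1, 0]]             # source 1
dest  = [[0xAAAA, 0xBBBB]]   # source 2 (same surface as the destination)
sources = [image, mask, dest]

# Where the mask is 1 the image pixel wins; elsewhere the destination
# pixel is written back unchanged.
```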

(Highly optimised image processing filters are also a distinct possibility.)

A few points are important to note:

  • The language described is inherently parallel: it does not specify an
    ordering of pixel operations.
  • The operations described act (theoretically) on individual pixels, or on
    small numbers of bits; in practice they can often be combined into a
    single 32-bit or wider operation.
  • The compiler can easily detect CPU features such as MMX and optimise at
    runtime.
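As one hedged illustration of the second point, a compiler could rewrite the 1-bit mask blit to process two 16bpp pixels per 32-bit word, turning the per-pixel if into branch-free bitwise selection. This Python sketch shows only the arithmetic; the function name and mask layout are invented:

```python
def maskblit_2px(src32, dst32, mask2):
    # Select between two 16bpp pixels packed into 32-bit words.
    # mask2 has one bit per pixel: bit 0 = low half, bit 1 = high half.
    # Expand each mask bit into a full 16-bit lane of ones or zeroes...
    lanes = (0xFFFF if mask2 & 1 else 0) | ((0xFFFF if mask2 & 2 else 0) << 16)
    # ...then take src where the lane is set and dst elsewhere, with no
    # per-pixel branch.
    return (src32 & lanes) | (dst32 & ~lanes & 0xFFFFFFFF)
```

The same shape maps directly onto MMX/SSE registers, just with wider lanes.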

As such, it is relatively easy to optimise for specific features. (I say
‘relatively’ easy, since optimisation is not particularly easy.)

I’d be grateful for any comments. Would people find this useful? Would anyone
be interested in collaborating on this?

Jake Waskett

> I’d be grateful for any comments. Would people find this useful? Would anyone
> be interested in collaborating on this?

Are you basically suggesting we add CPU-bound pixel shaders to SDL?

There’s a reason blits are piles of nasty code, usually using magic
macros and inline assembly… I don’t know that replacing them with
scripts is a good idea.

–ryan.

>> I’d be grateful for any comments. Would people find this useful? Would
>> anyone be interested in collaborating on this?

> Are you basically suggesting we add CPU-bound pixel shaders to SDL?

In essence, I guess, though at a higher level of abstraction and - in
principle at least - not necessarily implemented on the CPU.

> There’s a reason blits are piles of nasty code, usually using magic
> macros and inline assembly… I don’t know that replacing them with
> scripts is a good idea.

Your characterisation of blits as nasty code is absolutely right, and to my
mind this is the best reason for automatically generating them. They’re hard
to write, understand, and port, bug-prone, and need to be optimised for
several different architectures. These are exactly the same reasons why
compilers were invented.

It is certainly possible to write a blit compiler, and to do so efficiently.
Microsoft did so for the GDI blits in Windoze, and the ReactOS developers
have apparently written somewhat better-optimised code for the same task
(I’ve no idea whether they’ve taken advantage of SIMD facilities). These are
less general-purpose than my proposal, and couldn’t, for example, handle
alpha-blending. But they demonstrate that such a thing can be done.
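As a sketch of why alpha blending fits the model: assuming the language grew ordinary arithmetic operators (none are defined in the proposal above), the generated code would amount, per channel, to something like this Python, where the 8-bit alpha convention and the truncating division are my assumptions:

```python
def blend_channel(src, dst, alpha):
    # Classic 8-bit alpha blend: dst' = (src*a + dst*(255 - a)) / 255.
    # e.g. blend_channel(255, 0, 128) == 128
    return (src * alpha + dst * (255 - alpha)) // 255
```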

Jake

On Tuesday 27 September 2005 13:18, Ryan C. Gordon wrote:

> –ryan.


SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl

> Your characterisation of blits as nasty code is absolutely right, and to my
> mind this is the best reason for automatically generating them. They’re hard
> to write, understand, and port, bug-prone, and need to be optimised for
> several different architectures. These are exactly the same reasons why
> compilers were invented.

And we use a compiler to generate the machine code already, at least for
the C fallbacks. I don’t see a practical use in writing a simple
compiler and targeting it at all sorts of processors just because the
existing blitting code is inelegant. I think it would turn out to be a
lot of work that adds a ton of code to the project, ends up messy and
buggy as well, and the final output would underperform compared to the
existing blitters.

I don’t mind being proven wrong, though. Feel free to take a shot at it
if you are so inclined and we’ll address this anew with working code.

–ryan.