RLE + alpha performance issues on Win32

I’ve been trying to figure out why newer versions of Kobo Deluxe run
slower than the old ones (SKobo) on my Windoze box:

* P-II 233 (dual, but only one in use)
* Matrox G400 MAX, DDraw/D3D driver 4.12.01.1730
* Win95 4.00.950a
* DirectX 4.07.00.0700

I tracked it down to newer versions of the graphics engine processing
all graphics in RGBA 32bpp format and then converting them with
SDL_DisplayFormatAlpha(). If I use SDL_DisplayFormat() instead, I get
great frame rates, but of course, then I have no antialiasing and no
colorkeying.
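
Roughly, the two conversion paths look like this (just a sketch, not
the actual Kobo code; convert_sprite, use_colorkey and rgba are
made-up names, and rgba is assumed to be a 32 bpp RGBA s/w surface):

    #include "SDL.h"

    SDL_Surface *convert_sprite(SDL_Surface *rgba, int use_colorkey)
    {
        if (use_colorkey) {
            /* Fast on Win32/DDraw: drop the alpha channel and
               colorkey on black instead. */
            SDL_SetColorKey(rgba, SDL_SRCCOLORKEY | SDL_RLEACCEL,
                            SDL_MapRGB(rgba->format, 0, 0, 0));
            return SDL_DisplayFormat(rgba);
        }
        /* Keeps the alpha channel; this is the path that crawls
           on Win32/DDraw. */
        return SDL_DisplayFormatAlpha(rgba);
    }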

Kobo deals only with alpha, even when it’s used only to emulate
colorkeying. SDL’s RLE encoder is expected to do the Right Thing™,
but it seems like it doesn’t on Win32/DDraw.
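
(For completeness, asking SDL 1.2 to RLE encode a surface with
per-pixel alpha amounts to this; enable_rle_alpha is just an
illustrative name, and the per-surface alpha value is ignored when
the surface has an alpha channel:)

    #include "SDL.h"

    /* Request RLE acceleration for a per-pixel alpha surface. For a
       sprite with only 0 and 255 alpha, the encoder should turn
       opaque runs into plain copies and skip transparent runs. */
    static void enable_rle_alpha(SDL_Surface *sprite)
    {
        SDL_SetAlpha(sprite, SDL_SRCALPHA | SDL_RLEACCEL,
                     SDL_ALPHA_OPAQUE);
    }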

I hacked a test program that can be found here:

http://www.olofson.net/download/rgbatest-0.1.tar.gz

Precompiled Win32 binary (no data files needed):

http://www.olofson.net/download/rgbatest.exe

By default, this program generates a 48x48 sprite image in 32 bit RGBA
format (only 0 and 255 alpha pixels; no translucency) and opens a
320x240 single-buffered window with a software surface at the default
bpp.
Then it converts the sprite using SDL_DisplayFormatAlpha(), and
starts a loop where it clears the screen, renders 100 sprites, and
flips using SDL_Flip(). Any click or key event stops the loop, and
before cleaning up and exiting, the program prints the average frame
rate to stdout.
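
In outline, the test does roughly this (simplified; the sprite
drawing, the sprite positions and all error checking are omitted or
made up here, and the real code is in the tarball above):

    #include "SDL.h"
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        SDL_Surface *screen, *sprite, *converted;
        SDL_Event ev;
        Uint32 start, ms;
        int frames = 0, running = 1;

        SDL_Init(SDL_INIT_VIDEO);
        screen = SDL_SetVideoMode(320, 240, 0, SDL_SWSURFACE);

        /* rgbatest generates the 48x48 RGBA sprite procedurally;
           the drawing is omitted here. */
        sprite = SDL_CreateRGBSurface(SDL_SWSURFACE, 48, 48, 32,
                                      0xff000000, 0x00ff0000,
                                      0x0000ff00, 0x000000ff);
        converted = SDL_DisplayFormatAlpha(sprite);  /* slow path */

        start = SDL_GetTicks();
        while (running) {
            int i;
            while (SDL_PollEvent(&ev))
                if (ev.type == SDL_KEYDOWN ||
                    ev.type == SDL_MOUSEBUTTONDOWN ||
                    ev.type == SDL_QUIT)
                    running = 0;
            SDL_FillRect(screen, NULL, 0);
            for (i = 0; i < 100; ++i) {
                SDL_Rect r;
                r.x = (i % 10) * 28;    /* arbitrary layout */
                r.y = (i / 10) * 20;
                SDL_BlitSurface(converted, NULL, screen, &r);
            }
            SDL_Flip(screen);
            ++frames;
        }
        ms = SDL_GetTicks() - start;
        if (ms)
            printf("average fps: %f\n", frames * 1000.0 / ms);

        SDL_FreeSurface(converted);
        SDL_FreeSurface(sprite);
        SDL_Quit();
        return 0;
    }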

Observations:

* 'rgbatest -c' (use SDL_DisplayFormat() with CK
  <0,0,0,0> instead of SDL_DisplayFormatAlpha())
  gives great frame rates on Win32. On X11, this
  switch has no impact on the frame rate, provided
  the sprites contain only 0 and 255 alpha pixels.

* The -a switch (antialiased sprites) has no effect
  on the frame rate on Win32. Sprites with only 0
  and 255 alpha pixels are as slow as sprites with
  antialiased edges.

* Likewise, the -t option (33% translucent sprites)
  has no effect on the frame rate on Win32. Opaque
  pixels are as expensive as blended pixels! On
  X11, translucency has a major cost, as one would
  expect: normal sprites are much faster than
  translucent sprites there.

* Rendering translucent sprites into VRAM (-h -t)
  is very, very slow (as expected), but doing it
  without translucency is no faster. However,
  using SDL colorkeying instead (-c) results in
  great performance. (-t has no effect together
  with -c, since SDL_DisplayFormat() removes the
  alpha channel anyway.)

* SDL 1.2.6 is much slower than 1.2.3 when this
  alpha issue kicks in, though SDL 1.2.3 does
  appear to have the same issue with alpha + RLE.

Apparently, when using SDL_DisplayFormatAlpha() on Win32/DX, alpha
blending is done even for pixels with a == 255, whereas on X11 they’re
treated as opaque and just copied.
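
In other words, from an RLE-encoded alpha surface I'd expect per-row
behaviour along these lines (not SDL's actual internals, just the
cost model an RLE blitter implies; RLERun and blit_rle_row are made
up for illustration):

    #include "SDL.h"
    #include <string.h>

    typedef struct {
        int skip;   /* fully transparent pixels: just step over them */
        int copy;   /* fully opaque pixels: straight copy, no math   */
        int blend;  /* translucent pixels: the only expensive part   */
    } RLERun;

    static void blit_rle_row(const RLERun *runs, int nruns,
                             const Uint32 *src, Uint32 *dst)
    {
        int i;
        for (i = 0; i < nruns; ++i) {
            src += runs[i].skip;
            dst += runs[i].skip;
            memcpy(dst, src, runs[i].copy * sizeof(Uint32)); /* cheap */
            src += runs[i].copy;
            dst += runs[i].copy;
            /* blend runs[i].blend pixels here -- the only place
               where alpha arithmetic should be needed */
            src += runs[i].blend;
            dst += runs[i].blend;
        }
    }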

So, am I doing something stoopid, or what’s going on here?

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
|  Free/Open Source audio engine for games and multimedia.  |
| MIDI, modular synthesis, real time effects, scripting,... |
`-----------------------------------> http://audiality.org -'
   --- http://olofson.net --- http://www.reologica.se ---