Another RLE patch

Here’s another attempt to fix the RLE blitting, drawing on Xark’s improvements
to the original code. I have changed the encoding to an opcodeless form,
and benchmarks (on Sparc and x86) suggest that it is a little faster than
Xark’s code, which in turn is much faster than Sam’s :-)
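
To make "opcodeless" concrete, here is a minimal sketch of one way such an
encoding can work. It assumes 8-bit pixels and a single transparent colour
key: each row is stored as (skip, run) byte pairs, each pair followed by
`run` literal pixels, so the decoder never has to branch on an opcode. The
names `rle_encode_row` and `rle_blit_row` are hypothetical and are not taken
from SDL_RLEaccel.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode one row of w pixels as (skip, run) byte pairs, each pair followed
   by `run` opaque pixels. Pixels equal to `key` are transparent.
   `out` must have room for at least 2*w + 2 bytes. Returns bytes written. */
size_t rle_encode_row(const uint8_t *src, int w, uint8_t key, uint8_t *out)
{
    size_t n = 0;
    int x = 0;
    assert(w > 0);
    do {
        int skip = 0, run = 0;
        /* Count transparent pixels (capped so the count fits in a byte). */
        while (x + skip < w && skip < 255 && src[x + skip] == key)
            skip++;
        x += skip;
        /* Count the opaque pixels that follow. */
        while (x + run < w && run < 255 && src[x + run] != key)
            run++;
        out[n++] = (uint8_t)skip;
        out[n++] = (uint8_t)run;
        memcpy(out + n, src + x, (size_t)run);
        n += (size_t)run;
        x += run;
    } while (x < w);
    return n;
}

/* Blit one encoded row onto dst (w pixels wide, unclipped).
   Returns the number of encoded bytes consumed. */
size_t rle_blit_row(const uint8_t *enc, int w, uint8_t *dst)
{
    size_t n = 0;
    int x = 0;
    do {
        int skip = enc[n++];
        int run = enc[n++];
        x += skip;                          /* leave transparent pixels alone */
        memcpy(dst + x, enc + n, (size_t)run);
        n += (size_t)run;
        x += run;
    } while (x < w);
    return n;
}
```

The row ends when the skips and runs add up to the row width, so no
end-of-line marker is needed in this sketch.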

The code is no more complex than before, but the binary is likely to be a
little larger due to macro expansion. Since most of it will never enter
the I-cache anyway, the extra size has no impact on performance, and
speeding up such a critical component should be well worth it.

The code has undergone some systematic testing and appears to work well,
but I would still like people to try it on real code, as opposed to
synthetic benchmarks. It should make a difference for programs limited by
how fast they can spit out sprites onto the screen (like all good
shooters :-).

The complete source file (a patch made little sense, since most of it has
changed) can be found at ftp://ptah.lnf.kth.se/pub/misc/SDL_RLEaccel.c.gz .

Many thanks to Xark for code, helpful discussions and benchmarks.

– Mattias (Yorick)

Mattias Engdegård wrote:

> Here’s another attempt to fix the RLE blitting, drawing on Xark’s improvements
> to the original code. I have changed the encoding to an opcodeless form,
> and benchmarks (on Sparc and x86) suggest that it is a little faster than
> Xark’s code, which in turn is much faster than Sam’s :-)

This implementation is very similar to mine. I am currently thinking about
adding a table of line starts; then identical rows could be compressed, and
clipping at the top edge would be a little faster.
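
As a rough illustration of the line-start table idea, the sketch below stores
each distinct row only once and lets identical rows share one table entry.
For brevity the rows are kept as raw bytes here, whereas real code would
store RLE data per row; the function name `pack_rows_shared` and its
parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack h rows of w bytes each into `out`, storing each distinct row once.
   line_start[y] receives the offset of row y's data within `out`.
   `out` must have room for w*h bytes. Returns the number of bytes used. */
size_t pack_rows_shared(const uint8_t *rows, int w, int h,
                        uint8_t *out, size_t *line_start)
{
    size_t used = 0;
    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        const uint8_t *row = rows + (size_t)y * (size_t)w;
        int match = -1;
        /* Look for an earlier row with identical contents. */
        for (int p = 0; p < y; p++) {
            if (memcmp(out + line_start[p], row, (size_t)w) == 0) {
                match = p;
                break;
            }
        }
        if (match >= 0) {
            line_start[y] = line_start[match];   /* share the stored row */
        } else {
            memcpy(out + used, row, (size_t)w);
            line_start[y] = used;
            used += (size_t)w;
        }
    }
    return used;
}
```

The linear search over earlier rows is quadratic in the number of rows, which
is fine at encoding time for sprite-sized images but would want a hash for
very tall surfaces.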

Alpha blending and translucency functions would also be nice.

Have you given this any thought?

Bye,
Johns
--
Become famous, earn no money, create graphics for FreeCraft.

http://FreeCraft.Org - A free fantasy real-time strategy game engine
http://fgp.cjb.net - The FreeCraft Graphics Project

> This implementation is very similar to mine. I am currently thinking about
> adding a table of line starts; then identical rows could be compressed, and
> clipping at the top edge would be a little faster.

Yes, sorry, I should have credited you as well (you gave me the idea).

I first allowed skips (but not runs) to wrap around line boundaries, and
while this saved some space it made the decoder more complex, so I decided
that it wasn’t worth it. A line index might be worthwhile, but only for
clipped blits, and by far the most common case is the unclipped one. Also,
it would only help with clipping at the top.

It could help with really big overlays (or boss monsters :-), but those
are perhaps best done subdivided into smaller tiles anyway.
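
The trade-off can be sketched as follows, assuming the encoded data is a
stream of (skip, run) byte pairs, each followed by `run` one-byte pixels, with
a row ending when the skips and runs add up to the row width. Without an
index, clipping at the top means walking the stream for every clipped row;
with a table of line starts it is a single lookup. Both function names and
the stream layout are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Without a line index: step over `nrows` encoded rows of width w by
   reading every (skip, run) pair. Returns the offset of the next row. */
size_t skip_rows_scan(const uint8_t *enc, int w, int nrows)
{
    size_t n = 0;
    assert(w > 0);
    while (nrows-- > 0) {
        int x = 0;
        do {
            int skip = enc[n];
            int run = enc[n + 1];
            n += 2 + (size_t)run;   /* the pair plus its literal pixels */
            x += skip + run;
        } while (x < w);
    }
    return n;
}

/* With a line index: the offset of row `nrows` is one array read. */
size_t skip_rows_indexed(const size_t *line_start, int nrows)
{
    return line_start[nrows];
}
```

The scan costs time proportional to the number of pairs in the clipped rows,
and only top-clipped blits ever pay it, which matches the point that the
index does nothing for the common unclipped case.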

> Alpha blending and translucency functions would also be nice.

Yes. Two cases:

  1. constant alpha for the entire surface: The same encoding can be used;
    only the decoder needs replacing. alpha=0.5 should be special-cased since
    it can be done very efficiently.

  2. per-pixel alpha: This may need a new encoding if we believe that
    alpha=1.0 and 0.0 are more common than intermediate values, which is
    probably the case (think of sprites with antialiased edges). This likely
    needs an opcode-based encoding, similar to SDL’s original one.
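
To illustrate why alpha=0.5 is worth special-casing, here is a minimal sketch
of the usual mask-and-shift average, assuming 32-bit pixels with 8 bits per
channel. All channels are averaged in one pass with no multiplies. The names
`blend50` and `blend50_run` are hypothetical; other pixel formats (16-bit
ones, for instance) would need different masks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-channel floor((s + d) / 2) for four 8-bit channels at once.
   Halving each channel first keeps carries from spilling into the
   neighbouring channel; the last term restores the bit lost when both
   low bits were set. */
uint32_t blend50(uint32_t s, uint32_t d)
{
    return ((s & 0xFEFEFEFEu) >> 1) + ((d & 0xFEFEFEFEu) >> 1)
         + (s & d & 0x01010101u);
}

/* Blend a run of n source pixels onto the destination at 50% alpha. */
void blend50_run(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = blend50(src[i], dst[i]);
}
```

A decoder for the constant-alpha case could call something like
`blend50_run` where the opaque decoder copies a run, leaving the encoding
itself untouched.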

Drop by #sdl and we could have a chat about it.

– Mattias (Yorick)