[patch] MMX alpha blit patches with MMX detection

Hi,

I’ve added runtime mmx cpu & prefetch detection to Rafal Bursig’s and
Rene Dudfield’s patches (those two patches add MMX to alpha blitting
functions).
The resulting patch is here :
http://icps.u-strasbg.fr/~marchesin/sdl_mmxalphablit.patch

Stephane

Stephane Marchesin wrote:

Hi,

I’ve added runtime mmx cpu & prefetch detection to Rafal Bursig’s and
Rene Dudfield’s patches (those two patches add MMX to alpha blitting
functions).
The resulting patch is here :
http://icps.u-strasbg.fr/~marchesin/sdl_mmxalphablit.patch

Stephane

Awesome :slight_smile:

why don’t you put this in cvs?> ----- Original Message -----

From: illumen@yahoo.com (Rene Dudfield)
To:
Sent: Tuesday, August 12, 2003 8:59 AM
Subject: Re: [SDL] [patch] MMX alpha blit patches with MMX detection

Stephane Marchesin wrote:

Hi,

I’ve added runtime mmx cpu & prefetch detection to Rafal Bursig’s and
Rene Dudfield’s patches (those two patches add MMX to alpha blitting
functions).
The resulting patch is here :
http://icps.u-strasbg.fr/~marchesin/sdl_mmxalphablit.patch

Stephane

Awesome :slight_smile:


SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl

why don’t you put this in cvs?

SDL_RLEaccel.c: In function SDL_RLEAlphaBlit': SDL_RLEaccel.c:1321: invalid lvalue in unary&’

…that’s why.

Could someone please dig through all those macros and figure out why gcc
2.95.3 doesn’t like that expression?

–ryan.

Ryan C. Gordon wrote:

why don’t you put this in cvs?

SDL_RLEaccel.c: In function SDL_RLEAlphaBlit': SDL_RLEaccel.c:1321: invalid lvalue in unary&’

…that’s why.

Could someone please dig through all those macros and figure out why gcc
2.95.3 doesn’t like that expression?

Ok, done (I had to fix another problem with gcc 2, btw).

The (now updated) patch is still there :
http://icps.u-strasbg.fr/~marchesin/sdl_mmxalphablit.patch

Stephane

why don’t you put this in cvs?

SDL_RLEaccel.c: In function SDL_RLEAlphaBlit': SDL_RLEaccel.c:1321: invalid lvalue in unary&’

…that’s why.

I also seem to remember that the patch changed the semantics of the destination
alpha channel.

See ya,
-Sam Lantinga, Software Engineer, Blizzard Entertainment

Sam Lantinga wrote:

why don’t you put this in cvs?

SDL_RLEaccel.c: In function SDL_RLEAlphaBlit': SDL_RLEaccel.c:1321: invalid lvalue in unary&’

…that’s why.

I also seem to remember that the patch changed the semantics of the destination
alpha channel.

See ya,
-Sam Lantinga, Software Engineer, Blizzard Entertainment

Yeah it did. I’m not sure why? Is there a good reason to change it?
Or is any change unacceptable?

My one function didn’t change the semantics. Which is the
BlitRGBtoRGBPixelAlphaMMXPrefetch one in the patch. But it doesn’t use
those nice macros only gcc inline asm.

Rene Dudfield wrote:

I also seem to remember that the patch changed the semantics of the
destination
alpha channel.

See ya,
-Sam Lantinga, Software Engineer, Blizzard Entertainment

Yeah it did. I’m not sure why? Is there a good reason to change it?
Or is any change unacceptable?

My one function didn’t change the semantics. Which is the
BlitRGBtoRGBPixelAlphaMMXPrefetch one in the patch. But it doesn’t
use those nice macros only gcc inline asm.

Well, SDL behaviour shouldn’t change because programs rely on it.

I also noticed some mistakes in the code. I’ll fix those and check that
every asm function behaviour matches its C equivalent.

And then I’ll repost…

Stephane

I also noticed some mistakes in the code. I’ll fix those and check that
every asm function behaviour matches its C equivalent.

And then I’ll repost…

Thanks!
-Sam Lantinga, Software Engineer, Blizzard Entertainment

Sam Lantinga wrote:

And then I’ll repost…

Thanks!
-Sam Lantinga, Software Engineer, Blizzard Entertainment

Hi,

I think everything is correct now. I’ve done as much testing as I could,
but some real-world testing wouldn’t hurt, I think.
The patch is here : http://icps.u-strasbg.fr/~marchesin/sdl_mmxblit.patch

If you do byte-by-byte comparison of the output between C and MMX
functions, you’ll notice that the results for 555 and 565 RGB alpha
blits aren’t exactly the same. This is because MMX functions for 555 and
565 RGB have an higher accuracy. If you want the exact same behaviour
that’s possible by masking the three lower alpha bits in the MMX
functions. Just ask !

I removed one MMX function because after I fixed it to match its C
equivalent, it revealed to be slower than the C version on a PIII
(although a bit faster on an Athlon XP).

I’ve also added MMX and PIII replacements for SDL_memcpy. Those provide
some speed up in testvidinfo -benchmark (at least for me, under linux &
X11).

Stephane

I’ve also added MMX and PIII replacements for SDL_memcpy. Those provide
some speed up in testvidinfo -benchmark (at least for me, under linux &
X11).

Thanks! I’ll take a look when I get a chance! :slight_smile:

See ya,
-Sam Lantinga, Software Engineer, Blizzard Entertainment

I think everything is correct now. I’ve done as much testing as I could,
but some real-world testing wouldn’t hurt, I think.
The patch is here : http://icps.u-strasbg.fr/~marchesin/sdl_mmxblit.patch

Looks good, I added it to CVS.

What tests did you run to compare performance of the blitting functions?

See ya!
-Sam Lantinga, Software Engineer, Blizzard Entertainment

Sam Lantinga wrote:

I think everything is correct now. I’ve done as much testing as I could,
but some real-world testing wouldn’t hurt, I think.
The patch is here : http://icps.u-strasbg.fr/~marchesin/sdl_mmxblit.patch

Looks good, I added it to CVS.

Great :slight_smile:

What tests did you run to compare performance of the blitting functions?

I used quick & dirty benchmarks like this :

    start=SDL_GetTicks();
    for(j = 0; j < TIMES; j++)
            for(i = 0; i < SIZE ; i++)
            BLIT_TRANSL_888(src1[i], dst1[i]);
    end=SDL_GetTicks();
    printf("time taken : %d\n",end-start);

    start=SDL_GetTicks();
    for(j = 0; j < TIMES; j++)
            for(i = 0; i < SIZE ; i++)
            BLIT_TRANSL_888MMX(src2[i], dst2[i]);
    end=SDL_GetTicks();
    printf("time taken : %d\n",end,end-start);

Of course, I had to cut & paste the functions on the source code of the
benchmark (yes, dirty).

I didn’t find any benchmark testing the alpha blitting functions (and
testing them all would mean having to switch video depth anyway) so I
used little hacks like this. Maybe an extension to testvidinfo could
test alpha blits…

Stephane

Stephane Marchesin wrote:

Sam Lantinga wrote:

I think everything is correct now. I’ve done as much testing as I
could, but some real-world testing wouldn’t hurt, I think.
The patch is here :
http://icps.u-strasbg.fr/~marchesin/sdl_mmxblit.patch

Looks good, I added it to CVS.

Great :slight_smile:

What tests did you run to compare performance of the blitting functions?

I used quick & dirty benchmarks like this :

   start=SDL_GetTicks();
   for(j = 0; j < TIMES; j++)
           for(i = 0; i < SIZE ; i++)
           BLIT_TRANSL_888(src1[i], dst1[i]);
   end=SDL_GetTicks();
   printf("time taken : %d\n",end-start);

   start=SDL_GetTicks();
   for(j = 0; j < TIMES; j++)
           for(i = 0; i < SIZE ; i++)
           BLIT_TRANSL_888MMX(src2[i], dst2[i]);
   end=SDL_GetTicks();
   printf("time taken : %d\n",end,end-start);

Of course, I had to cut & paste the functions on the source code of
the benchmark (yes, dirty).

I didn’t find any benchmark testing the alpha blitting functions (and
testing them all would mean having to switch video depth anyway) so I
used little hacks like this. Maybe an extension to testvidinfo could
test alpha blits…

Stephane

Perhaps everyone on the list could test their projects(or others games)
with the latest cvs and compare the fps to the last release? Report
your results here :slight_smile:

These are my results from before:
50.6342458682 / 37.2473964489 * 100 - 100 = 35.94% increase
34.4127871498 / 21.7745967561 * 100 - 100 = 58.04% increase

That was on a duron 850, running linux with 32 bit, per pixel alpha images.

These are my results from before:
50.6342458682 / 37.2473964489 * 100 - 100 = 35.94% increase
34.4127871498 / 21.7745967561 * 100 - 100 = 58.04% increase

That was on a duron 850, running linux with 32 bit, per pixel alpha images.

Sweet. :slight_smile:

See ya!
-Sam Lantinga, Software Engineer, Blizzard Entertainment

(I’m new here)

But I have a question regarding the alpha blending code… If your
source image pixel has a color=255(bright) and alpha=255(opaque), you
would expect the output color to be 255, right?

After briefly looking at the MMX math, it appears to multiply color by
alpha, then divide by 256. This would yield an output color of 254 in
the above case:
truncate((255 * 255) / 256) = 254

Am I correct?

-SeanOn Thursday, August 21, 2003, at 11:52 PM, Sam Lantinga wrote:

I think everything is correct now. I’ve done as much testing as I
could,
but some real-world testing wouldn’t hurt, I think.
The patch is here :
http://icps.u-strasbg.fr/~marchesin/sdl_mmxblit.patch

Looks good, I added it to CVS.

What tests did you run to compare performance of the blitting
functions?

See ya!
-Sam Lantinga, Software Engineer, Blizzard Entertainment


SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl

Probably doesnt matter if you are right. The new code is faster enough that it
isnt worth it to care. SDL isnt known for its blitting accuracy.On 22-Aug-2003, Sean Gies wrote:

(I’m new here)

But I have a question regarding the alpha blending code… If your
source image pixel has a color=255(bright) and alpha=255(opaque), you
would expect the output color to be 255, right?

After briefly looking at the MMX math, it appears to multiply color by
alpha, then divide by 256. This would yield an output color of 254 in
the above case:
truncate((255 * 255) / 256) = 254


Patrick “Diablo-D3” McFarland || unknown at panax.com
"Computer games don’t affect kids; I mean if Pac-Man affected us as kids, we’d
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." – Kristian Wilson, Nintendo, Inc, 1989

Sean Gies wrote:

(I’m new here)

But I have a question regarding the alpha blending code… If your
source image pixel has a color=255(bright) and alpha=255(opaque), you
would expect the output color to be 255, right?

After briefly looking at the MMX math, it appears to multiply color by
alpha, then divide by 256. This would yield an output color of 254 in
the above case:
truncate((255 * 255) / 256) = 254

Am I correct?

Yes.

Now look at the C code surrouding it, not only at the patch. It has the
same behaviour (divides by 256, notice the ">> 8"s here and there). This
issue has always been there, you just didn’t notice it before.

In most of the functions, for example in BlitRGBtoRGBPixelAlpha(), there
is a special case for (alpha == SDL_ALPHA_OPAQUE) that the MMX functions
have too. So the MMX functions reproduct the same behaviour as the C
ones and the result of blitting is 255 in both cases. The only place
where there is no such special case is BlitRGBtoRGBSurfaceAlpha() (don’t
ask me why, because I don’t know :slight_smile:

Anyway, we aren’t able to visually tell the difference between
(255,255,255) and (254,254,254). That means the only case I can think of
where this would be a problem is when you blit a surface with surface
alpha to another surface and then add colorkeying to the resulting surface.

Stephane