Some faster alpha blitting code

Hi,

A while ago I tried to get alpha blitting of per-pixel alpha images going faster.

Some C improvements were made, and some asm MMX/prefetching work was
done as well. The asm uses GCC inline assembly. It’s attached to this email.

I don’t really have time to add detection of the MMX/prefetching
instructions the way SDL does it. I’d appreciate some help doing it The
SDL Way. Please :slight_smile: Pretty please with honey on top (you don’t want to
be eating too much sugar).

The other thing left to do is to optionally use SSE prefetching
instead of 3DNow! prefetching. Using femms instead of emms on K6
processors would also be nice. I also need someone to translate it into MSVC
inline asm.

If no one helps me with integrating the asm, then just doing the C
optimization can speed things up by quite a bit. The major speed boost
from the asm came from the prefetching, not the MMX. Inlining the
function gave an OK speed boost too (1-2 fps).
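
To illustrate the prefetching idea in plain C rather than asm, here is a minimal sketch. It is not the attached code: blend_pixel, blend_row_prefetch and the 16-pixel (64-byte) lookahead are made up for illustration, and it assumes a compiler that provides GCC's __builtin_prefetch. The builtin is only a hint, so any gain has to be measured on the target machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: look 16 pixels (64 bytes) ahead; the best distance is
 * machine dependent and should be tuned by benchmarking. */
#define PREFETCH_AHEAD 16

/* Hypothetical helper: blend one ARGB8888 source pixel over a destination
 * pixel using the source's per-pixel alpha. Destination alpha is kept. */
static uint32_t blend_pixel(uint32_t s, uint32_t d)
{
    uint32_t alpha = s >> 24;
    uint32_t out = d & 0xff000000u;
    int shift;

    for (shift = 0; shift < 24; shift += 8) {
        uint32_t sc = (s >> shift) & 0xffu;
        uint32_t dc = (d >> shift) & 0xffu;
        /* Weighted average of source and destination channel, rounded. */
        uint32_t c = (sc * alpha + dc * (255u - alpha) + 127u) / 255u;
        out |= c << shift;
    }
    return out;
}

/* Blend one row, asking the CPU to start fetching pixels a little ahead
 * of the ones currently being blended. */
static void blend_row_prefetch(const uint32_t *src, uint32_t *dst, size_t width)
{
    size_t i;

    for (i = 0; i < width; i++) {
        /* Once per 16 pixels, and only while the target is inside the row. */
        if ((i % PREFETCH_AHEAD) == 0 && i + PREFETCH_AHEAD < width) {
            __builtin_prefetch(&src[i + PREFETCH_AHEAD], 0); /* for reading */
            __builtin_prefetch(&dst[i + PREFETCH_AHEAD], 1); /* for writing */
        }
        dst[i] = blend_pixel(src[i], dst[i]);
    }
}
```

The blend itself is deliberately naive; the point is only where the prefetch hints sit relative to the per-pixel work.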

The function to look for is BlitRGBtoRGBPixelAlpha.

These are the FPS of two of my games I tested with, followed by a comparison
between the new version and the old. Other apps I tried this with had
similar increases in speed. To test, use a game which uses 32-bit-per-pixel
alpha surfaces. These results are from my Duron 850 with gcc
3.2.3. I haven’t tried it on anything else, so any bug reports/benchmarks
are welcome.

With no optimizations:
                 bleten        | dohfighters
average score:   21.7745967561 | 37.2473964489

Final results with all optimizations:
                 bleten        | dohfighters
lowest score:    34.3333557719 | 48.1210294823
highest score:   34.4127871498 | 50.6342458682

50.6342458682 / 100 * 37.2473964489 == 18.85 % increase in fps.
34.4127871498 / 100 * 21.7745967561 == 7.493 % increase in fps.

Thanks to Matt Taylor the ever helpful asm guru from c.l.a.x86 for help
fixing some bugs.

-------------- next part --------------
A non-text attachment was scrubbed…
Name: SDL_blit_A.c.zip
Type: application/zip
Size: 5443 bytes
Desc: not available
URL: http://lists.libsdl.org/pipermail/sdl-libsdl.org/attachments/20030417/bc99f76e/attachment.zip

Hi, it’s very cool!!
I think alpha blitting is the slowest graphics task in SDL, so I
will try to port it to Visual C.
cheers

----- Original Message -----

From: illumen@yahoo.com (Rene Dudfield)
To:
Sent: Friday, April 18, 2003 3:00 AM
Subject: [SDL] some faster alpha blitting code.


On Fri, 2003-04-18 at 03:00, Rene Dudfield wrote:

50.6342458682 / 100 * 37.2473964489 == 18.85 % increase in fps.
34.4127871498 / 100 * 21.7745967561 == 7.493 % increase in fps.

Good job… But in fact, it’s even better than what you think :wink:

37fps -> 50fps = 35% increase
22fps -> 34fps = 55% increase

-Gaetan.


aha! doh, I’m an idiot :slight_smile:

Rene Dudfield writes:

50.6342458682 / 100 * 37.2473964489 == 18.85 % increase in fps.
34.4127871498 / 100 * 21.7745967561 == 7.493 % increase in fps.

Well, you need to check your percentage calculations. Correct
calculations paint an even better picture:

50.6342458682 / 37.2473964489 * 100 - 100 = 35.94% increase
34.4127871498 / 21.7745967561 * 100 - 100 = 58.04% increase

--
[ Below is a random fortune, which is unrelated to the above message. ]
Let him choose out of my files, his projects to accomplish.
– Shakespeare, “Coriolanus”
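
The corrected arithmetic above boils down to (new / old - 1) * 100. A small C sketch of it, using the FPS figures quoted in the thread (the helper names percent_increase and print_increases are made up for this example):

```c
#include <stdio.h>

/* Relative speed-up of new_fps over old_fps, in percent. */
static double percent_increase(double old_fps, double new_fps)
{
    return (new_fps / old_fps - 1.0) * 100.0;
}

/* Prints the increase for both games, using the numbers from the
 * original post (unoptimized average vs. optimized highest score). */
static void print_increases(void)
{
    printf("dohfighters: %.2f%%\n",
           percent_increase(37.2473964489, 50.6342458682));
    printf("bleten:      %.2f%%\n",
           percent_increase(21.7745967561, 34.4127871498));
}
```

The original formula, new / 100 * old, multiplies the two frame rates together, which is why its results looked too small.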

Roberto Prieto wrote:

Hi, it’s very cool!!
I think alpha blitting is the slowest graphics task in SDL, so I
will try to port it to Visual C.
cheers

Awesome! Please keep me informed of how it goes.

Hi Rene,
I was on holiday, and today I tried to port it to Visual C, but I have a
problem with macros and inline asm.

The function BlitRGBtoRGBPixelAlpha contains this while loop:

while (height--)
{
    DUFFS_LOOP4({
        Uint32 dalpha;
        Uint32 d;
        Uint32 s1;
        Uint32 d1;
        Uint32 s = *srcp;
        Uint32 alpha = s >> 24;


    }, width);
    srcp += srcskip;
    dstp += dstskip;
}

If I put any asm instruction inside DUFFS_LOOP4, for example, Visual C
gives me a lot of errors. Maybe I could delete the macro and write the
loop out by hand, but…
any ideas about this?


Hi,

Sorry, no ideas apart from expanding the macro :frowning:

Perhaps look for some other MSVC code in SDL, which may have similar macros.

What are your errors? Perhaps an MSVC asm coder will know how to fix them.
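
For what it's worth, here is a rough sketch of what "expanding the macro" could look like: a four-way unrolled Duff's device written out by hand, so that a compiler-specific asm block could live in the per-pixel step without passing through a macro argument. It is not SDL's actual DUFFS_LOOP4 definition. The names pixel_step and copy_visible_row are made up, and the per-pixel step (copy the source pixel when its alpha is non-zero) is only a placeholder for the real blend, which the quoted snippet leaves out.

```c
#include <stdint.h>

/* Placeholder per-pixel step (NOT the real blend): copy the source pixel
 * when its alpha byte is non-zero, then advance both pointers.
 * Being an ordinary function, this is where inline asm could go. */
static void pixel_step(const uint32_t **srcp, uint32_t **dstp)
{
    uint32_t s = **srcp;
    uint32_t alpha = s >> 24;

    if (alpha != 0)
        **dstp = s;
    ++*srcp;
    ++*dstp;
}

/* Process one row of `width` pixels with a hand-written Duff's device,
 * unrolled four times. */
static void copy_visible_row(const uint32_t *srcp, uint32_t *dstp, int width)
{
    int n = (width + 3) / 4;   /* number of passes through the do-loop */

    if (width <= 0)
        return;

    switch (width & 3) {       /* jump into the loop to absorb the remainder */
    case 0: do { pixel_step(&srcp, &dstp);
    case 3:      pixel_step(&srcp, &dstp);
    case 2:      pixel_step(&srcp, &dstp);
    case 1:      pixel_step(&srcp, &dstp);
            } while (--n > 0);
    }
}
```

One plausible reason for the Visual C errors (an assumption, since the error messages were never posted) is that a macro argument is pasted back as a single logical line, while MSVC's inline assembler expects one instruction per line.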

How are you putting the asm inside of there? Like this?

__asm {
    mov ax, 20;
    mov es, ax;
};

----- Original Message -----

From: illumen@yahoo.com (Rene Dudfield)
To:
Sent: Monday, April 21, 2003 9:03 AM
Subject: Re: [SDL] some faster alpha blitting code.



SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl

Hi,

no, the code is something like:

while (height--) {
    DUFFS_LOOP4({
        //*dstp = alpha_blend_mmx_pointer(srcp, dstp);
        __asm
        {
            prefetch [srcp+64]
            prefetch [dstp+64]
        }

This asm code was just an example, but it gave me a lot of errors. I am
writing an unroll macro. On the other hand… where is the best place to
check for MMX support?
Also, I am coding an MMX alpha blend for surfaces and a new SDL_memcpy to
speed up blits, but I need to check for MMX support. Something like a
global variable would do, but maybe SDL already has one.

----- Original Message -----

From: atrix2@cox.net (atrix2)
To:
Sent: Monday, April 21, 2003 6:34 PM
Subject: Re: [SDL] some faster alpha blitting code.


At 14:41 4/22/2003, Roberto Prieto wrote:

while (height--) {
    DUFFS_LOOP4({
        //*dstp = alpha_blend_mmx_pointer(srcp, dstp);
        __asm
        {
            prefetch [srcp+64]
            prefetch [dstp+64]
        }

this asm code was an example but it gave me a lot of errors, I am coding an
unroll macro, on the other hand… where is the best place to check the mmx
support?

Not sure if you meant that “best place to check the mmx support” comment
with regard to the prefetch opcode. But IIRC prefetch was introduced with
SSE 1, not MMX.

Regards,
Dimitri
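
On the question of where to check for MMX/SSE support: one possible approach, sketched below, is to query CPUID once and cache the answer, then pick the blit routine from the cached flags. This is not how SDL does it; the struct and function names are hypothetical, and the sketch assumes GCC (or a compatible compiler) on x86, where <cpuid.h> provides __get_cpuid. On anything else it simply reports no features.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>              /* GCC/Clang helper for the CPUID instruction */
#define HAVE_GCC_CPUID 1
#else
#define HAVE_GCC_CPUID 0
#endif

/* Hypothetical feature record; 1 means the CPU reports the feature. */
struct cpu_features {
    int mmx;
    int sse;
    int amd_3dnow;
};

/* Ask the CPU directly. Bit positions: leaf 1 EDX bit 23 = MMX,
 * bit 25 = SSE; extended leaf 0x80000001 EDX bit 31 = 3DNow!. */
static struct cpu_features detect_cpu_features(void)
{
    struct cpu_features f = {0, 0, 0};
#if HAVE_GCC_CPUID
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.mmx = (int)((edx >> 23) & 1u);
        f.sse = (int)((edx >> 25) & 1u);
    }
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) {
        f.amd_3dnow = (int)((edx >> 31) & 1u);
    }
#endif
    return f;
}

/* Detect on first use and cache the result, roughly the "global
 * variable" idea from the thread. Not thread-safe as written. */
static const struct cpu_features *cpu_features(void)
{
    static struct cpu_features cached;
    static int initialised = 0;

    if (!initialised) {
        cached = detect_cpu_features();
        initialised = 1;
    }
    return &cached;
}
```

A blitter could then test cpu_features()->mmx once when the blit function is chosen, rather than inside the per-pixel loop.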

Yes and no. We are going to code with MMX (and SSE1 too) in the SDL blit
code. In this example I used “prefetch”, but it is only an example :),
although we are using prefetch :). What I am looking for is the best place
to check for MMX (and SSE1 too) support. Any ideas about this?

----- Original Message -----

From: dimitri@shortcut.nl (Dimitri)
To:
Sent: Tuesday, April 22, 2003 2:56 PM
Subject: Re: [SDL] some faster alpha blitting code.


I’ve tried to use SIMD instructions in MSVC before for matrix/vector math, but
it wouldn’t take them. I think MSVC has a neutered inline asm component.

----- Original Message -----

From: dm2@mi.madritel.es (Roberto Prieto)
To:
Sent: Tuesday, April 22, 2003 7:06 AM
Subject: Re: [SDL] some faster alpha blitting code.


On 22-Apr-2003, Atrix Wolfe wrote:

I’ve tried to use SIMD instructions in MSVC before for matrix/vector math, but
it wouldn’t take them. I think MSVC has a neutered inline asm component.

Can’t you just #ifdef your way out of the problem?
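
A sketch of what "#ifdef your way out" might look like for the prefetch hint: choose a compiler-specific spelling at compile time and fall back to doing nothing. BLIT_PREFETCH and sum_alpha are made-up names for this example. _mm_prefetch (from <xmmintrin.h>) and __builtin_prefetch are real compiler intrinsics, used here in place of an __asm block.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <xmmintrin.h>  /* SSE intrinsics, used instead of inline asm */
#define BLIT_PREFETCH(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#elif defined(__GNUC__)
#define BLIT_PREFETCH(p) __builtin_prefetch((p))
#else
#define BLIT_PREFETCH(p) ((void)0)  /* unknown compiler: no hint */
#endif

/* Hypothetical user of the macro: sums the alpha bytes of one row,
 * hinting 16 pixels (64 bytes) ahead while staying inside the row. */
static uint32_t sum_alpha(const uint32_t *row, size_t width)
{
    uint32_t total = 0;
    size_t i;

    for (i = 0; i < width; i++) {
        if ((i & 15) == 0 && i + 16 < width) {
            BLIT_PREFETCH(row + i + 16);
        }
        total += row[i] >> 24;
    }
    return total;
}
```

Compile-time selection only settles which compiler syntax is used. On 32-bit x86 the SSE prefetch still needs a runtime check, because a CPU without SSE will not accept the instruction.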


Patrick “Diablo-D3” McFarland || unknown at panax.com
"Computer games don’t affect kids; I mean if Pac-Man affected us as kids, we’d
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." – Kristian Wilson, Nintendo, Inc, 1989