Proposal: remove DUFFS_LOOP macro from SDL blit code

I’ve got a strange problem that goes something like this:

Load two surfaces with SDL_Image.? Call them A and B.
Create a new SDL_Surface with SDL_CreateRGBSurface, passing 0 for all the mask values.? This creates a surface that does not necessarily have the same pixel format as the two surfaces already loaded.? Call it Target.
Perform a blit from A to Target. This works fine.
Perform a blit from B to Target.? B contains an alpha channel, and I expect Target to contain an image overlaying B onto A.? Instead, I end up with the entirety of Target having an alpha of 0.

I traced into SDL in the debugger, trying to figure out what was going wrong, only to be stopped cold inside of BlitNtoN because the meat of the function takes place inside a silly DUFFS_LOOP macro.? You can’t trace into a macro like you can in a function call, and this seems like a strange thing to have in there anyway, for two reasons.

First, Duff’s Device was designed to squeeze every last cycle out of direct memory copies, back when cycles were a lot more expensive than they are now, but there’s a sizeable block of code in the body of the loop here, not just a simple “*dest++ = *src++” type thing.
Second, Duff’s Device was invented back before a lot of modern optimizations, both in compiler design and in CPU architecture, and according to the Wikipedia article, it tends to screw some of them up rather badly, leading to a net decrease in performance.

So it hurts readability, debugabilty and performance on modern computers.? Could we just replace it with a simple loop please?

You can disable the use of DUFFS_LOOP in SDL_blit.h by undef’ing the macro
USE_DUFFS_LOOP. Then you will get the simple for loop you’re looking for in
order to debug your issue.

As a point of reference, the PNG, ZLIB, and OpenSSL libraries all employ similar
loop-unrolling optimizations.On 02/17/2013 10:36 AM, Mason Wheeler wrote:

I’ve got a strange problem that goes something like this:

Load two surfaces with SDL_Image. Call them A and B.
Create a new SDL_Surface with SDL_CreateRGBSurface, passing 0 for all the mask
values. This creates a surface that does not necessarily have the same pixel
format as the two surfaces already loaded. Call it Target.
Perform a blit from A to Target. This works fine.
Perform a blit from B to Target. B contains an alpha channel, and I expect
Target to contain an image overlaying B onto A. Instead, I end up with the
entirety of Target having an alpha of 0.

I traced into SDL in the debugger, trying to figure out what was going wrong,
only to be stopped cold inside of BlitNtoN because the meat of the function
takes place inside a silly DUFFS_LOOP macro. You can’t trace into a macro like
you can in a function call, and this seems like a strange thing to have in there
anyway, for two reasons.

First, Duff’s Device was designed to squeeze every last cycle out of direct
memory copies, back when cycles were a lot more expensive than they are now, but
there’s a sizeable block of code in the body of the loop here, not just a simple
"*dest++ = *src++" type thing.
Second, Duff’s Device was invented back before a lot of modern optimizations,
both in compiler design and in CPU architecture, and according to the Wikipedia
article, it tends to screw some of them up rather badly, leading to a net
decrease in performance.

So it hurts readability, debugabilty and performance on modern computers. Could
we just replace it with a simple loop please?


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Umm… there’s nothing in the code that would respond to a #define like that.________________________________
From: John
To: sdl at lists.libsdl.org
Sent: Sunday, February 17, 2013 9:07 AM
Subject: Re: [SDL] Proposal: remove DUFFS_LOOP macro from SDL blit code

You can disable the use of DUFFS_LOOP in SDL_blit.h by undef’ing the macro
USE_DUFFS_LOOP. Then you will get the simple for loop you’re looking for in
order to debug your issue.

As a point of reference, the PNG, ZLIB, and OpenSSL libraries all employ similar
loop-unrolling optimizations.

On 02/17/2013 10:36 AM, Mason Wheeler wrote:

I’ve got a strange problem that goes something like this:

Load two surfaces with SDL_Image.? Call them A and B.
Create a new SDL_Surface with SDL_CreateRGBSurface, passing 0 for all the mask
values.? This creates a surface that does not necessarily have the same pixel
format as the two surfaces already loaded.? Call it Target.
Perform a blit from A to Target. This works fine.
Perform a blit from B to Target.? B contains an alpha channel, and I expect
Target to contain an image overlaying B onto A.? Instead, I end up with the
entirety of Target having an alpha of 0.

I traced into SDL in the debugger, trying to figure out what was going wrong,
only to be stopped cold inside of BlitNtoN because the meat of the function
takes place inside a silly DUFFS_LOOP macro.? You can’t trace into a macro like
you can in a function call, and this seems like a strange thing to have in there
anyway, for two reasons.

First, Duff’s Device was designed to squeeze every last cycle out of direct
memory copies, back when cycles were a lot more expensive than they are now, but
there’s a sizeable block of code in the body of the loop here, not just a simple
"*dest++ = *src++" type thing.
Second, Duff’s Device was invented back before a lot of modern optimizations,
both in compiler design and in CPU architecture, and according to the Wikipedia
article, it tends to screw some of them up rather badly, leading to a net
decrease in performance.

So it hurts readability, debugabilty and performance on modern computers.? Could
we just replace it with a simple loop please?


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

It sounds like your symptom is of blending into a fully transparent
surface. It’s not clear to me why Target would have an alpha channel at
all if you passed an Amask of 0. Have you tried SDL_SetSurfaceBlendMode()
to explicitly tell SDL how to compose the surfaces?

Jonny DOn Sun, Feb 17, 2013 at 12:18 PM, Mason Wheeler wrote:

Umm… there’s nothing in the code that would respond to a #define like
that.


From: John
To: sdl at lists.libsdl.org
Sent: Sunday, February 17, 2013 9:07 AM
Subject: Re: [SDL] Proposal: remove DUFFS_LOOP macro from SDL blit code

You can disable the use of DUFFS_LOOP in SDL_blit.h by undef’ing the macro
USE_DUFFS_LOOP. Then you will get the simple for loop you’re looking for
in
order to debug your issue.

As a point of reference, the PNG, ZLIB, and OpenSSL libraries all employ
similar
loop-unrolling optimizations.

On 02/17/2013 10:36 AM, Mason Wheeler wrote:

I’ve got a strange problem that goes something like this:

Load two surfaces with SDL_Image. Call them A and B.
Create a new SDL_Surface with SDL_CreateRGBSurface, passing 0 for all
the mask
values. This creates a surface that does not necessarily have the same
pixel
format as the two surfaces already loaded. Call it Target.
Perform a blit from A to Target. This works fine.
Perform a blit from B to Target. B contains an alpha channel, and I
expect
Target to contain an image overlaying B onto A. Instead, I end up with
the
entirety of Target having an alpha of 0.

I traced into SDL in the debugger, trying to figure out what was going
wrong,
only to be stopped cold inside of BlitNtoN because the meat of the
function
takes place inside a silly DUFFS_LOOP macro. You can’t trace into a
macro like
you can in a function call, and this seems like a strange thing to have
in there
anyway, for two reasons.

First, Duff’s Device was designed to squeeze every last cycle out of
direct
memory copies, back when cycles were a lot more expensive than they are
now, but
there’s a sizeable block of code in the body of the loop here, not just
a simple
"*dest++ = *src++" type thing.
Second, Duff’s Device was invented back before a lot of modern
optimizations,
both in compiler design and in CPU architecture, and according to the
Wikipedia
article, it tends to screw some of them up rather badly, leading to a net
decrease in performance.

So it hurts readability, debugabilty and performance on modern
computers. Could
we just replace it with a simple loop please?


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Regarding duff loops, actually it seems that unrolling loops in the
source code hurts optimizations. I know because I made that stupid
mistake, I had a loop that would make 16 writes each iteration and was
still taking too long for my taste, I decided to replace it with 1
write per iteration just to try and as you can imagine speed
skyrocketed (compared to its original performance). Optimizations in
the source code actually hurt compiler optimizations and result in
slower code, not faster.

2013/2/17, Jonathan Dearborn :> It sounds like your symptom is of blending into a fully transparent

surface. It’s not clear to me why Target would have an alpha channel at
all if you passed an Amask of 0. Have you tried SDL_SetSurfaceBlendMode()
to explicitly tell SDL how to compose the surfaces?

Jonny D

On Sun, Feb 17, 2013 at 12:18 PM, Mason Wheeler wrote:

Umm… there’s nothing in the code that would respond to a #define like
that.


From: John
To: sdl at lists.libsdl.org
Sent: Sunday, February 17, 2013 9:07 AM
Subject: Re: [SDL] Proposal: remove DUFFS_LOOP macro from SDL blit code

You can disable the use of DUFFS_LOOP in SDL_blit.h by undef’ing the
macro
USE_DUFFS_LOOP. Then you will get the simple for loop you’re looking
for
in
order to debug your issue.

As a point of reference, the PNG, ZLIB, and OpenSSL libraries all employ
similar
loop-unrolling optimizations.

On 02/17/2013 10:36 AM, Mason Wheeler wrote:

I’ve got a strange problem that goes something like this:

Load two surfaces with SDL_Image. Call them A and B.
Create a new SDL_Surface with SDL_CreateRGBSurface, passing 0 for all
the mask
values. This creates a surface that does not necessarily have the same
pixel
format as the two surfaces already loaded. Call it Target.
Perform a blit from A to Target. This works fine.
Perform a blit from B to Target. B contains an alpha channel, and I
expect
Target to contain an image overlaying B onto A. Instead, I end up with
the
entirety of Target having an alpha of 0.

I traced into SDL in the debugger, trying to figure out what was going
wrong,
only to be stopped cold inside of BlitNtoN because the meat of the
function
takes place inside a silly DUFFS_LOOP macro. You can’t trace into a
macro like
you can in a function call, and this seems like a strange thing to have
in there
anyway, for two reasons.

First, Duff’s Device was designed to squeeze every last cycle out of
direct
memory copies, back when cycles were a lot more expensive than they are
now, but
there’s a sizeable block of code in the body of the loop here, not just
a simple
"*dest++ = *src++" type thing.
Second, Duff’s Device was invented back before a lot of modern
optimizations,
both in compiler design and in CPU architecture, and according to the
Wikipedia
article, it tends to screw some of them up rather badly, leading to a
net
decrease in performance.

So it hurts readability, debugabilty and performance on modern
computers. Could
we just replace it with a simple loop please?


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

As John said, you can turn it off by commenting out USE_DUFFS_LOOP in
SDL_blit.h

It was originally added when I was optimizing the blitters and that yielded
about 15% speed increase over standard loops with gcc using -O2.

If you a proposing disabling it by default, I’m not opposed, but I would
suggest benchmarking first.

Regarding your original issue, the SDL destination alpha semantics are a
little weird. From the SDL documentation:

RGBA->RGB:
  SDL_SRCALPHA set:
    alpha-blend (using alpha-channel).
    SDL_SRCCOLORKEY ignored.
  SDL_SRCALPHA not set:
    copy RGB.
    if SDL_SRCCOLORKEY set, only copy the pixels matching the
    RGB values of the source colour key, ignoring alpha in the
    comparison.

RGB->RGBA:
  SDL_SRCALPHA set:
    alpha-blend (using the source per-surface alpha value);
    set destination alpha to opaque.
  SDL_SRCALPHA not set:
    copy RGB, set destination alpha to source per-surface alpha value.
  both:
    if SDL_SRCCOLORKEY set, only copy the pixels matching the
    source colour key.

RGBA->RGBA:
  SDL_SRCALPHA set:
    alpha-blend (using the source alpha channel) the RGB values;
    leave destination alpha untouched. [Note: is this correct?]
    SDL_SRCCOLORKEY ignored.
  SDL_SRCALPHA not set:
    copy all of RGBA to the destination.
    if SDL_SRCCOLORKEY set, only copy the pixels matching the
    RGB values of the source colour key, ignoring alpha in the
   comparison.

RGB->RGB:
  SDL_SRCALPHA set:
    alpha-blend (using the source per-surface alpha value).
  SDL_SRCALPHA not set:
    copy RGB.
  both:
    if SDL_SRCCOLORKEY set, only copy the pixels matching the
    source colour key.On Sun, Feb 17, 2013 at 7:36 AM, Mason Wheeler <masonwheeler at yahoo.com>wrote:

I’ve got a strange problem that goes something like this:

Load two surfaces with SDL_Image. Call them A and B.
Create a new SDL_Surface with SDL_CreateRGBSurface, passing 0 for all the
mask values. This creates a surface that does not necessarily have the
same pixel format as the two surfaces already loaded. Call it Target.
Perform a blit from A to Target. This works fine.
Perform a blit from B to Target. B contains an alpha channel, and I
expect Target to contain an image overlaying B onto A. Instead, I end up
with the entirety of Target having an alpha of 0.

I traced into SDL in the debugger, trying to figure out what was going
wrong, only to be stopped cold inside of BlitNtoN because the meat of the
function takes place inside a silly DUFFS_LOOP macro. You can’t trace into
a macro like you can in a function call, and this seems like a strange
thing to have in there anyway, for two reasons.

First, Duff’s Device was designed to squeeze every last cycle out of
direct memory copies, back when cycles were a lot more expensive than they
are now, but there’s a sizeable block of code in the body of the loop here,
not just a simple “*dest++ = *src++” type thing.
Second, Duff’s Device was invented back before a lot of modern
optimizations, both in compiler design and in CPU architecture, and
according to the Wikipedia article, it tends to screw some of them up
rather badly, leading to a net decrease in performance.

So it hurts readability, debugabilty and performance on modern computers.
Could we just replace it with a simple loop please?


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

At least, maybe we could change USE_DUFFS_LOOP to DONT_USE_DUFFS_LOOP
instead so you can disable it without editing the source?

Jonny DOn Mon, Feb 18, 2013 at 2:25 AM, Sam Lantinga wrote:

As John said, you can turn it off by commenting out USE_DUFFS_LOOP in
SDL_blit.h

It was originally added when I was optimizing the blitters and that
yielded about 15% speed increase over standard loops with gcc using -O2.

If you a proposing disabling it by default, I’m not opposed, but I would
suggest benchmarking first.

Regarding your original issue, the SDL destination alpha semantics are a
little weird. From the SDL documentation:

RGBA->RGB:
  SDL_SRCALPHA set:
    alpha-blend (using alpha-channel).
    SDL_SRCCOLORKEY ignored.
  SDL_SRCALPHA not set:
    copy RGB.
    if SDL_SRCCOLORKEY set, only copy the pixels matching the
    RGB values of the source colour key, ignoring alpha in the
    comparison.

RGB->RGBA:
  SDL_SRCALPHA set:
    alpha-blend (using the source per-surface alpha value);
    set destination alpha to opaque.
  SDL_SRCALPHA not set:
    copy RGB, set destination alpha to source per-surface alpha value.
  both:
    if SDL_SRCCOLORKEY set, only copy the pixels matching the
    source colour key.

RGBA->RGBA:
  SDL_SRCALPHA set:
    alpha-blend (using the source alpha channel) the RGB values;
    leave destination alpha untouched. [Note: is this correct?]
    SDL_SRCCOLORKEY ignored.
  SDL_SRCALPHA not set:
    copy all of RGBA to the destination.
    if SDL_SRCCOLORKEY set, only copy the pixels matching the
    RGB values of the source colour key, ignoring alpha in the
   comparison.

RGB->RGB:
  SDL_SRCALPHA set:
    alpha-blend (using the source per-surface alpha value).
  SDL_SRCALPHA not set:
    copy RGB.
  both:
    if SDL_SRCCOLORKEY set, only copy the pixels matching the
    source colour key.

On Sun, Feb 17, 2013 at 7:36 AM, Mason Wheeler wrote:

I’ve got a strange problem that goes something like this:

Load two surfaces with SDL_Image. Call them A and B.
Create a new SDL_Surface with SDL_CreateRGBSurface, passing 0 for all the
mask values. This creates a surface that does not necessarily have the
same pixel format as the two surfaces already loaded. Call it Target.
Perform a blit from A to Target. This works fine.
Perform a blit from B to Target. B contains an alpha channel, and I
expect Target to contain an image overlaying B onto A. Instead, I end up
with the entirety of Target having an alpha of 0.

I traced into SDL in the debugger, trying to figure out what was going
wrong, only to be stopped cold inside of BlitNtoN because the meat of the
function takes place inside a silly DUFFS_LOOP macro. You can’t trace into
a macro like you can in a function call, and this seems like a strange
thing to have in there anyway, for two reasons.

First, Duff’s Device was designed to squeeze every last cycle out of
direct memory copies, back when cycles were a lot more expensive than they
are now, but there’s a sizeable block of code in the body of the loop here,
not just a simple “*dest++ = *src++” type thing.
Second, Duff’s Device was invented back before a lot of modern
optimizations, both in compiler design and in CPU architecture, and
according to the Wikipedia article, it tends to screw some of them up
rather badly, leading to a net decrease in performance.

So it hurts readability, debugabilty and performance on modern
computers. Could we just replace it with a simple loop please?


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

As I said to John, there’s nothing in this code that would respond to a #define like that. Looking at this further, it appears that several of the other blit routines are coded to respect this #define, but a bunch of them, including the one my code is going through (BlitNtoN) are not.? An oversight?

Mason________________________________
From: Sam Lantinga
To: Mason Wheeler <@Mason_Wheeler>; SDL Development List
Sent: Sunday, February 17, 2013 11:25 PM
Subject: Re: [SDL] Proposal: remove DUFFS_LOOP macro from SDL blit code

As John said, you can turn it off by commenting out?USE_DUFFS_LOOP in SDL_blit.h

It was originally added when I was optimizing the blitters and that yielded about 15% speed increase over standard loops with gcc using -O2.

If you a proposing disabling it by default, I’m not opposed, but I would suggest benchmarking first.

Regarding your original issue, the SDL destination alpha semantics are a little weird. ?From the SDL documentation:

? ? RGBA->RGB:
? ? ? SDL_SRCALPHA set:
? ? ? ? alpha-blend (using alpha-channel).
? ? ? ? SDL_SRCCOLORKEY ignored.
? ? ? SDL_SRCALPHA not set:
? ? ? ? copy RGB.
? ? ? ? if SDL_SRCCOLORKEY set, only copy the pixels matching the
? ? ? ? RGB values of the source colour key, ignoring alpha in the
? ? ? ? comparison.

? ? RGB->RGBA:
? ? ? SDL_SRCALPHA set:
? ? ? ? alpha-blend (using the source per-surface alpha value);
? ? ? ? set destination alpha to opaque.
? ? ? SDL_SRCALPHA not set:
? ? ? ? copy RGB, set destination alpha to source per-surface alpha value.
? ? ? both:
? ? ? ? if SDL_SRCCOLORKEY set, only copy the pixels matching the
? ? ? ? source colour key.

? ? RGBA->RGBA:
? ? ? SDL_SRCALPHA set:
? ? ? ? alpha-blend (using the source alpha channel) the RGB values;
? ? ? ? leave destination alpha untouched. [Note: is this correct?]
? ? ? ? SDL_SRCCOLORKEY ignored.
? ? ? SDL_SRCALPHA not set:
? ? ? ? copy all of RGBA to the destination.
? ? ? ? if SDL_SRCCOLORKEY set, only copy the pixels matching the
? ? ? ? RGB values of the source colour key, ignoring alpha in the
? ? ? ?comparison.

? ? RGB->RGB:
? ? ? SDL_SRCALPHA set:
? ? ? ? alpha-blend (using the source per-surface alpha value).
? ? ? SDL_SRCALPHA not set:
? ? ? ? copy RGB.
? ? ? both:
? ? ? ? if SDL_SRCCOLORKEY set, only copy the pixels matching the
? ? ? ? source colour key.

On Sun, Feb 17, 2013 at 7:36 AM, Mason Wheeler <@Mason_Wheeler> wrote:

I’ve got a strange problem that goes something like this:

Load two surfaces with SDL_Image.? Call them A and B.
Create a new SDL_Surface with SDL_CreateRGBSurface, passing 0 for all the mask values.? This creates a surface that does not necessarily have the same pixel format as the two surfaces already loaded.? Call it Target.
Perform a blit from A to Target. This works fine.
Perform a blit from B to Target.? B contains an alpha channel, and I expect Target to contain an image overlaying B onto A.? Instead, I end up with the entirety of Target having an alpha of 0.

I traced into SDL in the debugger, trying to figure out what was going wrong, only to be stopped cold inside of BlitNtoN because the meat of the function takes place inside a silly DUFFS_LOOP macro.? You can’t trace
into a macro like you can in a function call, and this seems like a strange thing to have in there anyway, for two reasons.

First, Duff’s Device was designed to squeeze every last cycle out of direct memory copies, back when cycles were a lot more expensive than they are now, but there’s a sizeable block of code in the body of the loop here, not just a simple “*dest++ = *src++” type thing.
Second, Duff’s Device was invented back before a lot of modern optimizations, both in compiler design and in CPU architecture, and according to the Wikipedia article, it tends to screw some of them up rather badly, leading to a net decrease in performance.

So it hurts readability, debugabilty and performance on modern computers.? Could we just replace it with a simple loop please?


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

~/sources/SDL$ find -iname “*.[ch]” | xargs grep -in USE_DUFFS_LOOP | wc -l
35

It appears 35 times in my copy.On 18/02/13 16:30, Mason Wheeler wrote:

As I said to John, there’s nothing in this code that would respond to a
#define like that.

Yeah, and you really should read the whole message before criticizing something I already addressed:

Looking at this further, it appears that several of the other blit
routines are coded
to respect this #define, but a bunch of them,
including the one my code is going
through (BlitNtoN) are not.? An
oversight?

Mason________________________________
From: Tim Angus
To: Mason Wheeler <@Mason_Wheeler>; SDL Development List
Sent: Monday, February 18, 2013 8:37 AM
Subject: Re: Proposal: remove DUFFS_LOOP macro from SDL blit code

On 18/02/13 16:30, Mason Wheeler wrote:

As I said to John, there’s nothing in this code that would respond to a
#define like that.

~/sources/SDL$ find -iname “*.[ch]” | xargs grep -in USE_DUFFS_LOOP | wc -l
35

It appears 35 times in my copy.

Yup, that’s just an oversight.On Mon, Feb 18, 2013 at 8:30 AM, Mason Wheeler wrote:

As I said to John, there’s nothing in this code that would respond to a
#define like that. Looking at this further, it appears that several of the
other blit routines are coded to respect this #define, but a bunch of them,
including the one my code is going through (BlitNtoN) are not. An
oversight?

Mason


From: Sam Lantinga <@slouken>
To: Mason Wheeler ; SDL Development List <
sdl at lists.libsdl.org>
Sent: Sunday, February 17, 2013 11:25 PM
Subject: Re: [SDL] Proposal: remove DUFFS_LOOP macro from SDL blit code

As John said, you can turn it off by commenting out USE_DUFFS_LOOP in
SDL_blit.h

It was originally added when I was optimizing the blitters and that
yielded about 15% speed increase over standard loops with gcc using -O2.

If you a proposing disabling it by default, I’m not opposed, but I would
suggest benchmarking first.

Regarding your original issue, the SDL destination alpha semantics are a
little weird. From the SDL documentation:

RGBA->RGB:
  SDL_SRCALPHA set:
    alpha-blend (using alpha-channel).
    SDL_SRCCOLORKEY ignored.
  SDL_SRCALPHA not set:
    copy RGB.
    if SDL_SRCCOLORKEY set, only copy the pixels matching the
    RGB values of the source colour key, ignoring alpha in the
    comparison.

RGB->RGBA:
  SDL_SRCALPHA set:
    alpha-blend (using the source per-surface alpha value);
    set destination alpha to opaque.
  SDL_SRCALPHA not set:
    copy RGB, set destination alpha to source per-surface alpha value.
  both:
    if SDL_SRCCOLORKEY set, only copy the pixels matching the
    source colour key.

RGBA->RGBA:
  SDL_SRCALPHA set:
    alpha-blend (using the source alpha channel) the RGB values;
    leave destination alpha untouched. [Note: is this correct?]
    SDL_SRCCOLORKEY ignored.
  SDL_SRCALPHA not set:
    copy all of RGBA to the destination.
    if SDL_SRCCOLORKEY set, only copy the pixels matching the
    RGB values of the source colour key, ignoring alpha in the
   comparison.

RGB->RGB:
  SDL_SRCALPHA set:
    alpha-blend (using the source per-surface alpha value).
  SDL_SRCALPHA not set:
    copy RGB.
  both:
    if SDL_SRCCOLORKEY set, only copy the pixels matching the
    source colour key.

On Sun, Feb 17, 2013 at 7:36 AM, Mason Wheeler wrote:

I’ve got a strange problem that goes something like this:

Load two surfaces with SDL_Image. Call them A and B.
Create a new SDL_Surface with SDL_CreateRGBSurface, passing 0 for all the
mask values. This creates a surface that does not necessarily have the
same pixel format as the two surfaces already loaded. Call it Target.
Perform a blit from A to Target. This works fine.
Perform a blit from B to Target. B contains an alpha channel, and I
expect Target to contain an image overlaying B onto A. Instead, I end up
with the entirety of Target having an alpha of 0.

I traced into SDL in the debugger, trying to figure out what was going
wrong, only to be stopped cold inside of BlitNtoN because the meat of the
function takes place inside a silly DUFFS_LOOP macro. You can’t trace into
a macro like you can in a function call, and this seems like a strange
thing to have in there anyway, for two reasons.

First, Duff’s Device was designed to squeeze every last cycle out of
direct memory copies, back when cycles were a lot more expensive than they
are now, but there’s a sizeable block of code in the body of the loop here,
not just a simple “*dest++ = *src++” type thing.
Second, Duff’s Device was invented back before a lot of modern
optimizations, both in compiler design and in CPU architecture, and
according to the Wikipedia article, it tends to screw some of them up
rather badly, leading to a net decrease in performance.

So it hurts readability, debugabilty and performance on modern computers.
Could we just replace it with a simple loop please?


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Speaking of reading the whole message… the macro you need to #undef is in
SDL_blit.h. If you #undef it in SDL_blit.h, it disables DUFFS_LOOP across all
the code (including BlitNtoN) by defining DUFFS_LOOP as a simple for loop.On 02/18/2013 11:42 AM, Mason Wheeler wrote:

Yeah, and you really should read the whole message before criticizing something
I already addressed:

Looking at this further, it appears that several of the other blit routines
are coded
to respect this #define, but a bunch of them, including the one my code is going
through (BlitNtoN) are not. An oversight?

Mason


From: Tim Angus
To: Mason Wheeler ; SDL Development List

Sent: Monday, February 18, 2013 8:37 AM
Subject: Re: Proposal: remove DUFFS_LOOP macro from SDL blit code

On 18/02/13 16:30, Mason Wheeler wrote:

As I said to John, there’s nothing in this code that would respond to a
#define like that.

~/sources/SDL$ find -iname “*.[ch]” | xargs grep -in USE_DUFFS_LOOP | wc -l
35

It appears 35 times in my copy.


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

…but it remains a macro, which can’t be traced into in the debugger, which was my original concern. :wink:

Mason________________________________
From: John
To: sdl at lists.libsdl.org
Sent: Monday, February 18, 2013 9:46 AM
Subject: Re: [SDL] Proposal: remove DUFFS_LOOP macro from SDL blit code

Speaking of reading the whole message… the macro you need to #undef is in
SDL_blit.h.? If you #undef it in SDL_blit.h, it disables DUFFS_LOOP across all
the code (including BlitNtoN) by defining DUFFS_LOOP as a simple for loop.

On 02/18/2013 11:42 AM, Mason Wheeler wrote:

Yeah, and you really should read the whole message before criticizing something
I already addressed:

? >Looking at this further, it appears that several of the other blit routines
are coded
? >to respect this #define, but a bunch of them, including the one my code is going
? >through (BlitNtoN) are not.? An oversight?

Mason


From: Tim Angus
To: Mason Wheeler <@Mason_Wheeler>; SDL Development List

Sent: Monday, February 18, 2013 8:37 AM
Subject: Re: Proposal: remove DUFFS_LOOP macro from SDL blit code

On 18/02/13 16:30, Mason Wheeler wrote:
? > As I said to John, there’s nothing in this code that would respond to a
? > #define like that.

~/sources/SDL$ find -iname “*.[ch]” | xargs grep -in USE_DUFFS_LOOP | wc -l
35

It appears 35 times in my copy.


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Ah, in that case, DUFFS_LOOP is not the issue at all. The SDL blitters use
macros all over the place, with or without DUFFS_LOOP. For your specific case,
you can inspect the src* and dst* variables, which are local to the stack frame,
and should be visible within your debugger.On 02/18/2013 12:59 PM, Mason Wheeler wrote:

…but it remains a macro, which can’t be traced into in the debugger, which was
my original concern. :wink:

Mason


From: John <@John6>
To: sdl at lists.libsdl.org
Sent: Monday, February 18, 2013 9:46 AM
Subject: Re: [SDL] Proposal: remove DUFFS_LOOP macro from SDL blit code

Speaking of reading the whole message… the macro you need to #undef is in
SDL_blit.h. If you #undef it in SDL_blit.h, it disables DUFFS_LOOP across all
the code (including BlitNtoN) by defining DUFFS_LOOP as a simple for loop.

On 02/18/2013 11:42 AM, Mason Wheeler wrote:

Yeah, and you really should read the whole message before criticizing something
I already addressed:

Looking at this further, it appears that several of the other blit routines
are coded
to respect this #define, but a bunch of them, including the one my code is
going

through (BlitNtoN) are not. An oversight?

Mason


From: Tim Angus <tim at ngus.net <mailto:tim at ngus.net>>
To: Mason Wheeler <masonwheeler at yahoo.com <mailto:masonwheeler at yahoo.com>>;
SDL Development List
<sdl at lists.libsdl.org <mailto:sdl at lists.libsdl.org>>
Sent: Monday, February 18, 2013 8:37 AM
Subject: Re: Proposal: remove DUFFS_LOOP macro from SDL blit code

On 18/02/13 16:30, Mason Wheeler wrote:

As I said to John, there’s nothing in this code that would respond to a
#define like that.

~/sources/SDL$ find -iname “*.[ch]” | xargs grep -in USE_DUFFS_LOOP | wc -l
35

It appears 35 times in my copy.


SDL mailing list
SDL at lists.libsdl.org <mailto:SDL at lists.libsdl.org>
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org <mailto:SDL at lists.libsdl.org>
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Some rough benchmarks on x86_64 suggest that DUFFS_LOOP has no measurable
performance effect on the general software blit routines used by the SDL tests.
(–renderer software)

TEST, DUFF (fps), NODUFF (fps),
testdraw2, 85.99, 86.82,
testrendertarget, 32.65, 31.82,
testscale, 35.49, 35.11,
testsprite2, 129.95, 125.69,

We might conclude there’s no need for this optimization, while at the same time
acknowledge that leaving it in is not harmful, either.On 02/18/2013 12:33 PM, Sam Lantinga wrote:

Yup, that’s just an oversight.

On Mon, Feb 18, 2013 at 8:30 AM, Mason Wheeler <masonwheeler at yahoo.com <mailto:masonwheeler at yahoo.com>> wrote:

As I said to John, there's nothing in this code that would respond to a
#define like that. Looking at this further, it appears that several of the
other blit routines are coded to respect this #define, but a bunch of them,
including the one my code is going through (BlitNtoN) are not.  An oversight?

Mason

--------------------------------------------------------------------------------
*From:* Sam Lantinga <slouken at libsdl.org <mailto:slouken at libsdl.org>>
*To:* Mason Wheeler <masonwheeler at yahoo.com
<mailto:masonwheeler at yahoo.com>>; SDL Development List <sdl at lists.libsdl.org
<mailto:sdl at lists.libsdl.org>>
*Sent:* Sunday, February 17, 2013 11:25 PM
*Subject:* Re: [SDL] Proposal: remove DUFFS_LOOP macro from SDL blit code

As John said, you can turn it off by commenting out USE_DUFFS_LOOP in SDL_blit.h

It was originally added when I was optimizing the blitters and that yielded
about 15% speed increase over standard loops with gcc using -O2.

If you a proposing disabling it by default, I'm not opposed, but I would
suggest benchmarking first.

Regarding your original issue, the SDL destination alpha semantics are a
little weird.  From the SDL documentation:

     RGBA->RGB:
       SDL_SRCALPHA set:
         alpha-blend (using alpha-channel).
         SDL_SRCCOLORKEY ignored.
       SDL_SRCALPHA not set:
         copy RGB.
         if SDL_SRCCOLORKEY set, only copy the pixels matching the
         RGB values of the source colour key, ignoring alpha in the
         comparison.

     RGB->RGBA:
       SDL_SRCALPHA set:
         alpha-blend (using the source per-surface alpha value);
         set destination alpha to opaque.
       SDL_SRCALPHA not set:
         copy RGB, set destination alpha to source per-surface alpha value.
       both:
         if SDL_SRCCOLORKEY set, only copy the pixels matching the
         source colour key.

     RGBA->RGBA:
       SDL_SRCALPHA set:
         alpha-blend (using the source alpha channel) the RGB values;
         leave destination alpha untouched. [Note: is this correct?]
         SDL_SRCCOLORKEY ignored.
       SDL_SRCALPHA not set:
         copy all of RGBA to the destination.
         if SDL_SRCCOLORKEY set, only copy the pixels matching the
         RGB values of the source colour key, ignoring alpha in the
        comparison.

     RGB->RGB:
       SDL_SRCALPHA set:
         alpha-blend (using the source per-surface alpha value).
       SDL_SRCALPHA not set:
         copy RGB.
       both:
         if SDL_SRCCOLORKEY set, only copy the pixels matching the
         source colour key.


On Sun, Feb 17, 2013 at 7:36 AM, Mason Wheeler <masonwheeler at yahoo.com <mailto:masonwheeler at yahoo.com>> wrote:

    I've got a strange problem that goes something like this:

    Load two surfaces with SDL_Image.  Call them A and B.
    Create a new SDL_Surface with SDL_CreateRGBSurface, passing 0 for all
    the mask values.  This creates a surface that does not necessarily have
    the same pixel format as the two surfaces already loaded.  Call it Target.
    Perform a blit from A to Target. This works fine.
    Perform a blit from B to Target.  B contains an alpha channel, and I
    expect Target to contain an image overlaying B onto A.  Instead, I end
    up with the entirety of Target having an alpha of 0.

    I traced into SDL in the debugger, trying to figure out what was going
    wrong, only to be stopped cold inside of BlitNtoN because the meat of
    the function takes place inside a silly DUFFS_LOOP macro.  You can't
    trace into a macro like you can in a function call, and this seems like
    a strange thing to have in there anyway, for two reasons.

    First, Duff's Device was designed to squeeze every last cycle out of
    direct memory copies, back when cycles were a lot more expensive than
    they are now, but there's a sizeable block of code in the body of the
    loop here, not just a simple "*dest++ = *src++" type thing.
    Second, Duff's Device was invented back before a lot of modern
    optimizations, both in compiler design and in CPU architecture, and
    according to the Wikipedia article, it tends to screw some of them up
    rather badly, leading to a net *decrease* in performance.

    So it hurts readability, debugabilty and performance on modern
    computers.  Could we just replace it with a simple loop please?


    _______________________________________________
    SDL mailing list
    SDL at lists.libsdl.org <mailto:SDL at lists.libsdl.org>
    http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org







_______________________________________________
SDL mailing list
SDL at lists.libsdl.org <mailto:SDL at lists.libsdl.org>
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Actually besides testdraw2 that seems to result in a performance
loss (even if slightly). Maybe it’s time to disable it by default?

2013/2/18, John :> Some rough benchmarks on x86_64 suggest that DUFFS_LOOP has no measurable

performance effect on the general software blit routines used by the SDL
tests.
(–renderer software)

TEST, DUFF (fps), NODUFF (fps),
testdraw2, 85.99, 86.82,
testrendertarget, 32.65, 31.82,
testscale, 35.49, 35.11,
testsprite2, 129.95, 125.69,

We might conclude there’s no need for this optimization, while at the same
time
acknowledge that leaving it in is not harmful, either.

On 02/18/2013 12:33 PM, Sam Lantinga wrote:

Yup, that’s just an oversight.

On Mon, Feb 18, 2013 at 8:30 AM, Mason Wheeler <masonwheeler at yahoo.com <mailto:masonwheeler at yahoo.com>> wrote:

As I said to John, there's nothing in this code that would respond to

a
#define like that. Looking at this further, it appears that several of
the
other blit routines are coded to respect this #define, but a bunch of
them,
including the one my code is going through (BlitNtoN) are not. An
oversight?

Mason

*From:* Sam Lantinga <slouken at libsdl.org <mailto:slouken at libsdl.org>>
*To:* Mason Wheeler <masonwheeler at yahoo.com
<mailto:masonwheeler at yahoo.com>>; SDL Development List

<sdl at lists.libsdl.org
<mailto:sdl at lists.libsdl.org>>
Sent: Sunday, February 17, 2013 11:25 PM
Subject: Re: [SDL] Proposal: remove DUFFS_LOOP macro from SDL blit
code

As John said, you can turn it off by commenting out USE_DUFFS_LOOP in

SDL_blit.h

It was originally added when I was optimizing the blitters and that

yielded
about 15% speed increase over standard loops with gcc using -O2.

If you a proposing disabling it by default, I'm not opposed, but I

would
suggest benchmarking first.

Regarding your original issue, the SDL destination alpha semantics are

a
little weird. From the SDL documentation:

     RGBA->RGB:
       SDL_SRCALPHA set:
         alpha-blend (using alpha-channel).
         SDL_SRCCOLORKEY ignored.
       SDL_SRCALPHA not set:
         copy RGB.
         if SDL_SRCCOLORKEY set, only copy the pixels matching the
         RGB values of the source colour key, ignoring alpha in the
         comparison.

     RGB->RGBA:
       SDL_SRCALPHA set:
         alpha-blend (using the source per-surface alpha value);
         set destination alpha to opaque.
       SDL_SRCALPHA not set:
         copy RGB, set destination alpha to source per-surface alpha

value.
both:
if SDL_SRCCOLORKEY set, only copy the pixels matching the
source colour key.

     RGBA->RGBA:
       SDL_SRCALPHA set:
         alpha-blend (using the source alpha channel) the RGB values;
         leave destination alpha untouched. [Note: is this correct?]
         SDL_SRCCOLORKEY ignored.
       SDL_SRCALPHA not set:
         copy all of RGBA to the destination.
         if SDL_SRCCOLORKEY set, only copy the pixels matching the
         RGB values of the source colour key, ignoring alpha in the
        comparison.

     RGB->RGB:
       SDL_SRCALPHA set:
         alpha-blend (using the source per-surface alpha value).
       SDL_SRCALPHA not set:
         copy RGB.
       both:
         if SDL_SRCCOLORKEY set, only copy the pixels matching the
         source colour key.


On Sun, Feb 17, 2013 at 7:36 AM, Mason Wheeler <masonwheeler at yahoo.com <mailto:masonwheeler at yahoo.com>> wrote:

    I've got a strange problem that goes something like this:

    Load two surfaces with SDL_Image.  Call them A and B.
    Create a new SDL_Surface with SDL_CreateRGBSurface, passing 0 for

all
the mask values. This creates a surface that does not necessarily
have
the same pixel format as the two surfaces already loaded. Call it
Target.
Perform a blit from A to Target. This works fine.
Perform a blit from B to Target. B contains an alpha channel, and
I
expect Target to contain an image overlaying B onto A. Instead, I
end
up with the entirety of Target having an alpha of 0.

    I traced into SDL in the debugger, trying to figure out what was

going
wrong, only to be stopped cold inside of BlitNtoN because the meat
of
the function takes place inside a silly DUFFS_LOOP macro. You
can’t
trace into a macro like you can in a function call, and this seems
like
a strange thing to have in there anyway, for two reasons.

    First, Duff's Device was designed to squeeze every last cycle out

of
direct memory copies, back when cycles were a lot more expensive
than
they are now, but there’s a sizeable block of code in the body of
the
loop here, not just a simple “*dest++ = *src++” type thing.
Second, Duff’s Device was invented back before a lot of modern
optimizations, both in compiler design and in CPU architecture,
and
according to the Wikipedia article, it tends to screw some of them
up
rather badly, leading to a net decrease in performance.

    So it hurts readability, debugabilty and performance on modern
    computers.  Could we just replace it with a simple loop please?


    _______________________________________________
    SDL mailing list
    SDL at lists.libsdl.org <mailto:SDL at lists.libsdl.org>
    http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org







_______________________________________________
SDL mailing list
SDL at lists.libsdl.org <mailto:SDL at lists.libsdl.org>
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

No, the units are frames per second (fps). Higher is better. So the Duff loop is
"faster" in 3 of the 4 results.On 02/18/2013 04:50 PM, Sik the hedgehog wrote:

Actually besides testdraw2 that seems to result in a performance
loss (even if slightly). Maybe it’s time to disable it by default?

2013/2/18, John <@John6>:

Some rough benchmarks on x86_64 suggest that DUFFS_LOOP has no measurable
performance effect on the general software blit routines used by the SDL
tests.
(–renderer software)

TEST, DUFF (fps), NODUFF (fps),
testdraw2, 85.99, 86.82,
testrendertarget, 32.65, 31.82,
testscale, 35.49, 35.11,
testsprite2, 129.95, 125.69,

We might conclude there’s no need for this optimization, while at the same
time
acknowledge that leaving it in is not harmful, either.

On 02/18/2013 12:33 PM, Sam Lantinga wrote:

Yup, that’s just an oversight.

On Mon, Feb 18, 2013 at 8:30 AM, Mason Wheeler <masonwheeler at yahoo.com <mailto:masonwheeler at yahoo.com>> wrote:

 As I said to John, there's nothing in this code that would respond to

a
#define like that. Looking at this further, it appears that several of
the
other blit routines are coded to respect this #define, but a bunch of
them,
including the one my code is going through (BlitNtoN) are not. An
oversight?

 Mason

 *From:* Sam Lantinga <slouken at libsdl.org <mailto:slouken at libsdl.org>>
 *To:* Mason Wheeler <masonwheeler at yahoo.com
 <mailto:masonwheeler at yahoo.com>>; SDL Development List

<sdl at lists.libsdl.org
<mailto:sdl at lists.libsdl.org>>
Sent: Sunday, February 17, 2013 11:25 PM
Subject: Re: [SDL] Proposal: remove DUFFS_LOOP macro from SDL blit
code

 As John said, you can turn it off by commenting out USE_DUFFS_LOOP in

SDL_blit.h

 It was originally added when I was optimizing the blitters and that

yielded
about 15% speed increase over standard loops with gcc using -O2.

 If you a proposing disabling it by default, I'm not opposed, but I

would
suggest benchmarking first.

 Regarding your original issue, the SDL destination alpha semantics are

a
little weird. From the SDL documentation:

      RGBA->RGB:
        SDL_SRCALPHA set:
          alpha-blend (using alpha-channel).
          SDL_SRCCOLORKEY ignored.
        SDL_SRCALPHA not set:
          copy RGB.
          if SDL_SRCCOLORKEY set, only copy the pixels matching the
          RGB values of the source colour key, ignoring alpha in the
          comparison.

      RGB->RGBA:
        SDL_SRCALPHA set:
          alpha-blend (using the source per-surface alpha value);
          set destination alpha to opaque.
        SDL_SRCALPHA not set:
          copy RGB, set destination alpha to source per-surface alpha

value.
both:
if SDL_SRCCOLORKEY set, only copy the pixels matching the
source colour key.

      RGBA->RGBA:
        SDL_SRCALPHA set:
          alpha-blend (using the source alpha channel) the RGB values;
          leave destination alpha untouched. [Note: is this correct?]
          SDL_SRCCOLORKEY ignored.
        SDL_SRCALPHA not set:
          copy all of RGBA to the destination.
          if SDL_SRCCOLORKEY set, only copy the pixels matching the
          RGB values of the source colour key, ignoring alpha in the
         comparison.

      RGB->RGB:
        SDL_SRCALPHA set:
          alpha-blend (using the source per-surface alpha value).
        SDL_SRCALPHA not set:
          copy RGB.
        both:
          if SDL_SRCCOLORKEY set, only copy the pixels matching the
          source colour key.


 On Sun, Feb 17, 2013 at 7:36 AM, Mason Wheeler <masonwheeler at yahoo.com <mailto:masonwheeler at yahoo.com>> wrote:

     I've got a strange problem that goes something like this:

     Load two surfaces with SDL_Image.  Call them A and B.
     Create a new SDL_Surface with SDL_CreateRGBSurface, passing 0 for

all
the mask values. This creates a surface that does not necessarily
have
the same pixel format as the two surfaces already loaded. Call it
Target.
Perform a blit from A to Target. This works fine.
Perform a blit from B to Target. B contains an alpha channel, and
I
expect Target to contain an image overlaying B onto A. Instead, I
end
up with the entirety of Target having an alpha of 0.

     I traced into SDL in the debugger, trying to figure out what was

going
wrong, only to be stopped cold inside of BlitNtoN because the meat
of
the function takes place inside a silly DUFFS_LOOP macro. You
can’t
trace into a macro like you can in a function call, and this seems
like
a strange thing to have in there anyway, for two reasons.

     First, Duff's Device was designed to squeeze every last cycle out

of
direct memory copies, back when cycles were a lot more expensive
than
they are now, but there’s a sizeable block of code in the body of
the
loop here, not just a simple “*dest++ = *src++” type thing.
Second, Duff’s Device was invented back before a lot of modern
optimizations, both in compiler design and in CPU architecture,
and
according to the Wikipedia article, it tends to screw some of them
up
rather badly, leading to a net decrease in performance.

     So it hurts readability, debugabilty and performance on modern
     computers.  Could we just replace it with a simple loop please?


     _______________________________________________
     SDL mailing list
     SDL at lists.libsdl.org <mailto:SDL at lists.libsdl.org>
     http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org







 _______________________________________________
 SDL mailing list
 SDL at lists.libsdl.org <mailto:SDL at lists.libsdl.org>
 http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Er wait, actually swapped DUFF and NODUFF for some reason derp

2013/2/18, John :> No, the units are frames per second (fps). Higher is better. So the Duff

loop is
"faster" in 3 of the 4 results.

On 02/18/2013 04:50 PM, Sik the hedgehog wrote:

Actually besides testdraw2 that seems to result in a performance
loss (even if slightly). Maybe it’s time to disable it by default?

2013/2/18, John :

Some rough benchmarks on x86_64 suggest that DUFFS_LOOP has no
measurable
performance effect on the general software blit routines used by the SDL
tests.
(–renderer software)

TEST, DUFF (fps), NODUFF (fps),
testdraw2, 85.99, 86.82,
testrendertarget, 32.65, 31.82,
testscale, 35.49, 35.11,
testsprite2, 129.95, 125.69,

We might conclude there’s no need for this optimization, while at the
same
time
acknowledge that leaving it in is not harmful, either.

On 02/18/2013 12:33 PM, Sam Lantinga wrote:

Yup, that’s just an oversight.

On Mon, Feb 18, 2013 at 8:30 AM, Mason Wheeler <masonwheeler at yahoo.com <mailto:masonwheeler at yahoo.com>> wrote:

 As I said to John, there's nothing in this code that would respond

to
a
#define like that. Looking at this further, it appears that several
of
the
other blit routines are coded to respect this #define, but a bunch
of
them,
including the one my code is going through (BlitNtoN) are not. An
oversight?

 Mason

 *From:* Sam Lantinga <slouken at libsdl.org

<mailto:slouken at libsdl.org>>
To: Mason Wheeler <masonwheeler at yahoo.com
<mailto:masonwheeler at yahoo.com>>; SDL Development List
<sdl at lists.libsdl.org
<mailto:sdl at lists.libsdl.org>>
Sent: Sunday, February 17, 2013 11:25 PM
Subject: Re: [SDL] Proposal: remove DUFFS_LOOP macro from SDL
blit
code

 As John said, you can turn it off by commenting out USE_DUFFS_LOOP

in
SDL_blit.h

 It was originally added when I was optimizing the blitters and

that
yielded
about 15% speed increase over standard loops with gcc using -O2.

 If you a proposing disabling it by default, I'm not opposed, but I

would
suggest benchmarking first.

 Regarding your original issue, the SDL destination alpha semantics

are
a
little weird. From the SDL documentation:

      RGBA->RGB:
        SDL_SRCALPHA set:
          alpha-blend (using alpha-channel).
          SDL_SRCCOLORKEY ignored.
        SDL_SRCALPHA not set:
          copy RGB.
          if SDL_SRCCOLORKEY set, only copy the pixels matching the
          RGB values of the source colour key, ignoring alpha in

the
comparison.

      RGB->RGBA:
        SDL_SRCALPHA set:
          alpha-blend (using the source per-surface alpha value);
          set destination alpha to opaque.
        SDL_SRCALPHA not set:
          copy RGB, set destination alpha to source per-surface

alpha
value.
both:
if SDL_SRCCOLORKEY set, only copy the pixels matching the
source colour key.

      RGBA->RGBA:
        SDL_SRCALPHA set:
          alpha-blend (using the source alpha channel) the RGB

values;
leave destination alpha untouched. [Note: is this
correct?]
SDL_SRCCOLORKEY ignored.
SDL_SRCALPHA not set:
copy all of RGBA to the destination.
if SDL_SRCCOLORKEY set, only copy the pixels matching the
RGB values of the source colour key, ignoring alpha in
the
comparison.

      RGB->RGB:
        SDL_SRCALPHA set:
          alpha-blend (using the source per-surface alpha value).
        SDL_SRCALPHA not set:
          copy RGB.
        both:
          if SDL_SRCCOLORKEY set, only copy the pixels matching the
          source colour key.


 On Sun, Feb 17, 2013 at 7:36 AM, Mason Wheeler <masonwheeler at yahoo.com <mailto:masonwheeler at yahoo.com>> wrote:

     I've got a strange problem that goes something like this:

     Load two surfaces with SDL_Image.  Call them A and B.
     Create a new SDL_Surface with SDL_CreateRGBSurface, passing 0

for
all
the mask values. This creates a surface that does not
necessarily
have
the same pixel format as the two surfaces already loaded. Call
it
Target.
Perform a blit from A to Target. This works fine.
Perform a blit from B to Target. B contains an alpha channel,
and
I
expect Target to contain an image overlaying B onto A.
Instead, I
end
up with the entirety of Target having an alpha of 0.

     I traced into SDL in the debugger, trying to figure out what

was
going
wrong, only to be stopped cold inside of BlitNtoN because the
meat
of
the function takes place inside a silly DUFFS_LOOP macro. You
can’t
trace into a macro like you can in a function call, and this
seems
like
a strange thing to have in there anyway, for two reasons.

     First, Duff's Device was designed to squeeze every last cycle

out
of
direct memory copies, back when cycles were a lot more
expensive
than
they are now, but there’s a sizeable block of code in the body
of
the
loop here, not just a simple “*dest++ = *src++” type thing.
Second, Duff’s Device was invented back before a lot of modern
optimizations, both in compiler design and in CPU
architecture,
and
according to the Wikipedia article, it tends to screw some of
them
up
rather badly, leading to a net decrease in performance.

     So it hurts readability, debugabilty and performance on modern
     computers.  Could we just replace it with a simple loop

please?

     _______________________________________________
     SDL mailing list
     SDL at lists.libsdl.org <mailto:SDL at lists.libsdl.org>
     http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org







 _______________________________________________
 SDL mailing list
 SDL at lists.libsdl.org <mailto:SDL at lists.libsdl.org>
 http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Either way, on each example, the difference is less than 1 FPS. Furthermore,
this blit code was written back before SDL 1.3, back when the screen itself was
a SDL_Surface and everything used the blitting code for all display work.? That’s
no longer the case; actual display work is done with hardware-accelerated
rendering APIs today, and SDL_Surface is used primarily for setup work.
This means that the blitters are no longer performance-critical like they used
to be.

Between that, the negligible difference in performance, and the clear difference
in debuggability and readability, it seems reasonable to me to remove the Duff’s
Device from the SDL2 codebase entirely.

Mason________________________________
From: Sik the hedgehog <sik.the.hedgehog at gmail.com>
To: SDL Development List
Sent: Monday, February 18, 2013 2:05 PM
Subject: Re: [SDL] Proposal: remove DUFFS_LOOP macro from SDL blit code

Er wait, actually swapped DUFF and NODUFF for some reason derp

2013/2/18, John :

No, the units are frames per second (fps). Higher is better. So the Duff
loop is
"faster" in 3 of the 4 results.

On 02/18/2013 04:50 PM, Sik the hedgehog wrote:

Actually besides testdraw2 that seems to result in a performance
loss (even if slightly). Maybe it’s time to disable it by default?

2013/2/18, John :

Some rough benchmarks on x86_64 suggest that DUFFS_LOOP has no
measurable
performance effect on the general software blit routines used by the SDL
tests.
(–renderer software)

TEST,? ? ? ? ? DUFF (fps),? NODUFF (fps),
testdraw2,? ? ? ? ? 85.99,? ? ? ? 86.82,
testrendertarget,? ? 32.65,? ? ? ? 31.82,
testscale,? ? ? ? ? 35.49,? ? ? ? 35.11,
testsprite2,? ? ? ? 129.95,? ? ? ? 125.69,

We might conclude there’s no need for this optimization, while at the
same
time
acknowledge that leaving it in is not harmful, either.

On 02/18/2013 12:33 PM, Sam Lantinga wrote:

Yup, that’s just an oversight.

On Mon, Feb 18, 2013 at 8:30 AM, Mason Wheeler <@Mason_Wheeler mailto:Mason_Wheeler> wrote:

? ? ? As I said to John, there’s nothing in this code that would respond
to
a
? ? ? #define like that. Looking at this further, it appears that several
of
the
? ? ? other blit routines are coded to respect this #define, but a bunch
of
them,
? ? ? including the one my code is going through (BlitNtoN) are not.? An
oversight?

? ? ? Mason


? ? ? From: Sam Lantinga <slouken at libsdl.org
<mailto:slouken at libsdl.org>>
? ? ? To: Mason Wheeler <@Mason_Wheeler
? ? ? mailto:Mason_Wheeler>; SDL Development List
<sdl at lists.libsdl.org
? ? ? <mailto:sdl at lists.libsdl.org>>
? ? ? Sent: Sunday, February 17, 2013 11:25 PM
? ? ? Subject: Re: [SDL] Proposal: remove DUFFS_LOOP macro from SDL
blit
code

? ? ? As John said, you can turn it off by commenting out USE_DUFFS_LOOP
in
SDL_blit.h

? ? ? It was originally added when I was optimizing the blitters and
that
yielded
? ? ? about 15% speed increase over standard loops with gcc using -O2.

? ? ? If you a proposing disabling it by default, I’m not opposed, but I
would
? ? ? suggest benchmarking first.

? ? ? Regarding your original issue, the SDL destination alpha semantics
are
a
? ? ? little weird.? From the SDL documentation:

? ? ? ? ? RGBA->RGB:
? ? ? ? ? ? SDL_SRCALPHA set:
? ? ? ? ? ? ? alpha-blend (using alpha-channel).
? ? ? ? ? ? ? SDL_SRCCOLORKEY ignored.
? ? ? ? ? ? SDL_SRCALPHA not set:
? ? ? ? ? ? ? copy RGB.
? ? ? ? ? ? ? if SDL_SRCCOLORKEY set, only copy the pixels matching the
? ? ? ? ? ? ? RGB values of the source colour key, ignoring alpha in
the
? ? ? ? ? ? ? comparison.

? ? ? ? ? RGB->RGBA:
? ? ? ? ? ? SDL_SRCALPHA set:
? ? ? ? ? ? ? alpha-blend (using the source per-surface alpha value);
? ? ? ? ? ? ? set destination alpha to opaque.
? ? ? ? ? ? SDL_SRCALPHA not set:
? ? ? ? ? ? ? copy RGB, set destination alpha to source per-surface
alpha
value.
? ? ? ? ? ? both:
? ? ? ? ? ? ? if SDL_SRCCOLORKEY set, only copy the pixels matching the
? ? ? ? ? ? ? source colour key.

? ? ? ? ? RGBA->RGBA:
? ? ? ? ? ? SDL_SRCALPHA set:
? ? ? ? ? ? ? alpha-blend (using the source alpha channel) the RGB
values;
? ? ? ? ? ? ? leave destination alpha untouched. [Note: is this
correct?]
? ? ? ? ? ? ? SDL_SRCCOLORKEY ignored.
? ? ? ? ? ? SDL_SRCALPHA not set:
? ? ? ? ? ? ? copy all of RGBA to the destination.
? ? ? ? ? ? ? if SDL_SRCCOLORKEY set, only copy the pixels matching the
? ? ? ? ? ? ? RGB values of the source colour key, ignoring alpha in
the
? ? ? ? ? ? ? comparison.

? ? ? ? ? RGB->RGB:
? ? ? ? ? ? SDL_SRCALPHA set:
? ? ? ? ? ? ? alpha-blend (using the source per-surface alpha value).
? ? ? ? ? ? SDL_SRCALPHA not set:
? ? ? ? ? ? ? copy RGB.
? ? ? ? ? ? both:
? ? ? ? ? ? ? if SDL_SRCCOLORKEY set, only copy the pixels matching the
? ? ? ? ? ? ? source colour key.

? ? ? On Sun, Feb 17, 2013 at 7:36 AM, Mason Wheeler
<@Mason_Wheeler
? ? ? mailto:Mason_Wheeler> wrote:

? ? ? ? ? I’ve got a strange problem that goes something like this:

? ? ? ? ? Load two surfaces with SDL_Image.? Call them A and B.
? ? ? ? ? Create a new SDL_Surface with SDL_CreateRGBSurface, passing 0
for
all
? ? ? ? ? the mask values.? This creates a surface that does not
necessarily
have
? ? ? ? ? the same pixel format as the two surfaces already loaded.? Call
it
Target.
? ? ? ? ? Perform a blit from A to Target. This works fine.
? ? ? ? ? Perform a blit from B to Target.? B contains an alpha channel,
and
I
? ? ? ? ? expect Target to contain an image overlaying B onto A.
Instead, I
end
? ? ? ? ? up with the entirety of Target having an alpha of 0.

? ? ? ? ? I traced into SDL in the debugger, trying to figure out what
was
going
? ? ? ? ? wrong, only to be stopped cold inside of BlitNtoN because the
meat
of
? ? ? ? ? the function takes place inside a silly DUFFS_LOOP macro.? You
can’t
? ? ? ? ? trace into a macro like you can in a function call, and this
seems
like
? ? ? ? ? a strange thing to have in there anyway, for two reasons.

? ? ? ? ? First, Duff’s Device was designed to squeeze every last cycle
out
of
? ? ? ? ? direct memory copies, back when cycles were a lot more
expensive
than
? ? ? ? ? they are now, but there’s a sizeable block of code in the body
of
the
? ? ? ? ? loop here, not just a simple “*dest++ = *src++” type thing.
? ? ? ? ? Second, Duff’s Device was invented back before a lot of modern
? ? ? ? ? optimizations, both in compiler design and in CPU
architecture,
and
? ? ? ? ? according to the Wikipedia article, it tends to screw some of
them
up
? ? ? ? ? rather badly, leading to a net decrease in performance.

? ? ? ? ? So it hurts readability, debugabilty and performance on modern
? ? ? ? ? computers.? Could we just replace it with a simple loop
please?

? ? ? ? ? _______________________________________________
? ? ? ? ? SDL mailing list
? ? ? ? ? SDL at lists.libsdl.org <mailto:SDL at lists.libsdl.org>
? ? ? ? ? http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

? ? ? _______________________________________________
? ? ? SDL mailing list
? ? ? SDL at lists.libsdl.org <mailto:SDL at lists.libsdl.org>
? ? ? http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org