SDL_fillrect.c: don't redetect CPU features

In SIMD-enabled implementations, SDL_FillRect() checks the CPU
features (SDL_HasMMX() / SDL_HasSSE()) every time (which in turn
checks if the CPU features have been detected first, and performs a
detection if necessary) and then branches to the appropriate function.
This adds a bit of extra overhead that could otherwise be avoided
by the use of function pointers.

This patch adds a table of 4 function pointers in SIMD-enabled builds
that are used instead. Initially, each function pointer points to a
detection routine that picks the best version, saves that pointer,
then executes it. This removes the extra branch that typically occurs
in code such as “if(firstTime) setupPointers()”, replacing it with a
detect-then-execute function.
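
To make the detect-then-execute idea concrete, here is a minimal sketch for a single table entry. It is not the code from fillrect.diff: the names fill32, fill32_ptr, fill32_detect and cpu_has_simd are made up for illustration, and the "SIMD" version is a plain loop so the sketch stays portable.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*fill_fn)(uint32_t *dst, uint32_t color, size_t n);

/* Portable fallback: plain C loop. */
static void fill32_c(uint32_t *dst, uint32_t color, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = color;
}

/* Placeholder for an SSE/MMX version (assumption: a real build would use
 * intrinsics or assembly here). */
static void fill32_simd(uint32_t *dst, uint32_t color, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = color;
}

/* Hypothetical feature check standing in for SDL_HasSSE()/SDL_HasMMX(). */
static int cpu_has_simd(void)
{
    return 0;
}

static void fill32_detect(uint32_t *dst, uint32_t color, size_t n);

/* The pointer starts out aimed at the detection routine. */
static fill_fn fill32_ptr = fill32_detect;

/* Runs on the first call only: pick the best version, save it, execute it. */
static void fill32_detect(uint32_t *dst, uint32_t color, size_t n)
{
    fill_fn best = cpu_has_simd() ? fill32_simd : fill32_c;
    fill32_ptr = best;
    best(dst, color, n);
}

/* Public entry point: one indirect call, no feature check. */
void fill32(uint32_t *dst, uint32_t color, size_t n)
{
    fill32_ptr(dst, color, n);
}
```

After the first fill32() call, fill32_ptr no longer refers to fill32_detect, so later calls go straight to the chosen implementation.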

This is multithread-safe, since at no time does executing a function
pointer produce incorrect results, and two threads executing the
detection at the same time will save the same function pointer into
the same location, so the result is identical to the single-threaded
case.
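
One caveat about the argument above: under the C11 memory model, an unsynchronized write racing with a read is formally a data race, even when every writer stores the same value. The following is a sketch of the same idea using relaxed atomics, assuming a C11 compiler with <stdatomic.h>. The names current_fill, fill_first_call and fill_threadsafe are hypothetical, and detection is elided. Relaxed ordering is assumed to be enough here because the pointer only ever holds addresses of functions that exist from program start.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*fill_fn)(uint32_t *dst, uint32_t color, size_t n);

/* Stand-in for whichever implementation detection would select. */
static void fill_plain(uint32_t *dst, uint32_t color, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = color;
}

static void fill_first_call(uint32_t *dst, uint32_t color, size_t n);

/* Atomic pointer, initially aimed at the detection routine. */
static _Atomic(fill_fn) current_fill = fill_first_call;

/* Several threads may run this at once; each stores the same pointer,
 * so the outcome does not depend on which store lands last. */
static void fill_first_call(uint32_t *dst, uint32_t color, size_t n)
{
    fill_fn best = fill_plain; /* CPU detection elided in this sketch */
    atomic_store_explicit(&current_fill, best, memory_order_relaxed);
    best(dst, color, n);
}

void fill_threadsafe(uint32_t *dst, uint32_t color, size_t n)
{
    fill_fn f = atomic_load_explicit(&current_fill, memory_order_relaxed);
    f(dst, color, n);
}
```

A thread that loads the pointer before the store lands simply runs the detection routine once more, which still produces the correct fill.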

In non-SIMD-enabled builds, there is no function pointer table, and so
there is no overhead of function pointers. Similarly, since
SDL_FillRect3() has no optimized version on any platform, it doesn’t
call through a function pointer, though the code is set up for a
drop-in implementation.

This was written against SDL2-hg, but it seems like it should apply to
SDL-1.2 as well.

Patrick
-------------- next part --------------
A non-text attachment was scrubbed…
Name: fillrect.diff
Type: application/octet-stream
Size: 5727 bytes
Desc: not available
URL: http://lists.libsdl.org/pipermail/sdl-libsdl.org/attachments/20120510/b7366317/attachment.obj

Is a branch really that less efficient compared to a pointer indirection?

On Thu, May 10, 2012 at 8:00 PM, Patrick Baggett <baggett.patrick at gmail.com> wrote:

In SIMD-enabled implementations, SDL_FillRect() checks the CPU
features (SDL_HasMMX() / SDL_HasSSE()) every time (which in turn
checks if the CPU features have been detected first, and performs a
detection if necessary) and then branches to the appropriate function.
This adds a bit of extra overhead that could otherwise be avoided
by the use of function pointers.


Alejandro Santos

It depends on the situation, but generally, yes; branches are very expensive
on modern CPUs.

Every misprediction causes a pipeline flush - that is, throwing away all
progress made on the instructions beyond the branch point. This can be dozens
of instructions, briefly reducing throughput to a small fraction of the stated
"several instructions per cycle."

Some CPUs can reduce the impact of this by splitting the pipeline across the
alternative paths when in doubt, but I suspect it’s a bit early to release PC
code relying on that. (Last time I read about this, it was only done in
supercomputer CPUs, and something was said about the next generation x86. I
think that was when high end PCs used Core 2 CPUs.)

On Friday 11 May 2012, at 02.43.59, Alejandro Santos wrote:

On Thu, May 10, 2012 at 8:00 PM, Patrick Baggett <baggett.patrick at gmail.com> wrote:

In SIMD-enabled implementations, SDL_FillRect() checks the CPU
features (SDL_HasMMX() / SDL_HasSSE()) every time (which in turn
checks if the CPU features have been detected first, and performs a
detection if necessary) and then branches to the appropriate function.
This adds a bit of extra overhead that could otherwise be avoided
by the use of function pointers.

Is a branch really that less efficient compared to a pointer indirection?


//David Olofson - Consultant, Developer, Artist, Open Source Advocate

.— Games, examples, libraries, scripting, sound, music, graphics —.
| http://consulting.olofson.net http://olofsonarcade.com |
’---------------------------------------------------------------------’

David Olofson writes:

> On Friday 11 May 2012, at 02.43.59, Alejandro Santos wrote:

On Thu, May 10, 2012 at 8:00 PM, Patrick Baggett <baggett.patrick at gmail.com> wrote:

In SIMD-enabled implementations, SDL_FillRect() checks the CPU
features (SDL_HasMMX() / SDL_HasSSE()) every time (which in turn
checks if the CPU features have been detected first, and performs a
detection if necessary) and then branches to the appropriate function.
This adds a bit of extra overhead that could otherwise be avoided
by the use of function pointers.

Is a branch really that less efficient compared to a pointer indirection?

It depends on the situation, but generally, yes; branches are very expensive
on modern CPUs.

Every misprediction causes a pipeline flush - that is, throwing away all
progress made on the instructions beyond the branch point. This can be dozens
of instructions, briefly reducing throughput to a small fraction of the stated
"several instructions per cycle."

Some CPUs can reduce the impact of this by splitting the pipeline across the
alternative paths when in doubt, but I suspect it’s a bit early to release PC
code relying on that. (Last time I read about this, it was only done in
supercomputer CPUs, and something was said about the next generation x86. I
think that was when high end PCs used Core 2 CPUs.)

And you have 2 branches here and only the first can be tagged as
unlikely to need to run the CPU check.
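
For illustration, the two branches could look roughly like the sketch below. It assumes GCC or Clang for __builtin_expect (the macro falls back to a no-op elsewhere); detect_cpu, have_sse and fill_branchy are hypothetical names, and the SSE path is a stub.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

static int detected = 0; /* has the CPU check run yet? */
static int have_sse = 0; /* result of the check */

/* Hypothetical detection; real code would query CPUID. */
static void detect_cpu(void)
{
    have_sse = 0;
    detected = 1;
}

/* Stub for an SSE path (plain loop so the sketch stays portable). */
static void fill_sse_stub(uint32_t *dst, uint32_t color, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = color;
}

static void fill_c(uint32_t *dst, uint32_t color, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = color;
}

void fill_branchy(uint32_t *dst, uint32_t color, size_t n)
{
    /* Branch 1: taken only on the very first call, so it can be hinted. */
    if (UNLIKELY(!detected))
        detect_cpu();

    /* Branch 2: the outcome depends on the machine, so no static hint
     * is right everywhere. */
    if (have_sse)
        fill_sse_stub(dst, color, n);
    else
        fill_c(dst, color, n);
}
```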

MfG
Goswin