SIMD

A lot of people have been saying a lot of weird stuff about SIMD. I’d just
like to respond to all that’s been said in one big rant:

Processor dependence is an interesting issue. I’m sure, however, that
there’s some way to detect the current processor. As long as the code for
every SIMD instruction set is compiled in, a simple switch statement at
run time could figure out which one to use. Once that’s done, the union
can be set up appropriately and all the function pointers initialized. The
speed penalty is paid once, during initialization, not on every call.
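A minimal sketch of the detect-once, dispatch-through-a-pointer idea described above, in C. Everything named here (`add_u8`, `simd_init`, `detect_cpu`, the `cpu_kind` enum) is hypothetical and not taken from SDL. The detection relies on the GCC/Clang builtin `__builtin_cpu_supports`, which is an assumption about the toolchain; in a real build the SSE2 routine would normally live in its own file compiled with SSE2 enabled, so that the rest of the program still runs on older CPUs.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* One signature shared by every implementation (hypothetical name). */
typedef void (*add_u8_fn)(unsigned char *dst, const unsigned char *src, size_t n);

/* Portable fallback: saturating add, one byte at a time. */
static void add_u8_scalar(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)dst[i] + src[i];
        dst[i] = (unsigned char)(sum > 255 ? 255 : sum);
    }
}

#if defined(__SSE2__)
/* SSE2 version: 16 bytes per iteration, scalar loop for the tail. */
static void add_u8_sse2(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(a, b));
    }
    add_u8_scalar(dst + i, src + i, n - i);
}
#endif

enum cpu_kind { CPU_GENERIC, CPU_SSE2 };

/* Assumption: GCC or Clang on x86; anything else reports CPU_GENERIC. */
static enum cpu_kind detect_cpu(void)
{
#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__))
    if (__builtin_cpu_supports("sse2"))
        return CPU_SSE2;
#endif
    return CPU_GENERIC;
}

/* The pointer every caller goes through; safe even before simd_init(). */
static add_u8_fn add_u8 = add_u8_scalar;

/* Run once at startup: the switch is the only place detection happens. */
static void simd_init(void)
{
    switch (detect_cpu()) {
#if defined(__SSE2__)
    case CPU_SSE2:
        add_u8 = add_u8_sse2;
        break;
#endif
    default:
        add_u8 = add_u8_scalar;
        break;
    }
}
```

After `simd_init()` has run, callers just write `add_u8(dst, src, n)`. The remaining per-call cost is one indirect call, which is usually negligible next to the work done on a whole buffer.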

As to speed: it would help. The argument about C vs. ASM is spurious –
C compiles to ASM. If you don’t trust your compiler to write good ASM then
you have more problems than SIMD. On the Mac, at least, there’s a C
interface provided for doing AltiVec vector ops. Perhaps all x86 SIMD is so
poorly conceived and implemented that this is not possible, but I highly
doubt it. And while compilers can in theory auto-vectorize code, that
happens at compile time, whereas this selection would happen at run time.
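For what it's worth, x86 compilers do ship C-level intrinsics as well (for example `<xmmintrin.h>` for SSE), so explicit SIMD can be written without any assembly. The following is only a sketch under that assumption: `mix_f32` is a made-up name, and the SSE path is compiled in only when the compiler defines `__SSE__`; otherwise the plain loop does all the work.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* dst[i] += src[i] * gain, e.g. mixing one buffer into another. */
static void mix_f32(float *dst, const float *src, float gain, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (dst != NULL && src != NULL));
#if defined(__SSE__)
    /* Four floats per iteration; unaligned loads keep the sketch simple. */
    const __m128 g = _mm_set1_ps(gain);
    for (; i + 4 <= n; i += 4) {
        __m128 d = _mm_loadu_ps(dst + i);
        __m128 s = _mm_loadu_ps(src + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(d, _mm_mul_ps(s, g)));
    }
#endif
    /* Scalar tail, and the whole loop when SSE is not available. */
    for (; i < n; i++)
        dst[i] += src[i] * gain;
}
```

The intrinsics map almost one-to-one onto instructions, so the compiler still handles register allocation and scheduling while the programmer decides what gets vectorized.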

I’m kinda confused why everyone is so against this right off. Is there
some bizarre allergy to processing a lot of data at once going around?
It’s not like I’m suggesting something infeasible – plenty of apps use
SIMD. If someone could explain why everyone is in such a hurry to declare
it impractical or impossible I’d appreciate it.

–oberon

[…]

I don’t know the exact motivations behind these opinions, but I guess
it might have to do with the simple fact that using SIMD means more
work for developers. You have to implement the same algorithms
multiple times, possibly in quite different ways; not just with
different instruction sets. You also have to test, debug and optimize
each version, which obviously means that you need hardware, tools and
perhaps most importantly, time, for each architecture. I think this
might effectively put SIMD out of reach for most hobbyist developers,
and even many professionals. (Professionals only have the time and
budget for this kind of stuff if it’s strictly required, basically.)

We Free/Open Source developers (there seem to be plenty around here)
have a major advantage in that we can potentially get contributions
from others for hardware that we don’t have, or that we lack the
tools, time or knowledge to support.

OTOH, that means code that we cannot maintain ourselves, so if we at
some point need to change things, we may be forced to drop most of
the SIMD code, until someone feels like updating it.

That said, libraries (like SDL) can differ quite a bit from
applications in this respect. Libraries generally have many more
users/developers than do applications (an application doesn’t
necessarily have any users with programming experience at all), and
thus, a better chance of getting the SIMD code written and
maintained.

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net
http://www.reologica.se

(In reply to oberon’s post of Monday 20 October 2003, 03.49.)

To the best of my knowledge I have read every post in this thread. I do
not know what you are reacting to. No one has expressed the opinions
you are ranting against. Large portions of SDL already work the way you
describe. That is well known and has been acknowledged in this
thread.

The original thread topic was about providing an architecture-neutral way
of writing SIMD code above the compiler. Such a task is difficult
because of the large number of different SIMD instruction sets developed
by the different x86 manufacturers. Of course code can be written for
each different instruction set and selected at run time. It is the idea
of generating one set of code that gets maximum advantage from all the
instruction sets without modifying the compiler that is hard.
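To make the "above the compiler" idea concrete, here is a rough sketch of what such a neutral layer might look like in C. The `vec4` type and the `vec4_*` and `madd4` names are invented for illustration. There are only two backends, SSE when the compiler defines `__SSE__` and a plain C fallback; an AltiVec backend would be a third branch of the same shape.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
/* SSE backend: vec4 is the native 128-bit register type. */
typedef __m128 vec4;
static vec4 vec4_load(const float *p)    { return _mm_loadu_ps(p); }
static void vec4_store(float *p, vec4 v) { _mm_storeu_ps(p, v); }
static vec4 vec4_add(vec4 a, vec4 b)     { return _mm_add_ps(a, b); }
static vec4 vec4_mul(vec4 a, vec4 b)     { return _mm_mul_ps(a, b); }
#else
/* Portable backend: same interface, four scalar operations per call. */
typedef struct { float f[4]; } vec4;
static vec4 vec4_load(const float *p)
{
    vec4 v;
    for (int i = 0; i < 4; i++) v.f[i] = p[i];
    return v;
}
static void vec4_store(float *p, vec4 v)
{
    for (int i = 0; i < 4; i++) p[i] = v.f[i];
}
static vec4 vec4_add(vec4 a, vec4 b)
{
    for (int i = 0; i < 4; i++) a.f[i] += b.f[i];
    return a;
}
static vec4 vec4_mul(vec4 a, vec4 b)
{
    for (int i = 0; i < 4; i++) a.f[i] *= b.f[i];
    return a;
}
#endif

/* Written once against the wrapper: out = a * b + c, four lanes wide. */
static void madd4(float *out, const float *a, const float *b, const float *c)
{
    assert(out != NULL && a != NULL && b != NULL && c != NULL);
    vec4 r = vec4_add(vec4_mul(vec4_load(a), vec4_load(b)), vec4_load(c));
    vec4_store(out, r);
}
```

This much is easy. The difficulty comes with operations that only some instruction sets have. AltiVec, for instance, offers a fused multiply-add (`vec_madd`) and a general permute (`vec_perm`) that the original SSE cannot match in one instruction, so a common layer has to either emulate them slowly or leave them out and lose the advantage.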

		Bob Pendleton

(In reply to oberon’s post of Sun, 2003-10-19 at 20:49.)

SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl


There’s no reason SDL can’t use SIMD instructions internally, except no
one has done so (it’s a lot of work that many people aren’t qualified to
do, and no one is qualified to maintain once it’s written). That being
said, there is a little bit of MMX and SSE in there already. Not much,
though. If you find a hot spot inside SDL and want to throw some AltiVec
at it, the patch is certainly welcome. The same code still needs to
work on non-Macs and pre-G4 machines, too. And it has to compile with
CodeWarrior and gcc. The solution frequently becomes uglier than the
problem, and a time sink, so it gets avoided… but done right, there’s no
reason it can’t be in SDL.

I thought the original conversation was about exposing an abstraction
over SIMD instructions to the application, though, which is not
practical for several reasons.

–ryan.

To respond to the points David and Bob have made:

I realize that explicit SIMD would take a lot of work, but for some apps
it makes a lot of sense. My main question is whether this is an “SDL type
of thing.” If so, I’d happily write the AltiVec code and help iron out
abstractions that would provide maximum use across every SIMD
architecture. I’d even be willing to help with other SIMD stuff if
necessary, though I lack the hardware to test.

Responding directly to Bob: I think I might have misinterpreted. People
seem to be reacting to the fact that SIMD would not be an "easy addition"
and that there would clearly be technical hurdles. These do not bother me;
I have yet to meet a technical hurdle that can’t be jumped over (or
knocked over, depending). My main question, as I said above, is whether
this is an appropriate thing for SDL.

Sorry I haven’t been quoting the emails I respond to; Pine doesn’t like
the size of the SDL digest :(

–oberon

Yeah, now that you point it out, there was a strong drift away from the
original topic. Hey, a good rant can be worth a lot. :)

On topic: IMHO, the kind of project you are talking about is something
that SDL developers, and everyone doing graphics and other numerically
intensive work, would benefit from. Since it is of general value I
think it is well worth working on. Whether it fits inside SDL is not
for me to decide.

	Bob Pendleton

(In reply to oberon’s post of Tue, 2003-10-21 at 15:41.)
