Possible SDL addition

One thing I really wish were cross-platform, and that might fit in SDL, is
hardware-accelerated but hardware-abstracted SIMD. I’m thinking of a set of
function pointers and a union (the union being of char[16], short[8],
long[4], long long[2], and a 128-bit element if available) to allow 128-bit
SIMD operations. With the number of schemes available now (AltiVec,
SSE, SSE2, 3DNow!, MMX) it’s become nearly impossible to keep up. It might
also be useful in blitter code to get maximum speed in software. So:

Is this an appropriate idea for SDL? Does this jibe with the spirit of
the library?

Would anyone like to help? I’ve poked at AltiVec and it’s not bad, but I
don’t know anything about the PC side of things and I’d like to get some
feedback (especially from The Great And Glorious Sam Who Bestows Upon Us
This Library) on the exact format of the functions and data type(s).

Comments/suggestions?

–oberon
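
For concreteness, the proposal above might look roughly like the following C sketch. The names `simd128`, `simd_ops`, `add_u16` and `simd_get_ops` are hypothetical and are not part of SDL. Fixed-width unsigned types stand in for the char/short/long arrays from the post, because `long[4]` is only 16 bytes where `long` is 32 bits wide. Only a scalar backend is shown; an AltiVec or SSE2 backend would be a second function with the same signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit value viewed as lanes of different widths. */
typedef union {
    uint8_t  b[16];
    uint16_t h[8];
    uint32_t w[4];
    uint64_t d[2];
} simd128;

/* Hypothetical table of operations; one such table per backend. */
typedef struct {
    void (*add_u16)(simd128 *dst, const simd128 *a, const simd128 *b);
} simd_ops;

/* Scalar fallback: lane-wise 16-bit add with wrap-around. */
static void add_u16_scalar(simd128 *dst, const simd128 *a, const simd128 *b)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (int i = 0; i < 8; i++) {
        dst->h[i] = (uint16_t)(a->h[i] + b->h[i]);
    }
}

/* A real version would detect the CPU here and return a matching table. */
static const simd_ops *simd_get_ops(void)
{
    static const simd_ops scalar_ops = { add_u16_scalar };
    return &scalar_ops;
}
```

A caller would fetch the table once and then go through it for every operation, e.g. `simd_get_ops()->add_u16(&r, &a, &b)`, which costs one indirect call per 128-bit operation.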

oberon wrote:

> Is this an appropriate idea for SDL? Does this jibe with the spirit of
> the library?

The SIMD assembly extensions for blitting etc. are already in SDL.

Read the source and see!

This isn’t my area of expertise, but wouldn’t calling a function for every
SIMD instruction that you wanted to do kill SDL’s performance, or at the very
least negate the speed gained from using the SIMD instructions in the first
place?

-Sean Ridenour

Yes, if you did it that way. Many C compilers allow you to write inline
assembly code. It might actually be possible to create a set of #defines
for an abstract SIMD instruction set (or a common subset) that would
let you write architecture-neutral assembly language.

I don’t have time to list the number of things that are wrong with that
idea. But it is theoretically possible. The worst problems are that it
would only work with a small number of compilers and you would be
restricted to capabilities that are common to all the different instruction
sets. And, of course, even that ensures that the code only works on the
architecture it is compiled for, so the code won’t run on other
machines.
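
A minimal sketch of the #define idea, under some loud assumptions: it uses compiler intrinsics rather than inline assembly, the `VEC_*` macro names and `add_saturate_u8` are hypothetical, and only two backends are shown (SSE2 via `<emmintrin.h>` and a plain C fallback). An AltiVec branch would slot into the same `#if` chain. The backend is fixed at compile time, which is exactly the one-architecture limitation described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i vec128;
#define VEC_LOAD(p)       _mm_loadu_si128((const __m128i *)(p))
#define VEC_STORE(p, v)   _mm_storeu_si128((__m128i *)(p), (v))
#define VEC_ADDS_U8(a, b) _mm_adds_epu8((a), (b))
#else
/* Plain C fallback with the same macro interface. */
typedef struct { uint8_t b[16]; } vec128;
static vec128 vec_load_c(const void *p)
{
    vec128 v;
    memcpy(v.b, p, 16);
    return v;
}
static void vec_store_c(void *p, vec128 v)
{
    memcpy(p, v.b, 16);
}
static vec128 vec_adds_u8_c(vec128 a, vec128 b)
{
    for (int i = 0; i < 16; i++) {
        unsigned s = (unsigned)a.b[i] + b.b[i];
        a.b[i] = (uint8_t)(s > 255u ? 255u : s);
    }
    return a;
}
#define VEC_LOAD(p)       vec_load_c(p)
#define VEC_STORE(p, v)   vec_store_c((p), (v))
#define VEC_ADDS_U8(a, b) vec_adds_u8_c((a), (b))
#endif

/* Saturating add of two byte buffers, 16 bytes at a time, written once
   against the abstract macros. */
static void add_saturate_u8(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);
    for (; i + 16 <= n; i += 16) {
        vec128 a = VEC_LOAD(dst + i);
        vec128 b = VEC_LOAD(src + i);
        VEC_STORE(dst + i, VEC_ADDS_U8(a, b));
    }
    for (; i < n; i++) {            /* leftover bytes, one at a time */
        unsigned s = (unsigned)dst[i] + src[i];
        dst[i] = (uint8_t)(s > 255u ? 255u : s);
    }
}
```

Saturating byte addition was picked because it is the kind of operation a software blitter needs and it exists in the instruction sets mentioned in this thread; anything outside the common subset would need a slower emulated path on some backends.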

Bob Pendleton


SDL mailing list
SDL at libsdl.org
http://www.libsdl.org/mailman/listinfo/sdl

There might be a way to lift this one-architecture restriction by generating
the actual machine code at runtime (yikes!). But this would require a lot of
probably performance-killing run-time logic to do the final compilation. And
virus scanners and other tools might even detect self-modifying code and
raise a false alert on it.

While this approach might work, I do not want to see it in SDL, as it is
really, really ugly.

Regards,
Gregor

Gregor, what if there was an install program that compiled the code into an
executable on the machine that person was going to be playing on? That way
it could be 100% optimized for that specific machine… neat idea?

Maybe someone should make some kind of install utility that’s coupled with gcc
to turn source into executables and such.

It’s easy enough with Linux, BSD, etc. to type make install, but not as easy
on Windows and such.

BTW, I’m working on a game that requires gcc (MinGW for Windows) and
downloads/compiles parts of itself at runtime into dynamic libs, so that it
has flexibility but efficiency at the same time. Kinda off topic, but just
mentioning that I do run-time compiling and it does work pretty well! Just a
small bit of a pause when it has to compile.

-Alan Wolfe

Is the source code available? I would be interested in seeing it :)

As for the original question, isn’t it the compiler’s job to optimize
(or vectorize, for that matter) the code properly?

Stephane

Yes, it is :) BUT it is very hard for a compiler to recognize code
that can be optimized into SIMD instructions. Compilers that do that
usually rely on language extensions to allow the programmer to
explicitly write SIMD code. It’s been, oh jeez, 15 years since I did this
kind of work, but I did spend several years writing programming tools to
support SIMD programming for graphics hardware. So, I hope things have
improved since then… Anyone know of a compiler that converts a
non-extended language to SIMD code?

Bob Pendleton


Isn’t Intel’s compiler (icc) supposed to do this to some extent? I’ve
never used it, but I think that’s how it gets its speed increases.
Here’s some info about it, particularly in the “Intra-Register
Vectorization” section:

http://www.linuxjournal.com/article.php?sid=4885

-Mike

AFAIK any decent C++ compiler is able to generate
SIMD for math code.

And the Intel compiler even has some math classes
that provide a high-level view of the SIMD instructions
of the x86 family.

Bob Pendleton wrote:

> Is the source code available? I would be interested in seeing it :)
>
> As for the original question, isn’t it the compiler’s job to optimize
> (or vectorize, for that matter) the code properly?
>
> Yes, it is :)

Ok, so we agree that’s something that has nothing to do with SDL :)
People interested in vectorizing languages should google for vector
languages; SUIF seems interesting to that end.

> BUT it is very hard for a compiler to recognize code
> that can be optimized into SIMD instructions.

Yep, I tried writing a vectorizing compiler… so I know what you mean :)
It remained a simple lex/yacc pseudo-Pascal -> i386 compiler, though.

> Compilers that do that
> usually rely on language extensions to allow the programmer to
> explicitly code SIMD code. Its been, oh jeez, 15 years since I did this
> kind of work, but I did spend several years writing programming tools to
> support SIMD programming for graphics hardware. So, I hope things have
> improved since then… Anyone know of a compiler that converts a
> non-extended language to SIMD code?

At least for C/C++, it’s very hard to vectorize the code. Because of
pointer aliasing, you still have to give hints to the compiler, usually
to promise that two arrays won’t overlap, and sometimes to promise that
there are no dependencies between iterations. Notably, VectorC (the standard
PlayStation 2 compiler, where the PlayStation 2 gets all its Mflops) and
Intel’s compiler both do that by requiring many #pragmas here and there.
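
As a small illustration of the "promise two arrays won’t overlap" hint: the sketch below uses the C99 `restrict` qualifier, which is one standard way to make that promise. It is swapped in here for the compiler-specific #pragmas mentioned above, which are not shown. The function name `saxpy` is just a conventional example name, and whether a given compiler actually vectorizes the loop still depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += a * x[i] for i in [0, n).
   `restrict` tells the compiler that x and y never overlap, so a store to
   y[i] cannot change any x[j]. Without that promise the compiler has to
   assume it might, which blocks loading several x values ahead of time. */
static void saxpy(size_t n, float a,
                  const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

Calling this with overlapping buffers breaks the promise and is undefined behavior, so the burden of correctness moves from the compiler to the programmer.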

For gcc, things might change soon, as people are working on dependency
analysis and vectorization in a gcc branch called tree-ssa, although for
now there’s nothing usable.

But hey, I’m getting quite off topic now!

Stephane

Paulo Pinto wrote:

> AFAIK any decent C++ compiler is able to generate
> SIMD for math code.

Not really. If you’re thinking about gcc, it is able to use SSE
instructions, but it does not treat the xmm* registers as vector variables,
only as single floats. Still, that is of some interest performance-wise,
because some CPUs (notably the P4) execute SSE(2) code faster than standard
FPU computations.
And I’m not even talking about the -mmmx flag, which I’ve never seen
have any effect on the generated code :)

> And the Intel compiler even has some math classes
> that provide a high-level view of the SIMD instructions
> of the x86 family

Yup, but there you have to write compiler-dependent code.

Stephane

As I see it, all this is just a headache.
I have to admit that I don’t have any solid theoretical grounding in dynamic
compilation or SIMD extensions, so everything here is strictly IMHO…

There are several ways to make full use of the capabilities of the CPU
you’re running on:

  1. Distribute the sources and let the compiler on the target machine do
    all the needed optimizations; e.g. gcc on an Athlon XP would use
    CFLAGS=-O3 -march=athlon-xp -fomit-frame-pointer -mfpmath=sse
    Classic examples are MPlayer and Gentoo Linux (I use it. And it’s faaaast!)
    There are several drawbacks: it is practical only on *NIX systems, since
    on Windows very few people have a C/C++ environment installed and even
    fewer have the same one you have. Many people distribute both GNU
    Makefiles and Visual C++ project files for their projects. I say they are
    just a bit masochistic.
    Even on *NIX systems, compiling from source is surely more complex than
    doing “rpm -i package” or such, and not all users have gcc and make
    installed. This counts double for videogame players.

  2. Distribute several binary packages of your application. The user will
    then choose which one to install.
    The problem is that you’ll have to distribute LOTS of files! For
    example, you’ll have, at least and for every version:

SDL-1.2.6-win32-i386.zip
SDL-1.2.6-win32-i686.zip
SDL-1.2.6-win32-pentium3.zip
SDL-1.2.6-win32-pentium4.zip
SDL-1.2.6-win32-athlon.zip
SDL-1.2.6-win32-athlon-xp.zip
SDL-1.2.6-linux-i386.tgz
SDL-1.2.6-linux-i686.tgz
SDL-1.2.6-linux-pentium3.tgz
SDL-1.2.6-linux-pentium4.tgz
SDL-1.2.6-linux-athlon.tgz
SDL-1.2.6-linux-athlon-xp.tgz

That’s decidedly unwieldy. Plus, if your package is being distributed by
someone else (e.g. a Linux distro), they will almost certainly provide
only the i386 or the i686 version.

  3. Distribute a single package with all the different binaries and a
    wrapper script that launches the correct executable. The main problem is
    the package size; big programs will have their sizes multiplied
    by (at least) 6.

  4. Dynamic compilation. I don’t know how it works, so I’ll just skip it :)

  5. Dynamic linking (my choice). The only drawback of this one is that
    you become tied to a specific compiler/make environment and you have to
    rewrite a lot of code if you decide to change it. However, gcc works
    on almost any platform, so that isn’t really much of a problem…
    The principle is that some “critical” functions (the ones called
    thousands of times a second; use a profiler to find out which ones they
    are) are compiled several times, in different modules and with
    different optimizations; their names are redefined each time; then
    they’re all linked together. A wrapper function calls the correct one at
    runtime.
    To have a working example (and also a much better explanation) of how
    this works, take a look at the Electric Field Simulator on my site:

http://www.crusaderky.altervista.org/downloads.php

I’m sorry there isn’t any documentation on the site yet; however, you
can read the “Multiarch-README” from inside the package, as well as look
at the sources.
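
The last approach above (several builds of a hot function under different names, with a wrapper that picks one at runtime) can be sketched in C roughly as follows. This is not taken from the Electric Field Simulator package; all names are hypothetical. To stay in one file, both variants here are the same plain C, whereas in a real build each would come from its own module compiled with different flags. The CPU check assumes the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports` on x86 and falls back to the generic variant everywhere else.

```c
#include <assert.h>
#include <stddef.h>

/* Variant 1: would be built with baseline flags (e.g. -march=i386). */
static long sum_generic(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

/* Variant 2: stand-in for the same source rebuilt with SSE2 enabled. */
static long sum_sse2(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

typedef long (*sum_fn)(const int *, size_t);

/* Returns nonzero if the running CPU supports SSE2 (assumption: GCC or
   Clang on x86; on other targets this always reports 0). */
static int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

/* Wrapper: chooses a variant on first use, then always calls through it. */
static long sum(const int *v, size_t n)
{
    static sum_fn impl = NULL;
    if (impl == NULL) {
        impl = cpu_has_sse2() ? sum_sse2 : sum_generic;
    }
    assert(v != NULL || n == 0);
    return impl(v, n);
}
```

The cost is one indirect call per use of the wrapper, so this pays off for functions that do a lot of work per call rather than for single SIMD operations.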
[] Guido Imperiale
[] CRV?ADER//KY
[] CVI.SCIENTIA.IMPERIVM

crusaderky at libero dot it
http://www.crusaderky.altervista.org

“Nam et ipsa scientia potestas est” (Knowledge is Power)
– Sir Francis Bacon (1561-1626)
Meditationes Sacrae, de Haeresibus

“The Net treats censorship as damage and routes around it.”
– John Gilmore

“I worry about my child and the Internet all the time, even though she’s
too young to have logged on yet. Here’s what I worry about. I worry that
10 or 15 years from now, she will come to me and say: ‘Daddy, where
were you when they took freedom of the press away from the Internet?’”
– Mike Godwin, Electronic Frontier Foundation


Use “-mfpmath=sse,387”, otherwise gcc won’t optimize correctly. (I’m not
entirely sure why there even is an option to use SSE by itself, and if there
is a good reason, it probably doesn’t apply to any recent release (3.3.x).)

On 21-Oct-2003, CRV?ADER//KY wrote:

> CFLAGS=-O3 -march=athlon-xp -fomit-frame-pointer -mfpmath=sse


Patrick “Diablo-D3” McFarland || unknown at panax.com
"Computer games don’t affect kids; I mean if Pac-Man affected us as kids, we’d
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." – Kristian Wilson, Nintendo, Inc, 1989