Atomic int/ptr operations?

Mac OS X:

http://developer.apple.com/documentation/Cocoa/Conceptual/Multithreading/ThreadSafety/ThreadSafety.html#//apple_ref/doc/uid/10000057i-CH8-SW14

You should use those on PowerPC builds, because it’s hard to get it right
in assembly (and possibly impossible across all PowerPC chips).

But on Intel chips, you might as well just use the same "lock ; xchg"
opcodes everyone else does, instead of relying on the OS.

I think I am really relying on the compiler, not the OS. Because I need
to impose a code-motion barrier on the compiler and an execution-order
barrier on the processor, I want to let the compiler issue the opcodes;
I believe the compiler writers know more about the subject
than I do :) I’m being very conservative.
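
As an illustration of the two barriers mentioned here, the following is a minimal sketch that assumes GCC and its `__sync_synchronize()` builtin, which acts as a full barrier for both the compiler and the processor. The names `publish`, `try_consume`, `payload`, and `ready` are hypothetical and are not part of SDL.

```c
/* Compiler-only barrier (GCC syntax): stops the compiler from moving
 * memory accesses across this point, but emits no instruction. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

static int payload;          /* data handed from one thread to another */
static volatile int ready;   /* set to 1 once payload is valid */

/* Writer side: store the data, then raise the flag. */
void publish(int value)
{
    payload = value;
    __sync_synchronize();    /* full barrier: compiler and processor */
    ready = 1;
}

/* Reader side: returns 1 and fills *out if data has been published. */
int try_consume(int *out)
{
    if (!ready) {
        return 0;
    }
    __sync_synchronize();    /* keep the payload read after the flag read */
    *out = payload;
    return 1;
}
```

`COMPILER_BARRIER()` is shown only for contrast: on its own it restrains the compiler but does nothing about processor reordering, which is why the sketch uses the builtin instead.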

Please give me more information. And, please, feel free to jump in and
correct anything you see me doing wrong.

Bob Pendleton

On Thu, Jun 18, 2009 at 7:41 PM, Ryan C. Gordon wrote:

–ryan.


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


±----------------------------------------------------------

Wouldn’t it be safer / easier to do this in a separate library? SDL_Atomic or something? So as to not break the main build.

Pat

________________________________
From: bob@pendleton.com (Bob Pendleton)
To: A list for developers using the SDL library. (includes SDL-announce)
Sent: Friday, June 19, 2009 12:39:20 PM
Subject: Re: [SDL] Atomic int/ptr operations?

On Thu, Jun 18, 2009 at 10:35 PM, Sam Lantinga wrote:

Yeah, this all seems reasonable. Patch away! :)

OK… you asked for it :)

Seriously, I can build a new .h and add a .c file with function definitions. I’ll even redo the test program. I’ll sketch out the Windows and Mac code, but, I will not be able to compile the Mac code, and maybe not the Windows code. So, I’ll check in the GCC code and circulate the rest on the list for other people to compile and test. When I have that code nailed down, I’ll check it in.

Bob Pendleton

On Thu, Jun 18, 2009 at 7:24 AM, Bob Pendleton wrote:

Here is info for MacOS. Interesting that they only support (AFAICT) 32 bit and 64 bit operands. That means, that if we want to work across platforms we can’t support 8 and 16 bit operands.

Mac OS X: http://developer.apple.com/documentation/Cocoa/Conceptual/Multithreading/ThreadSafety/ThreadSafety.html#//apple_ref/doc/uid/10000057i-CH8-SW14

I’m starting to think that these operations should be implemented as out of line functions in SDL. Doing it that way avoids problems with forced inclusion of os specific header files in user code and it appears to get rid of memory barrier problems (I said appears…). All at the cost of a very small reduction in performance.
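
A rough sketch of the out-of-line approach, assuming GCC's `__sync` builtins for the implementation. The function names (`my_atomic_fetch_add32`, `my_atomic_exchange32`) are hypothetical, not the names SDL ended up with. The point is that the header part only needs plain integer types, so no OS header leaks into user code.

```c
#include <stdint.h>

/* --- what a public header might declare (hypothetical names) --- */
uint32_t my_atomic_fetch_add32(volatile uint32_t *ptr, uint32_t value);
uint32_t my_atomic_exchange32(volatile uint32_t *ptr, uint32_t value);

/* --- what the matching .c file might contain --- */
#if defined(__GNUC__)

uint32_t my_atomic_fetch_add32(volatile uint32_t *ptr, uint32_t value)
{
    /* Full-barrier atomic add; returns the value before the add. */
    return __sync_fetch_and_add(ptr, value);
}

uint32_t my_atomic_exchange32(volatile uint32_t *ptr, uint32_t value)
{
    /* __sync_lock_test_and_set is an atomic exchange, but GCC documents
     * it as an acquire barrier only, so add a full barrier first. */
    __sync_synchronize();
    return __sync_lock_test_and_set(ptr, value);
}

#else
/* A Windows or Mac OS X branch would include its OS header here, inside
 * the .c file only, and call the platform's own atomic functions. */
#error "This sketch only covers GCC-compatible compilers"
#endif
```

The cost is one function call per operation, which is the small performance reduction mentioned above.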

Opinions? Flames? what do y’all think about this?

Bob Pendleton

On Wed, Jun 17, 2009 at 7:50 PM, Bob Pendleton wrote:

On Fri, Jun 12, 2009 at 3:46 AM, Sam Lantinga wrote:

Bob, this broke building on Windows. There are a couple problems.

First, including windows.h indirectly in SDL.h breaks SDL_sysvideo.h, which can be fixed, and potentially breaks application code that doesn’t compile with windows.h included.

Second, at least on Visual C++ there are warnings about using the wrong parameter types with the interlocked functions. These are correct, and even though some care has been taken to make sure the parameters are the same size, it makes me nervous.

Spent most of this afternoon reading the MSDN and GCC sections on atomic operations. No, not even close to reading all the stuff on MSDN. GCC docs were much more concise.

One problem with the code we now have is that it is intended to be inlined whenever possible. This makes good sense for performance, not such good sense for portability. That is where we get the problem with #including “windows.h” in a header file. That is pretty much a no-no. Another problem is that the Windows functions are very picky about their argument sizes. As a fan of strong typing I can hardly complain about that. Yet another problem is all that assembly code. It made me nervous from the get-go. AFAICT it isn’t needed on any platform supported by Windows or by GCC.

GCC: http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html

MSDN: http://msdn.microsoft.com/en-us/library/ms686353(VS.85).aspx

I think we need to go back to basics on this part of 1.3.

It looks like the size is going to have to be part of the function names, so if we want to support atomic ops on 8,16,32, and 64 bit fields we’re going to have 4 versions of each function.
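
One way to get the four sized versions without writing each by hand is to generate them from a macro. This is only a sketch, assuming GCC's `__sync` builtins; the `sketch_` names are made up for illustration.

```c
#include <stdint.h>

/* Expands to a fetch-and-add and an exchange for one operand width.
 * The width becomes part of both the type and the function name. */
#define DEFINE_SIZED_ATOMICS(bits)                                        \
    uint##bits##_t sketch_fetch_add##bits(volatile uint##bits##_t *ptr,   \
                                          uint##bits##_t value)           \
    {                                                                     \
        return __sync_fetch_and_add(ptr, value);                          \
    }                                                                     \
    uint##bits##_t sketch_exchange##bits(volatile uint##bits##_t *ptr,    \
                                         uint##bits##_t value)            \
    {                                                                     \
        return __sync_lock_test_and_set(ptr, value);                      \
    }

DEFINE_SIZED_ATOMICS(8)
DEFINE_SIZED_ATOMICS(16)
DEFINE_SIZED_ATOMICS(32)
DEFINE_SIZED_ATOMICS(64)
```

Whether every width is actually available still depends on the target processor and compiler flags, so the 8, 16, and 64 bit variants may need a fallback on some platforms.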

We need to look at inlining. Looks to me like we can maybe get away with #define-ing the functions to map to the equivalent functions for each platform. This does assume that things like windows.h will get included “somewhere” before the #define-ed code is used. This may not be possible and the functions may have to be defined completely outside of the header files. I need input on this topic. The easiest way to implement this is out of line, but that does cost some performance.

It looks like there is no one atomic operation you can count on having on every processor type. At the least you get either an atomic test-and-set used to acquire a lock along with an atomic store used to release a lock, or an atomic exchange that can be used to implement the test-and-set/release operations. After that you get into the whole set of fetch-and-op/op-and-fetch operations where op is a subset of (increment, add, decrement, subtract, and, xor, nand…).

Given either exchange or test-and-set/release you can implement all the others with a small loss of performance.

It looks to me like a reasonable set would be 8,16,32, and 64 bit operand versions of:

test-and-set/release
exchange
fetch-and-increment
fetch-and-add
fetch-and-decrement
fetch-and-subtract
increment-and-fetch
add-and-fetch
decrement-and-fetch
subtract-and-fetch

The difference between fetch-and-add and add-and-fetch is like the difference between i++ and ++i.
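
To make the i++ versus ++i comparison concrete, here is a small sketch using the GCC builtins `__sync_fetch_and_add` and `__sync_add_and_fetch`. The wrapper names are hypothetical.

```c
#include <stdint.h>

/* Like i++: adds, then returns the value the variable had BEFORE the add. */
uint32_t demo_fetch_and_add(volatile uint32_t *ptr, uint32_t value)
{
    return __sync_fetch_and_add(ptr, value);
}

/* Like ++i: adds, then returns the value the variable has AFTER the add. */
uint32_t demo_add_and_fetch(volatile uint32_t *ptr, uint32_t value)
{
    return __sync_add_and_fetch(ptr, value);
}
```

Starting from 10, `demo_fetch_and_add(&n, 5)` returns 10 and leaves 15 in `n`; a following `demo_add_and_fetch(&n, 5)` returns 20 and leaves 20.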

test-and-set returns SDL_bool,
release returns void
all the rest return one of Uint8, Uint16, Uint32, Uint64

hmm, should probably add busy-wait that works with exchange or test-and-set.

Double hmm, don’t worry, the final names will not have “-” in them, they’ll be SDL_TestAndSet() or something like that.
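
A possible shape for test-and-set, release, and busy-wait, sketched with GCC builtins. The names are placeholders rather than final SDL names, and the return convention is an assumption: nonzero means the flag was already set, so the caller did not get the lock.

```c
#include <stdint.h>

typedef volatile uint32_t spin_flag;   /* 0 = free, 1 = held */

/* Atomically sets the flag to 1. Returns 1 if it was already set
 * (someone else holds it), 0 if the caller just acquired it. */
int flag_test_and_set(spin_flag *flag)
{
    return __sync_lock_test_and_set(flag, 1) != 0;
}

/* Atomically clears the flag, with release semantics. */
void flag_release(spin_flag *flag)
{
    __sync_lock_release(flag);
}

/* Spins until the flag has been acquired by the caller. */
void flag_busy_wait(spin_flag *flag)
{
    while (flag_test_and_set(flag)) {
        /* spin; a real implementation might yield or pause here */
    }
}
```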

Ok, that’s my take on it, let me know what’s wrong with it.

See ya,
–Sam

On Tue, Jun 9, 2009 at 11:03 AM, Bob Pendleton wrote:

Ok, I applied the patch. Got everything to compile. Got it all to
install correctly. And ran testatomic. It all works on my Linux box so
I checked it in. No guarantees about any other platform. So, update
from svn and test it on everything. If you find a problem post patches
and we’ll get them in.

On Tue, Jun 9, 2009 at 11:16 AM, Donny Viszneki<donny.viszneki at gmail.com> wrote:

On Tue, Jun 9, 2009 at 9:04 AM, Bob Pendleton wrote:

On Tue, Jun 9, 2009 at 1:21 AM, Donny Viszneki<donny.viszneki at gmail.com> wrote:

xchg xchg xchg xchg xchg xchg xchg xchg …

so what platforms will that patch operate on?

Don’t know yet! What platforms are you interested in? Why don’t you
take a look too.

I did take a look! That’s what “xchg xchg xchg xchg” was all about!

Ya’know… I had no clue what that was about…

Bob Pendleton

http://codebad.com/




-Sam Lantinga, Founder and President, Galaxy Gameworks LLC




±----------------------------------------------------------

Wouldn’t it be safer / easier to do this in a separate library? SDL_Atomic
or something? So as to not break the main build.

That is the plan. It is currently all in one .h file. That will change.

On Fri, Jun 19, 2009 at 12:54 PM, Patryk Bratkowski wrote:

Pat



±----------------------------------------------------------

I’m a little bit stuck on getting the GNU atomic builtin functions to
link correctly. The 32 bit versions of the function compile and link
just fine. The 64 bit versions compile, but when I try to link them I
get:

/usr/local/lib/libSDL.so: undefined reference to `__sync_lock_test_and_set_8'
/usr/local/lib/libSDL.so: undefined reference to `__sync_sub_and_fetch_8'
/usr/local/lib/libSDL.so: undefined reference to `__sync_fetch_and_sub_8'
/usr/local/lib/libSDL.so: undefined reference to `__sync_fetch_and_add_8'
/usr/local/lib/libSDL.so: undefined reference to `__sync_bool_compare_and_swap_8'
/usr/local/lib/libSDL.so: undefined reference to `__sync_add_and_fetch_8'

Researching the problem, I see several people having it, and
the way around it is to set the -march= compile flag to the
correct value. Trouble is, I’m not sure what the correct value is. I’m
on an x86_64 processor, but I am running the generic Ubuntu kernel
which is 32 bit. uname --all gives me:

Linux voyager 2.6.28-13-generic #44-Ubuntu SMP Tue Jun 2 07:57:31 UTC
2009 i686 GNU/Linux

I’m working my way through the possible march values trying to see what works.

Any help?

Bob Pendleton
±----------------------------------------------------------

http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html

I’d say try one of the 64-bit ones. Your compiler will need to support
that architecture, though.

On Wed, Jun 24, 2009 at 6:45 PM, Bob Pendleton wrote:

I’m working my way through the possible march values trying to see what works.


http://codebad.com/

I’m working my way through the possible march values trying to see what works.

http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html

I’d say try one of the 64-bit ones. Your compiler will need to support
that architecture, though.

Hey, thanks Donny. I tried building with -march=pentium4 and worked my
way down to i486. The 64 bit atomic operations compile and link
starting at -march=pentium and up from there. The default architecture
is i386. All of SDL builds for i386. Most Linux distros distribute
code built for i386 and x86_64 (-march=k8 or athlon64). And, a lot of
people with 64 bit machines are running 32 bit OSes. Interesting
situation.
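
For anyone who wants to reproduce this, here is a minimal sketch of a 64 bit atomic add plus a compile-time check. It assumes GCC 4.3 or later, which predefines `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8` when the selected architecture can do an 8-byte compare-and-swap inline. The function names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 if the compiler reports inline 8-byte compare-and-swap
 * support for the current -march setting, 0 otherwise. */
int has_native_atomic64(void)
{
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
    return 1;
#else
    return 0;
#endif
}

/* On a 32-bit x86 build with -march=i386, GCC turns this into a call to
 * __sync_fetch_and_add_8, which is the undefined reference reported in
 * this thread. With -march=pentium or newer it is expanded inline. */
uint64_t counter_fetch_add64(volatile uint64_t *counter, uint64_t amount)
{
    return __sync_fetch_and_add(counter, amount);
}
```

A build could use the macro to choose between the native builtin and a lock-based fallback instead of failing at link time.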

Right now, it seems that to get 64 bit atomic operations we would need
to build for at least pentium, not for i386. It is also possible there
is a well hidden library that provides the 64 bit atomic operations.
It seems if you go far enough down the list of processor
architectures you can get 128 bit atomic ops :)

What I have learned is that 32 bit atomic ops are supported on Mac
OSX, Windows, and Linux. And, they are supported by every Intel
processor since the i386. As far as I can tell right now they are the
only atomic ops that can be safely supported on the three major OSes
and all Intel processors. I don’t know a lot about the rest of the
OSes or other processor architectures, and SDL supports a bunch of them.

There are several options open at this point. We can just drop 64 bit
atomic ops or we can write 64 bit ops based on the 32 bit ops. (If we
do that we can implement 8 and 16 bit ops also.) Currently it seems to
me the way you do that is to use one internal lock variable that uses
a busy wait and an atomic clear to control access to the code for
the other atomic operations. It would be slower than native
instructions, but it would work everywhere that has either an atomic
test-and-set or xchg along with an atomic clear. (Both of which can be
implemented with an atomic xchg.)
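
The lock-based emulation described above could look roughly like this. It is a sketch under the assumption that a 32 bit atomic exchange and an atomic clear are available (here via GCC's `__sync_lock_test_and_set` and `__sync_lock_release`). All names are hypothetical.

```c
#include <stdint.h>

/* One internal lock shared by every emulated operation. */
static volatile uint32_t internal_lock = 0;

static void enter(void)
{
    /* Busy wait: exchange in a 1 until the previous value was 0. */
    while (__sync_lock_test_and_set(&internal_lock, 1)) {
        /* spin */
    }
}

static void leave(void)
{
    __sync_lock_release(&internal_lock);   /* atomic clear */
}

/* 64 bit fetch-and-add built only on the 32 bit lock. */
uint64_t emulated_fetch_add64(volatile uint64_t *ptr, uint64_t value)
{
    uint64_t old;
    enter();
    old = *ptr;
    *ptr = old + value;
    leave();
    return old;
}

/* 8 bit add-and-fetch built the same way. */
uint8_t emulated_add_fetch8(volatile uint8_t *ptr, uint8_t value)
{
    uint8_t result;
    enter();
    result = (uint8_t)(*ptr + value);
    *ptr = result;
    leave();
    return result;
}
```

These functions are only atomic with respect to each other: every access to the variable has to go through them, and all callers contend on the same lock, which is where the slowdown compared to native instructions comes from.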

By far the simplest way to do all this is to just use the 32 bit
operations. OTOH, a more software-based approach will be implementable
across a wider range of processors.

I’m open for either, I have no ego in this… Let me know what y’all think.

Bob Pendleton

On Wed, Jun 24, 2009 at 8:33 PM, Donny Viszneki <donny.viszneki at gmail.com> wrote:

On Wed, Jun 24, 2009 at 6:45 PM, Bob Pendleton wrote:


http://codebad.com/




±----------------------------------------------------------