SDL: Adding Atomic Operations to SDL 1.3, part #3 Redesign

OK, part 3 is finally up. at http://www.thegrumpyprogrammer.com/node/16

In this article I go through 4 different common uses of atomic
operations and compare what they require to what I provided. I also
address the race condition in the current emulation code. At this
point I am going to take one more shot at this problem.

Bob Pendleton–
±----------------------------------------------------------

Looking forward to it, Bob! :)On Tue, Sep 1, 2009 at 3:40 PM, Bob Pendleton wrote:

OK, part 3 is finally up. at http://www.thegrumpyprogrammer.com/node/16

In this article I go through 4 different common uses of atomic
operations and compare what they require to what I provided. I also
address the race condition in the current emulation code. At this
point I am going to take one more shot at this problem.

Bob Pendleton


±----------------------------------------------------------


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


-Sam Lantinga, Founder and President, Galaxy Gameworks LLC

very nice article, thanks.On Tue, Sep 1, 2009 at 11:40 PM, Bob Pendleton wrote:

OK, part 3 is finally up. at http://www.thegrumpyprogrammer.com/node/16

In this article I go through 4 different common uses of atomic
operations and compare what they require to what I provided. I also
address the race condition in the current emulation code. At this
point I am going to take one more shot at this problem.

Bob Pendleton

hi,

here’s something that the linux kernel added to emulate 64bit atomic
ops using spin locks.

Thought you might be interested in it:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=09d4e0edd4614e787393acc582ac701c6ec3565b

cheers,On Sat, Sep 5, 2009 at 5:12 PM, Ren? Dudfield<@Rene_Dudfield> wrote:

On Tue, Sep 1, 2009 at 11:40 PM, Bob Pendleton wrote:

OK, part 3 is finally up. at http://www.thegrumpyprogrammer.com/node/16

In this article I go through 4 different common uses of atomic
operations and compare what they require to what I provided. I also
address the race condition in the current emulation code. At this
point I am going to take one more shot at this problem.

Bob Pendleton

very nice article, thanks.

hi,

here’s something that the linux kernel added to emulate 64bit atomic
ops using spin locks.

Thought you might be interested in it:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=09d4e0edd4614e787393acc582ac701c6ec3565b

Fascinating! Took me a minute to see how it would work. But, hashes
are amazing things.

You are right, I am very interested in this idea.

Bob PendletonOn Thu, Sep 10, 2009 at 11:07 AM, Ren? Dudfield wrote:

cheers,

On Sat, Sep 5, 2009 at 5:12 PM, Ren? Dudfield wrote:

On Tue, Sep 1, 2009 at 11:40 PM, Bob Pendleton<@Bob_Pendleton> wrote:

OK, part 3 is finally up. at http://www.thegrumpyprogrammer.com/node/16

In this article I go through 4 different common uses of atomic
operations and compare what they require to what I provided. I also
address the race condition in the current emulation code. At this
point I am going to take one more shot at this problem.

Bob Pendleton

very nice article, thanks.


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


±----------------------------------------------------------

It seems like the only atomic operations you really need to implement are:

AtomicIncrement
AtomicDecremen
AtomicExchange
AtomicCompareExchange
TestAndSet

I can’t really see any use in anything else for most applications, anyway. I probably should have mentioned this when you first began seeking help with SDL_atomic, but I had figured you had a solid plan for it at the time.

Yeah, Bob, I’m curious how often the other variants would actually be used?On Wed, Nov 18, 2009 at 7:55 PM, nfries88 wrote:

It seems like the only atomic operations you really need to implement are:

AtomicIncrement
AtomicDecremen
AtomicExchange
AtomicCompareExchange
TestAndSet

I can’t really see any use in anything else for most applications, anyway. I
probably should have mentioned this when you first began seeking help with
SDL_atomic, but I had figured you had a solid plan for it at the time.


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


-Sam Lantinga, Founder and President, Galaxy Gameworks LLC

Yeah, Bob, I’m curious how often the other variants would actually be used?

That is a fair question. And, it has been long enough since I looked
at it that I can look at my code with fresh (or at least forgetful)
eyes.

Here is the list that nfries88 suggested:

AtomicIncrement
AtomicDecrement
AtomicExchange
AtomicCompareExchange
TestAndSet

The first thing you have to do when comparing this list is to kick out
the exchange operations because I found they were not widely
supported. Not that many architecture/OS combos seem to have an atomic
exchange. If you have atomicCompareExchange you don’t need testAndSet.
Because the exchange operations seem so rare I didn’t feel good about
including them in the list. Without an exchange you have to add an
atomic clear to clear values set by TestAndSet. That gives us this
list:

AtomicIncrement
AtomicDecrement
AtomicClear
TestAndSet

That is actually pretty close to what I was originally going to put in
SDL_atomic.h. :slight_smile:

Here is the list of the basic ops from SDL_atomic.c. classes of systems

SDL_AtomicTestThenSet
SDL_AtomicClear

SDL_AtomicFetchThenIncrement
SDL_AtomicFetchThenDecrement

SDL_AtomicIncrementThenFetch
SDL_AtomicDecrementThenFetch

SDL_AtomicFetchThenAdd
SDL_AtomicFetchThenSubtract

SDL_AtomicAddThenFetch
SDL_AtomicSubtractThenFetch

SDL_AtomicLock
SDL_AtomicUnlock

There are two of each of these, one ending in 32 and one ending in 64
because I wanted to be able to do atomic operations on 32 and 64 bit
values. Without 64 bit support you can’t do atomic operations on
pointers or reference counts. I expect both sets to be used for a long
time. I’m focused on 64 bit machines, but even I want some code to run
on both 32 and 64 bit architectures. The original lists included
operations on 8 and 16 bit values. But, after doing a lot of research
I found that there aren’t many architecture/OS combos that support all
those. Even the 64 bit ops are not available on all 32 bit
architectures.

Ok, so the difference between 32 and 64 bit architectures explains
half the length of my list.

BTW, I considered using size_t instead of explicit 32 and 64 bit
values. Doing that would get rid of half of my list. AFAIK SDL doesn’t
have a type equivalent to size_t, but it does use size_t. What do
folks think about making the atomic operations take a size_t? Is there
any OS where size_t is not 64 bits when a pointer is 64 bits?

Nfries88 didn’t say if his idea of atomic increment and decrement
operations include a fetch as part of the atomic operations. I assume
he did because those operations are pretty much useless with out an
included fetch. You can use an atomic increment and then fetch the
value, but another N atomic increments may have been executed between
the time you incremented the value and the time you get to fetch the
value you changed. N threads can increment the counter and then all N
can read the same value. That is why ALL atomic operations (except
clear, and even some versions of clear) include a fetch as part of the
operation.

Why both pre and post fetch versions? I almost didn’t do that. That
was the hardest decision I had to make. I had to provide a version of
the operations that also fetched a value. I had the choice to provide
either a pre-fetch version, a post-fetch version, or both. Most of the
architecture/OS combos I looked at provide both pre and post fetch
versions. Then there is the fact that C and all its descendents
provide post-fetch (++g) and pre-fetch (g++) versions of its increment
and decrement operators. I figured it was a good idea to allow any C,
C++, Java… code that depends on the difference between pre and post
fetch to be converted to atomic operations. That is why there are pre
and post fetch versions of all the arithmetic operations. It would not
break my heart to get rid of the pre-fetch versions of the arithmetic
operations.

Having pre and post versions of the arithmetic operations explains why
my list has twice as many versions of increment and decrement as
nFries’s list.

That brings us to the add and subtract operations. When working on
pointers, you rarely want to change them by just one. You usually want
to change them by the SIZEOF(some type). That means that if you want
to do atomic operations on pointers you want add and subtract, not
just increment and decrement. (Not that C automatically increments and
decrements by the SIZEOF the type of the pointer.) Increment and
decrement are fine for counters, but not for pointers. Ok, so why have
both addition subtraction and increment, decrement? Isn’t atomicAdd(1)
the same as increment? Yep, it is and so is atomicSubtract(-1). But,
it seems that increment and decrement are some times faster (no
argument to mess with) than add and subtract. Also, it seems that more
systems directly implement atomic increment and decrement than
directly implement atomic add and subtract. If add and subtract are
more likely to be slow and/or emulated, then you don’t want them to be
the only atomic arithmetic operations you provide. That is why I have
add and subtract as well as increment and decrement. Add and subtract
were added to support atomic operations on pointers.

I hope that explains why there are both add and subtract as well as
increment and decrement and why there are pre-fetch and post-fetch
versions of each of the operations. If people don’t like, or can’t see
the value of, post fetch versions of the ops, then I can take them
out. I just don’t think I know enough to say they are of no use.

I also included a spin lock. I did that because so many of the
algorithms I found were based on spin locks and I wanted to allow an
implementation of the library to provide the most efficient
implementation of a spin lock that was possible on that platform.
Systems have have and atomic exchange can use it to implement the spin
lock and testAndSet. Otherwise programmers all over the world would be
implementing their own spin locks using testAndSet that is implemented
using exchange. Thinking about wasting that many cycles and person
years hurt. Also, by adding spin lock to the library I was able to
keep the size of a lock variable set at 32 bits so there was no
possibility of forcing the world to use 64 bits for a 1 bit lock.

My final list looks like:

AtomicIncrement
AtomicAdd

AtomicDecrement
AtomicSubtract

AtomicClear
TestAndSet

AtomicLock
AtomicUnlock

Currently each of the arithmetic operations comes in a pre and post
fetch version and in 32 and 64 bit versions for a total of 4 of each
operation. It looks to me like the two lists aren’t that different.
The key differences are the result of:

  1. the general lack of exchange operations, my decision to include
    spin locks in the library, and my desire to do atomic operations on
    pointers.

Bob PendletonOn Wed, Nov 18, 2009 at 11:38 PM, Sam Lantinga wrote:

On Wed, Nov 18, 2009 at 7:55 PM, nfries88 wrote:

It seems like the only atomic operations you really need to implement are:

AtomicIncrement
AtomicDecremen
AtomicExchange
AtomicCompareExchange
TestAndSet

I can’t really see any use in anything else for most applications, anyway. I
probably should have mentioned this when you first began seeking help with
SDL_atomic, but I had figured you had a solid plan for it at the time.


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


? ? ? ?-Sam Lantinga, Founder and President, Galaxy Gameworks LLC


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


±----------------------------------------------------------

Damn, I hit the wrong key write at the end there. The last paragraph
was supposed to read:

Currently each of the arithmetic operations comes in a pre and post
fetch version and in 32 and 64 bit versions for a total of 4 of each
operation. It looks to me like the two lists aren’t that different.
The key differences are the result of:

  1. the general lack of exchange operations,
  2. my decision to include spin locks in the library,
  3. my desire to do atomic operations on pointers,
  4. and support for 64 bit applications.

Yeah, Bob, I’m curious how often the other variants would actually be used?

That is a fair question. And, it has been long enough since I looked
at it that I can look at my code with fresh (or at least forgetful)
eyes.

Here is the list that nfries88 suggested:

AtomicIncrement
AtomicDecrement
AtomicExchange
AtomicCompareExchange
TestAndSet

The first thing you have to do when comparing this list is to kick out
the exchange operations because I found they were not widely
supported. Not that many architecture/OS combos seem to have an atomic
exchange. If you have atomicCompareExchange you don’t need testAndSet.
?Because the exchange operations seem so rare I didn’t feel good about
including them in the list. Without an exchange you have to add an
atomic clear to clear values set by TestAndSet. That gives us this
list:

AtomicIncrement
AtomicDecrement
AtomicClear
TestAndSet

That is actually pretty close to what I was originally going to put in
SDL_atomic.h. :slight_smile:

Here is the list of the basic ops from SDL_atomic.c. classes of systems

SDL_AtomicTestThenSet
SDL_AtomicClear

SDL_AtomicFetchThenIncrement
SDL_AtomicFetchThenDecrement

SDL_AtomicIncrementThenFetch
SDL_AtomicDecrementThenFetch

SDL_AtomicFetchThenAdd
SDL_AtomicFetchThenSubtract

SDL_AtomicAddThenFetch
SDL_AtomicSubtractThenFetch

SDL_AtomicLock
SDL_AtomicUnlock

There are two of each of these, one ending in 32 and one ending in 64
because I wanted to be able to do atomic operations on 32 and 64 bit
values. Without 64 bit support you can’t do atomic operations on
pointers or reference counts. I expect both sets to be used for a long
time. I’m focused on 64 bit machines, but even I want some code to run
on both 32 and 64 bit architectures. The original lists included
operations on 8 and 16 bit values. But, after doing a lot of research
I found that there aren’t many architecture/OS combos that support all
those. Even the 64 bit ops are not available on all 32 bit
architectures.

Ok, so the difference between 32 and 64 bit architectures explains
half the length of my list.

BTW, I considered using size_t instead of explicit 32 and 64 bit
values. Doing that would get rid of half of my list. AFAIK SDL doesn’t
have a type equivalent to size_t, but it does use size_t. What do
folks think about making the atomic operations take a size_t? Is there
any OS where size_t is not 64 bits when a pointer is 64 bits?

Nfries88 didn’t say if his idea of atomic increment and decrement
operations include a fetch as part of the atomic operations. I assume
he did because those operations are pretty much useless with out an
included fetch. You can use an atomic increment and then fetch the
value, but another N atomic increments may have been executed between
the time you incremented the value and the time you get to fetch the
value you changed. N threads can increment the counter and then all N
can read the same value. That is why ALL atomic operations (except
clear, and even some versions of clear) include a fetch as part of the
operation.

Why both pre and post fetch versions? I almost didn’t do that. That
was the hardest decision I had to make. I had to provide a version of
the operations that also fetched a value. I had the choice to provide
either a pre-fetch version, a post-fetch version, or both. Most of the
architecture/OS combos I looked at provide both pre and post fetch
versions. Then there is the fact that C and all its descendents
provide post-fetch (++g) and pre-fetch (g++) versions of its increment
and decrement operators. I figured it was a good idea to allow any C,
C++, Java… code that depends on the difference between pre and post
fetch to be converted to atomic operations. That is why there are pre
and post fetch versions of all the arithmetic operations. It would not
break my heart to get rid of the pre-fetch versions of the arithmetic
operations.

Having pre and post versions of the arithmetic operations explains why
my list has twice as many versions of increment and decrement as
nFries’s list.

That brings us to the add and subtract operations. When working on
pointers, you rarely want to change them by just one. You usually want
to change them by the SIZEOF(some type). That means that if you want
to do atomic operations on pointers you want add and subtract, not
just increment and decrement. (Not that C automatically increments and
decrements by the SIZEOF the type of the pointer.) Increment and
decrement are fine for counters, but not for pointers. Ok, so why have
both addition subtraction and increment, decrement? Isn’t atomicAdd(1)
the same as increment? Yep, it is and so is atomicSubtract(-1). But,
it seems that increment and decrement are some times faster (no
argument to mess with) than add and subtract. Also, it seems that more
systems directly implement atomic ?increment and decrement than
directly implement atomic add and subtract. If add and subtract are
more likely to be slow and/or emulated, then you don’t want them to be
the only atomic arithmetic operations you provide. That is why I have
add and subtract as well as increment and decrement. Add and subtract
were added to support atomic operations on pointers.

I hope that explains why there are both add and subtract as well as
increment and decrement and why there are pre-fetch and post-fetch
versions of each of the operations. If people don’t like, or can’t see
the value of, post fetch versions of the ops, then I can take them
out. I just don’t think I know enough to say they are of no use.

I also included a spin lock. I did that because so many of the
algorithms I found were based on spin locks and I wanted to allow an
implementation of the library to provide the most efficient
implementation of a spin lock that was possible on that platform.
Systems have have and atomic exchange can use it to implement the spin
lock and testAndSet. Otherwise programmers all over the world would be
implementing their own spin locks using testAndSet that is implemented
using exchange. Thinking about wasting that many cycles and person
years hurt. Also, by adding spin lock to the library I was able to
keep the size of a lock variable set at 32 bits so there was no
possibility of forcing the world to use 64 bits for a 1 bit lock.

My final list looks like:

AtomicIncrement
AtomicAdd

AtomicDecrement
AtomicSubtract

AtomicClear
TestAndSet

AtomicLock
AtomicUnlock

Currently each of the arithmetic operations comes in a pre and post
fetch version and in 32 and 64 bit versions for a total of 4 of each
operation. It looks to me like the two lists aren’t that different.
The key differences are the result of:

  1. the general lack of exchange operations,
  2. my decision to include spin locks in the library,
  3. my desire to do atomic operations on pointers,
  4. and support for 64 bit applications.On Fri, Nov 20, 2009 at 6:08 PM, Bob Pendleton <@Bob_Pendleton> wrote:

On Wed, Nov 18, 2009 at 11:38 PM, Sam Lantinga wrote:

Bob Pendleton

On Wed, Nov 18, 2009 at 7:55 PM, nfries88 wrote:

It seems like the only atomic operations you really need to implement are:

AtomicIncrement
AtomicDecremen
AtomicExchange
AtomicCompareExchange
TestAndSet

I can’t really see any use in anything else for most applications, anyway. I
probably should have mentioned this when you first began seeking help with
SDL_atomic, but I had figured you had a solid plan for it at the time.


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


? ? ? ?-Sam Lantinga, Founder and President, Galaxy Gameworks LLC


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


±----------------------------------------------------------


±----------------------------------------------------------

Bob wrote:

Yeah, Bob, I’m curious how often the other variants would actually be used?

That is a fair question. And, it has been long enough since I looked
at it that I can look at my code with fresh (or at least forgetful)
eyes.

Here is the list that nfries88 suggested:

AtomicIncrement
AtomicDecrement
AtomicExchange
AtomicCompareExchange
TestAndSet

The first thing you have to do when comparing this list is to kick out
the exchange operations because I found they were not widely
supported. Not that many architecture/OS combos seem to have an atomic
exchange. If you have atomicCompareExchange you don’t need testAndSet.
Because the exchange operations seem so rare I didn’t feel good about
including them in the list. Without an exchange you have to add an
atomic clear to clear values set by TestAndSet. That gives us this
list:

AtomicIncrement
AtomicDecrement
AtomicClear
TestAndSet

That is actually pretty close to what I was originally going to put in
SDL_atomic.h. :slight_smile:

Fair enough. I wasn’t aware that support for atomic exchange was a rarity, but then again I mostly develop for Windows so what do I know about other systems?

Here is the list of the basic ops from SDL_atomic.c. classes of systems

SDL_AtomicTestThenSet
SDL_AtomicClear

SDL_AtomicFetchThenIncrement
SDL_AtomicFetchThenDecrement

SDL_AtomicIncrementThenFetch
SDL_AtomicDecrementThenFetch

SDL_AtomicFetchThenAdd
SDL_AtomicFetchThenSubtract

SDL_AtomicAddThenFetch
SDL_AtomicSubtractThenFetch

SDL_AtomicLock
SDL_AtomicUnlock

There are two of each of these, one ending in 32 and one ending in 64
because I wanted to be able to do atomic operations on 32 and 64 bit
values. Without 64 bit support you can’t do atomic operations on
pointers or reference counts. I expect both sets to be used for a long
time. I’m focused on 64 bit machines, but even I want some code to run
on both 32 and 64 bit architectures. The original lists included
operations on 8 and 16 bit values. But, after doing a lot of research
I found that there aren’t many architecture/OS combos that support all
those. Even the 64 bit ops are not available on all 32 bit
architectures.

This seems like a pretty good list. 64-bit ops can be simulated on 32-bit architectures not supporting them easily enough.
Why wouldn’t systems have all of those basic ops, though?

BTW, I considered using size_t instead of explicit 32 and 64 bit
values. Doing that would get rid of half of my list. AFAIK SDL doesn’t
have a type equivalent to size_t, but it does use size_t. What do
folks think about making the atomic operations take a size_t? Is there
any OS where size_t is not 64 bits when a pointer is 64 bits?

The C standard requires that size_t be the size of a pointer. That said, I’m sure there’s platforms where it’s always defined as a 32-bit integer…
I would not be opposed to that. I would personally prefer explicit 32 and 64 bit operations, though.

Nfries88 didn’t say if his idea of atomic increment and decrement
operations include a fetch as part of the atomic operations. I assume
he did because those operations are pretty much useless with out an
included fetch.
[…]
Why both pre and post fetch versions? I almost didn’t do that.

Obviously, a fetch should be included. I didn’t mention it because, like you said, they’d be pretty much useless without one.
I’ve also never seen system-defined functions for these that didn’t include a fetch (although, as I said, my experience is mostly with Windows).
In fact, the Windows SDK doesn’t provide both pre-fetch and post-fetch versions of their operations, and MSDN doesn’t state which one that the operations use (so I’m not even certain).
I also can’t see many developers missing pre-fetch too much. But, I can’t say it’s a bad idea to include them either.

That brings us to the add and subtract operations. When working on
pointers, you rarely want to change them by just one. You usually want
to change them by the SIZEOF(some type). That means that if you want
to do atomic operations on pointers you want add and subtract, not
just increment and decrement. (Not that C automatically increments and
decrements by the SIZEOF the type of the pointer.) Increment and
decrement are fine for counters, but not for pointers. Ok, so why have
both addition subtraction and increment, decrement? Isn’t atomicAdd(1)
the same as increment? Yep, it is and so is atomicSubtract(-1). But,
it seems that increment and decrement are some times faster (no
argument to mess with) than add and subtract. Also, it seems that more
systems directly implement atomic increment and decrement than
directly implement atomic add and subtract. If add and subtract are
more likely to be slow and/or emulated, then you don’t want them to be
the only atomic arithmetic operations you provide. That is why I have
add and subtract as well as increment and decrement. Add and subtract
were added to support atomic operations on pointers.

I think that’s a good call. I’ve never actually done that type of thing before, so it hadn’t come to mind when I made my list.
I could see its utility, though.

I also included a spin lock. I did that because so many of the
algorithms I found were based on spin locks and I wanted to allow an
implementation of the library to provide the most efficient
implementation of a spin lock that was possible on that platform.
Systems have have and atomic exchange can use it to implement the spin
lock and testAndSet. Otherwise programmers all over the world would be
implementing their own spin locks using testAndSet that is implemented
using exchange. Thinking about wasting that many cycles and person
years hurt. Also, by adding spin lock to the library I was able to
keep the size of a lock variable set at 32 bits so there was no
possibility of forcing the world to use 64 bits for a 1 bit lock.

This was a bit unnecessary, and a big reason why I included CMPXCHG in the list.
However, I can’t argue your reasons for adding it (I didn’t know that systems implemented testAndSet over CMPXCHG).

Currently each of the arithmetic operations comes in a pre and post
fetch version and in 32 and 64 bit versions for a total of 4 of each
operation. It looks to me like the two lists aren’t that different.
The key differences are the result of:

  1. the general lack of exchange operations,
  2. my decision to include spin locks in the library,
  3. my desire to do atomic operations on pointers,
  4. and support for 64 bit applications.

It seems like a good list. I do still think prefetch operations would not be missed, though.
Sorry I couldn’t be more help with this, and made you waste all that time typing.> On Wed, Nov 18, 2009 at 11:38 PM, Sam Lantinga wrote:

Bob wrote:

Quote:

Yeah, Bob, I’m curious how often the other variants would actually be used?

That is a fair question. And, it has been long enough since I looked
at it that I can look at my code with fresh (or at least forgetful)
eyes.

Here is the list that nfries88 suggested:

AtomicIncrement
AtomicDecrement
AtomicExchange
AtomicCompareExchange
TestAndSet

The first thing you have to do when comparing this list is to kick out
the exchange operations because I found they were not widely
supported. Not that many architecture/OS combos seem to have an atomic
exchange. If you have atomicCompareExchange you don’t need testAndSet.
Because the exchange operations seem so rare I didn’t feel good about
including them in the list. Without an exchange you have to add an
atomic clear to clear values set by TestAndSet. That gives us this
list:

AtomicIncrement
AtomicDecrement
AtomicClear
TestAndSet

That is actually pretty close to what I was originally going to put in
SDL_atomic.h. [image: Smile]

Fair enough. I wasn’t aware that support for atomic exchange was a rarity,
but then again I mostly develop for Windows so what do I know about other
systems?

It came as a surprise to me too. It seems like the great basic atomic
operation. But, it was missing in a lot of places I looked. One reference I
found said that TestAndSet was found on most architectures, Atomic Exchange
was found on x86, and the swap and compare operations were from the Motorola
68000.

I first saw testAndSet on a UNIVAC 1108a built in the late '60s. The
difference between the 1108 and the 1108a was that the “a” model supported
multiple CPUs and multiple IOUs and added the testAndSet instruction to
support building mutexes. I first ran into atomic exchange on the 8086 which
came along at least 10 years later. The 8086 added a “lock” prefix byte that
turned a lot of instructions into atomic operations. But that was at least
10 years later.

You know, if I had to guess, I would guess that Intel had a patent on
something to do with atomic exchange so everyone else either used testAndSet
or invented new ones like atomic compare and exchange. OTOH, it could just
be that no one else wanted to spend that much silicon on adding more than
the minimum number of useful instructions. RISC has it good points.

Anyone really know?

Quote:

Here is the list of the basic ops from SDL_atomic.c. classes of systems

SDL_AtomicTestThenSet
SDL_AtomicClear

SDL_AtomicFetchThenIncrement
SDL_AtomicFetchThenDecrement

SDL_AtomicIncrementThenFetch
SDL_AtomicDecrementThenFetch

SDL_AtomicFetchThenAdd
SDL_AtomicFetchThenSubtract

SDL_AtomicAddThenFetch
SDL_AtomicSubtractThenFetch

SDL_AtomicLock
SDL_AtomicUnlock

There are two of each of these, one ending in 32 and one ending in 64
because I wanted to be able to do atomic operations on 32 and 64 bit
values. Without 64 bit support you can’t do atomic operations on
pointers or reference counts. I expect both sets to be used for a long
time. I’m focused on 64 bit machines, but even I want some code to run
on both 32 and 64 bit architectures. The original lists included
operations on 8 and 16 bit values. But, after doing a lot of research
I found that there aren’t many architecture/OS combos that support all
those. Even the 64 bit ops are not available on all 32 bit
architectures.

This seems like a pretty good list. 64-bit ops can be simulated on 32-bit
architectures not supporting them easily enough.
Why wouldn’t systems have all of those basic ops, though?

Economics and architecture. On word oriented architectures it is costly to
add hardware that allows byte oriented operations. That is why a lot of
machines don’t support atomic operations on 8 and 16 bit fields.

Then look at what you have to do to implement these instructions. Take a
look at testandset. The original form is very simple. It just grabs the
contents of and address, writes a 1 to it, and returns the original value to
the CPU. It is so simple you could build it into the memory hardware as a
variation of the read operation. On the old core memory machines testAndSet
originated on, reading memory erased what you read so all the memory
controllers would do a read followed by writing back the original value.
Making the write set the memory to 1 would be pretty simple. OTOH, exchange,
increment, swap and compare, and all the others require that the contents of
memory reach the CPU, but processed, and the be written back to memory. That
means the whole memory system of the machine must be locked for the use of a
single CPU during a full read/modify/write sequence. RISC machines didn’t
even have instructions like increment that require a single instruction to
perform a read/modify/write. They had load instructions, store instructions,
and modified values in registers. Pretty much everything out there that
isn’t an x86 architecture is derived for a RISC architecture.

Quote:

BTW, I considered using size_t instead of explicit 32 and 64 bit
values. Doing that would get rid of half of my list. AFAIK SDL doesn’t
have a type equivalent to size_t, but it does use size_t. What do
folks think about making the atomic operations take a size_t? Is there
any OS where size_t is not 64 bits when a pointer is 64 bits?

The C standard requires that size_t be the size of a pointer. That said,
I’m sure there’s platforms where it’s always defined as a 32-bit integer…
I would not be opposed to that. I would personally prefer explicit 32 and
64 bit operations, though.

Yeah, me too.

Quote:

Nfries88 didn’t say if his idea of atomic increment and decrement
operations include a fetch as part of the atomic operations. I assume
he did because those operations are pretty much useless with out an
included fetch.
[…]

Why both pre and post fetch versions? I almost didn’t do that.

Obviously, a fetch should be included. I didn’t mention it because, like
you said, they’d be pretty much useless without one.
I’ve also never seen system-defined functions for these that didn’t include
a fetch (although, as I said, my experience is mostly with Windows).
In fact, the Windows SDK doesn’t provide both pre-fetch and post-fetch
versions of their operations, and MSDN doesn’t state which one that the
operations use (so I’m not even certain).

They don’t provide pre-fetch versions of increment and decrement. But, MS
does provide a pre-fetch version of addition so you can implement them if
you want to.

I also can’t see many developers missing pre-fetch too much. But, I can’t
say it’s a bad idea to include them either.

OTOH, QNX only provides pre-fetch operations. And it does implement
atomic operations like add that do not return a value. That blew my poor
little mind until I realized that QNX runs on many different
architectures. I suspect they know more about this than I do.

I also love the way GCC handles atomic operations. They have taken the Intel
Itanium atomic ops and defined a function name for each of them. They made
those function into builtins in GCC. They then warn you that not all of
these operations will be available on all machines. If they are not
available on your target machine they will generate a function call to an
unimplemented function… That way you can do ahead and write a replacement
if you want to. But, they only provide atomic ops that can be mapped
directly to the hardware. Interesting approach…

Quote:

That brings us to the add and subtract operations. When working on
pointers, you rarely want to change them by just one. You usually want
to change them by the SIZEOF(some type). That means that if you want
to do atomic operations on pointers you want add and subtract, not
just increment and decrement. (Not that C automatically increments and
decrements by the SIZEOF the type of the pointer.) Increment and
decrement are fine for counters, but not for pointers. Ok, so why have
both addition subtraction and increment, decrement? Isn’t atomicAdd(1)
the same as increment? Yep, it is and so is atomicSubtract(-1). But,
it seems that increment and decrement are some times faster (no
argument to mess with) than add and subtract. Also, it seems that more
systems directly implement atomic increment and decrement than
directly implement atomic add and subtract. If add and subtract are
more likely to be slow and/or emulated, then you don’t want them to be
the only atomic arithmetic operations you provide. That is why I have
add and subtract as well as increment and decrement. Add and subtract
were added to support atomic operations on pointers.

I think that’s a good call. I’ve never actually done that type of thing
before, so it hadn’t come to mind when I made my list.
I could see its utility, though.

Quote:

I also included a spin lock. I did that because so many of the
algorithms I found were based on spin locks and I wanted to allow an
implementation of the library to provide the most efficient
implementation of a spin lock that was possible on that platform.
Systems have have and atomic exchange can use it to implement the spin
lock and testAndSet. Otherwise programmers all over the world would be
implementing their own spin locks using testAndSet that is implemented
using exchange. Thinking about wasting that many cycles and person
years hurt. Also, by adding spin lock to the library I was able to
keep the size of a lock variable set at 32 bits so there was no
possibility of forcing the world to use 64 bits for a 1 bit lock.

This was a bit unnecessary, and a big reason why I included CMPXCHG in the
list.
However, I can’t argue your reasons for adding it (I didn’t know that
systems implemented testAndSet over CMPXCHG).

Quote:

Currently each of the arithmetic operations comes in a pre and post
fetch version and in 32 and 64 bit versions for a total of 4 of each
operation. It looks to me like the two lists aren’t that different.
The key differences are the result of:

  1. the general lack of exchange operations,
  2. my decision to include spin locks in the library,
  3. my desire to do atomic operations on pointers,
  4. and support for 64 bit applications.

It seems like a good list. I do still think prefetch operations would not
be missed, though.
Sorry I couldn’t be more help with this, and made you waste all that time
typing.

I do not mind. I appreciate the fact that you took the time to look at what
I did and thought about it. Hey, once you brought up the question even Sam
realized he wanted a better explanation of what I did. Now I have it.

Not to mention you forced me to point out places in the design that I have
questions about.

Anybody else want to comment or make a suggestion? Please do it. Reducing
the size of the API is a good goal.

Bob PendletonOn Sat, Nov 21, 2009 at 12:33 AM, nfries88 wrote:

On Wed, Nov 18, 2009 at 11:38 PM, Sam Lantinga <> wrote:


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


±----------------------------------------------------------