SDL_AtomicLock() traffic

I think the implementation of SDL_AtomicLock() should be smarter about when
it actually writes to memory, especially using instructions that can
generate inter-CPU bus traffic. Basically, the current solution floods the
memory bus with atomic operations which makes it scale extremely poorly. The
current implementation of SDL_AtomicLock() is:

void
SDL_AtomicLock(SDL_SpinLock lock)
{
/
FIXME: Should we have an eventual timeout? */
while (!SDL_AtomicTryLock(lock)) {
SDL_Delay(0);
}
}

I suggest this implementation:

diff -r 9d3baea5738a src/atomic/SDL_spinlock.c
— a/src/atomic/SDL_spinlock.c Fri Sep 09 10:45:48 2011 -0400
+++ b/src/atomic/SDL_spinlock.c Sat Sep 10 11:59:06 2011 -0500
@@ -90,10 +90,15 @@
void
SDL_AtomicLock(SDL_SpinLock *lock)
{

  • int canstop = 0;
    /* FIXME: Should we have an eventual timeout? */
  • while (!SDL_AtomicTryLock(lock)) {
  •    SDL_Delay(0);
    
  • }
  • do {
  • /* Only try to lock if there is a chance it can succeed */
  • if(*lock == 0)
  • canstop = SDL_AtomicTryLock(lock);
  • else
  • SDL_Delay(0);
  • } while(!canstop);
    }

void

Basically, this checks if the TryLock could succeed before trying it. I did
notice that compilers will optimize this to a simpler instruction sequence.
This is probably because SDL_Spinlock is a typedef for an int. It really
should be a typedef for a volatile int, since its value can change from
another thread without warning. Another possibility is to do:

diff -r 9d3baea5738a src/atomic/SDL_spinlock.c
— a/src/atomic/SDL_spinlock.c Fri Sep 09 10:45:48 2011 -0400
+++ b/src/atomic/SDL_spinlock.c Sat Sep 10 12:07:05 2011 -0500
@@ -90,10 +90,17 @@
void
SDL_AtomicLock(SDL_SpinLock *lock)
{

  • int canstop = 0;
  • volatile SDL_SpinLock* slock = lock;+
    /* FIXME: Should we have an eventual timeout? */
  • while (!SDL_AtomicTryLock(lock)) {
  •    SDL_Delay(0);
    
  • }
  • do {
  • /* Only try to lock if there is a chance it can succeed */
  • if(*slock == 0)
  • canstop = SDL_AtomicTryLock(lock);
  • else
  • SDL_Delay(0);
  • } while(!canstop);
    }

void


I don’t feel like #2 is the best solution since it somewhat more of a
workaround the optimizer than working with it.

Patrick

+/* Only try to lock if there is a chance it can succeed */
+if(*lock == 0)
+canstop = SDL_AtomicTryLock(lock);

Doesn’t this defeat the purpose of an atomic operation?

–ryan.

I think the implementation of SDL_AtomicLock() should be smarter about when
it actually writes to memory, especially using instructions that can
generate inter-CPU bus traffic. Basically, the current solution floods the
memory bus with atomic operations which makes it scale extremely poorly. The
current implementation of?SDL_AtomicLock() is:
void
SDL_AtomicLock(SDL_SpinLock lock)
{
? ? /
FIXME: Should we have an eventual timeout? */
? ? while (!SDL_AtomicTryLock(lock)) {
? ? ? ? SDL_Delay(0);
? ? }
}
I suggest this implementation:
diff -r 9d3baea5738a src/atomic/SDL_spinlock.c
— a/src/atomic/SDL_spinlock.c Fri Sep 09 10:45:48 2011 -0400
+++ b/src/atomic/SDL_spinlock.c Sat Sep 10 11:59:06 2011 -0500
@@ -90,10 +90,15 @@
?void
?SDL_AtomicLock(SDL_SpinLock *lock)
?{

  • int canstop = 0;
    ? ? ?/* FIXME: Should we have an eventual timeout? */
  • ? ?while (!SDL_AtomicTryLock(lock)) {
  • ? ? ? ?SDL_Delay(0);
  • ? ?}
  • ? ?do {
  • /* Only try to lock if there is a chance it can succeed */
  • if(*lock == 0)
  • canstop = SDL_AtomicTryLock(lock);
  • else
  • SDL_Delay(0);
  • ? ?} while(!canstop);
    ?}

?void

Basically, this checks if the TryLock could succeed before trying it. I did
notice that compilers will optimize this to a simpler instruction sequence.
This is probably because SDL_Spinlock is a typedef for an int. It really
should be a typedef for a volatile int, since its value can change from
another thread without warning. Another possibility is to do:

diff -r 9d3baea5738a src/atomic/SDL_spinlock.c
— a/src/atomic/SDL_spinlock.c Fri Sep 09 10:45:48 2011 -0400
+++ b/src/atomic/SDL_spinlock.c Sat Sep 10 12:07:05 2011 -0500
@@ -90,10 +90,17 @@
?void
?SDL_AtomicLock(SDL_SpinLock *lock)
?{

  • int canstop = 0;
  • volatile SDL_SpinLock* slock = lock;

? ? ?/* FIXME: Should we have an eventual timeout? */

  • ? ?while (!SDL_AtomicTryLock(lock)) {
  • ? ? ? ?SDL_Delay(0);
  • ? ?}
  • ? ?do {
  • /* Only try to lock if there is a chance it can succeed */
  • if(*slock == 0)
  • canstop = SDL_AtomicTryLock(lock);
  • else
  • SDL_Delay(0);
  • ? ?} while(!canstop);
    ?}

?void

There are a couple of things going on that I think you have missed.
The call to SDL_Delay(0) forces the thread to go into the scheduler so
that the processor may be allocated to another thread. That call is
needed on single processor machines to allow the this function to ever
succeed. SDL_Delay(0) also takes a long time which means that no flood
of interprocessor communication is taking place. Also, spin locks
should only ever be used around operations like adding and removing
items from queues that are known to take very short amounts of time.
Spin locks are only appropriate when the odds of ever waiting on the
lock are very low.

Your version has the problem that it first reads the value and tests
it, and then does a test-and-set instruction so it reads it and tests
it twice. The value is a volatile so it is doing twice the work of
just doing the test-and-set instruction. Because it is a volatile
value it has to make sure that it is not in the cache and that no
other processor has a copy in its cache that is different from the one
in main memory. In other words, much the same work that has to be done
for a test-and-set instruction, but now it has to be done twice.

All in all I see no reason to change the implementation.

BTW, you do not achieve scalable interprocessor communication by doing
tiny tweaks to things like spin lock implementations. You get scalable
IPC by studying the 50+ years of literature on the subject and then
doing very careful design. Failure to do that is why the average
programmer can’t get two threads to cooperate in a program. OTOH,
programmers who do that are able go get code with 10s of thousands of
threads runs nicely on machines with 10s of thousands of cores every
day.

One point about scalability, when you use the word you need to define
what you mean by it. To some people it means going from 1 to 10 or
maybe 100. To others it means going from 10,000 to 100,000,000,0000.

Bob PendletonOn Sat, Sep 10, 2011 at 12:08 PM, Patrick Baggett <baggett.patrick at gmail.com> wrote:


I don’t feel like #2 is the best solution since it somewhat more of a
workaround the optimizer than working with it.
Patrick


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


±----------------------------------------------------------

+/* Only try to lock if there is a chance it can succeed */
+if(*lock == 0)
+canstop = SDL_AtomicTryLock(lock);

Doesn’t this defeat the purpose of an atomic operation?

It wastes the if and the fetch and test of the lock value. Because the
value can change between the if and atomic operation there is no point
at all in having the if.

Odd, I forgot to mention that in my long winded reply to the original
poster. I got in a lot of other reasons for not wanting to change it.
But, for got the fact that it is pointless.

Bob PendletonOn Sun, Sep 11, 2011 at 12:41 AM, Ryan C. Gordon wrote:

–ryan.


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


±----------------------------------------------------------

+/* Only try to lock if there is a chance it can succeed */
+if(*lock == 0)
+canstop = SDL_AtomicTryLock(lock);

Doesn’t this defeat the purpose of an atomic operation?

It wastes the if and the fetch and test of the lock value. Because the
value can change between the if and atomic operation there is no point
at all in having the if.

There is a good reason for what I wrote, and I’m sorry if I didn’t explain
it well. Testing the value for zero is a memory read, yes. However, if the
value is not invalidated in the CPUs cache, it will return the same value,
so in effect, it doesn’t really cause memory reads. Atomic operations must.

Reading a word (as in machine word) from a word aligned address is always
atomic on every architecture, that is, you’ll never get 1/2 of the old
value, 1/2 of the new value. It is always old or new. I know it may seem
weird to read from a potentially shared area, but look at the actual
results:

Case 1: Value shows up as 0, really is 0 (no other modifications)
Try atomic operation of exchange -> success

Case 2: Value shows up as 0, really is 1 (another CPU just modified it)
Try atomic operation of exchange -> failure, will retry later

If you repeatedly do atomic operations, you send messages to all the other
CPUs to invalidate their caches if they contain that memory location. This
traffic is what this patch seeks to avoid. The problem with the current
implementation is that it keeps generating all of this traffic when it
cannot succeed. I’m sure this isn’t the most convincing argument, but even
Wikipedia http://en.wikipedia.org/wiki/Spinlock mentions this as a common
problem. If you search for literature on spinlocks, one of the newbie
mistakes is to loop on atomic operations without even testing it first. I’ll
dig up links if necessary.

I’m not saying that spinlocks are the way to increase scalability, but
spamming inter-CPU bus traffic will not help, no matter what.On Sun, Sep 11, 2011 at 1:00 AM, Bob Pendleton wrote:

On Sun, Sep 11, 2011 at 12:41 AM, Ryan C. Gordon wrote:

Odd, I forgot to mention that in my long winded reply to the original
poster. I got in a lot of other reasons for not wanting to change it.
But, for got the fact that it is pointless.

Bob Pendleton

–ryan.


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org


±----------------------------------------------------------


SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Hi Paul,

I think it’s called TATS?

What would Linus do? http://www.cs.fsu.edu/~xyuan/cop5611/spinlock.html

So you seem to be technically correct. Is there a test case for
SDL_AtomicLock ?

I think it’s called TATS?
http://en.wikipedia.org/wiki/Test_and_Test-and-set

What would Linus do? http://www.cs.fsu.edu/~xyuan/cop5611/spinlock.html

Wow, I learned something new today.

I’ll take a closer look at the patch, then.

Bob: any counter arguments?

–ryan.