I think the implementation of SDL_AtomicLock() should be smarter about when
it actually writes to memory, especially using instructions that can
generate inter-CPU bus traffic. Basically, the current solution floods the
memory bus with atomic operations which makes it scale extremely poorly. The
current implementation of?SDL_AtomicLock() is:
void
SDL_AtomicLock(SDL_SpinLock lock)
{
? ? / FIXME: Should we have an eventual timeout? */
? ? while (!SDL_AtomicTryLock(lock)) {
? ? ? ? SDL_Delay(0);
? ? }
}
I suggest this implementation:
diff -r 9d3baea5738a src/atomic/SDL_spinlock.c
— a/src/atomic/SDL_spinlock.c Fri Sep 09 10:45:48 2011 -0400
+++ b/src/atomic/SDL_spinlock.c Sat Sep 10 11:59:06 2011 -0500
@@ -90,10 +90,15 @@
?void
?SDL_AtomicLock(SDL_SpinLock *lock)
?{
- int canstop = 0;
? ? ?/* FIXME: Should we have an eventual timeout? */
- ? ?while (!SDL_AtomicTryLock(lock)) {
- ? ? ? ?SDL_Delay(0);
- ? ?}
- ? ?do {
- /* Only try to lock if there is a chance it can succeed */
- if(*lock == 0)
- canstop = SDL_AtomicTryLock(lock);
- else
- SDL_Delay(0);
- ? ?} while(!canstop);
?}
?void
Basically, this checks if the TryLock could succeed before trying it. I did
notice that compilers will optimize this to a simpler instruction sequence.
This is probably because SDL_Spinlock is a typedef for an int. It really
should be a typedef for a volatile int, since its value can change from
another thread without warning. Another possibility is to do:
diff -r 9d3baea5738a src/atomic/SDL_spinlock.c
— a/src/atomic/SDL_spinlock.c Fri Sep 09 10:45:48 2011 -0400
+++ b/src/atomic/SDL_spinlock.c Sat Sep 10 12:07:05 2011 -0500
@@ -90,10 +90,17 @@
?void
?SDL_AtomicLock(SDL_SpinLock *lock)
?{
- int canstop = 0;
- volatile SDL_SpinLock* slock = lock;
-
? ? ?/* FIXME: Should we have an eventual timeout? */
- ? ?while (!SDL_AtomicTryLock(lock)) {
- ? ? ? ?SDL_Delay(0);
- ? ?}
- ? ?do {
- /* Only try to lock if there is a chance it can succeed */
- if(*slock == 0)
- canstop = SDL_AtomicTryLock(lock);
- else
- SDL_Delay(0);
- ? ?} while(!canstop);
?}
?void
There are a couple of things going on that I think you have missed.
The call to SDL_Delay(0) forces the thread to go into the scheduler so
that the processor may be allocated to another thread. That call is
needed on single processor machines to allow the this function to ever
succeed. SDL_Delay(0) also takes a long time which means that no flood
of interprocessor communication is taking place. Also, spin locks
should only ever be used around operations like adding and removing
items from queues that are known to take very short amounts of time.
Spin locks are only appropriate when the odds of ever waiting on the
lock are very low.
Your version has the problem that it first reads the value and tests
it, and then does a test-and-set instruction so it reads it and tests
it twice. The value is a volatile so it is doing twice the work of
just doing the test-and-set instruction. Because it is a volatile
value it has to make sure that it is not in the cache and that no
other processor has a copy in its cache that is different from the one
in main memory. In other words, much the same work that has to be done
for a test-and-set instruction, but now it has to be done twice.
All in all I see no reason to change the implementation.
BTW, you do not achieve scalable interprocessor communication by doing
tiny tweaks to things like spin lock implementations. You get scalable
IPC by studying the 50+ years of literature on the subject and then
doing very careful design. Failure to do that is why the average
programmer can’t get two threads to cooperate in a program. OTOH,
programmers who do that are able go get code with 10s of thousands of
threads runs nicely on machines with 10s of thousands of cores every
day.
One point about scalability, when you use the word you need to define
what you mean by it. To some people it means going from 1 to 10 or
maybe 100. To others it means going from 10,000 to 100,000,000,0000.
Bob PendletonOn Sat, Sep 10, 2011 at 12:08 PM, Patrick Baggett <baggett.patrick at gmail.com> wrote:
I don’t feel like #2 is the best solution since it somewhat more of a
workaround the optimizer than working with it.
Patrick
SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org
–
±----------------------------------------------------------