Thread to Processor Pinning

Hi,

With the introduction of SDL_GetPerformanceCounter() it would make a lot
of sense to provide an SDL function that pins a thread to one core only.
Why? Because on Windows QueryPerformanceCounter() does adjust to
frequency changes in the CPU, but it does not account for switching
between cores where each core potentially has its own timing mechanism.
In fact I think that function is implemented by using the RDTSC
counter if possible and falls back internally to other means of timing.
The official documentation even recommends pinning the main thread to
one processor.

I think it would make sense to provide an SDL_GetProcessorCount()
function and an SDL_PinThread() function that can pin a thread to one of
the processors/cores.

This is what I currently do to bind the main thread to processor 0 on
Windows:

 ULONG_PTR affinity_mask;
 ULONG_PTR process_affinity_mask;
 ULONG_PTR system_affinity_mask;

 if (!GetProcessAffinityMask(GetCurrentProcess(),
                             &process_affinity_mask,
                             &system_affinity_mask))
     return;

 // run on the first core
 affinity_mask = (ULONG_PTR)1 << 0;
 if (affinity_mask & process_affinity_mask)
     SetThreadAffinityMask(GetCurrentThread(), affinity_mask);
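
The snippet above silently does nothing when core 0 is not in the process affinity mask. A minimal sketch of one way around that, picking the first core the process is actually allowed to use, could look like this. The helper name lowest_allowed_core_mask is made up for illustration and is not part of SDL or the Windows API; uintptr_t stands in for ULONG_PTR so the helper compiles anywhere:

```c
#include <stdint.h>

/* Hypothetical helper (not an SDL or Win32 function): returns a mask with
 * only the lowest set bit of process_mask, i.e. the first core the process
 * may run on, or 0 if the mask is empty. */
uintptr_t lowest_allowed_core_mask(uintptr_t process_mask)
{
    /* x & (~x + 1) isolates the lowest set bit; written this way it stays
     * well defined for unsigned types. */
    return process_mask & (~process_mask + 1u);
}
```

On Windows the result could be passed to SetThreadAffinityMask() in place of the hard-coded (ULONG_PTR)1 << 0, after checking that it is non-zero.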

I guess pinning threads to processors is generally only useful on
Windows, though, since Linux and OS X have clocks that take the drift
into account.

What’s your opinion on that? Am I missing something?

Regards,
Armin

Do you know of actual situations where this is necessary on current
configurations? I posted a request for more information along these lines,
but didn’t get any responses:
http://forums.libsdl.org/viewtopic.php?t=7110

On Tue, Apr 5, 2011 at 1:31 AM, Armin Ronacher <armin.ronacher at active-4.com> wrote:

[...]

-Sam Lantinga, Founder and CEO, Galaxy Gameworks

Hi,

Do you know of actual situations where this is necessary on current
configurations? I posted a request for more information along these
lines, but didn’t get any responses:
http://forums.libsdl.org/viewtopic.php?t=7110

So full disclaimer first: I am by no means an authority on this subject.
In fact I am terribly new to all this stuff; my interpretation of this
issue comes from a) my basic understanding of how RDTSC works and b)
reading the MSDN documentation of QueryPerformanceCounter() and a bunch
of other articles on the subject.

RDTSC itself reads a tick counter that is kept per core. So by default
it will give you different values if your thread is executed by a
different core after a context switch. The second problem with RDTSC is
that you don't know the frequency of that counter, and it is pretty much
unpredictable. This latter problem (frequency changes) is solved by
QueryPerformanceCounter(), according to the documentation:

Retrieves the frequency of the high-resolution performance counter,
if one exists. The frequency cannot change while the system is
running. [MSDN about QueryPerformanceFrequency]
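
To illustrate why a fixed frequency matters: with it, two counter readings can be turned into elapsed time by a single division. The following is only a sketch; ticks_to_seconds is a hypothetical helper, and the readings are assumed to come from something like QueryPerformanceCounter()/QueryPerformanceFrequency() or their SDL equivalents:

```c
#include <stdint.h>

/* Hypothetical helper: converts two counter readings into elapsed seconds.
 * Returns 0.0 if the frequency is unknown (0) or the counter appears to
 * have gone backwards, which can happen with the bugs discussed here. */
double ticks_to_seconds(uint64_t start, uint64_t end, uint64_t frequency)
{
    if (frequency == 0 || end < start) {
        return 0.0;
    }
    return (double)(end - start) / (double)frequency;
}
```

This only gives meaningful results if both readings are taken against the same counter, which is exactly what is in question when the thread moves between cores.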

When reading the documentation about QueryPerformanceCounter it gives
this piece of advice:

On a multiprocessor computer, it should not matter which processor is
called. However, you can get different results on different
processors due to bugs in the basic input/output system (BIOS) or the
hardware abstraction layer (HAL). To specify processor affinity for a
thread, use the SetThreadAffinityMask function.

Now I suppose the reason why it talks about HAL/BIOS might be because
AMD CPUs were known to cause problems when power management features
kicked in. As such CPUID on these machines advertises fixed behaviour.

As far as I understand the issue, there is no reliable way to query
RDTSC on multi core machines properly unless you are fixed to a core.
Looking at the Linux sources reveals that Linux takes great care to
ensure the value is handled properly, and it seems to automatically fall
back from RDTSC if it can no longer trust the value.

Generally the issue seems to be complex, and for Intel machines alone
depending on the processor it seems to be somewhat impossible to
correctly use this value. (Intel 64 and IA-32 Architectures Software
Developer’s Manual Volume 3A 16.12)

From Microsoft’s “Game Timing and Multicore Processors” documentation [1]
comes the following recommendation about QueryPerformanceCounter():

When computing deltas, the values should be clamped to ensure that
any bugs in the timing values do not cause crashes or unstable
time-related computations. The clamp range should be from 0 (to
prevent negative delta values) to some reasonable value based on your
lowest expected framerate. Clamping is likely to be useful in any
debugging of your application, but be sure to keep it in mind if
doing performance analysis or running the game in some unoptimized
mode.

That should be obvious, but it was not obvious to me, so at the very
least I would recommend putting that into the documentation. It does,
however, clearly show that the counter might not be as reliable as
intended.
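
A rough sketch of the clamping that the quoted recommendation describes, working on raw counter values. The function name and the max_delta parameter are my own invention for illustration; max_delta would be chosen from the lowest expected framerate, for example frequency / 10 for 10 frames per second:

```c
#include <stdint.h>

/* Hypothetical helper: returns the tick delta between two counter readings,
 * clamped to the range [0, max_delta]. A counter that appears to run
 * backwards (e.g. after a switch between cores) yields 0 instead of a
 * huge wrapped-around unsigned value. */
uint64_t clamped_delta_ticks(uint64_t previous, uint64_t current,
                             uint64_t max_delta)
{
    if (current <= previous) {
        return 0;
    }
    uint64_t delta = current - previous;
    return delta > max_delta ? max_delta : delta;
}
```

As the quote notes, clamping like this hides long frames, so it should be kept in mind (or disabled) when doing performance analysis.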

This however is where it gets interesting:

Compute all timing on a single thread. Computation of timing on
multiple threads - for example, with each thread associated with a
specific processor - greatly reduces performance of multi-core
systems.

This seems to imply that Windows might internally try to fix up the
RDTSC value, if it is used, by switching execution to a different
thread in the kernel when necessary. So not pinning the thread might
just cost performance. The next point, however, says:

Set that single thread to remain on a single processor […]
QueryPerformanceCounter and QueryPerformanceFrequency typically
adjust for multiple processors, bugs in the BIOS or drivers may
result in these routines returning different values as the thread
moves from one processor to another. So, it’s best to keep the thread
on a single processor.

Avery Lee (whom I pretty much trust as an authority on Windows details)
wrote the following in his blog about five years ago [2]:

When I tried Windows XP x64 Edition, the HAL used the CPU TSC for
QueryPerformanceCounter() without realizing that Cool & Quiet would
cause it to run at half normal speed.

Five years is a long time, but the most recent comments there are from
2009 and they still mention the problem. Depending on the kind of game,
it is not unreasonable to assume that Win XP and an Athlon XP processor
are still in use.

My personal conclusion: better safe than sorry, so go by the Microsoft
recommendation of pinning the thread, even if the only advantage turns
out to be less overhead when invoking the timing function.

I hope that was somewhat useful.

Regards,
Armin

[2]: http://www.virtualdub.org/blog/pivot/entry.php?id=106

On 4/6/11 7:10 PM, Sam Lantinga wrote: