Hi,

> Do you know of actual situations where this is necessary on current
> configurations? I posted a request for more information along these
> lines, but didn’t get any responses:
> SDL :: View topic - High resolution timing functions (windows testing needed)

So, full disclaimer first: I am by no means an authority on this subject.
In fact, I am very new to all this stuff; my interpretation of this
issue comes from a) my basic understanding of how RDTSC works and b)
reading the MSDN documentation of QueryPerformanceCounter() and a bunch
of other articles on the subject.

RDTSC reads the processor's time-stamp counter, which counts clock ticks
(not executed instructions) and is kept separately for each core. So by
default it can give you different values if your thread is moved to a
different core after a context switch. The second problem with RDTSC is
that you don't know its frequency, and that frequency is pretty much
unpredictable. According to the documentation, QueryPerformanceCounter()
solves the latter problem (frequency changes):

> Retrieves the frequency of the high-resolution performance counter,
> if one exists. The frequency cannot change while the system is
> running. [MSDN about QueryPerformanceFrequency]
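
Since the frequency cannot change, it only needs to be queried once and
cached; elapsed time is then the tick delta divided by that frequency.
A minimal sketch in C, assuming Windows for the QueryPerformanceCounter()
/ QueryPerformanceFrequency() branch and using POSIX
clock_gettime(CLOCK_MONOTONIC) as a stand-in elsewhere. The helper names
timer_frequency(), timer_ticks() and ticks_to_seconds() are hypothetical,
not part of SDL or the Windows API:

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 199309L   /* exposes clock_gettime() */
#endif
#include <assert.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Ticks per second. Query once and cache: on Windows the documentation
 * says this value cannot change while the system is running. */
static int64_t timer_frequency(void)
{
#ifdef _WIN32
    LARGE_INTEGER f;
    QueryPerformanceFrequency(&f);
    return (int64_t)f.QuadPart;
#else
    return 1000000000;            /* timer_ticks() below counts nanoseconds */
#endif
}

/* Current counter value in ticks; only differences are meaningful. */
static int64_t timer_ticks(void)
{
#ifdef _WIN32
    LARGE_INTEGER c;
    QueryPerformanceCounter(&c);
    return (int64_t)c.QuadPart;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
#endif
}

/* Convert a tick delta to seconds using the cached frequency. */
static double ticks_to_seconds(int64_t delta, int64_t freq)
{
    assert(freq > 0);
    return (double)delta / (double)freq;
}
```

Usage: cache `freq = timer_frequency()` once at startup, take
`start = timer_ticks()`, and later compute
`ticks_to_seconds(timer_ticks() - start, freq)`.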

The documentation for QueryPerformanceCounter() also gives this piece of
advice:

> On a multiprocessor computer, it should not matter which processor is
> called. However, you can get different results on different
> processors due to bugs in the basic input/output system (BIOS) or the
> hardware abstraction layer (HAL). To specify processor affinity for a
> thread, use the SetThreadAffinityMask function.

Now I suppose the documentation mentions the HAL/BIOS because AMD CPUs
were known to cause problems when power management features kicked in.
As such, CPUID on these machines advertises fixed behaviour.

As far as I understand the issue, there is no reliable way to read
RDTSC on multi-core machines unless the thread is pinned to one core.
The Linux sources show that Linux takes great care to ensure the value
is correct, and it seems to automatically fall back from RDTSC to
another clock source if it can no longer trust the value.

In general the issue seems to be complex: even for Intel machines alone,
depending on the processor, it seems to be nearly impossible to use this
value correctly (Intel 64 and IA-32 Architectures Software Developer’s
Manual, Volume 3A, section 16.12).

Microsoft’s “Game Timing and Multicore Processor” documentation [1]
makes the following recommendation about QueryPerformanceCounter():

> When computing deltas, the values should be clamped to ensure that
> any bugs in the timing values do not cause crashes or unstable
> time-related computations. The clamp range should be from 0 (to
> prevent negative delta values) to some reasonable value based on your
> lowest expected framerate. Clamping is likely to be useful in any
> debugging of your application, but be sure to keep it in mind if
> doing performance analysis or running the game in some unoptimized
> mode.
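
A minimal sketch of that clamp in C, assuming deltas are raw counter
ticks and `freq` is the cached QueryPerformanceFrequency() value. The
helpers clamp_delta() and max_delta_ticks() are hypothetical, and the
minimum framerate is whatever your game considers the lowest acceptable:

```c
#include <assert.h>
#include <stdint.h>

/* Largest delta (in ticks) allowed for one frame at the lowest expected
 * framerate, e.g. min_fps = 10 allows at most 1/10 s per frame. */
static int64_t max_delta_ticks(int64_t freq, int64_t min_fps)
{
    assert(freq > 0 && min_fps > 0);
    return freq / min_fps;
}

/* Clamp a raw counter delta to [0, max_ticks]. */
static int64_t clamp_delta(int64_t delta, int64_t max_ticks)
{
    assert(max_ticks >= 0);
    if (delta < 0)            /* counter went backwards (BIOS/HAL bug) */
        return 0;
    if (delta > max_ticks)    /* huge jump, e.g. a stop in the debugger */
        return max_ticks;
    return delta;
}
```

For example, with a 10 fps floor a negative delta becomes 0 and a
3-second stall is treated as a single 100 ms frame. As the quote warns,
this also hides real hitches, so keep it in mind when profiling.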

That should be obvious, but it was not obvious to me, so at the very
least I would recommend putting it into the documentation. It also
clearly shows that the counter might not be as reliable as intended.

This, however, is where it gets interesting:

> Compute all timing on a single thread. Computation of timing on
> multiple threads, for example with each thread associated with a
> specific processor, greatly reduces performance of multi-core
> systems.

This seems to imply that Microsoft might internally try to correct the
RDTSC value, if it is used, by switching execution to a different thread
in the kernel when necessary. So not locking to a thread might just be a
performance hit. The point that follows, however, says:

> Set that single thread to remain on a single processor […]
> QueryPerformanceCounter and QueryPerformanceFrequency typically
> adjust for multiple processors, bugs in the BIOS or drivers may
> result in these routines returning different values as the thread
> moves from one processor to another. So, it’s best to keep the thread
> on a single processor.
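
A minimal sketch of pinning the timing thread in C. The Windows branch
uses SetThreadAffinityMask() as the documentation suggests, plus
GetCurrentProcessorNumber() (Vista or later); the other branch assumes
Linux and uses sched_setaffinity(). The helpers current_cpu() and
pin_thread_to_cpu() are hypothetical, and the sketch ignores Windows
processor groups, so it only handles the first 32 or 64 logical
processors there:

```c
#if !defined(_WIN32) && !defined(_GNU_SOURCE)
#define _GNU_SOURCE               /* sched_getcpu() and the CPU_* macros */
#endif
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#endif
#include <assert.h>
#include <stddef.h>

/* Index of the logical processor the calling thread is running on now,
 * or -1 on failure. */
static int current_cpu(void)
{
#ifdef _WIN32
    return (int)GetCurrentProcessorNumber();  /* within its processor group */
#else
    return sched_getcpu();
#endif
}

/* Restrict the calling thread to logical processor `cpu`.
 * Returns 0 on success, -1 on failure. */
static int pin_thread_to_cpu(int cpu)
{
    assert(cpu >= 0);
#ifdef _WIN32
    if ((size_t)cpu >= sizeof(DWORD_PTR) * 8)   /* outside the mask's range */
        return -1;
    /* SetThreadAffinityMask returns the previous mask, or 0 on failure. */
    return SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)1 << cpu) ? 0 : -1;
#else
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means the calling thread on Linux. */
    return sched_setaffinity(0, sizeof(set), &set) == 0 ? 0 : -1;
#endif
}
```

Usage: call `pin_thread_to_cpu(current_cpu())` once from the single
thread that does all the timing, before its first counter read. Pinning
to the processor the thread is already on, rather than hard-coding CPU 0,
avoids failing when the process is restricted to other processors.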

Avery Lee (whom I pretty much trust as an authority on Windows details)
wrote the following on his blog about five years ago [2]:

> When I tried Windows XP x64 Edition, the HAL used the CPU TSC for
> QueryPerformanceCounter() without realizing that Cool & Quiet would
> cause it to run at half normal speed.

Five years is a long time, but the most recent comments there are from
2009 and still mention the problem. Depending on the kind of game, it is
not unreasonable to expect that Windows XP and an Athlon XP processor are
still in use.

My personal conclusion: better safe than sorry; go with Microsoft's
recommendation of pinning the thread, even if the only advantage is less
overhead when calling the timing function.

I hope that was somewhat useful.
Regards,
Armin

[2]: Beware of QueryPerformanceCounter() - virtualdub.org

4/6/11 7:10 PM, Sam Lantinga wrote: