I wrote an article on how to measure the duration of specific operations and their number per second in floating-point form and to be able to refresh the state of such counters any number of times per second. The article is in Polish and can be found here — Implementacja precyzyjnych liczników wydajności gry. Below is the English translation. I’m not a C guy, so the code examples are an imitation of C, treat them as pseudocode.
1. Counter of the average duration of a given operation
Let’s assume that we want to regularly measure the average execution time of a given operation, e.g. the duration of a single game logic update, and obtain the number of milliseconds, with a decimal part, as a result. The state of this counter can be refreshed any number of times per second, even after each logic update. Additionally, the result should be “smoothed” so that micro-fluctuations do not change its value too rapidly (this would reduce its readability).
1.1. Measurement of operation duration
For accurate measurement, use the clock with the highest frequency available on your platform. Milliseconds are not enough, so let’s use a hardware clock. The SDL library allows you to read this counter using the SDL_GetPerformanceCounter function, and its frequency using the SDL_GetPerformanceFrequency function. These are wrappers for the Windows functions QueryPerformanceCounter and QueryPerformanceFrequency.
The frequency of the Windows system clock (scheduler) is relatively low, maximum 1000
— i.e. every millisecond. However, the high-resolution counter, even on my potato laptop (it’s 10 years old), has a frequency of 10000000
(ten million), which means it changes its state every 100
nanoseconds. The difference is gigantic.
To measure the duration of an operation, take the reading of this counter just before and after the operation, and then count the difference in samples:
int64 ticks_start; // time just before the operation is performed, in ticks
int64 ticks_sample; // duration of the operation, in ticks
ticks_start = SDL_GetPerformanceCounter();
update_logic();
ticks_sample = SDL_GetPerformanceCounter() - ticks_start;
1.2. Samples buffer
To meet the requirements for our counter, samples should be collected in a buffer and their values counted in the accumulator. The sample collection should be something like a circular buffer, i.e. of a fixed size, initially zeroed. You add a sample and increment the index for the next one, and if the index goes beyond the buffer area, it is set to 0. In this way, you can easily overwrite the oldest sample with a new one, and so on. You don’t need any sophisticated data types to implement such a buffer — just a simple fixed-size array and a variable with an index for the next sample.
To refresh the counter and not have to sum the samples each time in the loop, you can use an auxiliary accumulator variable, which is also initially zeroed. Each time a new sample needs to be added to the buffer, the value of the oldest sample is first subtracted from the accumulator, the value of the new one is added to it, and finally the oldest sample in the buffer is overwritten with the newly acquired one. In this way, the accumulator always stores the current sum of all samples, bypassing the need for loop summing.
Sample pseudocode for getting and collecting samples:
const int32 SAMPLE_NUM = 8; // samples array size
int64 ticks_start; // time just before the operation is performed, in ticks
int64 ticks_sample; // duration of the operation, in ticks
int64[SAMPLE_NUM] samples = {0}; // array of samples
int64 sample_index = 0; // index of the oldest sample, to be overwritten with the new one
int64 sample_accu = 0; // accumulator with the sum of all samples
// calculation of the logic update duration, in ticks
ticks_start = SDL_GetPerformanceCounter();
update_logic();
ticks_sample = SDL_GetPerformanceCounter() - ticks_start;
// adding a new sample to the buffer
sample_accu -= samples[sample_index]; // subtract the value of the oldest sample from the accumulator
sample_accu += ticks_sample; // add a new sample to the accumulator
samples[sample_index] = ticks_sample; // overwrite the oldest sample in the array with the new one
sample_index += 1; // increase the index of the oldest sample by 1
if (sample_index == SAMPLE_NUM) // if the sample index exceeds array bound
sample_index = 0; // set it to the first cell of the array
In the example above, I used an eight-sample buffer, which gives quite good smoothing. It should be noted here that the more samples, the less visible the fluctuations will be and our counter will seem more lazy.
1.3. Calculation of the new counter state
With the samples collected in the buffer and their sum in the accumulator, you need to calculate the result — the average execution time of the logic update. When and how often our counter will be refreshed depends on your preferences. The only important thing is not to refresh it more often than after each execution of a given operation (here: logic update), because its state will not change.
We are interested in the average time in milliseconds (with a decimal part), so the calculations should look like this:
float = time_per_update; // average operation execution time, in milliseconds
time_per_update = sample_accu / SAMPLE_NUM / SDL_GetPerformanceFrequency() * 1000.0f;
That’s it, now you can display the value of the time_per_update
variable on the screen, with any rounding. In my engine, I use such a counter to precisely calculate the average time of a single update of the game logic and rendering a single frame.
2. Counter of the number of operations performed per second
Here the situation is slightly different. The simplest way is to count the operations performed in the form of a simple variable — after each operation, the counter is incremented by 1
. Once a second, the counter’s status is displayed on the screen and reset to zero. Simple implementation, but has drawbacks — the result is an int and its value changes once per second (not more often). To be able to update our counter more often than every second and also have a floating-point result, we need to collect samples again.
2.1. Measurement
We do not count the operations performed by simply incrementing an accumulator, because it will not help. We need to rely on timestamps, milliseconds are enough because they will provide sufficient precision. The SDL_GetTicks64 function or the Windows GetTickCount64 function is used for this purpose. Timestamp should be retrieved right after performing a given operation:
uint64 timestamp; // sample with time after the operation, in milliseconds
update_logic();
time_sample = SDL_GetTicks64();
2.2. Samples buffer
In this case, the sample buffer can also be an ordinary array, but its size may change dynamically while the game is running, so in order not to declare large arrays on the stack (with a spare space), it is better to use a list (depending on what you have available in your language and its RTL). This time, old samples are not overwritten, but a new one is simply added to the end of the list, after each operation (here: logic update).
Pseudocode:
list_of_uint64 samples; // uint64 sample list (hypothetical data type)
uint64 timestamp; // sample with time after the operation, in milliseconds
// performing the operation
update_logic();
// getting the end time of the operation and adding the sample to the end of the list
time_sample = SDL_GetTicks64();
samples.append(time_sample);
2.3. Calculation of the new counter state
The list contains samples, their number is small at the start of the game and increases over time. To be able to calculate a new counter value (at any moment), you must first determine the sample interval closest to the second — after all, we are interested in how many times the operation was performed per second. The difference between the first and last samples in the list will initially be less than a second, but after the game has been running for a while, the difference will be greater than a second and this will continue until the end of the session.
We are interested in the freshest second period, so we need to determine the range of samples of this freshest second. This range must apply to samples counted from the end of the list towards the beginning. If the difference between the first and last samples is greater than a second, there are old samples at the beginning of the list and should be removed from the list. This can be done before calculating the result, but also after — it depends on what is more convenient for you.
To calculate the result, first check whether there are already any samples — if not, we do nothing (the value of our counter remains zero by default). However, if there are already some samples, we must determine their range giving at least a second difference. After determining the range, the difference of the two samples gives the final period (a second or slightly more), while the difference of their indices gives the number of operations performed per second, in the form of an integer. To obtain the result in floating-point form, simple calculations are enough. These two pieces of information are required for this calculation — the number of milliseconds between samples and the number of samples.
Pseudocode:
// only calculate the result if there are already some samples
if (samples.count > 0)
{
// auxiliary indexes of samples in the list, to determine their second interval
int32 index_first = samples.count - 1;
int32 index_last = samples.count - 1;
// while the sample difference is less than 1000 milliseconds, move the start
// index of the sample range towards the top of the list until the seconds range
// is found or until the first sample is reached
while (index_first > 0 && samples[index_last] - samples[index_first] < 1000)
index_first -= 1;
// sample difference must be greater than 0 (because it is a divisor) and if so, calculate
// the number of operations performed per second, in floating point form
if (samples[index_last] - samples[index_last] > 0)
updates_num = 1000.0f / (samples[index_last] - samples[index_first]) * (index_last - index_first);
// if there are old samples at the beginning (before "index_first"), remove them
if (index_first > 0)
samples.delete_range(0, index_first - 1);
}
The above pseudocode uses sample indexes in the list, but if the sample buffer cannot be handled like arrays (as in my engine), and gives access to pointers to the first and last item, you can easily use pointers and their arithmetic instead of indexes. Iterating through a buffer as a continuous block of memory and using pointers gives higher performance, while a pointer is an index on steroids — not only does it directly point to the memory of an item, but pointer arithmetic also allows you to quickly calculate the number of items between these pointers.
In my engine, the above pseudocode looks like below (Free Pascal), it is an implementation using pointers and their arithmetic (after simplification, including shortening long variable names and without comments):
var
UpdateNum: TGame_Float32 = 0.0;
var
SamplesUpdate: TGame_List;
SampleFirst: PGame_UInt64;
SampleCurrent: PGame_UInt64;
SampleLast: PGame_UInt64;
{..}
if Game_ListReveal(@SamplesUpdate, @SampleFirst, @SampleLast) then
begin
SampleCurrent := SampleLast;
while (SampleCurrent > SampleFirst) and (SampleLast^ - SampleCurrent^ < 1000) do
SampleCurrent -= 1;
if SampleLast^ - SampleCurrent^ > 0 then
UpdateNum := 1000.0 / (SampleLast^ - SampleCurrent^) * (SampleLast - SampleCurrent);
if SampleCurrent > SampleFirst then
Game_ListDeleteRange(@SamplesUpdate, 0, SampleCurrent - SampleFirst);
end;
The above implementation gives an effect almost exactly the same as FRAPS, with the difference that the calculated framerate is in floating-point form, which can be rounded as desired.