Syncing with the monitor retrace

  1. SDL_Delay(0) will give you the minimum delay (it literally means
    "switch away from this task and return as soon as possible").

  2. Polling the events should be doing that already.

2013/5/3, Frederik vom Hofe <frederik.hofe at gmail.com>:

jeroen clarysse wrote:

Now imagine an event occurs in the event thread at time 13, so between the
2nd and 3rd swap. This event will update the bitmaps_to_be_drawn array and
raise the must_redraw boolean. My problem is: WHEN WILL THE NEW SCREEN BE
VISIBLE?

Obviously, I want the update to become visible at the next retrace: at
time 20. However, if I understand correctly, this will NOT BE THE CASE:
at time 20, the main thread will wake up from the sleep it went into after
the swap at time 10!!! So at time 20, it will update the backbuffer and
call Swap(vsync=true), which means that it will go back to sleep and
update the monitor only at time 30!!!

am I correct ?

what do you propose as a solution?

That's the cost of syncing to the screen.

On a 100Hz screen, a single frame only has 10ms to draw or it will miss the vsync. The default strategy is to draw as fast as possible and then idle until the vsync occurs. This means any visual change you make will take between 10 and 20ms (on 100Hz) until you see it on screen. Theoretically you could use SDL_Delay after the blocking swap function, but that just makes it very likely that you miss the next vsync, and it would not help much anyway: e.g. by cutting the draw time in half you only have 5ms to draw and still 5-15ms before the change appears on the screen. But this is only a problem if you need immediate screen changes after input. Otherwise, just use scripts that the render thread knows in advance (say, 3 frames ahead) so it can show stuff at exact predefined frames/times.

Nathaniel J Fries wrote:

SDL_Delay has no understanding of frames.

It delays in milliseconds (1ms is 1/10th of a frame on a 100Hz display, 1/17th of a frame on a 60Hz display, etc.).

The only point in using SDL_Delay(1) was to make the event thread idle for the smallest possible amount of time, so it doesn't use up 100% CPU but can still measure input timings exactly.
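In loop form, that pattern is simply (a sketch; poll_devices is a made-up placeholder for the device checks):

Code:

    while (running)
    {
        poll_devices();   /* hypothetical: read keyboard/mouse/parallel port */
        SDL_Delay(1);     /* ~1ms sleep so the thread doesn't spin at 100% CPU */
    }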

thanks to all for your replies. This is quite an interesting discussion :)

I think the only solution would be to have some sort of call like this: SDL_if_in_vertical_blank_then_swap_else_do_nothing()

this would basically check if the retrace beam is at the top. If so, the backbuffer is swapped onscreen. If not, execution continues and you can do some processing. Of course, if the beam was very close to the bottom and this processing takes longer than the beam takes to turn around, you will miss a retrace and thus drop a frame!

so an even better call would be SDL_how_far_from_the_vertical_blank_is_the_beam(), which returns pixels or msecs (one can be calculated from the other based on monitor resolution and refresh rate)

DirectX has such a call: IDirectDraw7::GetScanLine(), which is documented here (http://msdn.microsoft.com/en-us/library/windows/desktop/gg426149(v=vs.85).aspx)
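For reference, a helper along the lines of the one proposed above could be sketched on top of GetScanLine like this (Windows/DirectDraw only; dd, totalLines and refreshHz are assumed to come from the current display mode):

Code:

    #include <ddraw.h>

    /* milliseconds until the vertical blank, 0 if we are in it right now */
    double msUntilVBlank(IDirectDraw7 *dd, DWORD totalLines, double refreshHz)
    {
        DWORD scanline = 0;
        HRESULT hr = dd->GetScanLine(&scanline);
        if (hr == DDERR_VERTICALBLANKINPROGRESS)
            return 0.0;                           /* beam is in the blank */
        if (FAILED(hr))
            return -1.0;                          /* no scanline info available */
        /* lines remaining, divided by lines scanned per second */
        return (double)(totalLines - scanline) / ((double)totalLines * refreshHz) * 1000.0;
    }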

but if I’m correct, OpenGL doesn’t support this

I think that my only solution is to implement a new feature in my framework: "start an internal clock after the next swap", which can then be used to measure response latencies. This does mean of course that exact timing between visual presentations is difficult
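In other words: timestamp right after the blocking swap returns, and measure responses against that. A minimal sketch (window is assumed to be an SDL_Window* with a vsynced GL context):

Code:

    SDL_GL_SwapWindow(window);                     /* blocks until the retrace */
    Uint64 stimulusOnset = SDL_GetPerformanceCounter();

    /* ... later, when the response comes in: */
    Uint64 response = SDL_GetPerformanceCounter();
    double latencyMs = (double)(response - stimulusOnset) * 1000.0
                     / (double)SDL_GetPerformanceFrequency();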

Sik wrote:

  1. SDL_Delay(0) will give you the minimum delay (it literally means
    "switch away from this task and return as soon as possible").

True.
Even better would be to use sched_yield on POSIX or SwitchToThread on Windows, since both of them essentially establish “I have done my job for this time slice, let other threads do theirs”.
I understand that the semantics are different (sched_yield guarantees that the thread is moved to the back of the scheduling queue, SwitchToThread only guarantees that one other thread will be given an opportunity to run)
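In sketch form (the wrapper name is made up):

Code:

    #ifdef _WIN32
    #include <windows.h>
    static void yield_now(void) { SwitchToThread(); }   /* give one other thread a chance */
    #else
    #include <sched.h>
    static void yield_now(void) { sched_yield(); }      /* move to the back of the run queue */
    #endif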

Sik wrote:

  1. Polling the events should be doing that already.

Polling events does wait on a mutex, but it does not explicitly yield execution.
Pumping events will not yield execution except as performed internally by the underlying windowing system (I would imagine xlib does quite a bit of this, and that Windows does very little).
Nate Fries

I meant that polling events does help the scheduler know when it can
be safe to switch threads without hurting performance. Ultimately you
just want to help the scheduler do its job, not force it to do what
you want. Maybe it thinks that giving your thread more time is the
best thing after all!


Sik wrote:

  1. SDL_Delay(0) will give you the minimum delay (it literally means
    "switch away from this task and return as soon as possible").

  2. Polling the events should be doing that already.

Didn't know about SDL_Delay(0). But this would not eliminate 100% CPU usage.

"Polling" with SDL_PollEvent is non-blocking and would also not cause the thread to idle.

But SDL_WaitEvent looks like it. Then again, you don't know at what intervals events come in, and therefore how long the thread would be asleep after calling SDL_WaitEvent.
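For what it's worth, SDL2 also has SDL_WaitEventTimeout, which at least bounds the wait. A minimal sketch:

Code:

    SDL_Event ev;
    if (SDL_WaitEventTimeout(&ev, 1))   /* sleep at most ~1ms waiting for an event */
    {
        /* handle ev, timestamp it, etc. */
    }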

@jeroen clarysse: One simple question! Do you really need minimum time from event to screen output with vsync? Otherwise you're just overcomplicating things.

Frederik vom Hofe wrote:

@jeroen clarysse: One simple question! Do you really need minimum time from event to screen output with vsync? Otherwise you're just overcomplicating things.

yeah… I'm not using SDL as a game engine, but for psychology experiments. Timing is imperative for approx 20% of our experiments. Anything that is "priming" related or subliminal perception requires very accurate measurements. I know about the whole issue with monitor latencies etc. and we have ways to address that (we use very expensive 200Hz monitors and high end graphics cards for subliminal experiments or eyetracking stuff)

basically we need to know the response time between a presented stimulus and input that is read from a device (usually via the parallel port or Data Acquisition cards). Also eyetrackers or EMG measurements need to be synced with the presentation time of the visual stimulus.

if it's just one stimulus, I can sort of work around the VRS issues listed previously in this discussion, but sometimes we have a sequence of stimuli… also, like I said: I'm building an application that other researchers will use to build their own experiments. As such, I need to be sure that my timing is accurate in all possible circumstances, since you never know what these researchers will come up with :)

the previous version of the software was DirectX only, and I could use the GetScanLine() code: I would prepare all stimuli in the first half of the screen refresh, and make sure to be ready before the vertical blank is reached. With the SDL approach, I can't do that since the Swap(sync=true) routine will block me from doing any work between swaps. Multithreading looks like a solution, but since I'm always one swap too late, I can't get the accuracy I need either

but it is an interesting discussion nonetheless, and i’m learning a lot !

jeroen clarysse wrote:

if it's just one stimulus, I can sort of work around the VRS issues listed previously in this discussion, but sometimes we have a sequence of stimuli… also, like I said: I'm building an application that other researchers will use to build their own experiments. As such, I need to be sure that my timing is accurate in all possible circumstances, since you never know what these researchers will come up with :)

Now it makes more sense to me, you just want to cover all cases.

I have a new idea. Even single threaded! :D

The most important part is a high resolution timer! But SDL2 has that covered:

Code:
double getHighResSec()
{
    return static_cast<double>( SDL_GetPerformanceCounter() ) / static_cast<double>( SDL_GetPerformanceFrequency() );
}

Code:
if (SDL_GetPerformanceFrequency() > 1000)
{
    // yes! high resolution timer :D
}
else
{
    // crash :(
}

double lastVsyncTime = getHighResSec();

const double frameTime = 1.0 / 60.0; // for a 60Hz screen (you could auto detect this)
const double vsyncWaitBufferTime = 0.001; // the minimum time we want to give the flush and swap to render the image…

while (true)
{
    while ( getHighResSec() + frameTime - lastVsyncTime < vsyncWaitBufferTime )
    {
        // poll events

        // maybe draw some cat pictures if
        //     getHighResSec() + frameTime - lastVsyncTime
        // is still big enough
        // maybe also call glFlush here to make sure the driver starts to draw and gets done in time
    }

    swap; // with vsync of course
    lastVsyncTime = getHighResSec();
}

PS: high end graphics cards may not be your first choice if you are just looking for low latency.

Just correcting a brain fart:

Code:
while ( getHighResSec() + frameTime - lastVsyncTime < vsyncWaitBufferTime )

has to be

Code:
while ( getHighResSec() - lastVsyncTime < frameTime - vsyncWaitBufferTime )
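Folding that correction back into the earlier sketch (window is assumed to be the SDL_Window*; still single threaded):

Code:

    double lastVsyncTime = getHighResSec();
    const double frameTime = 1.0 / 60.0;          // for a 60Hz screen
    const double vsyncWaitBufferTime = 0.001;     // headroom for flush + swap

    while (true)
    {
        // work until just before the next expected vblank...
        while (getHighResSec() - lastVsyncTime < frameTime - vsyncWaitBufferTime)
        {
            // poll events, draw, maybe glFlush()
        }

        // ...then do the vsynced swap and note when it returned
        SDL_GL_SwapWindow(window);
        lastVsyncTime = getHighResSec();
    }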

jeroen clarysse wrote:

the previous version of the software was DirectX only, and I could use the
GetScanLine() code : I would prepare all stimuli in the first half of the
screen refresh, and make sure to be ready before the vertical blank is
reached. With the SDL approach, I can’t do that since the Swap(sync=true)
routine will block me from doing any work between swaps. Multithreading
looks like a solution, but since I’m always one swap too late, I can’t get
the accuracy I need either

Hmmm… Have you tried using software renderers with multi-threading?
I don’t know how much work your rendering is doing (and I haven’t
tried using the software renderer in a multi-threaded manner) but if
it’s light enough then it should be practical to do it entirely in
software, and just upload it when appropriate. If you need something
more heavy-duty than SDL itself then you can glue one of the software
OpenGL implementations (e.g. TinyGL) to a SDL_Surface.

Beyond that, have you looked to see if anyone has tried this with
platform-specific OpenGL? If they have, then their experiences with it
could be much more informative than anything we can tell you, since
SDL would mostly be useful for making your code more portable, rather
than being useful for e.g. rendering geometry-based scenes.

If you’re running on a dedicated system, you can run a calibration loop
using SDL’s performance timer and the present synced to vblank to estimate
its timing, then you can figure out how long a present takes and roughly
when it happens.

Then you can in a single thread, loop with no delays and work until the
next scheduled vblank and then do a present. If you don’t have any other
load on the system it should be pretty accurate, and you can adjust the
calibration on the fly and detect missed vblanks.

I don’t have time to show it in code, but hopefully that helps.

See ya!
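A minimal sketch of that calibration idea, assuming a vsynced SDL_Renderer* named renderer (hypothetical, untested):

Code:

    /* Step 1: calibrate. With vsync on, each SDL_RenderPresent blocks for
       one frame, so the average interval estimates the refresh period. */
    const int CALIB_FRAMES = 120;
    Uint64 freq  = SDL_GetPerformanceFrequency();
    Uint64 start = SDL_GetPerformanceCounter();
    for (int i = 0; i < CALIB_FRAMES; i++)
        SDL_RenderPresent(renderer);
    Uint64 end = SDL_GetPerformanceCounter();
    double framePeriod = (double)(end - start) / (double)freq / CALIB_FRAMES;

    /* Step 2: in the main loop, work until just before the predicted vblank,
       then present. A measured interval near 2*framePeriod means a missed
       vblank, and the estimate can be re-adjusted on the fly. */
    double nextVblank = (double)end / (double)freq + framePeriod;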


Frederik vom Hofe wrote:

Sik wrote:

  1. SDL_Delay(0) will give you the minimum delay (it literally means
    "switch away from this task and return as soon as possible").

  2. Polling the events should be doing that already.

Didn't know about SDL_Delay(0). But this would not eliminate 100% CPU usage.

Unless your thread is literally the only thread in the system, it should drop significantly. I've dropped from >90% to as low as 2% using SDL_Delay(0).
Nate Fries

I recall something about SDL_Delay(0) being a no-op on some platforms. Is
that true? I always use SDL_Delay(1) just in case.

Jonny D

Jonny D wrote:

I recall something about SDL_Delay(0) being a no-op on some platforms. Is that true? I always use SDL_Delay(1) just in case.

Jonny D

I have also heard this, which is why I think a yield function might be more appropriate for this purpose.
Nate Fries

Decided to look into this to see how SDL_Delay behaves.

Windows: always calls the Sleep function. This behaves exactly like I said.

Unix: it tries to use nanosleep if available, and a busy loop
otherwise. Note that in the case of the latter it literally never
tells the OS that it’s waiting, making SDL_Delay unsuitable for giving
up CPU time.

BeOS: Haiku? Anyway, it uses snooze there, so I guess the yielding
semantics still apply.

PSP: what the heck is this doing here?! (whatever, calls
sceKernelDelayThreadCB, no idea how that works)

I assume OS X, Linux, and Android all go with the Unix code. No idea
what iOS would use (I'd guess Unix, but yeah, not sure).

Anyway, basically it seems that yes, it calls the sleep function of
the OS pretty much always. The only exception is Unix without
nanosleep, but in that case it enters a busy loop, so no matter how
long you make the delay it won't give up CPU time.


Sik wrote:

Decided to look into this to see how SDL_Delay behaves.
Unix: it tries to use nanosleep if available, and a busy loop
otherwise. Note that in the case of the latter it literally never
tells the OS that it’s waiting, making SDL_Delay unsuitable for giving
up CPU time.

False. If nanosleep is not available, it uses select().
Passing a valid timeval but no fd_sets to a POSIX-compliant implementation of select will delay the thread for the specified time, or until a signal occurs. In fact, its behavior in this situation is nearly the same as nanosleep’s, except that select does not provide its own method of determining how much longer it needs to sleep (thus the seeming busy-loop).
POSIX does not specify whether this actually yields the thread. The internal implementation of select might well not yield execution.
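For reference, the trick is just a valid timeout with no fd_sets; a minimal sketch (the helper name is made up):

Code:

    #include <sys/select.h>

    static void delay_ms(unsigned int ms)
    {
        struct timeval tv;
        tv.tv_sec  = ms / 1000;
        tv.tv_usec = (ms % 1000) * 1000;
        select(0, NULL, NULL, NULL, &tv);   /* sleeps ~ms, or returns early on a signal */
    }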
Nate Fries

Touché. Forgot about that trick.

And technically, that point about nothing explicitly guaranteeing a
yield is true even of the delay functions of operating systems;
yielding just happens to be the most common behavior (originally
because if the process sleeps it truly has nothing else to do, and
later because sleeping 0ms became an idiom for instant yielding in
programs, so operating systems behave accordingly).

I don't think it's possible to guarantee a yield in a modern system no
matter what - and it probably isn't a good idea to force yields, since
that can mess with the scheduler's plans. We can behave in specific
ways to provide hints to the scheduler, and that's it.


alright, here's a wrap-up of my findings so far:

  • SDL does not have a way to detect vertical blank, so we can not use that to coordinate things
  • multithreading is a solution, but it is impossible to accurately coordinate everything inside my own framework in such a way that it always works

I will resort to the following solution:

    * two threads: main thread and event thread
    * main thread does only one thing: a sync'ed SWAP() call which waits for the beam sync, then swaps. After the swap, this thread will increment a VRScounter and verify that we did not lose too much time (time since last swap should be <= the refresh period)
    * event thread will inspect all external devices and fixed-time-events
    * every time the main thread updates the screen, a flag will be raised, e.g. a flag named "screen_was_updated"
    * a researcher can lower this flag in his experiment when he displays critical visual stimuli
    * a researcher can make an event "when flag is raised". He can attach affect5 commands (my framework code) to start timers
    * this timer can be used to calculate the response time of the subject in the experiment
    * needed commands in my framework are:
        + turn on/off VRS sync  --> simply calls SDL_GL_SetSwapInterval(1) or SDL_GL_SetSwapInterval(0) respectively
        + set VRS flag          --> tells the framework which flag to raise at each screen update
        + set VRS counter       --> tells the framework which counter to increment at each screen update

that should do the trick!!
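In outline, the main thread's loop would then be something like this (a sketch; lastSwap, refreshPeriod, window and the flag/counter are assumed to be set up by the framework, with atomics for cross-thread access):

Code:

    while (running)
    {
        SDL_GL_SwapWindow(window);            /* vsynced: blocks until the retrace */
        Uint64 now = SDL_GetPerformanceCounter();

        VRScounter++;                         /* one tick per screen update */
        screen_was_updated = SDL_TRUE;        /* event thread starts timers off this */

        /* verify we did not lose a frame: time since last swap <= refresh period */
        double dt = (double)(now - lastSwap) / (double)SDL_GetPerformanceFrequency();
        if (dt > refreshPeriod * 1.5)
            missedFrames++;
        lastSwap = now;
    }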

fingers crossed that I did not forget any crucial aspects :)

thanks for everyone who helped me brainstorm on this

I have one last question, perhaps for the experts here: the wiki states:

Code:
NOTE: You should not expect to be able to create a window, render, or receive events on any thread other than the main one.

this would mean that I can only inspect mouse and keyboard inside the MAIN THREAD, not the event thread. Is that correct? Because it would mean that mouse & keyboard can only be checked once per refresh. For the keyboard that would not be much of an issue, since it is a slow device anyway. But a mouse button can be polled faster than that on most machines… It would be a bit of a shame to lose that!

any ideas/suggestions ?

jeroen clarysse wrote:

I have one last question, perhaps for the experts here: the wiki states:

Code:
NOTE: You should not expect to be able to create a window, render, or receive events on any thread other than the main one.

this would mean that I can only inspect mouse and keyboard inside the MAIN THREAD, not the event thread. Is that correct? Because it would mean that mouse & keyboard can only be checked once per refresh. For the keyboard that would not be much of an issue, since it is a slow device anyway. But a mouse button can be polled faster than that on most machines… It would be a bit of a shame to lose that!

any ideas/suggestions ?

You need to call SDL_PumpEvents from the main thread. You can use SDL_PeepEvents from another thread, as long as the main thread is filling SDL's thread-safe event queue via SDL_PumpEvents.
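A minimal sketch of that split, assuming SDL2's SDL_PeepEvents signature:

Code:

    /* main thread, once per frame: */
    SDL_PumpEvents();

    /* event thread, as often as it likes (events are only as fresh
       as the last SDL_PumpEvents call, though): */
    SDL_Event ev;
    while (SDL_PeepEvents(&ev, 1, SDL_GETEVENT, SDL_FIRSTEVENT, SDL_LASTEVENT) > 0)
    {
        /* timestamp and handle ev */
    }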

But there’s a flaw with this plan, too. First, you’d have to wait for a vsync to fill the event queue. Second, if you miss a vsync during rendering, you’d have to wait for the NEXT vsync; meaning that your app will be less responsive. But there’s more: unless this has been fixed in SDL2, SDL’s thread-safe event queue only holds 256 events. It’s unlikely, but possible, to have more than 256 events per vsync. What happens to these events? Are they lost forever, or are they just held over? At least in the case of some older versions of SDL, they are lost forever. I’m not sure that this was ever fixed. And if they weren’t lost forever, they’d still be processed very late.

My recommendation? Don't use vsync. If you only want to draw at the same rate as the refresh, figure out the refresh rate of the display and use SDL_GetTicks to determine when next to draw. And if you're using a backbuffer, I don't see why you'd use vsync at all.
Nate Fries

@Nathaniel: thanks for the feedback!

I need to use the VSYNC unfortunately : the whole application is centered around accurate displaying of images to subjects in psychology experiments (we use 100Hz or 200Hz CRT monitors for optimal results). It is rather important for us to ensure that images are drawn in “one sweep”, so syncing is mandatory. I can however live with the disadvantage of missing one frame occasionally. First of all, we only do a few draws per “trial” (a trial is the smallest instance of execution that is presented to the subject of the experiment. Typically, a trial is one image display, plus measurement of a response. A typical experiment is 50-100 trials per subject)

your solution of measuring the refresh rate once and then using that time is something I'm not really feeling confident about: I just wrote a very small piece of code:

Code:

    #define NUM 100  /* number of swaps to measure; value assumed, it was not shown in the post */

    Uint64 ticks_before[NUM];
    Uint64 ticks_after[NUM];
    SDL_GL_SetSwapInterval(1);
    printf("\n\nswap interval is %d \n\n", SDL_GL_GetSwapInterval());

    int swapcounter = 0;
    while (swapcounter < NUM)
    {
        ticks_before[swapcounter] = SDL_GetPerformanceCounter();
        SDL_RenderPresent(m_main_renderer);
        ticks_after[swapcounter] = SDL_GetPerformanceCounter();

        swapcounter++;
    }

    int i;
    for (i = 0; i < NUM; i++)
        printf("\nswap %3d at %llu, duration %llu\n", i,
               (unsigned long long)ticks_before[i],
               (unsigned long long)(ticks_after[i] - ticks_before[i]));

and it turns out there is a bit of variation between each refresh. This would make it very difficult to ensure proper syncing.

I think that using SDL_GL_SetSwapInterval(1) is my only way to ensure TRUE syncing