Minimum timeslice fun

Message: 13
Date: Fri, 31 May 2002 16:44:43 +0200
From: David Olofson <david.olofson at reologica.se>
To: sdl at libsdl.org
Subject: Re: Re: minimum timeslice fun (was Re: [SDL] semaphores and mutexes)
Reply-To: sdl at libsdl.org

On Fri, 31 May 2002 16:44:43, David Olofson <david.olofson at reologica.se> wrote:

On Fri, 31/05/2002 02:51:10, Loren Osborn <linux_dr at yahoo.com> wrote:

I think the main concern here seems to be not burning cycles rendering extra unused (or unusable) frames… For software surfaces I think the answer seems to be obvious: delay by the remainder of the inverse of the frame rate each time SDL_Flip() is called…

Well, that’s exactly what a proper retrace sync’ed
implementation of the driver’s “flip” operation
does.

With the exception of the actual synchronization that you mentioned below:

i.e. query the video system to find that the surface is displayed on a screen running at (let's say) 80 Hz… 80 Hz == 12.5 ms per frame. So every time SDL_Flip() is called on a software surface, SDL could save the tick count. If 12 ticks haven't elapsed since the previous SDL_Flip(), then sleep until they have…

Comments?
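In SDL 1.2 terms, the quoted proposal might look roughly like the sketch below; limited_flip() is a made-up name, and frame_ms would come from whatever the video system reports (e.g. 1000/80 ≈ 12 ms for an 80 Hz mode). SDL_Delay() is only as accurate as the OS scheduler, which is exactly the problem raised in the reply:

```c
#include "SDL.h"

/* Hypothetical wrapper: cap SDL_Flip() to the queried refresh rate.
 * frame_ms would come from the video system, e.g. ~12 ms for 80 Hz. */
static Uint32 last_flip = 0;

static void limited_flip(SDL_Surface *screen, Uint32 frame_ms)
{
    Uint32 elapsed = SDL_GetTicks() - last_flip;

    if (elapsed < frame_ms)
        SDL_Delay(frame_ms - elapsed);   /* sleep off the remainder */

    SDL_Flip(screen);
    last_flip = SDL_GetTicks();
}
```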

The first problem with that is that you can't do it without proper real-time scheduling and high resolution timers. It could be done on most platforms, though. (Multimedia timers on Win32, the RTC driver on Linux, etc.)

This is indeed true, but there are several clever workarounds to this proposed in the last week on this list…

First, there is the possibility that (at a fairly slow scan rate) we have a significant time to wait, and we can request a sleep for a small amount less than that, and busy-wait for the rest (see the sketch below)… Not ideal or guaranteed, but a reasonably portable solution.

Also, someone mentioned tracking how long sleep(0) takes, and using that to determine whether we should busy-wait or not…

Another possibility is to query the OS for the amount of time remaining in this timeslice… The biggest problem here is also portability, but it might allow us to better guess whether a sleep(0) is likely to return within a particular length of time.
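A rough sketch of the first workaround, again in SDL terms (the helper name and the 10 ms margin are guesses, not anything SDL provides): sleep for most of the wait, then busy-wait the last stretch.

```c
#include "SDL.h"

/* Hypothetical helper: wait until 'target' (in SDL ticks, i.e. ms),
 * sleeping for most of the interval and busy-waiting the last part.
 * The 10 ms margin is a guess at worst-case scheduler granularity. */
static void wait_until(Uint32 target)
{
    const Uint32 margin = 10;
    Uint32 now = SDL_GetTicks();

    if (target > now + margin)
        SDL_Delay(target - now - margin);   /* coarse sleep */

    while (SDL_GetTicks() < target)
        ;                                   /* burn the rest */
}
```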

The second problem is the big one: since there's no synchronization between the actual refresh rate and your loop, you'll have terrible tearing that slowly drifts over the screen. Seen that during some experiments, and I'd say it looks a lot worse than normal, "random" tearing.

I am aware of this issue, but for SOFTWARE surfaces (which I did specify) the ACTUAL synchronization SHOULD be handled by the thing that is doing the actual HARDWARE BLITS (X server, SDL library, whatever)… Maybe this is just my opinion, but when you tell a software surface to update a rect, it should be obvious that you don't have access to the V-retrace, and it should wait an appropriate time before doing the update… Either that, or GIVE you access to the V-retrace position so you can do the checking… (A big problem with this would be the X server message round-trip time…)

Turn it into a PLL that locks on the refresh rate by occasionally looking for the retrace, and we're in business. This is what I'm trying to do on Linux, for use with various drivers. (Drivers will need to occasionally timestamp retraces and pass the data to a daemon, which will then keep track of video timing using a real time thread driven by the RTC, or other suitable IRQ source.)

//David



[…]

Well, that’s exactly what a proper retrace sync’ed
implementation of the driver’s “flip” operation
does.

With the exception of the actual synchronization that you mentioned below:

If there’s no synchronization, the implementation is just about as broken as it gets, so I don’t really see what you mean here. :-)

i.e. query the video system to find that the surface is displayed on a screen running at (let's say) 80 Hz… 80 Hz == 12.5 ms per frame. So every time SDL_Flip() is called on a software surface, SDL could save the tick count. If 12 ticks haven't elapsed since the previous SDL_Flip(), then sleep until they have…

Comments?

The first problem with that is that you can't do it without proper real-time scheduling and high resolution timers. It could be done on most platforms, though. (Multimedia timers on Win32, the RTC driver on Linux, etc.)

This is indeed true, but there are several clever workarounds to this proposed in the last week on this list…
[…]

Yes, and what I mean is actually that this is not the issue here. At least Win32 and Linux have the required functionality to do it the proper way, and it might even be possible on some systems that don’t provide similar functionality.

The real problem is the actual retrace sync, and that’s mostly a hardware and driver issue.

The really bad part is that most hardware sucks more than you could possibly imagine in this regard. :-(

The second problem is the big one: since there's no synchronization between the actual refresh rate and your loop, you'll have terrible tearing that slowly drifts over the screen. Seen that during some experiments, and I'd say it looks a lot worse than normal, "random" tearing.

I am aware of this issue, but for SOFTWARE surfaces (which I did specify) the ACTUAL synchronization SHOULD be handled by the thing that is doing the actual HARDWARE BLITS (X server, SDL library, whatever)… Maybe this is just my opinion, but when you tell a software surface to update a rect, it should be obvious that you don't have access to the V-retrace, and it should wait an appropriate time before doing the update…

Dream on. It was a loooong time ago I saw any consumer system that even had the hardware features required for raster synchronized blits, let alone support for it in drivers and API… :-/

These days, if hardware retrace sync is supported at all, it’s only for the flip operation of double buffered displays; not for blits.

either that, or GIVE you access to the
V-retrace position so you can do the checking…

Last machine I saw that feature on was the Amiga, which was basically a game console with an OS. The whole system (DMA for audio, video, floppy etc) was designed around the PAL/NTSC video timing, so everything kind of ran in hard sync with the video by definition.

These days, you should be happy if there’s a single “retrace in progress” bit to poll.

(A big problem with this would be the X server message round-trip time…)

Yeah - but IMHO, synchronization should be kept as far away from applications as possible anyway.

Running an application in hard sync with the video is inefficient and unreliable on normal operating systems (it’s a real time job…), and it doesn’t help much anyway if you want triple buffering or more.

Buffering two or more “frames” and having the driver or hardware manage synchronization is a fundamental design concept in audio processing, data acquisition, professional video processing and lots of other areas.

I find it very strange - to say the least - that this idea still seems so utterly foreign to most multimedia hardware and software developers.

//David

.---------------------------------------
| David Olofson
| Programmer

david.olofson at reologica.se
Address:
REOLOGICA Instruments AB
Scheelevägen 30
223 63 LUND
Sweden
---------------------------------------
Phone: 046-12 77 60
Fax: 046-12 50 57
Mobil:
E-mail: david.olofson at reologica.se
WWW: http://www.reologica.se

`-----> We Make Rheology Real

On Fri, 31/05/2002 13:59:17, Loren Osborn <linux_dr at yahoo.com> wrote:

OK… So it sounds like I’m living in a dreamland. Is it really that the hardware doesn’t expose the v-retrace? Or that the drivers don’t expose it… As far as I’m concerned, the windowing system (be it the Windows Desktop, the X Window System, or whatever) is a full-screen application and, as such, needs the v-retrace to draw correctly. Is it just that most windowing systems update so little of the screen per frame that you just never see any visible tearing? How do windowing systems traditionally handle this?

-Loren

On Fri May 31 15:40:01 2002, David Olofson <david.olofson at reologica.se> wrote:

whatever)… Maybe this is just my opinion, but when you tell a software surface to update a rect, it should be obvious that you don’t have access to the V-retrace, and it should wait an appropriate time before doing the update…

Dream on. It was a loooong time ago I saw any
consumer system that even had the hardware features
required for raster synchronized blits, let alone
support for it in drivers and API… :-/

These days, if hardware retrace sync is supported at all, it’s only for the flip operation of double buffered displays; not for blits.

either that, or GIVE you access to the
V-retrace position so you can do the checking…

Last machine I saw that feature on was the Amiga,
which was basically a game console with an OS. The
whole system (DMA for audio, video, floppy etc) was
designed around the PAL/NTSC video timing, so
everything kind of ran in hard sync with the video
by definition.

These days, you should be happy if there’s a single
“retrace in progress” bit to poll.



[…Death Of The Retrace…]

OK… So it sounds like I’m living in a dreamland. Is it really that the hardware doesn’t expose the v-retrace?

AFAIK, practically all cards expose the actual retrace (i.e. the event where a CRT moves the beam back to the top-left corner) one way or another. The problem is that, often, the only way to have that event affect the CPU at all is to poll a port on the video card, which contains a “1” bit only while the retrace is actually in progress.

That is, if you miss that window (in the range of a ms at most), you can’t tell that there was a retrace at all. “The event is lost.”

Any sane hardware design would use an interrupt for this event, but video cards apparently come from a parallel universe, where computers work in a totally different way… ;-)
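For reference, on VGA-compatible cards the port in question is Input Status Register #1 at 0x3DA, where bit 3 is set while vertical retrace is in progress. A minimal Linux sketch of the poll (needs root for ioperm(), and assumes the card still honours the legacy VGA register):

```c
#include <stdio.h>
#include <sys/io.h>       /* ioperm(), inb() - x86 Linux, needs root */

#define VGA_STATUS 0x3DA
#define VRETRACE   0x08   /* bit 3: vertical retrace in progress */

int main(void)
{
    if (ioperm(VGA_STATUS, 1, 1) < 0) {
        perror("ioperm");
        return 1;
    }

    /* If we happen to start inside a retrace, wait for it to end,
     * then busy-wait for the next one to begin. */
    while (inb(VGA_STATUS) & VRETRACE)
        ;
    while (!(inb(VGA_STATUS) & VRETRACE))
        ;

    printf("retrace detected\n");
    return 0;
}
```

As noted later in the thread, on some cards this register only responds while VGA emulation is actually in use, so this is a best-effort hack rather than a general solution.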

or that the drivers don’t expose it…

Practically all Windows drivers implement retrace sync one way or another, although it has become popular to install 3D drivers with the feature disabled by default. One would assume that 3D card testers would know how to benchmark frame rates, but obviously the marketing departments don’t trust them, and thus choose to punish all users (who aren’t into video stuff) with tearing.

As far as I’m concerned, the windowing system (be it the Windows Desktop, the X Window System, or whatever) is a full-screen application and, as such, needs the v-retrace to draw correctly.

Right, but first of all, you have more than one application fighting for the screen, which makes it very hard to make use of double (or better) buffering - and without double+ buffering, retrace sync is hard to implement and requires that all updates take significantly less than a video frame to perform. (No point in syncing the start of an operation when the raster will scan over the drawing area several times before the operation is finished…)

The second problem is that normal applications aren’t “video oriented”. Their main loops sync with input events, file I/O and other stuff that has very little to do with the video subsystem. Since few applications are designed to deal with this, it’s pretty hard to force them to cooperate without serious performance issues.

Third, most normal applications only update parts of their windows every now and then, as opposed to pushing a full image every frame, as most games do. “N buffering” in a windowed environment requires that all changes are propagated to the next buffer before flipping. This means that applications with heavy but non-critical rendering would screw up the smooth animation of real-time graphics applications. It also means that applications that actually do render a full window every frame will suffer a severe performance penalty, unless they tell the OS that propagation of changes shouldn’t be automatic, as it has to be for normal applications.

Is it just that most windowing systems update so little of the screen per frame that you just never see any visible tearing?

Well, in my experience, most normal X and Win32 applications cause very visible tearing, if they get the chance… Only DirectX applications can avoid it, and only as long as rendering one frame takes less than one video frame. (No pageflipping.)

Either way, most windowed applications simply avoid doing anything that causes too much tearing.

How do windowing systems traditionally handle this?

They don’t. I don’t know of any “traditional” or current mainstream windowing systems that use a double buffered desktop, and IIRC, the last windowing system I saw that could sync normal rendering with the raster position was that of “classic” Amiga(D)OS.

//David


On Fri, 31/05/2002 18:01:46, Loren Osborn <linux_dr at yahoo.com> wrote:

As much as I agree that an interrupt driven design
would be a much saner approach, the fact of the matter
is no such interrupt exists on current PC hardware
(which reduces that line of thinking to a “spilled
milk” argument). Since v-retrace is such a
consistently timed event (especially given that we can
query the frame rate), I propose that we can find the
v-retrace by polling (for no more than a single
frame), but once we know when it is, we can predict
its next occurrences, and send a
SDL_VIDEO_VRETRACE_START and a SDL_VIDEO_VRETRACE_END
event at the appropriate times while continually
calibrating our predictions.

Comments?
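For illustration: the proposed SDL_VIDEO_VRETRACE_START and SDL_VIDEO_VRETRACE_END events don’t exist in SDL, but the prediction part could be prototyped with SDL_USEREVENT as a stand-in. The helper below is entirely hypothetical; next and period would come from an initial polling pass that is repeated occasionally to recalibrate:

```c
#include "SDL.h"

enum { VRETRACE_START = 1 };       /* stand-in for the proposed event */

/* 'next' and 'period' are in ms; both come from an initial polling
 * pass and should be re-measured now and then so the prediction
 * doesn't drift. Call this from the main loop: it pushes one
 * SDL_USEREVENT per predicted retrace that has passed. */
static void push_vretrace_events(double *next, double period)
{
    double now = (double)SDL_GetTicks();

    while (now >= *next) {
        SDL_Event ev;
        ev.type = SDL_USEREVENT;
        ev.user.code = VRETRACE_START;
        ev.user.data1 = NULL;
        ev.user.data2 = NULL;
        SDL_PushEvent(&ev);
        *next += period;           /* predict the next occurrence */
    }
}
```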

-Loren

On Mon, 3 Jun 2002 17:49:49, David Olofson <david.olofson at reologica.se> wrote:

On Fri, 31/05/2002 18:01:46, Loren Osborn <linux_dr at yahoo.com> wrote:
[…Death Of The Retrace…]

OK… So it sounds like I’m living in a dreamland. Is it really that the hardware doesn’t expose the v-retrace?

AFAIK, practically all cards expose the actual retrace (i.e. the event where a CRT moves the beam back to the top-left corner) one way or another. The problem is that, often, the only way to have that event affect the CPU at all is to poll a port on the video card, which contains a “1” bit only while the retrace is actually in progress.

That is, if you miss that window (in the range
of a ms at most), you can’t tell that there was
a retrace at all. “The event is lost.”

Any sane hardware design would use an interrupt for this event, but video cards apparently come from a parallel universe, where computers work in a totally different way… ;-)



[…]

As much as I agree that an interrupt driven design
would be a much saner approach, the fact of the matter
is no such interrupt exists on current PC hardware
(which reduces that line of thinking to a “spilled
milk” argument).

Exactly - I’m only trying to explain why I’m going for such a seemingly insane approach as a PLL and a 100-1000 Hz RTC timer driven thread. That’s the only way that can work on almost any card, without busywaiting and without missing retraces.
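The PLL part, stripped of all the driver and RTC plumbing, is basically a phase/period estimator that gets nudged whenever a real retrace is caught. A toy sketch follows; the struct, the 0.3/0.05 gains and the function names are invented, and the real daemon would run something like this from its timer-driven thread:

```c
/* Toy software PLL for retrace tracking; all times in microseconds.
 * 'next' is the predicted time of the next retrace, 'period' the
 * estimated frame time. */
typedef struct {
    double next;
    double period;
} retrace_pll;

/* Feed in the timestamp of a retrace that was actually caught. */
static void pll_lock(retrace_pll *p, double observed)
{
    double predicted = p->next;
    double err;

    /* Find the predicted retrace closest to the observation. */
    while (predicted - observed > p->period / 2)
        predicted -= p->period;
    while (observed - predicted > p->period / 2)
        predicted += p->period;

    err = observed - predicted;    /* phase error */
    p->next   += 0.3 * err;        /* pull the phase toward the observation */
    p->period += 0.05 * err;       /* slowly trim the frequency estimate */
}

/* Predicted time of the first retrace after 'now'. */
static double pll_next_after(retrace_pll *p, double now)
{
    while (p->next <= now)
        p->next += p->period;
    return p->next;
}
```

The low gains are the point of using a PLL at all: jitter in any single caught retrace barely moves the lock, so the prediction stays usable even when most retraces are never observed.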

Since v-retrace is such a
consistently timed event (especially given that we can
query the frame rate), I propose that we can find the
v-retrace by polling (for no more than a single
frame), but once we know when it is, we can predict
its next occurrences, and send a
SDL_VIDEO_VRETRACE_START and a SDL_VIDEO_VRETRACE_END
event at the appropriate times while continually
calibrating our predictions.

That’s what I’m about to do, basically. :-)

I’ve had some success hacking some old Utah-GLX drivers, but with busywaiting on the TSC instead of the timer driven thread. I used the TSC to keep track of when the next retrace was about to occur, and polled for retrace until it was detected, or there was a timeout. Worked pretty well with Q3 and my test apps, but the busywaiting upset the scheduler.

//David


On Mon, 3/06/2002 15:54:23, Loren Osborn <linux_dr at yahoo.com> wrote:

[…]

That’s what I’m about to do, basically. :-)

I’ve had some success hacking some old Utah-GLX
drivers, but with busywaiting on the TSC instead
of the timer driven thread. I used the TSC to
keep track of when the next retrace was about to
occur, and polled for retrace until it was
detected, or there was a timeout. Worked pretty
well with Q3 and my test apps, but the busywaiting
upset the scheduler.

Sounds good, but will it work with the SDL 2D stuff
also, or just OpenGL under SDL?

Thanks,

-Loren

On Tue, 4 Jun 2002 17:15:39, David Olofson <david.olofson at reologica.se> wrote:



Retrace sync isn’t (or rather, shouldn’t be) implemented in SDL, so this project has more to do with Linux than with SDL.

Anyway, the Retrace Sync Daemon (RSD) will need some indication of when retraces occur, and preferably also reliable and accurate frame rate data. (Modelines should provide this information, provided the drivers can actually set up the exact requested frequencies. Other than that, one could measure the actual refresh rates, either during the installation, or when changing video modes. In my initial test, I used the latter method with very good results, so this shouldn’t be much of an issue, I think.)
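Measuring the actual rate at mode-change time can be as simple as timing a handful of retraces. A sketch assuming a wait_for_retrace() helper (for instance the VGA port poll shown earlier) and POSIX clock_gettime():

```c
#include <time.h>

/* Assumed helper: busy-waits until a vertical retrace is seen,
 * e.g. the VGA 0x3DA polling loop from the earlier sketch. */
extern void wait_for_retrace(void);

/* Estimate the refresh rate in Hz by timing 'frames' retraces. */
static double measure_refresh(int frames)
{
    struct timespec t0, t1;
    double secs;
    int i;

    wait_for_retrace();                  /* align to a retrace edge */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < frames; i++)
        wait_for_retrace();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return frames / secs;
}
```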

In most cases, it seems like one may just have the RSD test the VGA retrace port directly, but on some cards, that doesn’t work unless VGA emulation is actually in use. On those, and possibly also on SMP systems, one could use a small “library” that allows a video driver to do the testing and timestamping in its own context (since it’s the “owner” of the video card), and then pass the data on to the RSD when desired. (Wall clock timing is not critical, as long as the timestamp for the occasional detected retrace is sufficiently accurate. Grabbing one timestamp before and one after each test gives the RSD a good idea about the quality of each “sample”.)

The RSD will be able to provide - depending on what drivers need - blocking, non-busy-waiting “wait_for_video_frame(N)” and “wait_for_next_retrace()” calls (implemented using the RTC on kernels with HZ < 1000), as well as various forms of “get_current_fractional_frame_count()” (calculated from the PLL - quite sufficient for “half buffering” in my experience) and the like. As a result of the true blocking sync calls, drivers will be able to implement blocking flip() methods.
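A sketch of what the client side of that interface might look like; only the three call names are taken from the description above, while the types and return conventions are guesses:

```c
/* Hypothetical client-side interface to the proposed Retrace Sync
 * Daemon (RSD). Only the three call names come from the description
 * above; types and return conventions are guesses. */

#ifndef RSD_CLIENT_H
#define RSD_CLIENT_H

/* Block (without busy-waiting) until video frame 'n' has started.
 * Returns 0 on success or a negative error code. */
int wait_for_video_frame(unsigned long n);

/* Block until the next vertical retrace begins. */
int wait_for_next_retrace(void);

/* Frames elapsed since some epoch, with a fractional part derived
 * from the PLL - enough for "half buffering" style scheduling. */
double get_current_fractional_frame_count(void);

#endif /* RSD_CLIENT_H */
```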

Obviously, it should be pretty easy for most drivers to make use of the RSD, regardless of where they reside. fbdev, svgalib, DGA and some OpenGL drivers should be quite easy to adapt, while any targets that do back->front blits for flipping (like most Utah-GLX drivers, AFAIK) will have some trouble getting optimal performance. Hardware pageflipping is definitely recommended, as it loosens up the scheduling accuracy requirements considerably. (Hitting the right frame is good enough, although better accuracy allows the application to use more CPU time per frame, especially with less than three buffers.)

In fact, applications could use the RSD directly, although I strongly recommend against doing that for anything but the “get_current_fractional_frame_count()” feature. (That would be very useful, though.)

Retrace sync on the application side simply won’t work with a normal OpenGL driver (actually tried that…), and it imposes strict real time requirements upon any application that uses it. Retrace sync should definitely stay inside the driver, and applications should block only when the driver cannot do anything sensible about further requests.

//David


On Tue, 4/06/2002 14:52:42, Loren Osborn <linux_dr at yahoo.com> wrote:

On Tue, 4 Jun 2002 17:15:39, David Olofson <david.olofson at reologica.se> wrote:
[…]

That’s what I’m about to do, basically. :-)

I’ve had some success hacking some old Utah-GLX
drivers, but with busywaiting on the TSC instead
of the timer driven thread. I used the TSC to
keep track of when the next retrace was about to
occur, and polled for retrace until it was
detected, or there was a timeout. Worked pretty
well with Q3 and my test apps, but the busywaiting
upset the scheduler.

Sounds good, but will it work with the SDL 2D stuff
also, or just OpenGL under SDL?