Extension for 2D with DRI

I talked with Dirk Hohndel (sp?), the CTO of SuSE, who has worked on XFree86
since 1992. We talked about a few things, and when I asked whether DRI
could be adapted to allow access to the 2D acceleration primitives of
video boards, his answer was something along the lines of
"Definitely!". Sounds good. :)

We also talked about alpha blending and anti-aliasing in X
(server-side). There seem to be two efforts ongoing: one to get this
only for fonts (the cheap fixup approach), and the other a generic
extension to enable anti-aliasing of all operations (the real thing).
The problem has to do with the commutativity of operations: the X
protocol specifies that some operations should give the exact same result
whatever the order of execution. But, for example, two crossing lines
that are anti-aliased will give slightly different results depending on
the drawing order, so it isn’t doable right now.

There is some research ongoing in anti-aliasing algorithms that would
give the required properties.

--
Pierre Phaneuf
http://ludusdesign.com/

Pierre Phaneuf wrote:

I talked with Dirk Hohndel (sp?), the CTO of SuSE, who has worked on XFree86
since 1992. We talked about a few things, and when I asked whether DRI
could be adapted to allow access to the 2D acceleration primitives of
video boards, his answer was something along the lines of
"Definitely!". Sounds good. :)

Oh yes, I also remember some people that wanted to be able to wait for
the vertical retrace. We also talked about this.

It seems that some new, high-end video boards don’t even support
reporting the vertical retrace as an interrupt; if you want to
know, you have to poll (which is awful and is really not the thing to
do). At first, I was really puzzled and wondered how we could ever do it
correctly, but he explained further.

Polling for the retrace is really out of the question on a multitasking
system.

Waiting for an interrupt to be delivered is okay if you can hook the
interrupt (through a special kernel module, I think it is doable) and is
what is usually popular in games (and what people here were asking for).
It turns out that in reality, it is quite inefficient on recent boards.
First of all, in a non-realtime multitasking OS, it could take a while
between the actual interrupt happening and the X server process waking
up and acting on it, which is unacceptable since retraces happen so close to
each other. Second, modern boards have an accelerator pipeline that can
buffer multiple commands, so waiting for the vertical retrace
before sending a command is only effective if the accelerator
pipeline is empty; otherwise the command waits even longer and could
miss the deadline. Even if the pipeline is empty, this is missing out on
parallelism, as the CPU waits idly for the vertical retrace to happen.

The right thing is this: those modern boards usually support a “wait for
vertical retrace” flag on their commands, which will let the board do
all the waiting. If you do a big blit that you think could shear (small
blits usually are fast enough not to produce any artifacts), you set
that flag and when the accelerator is about to execute that command, it
will execute it only right as the vertical retrace happens, exactly what
we want. And the CPU is free during all this time to just spit commands
to the accelerator pipeline as fast as it can, exploiting the parallel
nature of the CPU/video board setup.

Any questions?

--
Pierre Phaneuf
Systems Exorcist

Polling for the retrace is really out of the question on a multitasking
system.

Not if the alternative is blasting frames to the frame buffer as fast as you
can. If you are spending 100% CPU already, polling won’t make matters worse.

The right thing is this: those modern boards usually support a “wait for
vertical retrace” flag on their commands, which will let the board do
all the waiting.

Unless your game is of the type “blit an entire screenful each frame”,
perhaps playing a movie. If you don’t want to grab 100% CPU, you have to
wait for something. In that case a device that blocks on a read() or ioctl()
until the next refresh would be handy, even if imprecise.

Maybe it could be simulated by short sleeps (using RTC) and polls,
I haven’t looked into this. But you would still need a way to find out whether
a vertical retrace has taken place.

Not if the alternative is blasting frames to the frame buffer as fast as you
can. If you are spending 100% CPU already, polling won’t make matters worse.

LOL… Actually, this brings up another point- how would one synchronize
video playback to run precisely at say, 30fps? I presume it’s impossible to
guarantee a locked framerate on a non-realtime OS, but many Windows apps do
quite a good job of it…

Dan

LOL… Actually, this brings up another point- how would one synchronize
video playback to run precisely at say, 30fps? I presume it’s impossible to
guarantee a locked framerate on a non-realtime OS, but many Windows apps do
quite a good job of it…

Do you want exactly 30fps, or a multiple (or short fraction) of the frame
rate of your monitor? In the first case you have to resample your movie,
either by doing very expensive interpolation, or be cheap and just
duplicate a frame here and there. For the low quality of video playback on
a computer, I guess the latter is all right.

I would sleep for most part of the inter-frame delay, and then spin in a
loop, polling for vertical retrace. Either sleep on a timer (if your
tick granularity is good enough), or (on Linux) use the rtc device.
The reward of not hogging 100% CPU is that you aren’t penalized for
gobbling up your entire timeslice all the time…

Do you want exactly 30fps, or a multiple (or short fraction) of the frame
rate of your monitor? In the first case you have to resample your movie,
either by doing very expensive interpolation, or be cheap and just
duplicate a frame here and there.

Ah, I was ignoring the refresh granularity problem. To get (approx) 30fps
playback on a 100Hz monitor, I guess I’d need to alternate between holding
each video frame for 3 and 4 retraces.
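
A minimal sketch of that alternation, assuming the refresh rate is at least the video frame rate. The helper name hold_retraces is hypothetical (it is not part of SDL or X); it uses an integer accumulator to decide how many retraces each frame is held. For 30 fps on a 100 Hz monitor it gives the repeating pattern 3, 3, 4 rather than a strict 3/4 alternation:

```c
#include <assert.h>

/*
 * Hypothetical helper: number of monitor retraces during which video
 * frame number `frame` (0-based) stays on screen, so that on average
 * video_fps frames are shown per second on a refresh_hz monitor.
 */
unsigned hold_retraces(unsigned frame, unsigned refresh_hz, unsigned video_fps)
{
    assert(video_fps > 0 && refresh_hz >= video_fps);

    /* Index of the retrace at which this frame first appears. */
    unsigned long long start = (unsigned long long)frame * refresh_hz / video_fps;
    /* Index of the retrace at which the following frame replaces it. */
    unsigned long long next = ((unsigned long long)frame + 1) * refresh_hz / video_fps;

    return (unsigned)(next - start);
}
```

Over 30 consecutive frames the hold counts add up to exactly 100 retraces, so the average rate is 30 fps even though individual frames are held for either 30 or 40 ms.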

Which makes me even more curious how some Win32 apps seem to be able to do a
good job without any direct control over the hardware. I suppose if the blit
happens fast enough, and the OS has a low latency timer, you could "average"
30fps by sleeping for 33 msec; then the new frame would go out on the next
retrace, whenever that happens to be.

Heh, wish I had a super-high-speed camera to see what really happens =)

Thanks for the insight,
Dan

Which makes me even more curious how some Win32 apps seem to be able to do a
good job without any direct control over the hardware. I suppose if the blit
happens fast enough, and the OS has a low latency timer, you could "average"
30fps by sleeping for 33 msec; then the new frame would go out on the next
retrace, whenever that happens to be.

As I understand it, the traditional way of locking playback to 30 fps is to
time how long it takes to draw the frame, then sleep for 33 ms less the time
it took to draw.
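
A minimal sketch of that arithmetic, assuming times are measured in whole milliseconds. The helper name remaining_ms is hypothetical; its result would be handed to whatever sleep call is available (SDL_Delay, for instance), and the real sleep is only as precise as the OS timer allows:

```c
#include <assert.h>

/*
 * Hypothetical helper: milliseconds left in the per-frame budget after
 * drawing took draw_ms. Returns 0 when drawing used up (or overran)
 * the budget, in which case the caller should not sleep at all.
 */
long remaining_ms(long frame_ms, long draw_ms)
{
    assert(frame_ms > 0 && draw_ms >= 0);
    return draw_ms < frame_ms ? frame_ms - draw_ms : 0;
}
```

With a 33 ms budget the loop averages about 30.3 fps rather than exactly 30, and a frame that takes longer than the budget simply gets no sleep.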

Jon.

Mattias Engdegård wrote:

Polling for the retrace is really out of the question on a multitasking
system.

Not if the alternative is blasting frames to the frame buffer as fast as you
can. If you are spending 100% CPU already, polling won’t make matters worse.

Except that blasting the frames in question usually means some
computation, whether it be MPEG decompression or 3D rasterizing. Now
you can start doing these computations while the video board is waiting
(with radically better precision, I might add) for the vertical
retrace.

The net result is higher framerates: who doesn’t want that? :)

The right thing is this: those modern boards usually support a “wait for
vertical retrace” flag on their commands, which will let the board do
all the waiting.

Unless your game is of the type “blit an entire screenful each frame”,
perhaps playing a movie.

No problem with that. Just tag the blit command with the “wait for
retrace” flag, issue it and start uncompressing the next frame.

If you don’t want to grab 100% CPU, you have to wait for something.
In that case a device that blocks on a read() or ioctl() until the
next refresh would be handy, even if imprecise.

There is a big problem with any kind of userland "wait for retrace"
scheme. If it is polling, the app might get preempted and miss the exact
moment of the retrace, at best getting it late (perhaps too late) and
at worst losing it completely. If it is a blocking system call, the
unblocking is not immediate, it only means that the kernel scheduler
puts back the process in the run queue, to be scheduled “when
appropriate”. Note that my system has a vertical refresh of over 100 Hz,
which means there is less than 10 milliseconds between each retrace. The
Linux kernel preempts processes at a 100 Hz rate (except on Alpha, where
it is 1024 Hz I think), so not to miss the vertical retrace completely,
there would need to be NO other process in the run queue when the
blocked process is unblocked.

All of the “software” wait for vertical retrace solutions (those that
use the main CPU) are doomed in high end situations where it happens too
quickly (except maybe using a realtime OS, which is not the case with
Linux currently).

Remember that the vertical retrace idea is to take advantage of a window
in time where the ray is not actually drawing on the screen and where
changing the framebuffer will not cause visible distortion and shearing.
When that window is half of the scheduler resolution, you’re in deep
shit if you are a process controlled by the scheduler, because in the
average case you’ll be missing the window.

The other way, relying on the video accelerator, is comparatively
perfect. The very hardware that knows when the vertical retrace happens
is the one that is triggered by it, it cannot get any better. Why isn’t
it used more? I think the reason is that using the 2D accelerator isn’t
all that popular, even today, and the video accelerator can only do the
waiting for the commands it is asked to do, not those that you do
yourself. Considering that hardware acceleration is as a rule of thumb
at least 20% faster (and often 700%-1000%), we’d be really better off by
switching to it!

Maybe it could be simulated by short sleeps (using RTC) and polls,
I haven’t looked into this. But you would still need a way to find out
whether a vertical retrace has taken place.

Really, all this simulation will get you is missing the vertical
retrace, or if you’re lucky, getting it when the ray is halfway up the
screen… If you are lucky!

--
Pierre Phaneuf
http://ludusdesign.com/

Dan Maas wrote:

LOL… Actually, this brings up another point- how would one synchronize
video playback to run precisely at say, 30fps? I presume it’s impossible to
guarantee a locked framerate on a non-realtime OS, but many Windows apps do
quite a good job of it…

You probably need to use a proper sequence of accelerated blits and
finish/synchronize calls to the accelerator. If you use accelerated
blits, keeping up 30 fps is rather easy on modern systems.

--
Pierre Phaneuf
http://ludusdesign.com/

Mattias Engdegård wrote:

I would sleep for most part of the inter-frame delay, and then spin in a
loop, polling for vertical retrace. Either sleep on a timer (if your
tick granularity is good enough), or (on Linux) use the rtc device.
The reward of not hogging 100% CPU is that you aren’t penalized for
gobbling up your entire timeslice all the time…

The beginning of your answer (sleeping for the inter-frame delay) is
okay with me, but why poll for vertical retrace? As soon as the
inter-frame delay is finished, fire off a blit command to the video
accelerator (with the “wait for vretrace” flag enabled) and start
preparing the next frame right away!

--
Pierre Phaneuf
http://ludusdesign.com/

Really, all this simulation will get you is missing the vertical
retrace, or if you’re lucky, getting it when the ray is halfway up the
screen… If you are lucky!

Actually, when working on the DGA 2.0, I noticed that the wait for
vertical retrace call worked really well, although it might just be
the sequence in which I flipped buffers and did the drawing.

See ya,
-Sam Lantinga, Lead Programmer, Loki Entertainment Software

Sam Lantinga wrote:

Really, all this simulation will get you is missing the vertical
retrace, or if you’re lucky, getting it when the ray is halfway up the
screen… If you are lucky!

Actually, when working on the DGA 2.0, I noticed that the wait for
vertical retrace call worked really well, although it might just be
the sequence in which I flipped buffers and did the drawing.

I think that Dirk mentioned that the page flipping function in DGA
issues a “flip the page when the vertical retrace happens” command to the
video board when available, and simulates it otherwise. Try it without the
wait for vertical retrace? Might be interesting…

But he referred me to Mark Vojkovich for further high performance
issues…

--
Pierre Phaneuf
http://ludusdesign.com/

Which makes me even more curious how some Win32 apps seem to be able to do a
good job without any direct control over the hardware. I suppose if the blit
happens fast enough, and the OS has a low latency timer, you could "average"
30fps by sleeping for 33 msec; then the new frame would go out on the next
retrace, whenever that happens to be.

Hehe… that’s what I do. :) Actually, my games usually end up being 33fps
or 20fps depending on whether I decide to wait 30ms or 50ms…

-bill!

Well, if we’re talking about normal audio/video playback, then most of the
time they just don’t wait for vsync. Windows Media Player certainly
doesn’t, and neither do most cutscenes in games.
It’s not that big an issue really.

On Thu, 13 Apr 2000, Dan Maas wrote:

Do you want exactly 30fps, or a multiple (or short fraction) of the frame
rate of your monitor? In the first case you have to resample your movie,
either by doing very expensive interpolation, or be cheap and just
duplicate a frame here and there.

Ah, I was ignoring the refresh granularity problem. To get (approx) 30fps
playback on a 100Hz monitor, I guess I’d need to alternate between holding
each video frame for 3 and 4 retraces.

Which makes me even more curious how some Win32 apps seem to be able to do a
good job without any direct control over the hardware. I suppose if the blit
happens fast enough, and the OS has a low latency timer, you could "average"
30fps by sleeping for 33 msec; then the new frame would go out on the next
retrace, whenever that happens to be.

Heh, wish I had a super-high-speed camera to see what really happens =)

Thanks for the insight,
Dan

Long live the confused,
Akawaka.

Bother, said Pooh as the Ewoks stole his honey pot.

The beginning of your answer (sleeping for the inter-frame delay) is
okay with me, but why poll for vertical retrace? As soon as the
inter-frame delay is finished, fire off a blit command to the video
accelerator (with the “wait for vretrace” flag enabled) and start
preparing the next frame right away!

You are perfectly right; I must have been thinking of two things at the
same time, thus overtaxing the sponge pretending to be my brain.
I was concerned about synching for replaying at a fixed video/frame ratio,
but thinking about it, it is probably not very important since the vertical
refresh (~80Hz) is far higher than for movies (24 frames/s). So
I guess there is little risk of temporal aliasing artifacts from that,
and it’s better to use fixed timing between frames.

I wonder if it makes sense to simulate the 50 interlaced half-frames
per second for TV playback, though. How do people do it? Simply pair them
together and run them at 25Hz?

The net result is higher framerates: who doesn’t want that? :)

Nobody needs higher framerates than required for the recorded video sequence,
nor higher than vertical frequency of your monitor. But if there is a lot
of computation going on, you are right.

The right thing is this: those modern boards usually support a “wait for
vertical retrace” flag on their commands, which will let the board do
all the waiting.

Unless your game is of the type “blit an entire screenful each frame”,
perhaps playing a movie.

No problem with that. Just tag the blit command with the “wait for
retrace” flag, issue it and start uncompressing the next frame.

Sure, this is the principle behind triple buffering: assuming you own the
CPU, decouple the screen refresh from the frame generation. But you may
not have that luxury.

Even if you can maintain a CPU monopoly, I can imagine situations
where it is better to produce 40 than 51 frames/s when your monitor
refreshes at 80Hz. First-person shooters are remarkably immune to this,
but other games could produce temporal aliasing problems.

There is a big problem with any kind of userland "wait for retrace"
scheme. If it is polling, the app might get preempted and miss the exact
moment of the retrace, at best getting it late (perhaps too late) and
at worst losing it completely.

I didn’t mean to let polling or blocking trigger the screen refresh;
the hardware is best suited for that (either by page flipping or by
issuing queued drawing requests at retrace). But that is not sufficient
if you want to play something at a fixed fraction of the monitor frequency.

I’m painfully aware of tearing effects from the lack of synchronization
in X11, and I really wish there was a good standard way around it.
Too bad many platforms have no usable implementation of the double-buffer
extension (DBE).

Mattias Engdegård wrote:

You are perfectly right; I must have been thinking of two things at the
same time, thus overtaxing the sponge pretending to be my brain.

LOL!

I was concerned about synching for replaying at a fixed video/frame ratio,
but thinking about it, it is probably not very important since the vertical
refresh (~80Hz) is far higher than for movies (24 frames/s). So
I guess there is little risk of temporal aliasing artifacts from that,
and it’s better to use fixed timing between frames.

Exactly.

I wonder if it makes sense to simulate the 50 interlaced half-frames
per second for TV playback, though. How do people do it? Simply pair them
together and run them at 25Hz?

You mean, with a TV tuner in your PC? Or on a real TV? On a real TV, you
effectively get a framerate of 25 or 30 Hz (depending on where you
live!) because it takes two cycles to draw a full frame. I guess TV
tuner cards approximate to the next vertical retrace like we’d do
playing an MPEG movie.

--
Pierre Phaneuf
http://ludusdesign.com/

You mean, with a TV tuner in your PC? Or on a real TV?

Yes, a tuner for a PC monitor, or a TV to digital video converter.
As I understand it, combining two half-frames yields a frame where the
scan lines come from different time points, so it’s not really accurate.
Perhaps this doesn’t show at all, given the quality of regular PAL (not to
mention NTSC). I suppose that all points of a half-frame are not sampled
at the same time either, so to simulate an analog TV completely, you
have to keep track of the electron beam, phosphor afterglow, etc :)

Sorry, I guess this has a fatally decreasing SDL content…

Mattias Engdegård wrote:

The net result is higher framerates: who doesn’t want that? :)

Nobody needs higher framerates than required for the recorded
video sequence, nor higher than vertical frequency of your monitor.
But if there is a lot of computation going on, you are right.

No, for recorded video, this is not required, except when the CPU power
to do the decompression is lacking. Making the CPU wait isn’t a winning
strategy when you have barely what you need to keep the full framerate
of the video sequence.

And for a game, you want the highest framerate possible. I have a
refresh rate around 100 Hz and games usually have a hard time getting
this framerate, so when this is going to be a problem, I think it
actually won’t be anymore (I’ll have to think of more effects to slow it
down!)… :)

No problem with that. Just tag the blit command with the “wait for
retrace” flag, issue it and start uncompressing the next frame.

Sure, this is the principle behind triple buffering: assuming you own the
CPU, decouple the screen refresh from the frame generation. But you may
not have that luxury.

You may not always have that luxury (maybe your blits are software and
synchronous), but doing it that way doesn’t give worse performance than
an algorithm that doesn’t take advantage of asynchronous blits, so why
not assume asynchronous blits and try to do your best?

Even if you can maintain a CPU monopoly, I can imagine situations
where it is better to produce 40 than 51 frames/s when your monitor
refreshes at 80Hz. First-person shooters are remarkably immune to this,
but other games could produce temporal aliasing problems.

I don’t really believe in this. If you can make 51 fps on a monitor
going at 80 Hz, that will be better than 40 fps. Of course, the game
logic has to compensate the time properly!

issuing queued drawing requests at retrace). But that is not sufficient
if you want to play something at a fixed fraction of the monitor frequency.

Is that useful at all?

I’m painfully aware of tearing effects from the lack of synchronization
in X11, and I really wish there was a good standard way around it.
Too bad many platforms have no usable implementation of the double-buffer
extension (DBE).

I was told that XFree86 queues drawing requests at retrace to the
accelerator, but for operations that are done in software, the
synchronization is too bad to help (the problem I described) and/or
software PIO transfers are so slow that they take more time than the
vertical refresh allows for.

Some of the drivers of XFree86 4.0 are almost completely using the
accelerator and exhibit MUCH better behavior regarding tearing.

--
Pierre Phaneuf
Systems Exorcist

I don’t really believe in this. If you can make 51 fps on a monitor
going at 80 Hz, that will be better than 40 fps. Of course, the game
logic has to compensate the time properly!

For a 3D shooter you are probably right — they show a remarkable
tolerance to low frame rates and even tearing. But for a 2D game I’m
not so sure. Consider:

                   |---------------------------------> time

monitor refresh    m    m    m    m    m    m    m    m
game frame update  1      2      3      4      5      6
player sees        111111111122222333333333344444555556
object position        /\      ^      /\      ^    ^

In this example, frames 1 and 3 are shown twice as long as frames 2, 4 and 5.
Imagine you have an object moving with constant speed on the screen. To
maintain the illusion of constant speed, the time points for each frame
should be the midpoint of the interval during which the frame is seen by
the player (marked by ^ and /\). Of course, these have to be known in
advance when rendering a frame (in order to draw the object at the right
place), but then you must know where you are in relation to the vertical
refresh.

And even then, I’m not 100% sure that motion will be perceived as
completely smooth, since the user sees the object occupying some
positions for a longer time than others.
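
To make the midpoint idea concrete, here is a small sketch. It assumes the game already knows at which refresh a frame appears and at which refresh the next frame replaces it, which is exactly the knowledge that is hard to get in practice. The function name visible_midpoint_ms is hypothetical:

```c
#include <assert.h>

/*
 * Hypothetical helper: midpoint, in milliseconds since refresh 0, of the
 * interval during which a frame is visible.
 *   first_refresh: index of the refresh at which the frame appears
 *   next_refresh:  index of the refresh at which the next frame replaces it
 *   refresh_hz:    monitor refresh rate
 */
double visible_midpoint_ms(unsigned first_refresh, unsigned next_refresh,
                           double refresh_hz)
{
    assert(next_refresh > first_refresh && refresh_hz > 0.0);

    double period_ms = 1000.0 / refresh_hz;   /* time between two refreshes */
    return 0.5 * (first_refresh + next_refresh) * period_ms;
}
```

For the diagram above on an 80 Hz monitor (12.5 ms per refresh), frame 1 is visible from refresh 0 to 2 and frame 2 from refresh 2 to 3, so the object should be drawn where it would be at 12.5 ms and 31.25 ms respectively.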

I was told that XFree86 queues drawing requests at retrace to the
accelerator, but for operations that are done in software, the
synchronization is too bad to help (the problem I described) and/or
software PIO transfers are so slow that they take more time than the
vertical refresh allows for.

This is great news. That would imply that we could replace

XShmPutImage(Shared XImage to Window);
XSync();

with

XShmPutImage(Shared XImage to Pixmap);
XCopyArea(Pixmap to Window);
XSync();

and get triple-buffering at (almost) no extra charge!