CSDL with quad-buffering and a separate flip-thread

Jason Hoffoss wrote:

----- Original Message -----

From: “David Olofson” <david.olofson at reologica.se>
To:
Sent: Wednesday, March 13, 2002 5:18 PM
Subject: Re: [SDL] CSDL with quad-buffering and a separate flip-thread

I don’t see how you can ever get smooth animation without rendering
exactly one frame per refresh…

If you try to sync to the vertical sync rate you have to deal with all
possible rates. A quick look at a couple of monitor manuals on my shelf
shows that one supports vsync rates of 60, 70, 75, and 80 Hz. The other
one supports vsync rates between 50 and 160 Hz. Looking at the screen
properties under Windows, I find I can set the vsync rate on my current
desktop to 60, 70, 72, 80, and 100 Hz. These are the rates available for
my video card with my monitor. The same monitor connected to a different
video card, and vice versa, will have different rates.

This is the problem with trying to base your animation on the vsync rate.
What is it? What will it be on other computers? The PC is NOT like
programming for a C64 or an old Amiga where you can assume the vsync
rate is always the same. A game that works great when synched to the
refresh rate on one machine is going to look terrible when synched to
the refresh rate on another computer. So why bother?

It looks to me like you are making a number of assumptions. If I am
wrong, please tell me. It looks like you are assuming that the vsync
rate is the same on all machines. You can count on it being different.
That means you don’t know how much time you actually have to render a
frame. You don’t even know the speed and model of the CPU the game will
run on. You might have a million instructions per vsync, and you might
have 100 million. It also looks to me like you are assuming that you
will have exclusive use of the CPU. This is not true on any of the
widely used OSes. There are a number of tasks that will sneak in and
grab cycles. It seems like you are assuming that you know that each
frame can be rendered in roughly the same amount of time. Level
designers can really mess that up. One long hall with too many polygons
and your frame rate can drop by 90%… It looks like you are also making
assumptions about the quality of the video device driver. There is a
good chance (almost certain) that the driver will only do a swap at
vsync time. A lot of hardware won’t swap at any other time. There is
also a good chance that the driver is implementing a triple buffer
scheme if it can. The driver writers are highly motivated to make
graphics run as fast as possible on the device. (That is how they sell
video cards after all.) And, it also looks like you are assuming that a
hardware buffer swap involves an actual buffer swap; this also may not
be true, especially when dealing with a windowing system of any kind. It
can be faster to use the hardware to render in an off screen buffer and
do the swap with a blit. Even if the driver gives you a flag or a call
back on vsync there is a reasonable chance that the call back is
generated from a timer or some other way that has nothing to do with the
actual vsync.

In general, you just can’t know enough to justify doing anything to try
to sync with vertical retrace.

I have found that the best thing to do is just draw the graphics based
on actual elapsed time and check to see that the frame rate is fast
enough to give good game play. (Do the test up front!) This approach
gives your game a smooth visual feel. The frame you are looking at
reflects what the player expects at this time no matter how long it took
to create the frame. The frame rate may be jerky, but you don’t get
sudden slow downs and speed ups in the action.
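
For illustration, a minimal sketch of that elapsed-time approach with SDL_GetTicks() (all names and numbers here are made up, and the printf()/SDL_Delay() stand in for real rendering):

#include <stdio.h>
#include "SDL.h"

int main(int argc, char *argv[])
{
    if (SDL_Init(SDL_INIT_TIMER) < 0)
        return 1;

    float player_x = 0.0f;              /* pixels */
    const float speed = 120.0f;         /* pixels per second */
    Uint32 last = SDL_GetTicks();

    for (int frame = 0; frame < 10; frame++) {
        Uint32 now = SDL_GetTicks();
        float dt = (now - last) / 1000.0f;   /* seconds since the last frame */
        last = now;

        /* Advance by measured time, not by frame count, so the on-screen
         * speed is the same whether the frame took 10 ms or 50 ms. */
        player_x += speed * dt;
        printf("frame %d: player_x = %f\n", frame, player_x);

        /* A real game would render at (int)player_x and flip/swap here;
         * SDL_Delay() stands in for the time that would take. */
        SDL_Delay(16);
    }

    SDL_Quit();
    return 0;
}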

Feel free to ignore me and to disagree with me.

	Bob Pendleton



At 16:43 17.03.2002 -0600, you wrote:

In general, you just can’t know enough to justify doing anything to try to
sync with vertical retrace.

I have found that the best thing to do is just draw the graphics based on
actual elapsed time and check to see that the frame rate is fast enough to
give good game play. (Do the test up front!) This approach gives your game
a smooth visual feel. The frame you are looking at reflects what the
player expects at this time no matter how long it took to create the
frame. The frame rate may be jerky, but you don’t get sudden slow downs
and speed ups in the action.

Feel free to ignore me and to disagree with me.

Agree… I found out that people who’d like CONSTANT framerates just don’t
like to make the step of generating time-dependent routines out of their
frame-dependent routines.

St0fF 64




[…]

Kobo Deluxe is a 2D game, and it runs pretty well with SDL’s
software blitters on all platforms, AFAIK. (Only tried Linux and
Windows myself.)

Ok, sorry, I thought it only used OpenGL through glSDL. Maybe 2D
performance is identical on both Win32 and X and I am doing something
[…]

I don’t think it will be identical, as Win32 seems to use busmaster DMA
for the blitting from system memory, while X cannot do that. Performance
with OpenGL should be similar, although that’s a totally different story.
(Kobo Deluxe doesn’t use any “procedural surfaces” as of yet, so
texture transfer rates are not important - only rendering speed is.)

However, s/w rendering seems to have a major problem in the system ->
VRAM transfer here. Seems like software rendering games will be
physically restricted to about the speed they’re running at now (and
have been running at for quite a while), while games relying on h/w
acceleration are the ones that really benefit from the continuous
increase in computing power.

All depends on how a programmer goes about the problem. There’s always
lots of different ways to solve a problem, after all. If they were to
render everything into a system memory surface to compose the scene,
and then blit that to the screen itself once a frame, CPU speed would
have a much larger impact.

Yes - but only if the buffer is transferred using busmaster DMA. If the
CPU has to do it, there goes your framerate. It’s that serious, really.

Might even become a wise idea down the road
if CPUs increase a lot more but system -> VRAM speeds stay the same.
Going OpenGL most likely would still be wiser, though, considering the
focus of video card makers.

And besides, transfers from system RAM to texture RAM should be DMA
accelerated as well, on most modern cards, so you can only win, almost
regardless of what kind of rendering you’re doing.

//David Olofson — Programmer, Reologica Instruments AB

On Saturday 16 March 2002 11:00, Jason Hoffoss wrote:

Jason Hoffoss wrote:


I don’t see how you can ever get smooth animation without rendering
exactly one frame per refresh…

If you try to sync to the vertical sync rate you have to deal with all
possible rates. A quick look at a couple of monitor manuals on my shelf
shows that one supports vsync rates of 60, 70, 75, and 80 Hz. The other
one supports vsync rates between 50 and 160 Hz. Looking at the screen
properties under Windows, I find I can set the vsync rate on my current
desktop to 60, 70, 72, 80, and 100 Hz. These are the rates available for
my video card with my monitor. The same monitor connected to a
different video card, and vice versa, will have different rates.

Fine with me - as long as I get to render one frame for every refresh, or
at least, have a chance of knowing which refreshes I’m supposed to
"produce" frames for.

This is the problem with trying to base your animation on the vsync
rate.

That’s a completely different thing, and I’d never dream of it. It’s just
not flexible enough - you can’t even change the scroll speed in the game
without breaking it!

BTW, does Kobo Deluxe look like it’s controlled by the refresh rate? (I
hope not! :)

What is it? What will it be on other computers? The PC is NOT
like programming for a C64 or an old Amiga where you can assume the
vsync rate is always the same. A game that works great when synched to
the refresh rate on one machine is going to look terrible when synched
to the refresh rate on another computer.

Yeah - if you do it that way… (Haven’t done that since the original
Project Spitfire, which relied on a VGA compatible video card anyway.
Wasn’t a problem then, but eventually, VGA hardware scrolling started
breaking on many video cards, and I never bothered switching to software
scrolling as I considered DOS dead anyway.)

So why bother?

Because you’re missing the whole point.

Synchronizing the rendering with the retrace is completely different
from synchronizing the whole game with it. For example, the engine I’m
using in Kobo Deluxe lets you select any “logic” frame rate you like, and
then interpolates all coordinates up or down to the actual rendering
frame rate, as required.

Now, if the rendering is implemented as “native OpenGL”, utilizing the
fractional parts of the output coordinates (8 bits right now…) for
sub-pixel accurate rendering, you get incredibly smooth scrolling and
animation regardless of refresh rate, or “logic frame rate”/refresh rate
ratio.
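
A rough, self-contained sketch of the idea (not the actual Kobo Deluxe engine code; the placeholder logic step and the 70 Hz refresh are made up): logic runs at a fixed rate, and the rendered position is interpolated between the last two logic positions, the fraction being what would go to OpenGL as a sub-pixel offset:

#include <stdio.h>

#define LOGIC_FPS 33.3333f

typedef struct { float prev_x, x; } Object;

/* One fixed-size logic step: remember the old position, compute the new one. */
static void advance_logic(Object *o)
{
    o->prev_x = o->x;
    o->x += 4.0f;                       /* placeholder "game logic" */
}

/* Called once per rendered frame, at whatever rate the display gives us. */
static void frame(Object *o, float now, float *logic_time)
{
    const float step = 1.0f / LOGIC_FPS;
    while (*logic_time + step <= now) { /* catch up on whole logic frames */
        advance_logic(o);
        *logic_time += step;
    }
    float t = (now - *logic_time) * LOGIC_FPS;   /* 0..1 into the current step */
    float draw_x = o->prev_x + (o->x - o->prev_x) * t;
    printf("render at x = %.3f (fraction %.3f)\n", draw_x, t);
}

int main(void)
{
    Object o = { 0.0f, 0.0f };
    float logic_time = 0.0f;
    for (int i = 1; i <= 6; i++)        /* pretend the display runs at 70 Hz */
        frame(&o, i / 70.0f, &logic_time);
    return 0;
}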

This is a fact, and I’ve implemented it and seen the result. It’s very
real, and totally portable, if it wasn’t for ONE thing: Linux basically
doesn’t have OpenGL drivers that perform retrace sync’ed flips - so
you’re not seeing anything but the usual tearing and crap to be expected
with such drivers. Still works though, it just doesn’t look any better
than the drivers allow.

It looks to me like you are making a number of assumptions. If I am
wrong, please tell me.

(See above. :)

It looks like you are assuming that the vsync
rate is the same on all machines.

Nope.

You can count on it being different.
That means you don’t know how much time you actually have to render a
frame.

If I don’t have enough power for the actual refresh rate, I’ll just have
to accept it, and “switch down” to a lower frame rate. Things would be
easier and look better if there was a reliable way of finding out exactly
which frames are dropped, but things would “kind of” work even if that
isn’t solved properly.

You don’t even know the speed and model of the CPU the game will
run on. You might have a million instructions per vsync, and you might
have 100 million.

You get the frame rate your machine can handle, and if that’s less than
full frame rate, the smoothness suffers. No news there.

It also looks to me like you are assuming that you
will have exclusive use of the CPU. This is not true on any of the
widely used OSes. There are a number of tasks that will sneak in and
grab cycles.

Right, and that’s a problem. Nothing much to do about that. However, it
gets much worse if you just hog the CPU, without ever blocking on
anything. (Like the retrace, when you’re out of buffers…)

Anyway, this won’t be much of a problem with Linux in the future. In
fact, it isn’t much of a problem if you use a Linux/lowlatency kernel,
or even one of the current preemptive kernels. If you’re entitled to run,
your thread will wake up within a few ms, at most, and if it was
properly blocking on the retrace (real IRQ or timer emulation), things
would be just great.

We’ve been doing audio processing like this for quite a while (<3 ms
latency) with no drop-outs ever, despite heavy system stress of all
kinds. I’d say that anyone who claims that a “desktop system” could never
do that is plain wrong - I have seen it in action on various PCs.

The fact that Windows can’t do it reliably, even with a real time kernel,
does not prove anything other than that there is lots of misbehaving code
in it and its drivers.

It seems like you are assuming that you know that each
frame can be rendered in roughly the same amount of time.

I am, in the case of 2D games. Indeed, you could max out a 3D card with
2D levels as well, but that would be a very crowded screen with any
reasonably modern card… :)

Level
designers can really mess that up. One long hall with too many polygons
and your frame rate can drop by 90%…

Yeah - but that sounds more like an example of poor level design to me.
(I did read up some on 3D level design.)

Either way, if the frame rate drops, so be it. The only way to completely
avoid the risk of that is to very carefully tune the game for a specific
machine, and then never run it on anything that could possibly be
slower under any circumstances.

An arcade game would be an example of such a solution - but considering
that even my dog slow G400 seems to be fast enough for what I want to do,
I’d guess 90% of the users with OpenGL would get a constant full frame
rate with the kind of games I’m thinking about.

It looks like you are also
making assumptions about the quality of the video device driver. There
is a good chance (almost certain) that the driver will only do a swap
at vsync time. A lot of hardware won’t swap at any other time.

Well, the last machine I saw that was at all capable of anything but that
was the Amiga, which didn’t have buffered video DMA pointers. You had to
use the copper to change them when you desired. Usually, you’d just reset
them to the start of the screen some time during the retrace, but you
could change the pointers at any time, to implement independently h/w
scrolled splitscreens and the like.

PC video cards just latch the frame buffer offset/pointer into an
internal counter right when it’s about to start a refresh. You can’t
force them to latch during the refresh - and it would be useless anyway,
as it would just result in the output restarting at the first pixel of
the buffer immediately when you write the register… heh

There is
also a good chance that the driver is implementing a triple buffer
scheme if it can.

In fact, that’s exactly what I would want, as that gives me a much
better chance of maintaining a steady full frame rate, even if the OS has
crappy timing and/or my rendering times are “jittering” near the frame
rate limit.

The driver writers are highly motivated to make
graphics run as fast as possible on the device. (That is how they sell
video cards after all.)

That’s also the way I like it. What’s the problem? (Of course, disabling
the retrace sync sucks, but then again, I don’t think I’ve seen a Windows
driver that does that by default… It’s only meant for benchmarks.)

And, it also looks like you are assuming that a
hardware buffer swap involves an actual buffer swap; this also may not
be true, especially when dealing with a windowing system of any kind.
It can be faster to use the hardware to render in an off screen buffer
and do the swap with a blit.

I’m perfectly aware of that - as a matter of fact, my first subpixel
accurate scroller prototype was implemented on a system with a driver
that did “blit flips” even in fullscreen mode. That wasn’t a serious
problem, and I even managed to get retrace sync to work with it, to some
extent. (I never bothered switching to “half buffering”, so there would
be some tearing occasionally.)

Even if the driver gives you a flag or a
call back on vsync there is a reasonable chance that the call back is
generated from a timer or some other way that has nothing to do with
the actual vsync.

Well, that’s a broken driver IMHO. Nothing much to do about that - unless
it’s Open Source, of course.

In general, you just can’t know enough to justify doing anything to try
to sync with vertical retrace.

There’s no way I can agree, considering what I’ve seen so far.

I’m planning to release some form of demo RSN. I’m quite sure it will
work just fine on any Windows machine with OpenGL, or any other system
with an OpenGL driver that does retrace sync’ed pageflipping.

I have found that the best thing to do is just draw the graphics based
on actual elapsed time and check to see that the frame rate is fast
enough to give good game play.

Have you considered sub-pixel accurate rendering, and/or higher
resolutions? (Of course, both more or less require OpenGL.)

Anyway, the basic version of my “magic scroller” is to do exactly what
you’re describing. Provided you’re using OpenGL and a sensible resolution
on a decent machine, the frame rate most likely will be sufficient -
and if you pass the fractional part of your coordinates on to OpenGL,
there you go: Perfectly smooth scrolling!

That’s pretty much all there is to it, for machines that are fast enough
to take advantage of it. For slower machines, it’ll work just like what
you’re suggesting - although it might be a good idea to disable the
sub-pixel accuracy, as it’ll probably generate some blurring, if
anything.
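
For what it’s worth, a minimal sketch of that (assuming an already set up 2D orthographic OpenGL context and GL_LINEAR texture filtering; the names are made up) - the whole trick is simply not rounding the coordinates before handing them to OpenGL:

#include <GL/gl.h>

/* Draw one textured tile at floating point screen coordinates. With
 * GL_LINEAR filtering, in-between positions come out as a slight
 * (usually invisible) blend instead of a one-pixel jump. */
void draw_tile(GLuint tex, float x, float y, float w, float h)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glBegin(GL_QUADS);
    glTexCoord2f(0.0f, 0.0f); glVertex2f(x,     y);
    glTexCoord2f(1.0f, 0.0f); glVertex2f(x + w, y);
    glTexCoord2f(1.0f, 1.0f); glVertex2f(x + w, y + h);
    glTexCoord2f(0.0f, 1.0f); glVertex2f(x,     y + h);
    glEnd();
}

/* e.g. draw_tile(background, -scroll_x, -scroll_y, 320.0f, 240.0f);
 * where scroll_x/scroll_y keep their fractional parts instead of
 * being truncated to whole pixels. */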

(Do the test up front!)

Indeed.

This approach
gives your game a smooth visual feel. The frame you are looking at
reflects what the player expects at this time no matter how long it
took to create the frame. The frame rate may be jerky, but you don’t
get sudden slow downs and speed ups in the action.

Feel free to ignore me and to disagree with me.

Well, this is what I do in Kobo Deluxe, and it seems to work very well,
so I can hardly disagree with this part.

However, I’m still 100% sure that a driver with retrace sync’ed flips +
sub-sample accurate rendering results in smoother animation. :)

//David Olofson — Programmer, Reologica Instruments AB

On Sunday 17 March 2002 23:43, Bob Pendleton wrote:

----- Original Message -----
From: “David Olofson” <david.olofson at reologica.se>
To:
Sent: Wednesday, March 13, 2002 5:18 PM
Subject: Re: [SDL] CSDL with quad-buffering and a separate

[…]

Kobo Deluxe is a 2D game, and it runs pretty well with SDL’s
software blitters on all platforms, AFAIK. (Only tried Linux and
Windows myself.)

Ok, sorry, I thought it only used OpenGL through glSDL. Maybe 2D
performance is identical on both Win32 and X and I am doing something
[…]

I don’t think it will be identical, as Win32 seems to use busmaster DMA
for the blitting from system memory, while X cannot do that. Performance
with OpenGL should be similar, although that’s a totally different story.
(Kobo Deluxe doesn’t use any “procedural surfaces” as of yet, so
texture transfer rates are not important - only rendering speed is.)

And why, for Pete’s sake, would a game like Kobo Deluxe use procedural textures?
I mean, it’s not like you’re writing the next be-all/end-all 3D game development RAD tool or something, right?

However, s/w rendering seems to have a major problem in the system ->
VRAM transfer here. Seems like software rendering games will be
physically restricted to about the speed they’re running at now (and
have been running at for quite a while), while games relying on h/w
acceleration are the ones that really benefit from the continuous
increase in computing power.

All depends on how a programmer goes about the problem. There’s always
lots of different ways to solve a problem, after all. If they were to
render everything into a system memory surface to compose the scene,
and then blit that to the screen itself once a frame, CPU speed would
have a much larger impact.

Yes - but only if the buffer is transferred using busmaster DMA. If the
CPU has to do it, there goes your framerate. It’s that serious, really.

But… if the CPU has to do the transfer too, his comment still applies: the CPU does influence the framerate
a lot, especially if you are not using busmaster DMA…

Might even become a wise idea down the road
if CPUs increase a lot more but system -> VRAM speeds stay the same.
Going OpenGL most likely would still be wiser, though, considering the
focus of video card makers.

And besides, transfers from system RAM to texture RAM should be DMA
accelerated as well, on most modern cards, so you can only win, almost
regardless of what kind of rendering you’re doing.

Yep.

On Saturday 16 March 2002 11:00, Jason Hoffoss wrote:

19-3-2002 2:34:39, David Olofson <david.olofson at reologica.se> wrote:

On Sunday 17 March 2002 23:43, Bob Pendleton wrote:

Jason Hoffoss wrote:

----- Original Message -----
From: “David Olofson” <david.olofson at reologica.se>
To:
Sent: Wednesday, March 13, 2002 5:18 PM
Subject: Re: [SDL] CSDL with quad-buffering and a separate
flip-thread

I don’t see how you can ever get smooth animation without rendering
exactly one frame per refresh…

If you try to sync to the vertical sync rate you have to deal with all
possible rates. A quick look at a couple of monitor manuals on my shelf
shows that one supports vsync rates of 60, 70, 75, and 80 Hz. The other
one supports vsync rates between 50 and 160 Hz. Looking at the screen
properties under Windows, I find I can set the vsync rate on my current
desktop to 60, 70, 72, 80, and 100 Hz. These are the rates available for
my video card with my monitor. The same monitor connected to a
different video card, and vice versa, will have different rates.

Fine with me - as long as I get to render one frame for every refresh, or
at least, have a chance of knowing which refreshes I’m supposed to
"produce" frames for.

This is the problem with trying to base your animation on the vsync
rate.

That’s a completely different thing, and I’d never dream of it. It’s just
not flexible enough - you can’t even change the scroll speed in the game
without breaking it!

BTW, does Kobo Deluxe look like it’s controlled by the refresh rate? (I
hope not! :)

What is it? What will it be on other computers? The PC is NOT
like programming for a C64 or an old Amiga where you can assume the
vsync rate is always the same. A game that works great when synched to
the refresh rate on one machine is going to look terrible when synched
to the refresh rate on another computer.

Yeah - if you do it that way… (Haven’t done that since the original
Project Spitfire, which relied on a VGA compatible video card anyway.
Wasn’t a problem then, but eventually, VGA hardware scrolling started
breaking on many video cards, and I never bothered switching to software
scrolling as I considered DOS dead anyway.)

So why bother?

Because you’re missing the whole point.
Synchronizing the rendering with the retrace is completely different
from synchronizing the whole game with it. For example, the engine I’m
using in Kobo Deluxe lets you select any “logic” frame rate you like, and
then interpolates all coordinates up or down to the actual rendering
frame rate, as required.

Now, if the rendering is implemented as “native OpenGL”, utilizing the
fractional parts of the output coordinates (8 bits right now…) for
sub-pixel accurate rendering, you get incredibly smooth scrolling and
animation regardless of refresh rate, or “logic frame rate”/refresh rate
ratio.

This is a fact,

We’re not doubting you immediately here, relax…

and I’ve implemented it and seen the result. It’s very
real, and totally portable, if it wasn’t for ONE thing: Linux basically
doesn’t have OpenGL drivers that perform retrace sync’ed flips - so
you’re not seeing anything but the usual tearing and crap to be expected
with such drivers. Still works though, it just doesn’t look any better
than the drivers allow.

But this would run fine on Windows, of course… hint, hint…

It looks to me like you are making a number of assumptions. If I am
wrong, please tell me.

(See above. :)

It looks like you are assuming that the vsync
rate is the same on all machines.

Nope.

You can count on it being different.
That means you don’t know how much time you actually have to render a
frame.

If I don’t have enough power for the actual refresh rate, I’ll just have
to accept it, and “switch down” to a lower frame rate. Things would be
easier and look better if there was a reliable way of finding out exactly
which frames are dropped, but things would “kind of” work even if that
isn’t solved properly.

You don’t even know the speed and model of the CPU the game will
run on. You might have a million instructions per vsync, and you might
have 100 million.

You get the frame rate your machine can handle, and if that’s less than
full frame rate, the smoothness suffers. No news there.

It also looks to me like you are assuming that you
will have exclusive use of the CPU. This is not true on any of the
widely used OSes. There are a number of tasks that will sneak in and
grab cycles.

Right, and that’s a problem. Nothing much to do about that. However, it
gets much worse if you just hog the CPU, without ever blocking on
anything. (Like the retrace, when you’re out of buffers…)

Anyway, this won’t be much of a problem with Linux in the future. In
fact, it isn’t much of a problem if you use a Linux/lowlatency kernel,
or even one of the current preemptive kernels. If you’re entitled to run,
your thread will wake up within a few ms, at most, and if it was
properly blocking on the retrace (real IRQ or timer emulation), things
would be just great.

But what kind of people are recompiling their kernel to play a game?
I mean, I wouldn’t release a game stating: Windows XP only…

We’ve been doing audio processing like this for quite a while (<3 ms
latency) with no drop-outs ever, despite heavy system stress of all
kinds. I’d say that anyone who claims that a “desktop system” could never
do that is plain wrong - I have seen it in action on various PCs.

I would not say that, and neither did the previous poster. His argument
was not that systems can never do this at all, but that CURRENT
systems are not able to do this. Of course, some exotic build of
some kind of OS can do anything you want… But that is just not your
audience at this moment. You could just as well develop for Amiga again,
then you’ll get fixed framerates, for example…

The fact that Windows can’t do it reliably, even with a real time kernel,
does not prove anything other than that there is lots of misbehaving code
in it and its drivers.

Hey, hey, at least it has got accelerated s/w->h/w blits + vsync’ed flips.
Let’s start at the beginning: First have that, then talk about
code and drivers?

It seems like you are assuming that you know that each
frame can be rendered in roughly the same amount of time.

I am, in the case of 2D games. Indeed, you could max out a 3D card with
2D levels as well, but that would be a very crowded screen with any
reasonably modern card… :)

I have maxed out a 3D card (GeForce 2 MX 400 with 32 MB) blitting a 16-bit
4-layer parallax scrolling 2D-game-like screen; it could only do 30 fps on an
Athlon 1.2 GHz??? Of course, I was using plain OpenGL, no optimization, but
I mean: WTF? Okay, I’ll probably need to dive more into OpenGL for that :)

Level
designers can really mess that up. One long hall with too many polygons
and your frame rate can drop by 90%…

Yeah - but that sounds more like an example of poor level design to me.
(I did read up some on 3D level design.)

And that really depends on what the intended audience of the game is.
Dropping your fps by 90% when it could’ve been 400 fps in a normal
scene isn’t that bad, is it? (I mean, an occasional 40 fps is not really
noticeable). So, just up your minimum system specs for your engine,
the level designers will work with that as a minimum.

Either way, if the frame rate drops, so be it. The only way to completely
avoid the risk of that is to very carefully tune the game for a specific
machine, and then never run it on anything that could possibly be
slower under any circumstances.

An arcade game would be an example of such a solution - but considering
that even my dog slow G400 seems to be fast enough for what I want to do,
I’d guess 90% of the users with OpenGL would get a constant full frame
rate with the kind of games I’m thinking about.

It looks like you are also
making assumptions about the quality of the video device driver. There
is a good chance (almost certain) that the driver will only do a swap
at vsync time. A lot of hardware won’t swap at any other time.

Well, the last machine I saw that was at all capable of anything but that
was the Amiga, which didn’t have buffered video DMA pointers. You had to
use the copper to change them when you desired. Usually, you’d just reset
them to the start of the screen some time during the retrace, but you
could change the pointers at any time, to implement independently h/w
scrolled splitscreens and the like.

PC video cards just latch the frame buffer offset/pointer into an
internal counter right when it’s about to start a refresh. You can’t
force them to latch during the refresh - and it would be useless anyway,
as it would just result in the output restarting at the first pixel of
the buffer immediately when you write the register… heh

Eeehmm… He meant: A lot of hardware
won’t swap at any other time than vsync, meaning you’ll have to wait for
the next vsync before your buffer is drawn. He was wondering whether or
not a lot of drivers are able to start a swap before the screen is done
drawing. And this is possible: It is called the ‘tearing’ effect?

There is
also a good chance that the driver is implementing a triple buffer
scheme if it can.

In fact, that’s exactly what I would want, as that gives me a much
better chance of maintaining a steady full frame rate, even if the OS has
crappy timing and/or my rendering times are “jittering” near the frame
rate limit.

Windoze can do it, if enough hw-memory is available…

The driver writers are highly motivated to make
graphics run as fast as possible on the device. (That is how they sell
video cards after all.)

That’s also the way I like it. What’s the problem? (Of course, disabling
the retrace sync sucks, but then again, I don’t think I’ve seen a Windows
driver that does that by default… It’s only meant for benchmarks.)

And, it also looks like you are assuming that a
hardware buffer swap involves an actual buffer swap; this also may not
be true, especially when dealing with a windowing system of any kind.
It can be faster to use the hardware to render in an off screen buffer
and do the swap with a blit.

I’m perfectly aware of that - as a matter of fact, my first subpixel
accurate scroller prototype was implemented on a system with a driver
that did “blit flips” even in fullscreen mode. That wasn’t a serious
problem, and I even managed to get retrace sync to work with it, to some
extent. (I never bothered switching to “half buffering”, so there would
be some tearing occasionally.)

Of course, we are still talking Linux/DOS here then? Otherwise, I don’t
see the use, as opposed to the DirectX-native flips?

Even if the driver gives you a flag or a
call back on vsync there is a reasonable chance that the call back is
generated from a timer or some other way that has nothing to do with
the actual vsync.

Well, that’s a broken driver IMHO. Nothing much to do about that - unless
it’s Open Source, of course.

Or unless the creators receive an e-mail from angry users/developers? But
indeed, if a driver promises to do this, why would it do it wrongly in the first
place? I mean, anyone would notice the crap appearing at their screens
immediately with 2D games, for example?

In general, you just can’t know enough to justify doing anything to try
to sync with vertical retrace.

There’s no way I can agree, considering what I’ve seen so far.

I’m planning to release some form of demo RSN. I’m quite sure it will
work just fine on any Windows machine with OpenGL, or any other system
with an OpenGL driver that does retrace sync’ed pageflipping.

And, even more obviously, if you don’t know the vsync, forget about
steady screen updates. (My sister gets sick of games not adhering to the
vsync)

I have found that the best thing to do is just draw the graphics based
on actual elapsed time and check to see that the frame rate is fast
enough to give good game play.

Have you considered sub-pixel accurate rendering, and/or higher
resolutions? (Of course, both more or less require OpenGL.)

But a higher resolution would just decrease the framerate, right?
And sub-pixel accurate rendering will do nothing more than making
compensations for lost/won time more smoothly visible? If that’s what
you meant with the above comment, then Ok…

Anyway, the basic version of my “magic scroller” is to do exactly what
you’re describing. Provided you’re using OpenGL and a sensible resolution
on a decent machine, the frame rate most likely will be sufficient -
and if you pass the fractional part of your coordinates on to OpenGL,
there you go: Perfectly smooth scrolling!

That’s pretty much all there is to it, for machines that are fast enough
to take advantage of it. For slower machines, it’ll work just like what
you’re suggesting - although it might be a good idea to disable the
sub-pixel accuracy, as it’ll probably generate some
anything.

Blur?

(Do the test up front!)

Indeed.

This approach
gives your game a smooth visual feel. The frame you are looking at
reflects what the player expects at this time no matter how long it
took to create the frame. The frame rate may be jerky, but you don’t
get sudden slow downs and speed ups in the action.

Feel free to ignore me and to disagree with me.

Well, this is what I do in Kobo Deluxe, and it seems to work very well,
so I can hardly disagree with this part.

However, I’m still 100% sure that a driver with retrace sync’ed flips +
sub-sample accurate rendering results in smoother animation. :)

And I do agree on that… Over and out…

Well, I never did with XKobo/Kobo Deluxe (still running at exactly
33.3333 fps), and it happily generates graphics coordinates for any frame
rate it happens to achieve. Tried with up to several hundred fps on fast
OpenGL cards with retrace sync disabled, all the way down to some 10-15
fps on some dog slow machines around where I work… :)

//David Olofson — Programmer, Reologica Instruments AB

On Monday 18 March 2002 09:04, St0fF 64 wrote:

At 16:43 17.03.2002 -0600, you wrote:

In general, you just can’t know enough to justify doing anything to
try to sync with vertical retrace.

I have found that the best thing to do is just draw the graphics based
on actual elapsed time and check to see that the frame rate is fast
enough to give good game play. (Do the test up front!) This approach
gives your game a smooth visual feel. The frame you are looking at
reflects what the player expects at this time no matter how long it
took to create the frame. The frame rate may be jerky, but you don’t
get sudden slow downs and speed ups in the action.

Feel free to ignore me and to disagree with me.

Agree… I found out that people who’d like CONSTANT framerates just
don’t like to make the step of generating time-dependent routines out
of their frame-dependent routines.

[…]

Kobo Deluxe is a 2D game, and it runs pretty well with SDL’s
software blitters on all platforms, AFAIK. (Only tried Linux and
Windows myself.)

Ok, sorry, I thought it only used OpenGL through glSDL. Maybe 2D
performance is identical on both Win32 and X and I am doing
something

[…]

I don’t think it will be identical, as Win32 seems to use busmaster
DMA for the blitting from system memory, while X cannot do that.
Performance with OpenGL should be similar, although that’s a totally
different story. (Kobo Deluxe doesn’t use any “procedural
surfaces” as of yet, so texture transfer rates are not important -
only rendering speed is.)

And why, for Pete’s sake, would a game like Kobo Deluxe use procedural
textures? I mean, it’s not like you’re writing the next be-all/end-all
3D game development RAD tool or something, right?

Look at the map, consider the fact that it’s chase-scrolling the player
in the latest versions, and then think again…

Besides, it’s the only way you can do “real” pixel effects with OpenGL
without forcing the driver to wait for all rendering to finish, read the
pixels from VRAM (ouch!), give them to you to modify and then take them
back from you as a texture and blit that, or directly copy it into VRAM.

The latter doesn’t apply to Kobo Deluxe, as it uses only sprites and
simple alpha or colorkeyed blits, but it seems that custom pixel effects
are a rather popular reason to dismiss OpenGL acceleration as useless for
2D, so I thought I should explain that it’s not quite that bad. In fact,
procedural textures have been used in 3D games for a good while, so the
"framework" is already in place, and is pretty fast.

However, s/w rendering seems to have a major problem in the system
-> VRAM transfer here. Seems like software rendering games will be
physically restricted to about the speed they’re running at now
(and have been running at for quite a while), while games relying
on h/w acceleration are the ones that really benefit from the
continuous increase in computing power.

All depends on how a programmer goes about the problem. There’s
always lots of different ways to solve a problem, after all. If
they were to render everything into a system memory surface to
compose the scene, and then blit that to the screen itself once a
frame, CPU speed would have a much larger impact.

Yes - but only if the buffer is transferred using busmaster DMA. If
the CPU has to do it, there goes your framerate. It’s that serious,
really.

But… if the CPU has to do the transfer too, his comment still applies:
the CPU does influence the framerate a lot, especially if you are not
using busmaster DMA…

Well, yes, if in fact the rendering takes a significant amount of time in
relation to the “blit”. In my experience, your rendering code has to be
pretty darn slow to have any visible impact on the frame rate, if the
blits to VRAM are done with the CPU - unless of course, you’re doing
alpha blending and other CPU heavy stuff.

//David Olofson — Programmer, Reologica Instruments AB

On Tuesday 19 March 2002 02:45, Martijn Melenhorst wrote:

On Saturday 16 March 2002 11:00, Jason Hoffoss wrote:

Actually, the NVidia drivers will perform sync to vblank if the
__GL_SYNC_TO_VBLANK env variable is set to a non-zero value :)

So, I suppose that one could have a shell script that started their game
and set this env variable in hopes that they had an NVidia card (which
most people who want real OpenGL conformance/performance under Linux
and own a consumer level card do…)
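
Or, as a sketch, set it from inside the program instead of a wrapper script (assuming it is set before the GL context is created so the driver sees it; the variable is NVIDIA-specific and other drivers will simply ignore it):

#include <stdlib.h>
#include "SDL.h"

int init_video(void)
{
    /* Must happen before the OpenGL context exists for the NVIDIA
     * driver to pick it up; harmless elsewhere. */
    putenv("__GL_SYNC_TO_VBLANK=1");

    if (SDL_Init(SDL_INIT_VIDEO) < 0)
        return -1;
    if (!SDL_SetVideoMode(640, 480, 0, SDL_OPENGL))
        return -1;
    /* ...set up GL state, then render and SDL_GL_SwapBuffers() as usual. */
    return 0;
}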

-EvilTypeGuy

On Tue, Mar 19, 2002 at 02:34:39AM +0100, David Olofson wrote:

This is a fact, and I’ve implemented it and seen the result. It’s very
real, and totally portable, if it wasn’t for ONE thing: Linux basically
doesn’t have OpenGL drivers that perform retrace sync’ed flips - so
you’re not seeing anything but the usual tearing and crap to be expected
with such drivers. Still works though, it just doesn’t look any better
than the drivers allow.

*grumble* This putenv crap for Linux OpenGL drivers absolutely must die.
Somebody please remind me to rant about that to someone at NVIDIA…

On Mon, Mar 18, 2002 at 08:40:33PM -0600, EvilTypeGuy wrote:

This is a fact, and I’ve implemented it and seen the result. It’s very
real, and totally portable, if it wasn’t for ONE thing: Linux basically
doesn’t have OpenGL drivers that perform retrace sync’ed flips - so
you’re not seeing anything but the usual tearing and crap to be expected
with such drivers. Still works though, it just doesn’t look any better
than the drivers allow.

Actually, the NVidia drivers will perform sync to vblank if the
__GL_SYNC_TO_VBLANK env variable is set to a non-zero value :)

So, I suppose that one could have a shell script that started their game
and set this env variable in hopes that they had an NVidia card (which
most people who want real OpenGL conformance/performance under Linux
and own a consumer level card do…)


Joseph Carter <-- That boy needs therapy



[…]

This is a fact,

We’re not doubting you immediately here, relax…

One starts to wonder at times… ;)

and I’ve implemented it and seen the result. It’s very
real, and totally portable, if it wasn’t for ONE thing: Linux
basically doesn’t have OpenGL drivers that perform retrace sync’ed
flips - so you’re not seeing anything but the usual tearing and crap
to be expected with such drivers. Still works though, it just doesn’t
look any better than the drivers allow.

But this would run fine on Windows, of course… hint, hint…

Yeah, I know. As I think I mentioned somewhere, I’ll throw some kind of
demo together soon.

[…]

Anyway, this won’t be much of a problem with Linux in the future. In
fact, it isn’t much of a problem if you use a Linux/lowlatency
kernel, or even one of the current preemptive kernels. If you’re
entitled to run, your thread will wake up within a few ms, at most,
and if it was properly blocking on the retrace (real IRQ or timer
emulation), things would be just great.

But what kind of people are recompiling their kernel to play a game?

Some do anyway, as it’s the only way they can get OpenGL working - but
hopefully, you should rarely need this with recent distros…

Besides, this stuff (kernel preemption) is already in Linux 2.5, so you
won’t need any patches or anything after that is released.

Anyway, Linux/lowlatency or the preemptive kernels just cut the
occasional peaks that you’ll see if you try to get real time scheduling
on a heavily loaded system. They’re not required to get the average
scheduling latency down to the current fraction of a millisecond.

So, unless your game is playing some 50 tracks of CD quality WAV audio
from the hard drive, you’ll most probably be fine with the current
standard kernels. There might be a dropped frame occasionally, but it
should definitely not be any worse than with any Windows version.

We’ve been doing audio processing like this for quite a while (<3 ms
latency) with no drop-outs ever, despite heavy system stress of all
kinds. I’d say that anyone who claims that a “desktop system” could
never do that is plain wrong - I have seen it in action on various
PCs.

I would not say that, and neither did the previous poster. His argument
was not that systems can never do this at all, but that
CURRENT systems are not able to do this.

Right, I was losing track a bit there - but the fact still is that even
Windows will schedule your threads in time. I don’t consider the
occasional dropped frame a valid reason to entirely give up on smooth
animation.

Of course, some exotic build of
some kind of OS can do anything you want… But that is just not your
audience at this moment. You could just as well develop for Amiga
again, then you’ll get fixed framerates, for example…

Well, if people can put up with just an occasional minor hiccup (rather
than the usual constant tearing and/or jittering), my solution should
work just great on Windows, BeOS, Mac OS, Mac OS X and even Linux with
some drivers.

My reasoning was basically about never dropping a frame - and of
course, that kind of quality cannot be achieved without a real time OS.
Linux/lowlatency and BeOS should be able to do it, but not standard
Linux, or any Windows version. Don’t know about the later versions of Mac
OS X - could do it eventually, maybe.

The fact that Windows can’t do it reliably, even with a real time
kernel, does not prove anything other than that there is lots of
misbehaving code in it and its drivers.

Hey, hey, at least it has got accelerated s/w->h/w blits + vsync’ed
flips. Let’s start at the beginning: First have that, then talk about
misbehaving code and drivers?

Yep - but that reasoning was about hard real time (as in “never ever
dropping a frame”), and people are already using Linux for that kind of
stuff in industrial applications. (Mostly RTLinux and RTAI, but
Linux/lowlatency is used alone or together with one of those in some
cases.)

Some are using NT + some RTK as well, but they usually pay with one or
two serious failures (broken machines, chemicals all over the place,
fire, possibly personal injuries or even death…) and then switch to QNX
or some other real RTOS, or use dedicated hardware for the most critical
parts if possible.

It seems like you are assuming that you know that each
frame can be rendered in roughly the same amount of time.

I am, in the case of 2D games. Indeed, you could max out a 3D card
with 2D levels as well, but that would be a very crowded screen
with any reasonably modern card… :)

I have maxed out a 3D card (GeForce 2 MX 400 with 32 MB) blitting a
16-bit 4-layer parallax scrolling 2D-game-like screen; it could only
do 30 fps on an Athlon 1.2 GHz??? Of course, I was using plain OpenGL,
no optimization, but I mean: WTF? Okay, I’ll probably need to dive more
into OpenGL for that :)

What did you expect? Not even a Quake 3 level designed by a sane person
will abuse a card like that! ;)

Of course, fill rate is not unlimited - but even my G400 can fill a
640x480 screen a few times per frame up to 100 fps or so. Triangle count
certainly isn’t a problem, so just get in there and do some basic 2D
overdraw elimination. :)

Oh, and if you’re lazy, the Z-buffer should help a great deal…

Level
designers can really mess that up. One long hall with too many
polygons and your frame rate can drop by 90%…

Yeah - but that sounds more like an example of poor level design to
me. (I did read up some on 3D level design.)

And that really depends on what the intended audience of the game is.
Dropping your fps by 90% when it could’ve been 400 fps in a normal
scene isn’t that bad, is it? (I mean, an occasional 40 fps is not
really noticeable). So, just up your minimum system specs for your
engine, the level designers will work with that as a minimum.

Yeah, you’re right about that of course.

Still doesn’t mean it doesn’t apply to 2D games, or games that are meant
to maintain a steady full frame rate; just select your minimum system and
start building levels for it…

Either way, if the frame rate drops, so be it. The only way to completely
avoid the risk of that is to very carefully tune the game for a
specific machine, and then never run it on anything that could
possibly be slower under any circumstances.

An arcade game would be an example of such a solution - but
considering that even my dog slow G400 seems to be fast enough for
what I want to do, I’d guess 90% of the users with OpenGL would get a
constant full frame rate with the kind of games I’m thinking about.

It looks like you are also
making assumptions about the quality of the video device driver.
There is a good chance (almost certain) that the driver will only do
a swap at vsync time. A lot of hardware won’t swap at any other
time.

Well, the last machine I saw that was at all capable of anything but
that was the Amiga, which didn’t have buffered video DMA pointers.
You had to use the copper to change them when you desired. Usually,
you’d just reset them to the start of the screen some time during the
retrace, but you could change the pointers at any time, to implement
independently h/w scrolled splitscreens and the like.

PC video cards just latch the frame buffer offset/pointer into an
internal counter right when it’s about to start a refresh. You can’t
force them to latch during the refresh - and it would be useless
anyway, as it would just result in the output restarting at the
first pixel of the buffer immediately when you write the
register… heh

Eeehmm… He meant: A lot of hardware
won’t swap at any other time than vsync, meaning you’ll have to wait
for the next vsync before your buffer is drawn. He was wondering
whether or not a lot of drivers are able to start a swap before the
screen is done drawing.

Any driver that’s capable of triple buffering should be able to do it any
time before the next frame is to start.

Anyway, on the driver level, flipping and syncing are usually separate
things; flipping is always sync’ed automatically as a result of the
pointer/counter only being latched when the refresh starts, while
synchronization is about synchronizing the CPU with that event in one way
or another.

And this is possible: It is called the
’tearing’ effect?

Nope, that’s what you get if you “flip” by blitting into the frame buffer
without any synchronization. Most hardware can’t be made to do that with
real h/w pageflipping, as it would require that the flipping is
implemented through bank switching or similar, in a way that forces the
RAMDAC to fetch every pixel directly through the bank switching address
logic. (Or it would just go on and finish the frame it’s on, ignoring
anything you write into the registers, until it’s time for the next
frame.)

There is
also a good chance that the driver is implementing a triple buffer
scheme if it can.

In fact, that’s exactly what I would want, as that gives me a much
better chance of maintaining a steady full frame rate, even if the OS
has crappy timing and/or my rendering times are “jittering” near the
frame rate limit.

Windoze can do it, if enough hw-memory is available…

Yep, I know that too…

The driver writers are highly motivated to make
graphics run as fast as possible on the device. (That is how they
sell video cards after all.)

That’s also the way I like it. What’s the problem? (Of course,
disabling the retrace sync sucks, but then again, I don’t think I’ve
seen a Windows driver that does that by default… It’s only meant
for benchmarks.)

And, it also looks like you are assuming that a
hardware buffer swap involves an actual buffer swap; this also may
not be true, especially when dealing with a windowing system of any
kind. It can be faster to use the hardware to render in an off
screen buffer and do the swap with a blit.

I’m perfectly aware of that - as a matter of fact, my first subpixel
accurate scroller prototype was implemented on a system with a driver
that did “blit flips” even in fullscreen mode. That wasn’t a serious
problem, and I even managed to get retrace sync to work with it, to
some extent. (I never bothered switching to “half buffering”, so
there would be some tearing occasionally.)

Of course, we are still talking Linux/DOS here then?

Linux/XFree86 with modified Utah-GLX drivers. (I never did OpenGL
programming in DOS… :)

Otherwise, I don’t
see the use, as opposed to the DirectX-native flips?

Right.

Even if the driver gives you a flag or a
call back on vsync there is a reasonable chance that the call back
is generated from a timer or some other way that has nothing to do
with the actual vsync.

Well, that’s a broken driver IMHO. Nothing much to do about that -
unless it’s Open Source, of course.

Or unless the creators receive an e-mail from angry users/developers?

Yeah…

But indeed, if a driver promises to do this, why would it do it wrongly
in the first place? I mean, anyone would notice the crap appearing at
their screens immediately with 2D games, for example?

Yeah, seems strange to me as well… Why spend time setting up some kind
of “timer emulation” and not care to do it right? Trying to avoid
implementing a required feature for the platform by cheating,
perhaps…?

In general, you just can’t know enough to justify doing anything to
try to sync with vertical retrace.

There’s no way I can agree, considering what I’ve seen so far.

I’m planning to release some form of demo RSN. I’m quite sure it will
work just fine on any Windows machine with OpenGL, or any other system
with an OpenGL driver that does retrace sync’ed pageflipping.

And, even more obviously, if you don’t know the vsync, forget about
steady screen updates. (My sister gets sick of games not adhering to
the vsync)

That bad? Well, if nothing else, it indicates that these things make a
difference.

I have found that the best thing to do is just draw the graphics
based on actual elapsed time and check to see that the frame rate is
fast enough to give good game play.

Have you considered sub-pixel accurate rendering, and/or higher
resolutions? (Of course, both more or less require OpenGL.)

But a higher resolution would just decrease the framerate, right?

Unless you still have power to max out the frame rate… And I’m thinking
about “traditional” 2D action game resolutions such as 320x240 - so
"higher" resolutions could include even 640x480.

Besides, if you design an action game for 640x480 or higher, you’ll
probably not use detail smaller than some two pixels “frequency wise”, so
the interpolation “pulsar effect” wouldn’t be a problem anyway.

And sub-pixel accurate rendering will do nothing more than making
compensations for lost/won time more smoothly visible? If that’s what
you meant with the above comment, then Ok…

Well, it will simply place objects exactly where they should be every
frame, rather than at the nearest integer pixel position. Whenever
scrolling or movement speeds are not integer multiples of the rendering
frame rate, this makes a big difference, especially for low scrolling
speeds. It’s possible to make a background scroll so slowly that you can
hardly see that it’s moving without checking the edges - and still have
it completely smooth.
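
A tiny made-up example of the difference, using the 24.8 fixed point coordinates mentioned earlier in the thread (not the engine’s real code):

#include <stdio.h>

/* 24.8 fixed point: the OpenGL path keeps the fraction, the plain 2D
 * blitter path truncates it to a whole pixel. */
static float to_float(int c) { return (float)c / 256.0f; }
static int   to_pixel(int c) { return c >> 8; }

int main(void)
{
    int pos = 0;                      /* 24.8 fixed point position */
    for (int frame = 0; frame < 8; frame++) {
        printf("frame %d: float %.3f, integer %d\n",
               frame, to_float(pos), to_pixel(pos));
        pos += 64;                    /* 0.25 pixels per rendered frame */
    }
    /* The integer column moves one pixel every fourth frame (visible
     * stepping); the float column moves a little every frame. */
    return 0;
}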

Anyway, the basic version of my “magic scroller” is to do exactly what
you’re describing. Provided you’re using OpenGL and a sensible
resolution on a decent machine, the frame rate most likely will be
sufficient - and if you pass the fractional part of your coordinates
on to OpenGL, there you go: Perfectly smooth scrolling!

That’s pretty much all there is to it, for machines that are fast
enough to take advantage of it. For slower machines, it’ll work just
like what you’re suggesting - although it might be a good idea to
disable the sub-pixel accuracy, as it’ll probably generate some
blurring, if anything.

Blur?

Caused by the interpolation, whenever you don’t happen to hit right on
integer pixel positions. It’s not really visible as long as the graphics
data is free of very small detail and/or of graphics rendered or drawn
without proper antialiasing.

//David Olofson — Programmer, Reologica Instruments AB

On Tuesday 19 March 2002 03:17, Martijn Melenhorst wrote: