Fullscreen video performance vs. running in a window & latency issues

Hi,

(for linux-audio-dev readers: not directly audio related, but low-latency
related, and we want our realtime audio apps to deliver reliable video too :) )

I have a few questions about the actual performance you can achieve doing
graphics output using direct access techniques.

In the case of SDL in fullscreen: is the X server still involved in the
displaying process?
That means: can you achieve true hardware speed by writing directly into
the framebuffer memory (double buffering) and then flipping buffers,
or are you forced to write to an intermediate graphics buffer, which is
updated by the X server?

I ask this because Linux is gaining low-latency capabilities
(patches are available, and will very probably go into kernel 2.4)
thanks to the exceptional work of Ingo Molnar, who found the problematic,
high-latency-causing spots in the kernel.

This feature is wonderful for audio since it allows one to run
realtime audio processing with very low latency (<3 ms on a P133!)
extremely reliably, even when you stress the machine with huge
disk transfers, graphics output, etc.

See the whole story, accurate tests with charts, the kernel patch,
and my latency measurement tool on my page:

http://www.gardena.net/benno/linux/audio

Now to my video questions:
typical video latencies are much higher than audio latencies:
  realtime audio: < 5-10 ms
  realtime video: < 20-50 ms

You can only achieve rock-solid low latencies
by using SCHED_FIFO scheduling in your process and making sure that
its pages are locked in memory using mlock() / mlockall().
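
For illustration, a minimal sketch of that setup in C (hedged: assumes a
2.2-era Linux with the low-latency patch and root privileges; error paths
kept short):

    #include <stdio.h>
    #include <sched.h>
    #include <sys/mman.h>

    /* Promote the calling process to SCHED_FIFO and lock all of its
     * pages so neither the scheduler nor the pager can delay it. */
    int become_realtime(void)
    {
        struct sched_param p;
        p.sched_priority = sched_get_priority_max(SCHED_FIFO);
        if (sched_setscheduler(0, SCHED_FIFO, &p) < 0) {
            perror("sched_setscheduler"); /* EPERM when not root */
            return -1;
        }
        if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
            perror("mlockall");
            return -1;
        }
        return 0;
    }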

When the system is stressed with disk access, regular processes
(SCHED_OTHER) can experience bad latencies because the disk driver
/ bdflush can steal CPU time from these processes (especially in the
EIDE disk case).
And one of these processes could be your X server.
The idea of running your X server SCHED_FIFO / SCHED_RR and mlock()ed in
memory is simply silly, because it could eat up your entire memory and/or
freeze your system (during heavy font rendering, etc.).

A typical example of high-performance video playback + rock-solid low
latency could be DVD playback (by a software-only decoder) without frame
dropouts/audio dropouts while something is running in the background.

Assume that your DVD software player uses 50% of the CPU for decoding.
If, in fullscreen mode, SDL (or Xv / DGA) allows you to write directly
to the framebuffer, then a low-latency-enabled kernel would provide
wonderful performance: the video would not lose any frames/audio fragments
as long as you run SCHED_FIFO + mlock()ed, even if you do heavy disk access
and/or CPU work.

What about the case where I watch my DVD movie in a window?
Is the X server now involved? (= potential frame losses due to other
processes running in the background?)
Or is there a way to use an overlay mask on GFX boards which support this?
It would be ideal to be able to say "give me direct access to the 300x200
rectangle starting at position (150,170) and leave the X server out of the
business". If the user changes the position/size of the video window, just
update the overlay region on the gfx board to reflect the changes.

Could the SDL / Xv / DGA experts please explain the actual / planned
capabilities in this field?

Of course these features would put Linux far ahead in the realtime video
field, and on a level playing field with BeOS. (Linux becoming the media
OS soon :) )

regards,
Benno.

In the case of SDL in fullscreen: is the X server still involved in the
displaying process?

No.

That means: can you achieve true hardware speed by writing directly into
the framebuffer memory (double buffering) and then flipping buffers,
or are you forced to write to an intermediate graphics buffer, which is
updated by the X server?

You can achieve true hardware speed by writing directly into framebuffer
memory and then flipping buffers. Note that in many cases true hardware
speed is slower than writing into an intermediate graphics buffer
and then letting the X server use the card’s blitter to perform an
accelerated blit from system memory.
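
For what it's worth, here is how the request looks through SDL 1.x (a
hedged sketch; whether you really get a flippable hardware surface or a
software shadow buffer depends on the target and driver):

    #include <stdio.h>
    #include "SDL.h"

    int main(void)
    {
        SDL_Surface *screen;

        if (SDL_Init(SDL_INIT_VIDEO) < 0) {
            fprintf(stderr, "SDL_Init: %s\n", SDL_GetError());
            return 1;
        }
        /* Ask for a page-flipped hardware framebuffer; SDL silently
         * falls back to a shadow surface that is blitted instead. */
        screen = SDL_SetVideoMode(640, 480, 16,
                                  SDL_FULLSCREEN | SDL_HWSURFACE |
                                  SDL_DOUBLEBUF);
        if (screen == NULL) {
            fprintf(stderr, "SDL_SetVideoMode: %s\n", SDL_GetError());
            SDL_Quit();
            return 1;
        }
        printf("got a %s surface\n",
               (screen->flags & SDL_HWSURFACE) ? "hardware" : "software");
        SDL_Flip(screen);   /* page flip, or blit of the shadow buffer */
        SDL_Quit();
        return 0;
    }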

I ask this because Linux is gaining low-latency capabilities
(patches are available, and will very probably go into kernel 2.4)
thanks to the exceptional work of Ingo Molnar, who found the problematic,
high-latency-causing spots in the kernel.

I am very much looking forward to working with this, as well as the new
X server architecture in XFree86 4.0. These alone should improve game
performance dramatically on Linux in the near future. :)

What about the case where I watch my DVD movie in a window?
Is the X server now involved? (= potential frame losses due to other
processes running in the background?)

Yes.

Or is there a way to use an overlay mask on GFX boards which support this?

There are currently no drivers that I know of that expose this directly
to user applications.

It would be ideal to be able to say "give me direct access to the 300x200
rectangle starting at position (150,170) and leave the X server out of the
business". If the user changes the position/size of the video window, just
update the overlay region on the gfx board to reflect the changes.

BeOS does this, but X does not at the moment.
The closest you can get to this under X11 is using the DGA fullscreen
extension, which SDL automatically supports via the SDL_FULLSCREEN flag
passed to SDL_SetVideoMode(). The DGA extension requires the
application to be running as root so it can memory map the video memory
into user-space.
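
One practical consequence (a hedged SDL 1.x sketch): with DGA the screen
surface is real video memory, so it must be locked around any direct
pixel access:

    #include <string.h>
    #include "SDL.h"

    /* Paint one gray scanline straight into the screen surface. */
    static void draw_scanline(SDL_Surface *screen, int y)
    {
        if (SDL_MUSTLOCK(screen) && SDL_LockSurface(screen) < 0)
            return;
        memset((Uint8 *)screen->pixels + y * screen->pitch, 0x80,
               screen->w * screen->format->BytesPerPixel);
        if (SDL_MUSTLOCK(screen))
            SDL_UnlockSurface(screen);
        SDL_UpdateRect(screen, 0, y, screen->w, 1);
    }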

Of course these features would put Linux far ahead in the realtime video
field, and on a level playing field with BeOS. (Linux becoming the media
OS soon :) )

It’s an exciting possibility. :)

–
-Sam Lantinga, Lead Programmer, Loki Entertainment Software

In the case of SDL in fullscreen: is the X server still involved in the
displaying process?

No.

Ah, that is very good to know!

That means: can you achieve true hardware speed by writing directly into
the framebuffer memory (double buffering) and then flipping buffers,
or are you forced to write to an intermediate graphics buffer, which is
updated by the X server?

You can achieve true hardware speed by writing directly into framebuffer
memory and then flipping buffers. Note that in many cases true hardware
speed is slower than writing into an intermediate graphics buffer
and then letting the X server use the card’s blitter to perform an
accelerated blit from system memory.

Hmmm, does anyone know how DirectX under windoze solves this?
If that approach is better, wouldn’t it be better to use the same features
in SDL?
Perhaps the XFree86 4.0 driver model gives us a device-independent API to
the hardware blitting capabilities, so that we can use this approach but
leave X out: write into main memory, and then call the BitBlt provided by
XFree86 4.0's driver, without going through the X event queue.

I ask this because Linux is gaining low-latency capabilities
(patches are available, and will very probably go into kernel 2.4)
thanks to the exceptional work of Ingo Molnar, who found the problematic,
high-latency-causing spots in the kernel.

I am very much looking forward to working with this, as well as the new
X server architecture in XFree86 4.0. These alone should improve game
performance dramatically on Linux in the near future. :)

I can imagine running a 50 fps game, and running other apps in the
background (which use the spare CPU cycles), without slowing down the game.

What about the case where I watch my DVD movie in a window?
Is the X server now involved? (= potential frame losses due to other
processes running in the background?)

Yes.

Or is there a way to use an overlay mask on GFX boards which support this?

There are currently no drivers that I know of that expose this directly
to user applications.

It would be ideal to be able to say "give me direct access to the 300x200
rectangle starting at position (150,170) and leave the X server out of the
business". If the user changes the position/size of the video window, just
update the overlay region on the gfx board to reflect the changes.

BeOS does this, but X does not at the moment.
The closest you can get to this under X11 is using the DGA fullscreen
extension, which SDL automatically supports via the SDL_FULLSCREEN flag
passed to SDL_SetVideoMode(). The DGA extension requires the
application to be running as root so it can memory map the video memory
into user-space.

Yes, but you are talking about fullscreen; when you do video editing, you
often have to run the GUI too, in order to provide control elements, or
maybe even multiple video windows.

I think supporting hardware overlays would help VERY much in the
video editing/playback (or gaming in a window :) ) field, making
Linux the OS of choice for these tasks.

Of course these features would put Linux far ahead in the realtime video
field, and on a level playing field with BeOS. (Linux becoming the media
OS soon :) )

It’s an exciting possibility. :)

Hopefully the XFree86 people will take this into account soon (by
supporting hardware overlays, etc.).

I’m sick of this “Linux as a desktop/multimedia OS just sucks” talk!

regards,

Benno.

On Sun, 31 Oct 1999, Sam Lantinga wrote:

In the case of SDL in fullscreen: is the X server still involved in the
displaying process?

No.

Ah, that is very good to know!

That means: can you achieve true hardware speed by writing directly into
the framebuffer memory (double buffering) and then flipping buffers,
or are you forced to write to an intermediate graphics buffer, which is
updated by the X server?

You can achieve true hardware speed by writing directly into framebuffer
memory and then flipping buffers. Note that in many cases true hardware
speed is slower than writing into an intermediate graphics buffer
and then letting the X server use the card’s blitter to perform an
accelerated blit from system memory.

Hmmm, does anyone know how DirectX under windoze solves this?

The same way. When in fullscreen mode, you can do just about as you
like: single buffer, double/triple buffer + flip, or single + blit
from offscreen. When in windowed mode, you get the whole screen as a
surface, so you have to keep track of your window’s position
yourself.

Note 1: DX handles the clipping for you (don’t know how exactly).

Note 2: some accelerators do not work in this mode.

If that approach is better, wouldn’t it be better to use the same features
in SDL?
Perhaps the XFree86 4.0 driver model gives us a device-independent API to
the hardware blitting capabilities, so that we can use this approach but
leave X out: write into main memory, and then call the BitBlt provided by
XFree86 4.0's driver, without going through the X event queue.

It has to go to the same accelerator hardware. I don’t know how
things will look with XFree86 4.0, but so far, X has been the driver,
so I can’t tell if you can avoid depending on the server’s thread…

Rip that layer out into a SCHED_FIFO thread? Risk of a synchronization
performance hit, I’m afraid…

Of course these features would put Linux far ahead in the realtime video
field, and on a level playing field with BeOS. (Linux becoming the media
OS soon :) )

It’s an exciting possibility. :)

Hopefully the XFree86 people will take this into account soon (by
supporting hardware overlays, etc.).

I’m sick of this “Linux as a desktop/multimedia OS just sucks” talk!

Well, so does Windows. Or is it OK that many machines for some
reason can’t even run fullscreen without stalling every now and then?
Or that Win95/98 blows sky high and reboots, where Linux throws the
application out with a tiny “Segmentation fault” message?

It seems to be OK for lots of users, but I don’t have the nerves for that
anymore… I want to be able to screw up without taking my system down
when hacking. The same goes for using audio plugins and editing
some audio…

What sucks with Linux is that there are too few usable applications
in this area - the rest is just about the performance limitations
that are now just about to be eliminated. :)

//David

·A·U·D·I·A·L·I·T·Y·   P r o f e s s i o n a l   L i n u x   A u d i o

·Rock Solid                                      David Olofson:
·Low Latency    www.angelfire.com/or/audiality   ·Audio Hacker
·Plug-Ins       audiality at swipnet.se          ·Linux Advocate
·Open Source                                     ·Singer/Composer

On Mon, 01 Nov 1999, Benno Senoner wrote:

On Sun, 31 Oct 1999, Sam Lantinga wrote:


In the case of SDL in fullscreen: is the X server still involved in the
displaying process?

No.

Not even the page flip? I wouldn’t think so… Yet another context
switch required for a very simple operation like a page flip.

You can achieve true hardware speed by writing directly into framebuffer
memory and then flipping buffers. Note that in many cases true hardware
speed is slower than writing into an intermediate graphics buffer
and then letting the X server use the card’s blitter to perform an
accelerated blit from system memory.

One case where this isn’t true is when you do some processing
between each machine word written to the framebuffer.

If you write one word (using PIO) right after another to the framebuffer,
you suffer a stall, which makes blitting a large area very slow. You can
avoid this by using the card’s blitter when available, which uses some
block-transfer facility like DMA. But if you have some processing to do,
you can do it inside the blitting loop, using the time that would
normally go into stalls.

But some chips (not all) let you do this processing in a preliminary
pass in main memory and then do a full blit right afterward, in the same
time or faster than this overlapping method.
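
A toy sketch of the two strategies (illustrative only; ">> 1" stands in
for real per-pixel work, and fb would point at a memory-mapped
framebuffer):

    #include <string.h>

    /* (a) Overlapped: do the per-pixel work between framebuffer writes,
     *     so the CPU chews on it while the previous PIO write drains. */
    void blit_overlapped(unsigned short *fb, const unsigned short *src,
                         int n)
    {
        int i;
        for (i = 0; i < n; i++)
            fb[i] = src[i] >> 1;
    }

    /* (b) Two-pass: finish the work in cached system memory first, then
     *     move it in one block (ideally via the card's blitter/DMA). */
    void blit_two_pass(unsigned short *fb, unsigned short *tmp,
                       const unsigned short *src, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            tmp[i] = src[i] >> 1;
        memcpy(fb, tmp, n * sizeof *tmp);
    }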

I am very much looking forward to working with this, as well as the new
X server architecture in XFree86 4.0. These alone should improve game
performance dramatically on Linux in the near future. :)

Those patches imply using the realtime scheduling facilities in the
kernel, which require root privs… Aww, will we ever break free from
those?

I very much await XFree86 4.0, though. I heard very good news from Raster
concerning Pixmap-to-Window XCopyArea (normally, it is just about as
fast as XShmImage-to-Window XShmPutImage, because the X server never
bothered to put Pixmaps in video memory and use the blitter (or is there
a way to get that? I’d like to know!))… And DGA 2.0 looks very
promising!

BeOS does this, but X does not at the moment.

DirectX also allows this (direct writing to video memory in a window),
but you can’t do page flipping, of course; you have to simulate it with
double-buffering.

–
Pierre Phaneuf
Ludus Design, http://ludusdesign.com/
“First they ignore you. Then they laugh at you.
Then they fight you. Then you win.” – Gandhi

Benno Senoner wrote:

Not even the page flip? I wouldn’t think so… Yet another context
switch required for a very simple operation like a page flip.

Agreed; if this is true, then hopefully XFree86 4.0 will avoid it
(by calling the blitter functions directly instead of going through the X
queue; correct me if I’m wrong).

No, I think XFree86 4.0 still does it that way. DGA 2.0 doesn’t seem to
load video card specific code in the application, instead asking the X
server to do this stuff. It only looks similar to DRI, it isn’t as
sophisticated (I do wish it was).

One case where this isn’t true is a case where you do some processing
between each machine word written to the framebuffer.

Yes, but often you do not render the data in a linear fashion.

Doesn’t matter. What matters is avoiding consecutive writes to the
framebuffer memory. If you can do something else in between, it is
better, since you do not waste time in stalls.

Those patches imply using the realtime scheduling facilities in the
kernel, which require root privs… Aww, will we ever break free from
those?

Yes, it would be nice for Linux to adopt the HP-UX way, by having a special
"realtime" group or something which you assign to "multimedia-enabled"
users, so that this group can call sched_setscheduler(), mlock(),
mlockall(), etc. Without this, “hard realtime” performance isn’t guaranteed.
I think this will definitely be needed, because we don’t want all our
desktop apps that require realtime, like games and audio/video apps, to run
as root, with the potential security problems and dangerous stuff that
implies. Agreed?

There is something being hacked on at this moment, a capability system,
that lets you give some powers to a user without giving him full root
access.
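
For the curious, a hedged sketch of how a privileged launcher could use
that interface (assumes libcap on a capabilities-aware kernel; whether a
"realtime group" ever maps onto this is pure speculation):

    #include <sys/capability.h>

    /* Keep only what SCHED_FIFO and mlockall() need; drop everything
     * else, so a crash or exploit cannot do arbitrary root damage. */
    int keep_rt_caps_only(void)
    {
        cap_value_t keep[2];
        cap_t caps;

        keep[0] = CAP_SYS_NICE;   /* sched_setscheduler(SCHED_FIFO) */
        keep[1] = CAP_IPC_LOCK;   /* mlock() / mlockall() */
        caps = cap_init();        /* start from an empty set */
        if (caps == NULL)
            return -1;
        if (cap_set_flag(caps, CAP_PERMITTED, 2, keep, CAP_SET) < 0 ||
            cap_set_flag(caps, CAP_EFFECTIVE, 2, keep, CAP_SET) < 0 ||
            cap_set_proc(caps) < 0) {
            cap_free(caps);
            return -1;
        }
        cap_free(caps);
        return 0;
    }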

DirectX also allows this (direct writing to video memory in a window),
but you can’t do page flipping, of course; you have to simulate it with
double-buffering.

How exactly is this done?
Overlay region 1 and then change the overlay pointer to region 2,
or directly blitting into the gfx board memory (syncing with the VBlank)?

Directly blitting into the video memory. Overlays are available in
DirectDraw, but very few cards have them, so requiring them would be
quite dumb.

–
Pierre Phaneuf
Ludus Design, http://ludusdesign.com/
“First they ignore you. Then they laugh at you.
Then they fight you. Then you win.” – Gandhi

Benno Senoner wrote:

What about the case where I watch my DVD movie in a window?
Is the X server now involved? (= potential frame losses due to other
processes running in the background?)

Yes.

Or is there a way to use an overlay mask on GFX boards which support this?

I have a TV tuner card (which allows watching TV on the computer).
It works in Windows, using a driver and application from the
manufacturer (Hauppauge). It also works in Linux, using the bttv driver
that came with Red Hat 6.0 and the Xawtv application running in X. In both
cases there is an option in the application to use “overlay” mode (and
still run in a window). In this mode, the TV application seems to
consume little or no CPU. The advantage of the non-overlay mode
is smooth movement when moving the window on the screen or when
partially covering the window with another application…

On Sun, 31 Oct 1999, Sam Lantinga wrote:

–
Cameron Layne

This might have been asked before, but: if you’re running an application in
fullscreen, like Quake 3 or any DGA program, and the program crashes, most
of the time the X server won’t be able to recover and you’ll be stuck with
a blank screen. Now, is there a way to recover from this? I thought I heard
someone say something about running a script that ran the program and then
ran some video reinitializer after the main program, so when the program
crashed, the next program in the script would be run and would reinit the
video. Does this sound right?

-Mongoose WPI student majoring in Computer Science
This message sent from Windoze… ugh.

ran some video reinitializer after the main program, so when the program
crashed, the next program in the script would be run and would reinit the
video. Does this sound right?

For 3Dfx based programs, many people would run Quake{n} inside of
wrappers that ran ‘pass’ or my ‘reset3Dfx’ or some such after
execution to make sure that the game wasn’t left in the pass-through
mode.

As far as regular 2D DGA games, I’m not sure.

m.

On Wed, Nov 03, 1999 at 03:37:35PM -0500, Garrett Banuk wrote:

–
Programmer "I wrote a song about dental floss,
Loki Entertainment Software but did anyone’s teeth get cleaner?"
http://lokigames.com/~briareos/ - Frank Zappa, re: the PMRC

Garrett Banuk wrote:

This might have been asked before, but: if you’re running an application in
fullscreen, like Quake 3 or any DGA program, and the program crashes, most
of the time the X server won’t be able to recover and you’ll be stuck with
a blank screen. Now, is there a way to recover from this? I thought I heard
someone say something about running a script that ran the program and then
ran some video reinitializer after the main program, so when the program
crashed, the next program in the script would be run and would reinit the
video. Does this sound right?

See XF86DGAForkApp.
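
A hedged sketch of using it (assumes the DGA 1.0 client library, linked
with -lXxf86dga; untested):

    #include <stdio.h>
    #include <X11/Xlib.h>
    #include <X11/extensions/xf86dga.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (dpy == NULL) {
            fprintf(stderr, "cannot open display\n");
            return 1;
        }
        /* Fork a watchdog before entering DGA mode: if the child (the
         * game) dies, the parent restores the display instead of
         * leaving the console blank. */
        XF86DGAForkApp(DefaultScreen(dpy));
        /* ... enter DGA mode and run the game here ... */
        XCloseDisplay(dpy);
        return 0;
    }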
–
Pierre Phaneuf
Ludus Design, http://ludusdesign.com/
“First they ignore you. Then they laugh at you.
Then they fight you. Then you win.” – Gandhi

On 31 Oct 99, at 23:18, David Olofson wrote:

The same way. When in fullscreen mode, you can do just about as you
like: single buffer, double/triple buffer + flip, or single + blit
from offscreen. When in windowed mode, you get the whole screen as a
surface, so you have to keep track of your window’s position
yourself.

I don’t think this is accurate. When in windowed mode, the
primary surface maps only the client area of your application
window, not the whole screen. And you don’t have to keep track of
your window’s position; the system does this for you, as usual.

Note 1: DX handles the clipping for you (don’t know how exactly).

Using a DirectDrawClipper object. This object is provided in
DirectDraw to create rectangular clipping regions in any surface. In
windowed mode, you just create a DirectDrawClipper object that
covers the entire primary surface and you’re done.
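
A hedged sketch of that setup through the C DirectDraw interface macros
(untested; assumes an already-initialized IDirectDraw and primary surface):

    #define WIN32_LEAN_AND_MEAN
    #include <windows.h>
    #include <ddraw.h>

    /* Attach a clipper that follows the application window, so blits
     * to the primary surface are clipped to the client area for us. */
    HRESULT attach_clipper(LPDIRECTDRAW dd, LPDIRECTDRAWSURFACE primary,
                           HWND hwnd)
    {
        LPDIRECTDRAWCLIPPER clipper;
        HRESULT hr = IDirectDraw_CreateClipper(dd, 0, &clipper, NULL);
        if (FAILED(hr))
            return hr;
        IDirectDrawClipper_SetHWnd(clipper, 0, hwnd); /* track window */
        IDirectDrawSurface_SetClipper(primary, clipper);
        IDirectDrawClipper_Release(clipper); /* surface keeps a ref */
        return DD_OK;
    }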


[]s, Andrei de A. Formiga
andrei at elogica.com.br
andrei at dee.ufpb.br

Hi all,

I have been transferring a project which uses SDL
from my main linux development box to another
machine (same kernel but completely different hardware).

When I try to run
my program, I get a “video memory protecting” message
on stdout, then it dies. I believe that I’ve seen this
before from SDL. I’m sure I’ve just forgotten something
stupid. Any clues as to where I should be looking (like
X colordepth etc) to circumvent this error?

Thanks for any ideas,

Steve Madsen
H2Eye Ltd
24-28 Hatton Wall
London EC1N 8JH
Tel: +44-171-404-9600
Fax: +44-171-404-9490
Email: steve at h2eye.com

I just fixed this last night…
check your #include line.

It looks like previous installations
of SDL put all of the include files
into the /usr/local/include directory
by default, but they now put them into
a /usr/local/include/SDL directory.
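
In other words (hypothetical paths matching the layout described above):

    /* old layout: SDL.h sat directly in /usr/local/include */
    #include <SDL.h>

    /* new layout: headers live in /usr/local/include/SDL */
    #include <SDL/SDL.h>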

–friar

Hi all,

I have been transferring a project which uses SDL
from my main linux development box to another
machine (same kernel but completely different hardware).

When I try to run
my program, I get a “video memory protecting” message
on stdout, then it dies. I believe that I’ve seen this
before from SDL. I’m sure I’ve just forgotten something
stupid. Any clues as to where I should be looking (like
X colordepth etc) to circumvent this error?

Thanks for any ideas,

Steve Madsen
H2Eye Ltd
24-28 Hatton Wall
London EC1N 8JH
Tel: +44-171-404-9600
Fax: +44-171-404-9490
Email: steve at h2eye.com

Hi all,

I have been transferring a project which uses SDL
from my main linux development box to another
machine (same kernel but completely different hardware).

When I try to run
my program, I get a “video memory protecting” message
on stdout, then it dies. I believe that I’ve seen this
before from SDL. I’m sure I’ve just forgotten something
stupid. Any clues as to where I should be looking (like
X colordepth etc) to circumvent this error?

“video memory protecting” comes from the X libraries, and is a
harmless message. The real problem is that you have a segmentation fault
in your code. Try running it in a window under gdb.

-Sam Lantinga				(slouken at devolution.com)

Lead Programmer, Loki Entertainment Software

“Any sufficiently advanced bug is indistinguishable from a feature”
– Rich Kulawiec