SDL X11 blit/update performance tweaking?

Hi,

I’ve run into an unexpected performance bottleneck with SDL on my
linux/X11 system. The problem is that either SDL_UpdateRect() with
the X11 driver or SDL_BlitSurface() with the DGA driver and a
SDL_HWSURFACE takes over 25ms for a screen update. However, my total
time budget per frame is 20ms, and there’s a little more to be done
than just updating the screen :-).
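
(For concreteness, here is a sketch of the kind of loop one would use to
measure this; it is not the actual app code:)

#include <stdio.h>
#include "SDL.h"

/* Timing sketch: average over many full-screen updates, since
   SDL_GetTicks() granularity can be coarse (~10ms) on some systems. */
int main(void)
{
    SDL_Surface *screen;
    Uint32 t0;
    int i;

    if (SDL_Init(SDL_INIT_VIDEO) < 0)
        return 1;
    screen = SDL_SetVideoMode(1024, 768, 32, SDL_SWSURFACE);
    if (screen == NULL)
        return 1;
    t0 = SDL_GetTicks();
    for (i = 0; i < 100; i++)
        SDL_UpdateRect(screen, 0, 0, 0, 0);  /* 0,0,0,0 = whole screen */
    printf("%.2f ms per update\n", (SDL_GetTicks() - t0) / 100.0);
    SDL_Quit();
    return 0;
}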

A few facts:

  • P4 1.7, i845E, GeForce2 GTS
  • linux 2.4.20 (with a few patches and hacks that shouldn’t affect X)
  • XFree86 4.1.0 (oldish, but so is the GeForce2 ;-)
  • 32bpp X visual, updated from a 1024x768 32b ‘native’ format surface
  • the surface flags do not report SDL_HWACCEL (with X11 or DGA).

Dropping depth or resolution isn’t really an option. And 25ms seems
too much anyway (and in theory unnecessary with HW acceleration and
async X operation).

I’m open to any possible solution for this, including changing
hardware or linking to a hacked version of SDL if necessary (this app
is for a custom embedded system, so anything goes). I’d really,
really appreciate suggestions for improvement. Any SDL programming
or driver tricks? Are there better Xserver / videocard combos (with
HWACCEL)? Can SDL be forced to do async X updates?

Thanks!

  • Reinoud

The flags don’t indicate hardware acceleration because there is none under
X11 normally. Also, if you use the DGA driver, you must be root.
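
As an aside, and from memory: SDL selects the DGA backend through the
SDL_VIDEODRIVER environment variable, so something like

SDL_VIDEODRIVER=dga ./yourapp

run as root should exercise it (‘yourapp’ is a placeholder).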

Some possible options include lowering the color depth or resolution. A
better solution would be to switch to some sort of dirty update: if much
of your screen isn’t changing, there is really no need to update those
parts.
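
A sketch of the idea with SDL 1.2’s SDL_UpdateRects(); the helper names
and the 64-rectangle cap are made up:

#include "SDL.h"

/* Collect the rectangles that changed this frame and push only those,
   instead of the whole 1024x768x32 frame. */
#define MAX_DIRTY 64

static SDL_Rect dirty[MAX_DIRTY];
static int ndirty = 0;
static int overflow = 0;

static void mark_dirty(Sint16 x, Sint16 y, Uint16 w, Uint16 h)
{
    if (ndirty < MAX_DIRTY) {
        SDL_Rect r;
        r.x = x; r.y = y; r.w = w; r.h = h;
        dirty[ndirty++] = r;
    } else {
        overflow = 1;                    /* too many: fall back below */
    }
}

static void flush_dirty(SDL_Surface *screen)
{
    if (overflow)
        SDL_UpdateRect(screen, 0, 0, 0, 0);     /* whole screen */
    else if (ndirty > 0)
        SDL_UpdateRects(screen, ndirty, dirty); /* one server round trip */
    ndirty = 0;
    overflow = 0;
}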

Steve

On February 19, 2003 12:30 pm, Reinoud wrote:

[snip]

How are you loading your surfaces? Are these images in memory or off a
hard drive, or at worst a CDROM or Flash Card? You have to consider the
size of the image in the transfer: at 1024x768x32bpp, each surface is
1024 × 768 × 4 bytes ≈ 3 MB.

Robert

On Wed, 2003-02-19 at 10:00, Reinoud wrote:

[snip]

Folks,

What I’m looking for is support for DMA transfer from system memory
to video memory. Even during the PCI era (for video cards), several
graphics chips already supported bus-mastering DMA for this.
However, it seems like my current XFree/AGP/GeForce setup spends most
of its time waiting for the CPU to do the transfer… Ouch.

I have already ‘fixed’ SDL to do async updates by simply changing
some XSync() calls to XFlush(); so far so good. But the CPU time spent in
the X server is still the same (naturally)…

I find it hard to believe that there is no X server / video card
combo that supports DMA from system to video memory. (But maybe this
is the wrong list for this subject; with SDL doing sync X updates out
of the box it wouldn’t matter much for SDL apps anyway).

  • Reinoud

Most video cards under X do support it, when using a DGA video mode. I
believe the architecture of X11 itself makes this difficult in any other
situation.

Someone correct me if I’m wrong, please!

On Thu, 2003-02-20 at 06:53, Reinoud wrote:

I find it hard to believe that there is no X server / video card
combo that supports DMA from system to video memory. (But maybe this
is the wrong list for this subject; with SDL doing sync X updates out
of the box it wouldn’t matter much for SDL apps anyway).

Have your fixes made it into the main branch of code? If not, could you
let me know where these changes need to be made? I would love to
experiment with this. Currently, at certain points I need to display
full screen 800x600x32bpp images, and I get decent speed, but more is
always better.

Robert

On Thu, 2003-02-20 at 06:53, Reinoud wrote:

[snip]

If your modifications seem correct to the gurus, why not make a patch?

Julien

[snip]

On Thu, 2003-02-20 at 06:53, Reinoud wrote:

[snip]
I find it hard to believe that there is no X server / video card
combo that supports DMA from system to video memory. (But maybe this
is the wrong list for this subject; with SDL doing sync X updates out
of the box it wouldn’t matter much for SDL apps anyway).

Yes, it is surprising. But, to the best of my knowledge that is the
state of the world. The path to high performance graphics in X has
always been to lean on a 3D API such as OpenGL or PEX that allows you to
do all the hard part through a kernel driver with just a little
negotiation with the X server to find out which bits of the frame buffer
the driver can touch.

There is actually a logical chain of events that led X down the
slippery slope to the current situation. It all starts with the
assumption that X will run on a “3M” machine, a computer with a Megabyte
of RAM, a Million (1 bit) pixels, and a processor that can execute a
Million instructions per second. And, that the applications don’t run on
the same machine that X runs on. Having lived through it I can write
pages about it, but no one really wants to hear all that. So, the answer
to your question is that if you want high performance graphics, even 2D
graphics, under X, use OpenGL.
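
To make that concrete, here is a sketch of the usual trick in SDL 1.2
terms: keep drawing into a plain 32bpp buffer, upload it as a texture,
and draw one screen-filling quad. The sizes and the RGBA layout are my
assumptions, and the texture is 1024x1024 because cards of that era want
power-of-two dimensions. It assumes SDL_SetVideoMode(W, H, 32, SDL_OPENGL)
and glEnable(GL_TEXTURE_2D) were done at init:

#include "SDL.h"
#include "SDL_opengl.h"

#define W   1024
#define H   768
#define TEX 1024   /* power-of-two texture holding the 1024x768 frame */

static GLuint make_stream_texture(void)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);   /* stays bound for present() */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    /* allocate once; the per-frame uploads use glTexSubImage2D */
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, TEX, TEX, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    return tex;
}

static void present(const Uint32 *pixels)
{
    const GLfloat s = (GLfloat)W / TEX, t = (GLfloat)H / TEX;

    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, W, H,
                    GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    glBegin(GL_QUADS);
    glTexCoord2f(0, 0); glVertex2f(-1.0f,  1.0f);
    glTexCoord2f(s, 0); glVertex2f( 1.0f,  1.0f);
    glTexCoord2f(s, t); glVertex2f( 1.0f, -1.0f);
    glTexCoord2f(0, t); glVertex2f(-1.0f, -1.0f);
    glEnd();
    SDL_GL_SwapBuffers();   /* the driver can DMA the upload */
}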

Bob Pendleton

I find it hard to believe that there is no X server / video card
combo that supports DMA from system to video memory. (But maybe this
is the wrong list for this subject; with SDL doing sync X updates out
of the box it wouldn’t matter much for SDL apps anyway).

Most video cards under X do support it, when using a DGA video mode. I
believe the architecture of X11 itself makes this difficult in any other
situation.

Someone correct me if I’m wrong, please!

From what little I know about X, this sounds like something that would
only work if you’re using the MIT shared memory extension. Otherwise,
X is unaware that the client and server are running on the same
machine.
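
Roughly, the shared-memory path looks like this (a sketch from memory,
error handling omitted; I believe SDL’s X11 backend already uses this
when the extension is available):

#include <X11/Xlib.h>
#include <X11/extensions/XShm.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* The pixels live in a SysV shared memory segment that the X server
   attaches to, so XShmPutImage avoids copying ~3 MB through the
   client/server socket on every update. */
XImage *create_shm_image(Display *dpy, Visual *vis, int depth,
                         int w, int h, XShmSegmentInfo *shminfo)
{
    XImage *img = XShmCreateImage(dpy, vis, depth, ZPixmap,
                                  NULL, shminfo, w, h);
    shminfo->shmid = shmget(IPC_PRIVATE,
                            img->bytes_per_line * img->height,
                            IPC_CREAT | 0600);
    shminfo->shmaddr = img->data = shmat(shminfo->shmid, NULL, 0);
    shminfo->readOnly = False;
    XShmAttach(dpy, shminfo);   /* the server maps the same segment */
    return img;
}

/* Per frame: draw into img->data, then
   XShmPutImage(dpy, win, gc, img, 0, 0, 0, 0, w, h, False); */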

Ron Steinke

“The sound of gunfire, off in the distance. I’m getting used to it now.”
– Talking Heads

Robert Diel wrote:

Have your fixes made it into the main branch of code?

Are you kidding? Did you miss the scare quotes around ‘fixed’ in my
post? Making the updates asynchronous breaks the default semantics
and works only if you can deal with unfinished outstanding operations
somehow (which is no problem with my app but will be for many).

If not, could you
let me know where these changes need to be made? I would love to
experiment with this. Currently, at certain points I need to display
full screen 800x600x32bpp images, and I get decent speed, but more is
always better.

Like I said, as long as there is no hardware DMA there is no speed
advantage. I just read that there is such XFree DMA support for some
old ATI card; it also occurred to me that this whole problem would be
moot on a mobo with some on-board UMA-type video hardware (i.e.
video mem in system RAM; no slow bus to pass through).

Anyway, if you want to play around, look in:

SDL-1.2.5/src/video/x11/SDL_x11image.c

and replace

XSync(GFX_Display, False);

in the proper update function for your system with

XFlush(GFX_Display);
blit_queued = 1;

Yes, that’s the SDL_ASYNCBLIT code that sits right next to it; I
haven’t figured out why SDL doesn’t grant requests for
SDL_ASYNCBLIT…
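
(For reference, requesting it should just be a matter of OR-ing the flag
into the mode request, e.g. SDL_SetVideoMode(1024, 768, 32, SDL_SWSURFACE |
SDL_ASYNCBLIT), and then checking whether the returned surface actually
carries SDL_ASYNCBLIT in its flags.)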

Have fun,

  • Reinoud

Bob Pendleton wrote:

So, the answer
to your question is that if you want high performance graphics, even 2D
graphics, under X, use OpenGL.

Okay, you’re the second one suggesting that to me today, so it must
be the path to enlightenment :-). Funny situation this with X!

Thanks man,

  • Reinoud

Bob Pendleton wrote:

So, the answer
to your question is that if you want high performance graphics, even 2D
graphics, under X, use OpenGL.

Okay, you’re the second one suggesting that to me today, so it must
be the path to enlightenment :-). Funny situation this with X!

It should be a warning to all system architects. Much of the X
architecture was based on assumptions about the machines that were going
to be available in the future, but they ignored the machines that were
going to be available AFTER the future arrived.

I started working with X11R3 in about '86 or '87. That was the third
release of X11. Before X11 was X10, and before X10 was X9, and so on to
X. And, before X, was W. So, development of X is at least 20 years old.
It is hard to plan 20 years in the future when it comes to computers. It
is amazing that X has held up as well as it has.

What is needed is a complete reexamination of the architecture of X11.
There is no way we can get rid of the semantics of the X APIs and if you
try to get rid of network transparency people will hunt you down… But,
the architecture that implements it all CAN be updated to match modern
hardware. But, no one really wants to do that. And, as hardware gets
better and better the problems with X become less and less important.

Unless you want to change it, you’ll have to live with it.

Bob Pendleton

Reinoud wrote:

Bob Pendleton wrote:

So, the answer
to your question is that if you want high performance graphics, even 2D
graphics, under X, use OpenGL.

Okay, you’re the second one suggesting that to me today, so it must
be the path to enlightenment :-). Funny situation this with X!

And as a (relatively) ignorant SDL user, I can vouch for the fact that
SDL/X/OpenGL really flies with a well-supported video card (e.g.
nVidia). And OpenGL is way cool (in gamer-talk). We are so lucky!

Gib

I’m baffled.

Some good soul from XFree86.Org told me that there’s no 2D DMA
support in XFree servers (because it would require kernel support,
good point). However, he suggested that nvidia’s closed source
drivers should have all the 2D acceleration goodies.

So of course I tried the nvidia drivers.

What gives? No acceleration in SDL with the X11 driver. At all.
(Yes, the nvidia drivers are installed and working, everything checks
out correctly, and OpenGL flies).

In the meantime, I found out about SDL_GetVideoInfo(). Here’s the
result, which is the same for both XFree/nv and nvidia drivers:

SDL_GetVideoInfo():
hw_available = 0
wm_available = 1
blit_hw = 0
blit_hw_CC = 0
blit_hw_A = 0
blit_sw = 0
blit_sw_CC = 0
blit_sw_A = 0
blit_fill = 0
video_mem = 0

See? No acceleration.
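
In case anyone wants to reproduce this, a rough sketch of the dump loop
that produces such a listing (the formatting is ad hoc):

#include <stdio.h>
#include "SDL.h"

/* Query the capability bits of the current video driver (before
   SDL_SetVideoMode) and print them. */
int main(void)
{
    const SDL_VideoInfo *vi;

    if (SDL_Init(SDL_INIT_VIDEO) < 0)
        return 1;
    vi = SDL_GetVideoInfo();
    printf("SDL_GetVideoInfo():\n");
    printf("hw_available = %u\n", (unsigned)vi->hw_available);
    printf("wm_available = %u\n", (unsigned)vi->wm_available);
    printf("blit_hw      = %u\n", (unsigned)vi->blit_hw);
    printf("blit_hw_CC   = %u\n", (unsigned)vi->blit_hw_CC);
    printf("blit_hw_A    = %u\n", (unsigned)vi->blit_hw_A);
    printf("blit_sw      = %u\n", (unsigned)vi->blit_sw);
    printf("blit_sw_CC   = %u\n", (unsigned)vi->blit_sw_CC);
    printf("blit_sw_A    = %u\n", (unsigned)vi->blit_sw_A);
    printf("blit_fill    = %u\n", (unsigned)vi->blit_fill);
    printf("video_mem    = %u\n", (unsigned)vi->video_mem);
    SDL_Quit();
    return 0;
}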

The only situation which gives any acceleration at all is SDL’s DGA
driver in combination with the nvidia XFree driver:

SDL_GetVideoInfo():
hw_available = 1
wm_available = 0
blit_hw = 1
blit_hw_CC = 1
blit_hw_A = 0
blit_sw = 0
blit_sw_CC = 0
blit_sw_A = 0
blit_fill = 1
video_mem = 32576

Two points: 1) it makes no sense to me that these features are
accelerated with DGA only, and 2) they are useless to me; I need
‘blit_sw’.

Yes, I’ll try the OpenGL route now, but this situation makes no sense
to me: no 2D acceleration with the SDL X11 driver. Has anyone seen
anything better than what I found?

  • Reinoud

Well, the NVidia driver does have 2D acceleration, in the form of an
accelerated Render extension, if you enable it in the X config. The
catch is that the rendering has to go through the Render extension to be
accelerated. I don’t know whether SDL uses the Render extension, but I
doubt it.

In fact, I don’t know enough about the Render extension to tell you
specifically what it can accelerate, but I do notice when it isn’t
enabled. It is not enabled by default, because it is still some kind of
testing code. Earlier driver versions used XAA acceleration, but that
no longer exists in the newer versions.

To see which options the NVidia drivers accept, look at the nvidia-glx
documentation; it will tell you how to set up your X config.
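
If I remember right, the switch goes in the Device section of the X
config; the option name below is from memory, so double-check it against
the driver README:

Section "Device"
    Identifier "NVIDIA Card"
    Driver     "nvidia"
    Option     "RenderAccel" "true"   # experimental; off by default
EndSection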

On Friday 21 February 2003 23:44, Reinoud wrote:

I’m baffled.

Some good soul from XFree86.Org told me that there’s no 2D DMA
support in XFree servers (because it would require kernel support,
good point). However, he suggested that nvidia’s closed source
drivers should have all the 2D accceleration goodies.

So of course I tried the nvidia drivers.

What gives? No acceleration in SDL with the X11 driver. At all.
(Yes, the nvidia drivers are installed and working, everything checks
out correctly, and OpenGL flies).


SDL doesn’t use any acceleration on the X11 driver, IIRC. I’ve got no
idea whether there’s a reason for this or whether there are plans to
implement it in the future.



Okay, here are the results so far of my quest for fast 2D (DMA)
system to video memory transfers with X on linux.

The options seem to be:

  • NVidia’s closed source drivers in combination with fast GeForces.
    These drivers actually use AGP DMA transfers for the various PutImage
    calls. However, it looks like they need fairly fast AGP (4x or more)
    and video card (GF2 or more) to see significant improvements over the
    XFree nv driver. My setup (P4 + i845E + GF2GTS, at AGP4x) sees
    speedups in the range of 25 - 100% with the nvidia driver, for image
    sizes of 1 - 3 MB. More recent (GF4) video cards are reportedly
    faster at this. Note that DMA is not done in the background; the X
    server uses 100% CPU during the transfer.

  • Support for the i845G seems to be getting there in XFree (4.3.0).
    I haven’t tried it yet, but transfers to video memory should be fast
    (sharing video and system memory).

  • Finally, I noticed that performance of DGA with the XFree nv driver
    is quite good if, and only if, the image size is relatively small
    (<1MB).

Thanks to everyone who responded - especially Mark Vojkovich for his
extremely helpful email support!

  • Reinoud

Thanks to everyone who responded - especially Mark Vojkovich for his
extremely helpful email support!

Thanks for following up on this! :)

See ya,
-Sam Lantinga, Software Engineer, Blizzard Entertainment