2D Accelerated Hardware Support?

OK, I’ve spent most of the day digging through FAQs and mailing
list archives and have yet to find a satisfactory answer to this
question: What is the easiest way to take advantage of 2D
accelerated video hardware on a Linux box? My hope was that
some combination of framebuffer drivers and SDL or some such
would do the trick, but I’m not having any luck finding a
solution. Am I missing something obvious? I have a multitude
of 2D/3D accelerated cards, all of which give me blazingly
fast blits under Windows/DirectX, but under Linux I seem
limited to copying individual pixels between direct buffers…
which is comparatively slooooow.

Please someone tell me there is a good answer to my dilemma.
With all the focus on hardware 3D support for Linux, I find
it hard to believe that accelerated 2D support has been
completely ignored. All I want is a freakin’ hardware
accelerated blit for goodness sake!

Thanks,

Thad

Newer X servers, especially XFree86 4.x, support acceleration. They may be
accelerating your blits.

On Wed, Jun 28, 2000 at 10:52:58PM -0500, Thad Phetteplace wrote:

Please someone tell me there is a good answer to my dilemma.
With all the focus on hardware 3D support for Linux, I find
it hard to believe that accelerated 2D support has been
completely ignored. All I want is a freakin’ hardware
accelerated blit for goodness sake!


“I know not with what weapons World War III will be fought, but I know
that World War IV will be fought with sticks and stones.” — Einstein

Thad Phetteplace wrote:

Please someone tell me there is a good answer to my dilemma.
With all the focus on hardware 3D support for Linux, I find
it hard to believe that accelerated 2D support has been
completely ignored. All I want is a freakin’ hardware
accelerated blit for goodness sake!

Well, after frustration led me to post this rather unfocused
rant, I continued digging into the issue and found that: 1)
some hardware acceleration drivers are supported under SDL, and
2) a mistake in my own code was making the problem seem worse
than it really was. I was forgetting the HWPALETTE flag when
initializing the video mode, causing the surface to end up in
system ram instead of video ram. I am still not getting
hardware acceleration on my Voodoo3 2000 PCI card, but the
direct memory copies seem fast enough for my needs. Now that
I have discovered and fixed the bug in my code, I think I will
go back and install my other cards and get timings on them as
well.

Is there a list anywhere of cards that support hardware accel
under SDL? Is this mostly a function of framebuffer drivers
or does this generally require additions to SDL for each card?
Any advice would be greatly appreciated.

Thanks,

Thad

Please someone tell me there is a good answer to my dilemma.
With all the focus on hardware 3D support for Linux, I find
it hard to believe that accelerated 2D support has been
completely ignored. All I want is a freakin’ hardware
accelerated blit for goodness sake!

Most of the more common video cards are supported with hardware
acceleration in 2D. In fact, most 2D accel was available before 3D
accel was available for a particular card.

Go to www.xfree86.org and read the documentation for each chipset to find
out if it is accelerated.

Brian

Is there a list anywhere of cards that support hardware accel
under SDL?

SDL is a programming library, not a device driver. For acceleration, you
need to check the actual video card driver layer.

If you are using X, check with them. If you are using the frame buffer,
check the documentation for that chipset in the framebuffer docs.

The only additional thing that is important to point out is that there are
features on each platform that may work better or worse depending on your
particular application that may or may not be used under SDL. For
example, SDL does not make use of DGA under XFree86 3.x (it does under
4.0).

Brian

Most of the more common video cards are supported with hardware
acceleration in 2D. In fact, most 2D accel was available before 3D
accel was available for a particular card.

Unless you are running SDL + Linux, of course.

Go to www.xfree86.org and read the documentation for each chipset to find
out if it is accelerated.

X11 acceleration != SDL acceleration. If you want 2D acceleration under
Linux, all of the following conditions must be fulfilled:

  1. XFree86 4.0 or later
  2. video hardware supported by XFree 4.0 (a limited set right now)
  3. video hardware for which XFree supports DGA2 (a further subset of 2)
  4. SDL must be configured to take advantage of DGA2
  5. root privileges!
  6. you don’t expect anything but rectangular blits/fills or colourkeyed
    blits to be accelerated

If so, then you will have 2D acceleration. Maybe the fbcon target is/will
be accel’d as well, I have no information about that.

Don’t interpret this as defeatism; see it as a challenge to improve the
current state of the art. I personally believe 5) is unacceptable, and more
operations (particularly alpha blits, stretching /rotating blits) should be
made available, when supported by the hardware. DRI shows some promise,
even though the support is narrow and the design not uncontroversial.

Mattias Engdegård wrote:

Don’t interpret this as defeatism; see it as a challenge to improve the
current state of the art. I personally believe 5) is unacceptable, and more
operations (particularly alpha blits, stretching /rotating blits) should be
made available, when supported by the hardware. DRI shows some promise,
even though the support is narrow and the design not uncontroversial.

I agree (with 5) being unacceptable and DRI showing some promise).

Which parts of the design of DRI are deemed controversial?

I believe that an X extension that would allow for some more direct
exposure of the acceleration hardware without giving access to the
framebuffer (which is the big thing needing root privileges) would be
more than welcome. Even in an indirect mode, the ability to do color-key
blitting from Pixmaps and to make sure a Pixmap is in video memory could
help some applications a lot.

The DRM part of the DRI could also be used to drop the root privileges
requirement of DGA (v1 and v2) and could add an optional direct
framebuffer access to the previously described extension concept (or
perhaps as a separate extension).

“Unix is the worst operating system; except for all others.”
– Berry Kercheval

Most of the more common video cards are supported with hardware
acceleration in 2D. In fact, most 2D accel was available before 3D
accel was available for a particular card.

Unless you are running SDL + Linux, of course.

Not exactly true. For the longest time, X has had XAA acceleration.
Sure - you could run svgalib GL Quake under Linux, but that’s a single
video card. During that time there was already quite a bit of work done
on making use of video card acceleration. And this has nothing to do
with DGA 1.0. There’s more to video card acceleration than just DGA, I
believe.

Below, you describe the conditions to get DGA 2.0. While DGA gives you
better acceleration, it doesn’t mean you aren’t using acceleration if you
aren’t using DGA. Do some research into X11 XAA. Read the documentation
for XFree86 3.X, you’ll find specifications describing which chipsets
are supported, which ones are supported with XAA acceleration, etc.

X11 acceleration != SDL acceleration. If you want 2D acceleration under
Linux, all of the following conditions must be fulfilled:

  1. XFree86 4.0 or later
  2. video hardware supported by XFree 4.0 (a limited set right now)
  3. video hardware for which XFree supports DGA2 (a further subset of 2)
  4. SDL must be configured to take advantage of DGA2
  5. root privileges!
  6. you don’t expect anything but rectangular blits/fills or colourkeyed
    blits to be accelerated

Not exactly true. For the longest time, X has had XAA acceleration.

XAA as a name and architecture is new in XFree 4.0. The XFree86 server has
had 2D acceleration for years, but that was not the point since SDL does
not use it. It applies to window to window copies, scrolling, some drawing
primitives, and (recently) to non-shared pixmaps; none of which SDL can
easily use.

Sure - you could run svgalib GL Quake under Linux, but that’s a single
video card. During that time there was already quite a bit of work done
on making use of video card acceleration. And this has nothing to do
with DGA 1.0. There’s more to video card acceleration than just DGA, I
believe.

I’m not aware that DGA 1 gave you any acceleration at all; it just
presented a dumb frame buffer that you could access over a slow bus.

Below, you describe the conditions to get DGA 2.0. While DGA gives you
better acceleration, it doesn’t mean you aren’t using acceleration if you
aren’t using DGA.

Unfortunately, for the purposes of SDL in 2D, it does. Without DGA2,
all you can do is to memcpy directly into video memory (with DGA1),
or use XShmPutImage to let the X server do the same thing, with
about the same (low) speed. (XFree 4.x does this apparently faster on
some hardware, but it still does not use hardware acceleration for blits.)

XAA as a name and architecture is new in XFree 4.0. The XFree86 server has
had 2D acceleration for years, but that was not the point since SDL does
not use it. It applies to window to window copies, scrolling, some drawing
primitives, and (recently) to non-shared pixmaps; none of which SDL can
easily use.

XAA has been around since 3.x, not 4.0. Here is a cut/paste from my
XFree86 3.3.6 server startup messages:

(--) SVGA: Using XAA (XFree86 Acceleration Architecture)
(--) SVGA: XAA: Solid filled rectangles
(--) SVGA: XAA: Screen-to-screen copy
(--) SVGA: XAA: 8x8 color expand pattern fill
(--) SVGA: XAA: CPU to screen color expansion (TE/NonTE imagetext, TE/NonTE polytext)
(--) SVGA: XAA: Using 12 128x128 areas for pixmap caching
(--) SVGA: XAA: Caching tiles and stipples
(--) SVGA: XAA: General lines and segments
(--) SVGA: XAA: Dashed lines and segments

I pulled the following quote from:
http://www.xfree86.org/3.3.6/MGA2.html#3

“Makes extensive use of the graphics accelerator. This server is very well
accelerated, and is one of the fastest XFree86 X servers”


You are talking about directly accessing video memory, not generic
acceleration.

SDL uses the underlying drivers (whether it’s a frame buffer, X11, etc.)
Under X11 - if your chipset is supported for XAA (even in 3.3.6) and you
do not set the “no_accel” option in XF86Config, it will use some level of
hardware acceleration. Again, it isn’t as good as 4.0, but it’s much
better than no acceleration at all.


Brian

XAA has been around since 3.x, not 4.0. Here is a cut/paste from my
XFree86 3.3.6 server startup messages:

It appears you are right and my recall flawed. I suppose my XFree server
is somewhat aged then.

“Makes extensive use of the graphics accelerator. This server is very well
accelerated, and is one of the fastest XFree86 X servers”

For X11 primitives. Nothing SDL uses, alas. Read the source.

hayward at slothmud.org wrote:

Not exactly true. For the longest time, X has had XAA acceleration.
Sure - you could run svgalib GL Quake under Linux, but that’s a single
video card. During that time there was already quite a bit of work done
on making use of video card acceleration. And this has nothing to do
with DGA 1.0. There’s more to video card acceleration than just DGA, I
believe.

XFree86 3.x acceleration is plain crap. The only two semi-useful things
that are accelerated are filled rectangles and window to window
blitting. Offscreen pixmaps are never put in video memory, so when I
say “window to window”, I actually mean “Window to Window” (as in the X
thing that is not a Pixmap).

Maybe stipples and a few other things are accelerated; I didn’t try much
(stipple in games?)…

Everything was there in XAA, it just wasn’t used by the Xlib primitives
implementations it seems.

“You can have my Unix system when you pry it from my cold, dead
fingers.”
– Cal Keegan

Mattias Engdegård wrote:

Not exactly true. For the longest time, X has had XAA acceleration.

XAA as a name and architecture is new in XFree 4.0. The XFree86 server has
had 2D acceleration for years, but that was not the point since SDL does
not use it. It applies to window to window copies, scrolling, some drawing
primitives, and (recently) to non-shared pixmaps; none of which SDL can
easily use.

There was an XAA in XFree86 3.x, but the one in 4.x is a complete
rewrite I think.

SDL “does not use it”, but this is not by choice: X doesn’t tell you
what is accelerated or not, you just use a primitive, and if the X
server can do it accelerated (and knows how!), it will be. But the set
of accelerated primitives and the primitives that SDL uses don’t have
much in common.

Pixmap acceleration? In which version? What does it do exactly and what
primitives are accelerated?

I’m not aware that DGA 1 gave you any acceleration at all; it just
presented a dumb frame buffer that you could access over a slow bus.

Exactly. If the X server had been properly implemented, DGA would have
been SLOWER than XCopyArea, but since the X server didn’t use any kind
of DMA transfer or whatever to blit from system memory to video memory,
the two were very similar (but you could do page flipping with DGA).

Unfortunately, for the purposes of SDL in 2D, it does. Without DGA2,
all you can do is to memcpy directly into video memory (with DGA1),
or use XShmPutImage to let the X server do the same thing, with
about the same (low) speed. (XFree 4.x does this apparently faster on
some hardware, but it still does not use hardware acceleration for blits.)

Most modern video cards have DMA transfers for system->video blits, but
I was told that doing DMA to/from shared memory is quite hard or even
impossible. So the shared image/pixmap that was oh-so-fast actually
slows things down. I think that with XFree86 4.x, blitting from
non-shared Pixmaps is accelerated (and if the pixmap is big enough and
is used enough, it can get promoted into video memory, which will be
even faster).

I was thinking about this a lot a while ago, and the best solution I
found (as in “the one that sucks the least”) would be to use the Window
as the “framebuffer” instead of using a shared image which is then
copied over. Then, each of the surfaces would be separate shared images
(maybe you could make one big image and suballocate it internally).
Drawing operations (such as lines and filled rectangles) done on the
"framebuffer" directly should be done using Xlib primitives, to give the
server a chance of doing it in hardware.

This has the disadvantage of disallowing direct access to the conceptual
framebuffer (because it is a Window instead of a shared image). You
could put lock/unlock functions which would blit a copy in a local
buffer and blit it back when unlocking, which would suck (ever locked a
hardware DirectDraw surface on a TNT2? hehehe!). When root privileges
are available, this could be alleviated using DGA maybe?

“We make rope.” – Rob Gingell on Sun Microsystems’ new virtual memory.

That may be true, but for my personal machine I spent quite some time
testing different video cards and found that X (and X applications of
different types - from fullscreen games to toolkit apps) behaved much
better on the better accelerated cards (especially Matrox).

Sure, some of this may just be attributed to a better quality piece of
hardware, but to me it seemed to make a big difference.

Brian

XFree86 3.x acceleration is plain crap. The only two semi-useful things
that are accelerated are filled rectangles and window to window
blitting. Offscreen pixmaps are never put in video memory, so when I
say “window to window”, I actually mean “Window to Window” (as in the X
thing that is not a Pixmap).

Maybe stipples and a few other things are accelerated; I didn’t try much
(stipple in games?)…

Everything was there in XAA, it just wasn’t used by the Xlib primitives
implementations it seems.


“You can have my Unix system when you pry it from my cold, dead
fingers.”
– Cal Keegan

Most modern video cards have DMA transfers for system->video blits, but
I was told that doing DMA to/from shared memory is quite hard or even
impossible. So the shared image/pixmap that was oh-so-fast actually
slows things down.

It is definitely not impossible (shared memory is not different from
any other, from a hardware point of view), but it may be hard to fit
into the memory architecture of your OS. If it’s Linux, well, there’s
always a way :-)

I think that with XFree86 4.x, blitting from
non-shared Pixmaps is accelerated (and if the pixmap is big enough and
is used enough, it can get promoted into video memory, which will be
even faster).

The problem with using pixmaps in SDL is that the masked blit
operation rarely (if ever) is accel’d even if rectangular blits are.
Also alpha blits will be slow as molasses, at least until Keith
Packard swings his magic wand. :-) And direct pixel access will be slow
even then.

I was thinking about this a lot a while ago, and the best solution I
found (as in “the one that sucks the least”) would be to use the Window
as the “framebuffer” instead of using a shared image which is then
copied over.

For transparent blits, you either have to use the slow GC clip-mask or
fetch the rectangle, combine with your sprite in RAM, and put it back.
Same goes for alpha blits. What games can be done with just
rectangular blits?
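
The second route above (fetch the rectangle, combine it with the sprite in RAM, put it back) can be sketched roughly as follows. This is plain Python over a flat list standing in for the window contents, not Xlib: `fetch_rect` and `put_rect` are hypothetical stand-ins for something like XShmGetImage and XShmPutImage, the colourkey value is arbitrary, and no clipping is attempted.

```python
def fetch_rect(screen, screen_w, x, y, w, h):
    """Copy a w*h rectangle out of a flat, row-major screen buffer."""
    rows = []
    for r in range(h):
        start = (y + r) * screen_w + x
        rows.append(screen[start:start + w])
    return rows


def combine(background, sprite, key):
    """Keep the background wherever the sprite pixel equals the colourkey."""
    return [
        [b if s == key else s for b, s in zip(brow, srow)]
        for brow, srow in zip(background, sprite)
    ]


def put_rect(screen, screen_w, x, y, rect):
    """Write a rectangle (list of rows) back into the screen buffer."""
    for r, row in enumerate(rect):
        start = (y + r) * screen_w + x
        screen[start:start + len(row)] = row


def colorkey_blit(screen, screen_w, sprite, x, y, key):
    """Fetch, combine, put back: a colourkeyed blit done entirely in software."""
    h, w = len(sprite), len(sprite[0])
    background = fetch_rect(screen, screen_w, x, y, w, h)
    put_rect(screen, screen_w, x, y, combine(background, sprite, key))
```

Every sprite costs a round trip in both directions, which is why this path is slow when the "screen" lives in the X server or in video memory.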

But if Keith actually does come up with an all-singing, all-dancing
X11 with alpha and colourkey blits, then we’ll have some fun retargeting
SDL :-)

When root privileges
are available, this could be alleviated using DGA maybe?

Someone on irc (sorry, can’t remember) told me that the DGA root
restriction will be lifted in the future. We can only hope.

Mattias Engdegård wrote:

Most modern video cards have DMA transfers for system->video blits, but
I was told that doing DMA to/from shared memory is quite hard or even
impossible. So the shared image/pixmap that was oh-so-fast actually
slows things down.

It is definitely not impossible (shared memory is not different from
any other, from a hardware point of view), but it may be hard to fit
into the memory architecture of your OS. If it’s Linux, well, there’s
always a way :-)

DMA transfers require physically contiguous memory, which is hard to
get. “Regular” memory, such as the one that is used for shared memory
(as you pointed out) is not always contiguous. I was also told that AGP
lifts this requirement, but for PCI cards, there would be a need for
some hack to get shared contiguous memory (from the /dev/drm device?).
Moby hack here.

I think that with XFree86 4.x, blitting from
non-shared Pixmaps is accelerated (and if the pixmap is big enough and
is used enough, it can get promoted into video memory, which will be
even faster).

The problem with using pixmaps in SDL is that the masked blit
operation rarely (if ever) is accel’d even if rectangular blits are.
Also alpha blits will be slow as molasses, at least until Keith
Packard swings his magic wand. :-) And direct pixel access will be slow
even then.

We don’t care for freakin’ direct pixel access! ;-)

Alpha blits have to be done in software by the X client, which means
mangling stuff locally then putting it up through an XPutImage (eek!), an
XShmPutImage or an XCopyArea+shared pixmap. Masked blit is awful, yes.
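
For reference, the "mangling stuff locally" step for an alpha blit boils down to a per-channel weighted average. The sketch below is a generic software blend in Python, assuming 8-bit channels and a single alpha value per pixel; the function names are made up for illustration and nothing here is SDL or Xlib code.

```python
def blend_channel(src, dst, alpha):
    """Blend one 8-bit channel; alpha 0 keeps dst, alpha 255 gives src."""
    # Adding 127 before the integer division rounds to nearest.
    return (src * alpha + dst * (255 - alpha) + 127) // 255


def blend_pixel(src, dst, alpha):
    """Blend two (r, g, b) tuples with one alpha value for the whole pixel."""
    return tuple(blend_channel(s, d, alpha) for s, d in zip(src, dst))
```

Note that the blend reads the destination as well as writing it, so the client needs a local copy of the background before it can push the result to the server.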

I was thinking mainly about things like “most of the background image”,
allowing very fast background restoration using XCopyArea. Tiles for
example. The parts with sprites over them would still have to be fiddled
with the usual shared memory fuss.
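
One way to picture that split, as a rough sketch only: with a tiled background, the client works out which tiles a sprite touches and treats just those with the slow shared-memory path, leaving every other tile eligible for a plain XCopyArea restore. The helper below is hypothetical Python, assuming square tiles of `tile` pixels and a sprite rectangle in pixel coordinates.

```python
def tiles_under(x, y, w, h, tile):
    """Return (col, row) indices of every tile overlapped by the rectangle."""
    first_col, last_col = x // tile, (x + w - 1) // tile
    first_row, last_row = y // tile, (y + h - 1) // tile
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]
```

For example, a 20x20 sprite at (30, 10) on 32-pixel tiles straddles two tiles, so only those two need the software path.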

I was thinking about this a lot a while ago, and the best solution I
found (as in “the one that sucks the least”) would be to use the Window
as the “framebuffer” instead of using a shared image which is then
copied over.

For transparent blits, you either have to use the slow GC clip-mask or
fetch the rectangle, combine with your sprite in RAM, and put it back.
Same goes for alpha blits. What games can be done with just
rectangular blits?

Tetris! ;-)

Hmm, Asteroids over a solid black background? (stars could be drawn, but
the basic background would have to be a solid color) :-)

But if Keith actually does come up with an all-singing, all-dancing
X11 with alpha and colourkey blits, then we’ll have some fun retargeting
SDL :-)

I’ll cry with joy that day. :-)

“How should I know if it works? That’s what beta testers are for.
I only coded it.” – Linus Torvalds

DMA transfers require physically contiguous memory, which is hard to
get. “Regular” memory, such as the one that is used for shared memory
(as you pointed out) is not always contiguous. I was also told that AGP
lifts this requirement, but for PCI cards, there would be a need for
some hack to get shared contiguous memory (from the /dev/drm device?).
Moby hack here.

And I thought most modern PCI devices did scatter/gather these days. At
least I’m pretty sure that SCSI cards do. Even in the absence of that,
it’s just a matter of fragmenting the transfer. Tedious but not fatal.
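
To make "fragmenting the transfer" concrete, here is a minimal sketch in Python. It assumes we somehow know the physical page frame number behind each page of a virtually contiguous buffer (a hypothetical input; in practice this is kernel or driver territory) and coalesces them into runs, each of which could be handed to the hardware as one contiguous transfer.

```python
def contiguous_runs(page_frames):
    """Group page frame numbers into (first_frame, page_count) runs.

    Each run is physically contiguous, so it could be one DMA fragment.
    """
    runs = []
    for frame in page_frames:
        if runs and runs[-1][0] + runs[-1][1] == frame:
            # This frame directly follows the previous run: extend it.
            first, count = runs[-1]
            runs[-1] = (first, count + 1)
        else:
            # Gap in physical memory: start a new fragment.
            runs.append((frame, 1))
    return runs
```

A buffer backed by frames 10, 11, 12, 40, 41, 7 would need three fragments. The more scattered the pages, the more fragments and the more per-transfer overhead.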

Alpha blits have to be done in software by the X client, which means
mangling stuff locally then putting it up through an XPutImage (eek!), an
XShmPutImage or an XCopyArea+shared pixmap. Masked blit is awful, yes.

Not only that, you have to do XShmGetImage first to retrieve the background
to combine. Either that, or be prepared to maintain a duplicate of the
window and all pixmaps in memory, and mirror all operations there.

I was thinking mainly about things like “most of the background image”,
allowing very fast background restoration using XCopyArea. Tiles for
example. The parts with sprites over them would still have to be fiddled
with the usual shared memory fuss.

An interesting hybrid. I tried making a pure Xlib game once with everything
in pixmaps (using GC clip-masks etc), but it was just too slow, and careful
analysis revealed that on the servers I tested, copy between pixmap and
window was no faster than an XShmPutImage.

Note that with xf86vidmode, you can hide images in the area left over when
the resolution is lowered, and thus always get fast vidmem-vidmem blits :-)

For transparent blits, you either have to use the slow GC clip-mask or
fetch the rectangle, combine with your sprite in RAM, and put it back.
Same goes for alpha blits. What games can be done with just
rectangular blits?

Tetris! ;-)

Of course, didn’t think of that. A lot of puzzle games, actually, very
few being performance-sensitive. On the other hand you get a lot
of potentially accelerated primitives (lines, text, etc).

Mattias Engdegård wrote:

DMA transfers require physically contiguous memory, which is hard to
get. “Regular” memory, such as the one that is used for shared memory
(as you pointed out) is not always contiguous. I was also told that AGP
lifts this requirement, but for PCI cards, there would be a need for
some hack to get shared contiguous memory (from the /dev/drm device?).
Moby hack here.

And I thought most modern PCI devices did scatter/gather these days. At
least I’m pretty sure that SCSI cards do. Even in the absence of that,
it’s just a matter of fragmenting the transfer. Tedious but not fatal.

That’s what I was told, in any case. Could it be that bus mastering and
DMA are two different things and that bus mastering can do
scatter/gather? Some details about hardware that I do not know…

Fragmenting the transfer could hurt the performance, no? And for large
blits, it could look awful, assuming you set the “do it during the
vertical retrace” option, with part of the blit being done before, and the rest
at the next refresh… I am not sure it would look that bad, but it
could.

An interesting hybrid. I tried making a pure Xlib game once with everything
in pixmaps (using GC clip-masks etc), but it was just too slow, and careful
analysis revealed that on the servers I tested, copy between pixmap and
window was no faster than an XShmPutImage.

With 3.x, normal pixmaps offer strictly no advantage over shared
images/pixmaps. They are blitted using the CPU and they are never put
in video memory (which could improve an XCopyArea radically, as in the
window-to-window case).

Note that with xf86vidmode, you can hide images in the area left over when
the resolution is lowered, and thus always get fast vidmem-vidmem blits :-)

Nice trick, but only good for lower resolution and not too large
pixmaps…

For transparent blits, you either have to use the slow GC clip-mask or
fetch the rectangle, combine with your sprite in RAM, and put it back.
Same goes for alpha blits. What games can be done with just
rectangular blits?

Tetris! ;-)

Of course, didn’t think of that. A lot of puzzle games, actually, very
few being performance-sensitive. On the other hand you get a lot
of potentially accelerated primitives (lines, text, etc).

Hey, doing a GOOD Tetris is a lot harder than it seems, and can take up
noticeable resources. Quadra for example has smoothed movement, which
looks much nicer.

“Unix was not designed to stop you from doing stupid things, because
that would also stop you from doing clever things.” – Doug Gwyn