Using video memory in X11

slouken · August 23, 2000, 8:19am

I’m contemplating adding hardware acceleration support to SDL X11 video.
Keith Packard and I are talking about a direct pixmap access extension
to XFree86 so that touching video memory will not involve copying it over
the network, but even if that is the case, it may be possible to get fairly
good speed with purely sprite-based action games.

This would allow locking the video card accelerator and memory mapping
the video memory, but would still probably require root permissions for
systems without the framebuffer console set up.

The basic trade-off is still the same - if the screen surface gets in
video memory, then it is very fast to blit to, and copy to the screen,
but it is very slow to modify directly. The reason I hadn’t added
support for this before is that there’s no way to tell whether a pixmap
actually made it into video memory, and if it didn’t then you get the
very slow updates using XGetImage() and XPutImage(), without any of the
benefits of hardware acceleration. So many SDL applications rely on
being able to directly modify the screen surface (for alpha blending,
particle effects, etc.), that it hasn’t been worth it until now.

Anyway, the side-effect of this would be changing the semantics of
SDL_HWSURFACE to mean simply that the surface is in video memory, not
necessarily being the visible video memory. A new flag, SDL_HWVISIBLE
would imply that you are writing directly to the visible video memory.
I could use these semantics to allow blit hardware acceleration under
windowed DirectX as well.

How useful would this be to people?

I needed to write out all the semantics to keep things straight:

====

There are a number of different video memory configurations available:

1. Display surface has SDL_HWSURFACE and SDL_DOUBLEBUF

In this configuration, you are writing to the non-visible page of the
video hardware. When you call SDL_Flip(), the hardware will wait for
the next vertical retrace, and then swap pages so that the previously
non-visible screen becomes visible.
This configuration is normally only available in fullscreen mode.
[Only available using the DGA 2.0 video driver in X11]

2. Display surface has SDL_HWSURFACE and SDL_HWVISIBLE

In this configuration, you are writing to the visible video memory.
All drawing and blits become immediately visible.
This configuration is normally only available in fullscreen mode.
[Only available using the DGA 2.0 video driver in X11]

3. Display surface has SDL_HWSURFACE

In this configuration, you are writing to an offscreen area of video
memory. Areas that you wish to become visible must be copied to the
visible video memory by using SDL_UpdateRects().
[Possibly available under X11 with the proposed extension]

4. Display surface does not have SDL_HWSURFACE set

In this configuration, you are writing to an offscreen area of system
memory. Areas that you wish to become visible must be copied to the
visible video memory by using SDL_UpdateRects().
[Currently the only thing available under X11]

If you requested that the display surface reside in video memory, but the
resulting surface did not end up in video memory, it may be possible that
there was not enough spare video memory, the video driver does not support
direct video access, or some sort of format conversion is occurring.
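
As a rough illustration of the four configurations listed above, an application could classify the surface it actually received by testing its flags. This is only a sketch: the DEMO_* constants are placeholder bits defined locally, not SDL's real values, and DEMO_HWVISIBLE stands in for the SDL_HWVISIBLE flag that is merely proposed in this post.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder flag bits for illustration only; they are NOT SDL's values.
   SDL_HWVISIBLE is only a proposal, so DEMO_HWVISIBLE is hypothetical. */
#define DEMO_HWSURFACE 0x01u
#define DEMO_DOUBLEBUF 0x02u
#define DEMO_HWVISIBLE 0x04u

typedef enum {
    CFG_SYSTEM_MEMORY,   /* offscreen system memory, shown via update rects */
    CFG_OFFSCREEN_VIDEO, /* offscreen video memory, shown via update rects */
    CFG_VISIBLE_VIDEO,   /* writes land directly on the visible page */
    CFG_PAGE_FLIPPED     /* writes go to the hidden page, shown by a flip */
} display_config;

/* Map the flags of the returned display surface to one of the four cases. */
display_config classify_display(uint32_t flags)
{
    /* This demo only understands the three placeholder bits. */
    assert((flags & ~(DEMO_HWSURFACE | DEMO_DOUBLEBUF | DEMO_HWVISIBLE)) == 0);

    if (!(flags & DEMO_HWSURFACE))
        return CFG_SYSTEM_MEMORY;
    if (flags & DEMO_DOUBLEBUF)
        return CFG_PAGE_FLIPPED;
    if (flags & DEMO_HWVISIBLE)
        return CFG_VISIBLE_VIDEO;
    return CFG_OFFSCREEN_VIDEO;
}

/* Nonzero when per-pixel writes would touch (slow) video memory. */
int writes_hit_video_memory(uint32_t flags)
{
    return classify_display(flags) != CFG_SYSTEM_MEMORY;
}
```

A program that does a lot of direct pixel work could use a check like writes_hit_video_memory() to decide whether to draw into a system-memory shadow surface instead.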

If the display surface ended up in video memory, it is also possible to
have secondary surfaces in video memory. In general, you should create
your secondary surfaces as a single collection of multiple sprites, and
blit portions of that surface to the display surface. You should place
the largest and most frequently used artwork in video memory first, so
it is most likely to take advantage of accelerated blits.
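
To make the "single collection of multiple sprites" idea concrete, here is a minimal sketch of how the source rectangle of one sprite could be computed when all sprites share one large surface. The grid layout (equally sized cells packed left to right, top to bottom) and the names rect and sheet_cell are assumptions made for this example; the resulting rectangle is what you would hand to the blit as its source area.

```c
#include <assert.h>

typedef struct { int x, y, w, h; } rect;

/* Source rectangle of sprite number `index` in a sheet that packs
   equally sized cells left to right, top to bottom. */
rect sheet_cell(int sheet_w, int cell_w, int cell_h, int index)
{
    assert(cell_w > 0 && cell_h > 0 && index >= 0);
    int columns = sheet_w / cell_w;   /* whole cells per row */
    assert(columns > 0);

    rect r;
    r.x = (index % columns) * cell_w;
    r.y = (index / columns) * cell_h;
    r.w = cell_w;
    r.h = cell_h;
    return r;
}
```

With a 256-pixel-wide sheet of 32x32 cells there are 8 columns, so sprite 10 sits in the third column of the second row.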

Accessing video memory tends to be fairly slow. If you plan to change
pixels directly, or use a lot of unaccelerated blits, you should probably
create your display surface in system memory and perform rectangle updates
to video memory (via SDL_UpdateRects()).
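
The rectangle-update approach can be sketched without any SDL calls: draw into a system-memory shadow buffer, then copy only the changed rectangles to the destination. In this sketch the destination array merely stands in for video memory, and the rect type, the update_rects name, and the 32-bit pixel format are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { int x, y, w, h; } rect;

/* Copy the listed rectangles from a system-memory shadow buffer to the
   destination buffer (standing in for video memory). Both buffers are
   `pitch` pixels wide, `height` rows tall, and use 32-bit pixels.
   Returns the number of pixels copied. */
size_t update_rects(uint32_t *dst, const uint32_t *src, int pitch, int height,
                    const rect *rects, int count)
{
    size_t copied = 0;
    for (int i = 0; i < count; i++) {
        rect r = rects[i];
        /* Rectangles must lie fully inside the buffers. */
        assert(r.x >= 0 && r.y >= 0 && r.w >= 0 && r.h >= 0);
        assert(r.x + r.w <= pitch && r.y + r.h <= height);

        for (int row = r.y; row < r.y + r.h; row++) {
            size_t offset = (size_t)row * (size_t)pitch + (size_t)r.x;
            memcpy(dst + offset, src + offset, (size_t)r.w * sizeof *dst);
        }
        copied += (size_t)r.w * (size_t)r.h;
    }
    return copied;
}
```

All of the slow per-pixel work happens in the shadow buffer; the only traffic to the destination is one row-wise copy per dirty rectangle.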

====

Again, this is a tentative expansion of the current semantics.

Comments are welcome!

See ya,
-Sam Lantinga, Lead Programmer, Loki Entertainment Software

Karsten_Laux · August 23, 2000, 9:14pm

Sam Lantinga wrote:

I’m contemplating adding hardware acceleration support to SDL X11 video.
Keith Packard and I are talking about a direct pixmap access extension
to XFree86 so that touching video memory will not involve copying it over
the network, but even if that is the case, it may be possible to get fairly
good speed with purely sprite-based action games.

This would allow locking the video card accelerator and memory mapping
the video memory, but would still probably require root permissions for
systems without the framebuffer console set up.

The basic trade-off is still the same - if the screen surface gets in
video memory, then it is very fast to blit to, and copy to the screen,
but it is very slow to modify directly. The reason I hadn’t added
support for this before is that there’s no way to tell whether a pixmap
actually made it into video memory, and if it didn’t then you get the
very slow updates using XGetImage() and XPutImage(), without any of the
benefits of hardware acceleration. So many SDL applications rely on
being able to directly modify the screen surface (for alpha blending,
particle effects, etc.), that it hasn’t been worth it until now.

Anyway, the side-effect of this would be changing the semantics of
SDL_HWSURFACE to mean simply that the surface is in video memory, not
necessarily being the visible video memory. A new flag, SDL_HWVISIBLE
would imply that you are writing directly to the visible video memory.
I could use these semantics to allow blit hardware acceleration under
windowed DirectX as well.

How useful would this be to people?

This would be a great feature in my eyes! I think it would give a major
speedup for any scrolling game. I imagine keeping a bigger surface in
video memory, so that scrolling would only involve hardware blitting inside
the video memory: I would just copy parts of the invisible video
surface to the visible one.

Even if the gameworld would be much larger than the hidden surface this
new feature along with some clever caching of the background would
provide us with fast smooth scrolling within X11.
Currently scrolling always means copying the complete memory surface to
the screen … without the help of any hardware blitters … a very
expensive operation.
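
One possible shape for the caching idea, as a sketch only: keep a hidden surface wider than the screen, blit from an offset inside it while the camera stays within the cached span, and refill it only when the camera leaves. The scroll_cache type and scroll_source_x function are hypothetical names for this example, and only horizontal scrolling is handled.

```c
#include <assert.h>

typedef struct {
    int origin_x;   /* world x coordinate of the cache's left edge */
    int cache_w;    /* width of the hidden video-memory surface */
    int screen_w;   /* width of the visible area */
} scroll_cache;

/* Returns the x offset inside the cached surface to blit from for the
   given camera position. If the camera has left the cached span, the
   cache is re-centred on the camera and *refilled is set to 1 so the
   caller knows it must redraw the cache contents (the slow path). */
int scroll_source_x(scroll_cache *c, int camera_x, int *refilled)
{
    assert(c->screen_w > 0 && c->cache_w >= c->screen_w);

    *refilled = 0;
    if (camera_x < c->origin_x ||
        camera_x + c->screen_w > c->origin_x + c->cache_w) {
        c->origin_x = camera_x - (c->cache_w - c->screen_w) / 2;
        *refilled = 1;
    }
    return camera_x - c->origin_x;
}
```

As long as *refilled stays 0, each frame costs a single blit from the hidden surface to the visible one; the expensive redraw happens only when the camera crosses the edge of the cached region.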

my two pence,
--
Karsten-O. Laux
@Karsten_Laux

Ray_Kelm · August 24, 2000, 12:12am

Could this mean, somewhere down the road, hardware accelerated
alpha blits? If so, I’m all for it

Actually, anything that allows access to hardware acceleration
under X is a good thing, IMHO. Presumably SDL would check for the
existence of this extension, and fall back to something else which
will work, but is not as fast, if it is not available?

-Ray

slouken · August 24, 2000, 12:23am

Could this mean, somewhere down the road, hardware accelerated
alpha blits? If so, I’m all for it

Possibly, though some tap-dancing will have to be done to make
this work with the right extensions.

Actually, anything that allows access to hardware acceleration
under X is a good thing, IMHO. Presumably SDL would check for the
existence of this extension, and fall back to something else which
will work, but is not as fast, if it is not available?

Correct.
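
The check-then-fall-back pattern could look roughly like the sketch below: try each backend in order of preference and take the first one whose probe succeeds. The backend struct, pick_backend, and the two stand-in probes are hypothetical names for this example and are not SDL internals; a real probe for an X extension would typically ask the server, for instance with Xlib's XQueryExtension().

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    int (*probe)(void);   /* returns nonzero when the backend is usable */
} backend;

/* Return the first usable backend in preference order, or NULL if
   none of them is available. */
const backend *pick_backend(const backend *list, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        assert(list[i].probe != NULL);
        if (list[i].probe())
            return &list[i];
    }
    return NULL;
}

/* Stand-in probes for the demo; real ones would query the X server. */
int probe_missing(void) { return 0; }
int probe_present(void) { return 1; }
```

Listing the accelerated path first and the plain path last means the application code does not need to know which one it ended up with.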
-Sam Lantinga, Lead Programmer, Loki Entertainment Software

Mattias_Engdegard · August 24, 2000, 9:28am

I’m contemplating adding hardware acceleration support to SDL X11 video.
Keith Packard and I are talking about a direct pixmap access extension
to XFree86 so that touching video memory will not involve copying it over
the network, but even if that is the case, it may be possible to get fairly
good speed with purely sprite-based action games.

This would allow locking the video card accelerator and memory mapping
the video memory, but would still probably require root permissions for
systems without the framebuffer console set up.

I’m not sure how this is different from DGA2, except that you get to
use X11 drawing primitives (is clipmasked XCopyArea() accelerated yet?).

The basic trade-off is still the same - if the screen surface gets in
video memory, then it is very fast to blit to, and copy to the screen,
but it is very slow to modify directly. The reason I hadn’t added
support for this before is that there’s no way to tell whether a pixmap
actually made it into video memory, and if it didn’t then you get the
very slow updates using XGetImage() and XPutImage(), without any of the
benefits of hardware acceleration.

The reason why this wasn’t a viable option before (Pierre Phaneuf and
I had a little discussion on sdl-l about it some time ago) was that
there were doubts about the acceleration of shaped blits.
A clever X server could probably use some kind of LRU replacement
scheme (combined with size heuristics) to use vidmem optimally for
pixmaps. (A hint mechanism would probably help performance under
low-vidmem conditions greatly, and avoid a lot of thrashing).
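
For illustration, a least-recently-used scheme of this kind might look like the sketch below. It is a simplified model and not how any X server actually works: video memory is treated as a plain byte budget with no fragmentation, the only size heuristic is refusing pixmaps larger than the whole budget, and the pixmap, vidmem_cache, and cache_touch names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PIXMAPS 16

typedef struct {
    size_t size;          /* bytes the pixmap occupies */
    unsigned long used;   /* tick of the most recent use */
    int resident;         /* nonzero while held in video memory */
} pixmap;

typedef struct {
    pixmap items[MAX_PIXMAPS];
    int count;
    size_t capacity;      /* video memory available for pixmaps */
    size_t in_use;        /* bytes taken by resident pixmaps */
    unsigned long tick;
} vidmem_cache;

/* Mark pixmap `id` as used and try to make it resident, evicting the
   least recently used residents as needed. Returns 1 if the pixmap is
   in video memory afterwards, 0 if it has to stay in system memory. */
int cache_touch(vidmem_cache *c, int id)
{
    assert(id >= 0 && id < c->count);
    pixmap *p = &c->items[id];
    p->used = ++c->tick;

    if (p->resident)
        return 1;
    if (p->size > c->capacity)
        return 0;   /* can never fit, so do not evict anything for it */

    while (c->in_use + p->size > c->capacity) {
        int victim = -1;
        for (int i = 0; i < c->count; i++) {
            const pixmap *q = &c->items[i];
            if (q->resident &&
                (victim < 0 || q->used < c->items[victim].used))
                victim = i;
        }
        assert(victim >= 0);   /* something must be resident here */
        c->items[victim].resident = 0;
        c->in_use -= c->items[victim].size;
    }

    p->resident = 1;
    c->in_use += p->size;
    return 1;
}
```

A hint mechanism of the sort suggested above could be layered on top, for example by letting the application pin pixmaps so they are skipped when choosing a victim.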

So many SDL applications rely on
being able to directly modify the screen surface (for alpha blending,
particle effects, etc.), that it hasn’t been worth it until now.

I don’t really see what has changed in that regard. Directly manipulating
pixels in vidmem is just as slow as always.

1. Display surface has SDL_HWSURFACE and SDL_DOUBLEBUF

2. Display surface has SDL_HWSURFACE and SDL_HWVISIBLE

3. Display surface has SDL_HWSURFACE

4. Display surface does not have SDL_HWSURFACE set

The only program-visible difference between 1. and 3. is that 1.
flips pages and 3. copies it — either equivalent, or merely a
question of maintaining a single- or double-level dirty rectangle
system. Page flipping has the added benefit of completely eliminating
tearing artifacts (a refresh-synchronised copy could do the same).

So, what does this give us that DGA2 doesn’t?

In general, you should create
your secondary surfaces as a single collection of multiple sprites, and
blit portions of that surface to the display surface.

Ummm… why?

slouken · August 24, 2000, 11:50am

This would allow locking the video card accelerator and memory mapping
the video memory, but would still probably require root permissions for
systems without the framebuffer console set up.

I’m not sure how this is different from DGA2, except that you get to
use X11 drawing primitives (is clipmasked XCopyArea() accelerated yet?).

You could do this in windowed mode.
I don’t know if clipmasked XCopyArea() is accelerated or not.

I need to get in touch with Keith and find out more about his rendering API.

So many SDL applications rely on
being able to directly modify the screen surface (for alpha blending,
particle effects, etc.), that it hasn’t been worth it until now.

I don’t really see what has changed in that regard. Directly manipulating
pixels in vidmem is just as slow as always.

The difference would be that the application could directly touch the video
memory instead of uploading information to the server and having the server copy
it into video memory.

So, what does this give us that DGA2 doesn’t?

Mostly acceleration in a window. DGA2 does a really good job of fullscreen
acceleration, but none of those benefits are available in interaction with
the X server as a windowed application. I’m still feeling out whether or
not this is feasible and worth the effort.

In general, you should create
your secondary surfaces as a single collection of multiple sprites, and
blit portions of that surface to the display surface.

Ummm… why?

Large pixmaps are more likely to make it into video memory.

I’m experimenting here, so be gentle.

See ya!
-Sam Lantinga, Lead Programmer, Loki Entertainment Software

Keith_Packard · August 24, 2000, 2:30pm

I’m not sure how this is different from DGA2, except that you get to
use X11 drawing primitives (is clipmasked XCopyArea() accelerated yet?).

That’s still slow, but we’ll have full accelerated alpha blending before too
long; embed shape information in the alpha channel and it’ll go plenty
fast.

A clever X server could probably use some kind of LRU replacement
scheme (combined with size heuristics) to use vidmem optimally for
pixmaps. (A hint mechanism would probably help performance under
low-vidmem conditions greatly, and avoid a lot of thrashing).

The server isn’t that clever yet; it sticks pixmaps off screen while they
fit; everything else lives in memory. That works pretty well; cards
either have plenty of memory or way too little.

@Keith_Packard XFree86 Core Team SuSE, Inc.