Fri, 03 Nov 2000 Mattias Engdegård wrote:
Is there a chance this configuration can enable faster blitting into VRAM?
(For software rendering, that is.) I’ve seen nothing near the transfer rate of
the AGP bus so far, on any Linux system…
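For a sense of scale, the bandwidth that full-screen software blitting needs can be worked out directly. The sketch below is plain C with a hypothetical helper name (frame_bandwidth is not part of SDL or any driver API). It assumes every pixel is rewritten once per refresh. For comparison, the commonly quoted theoretical peak for AGP 1x is roughly 266 MB/s.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes per second needed to rewrite every pixel of
 * the display once per refresh. Assumes a packed pixel format of 1 to 4
 * bytes per pixel and no overdraw. */
static uint64_t frame_bandwidth(uint32_t width, uint32_t height,
                                uint32_t bytes_per_pixel, uint32_t refresh_hz)
{
    assert(bytes_per_pixel >= 1 && bytes_per_pixel <= 4);
    return (uint64_t)width * height * bytes_per_pixel * refresh_hz;
}
```

With these assumptions, 640x480 at 16 bpp and 60 Hz comes to 36,864,000 bytes/s, and 1024x768 at 32 bpp and 75 Hz to 235,929,600 bytes/s, so the second case is already close to the AGP 1x peak figure.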
We’ve had the discussion here many times before, and the basic fact is
that if nobody is willing to do the work to provide fast memory->vidmem
transfers through DMA or bus-mastering, nothing will happen.
Ok. That is, system RAM surfaces aren’t implemented in 3D accelerated drivers
either? (Some cards can do that, and it’s usually supported under Windoze,
AFAIK.)
I simply don’t think sub video frame rate, non-retrace-synced displays cut it
for scrolling 2D action games. This goes for Windows as well; most games don’t
bother to deal with the refresh rate vs. animation/scrolling speed issues.
However, at least it’s possible to achieve good results there. The problem
is that I hate Windoze programming, so… what to do?
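One way to handle the refresh rate vs. scrolling speed issue mentioned above is to express the speed in pixels per second and derive a fixed-point step per retrace from the actual refresh rate. This is only a sketch under that assumption; the type and function names are hypothetical and not taken from SDL or any game engine.

```c
#include <assert.h>
#include <stdint.h>

/* Scroll position in 16.16 fixed point, so fractional pixels per frame
 * accumulate instead of being dropped. */
typedef struct {
    int32_t pos_fp;
} scroller;

/* Per-frame step (16.16) for a speed in pixels per second, assuming
 * exactly one rendered frame per vertical refresh. */
static int32_t scroll_step_fp(int32_t px_per_s, int32_t refresh_hz)
{
    assert(refresh_hz > 0);
    return (int32_t)((((int64_t)px_per_s) << 16) / refresh_hz);
}

/* Advance by one frame and return the whole-pixel position to draw at. */
static int32_t scroll_advance(scroller *s, int32_t step_fp)
{
    s->pos_fp += step_fp;
    return s->pos_fp >> 16;
}
```

At 60 Hz a speed of 120 pixels/s gives a step of exactly 2 pixels per frame. At 75 Hz the step is 1.6 pixels, so the on-screen motion alternates between 1 and 2 pixel steps, which is the kind of unevenness that only a retrace-synced display running at a suitable rate avoids.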
Page-flipping usually synchronizes to vertical refresh, and SDL
supports that.
Does it synchronize only the flip operation, or does it also block until the
flip has been performed? Subtle (?) but fundamental difference. (I’ve browsed
earlier threads on this, and got the impression that SDL doesn’t support the
latter. What did I miss?)
You can also store stuff in video memory (hardware surfaces,
in SDL parlance) to reduce bandwidth requirements. Fbcon, svgalib
and DGA2 should be able to use these two concepts. X11 itself doesn’t,
unless someone writes it.
Hmm… browsing the 1.1.6 source, looking only for raster sync code,
I found that the fbdev code uses an ioctl() that supposedly does the same
thing. (BTW, is that define in by default?)
The svgalib code has a NOP for FlipHWSurface() and does the sync in
LockHWSurface() instead, though only on double buffered displays. This should work
for me, but the semantics are different from fbcon, unless I’m missing
something on a higher level in the code.
As for DX5/Win32, DDFLIP_WAIT seems to be used at all times when flipping. (The
lock function should behave like svgalib, I’m guessing after a quick glance.)
The X11 code seems to have a LockHWSurface function that’s similar to that of
svgalib/double buffered - there is an XSync() call that (probably) will be
made as a result of the kind of engine loop I’m thinking of. However, as X11
doesn’t support double buffering (that’s the problem if I understand it
correctly), this is quite irrelevant, unless the blits are faster than the CRT
beam. (Which they should be on decent machines, but…)
Finally, DGA seems to sync as expected, and it seems like some other video
subsystems which I haven’t programmed for, or even seen, also do the Right
Thing (albeit not always in a nice way, but that’s everything from hard to
impossible to fix…), so my conclusion is basically “What’s all the fuss
about!?”.
Just don’t expect to get rock solid, smooth video in any windowed
environment. (Although, it’s possible to achieve if the blitting is fast
enough.)
As for the “where to place data” and blitting vs. software rendering issues,
that’s obviously the complicated part, and unfortunately, I can’t say that I’m
much more motivated to fix it than most other people, unless it turns out
that 3D accelerators doing 2D stuff can’t result in a decent, good looking and
ultra smooth 2D game engine.
That is, I’m not interested enough in things that require software
rendering, such as video decoders and some special effects. (This may change,
but unfortunately, it doesn’t change the fact that I’m already involved in too
many projects of all kinds, not really getting anything done…)
Is it realistic to fix Linux, or are we looking at some very serious design
problems? (I know about the basic issues with X and kernel drivers, but this
bandwidth problem seems to be on a lower level, as it hits svgalib just as
hard.)
I have never seen svgalib being used with anything but memcpy-style
transfers to video memory, which should be considerably slower than
direct transfer even if the MTRRs are set to write-combining for the
vidmem area and a judicious amount of prefetching hints is used.
I’d like to see a comparison though.
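For reference, a "memcpy-style" transfer in the sense used above could look like the following sketch: the CPU copies a rectangle one row at a time from a system RAM surface into the destination buffer. The function name and parameters are hypothetical and not svgalib or SDL API. In a real program dst would point into mapped video memory; here it is just a byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `rows` rows of `row_bytes` bytes each, CPU only.
 * Pitches are in bytes and may be larger than row_bytes, as is common
 * for video memory. */
static void copy_rect(unsigned char *dst, size_t dst_pitch,
                      const unsigned char *src, size_t src_pitch,
                      size_t row_bytes, size_t rows)
{
    assert(row_bytes <= dst_pitch && row_bytes <= src_pitch);
    for (size_t y = 0; y < rows; ++y) {
        /* Every byte goes through the CPU; no DMA or bus-mastering. */
        memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
    }
}
```

Timing a loop like this against a DMA or bus-mastered transfer of the same rectangle would give the comparison asked for, provided a driver exposes such a transfer at all.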
Do you mean accelerated blits vs. CPU blits in VRAM, or sysram->VRAM blits
using the CPU vs. DMA? (I’m kind of interested in both, although I’m most
likely going to use 3D acceleration to do VRAM->VRAM blits.) The former
shouldn’t be too hard to try, but the latter would require DMA, obviously…
Is this entirely unsupported by all drivers on Linux?
David Olofson
Programmer
Reologica Instruments AB
david.olofson at reologica.se
.- M u C o S --------------------------------. .- David Olofson ------.
| A Free/Open Multimedia                     | | Audio Hacker         |
| Plugin and Integration Standard            | | Linux Advocate       |
`------------> http://www.linuxdj.com/mucos -' | Open Source Advocate |
.- A u d i a l i t y ------------------------. | Singer               |
| Rock Solid Low Latency Signal Processing   | | Songwriter           |
`---> http://www.angelfire.com/or/audiality -' `-> david at linuxdj.com -'