My “approximately 9 MB” was using 1920x1200, at 32 bits per pixel
(either padded RGB or RGBA).
Yes… I figured out the math, but I was wondering what your actual config is like (sorry if I misunderstood again and you really meant your config).
That’s indeed what my iMac desktop runs at, and I think it is pretty
typical of the native resolution of 24" LCDs (but of course, I play
many games at lower resolutions). The laptop I’m writing this on is
only 1024x768, though.
A blit from one place to the other involves reading, then
writing, so if the video card is capable of moving 3.2 GB/s between the
GPU and video memory, you need to halve it to figure out the
theoretical maximum frame rate.
Hmm… maybe this is getting too low-level for my understanding, but… when I ‘read’ something, “what” is reading it? I mean, it’s going somewhere, right? When you say you have to read, THEN write, do you mean the data being read goes, say, to the processor, and then to the right place? Are you sure it doesn’t happen at the same time?
For example, as far as I understand it, when you read a memory position, it gets stored in the CPU and then goes to the other position? Or can memory copy from one place to another within the same chip? (That would save a lot of traffic on the bus!)…
“What” is reading it can be either the CPU, the GPU, or a DMA
controller. From the point of view of the memory bus, the read and the
write don’t happen at the same time, so to move 9 MB of pixel data,
you need 18 MB of memory bus traffic. The advantage of using DMA or
having the GPU do it is that it leaves the CPU free to do other stuff
(and in the case of VRAM-to-VRAM, the GPU has a faster memory bus).
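
To put numbers on that, here is a rough sketch in C of the arithmetic.
The function names are made up for illustration, and I'm assuming that
3.2 GB/s means 3.2e9 bytes per second and that a blit moves every byte
across the bus exactly twice (one read, one write):

```c
#include <stdint.h>

/* Size of one full frame in bytes.
   Assumes bits_per_pixel is a multiple of 8 (e.g. 32 for padded RGB or RGBA). */
uint64_t frame_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    return (uint64_t)width * height * (bits_per_pixel / 8);
}

/* Theoretical ceiling on full-frame blits per second.
   Each blit reads and then writes every byte, so it costs twice the
   frame size in bus traffic; that is where the "halve it" comes from. */
double max_blit_fps(double bus_bytes_per_sec, uint64_t bytes_per_frame)
{
    return bus_bytes_per_sec / (2.0 * (double)bytes_per_frame);
}
```

For 1920x1200 at 32 bits per pixel, frame_bytes gives 9,216,000 bytes,
and max_blit_fps(3.2e9, 9216000) comes out a bit under 174 blits per
second. That is only an upper bound; real numbers will be lower.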
… but in the case of VRAM, this CPU would be the GPU, and the copy to the right position would happen on the board already, so virtually no time is spent there… but I’m just speculating here, I don’t know exactly how the interaction between RAM and VRAM happens… maybe it goes to the CPU, then to the GPU… but then the speed to use would be CPU<->RAM… hehe, I’m confused
You’ve got it pretty much right: when copying from VRAM to VRAM, the
GPU is the one reading and writing, playing the role of the CPU. But
GPUs have to be programmed in special ways, specific to each card.
This knowledge is put inside the drivers. When you ask to get
direct access to the video memory, you’re basically giving up the GPU,
and doing everything over the (relatively slow) I/O bus, using the
main CPU. This made sense for old school DOS games, because there was
no GPU, but today, it’s just crazy.
The interaction between RAM and VRAM is a bit more complicated,
because there are multiple ways to do it. You can either use
memory-mapped I/O to give the CPU access to the VRAM (which is what
happens when you get direct access to the VRAM), or you can use
programmed DMA transfers, which have the DMA controller copy the memory
without having it go through the CPU (it’s like a mini-CPU that just
does simple copies, really). Programming the DMA controller is the
kind of thing you have to ask the kernel to do, so it’s generally
managed by the DirectX drivers (and equivalents on other platforms).
Again, if you use direct video memory access, you’re missing out on
yet another performance boost.
resolution that would still give you 150 fps and up
Hmm… Maybe I should test the system like many games do… BTW, what’s the maximum refresh rate recent monitors and LCDs can achieve?
The best CRT monitors can probably do something in the 120 Hz range,
I’d guess? I used to drive my ViewSonic P-series at about 100 Hz. LCDs
usually behave as if they had a 60 Hz refresh rate.
but actually blits from a backbuffer
With SDL 1.3, it’s just a call to
SDL_UpdateRect on the whole screen, all the time.
Dang… that’s just for portability purposes?!
Yeah, to provide compatibility for SDL 1.2 games. The new SDL 1.3 does
not have SDL_Flip, which is good, since it hasn’t been relevant since
about 1998.
that SDL is actually just an abstraction layer:
very optimized assembly pixel format conversion code, for
example, but no real magic.
Sure, sure… but in my opinion, SDL should allow you to send commands to the wrapped libraries IF you want to. For example, say Linux allows me to access that pointer, but Windows doesn’t. If I used that functionality, SDL should then translate it into Windows code for me, OR at least allow me to translate it myself, sort of providing #ifdef WIN32 / #else preprocessor-like functions. It would allow the users of that particular OS to benefit, but wouldn’t change anything for the others (since SDL must comply with the weakest link in the chain).
It does let you talk to the underlying libraries, actually! You have
to use the SDL_GetWMInfo call.
But to make a portable game, you have to avoid that. SDL tries to take
the subset of things that are either available everywhere, or are fast
to emulate on those platforms which don’t have it. If you use
something else, then there is almost certainly a platform where it
isn’t possible without some slow emulation.
That, or it was just forgotten, and you should submit a patch to add it.
But generally, there’s a common model between platforms where you can
assume a few things, and worst case, it’s actually faster than you
thought. For example, you should assume that surfaces in video memory
are not directly accessible, and that copying data between main memory
and video memory is slow. In the case of video cards with
shared memory (like the cheap Intels, and now the lower end nVidia and
ATI), accessing video memory is actually very cheap (it’s already in
the main memory!), but assuming that it’s slow is safer (you get a
"bonus" speed increase, rather than a “surprising” slowdown).
get, but SDL_Flip and SDL_UpdateRect would most likely end
up throttling your application, much like when using
glXSwapBuffers (when vsync is enabled).
Sorry for misunderstanding again, but English is not my native language: when you say “throttling”, do you mean it would end up desyncing, OR that it would eat up most of my cycles?
That info is essential to avoid wasting iterations calculating (maybe even actual game states) and blitting frames that would never be seen. That would free cycles for a smarter game and even more stuff on the screen.
By throttling, I mean that they would slow down your application if
you called them twice, because while the first one would be done in
the background, it wouldn’t be finished, and the second one would have
to wait for the first one to be done. So without doing anything, it
would “naturally” slow itself down there (which isn’t the best thing
ever, you might want to do other stuff than “wait”!).
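
As a toy illustration of that stall (this is a timing model, not SDL
code; the function names and the millisecond figures are hypothetical,
and it assumes the update runs entirely in the background once issued):

```c
/* Frame time when drawing starts again right after the update is issued:
   the draw has to wait for the whole update, and the other work
   (input, logic, physics) only runs afterwards. */
double frame_ms_serialized(double draw_ms, double update_ms, double other_ms)
{
    return draw_ms + update_ms + other_ms;
}

/* Frame time when the other work runs while the update is still in
   flight: the shorter of the two is hidden behind the longer one. */
double frame_ms_overlapped(double draw_ms, double update_ms, double other_ms)
{
    double longest = (other_ms > update_ms) ? other_ms : update_ms;
    return draw_ms + longest;
}
```

With 2 ms of drawing, a 10 ms update and 6 ms of other work, the
serialized order costs 18 ms per frame, while the overlapped order
costs 12 ms, since the 6 ms of other work fits entirely inside the
wait.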
Basically, what you want to do is avoid touching SDL surfaces for as
long as possible after doing an SDL_UpdateRects/SDL_Flip, because if
it’s not done, you’ll have a “stall”, where the first call touching
the surface will have to wait until the surface is done being updated.
So you should pack all of your drawing in a single place, done as
quickly as possible (use pre-calculated stuff, instead of calculating
stuff as you’re drawing it, for example), do a single
SDL_UpdateRect(s), then do "other stuff" (handling input, network,
game logic, physics, etc).
More info here (it says “Mac OS X FAQ”, but the recommendations more
or less apply to most modern platforms):
http://www.libsdl.org/faq.php?action=listentries&category=7#68

On Tue, Jan 27, 2009 at 10:40 AM, Antonio Marcos wrote:
–
http://pphaneuf.livejournal.com/