Hello, new member reporting

Antonio_Marcos · February 1, 2009, 12:14am

If you’re gonna be using the GPU you probably want the whole thing preloaded into VRAM if you can. And yes, it still needs to go through the GPU. DMA is a very fast way to copy, but that’s all it does: copy memory. It can’t do the number crunching part.

I don’t get the point of DMA then, probably because I don’t know what this number crunching is doing… but I don’t see the point of DMA if all the data still has to go through the GPU…

cheers,
AM.

--- On Fri, 30/1/09, Mason Wheeler wrote:

From: Mason Wheeler
Subject: Re: [SDL] DirectAccess to Video Memory [was: Hello, new member reporting]
To: sdl at lists.libsdl.org
Date: Friday, January 30, 2009, 20:11

----- Original Message ----

From: Antonio Marcos <@Antonio_Marcos>
Subject: Re: [SDL] DirectAccess to Video Memory [was: Hello, new member reporting]

I still don’t get why I need 18 MB of bandwidth for 9 MB…

Because a blit is not a straight-up copy. It requires processing. So the GPU has to load it, crunch some numbers, and save the results. 9 MB each way.

OK, but after it crunches, it’s way faster to write than it was to read, no? I mean… to read, it has to go RAM<->VRAM; to write, it goes GPU<->VRAM… (OK, so maybe it’s 2x that last speed, but it’s still faster than the first one)… Also, if it’s using a DMA controller, does it really need to go through the GPU?

SDL mailing list
SDL at lists.libsdl.org
http://lists.libsdl.org/listinfo.cgi/sdl-libsdl.org

Pierre_Phaneuf · February 1, 2009, 3:50pm

I don’t get the point of DMA then, probably because I don’t know what this number crunching is doing… but I don’t see the point of DMA if all the data still has to go through the GPU…

DMA is to send data from the main RAM to the video RAM. So in the case
of this example, where we wanted to top out the capacity of the GPU,
we’d use DMA (well, by “we”, I mean the video card driver) to put the
sprites once in video RAM, then use the GPU to blit it at various
locations on the screen in the fastest possible way.
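To make the copy-versus-blit distinction concrete, here is a minimal sketch in plain C, running on the CPU. The function names `copy_pixels` and `blit_colorkey` are made up for illustration and are not SDL calls; the pixel format (32-bit) is an assumption. The first function is all a copy engine can do, the second needs a per-pixel decision.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What a DMA-style transfer amounts to: move bytes, inspect none of them. */
void copy_pixels(uint32_t *dst, const uint32_t *src, size_t count)
{
    memcpy(dst, src, count * sizeof *dst);
}

/* A colorkeyed blit has to look at every source pixel and decide whether
 * to write it. That per-pixel decision is the "number crunching" that a
 * plain copy cannot express. */
void blit_colorkey(uint32_t *dst, const uint32_t *src, size_t count,
                   uint32_t key)
{
    for (size_t i = 0; i < count; i++) {
        if (src[i] != key) {
            dst[i] = src[i];
        }
    }
}
```

After `copy_pixels` the destination is an exact duplicate of the source, key color included; after `blit_colorkey` the destination keeps its old contents wherever the source held the key.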

As Mason Wheeler said, you can’t do anything special with DMA, though,
no colorkeying, no nothing. If it can’t be done with memcpy(), it
can’t be done with DMA.

On Sat, Jan 31, 2009 at 7:14 PM, Antonio Marcos wrote:

–

Pierre_Phaneuf · February 1, 2009, 5:28pm

Oh, so you were already assuming the surface was in video memory? Because I was thinking it was going from RAM to VRAM, hence much slower.

Oh, yes! The I/O bus is much slower than the 51.2 GB/s (well, on a
normal computer; there are some crazy PCI Express v3.0 x32 buses that
are extremely fast, but you then probably hit the limits of the main
memory bandwidth), so you don’t even think about the video card
memory bandwidth when doing main RAM to video RAM transfers.
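The arithmetic behind these figures can be sketched as follows. The 9 MB surface and the 51.2 GB/s video memory bandwidth come from the thread; the helper names are hypothetical, and the model assumes bandwidth is the only limit, which it never is in practice.

```c
/* Upper bound on full-surface VRAM-to-VRAM blits per second, assuming
 * memory bandwidth is the only limit and each blit reads the surface
 * once and writes it once (hence the factor of two). */
double max_blits_per_second(double bandwidth_bytes_per_s, double surface_bytes)
{
    return bandwidth_bytes_per_s / (2.0 * surface_bytes);
}

/* Time in seconds to push a surface across a (slower) bus once. */
double upload_seconds(double bus_bytes_per_s, double surface_bytes)
{
    return surface_bytes / bus_bytes_per_s;
}
```

Taking 9 MB as 9e6 bytes, `max_blits_per_second(51.2e9, 9e6)` gives roughly 2,800 blits per second. Any bus speed passed to `upload_seconds` is a number you would have to measure or look up for your own hardware.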

I don’t know… after all, each blit, at the lower level, is actually repeating memory copies for each pixel (assuming a 32-bit depth on a 32-bit machine). So the fewer pixels the better.

True, but when telling the video card to do VRAM to VRAM blits, that
has to go through a command buffer that is generally not optimized for
high volumes, so there’s a cost to each command, and there’s a
breaking point where you’re better off just blitting the whole thing
(or at least, in some bigger chunks).
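A toy cost model shows where such a breaking point comes from. Everything here is an assumption for illustration: the function names, the idea that each command costs a fixed amount plus a per-pixel amount, and whatever numbers you feed in. Real drivers are not this simple.

```c
/* Toy cost model: every blit command pays a fixed submission cost plus
 * a cost proportional to the pixels it touches. Units are arbitrary. */
double blit_cost(double per_command, double per_pixel, double pixels)
{
    return per_command + per_pixel * pixels;
}

/* Returns 1 when one blit covering full_pixels is cheaper than n_small
 * separate blits of small_pixels each, 0 otherwise. */
int full_blit_is_cheaper(double per_command, double per_pixel,
                         int n_small, double small_pixels,
                         double full_pixels)
{
    double many = n_small * blit_cost(per_command, per_pixel, small_pixels);
    double one = blit_cost(per_command, per_pixel, full_pixels);
    return one < many;
}
```

With a large enough per-command cost, a few hundred small dirty rectangles lose to one full-screen blit even though they touch far fewer pixels.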

In principle you’re right, though, fewer pixels is better.

Also, you were mentioning RLE earlier, if I’m not mistaken. Most video
cards nowadays use texture compression, which usually gets a similar
effect. That’s also one of the reasons why locking surfaces is
something you want to avoid, because on those modern cards, it not
only waits for the GPU to be done with it, but also needs to
decompress it somewhere (all wasted if you were going to overwrite it
completely!), then recompress it when you unlock the surface. That
“somewhere” is also often main memory, so it also goes over the I/O
bus twice! Another reason why the “old DOS ways” are quite wrong
nowadays.

Actually it would have the same number of branches (if I understood correctly what you mean). When I do dirty rectangles I have to copy the BG, and then blit it on the previous position of the moving sprite. Then I blit the actual sprite at the new position. This last blit is already colorkeyed (otherwise I would blit only the sides that moved and exposed the BG, like I said previously); the idea is to do the same with the first blit.

The catch, though, is that it’s a little trickier, because I would have to use a ‘negative mask’ of the sprite… and the way I’m seeing it with SDL, I would need an actual blit of the whole BG rect to a new surface, then paste the mask onto it, and only then pass it to the video card… and though this can save data going to the card, it could be done in a direct blit if I could ‘modify’ the colorkey code of SDL and use the ‘negative mask’ at the AND phase… SDL could have a blit with a mask parameter.

What I mean by more branches is that for every pixel, you add a
“should I get it from surface A or surface B”. I was mistaken, though,
it’s no worse than a colorkeyed blit, with the added twist of the
colorkey not just saying “don’t blit this pixel”, but rather “get this
pixel from the background instead”. You don’t really need a separate
“negative mask”, the colorkey could be used for this purpose, no?

I’m not sure if that’s an operation commonly available on GPUs,
though, so while you could definitely do something like that in main
memory (using the CPU, so no acceleration, you’re on your own, but you
have full flexibility), I’m not sure you would do it easily with
everything in video memory.
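Done in main memory on the CPU, the "get this pixel from the background instead" operation might look like the sketch below. `restore_under_sprite` is a hypothetical helper, not an SDL function, and it assumes all three buffers are 32-bit and describe the same rectangle.

```c
#include <stddef.h>
#include <stdint.h>

/* Erase a sprite from screen by restoring background pixels, but only
 * where the sprite actually drew something. The sprite's own colorkey
 * doubles as the mask: key pixels were never drawn, so they are skipped.
 * All three buffers cover the same count pixels of the same region. */
void restore_under_sprite(uint32_t *screen, const uint32_t *background,
                          const uint32_t *sprite, size_t count,
                          uint32_t key)
{
    for (size_t i = 0; i < count; i++) {
        if (sprite[i] != key) {
            screen[i] = background[i];
        }
    }
}
```

The branch count per pixel is the same as for an ordinary colorkeyed blit; only the source of the written pixel changes.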

But if everything is in video memory, I wouldn’t care. You could
reblit the entire background, then blit every sprite on every
frame, and you could still be close to 1000 frames per second on my
GeForce 8800 (ok, maybe just 500, still plenty).

I understand what you mean by the “two rectangles on each side”, you
were thinking of a moving sprite. With everything in video memory, I
think I’d just blit the background at the sprite’s old location, and
blit the sprite at its new location, and that’d be plenty fast (if you
don’t have many sprites, possibly in the 10000 fps range,
theoretically, but I’m sure something else would be the limiter, of
course!), without trying to optimize the details.
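The "background at the old location, sprite at the new location" approach can be sketched in software like this. The `Surface` struct and all function names are invented for the example (they are not SDL's), and on real hardware the two steps would be accelerated blits rather than CPU loops.

```c
#include <stdint.h>

typedef struct {
    int w, h;
    uint32_t *pixels; /* w * h pixels, row-major */
} Surface;

/* Copy a w x h rectangle at (x, y) from src to the same place in dst.
 * Both surfaces must have the same size; the rectangle is clipped. */
static void copy_rect(Surface *dst, const Surface *src,
                      int x, int y, int w, int h)
{
    for (int row = y; row < y + h; row++) {
        if (row < 0 || row >= dst->h) continue;
        for (int col = x; col < x + w; col++) {
            if (col < 0 || col >= dst->w) continue;
            dst->pixels[row * dst->w + col] = src->pixels[row * src->w + col];
        }
    }
}

/* Draw sprite at (x, y) on dst, skipping pixels equal to key. */
static void blit_sprite(Surface *dst, const Surface *sprite,
                        int x, int y, uint32_t key)
{
    for (int row = 0; row < sprite->h; row++) {
        int dy = y + row;
        if (dy < 0 || dy >= dst->h) continue;
        for (int col = 0; col < sprite->w; col++) {
            int dx = x + col;
            uint32_t p = sprite->pixels[row * sprite->w + col];
            if (dx < 0 || dx >= dst->w || p == key) continue;
            dst->pixels[dy * dst->w + dx] = p;
        }
    }
}

/* Move a sprite: repaint the background over its old rectangle, then
 * draw it at the new position. */
void move_sprite(Surface *screen, const Surface *background,
                 const Surface *sprite, uint32_t key,
                 int old_x, int old_y, int new_x, int new_y)
{
    copy_rect(screen, background, old_x, old_y, sprite->w, sprite->h);
    blit_sprite(screen, sprite, new_x, new_y, key);
}
```

No attempt is made to restore only the exposed strips; the whole old rectangle is repainted, which is the "don't optimize the details" point.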

I don’t want to sound too much like Bill Gates, but 10000 frames per
second REALLY ought to be enough for everyone. ;-)

On Sat, Jan 31, 2009 at 7:10 PM, Antonio Marcos wrote:

–

Pierre_Phaneuf · February 1, 2009, 5:36pm

Actually it would have the same number of branches (if I understood correctly what you mean). When I do dirty rectangles I have to copy the BG, and then blit it on the previous position of the moving sprite. Then I blit the actual sprite at the new position. This last blit is already colorkeyed (otherwise I would blit only the sides that moved and exposed the BG, like I said previously); the idea is to do the same with the first blit.

The catch, though, is that it’s a little trickier, because I would have to use a ‘negative mask’ of the sprite… and the way I’m seeing it with SDL, I would need an actual blit of the whole BG rect to a new surface, then paste the mask onto it, and only then pass it to the video card… and though this can save data going to the card, it could be done in a direct blit if I could ‘modify’ the colorkey code of SDL and use the ‘negative mask’ at the AND phase… SDL could have a blit with a mask parameter.

Just to put things in perspective:

One of my co-workers on the 3D team explained to me how the
depth-of-field effect in Crysis is done. They re-render every frame a
number of times, with slightly different camera locations, all pointed
at the same focal point, and blend the resulting multiple frames
together.
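The blending step described here, taken on its own, amounts to averaging several renders of the same scene. The sketch below shows only that averaging on the CPU with 8-bit channel values; `blend_frames` is a hypothetical name, and nothing here claims to be how the game actually implements it.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend n_frames renders of the same scene into out by averaging each
 * channel value. frames[f] points to count 8-bit values. */
void blend_frames(uint8_t *out, const uint8_t *const *frames,
                  size_t n_frames, size_t count)
{
    if (n_frames == 0) {
        return;
    }
    for (size_t i = 0; i < count; i++) {
        unsigned sum = 0;
        for (size_t f = 0; f < n_frames; f++) {
            sum += frames[f][i];
        }
        out[i] = (uint8_t)(sum / n_frames);
    }
}
```

Objects at the focal point land on the same pixels in every render and stay sharp; everything else lands on slightly different pixels and is smeared by the average.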

Think about it: the frame you see has not been rendered once, but
many times, and blended. Just the blending effect is probably more
work for the GPU than doing what you’re saying the naive way, and it’s
just a tiny part of every single frame in that game!

So don’t worry too much, if everything is in video memory and you’ve
got an OpenGL renderer in the style of SDL 1.3, you need to be very
wasteful with 2D operations before you start to feel it.

But if you SDL_LockSurface even once per frame, everything will
probably go down the toilet. So direct access to video memory (as per
the subject) is the last thing you want, really. ;-)

On Sat, Jan 31, 2009 at 7:10 PM, Antonio Marcos wrote:

–

Antonio_Marcos · February 2, 2009, 12:58am

As Mason Wheeler said, you can’t do anything special with DMA, though, no colorkeying, no nothing. If it can’t be done with memcpy(), it can’t be done with DMA.

Yes, that’s what I believed. My intention is to get the image ready to display, and THEN DMA it to the card. But yes, if the sprite is already in VRAM, great, the GPU will make it “ready” (the number crunching)… before that it’s just loading time… disk to RAM… DMA to VRAM; the user will just see it once, probably at the start of the game, or level… but I believe you were talking about RAM-to-VRAM bandwidth… not VRAM to VRAM… well, sorry for any misunderstanding :)

Antonio_Marcos · February 2, 2009, 1:34am

What I mean by more branches is that for every pixel, you add a “should I get it from surface A or surface B”. I was mistaken, though, it’s no worse than a colorkeyed blit, with the added twist of the colorkey not just saying “don’t blit this pixel”, but rather “get this pixel from the background instead”.

Exactly!

You don’t really need a separate “negative mask”, the colorkey could be used for this purpose, no?

…let me think… oh… indeed!! Hehehe, thanks!! But yes, the colorkey is in the sprite surface’s pixels; SDL would still need the mask parameter (any ideas how to use a mask in SDL?)

But if everything is in video memory, I wouldn’t care. You could reblit the entire background, then blit every sprite on every frame, and you could still be close to 1000 frames per second on my GeForce 8800 (ok, maybe just 500, still plenty).

Wow!

I understand what you mean by the “two rectangles on each side”, you were thinking of a moving sprite. With everything in video memory, I think I’d just blit the background at the sprite old location, and blit the sprite at its new location, and that’d be plenty fast (if you don’t have many sprites, possibly in the 10000 fps range, theoretically, but I’m sure something else would be the limiter, of course!), without trying to optimize the details.

Woooow! But… what IF I intended to build lots of parallaxed layers, each with its own particle system, using lots of sprites, all with colorkey and alpha, hence needing a system of “emerging” dirty triangles for this to work? I guess I’d better build the game then, and only after it doesn’t work (and I’m assuming it won’t… j/k), come back with the torches.

(OK, I realize that this much action would probably make dirty triangles less efficient, but I was exaggerating a little.)

I don’t want to sound too much like Bill Gates, but 10000 frames per seconds REALLY ought to be enough for everyone.

Huahahaha, we would need new eyes and brains for this to sound like Billy…
And though Ray Kurzweil believes otherwise, I think Moore’s Law is quite far from applying to those yet.

cheers!
AM

Antonio_Marcos · February 2, 2009, 1:46am

Think about it: the frame you see has not been rendered once, but many times, and blended. Just the blending effect is probably more work for the GPU than doing what you’re saying the naive way, and it’s just a tiny part of every single frame in that game!

Hmpf… there go my optimizing-l33t-hacking drives down the drain… but they will be back with torches, Pierre, be warned!

So don’t worry too much, if everything is in video memory and you’ve got an OpenGL renderer in the style of SDL 1.3, you need to be very wasteful with 2D operations before you start to feel it.

Wait a sec… will I need to actually talk to OpenGL myself? Or will this be taken care of by SDL?

how the depth-of-field effect in Crysis is done.

Awesome! Got any videos of this in action, so I can check it out? It sounds like the player would feel stoned or something (sorry if this offends the programmers in any way haha, and it does seem to be at least a couple of ways, lol)

But if you SDL_LockSurface even once per frame, everything will probably go down the toilet. So direct access to video memory (as per the subject) is the last thing you want, really.

Ookaay… got it, back to DOSBox then…

cheers!
AM

Pierre_Phaneuf · February 2, 2009, 6:37pm

Hmpf… there go my optimizing-l33t-hacking drives down the drain… but they will be back with torches, Pierre, be warned!

Yeah, it’s a bit disappointing for the bit-twiddling aspects that
we’re losing, but there’s a whole new art of making the fixed set of
APIs of OpenGL (well, less fixed now that there is more and more
shader-style programmability) do the wacky things we want to do as
fast as possible.

Wait a sec… will I need to actually talk to OpenGL myself? Or will this be taken care of by SDL?

No, in SDL 1.3, the new API has a “renderer” that talks OpenGL behind the scenes.

how the depth-of-field effect in Crysis is done.

Awesome! Got any videos of this in action, so I can check it out? It sounds like the player would feel stoned or something (sorry if this offends the programmers in any way haha, and it does seem to be at least a couple of ways, lol)

There’s got to be some stuff on YouTube, I guess? There’s a similar
effect in Call of Duty 4: when you bring up a weapon for more accurate
firing, it simulates the effect of your eyes focusing on the sights,
but I think it’s faked with a bit of blurring, instead of being more
optically correct, as in Crysis (that game really needs a powerful
system, needless to say!).

But if you SDL_LockSurface even once per frame, everything will probably go down the toilet. So direct access to video memory (as per the subject) is the last thing you want, really.

Ookaay… got it, back to DOSBox then…

Well, if it’s any comfort, with PCI Express, the “toilet” is much
faster than it used to be. But if you want to use the hardware
properly and kick real ass, you’ll have to keep to the new SDL_Texture
API (rather than the old SDL_Surface API). You can still lock textures
with the new API, but there’s a flag when creating the texture to say
whether it is “static” or “streamable”, and you can only lock the
latter (this is so SDL knows when it’s free to optimize the heck out
of stuff, so presumably, operations done with “static” textures have
better chances of being in the fast path).

On Sun, Feb 1, 2009 at 8:46 PM, Antonio Marcos wrote:

–

Donny_Viszneki · February 2, 2009, 7:15pm

I haven’t looked at the SDL texture API yet, but “streamable” versus
“static” probably refers to whether or not memory mapped buffer
objects were used/available when the texture was created. You would
have to “lock” the static type of texture because texture memory
download and upload are atomic. If that’s the case, perhaps another
API could be provided to keep a local copy of static textures so that
it only had to be uploaded (perhaps it already does this, even.)

On Mon, Feb 2, 2009 at 1:37 PM, Pierre Phaneuf wrote:

Well, if it’s any comfort, with PCI Express, the “toilet” is much
faster than it used to be. But if you want to use the hardware
properly and kick real ass, you’ll have to keep to the new SDL_Texture
API (rather than the old SDL_Surface API). You can still lock textures
with the new API, but there’s a flag when creating the texture to say
whether it is “static” or “streamable”, and you can only lock the
latter (this is so SDL knows when it’s free to optimize the heck out
of stuff, so presumably, operations done with “static” textures have
better chances of being in the fast path).

–
http://codebad.com/

Pierre_Phaneuf · February 2, 2009, 7:27pm

On Mon, Feb 2, 2009 at 2:15 PM, Donny Viszneki <donny.viszneki at gmail.com> wrote:

I haven’t looked at the SDL texture API yet, but “streamable” versus
“static” probably refers to whether or not memory mapped buffer
objects were used/available when the texture was created. You would
have to “lock” the static type of texture because texture memory
download and upload are atomic. If that’s the case, perhaps another
API could be provided to keep a local copy of static textures so that
it only had to be uploaded (perhaps it already does this, even.)

From the 1.3 SDL_video.h:

/**
 * \enum SDL_TextureAccess
 *
 * \brief The access pattern allowed for a texture
 */
typedef enum
{
    SDL_TEXTUREACCESS_STATIC,    /**< Changes rarely, not lockable */
    SDL_TEXTUREACCESS_STREAMING  /**< Changes frequently, lockable */
} SDL_TextureAccess;

/**
 * \fn void SDL_LockTexture(SDL_TextureID textureID, const SDL_Rect *rect, int markDirty, void **pixels, int *pitch)
 *
 * \brief Lock a portion of the texture for pixel access.
 *
 * \param textureID The texture to lock for access, which was created
 *                  with SDL_TEXTUREACCESS_STREAMING.
 * \param rect A pointer to the rectangle to lock for access. If the
 *             rect is NULL, the entire texture will be locked.
 * \param markDirty If this is nonzero, the locked area will be marked
 *                  dirty when the texture is unlocked.
 * \param pixels This is filled in with a pointer to the locked
 *               pixels, appropriately offset by the locked area.
 * \param pitch This is filled in with the pitch of the locked pixels.
 *
 * \return 0 on success, or -1 if the texture is not valid or was
 *         created with SDL_TEXTUREACCESS_STATIC
 *
 * \sa SDL_DirtyTexture()
 * \sa SDL_UnlockTexture()
 */
extern DECLSPEC int SDLCALL SDL_LockTexture(SDL_TextureID textureID,
                                            const SDL_Rect * rect,
                                            int markDirty, void **pixels,
                                            int *pitch);
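The contract in that comment (lock succeeds only for STREAMING textures, fails with -1 for STATIC ones) can be modelled without SDL at all. The sketch below uses made-up `Toy*` types and functions purely to illustrate the rule; it does not call or reproduce the real SDL 1.3 implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; these are not the SDL types. */
typedef enum { TOY_ACCESS_STATIC, TOY_ACCESS_STREAMING } ToyAccess;

typedef struct {
    ToyAccess access;
    int w, h;
    uint32_t *pixels;  /* CPU-visible copy; only meaningful for STREAMING */
    int locked;
} ToyTexture;

/* Mirrors the documented contract: 0 on success, -1 if the texture is
 * missing or was created STATIC. On success *pixels and *pitch describe
 * the buffer the caller may write to until toy_unlock(). */
int toy_lock(ToyTexture *t, void **pixels, int *pitch)
{
    if (t == NULL || t->access != TOY_ACCESS_STREAMING) {
        return -1;
    }
    t->locked = 1;
    *pixels = t->pixels;
    *pitch = t->w * (int)sizeof(uint32_t);
    return 0;
}

void toy_unlock(ToyTexture *t)
{
    t->locked = 0;
}
```

Callers are expected to check the return value and fall back to some other path when the lock is refused, which is the situation a STATIC texture always produces.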

There’s also a curious SDL_QueryTexturePixels call to actually get to
the pixels, which, well, isn’t all that curious, but says that you can
try it without locking the texture first (still has to be a streamable
texture), which makes for a bit of a weird API. If you use that, you
have to provide a fallback when locking is required… I would have
rather just said “you have to lock all the time”, and made locking a
no-op when unnecessary.

I’m hoping there isn’t a serious speed advantage when
SDL_QueryTexturePixels can be used without locking; that would be
really strange.

–

Donny_Viszneki · March 30, 2009, 4:33pm

    SDL_TEXTUREACCESS_STATIC,    /**< Changes rarely, not lockable */
    SDL_TEXTUREACCESS_STREAMING  /**< Changes frequently, lockable */

Why is STREAMING lockable and not STATIC? This doesn’t make sense.

If you use that, you
have to provide a fallback when locking is required… I would have
rather just said “you have to lock all the time”, and made locking a
noop when unnecessary.

I agree.

I’m hoping there isn’t a serious speed advantage when
SDL_QueryTexturePixels can be used without locking, this would be
really strange.

I’m still unsure that the documented lockability you excerpted from
the source code is accurate. It doesn’t make any sense. If anything,
either STATIC should be lockable and STREAMING should not be lockable,
or they should both be lockable. Locking STREAMING should only exist
as a means of coalescing blitting operations before finally actually
changing the memory-mapped texture’s pixel data, which is more
expensive.

On Mon, Feb 2, 2009 at 3:27 PM, Pierre Phaneuf wrote:

–
http://codebad.com/

Pierre_Phaneuf · March 30, 2009, 5:15pm

    SDL_TEXTUREACCESS_STATIC,    /**< Changes rarely, not lockable */
    SDL_TEXTUREACCESS_STREAMING  /**< Changes frequently, lockable */

Why is STREAMING lockable and not STATIC? This doesn’t make sense.

Because STATIC says that you won’t touch it too much, and, most
importantly, that you won’t be touching the bits directly, so SDL and
its underlying cohorts are free to put it in inaccessible video
memory, RLE encode it, DXTC it, and smear fresh chicken blood on it,
among other things, to make it as fast as it can.

You can still mess with its bits if you want, by creating a separate
texture of the same size and display format, but with
SDL_TEXTUREACCESS_STREAMING instead, copying the STATIC one into it,
locking the STREAMING one, messing with it, unlocking it, then copying
it back to the STATIC one. If you think that’s slow, well, yes, but
that’s what you get anyway on a lot of modern video cards when you
lock surfaces that are in video memory, so locking the surface was
just hiding all of that from you. Now, you can tell SDL what you’ll be
doing, so you know what to expect and SDL knows what your intentions
are and will avoid doing these slow things. I like that you can tell
whether you’re dirtying the pixel data, so SDL can optimize away a
potential transfer back to the video card after you unlock (it can
just throw away the buffer if it wants).
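The round trip through a lockable copy, including the "skip the transfer back if nothing was dirtied" optimization, can be sketched with plain buffers. This is a toy model under loud assumptions: `StagedTexture` and `edit_static_texture` are invented names, `vram` merely stands in for memory the application cannot touch directly, and the edit callback's return value plays the role of the dirty flag.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: vram stands for the driver's private copy of a
 * STATIC texture, staging for a STREAMING texture we are allowed to lock. */
typedef struct {
    uint32_t *vram;
    uint32_t *staging;
    size_t count;   /* number of pixels in each buffer */
    int uploads;    /* how many times staging was copied back */
} StagedTexture;

/* Copy out, let edit touch the pixels, and copy back only if the caller
 * reports a change. edit returns nonzero when it modified the buffer. */
void edit_static_texture(StagedTexture *t,
                         int (*edit)(uint32_t *pixels, size_t count))
{
    memcpy(t->staging, t->vram, t->count * sizeof *t->vram);
    if (edit(t->staging, t->count)) {
        memcpy(t->vram, t->staging, t->count * sizeof *t->vram);
        t->uploads++;
    }
}
```

A read-only pass costs one transfer instead of two, which is the saving the dirty flag is meant to make possible.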

Note that the screen texture is most likely to be STATIC, reflecting
that on many platforms, we can’t access the framebuffers anymore (and
it’s silly to do so anyway).

I’m still unsure that the documented lockability you excerpted from
the source code is accurate. It doesn’t make any sense. If anything,
either STATIC should be lockable and STREAMING should not be lockable,
or they should both be lockable. Locking STREAMING should only exist
as a means of coalescing blitting operations before finally actually
changing the memory-mapped texture’s pixel data, which is more
expensive.

I’m not sure I get this… STREAMING has to be lockable, because we
can use the hardware accelerator to blit them over to STATIC textures
(like the screen), and we need to coordinate their usage. STATIC
textures, you can’t access the bits other than through the APIs
(instead of through a pointer to the pixel data), which will do the
locking as appropriate. In the optimal case, all the SDL calls
involving STATIC textures are just translated into platform calls that
just all go to the hardware accelerator, so there’s no locking needed,
since the accelerator is the only one touching the precious pixel
data… :-)

On Mon, Mar 30, 2009 at 12:33 PM, Donny Viszneki <donny.viszneki at gmail.com> wrote:

–