glSDL Texture sizes

Gidday,

I notice that when I run a program compiled with glSDL on my nVidia
card, I get a line of output like this:
glSDL: Max texture size: 4096
, and when I run it on my 3DFX Voodoo3:
glSDL: Max texture size: 256
, as you might expect for a card that can only support 256x256 textures.

My question is: can glSDL take advantage of this? For example, if I
create a background surface on a 640x480 screen, this exceeds the
256x256 limit of the voodoo card, resulting in nothing being blitted.

I figure glSDL should provide a function to return the maximum texture
size so I can divide up my surfaces accordingly, or glSDL could do the
splitting transparently.

I realise that the former would break compatibility with generic SDL,
but might this case warrant an exception?

Greg

Gidday,

I notice that when I run a program compiled with glSDL on my nVidia
card, I get a line of output like this:
glSDL: Max texture size: 4096
, and when I run it on my 3DFX Voodoo3:
glSDL: Max texture size: 256
, as you might expect for a card that can only support 256x256
textures.

My question is: can glSDL take advantage of this?

Well, it does take the maximum texture size into account…

For example,
if I create a background surface on a 640x480 screen, this exceeds
the 256x256 limit of the voodoo card, resulting in nothing being
blitted.

…but two-way tiling is not yet implemented, so if a surface is
taller and wider than the max, it can’t be tiled at all, and
conversion will fail.

I figure glSDL should provide a function to return the maximum
texture size so I can divide up my surfaces accordingly, or glSDL
could do the splitting transparently.

It’s definitely the job of glSDL, and it already handles textures that
are wider or taller than the max texture size.
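
For reference, the limit behind that "Max texture size" log line is presumably the standard OpenGL query, which an application could also read directly if it wanted to split surfaces by hand. A minimal sketch, assuming a current GL context and the usual headers:

/* Must be called with a current OpenGL context. */
#include <stdio.h>
#include <GL/gl.h>

static int query_max_texture_size(void)
{
    GLint size = 0;
    glGetIntegerv(GL_MAX_TEXTURE_SIZE, &size);
    printf("Max texture size: %d\n", (int)size);
    return (int)size;
}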

I realise that the former would break compatibility with generic
SDL, but might this case warrant an exception?

No, glSDL should be fixed. I just never got around to implementing that
last tiling case, as I’ve never had any complaints about it missing.
Apparently, most people (at least glSDL users) are using cards that
aren’t limited to 256x256 textures.

Another reason why I haven’t messed with it is that this code should
be redesigned, using a texture space allocator (a 2D memory manager,
basically; able to use a single texture for lots of surfaces and
stuff like that), but that’s still a TODO for the backend version.

I’ll have a look at it right away and see if I can fill in the missing
case. (I have a new glSDL/wrapper version ready for release anyway,
with some glSDL/backend collision warnings avoided and stuff.) Then
we’ll at least have it working in both glSDL versions until someone
gets around to doing it properly.

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Monday 10 November 2003 10.55, Greg Trounson wrote:

David Olofson wrote:

For example,
if I create a background surface on a 640x480 screen, this exceeds
the 256x256 limit of the voodoo card, resulting in nothing being
blitted.

…but two-way tiling is not yet implemented, so if a surface is
taller and wider than the max, it can’t be tiled at all, and
conversion will fail.

I realise that the former would break compatibility with generic
SDL, but might this case warrant an exception?

No, glSDL should be fixed. I just never got around to implementing that
last tiling case, as I’ve never had any complaints about it missing.
Apparently, most people (at least glSDL users) are using cards that
aren’t limited to 256x256 textures.

Another reason why I haven’t messed with it is that this code should
be redesigned, using a texture space allocator (a 2D memory manager,
basically; able to use a single texture for lots of surfaces and
stuff like that), but that’s still a TODO for the backend version.

I’ll have a look at it right away and see if I can fill in the missing
case. (I have a new glSDL/wrapper version ready for release anyway,
with some glSDL/backend collision warnings avoided and stuff.) Then
we’ll at least have it working in both glSDL versions until someone
gets around to doing it properly.

That’d be great. Let me know if I can help in any way.

I am trying to make a basic game engine that implements smooth motion
and glSDL seems to be the only way to do it under Linux/SDL. One
objective is to get it working on as many OpenGL-capable configurations
as possible.

thanks,
Greg

On Monday 10 November 2003 10.55, Greg Trounson wrote:

Another reason why I haven’t messed with it is that this code should
be redesigned, using a texture space allocator (a 2D memory manager,
basically; able to use a single texture for lots of surfaces and
stuff like that), but that’s still a TODO for the backend version.

What good algorithms exist for that? I have a very similar need,
although not directly related. I’ve tried using a quadtree and 2D BSP
trees but the results weren’t good (lots of space wasted).

Thanks,
–Gabriel

Lic. Gabriel Gambetta
ARTech - GeneXus Development Team
ggambett at artech.com.uy

Another reason why I haven’t messed with it is that this code should
be redesigned, using a texture space allocator (a 2D memory manager,
basically; able to use a single texture for lots of surfaces and
stuff like that), but that’s still a TODO for the backend version.

What good algorithms exist for that? I have a very similar need,
although not directly related. I’ve tried using a quadtree and 2D BSP
trees but the results weren’t good (lots of space wasted).

Yeah, this is a variation of the knapsack problem and is fairly hard to
solve. I worked on it once. We found that the problems were caused by
fonts. Lots of little glyphs cause horrible fragmentation. So, we built
a special purpose allocator for fonts. It found the largest glyph and
then tried to tile the font into a single large region or several
smaller regions. Doing that got us past the worst of the fragmentation
problems.

We didn’t try anything as complex as a bsp tree. We just broke up square
regions into smaller square regions and kept the free squares on a list,
one list for each size. Basically just a variation of the memory
allocators used by malloc() and friends.
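
A rough sketch of that scheme, under the assumptions above (power-of-two squares, one free list per size, larger free squares split on demand; all names are made up for illustration):

#include <stdlib.h>

#define MAX_LEVEL 12                 /* largest square: 4096x4096 */

typedef struct Square {
    int x, y;                        /* top-left corner in texture space */
    int level;                       /* side length is 1 << level */
    struct Square *next;             /* next free square of the same size */
} Square;

/* Per-size free lists; seed with one MAX_LEVEL square per texture. */
static Square *free_list[MAX_LEVEL + 1];

static void push_free(Square *s)
{
    s->next = free_list[s->level];
    free_list[s->level] = s;
}

static Square *alloc_square(int size)
{
    int level = 0, l;
    while ((1 << level) < size)
        level++;

    /* Smallest free square that is big enough. */
    for (l = level; l <= MAX_LEVEL && !free_list[l]; l++)
        ;
    if (l > MAX_LEVEL)
        return NULL;                 /* out of texture space */

    Square *s = free_list[l];
    free_list[l] = s->next;

    /* Split down to the requested size; the three unused quadrants go
     * back on the free list for their (smaller) size. */
    while (l > level) {
        static const int dx[3] = { 1, 0, 1 }, dy[3] = { 0, 1, 1 };
        int i, half;
        l--;
        half = 1 << l;
        for (i = 0; i < 3; i++) {
            Square *q = malloc(sizeof *q);
            q->x = s->x + dx[i] * half;
            q->y = s->y + dy[i] * half;
            q->level = l;
            push_free(q);
        }
        s->level = l;                /* keep the top-left quadrant */
    }
    return s;
}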

	Bob Pendleton

On Mon, 2003-11-10 at 13:50, Gabriel Gambetta wrote:

Thanks,
–Gabriel

Lic. Gabriel Gambetta
ARTech - GeneXus Development Team
ggambett at artech.com.uy



[…]

I’ll have a look at it right away and see if I can fill in the
missing case. (I have a new glSDL/wrapper version ready for
release anyway, with some glSDL/backend collision warnings
avoided and stuff.) Then we’ll at least have it working in both
glSDL versions until someone gets around to doing it properly.

That’d be great. Let me know if I can help in any way.

Well, you can test it when I’m done. ;)

Unfortunately, I had to fix some bugs for money and stuff, so I only
got halfway before my training session. I’ll try to get it working
tonight.

I am trying to make a basic game engine that implements smooth
motion and glSDL seems to be the only way to do it under Linux/SDL.
One objective is to get it working on as many OpenGL-capable
configurations as possible.

Note that if you desperately need OpenGL rendering, you might actually
be better off using OpenGL directly. That gives you much more control
(lots of blending modes, 2D and 3D transformations, Z buffering etc
etc) and probably slightly better performance.

The point with glSDL is that it’s still the SDL 2D API, meaning that
the applications will run with SDL 2D backends as well. On many
platforms, a properly used 2D backend is quite fast enough even for
fullscreen scrollers and, more seriously, it is sometimes the only
option.

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Monday 10 November 2003 20.28, Greg Trounson wrote:

None that are really good, AFAIK. It’s a hard problem to solve really
well…

However, I’m only looking for something that’s a bit better than the
current solution, which shouldn’t be all that hard.

The current one assumes that all tiles belonging to a surface are of
the same size (parts of the “odd edge” tiles are unused), and it
can’t keep tiles from multiple surfaces in the same texture. The new
two-way mode is even worse (right now, at least); it just assumes
that all tiles are of the maximum texture size, meaning that you
could potentially end up with some tiles where only a row of pixels
along one edge is used. It would be trivial to just select a smaller
tile size (say, halve it until no tile has less than 50% area
utilization), but then the surface would be rendered using many more
quads than necessary, or I would have to implement support for mixed
tile sizes in a surface.
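
The halving heuristic mentioned above might look roughly like this (a sketch of the idea only, not actual glSDL code; the 50% test only needs to look at the bottom-right corner tile, since that is always the worst one):

/* Pick a tile size: start at the max texture size and halve it until the
 * worst tile (the bottom-right corner one) is at least 50% used.
 * Sketch only -- this is not what glSDL currently does. */
static int pick_tile_size(int surface_w, int surface_h, int max_texture_size)
{
    int tile = max_texture_size;

    while (tile > 1) {
        /* Width/height actually used in the right/bottom edge tiles. */
        int used_w = surface_w % tile ? surface_w % tile : tile;
        int used_h = surface_h % tile ? surface_h % tile : tile;

        if (2 * used_w * used_h >= tile * tile)
            break;                   /* corner tile is >= 50% used */
        tile /= 2;
    }
    return tile;
}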

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Monday 10 November 2003 20.50, Gabriel Gambetta wrote:

Another reason why I haven’t messed with it is that this code
should be redesigned, using a texture space allocator (a 2D
memory manager, basically; able to use a single texture for lots
of surfaces and stuff like that), but that’s still a TODO for the
backend version.

What good algorithms exist for that?

[…]

We didn’t try anything as complex as a bsp tree. We just broke up
square regions into smaller square regions and kept the free
squares on a list, one list for each size. Basically just a
variation of the memory allocators used by malloc() and friends.

Sounds like a 2D version of a commonly used real time memory manager
design. :)

Did you merge small squares back together when possible, and if so,
how did you go about finding them in the free pool?

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Monday 10 November 2003 22.29, Bob Pendleton wrote:

Hi,

I implemented a simple texture manager for an OpenGL 2D tiling engine I was
developing some time ago. I used a “bit grid” to solve the problem of
storing several small sprites/tiles into a texture.
I didn’t have the problem of storing a bitmap larger than the maximum
texture size so I didn’t take into account that case :)

Well, what I did was to “attach” a bit grid, an array[32] of dwords, to
every texture allocated by the engine. It represents a 32x32 grid over the
texture, and we calculate some helper variables: CellWidth (Texture Width
/ 32) and CellHeight (Texture Height / 32). All this info plus the texture’s
size and format is stored in a structure and added to a list of created
textures.

When we load a sprite we check the list of textures for one with the
proper format/size. If we find one we check the grid for a free rect equal
to or bigger than the sprite; if one is found we return a sprite structure
containing a pointer to the texture info plus an X and Y offset of the
sprite in the texture. If no space was available on the textures already in
use, we create a new one with a default size and add the sprite at offset
(0,0), so the rest of the texture can be used for other sprites later.

Of course some space is wasted/unused, but the algorithm is quite
simple/fast and doesn’t require much memory for the grid (128 bytes). If we
don’t use very large textures, for example 512x512, splitting them into a
32x32 grid gives us a cell size (granularity) of 16x16, which is quite good
for most graphics. We can even define the grid height to a greater value to
reduce the granularity, or use qwords/int64 (natively on 64-bit CPUs or
faking it on 32-bit ones) to have a 64xYY grid :)

Pascal pseudo-code of the algorithm for finding a free space. It could
easily be optimized, but I’ll leave it as it is for the sake of readability.
-------------------------------------------------------------------------

const
  GRIDWIDTH  = 32;
  GRIDHEIGHT = 32;
var
  grid: array[0..GRIDHEIGHT-1] of DWord;
  neededWidth, neededHeight: Integer;
  x, y, xx, yy: Integer;
  found: Boolean;
// (ImageWidth/ImageHeight, CellWidth/CellHeight and offsetX/offsetY are
// assumed to be declared elsewhere.)

// Clear the grid (only when creating a new texture).
FillChar(grid, SizeOf(grid), 0);

// Find out how many cells are needed to store the image.
neededWidth := ImageWidth div CellWidth;
if (ImageWidth mod CellWidth) > 0 then Inc(neededWidth);
neededHeight := ImageHeight div CellHeight;
if (ImageHeight mod CellHeight) > 0 then Inc(neededHeight);

// Scan the grid, top-left to bottom-right, for a free block of
// neededWidth x neededHeight cells.
found := False;
y := 0;
while (not found) and (y <= GRIDHEIGHT - neededHeight) do
begin
  x := 0;
  while (not found) and (x <= GRIDWIDTH - neededWidth) do
  begin
    // Check every cell of the candidate block.
    // (These loops could be replaced by a single AND against a
    // precomputed row mask.)
    found := True;
    for yy := 0 to neededHeight - 1 do
      for xx := 0 to neededWidth - 1 do
        if (grid[y + yy] and (DWord(1) shl (31 - (x + xx)))) <> 0 then
          found := False;
    if not found then Inc(x);
  end;
  if not found then Inc(y);
end;

// If a free block was found, place the sprite there and mark its cells.
if found then
begin
  offsetX := x * CellWidth;
  offsetY := y * CellHeight;

  // [ UPLOAD THE SPRITE TO THE TEXTURE HERE ]

  // Update the grid to reflect the newly occupied cells.
  // (Here again, ORing a precomputed row mask would avoid the loops.)
  for yy := 0 to neededHeight - 1 do
    for xx := 0 to neededWidth - 1 do
      grid[y + yy] := grid[y + yy] or (DWord(1) shl (31 - (x + xx)));
end;

ciao, Ivan

[…]

We didn’t try anything as complex as a bsp tree. We just broke up
square regions into smaller square regions and kept the free
squares on a list, one list for each size. Basically just a
variation of the memory allocators used by malloc() and friends.

Sounds like a 2D version of a commonly used real time memory manager
design. :)

That is exactly what it was but with all chunks being 2^n in size. So,
we indexed the list of lists by n.

Did you merge small squares back together when possible, and if so,
how did you go about finding them in the free pool?

We tried. The thing is you have to get all four sub-squares together
before you can merge them. Which is nearly impossible. We tried
inserting them in the free list in address order. Then you can do a
simple scan looking at sequential headers and look at the addresses to
see if any sequence of four items are all part of the same larger
square. (You can do that while you are inserting squares in the free
list.) This was actually for use in the X server on the old ESV
workstations so it worked pretty well. But, there was always some amount
of unrecoverable fragmentation.

	Bob Pendleton

On Mon, 2003-11-10 at 17:07, David Olofson wrote:

On Monday 10 November 2003 22.29, Bob Pendleton wrote:

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se



[…]

Sounds like a 2D version of a commonly used real time memory
manager design. :)

That is exactly what it was but with all chunks being 2^n in size.
So, we indexed the list of lists by n.

Actually, most RT memory managers also use 2^n chunk sizes, since that
keeps the calculations fast and simple.

Did you merge small squares back together when possible, and if
so, how did you go about finding them in the free pool?

We tried. The thing is you have to get all four sub-squares
together before you can merge them. Which is nearly impossible. We
tried inserting them in the free list in address order. Then you
can do a simple scan looking at sequential headers and look at the
addresses to see if any sequence of four items are all part of the
same larger square. (You can do that while you are inserting
squares in the free list.) This was actually for use in the X
server on the old ESV workstations so it worked pretty well. But,
there was always some amount of unrecoverable fragmentation.

It seems to me that it should be theoretically possible to merge back
any groups of squares that used to belong together, although there
are no simple and obvious ways of finding out when to merge.

How about this:

When allocating a rect for splitting, mark it as ALLOCATED and keep
the structure. Add the resulting child rects to the free list for
their size. Also store child references in a table or list in the
parent rect. Put a parent reference in each resulting rectangle.

Now, whenever a rect is freed, it’s removed from the free list, as
well as from it’s paren’t table/list of children. When a rect has no
more children, it is marked as free, and returned to the free list.

(Beware of holes; I just dreamed this up as I wrote it. :)
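
In code, the bookkeeping might look something like this (a data-structure sketch of the idea just described, with made-up names; the merge policy is left as loose as the description above):

typedef enum { RECT_FREE, RECT_ALLOCATED, RECT_SPLIT } RectState;

typedef struct Rect {
    int x, y, w, h;
    RectState state;
    struct Rect *parent;        /* rect this one was split from, or NULL */
    struct Rect *children[4];   /* child rects created by the split */
    int nchildren;              /* children not yet returned */
    struct Rect *next_free;     /* link in the per-size free list */
} Rect;

/* Called whenever a child rect goes away; when the last one does, the
 * parent becomes a single free rect again and can be put back on its
 * free list (and perhaps merged further up, if one wants to recurse). */
static void child_gone(Rect *parent)
{
    if (parent && --parent->nchildren == 0) {
        parent->state = RECT_FREE;
        /* push_free(parent); ... */
    }
}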

Of course, this cannot eliminate fragmentation caused by long lived
objects, but that’s expected. The only sane (*) way to deal with that
is to move things around to physically defragment the texture space.

(*) If you really don’t want to move things around, you’re
out of texture RAM, and you can’t DMA from system RAM, you
could resort to using smaller tiles to make use of the areas
around those long lived tiles. Actually, maybe this is
sane in the case of glSDL, considering that it already does
tiling for other reasons…?

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Tuesday 11 November 2003 20.52, Bob Pendleton wrote:

[…]

Sounds like a 2D version of a commonly used real time memory
manager design. :)

That is exactly what it was but with all chunks being 2^n in size.
So, we indexed the list of lists by n.

Actually, most RT memory managers also use 2^n chunk sizes, since that
keeps the calculations fast and simple.

Yeah, the allocator we used was based on a mixture of experience. On one
side we had a group of people who had experience with RT image
generators used in flight simulators and on the other side we had a
group of people with experience implementing LISP and other programming
languages with dynamic memory allocation.

Did you merge small squares back together when possible, and if
so, how did you go about finding them in the free pool?

We tried. The thing is you have to get all four sub-squares
together before you can merge them. Which is nearly impossible. We
tried inserting them in the free list in address order. Then you
can do a simple scan looking at sequential headers and look at the
addresses to see if any sequence of four items are all part of the
same larger square. (You can do that while you are inserting
squares in the free list.) This was actually for use in the X
server on the old ESV workstations so it worked pretty well. But,
there was always some amount of unrecoverable fragmentation.

It seems to me that it should be theoretically possible to merge back
any groups of squares that used to belong together, although there
are no simple and obvious ways of finding out when to merge.

Yes, of course. IIRC, if you use the trick of computing an index for the
start of each block you can always count on certain patterns in the bits
of the index. Depending on the size of the parent block, the starting
index of sub-blocks will follow the pattern x00y, x01y, x10y, x11y, where x
and y are bit strings of various lengths.

The smallest sized blocks will have addresses of the form:
x00, x01, x10, x11, the next largest block size will be:
x00bb, x01bb, x10bb, x11bb, the next will look like
x00bbbb, x01bbbb, x10bbbb, x11bbbb

and so on. So, if you know the block size you know the bits to check, and
it becomes very easy to scan a sorted list of free blocks and merge them
into the next largest size.
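
A sketch of that address check, assuming blocks are numbered in units of the smallest square, a parent's four children occupy consecutive index ranges, and the four candidate indices come from a sorted free list (all names hypothetical):

#include <stdbool.h>
#include <stdint.h>

/* Number of smallest-size units covered by a block at 'level'
 * (level 0 = smallest block). Each level is 4x bigger than the last. */
static uint32_t block_units(int level)
{
    return 1u << (2 * level);        /* 4^level */
}

/* Starting index of the parent block that contains 'index'. */
static uint32_t parent_index(uint32_t index, int level)
{
    return index & ~(block_units(level + 1) - 1);
}

/* Four free blocks of the same level can be merged if they are exactly
 * the four quarters of one parent: same parent index, offsets 0..3. */
static bool mergeable(const uint32_t idx[4], int level)
{
    uint32_t p = parent_index(idx[0], level);
    for (int i = 0; i < 4; i++)
        if (idx[i] != p + (uint32_t)i * block_units(level))
            return false;
    return true;
}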

How about this:

When allocating a rect for splitting, mark it as ALLOCATED and keep
the structure. Add the resulting child rects to the free list for
their size. Also store child references in a table or list in the
parent rect. Put a parent reference in each resulting rectangle.

Now, whenever a rect is freed, it’s removed from the free list, as
well as from its parent’s table/list of children. When a rect has no
more children, it is marked as free, and returned to the free list.

(Beware of holes; I just dreamed this up as I wrote it. :)

This, or something like it, works fine so long as there are a fixed
number of child blocks. I’ve used it in other memory allocators.

Of course, this cannot eliminate fragmentation caused by long lived
objects, but that’s expected. The only sane (*) way to deal with that
is to move things around to physically defragment the texture space.

(*) If you really don’t want to move things around, you’re
out of texture RAM, and you can’t DMA from system RAM, you
could resort to using smaller tiles to make use of the areas
around those long lived tiles. Actually, maybe this is
sane in the case of glSDL, considering that it already does
tiling for other reasons…?

Personally I’m starting to like the idea of a bit map based allocator
like the one described earlier. If you use two layers of bit maps you
can solve your problem pretty well. The top layer is used to allocate
max texture sized textures out of the available space and unused space
within those blocks can be kept track of and allocated using another
bit map for each block.

	Bob Pendleton

On Tue, 2003-11-11 at 15:27, David Olofson wrote:

On Tuesday 11 November 2003 20.52, Bob Pendleton wrote:

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se



[…]

Personally I’m starting to like the idea of a bit map based
allocator like the one described earlier.

Yeah. It allows for more efficient texture space allocation,
especially when dealing with odd (i.e. non-power-of-two) sizes. It
might have fragmentation issues with certain mixes of allocation
sizes and tile sizes, but OTOH, it doesn’t chop up good contiguous
space when allocating small areas.

If you use two layers of
bit maps you can solve your problem pretty well. The top layer is
used to allocate max texture sized textures out of the available
space and unused space within those blocks can be kept track of
and allocated using another bit map for each block.

I’m not sure I get the point with the top layer bit map. To keep track
of allocations bigger than the max texture size…? That wouldn’t
make sense, since tiles in a surface can come from anywhere
physically; the only requirement is that there’s some way of finding
them when they’re needed.

Anyway, a 2048x2048 RGBA8 texture is 16 MB, which is rather big to use
as allocation granularity for many of the cards that support
2048x2048 textures. (Even some 16 MB cards support that large
textures, though it’s obviously not physically possible to keep one
in VRAM together with the frame buffer. It’s just supported because
textures can have less than 32 bpp, and/or because there are versions
of the card with more VRAM.)

Maybe it would make sense to have some kind of internal limit here,
maybe related to the display resolution or something… Tiles larger
than the screen don’t make much sense, even for huge surfaces. If
they do anything, it would be preventing OpenGL from swapping parts
of a huge surface (of which only a part at a time is used) out of
VRAM, to leave room for other data that is actually used every frame.

Then again, texture binding has a significant cost on some cards,
which makes this a balancing act. Limit max texture size to twice the
size of the screen? Limit it so one texture uses less than 30% of the
available VRAM? Other ideas?
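
Any of those heuristics would boil down to something like the following sketch; every input except the GL limit is a guess (in particular the VRAM budget, since OpenGL has no portable way to report it), and none of these numbers come from glSDL itself:

/* Sketch of a possible internal cap on the texture size to use,
 * combining the heuristics mentioned above. All inputs except gl_max
 * are guesses supplied by the caller. */
static int internal_texture_limit(int gl_max, int screen_w, int screen_h,
                                  long vram_budget, int bytes_per_pixel)
{
    int limit = gl_max;
    int screen_cap, p;

    /* "Twice the size of the screen" idea. */
    screen_cap = 2 * (screen_w > screen_h ? screen_w : screen_h);
    if (screen_cap < limit)
        limit = screen_cap;

    /* "Less than 30% of the available VRAM" idea. */
    while (limit > 1 &&
           (long)limit * limit * bytes_per_pixel > vram_budget * 3 / 10)
        limit /= 2;

    /* Round down to a power of two, since textures are 2^n anyway. */
    for (p = 1; p * 2 <= limit; p *= 2)
        ;
    return p;
}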

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Wednesday 12 November 2003 17.08, Bob Pendleton wrote:

[…]

Personally I’m starting to like the idea of a bit map based
allocator like the one described earlier.

Yeah. It allows for more efficient texture space allocation,
especially when dealing with odd (i.e. non-power-of-two) sizes. It
might have fragmentation issues with certain mixes of allocation
sizes and tile sizes, but OTOH, it doesn’t chop up good contiguous
space when allocating small areas.

If you use two layers of
bit maps you can solve your problem pretty well. The top layer is
used to allocate max texture sized textures out of the available
space and unused space within those blocks can be kept track of
and allocated using another bit map for each block.

I’m not sure I get the point with the top layer bit map. To keep track
of allocations bigger than the max texture size…? That wouldn’t
make sense, since tiles in a surface can come from anywhere
physically; the only requirement is that there’s some way of finding
them when they’re needed.

Of course tiles can come from anywhere. I was thinking of them as being
"logically" sections of a larger area so that it would be easy to fit
large textures into an array of max texture size tiles using the same
algorithm that is used to place smaller textures inside of max texture
size tiles.

This way you can handle small textures being mapped into parts of
several max texture sized tiles.

So, if each of the areas below represents a max texture size tile, and
the letter regions are textures, then you can get something like:

abbbbcccccccc444
11112cccccccc444
11112cccccccc444
11112cccccccc444
55556cccccccc888
5555666677778888
5555666677778888
5555666677778888

Where you have three textures, two of which span multiple regions.
(Hopefully your allocator would do a better job than this. But this is how a
left to right, top to bottom, first fit allocator would allocate these
three textures.)

Clearly, you only need to do this when max texture size is smaller than
the available memory. Not to mention that you have to make some nasty
assumptions about the amount of available memory.

OTOH, you could just put your entire memory budget into this kind of an
allocator and work from there.

	Bob Pendleton

On Wed, 2003-11-12 at 14:56, David Olofson wrote:

On Wednesday 12 November 2003 17.08, Bob Pendleton wrote:

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se



David Olofson wrote:

Anyway, a 2048x2048 RGBA8 texture is 16 MB, which is rather big to use
as allocation granularity for many of the cards that support
2048x2048 textures. (Even some 16 MB cards support that large
textures, though it’s obviously not physically possible to keep one
in VRAM together with the frame buffer. It’s just supported because
textures can have less than 32 bpp, and/or because there are versions
of the card with more VRAM.)

Maybe it would make sense to have some kind of internal limit here,
maybe related to the display resolution or something… Tiles larger
than the screen don’t make much sense, even for huge surfaces.

Sure. Hey, some backends don’t even support surfaces larger than the
screen :)

If
they do anything, it would be preventing OpenGL from swapping parts
of a huge surface (of which only a part at a time is used) out of
VRAM, to leave room for other data that is actually used every frame.

Swapping textures from/to video memory has a higher cost than binding
new textures. I remember when one of my programs ran out of video
memory, and the 3D performance really suffered. Increasing the
granularity only makes this problem worse (to the point that the video
ram might get re-filled many times per frame).

Then again, texture binding has a significant cost on some cards,

That would have to be benchmarked ;) I never found texture binding cost
to be that high, especially if you compare it to the cost of swapping
textures from video memory. Sure, the texture binding time is
driver-dependent, but uploading a texture to video ram will always kill
your performance.

which makes this a balancing act. Limit max texture size to twice the
size of the screen? Limit it so one texture uses less than 30% of the
available VRAM? Other ideas?

You could use VRAM size but… there is no portable way that I know of
to find the video ram size in OpenGL :-/

Anyway, before starting using heuristics, you need some real-world
measures like the statistical distribution of the surfaces sizes and
such. I once tried to find an “average” surface size by running
different programs and printing statistics, just to find that each
program is really different : for example some allocate only small
surfaces, others allocate random sizes, others keep a copy of the
background… (for the record, the largest surfaces I could find were the
size of the screen, and the average surface dimension (x or y) was
around 100).

Anyway, I’m not sure this is a big deal, as there are OpenGL extensions
called “NV_texture_rectangle” and “EXT_texture_rectangle” that do what
their names say, i.e. prevent applications from wasting memory for non-2^n
textures. So you could just wait for it to become part of the standard
if you don’t want to solve an NP-complete problem (I for one don’t :)

Stephane

[…]

So, if each of the areas below represents a max texture size tile,
and the letter regions are textures, then you can get something
like:

abbbbcccccccc444
11112cccccccc444
11112cccccccc444
11112cccccccc444
55556cccccccc888
5555666677778888
5555666677778888
5555666677778888

*thinks for a good while*

Ah… Now I get it. (I think. :)

You effectively turn the available texture memory into a huge virtual
texture, from which you just allocate rectangular areas (addressed as
micro-tiles) as needed, disregarding texture boundaries. When
rendering, texture boundaries define quad splits - there’s no plain,
surface oriented, fixed size tiling.

Clearly, you only need to do this when max texture size is smaller
than the available memory.

Right, although you might sometimes have to use a number of smaller
textures to get at more than 50% (worst case) of the free texture
memory. If you can fit 1.9 (that is, one) 2048x2048 textures, you get
75% more space by using seven 1024x1024 textures instead.
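
(Spelling the numbers out: 1.9 x 16 MB is roughly 30 MB of free texture memory, but a single 2048x2048 RGBA texture can only use 16 MB of it, while seven 1024x1024 textures at 4 MB each use 28 MB; 28/16 = 1.75, i.e. 75% more.)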

Not to mention that you have to make
some nasty assumptions about the amount of available memory.

Yeah… In fact, I’m not sure it’s possible to do anything but make a
sensible guess and hardcode that into (gl)SDL. Maybe it’s sensible to
hardcode a limit at 512x512 or something. That’s a nice and handy 1
MB per texture, and it doesn’t generate too many extra polygons and
texture switches in reasonable resolutions. One might assume that the
vast majority of people who want to play a game in very high
resolutions also have the CPU and GPU power to take the extra
overhead that the “small” textures cause.

OTOH, you could just put your entire memory budget into this kind
of an allocator and work from there.

That’s an option with glSDL, as it isn’t really supposed to be used
together with native OpenGL code anyway. However, I’m slightly
worried about what this might do to certain platforms, especially in
windowed mode… I have a bad feeling about allocating much more
texture RAM than you need.

So, how about just extending the large virtual texture area by adding
one or a few textures at a time, as needed? Areas within the bounding
rectangle that are not covered by a physical texture could be marked
as fully allocated as far as the area allocator is concerned.

Hmm… “Marked.”

Just copy the virtual memory design; assume that the (gigantic)
virtual texture space is fully populated by real textures, but only
actually allocate textures when the corresponding area is needed.

Of course, that requires that the area allocator is somewhat smart and
tries to keep the virtual texture space compact and nicely shaped.
(That requirement is lowered if textures are relatively small, so
that you’re likely to get free “holes” [no physical textures] in the
virtual space when freeing areas.) The big advantage is that you
don’t have to decide on any “realistic” absolute maximum surface size
to support. (SDL’s practical limit is 32767x32768, I think. You’d
need an even bigger virtual texture space to make sure allocations
don’t fail even if there is space.)

Next, one could have smart ways of populating the virtual texture
space, rather than always using max size textures. Now it’s getting
really hairy and fun. ;)
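
One way to picture the lazy population (purely hypothetical code; nothing like this exists in glSDL): treat the virtual atlas as a grid of fixed-size cells and only create the OpenGL texture for a cell the first time something lands in it.

#include <GL/gl.h>

#define VCOLS 256               /* virtual space: VCOLS x VROWS cells */
#define VROWS 256

static GLuint cell_texture[VROWS][VCOLS];   /* 0 = not yet populated */
static int    cell_size = 512;              /* side of one physical texture */

/* Return the texture backing virtual cell (cx, cy), creating it lazily. */
static GLuint get_cell_texture(int cx, int cy)
{
    if (!cell_texture[cy][cx]) {
        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, cell_size, cell_size,
                     0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
        cell_texture[cy][cx] = tex;
    }
    return cell_texture[cy][cx];
}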

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Wednesday 12 November 2003 23.17, Bob Pendleton wrote:

David Olofson wrote:

Anyway, a 2048x2048 RGBA8 texture is 16 MB, which is rather big to
use as allocation granularity for many of the cards that support
2048x2048 textures. (Even some 16 MB cards support that large
textures, though it’s obviously not physically possible to keep
one in VRAM together with the frame buffer. It’s just supported
because textures can have less than 32 bpp, and/or because there
are versions of the card with more VRAM.)

Maybe it would make sense to have some kind of internal limit
here, maybe related to the display resolution or something…
Tiles larger than the screen don’t make much sense, even for huge
surfaces.

Sure. Hey, some backends don’t even support surfaces larger than
the screen :)

Ouch! You do mean hardware surfaces, right? :)

Anyway, this internal limit (or the max texture size) does not limit
the size of glSDL “hardware” surfaces, so it’s not quite the same
thing. (Tiling is applied on top of this, to make large surfaces out
of parts of textures.)

If
they do anything, it would be preventing OpenGL from swapping
parts of a huge surface (of which only a part at a time is used)
out of VRAM, to leave room for other data that is actually used
every frame.

Swapping textures from/to video memory has a higher cost than
binding new textures.

Absolutely - but if you’re out of VRAM, you’re at least better off
just swapping every now and then as you scroll across that huge
surface, than swapping all textures every frame, just because the
huge one has to fit.

I remember when one of my programs ran out of
video memory, and the 3D performance really suffered. Increasing
the granularity only makes this problem worse (to the point that
the video ram might get re-filled many times per frame).

Strange. One would think that the textures that have been unused the
longest are kicked first when others need to be swapped in. That
would handle scrolling over gigantic tiled surfaces, as well as
moving around in a 3D world with tons of textures just fine.

Then again, texture binding has a significant cost on some cards,

That would have to be benchmarked ;)

Right.

I never found texture binding
cost to be that high,

Ditto. It’s been insignificant on the cards I’ve messed with so far.

especially if you compare it to the cost of
swapping textures from video memory. Sure, the texture binding time
is driver-dependent, but uploading a texture to video ram will
always kill your performance.

Yeah.

Just to make things clear; what I’m saying is that allowing large
surfaces to make use of the max texture size means you risk forcing
OpenGL to somehow have the whole surface available, even if only a
fraction of it is visible in each frame. (I doubt your average video
card does partial caching of textures.)

Restricting the max texture size indeed means you may get some more
binding “overhead”, but it allows OpenGL to drop the invisible parts
of huge surfaces from VRAM until they become visible.

Or maybe this just isn’t implemented in your average OpenGL driver? In
that case, I guess we have to implement it in glSDL to make low
memory 3D cards usable with apps that use lots of surfaces, but don’t
blit all of them in every frame.

which makes this a balancing act. Limit max texture size to twice
the size of the screen? Limit it so one texture uses less than
30% of the available VRAM? Other ideas?

You could use VRAM size but… there is no portable way that I know
of to find the video ram size in OpenGL :-/

Exactly… :(

Anyway, before starting using heuristics, you need some real-world
measures like the statistical distribution of the surfaces sizes
and such. I once tried to find an “average” surface size by running
different programs and printing statistics, just to find that each
program is really different : for example some allocate only small
surfaces, others allocate random sizes, others keep a copy of the
background… (for the record, the largest surfaces I could find
were the size of the screen, and the average surface dimension (x
or y) was around 100).

I would guess that covers most apps, but we do have to consider SFont
and the like, which are used really rather frequently, and tend to
generate insanely wide surfaces.

That said, even the current glSDL implementation is virtually
unlimited in that regard, as long as font height <= max texture size.

So, maybe it’s not worth it to consider the few applications that use
huge surfaces. They will run just fine on reasonably modern video
cards, and they should even run ok on anything that can DMA textures
from system ram. (That is, any AGP card and most modern PCI cards,
AFAIK.) Let’s optimize that case if it actually turns out to be a
problem.

Anyway, I’m not sure this is a big deal, as there are OpenGL
extensions called “NV_texture_rectangle” and
"EXT_texture_rectangle" that do what their names say, ie prevent
applications from wasting memory for non-2^n textures. So you could
just wait for it to become part of the standard if you don’t want
to solve an NP-complete problem (I for one don’t :)

Right, but I suspect both SDL and glSDL might be obsolete before every
card in use supports those extensions. ;) Is it in OpenGL 1.4? Not
good enough, as lots of cards don’t have, and probably never will
have 1.4 drivers. Let’s not even think about 2.0…

Seriously, it would be nice to make use of that feature where it’s
available, but as it is, I think it’s just a cool performance hack
that may work for some of the potential glSDL users. It should be
simple, though, so we might as well throw it in, once we have the
required, portable, minimal system requirement stuff working.

//David Olofson - Programmer, Composer, Open Source Advocate

.- Audiality -----------------------------------------------.
| Free/Open Source audio engine for games and multimedia. |
| MIDI, modular synthesis, real time effects, scripting,… |
`-----------------------------------> http://audiality.org -’
http://olofson.net | http://www.reologica.se

On Thursday 13 November 2003 00.24, Stephane Marchesin wrote: