Cost of creating textures vs. holding onto them

A question that I often find myself wondering, when writing programs under SDL2, is how long to hold onto textures. When does it make sense to destroy a texture (and regenerate it later if needed)?

My vague sense is that textures are relatively expensive to create, and relatively cheap to keep lying around – but I don’t really have any good information to back that up with. Did I just make up this rule of thumb?

I should perhaps emphasize here that a lot of my SDL programming is not your classic 60fps fullscreen game, but programs that typically run in a window and only redraw in response to user input. In such an environment, it seems that texture memory might be more of a shared resource, in which case it would make sense not to keep textures lying around when they’re not in use.

When I say “textures are relatively expensive to create”, I’m thinking as much about power usage as I am about performance. I don’t want to be spinning up GPU fans or draining batteries, just because I keep destroying and recreating a bunch of textures.

Now of course I could try to profile my code to get answers to these questions, but that only tells me about my own platform. I’m also wondering whether the answers to these questions differ significantly on the other major SDL-supported platforms.

(If you’re wondering where I’m getting so many textures from that I have to juggle, a lot of it comes from text rendering. When rendering a multi-line input field, for example, there’s a lot of text that could change on each input event – but the most common case is appending text to the end, in which case it seems like it would make sense to cache textures until they are changed. But I don’t feel like I have any sense of the tradeoffs that are involved with deciding e.g. whether or not to manage a texture cache.)


Honestly, It Depends.

Creating textures is slower than reusing them, of course, but in your specific case you’d only be doing it in response to the user doing something, rather than in a tight rendering loop in a video game.

In the case of a multi-line text input field, you could just make a texture that’s the size of the input field and then clear and replace its contents when the user types something (assuming you’re doing the text rendering on the CPU and then sending that off to the GPU), rather than destroying the old texture and creating a new one.
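As a rough sketch of that reuse pattern (the `FieldBuffer` type and `field_*` names are invented for illustration): keep one CPU-side RGBA buffer the size of the field, clear and re-render it on each input event, and push it to the same streaming texture with SDL_UpdateTexture() instead of destroying and recreating the texture each time:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reusable backing buffer for a fixed-size input field.
 * In the SDL program this would be paired with one
 * SDL_TEXTUREACCESS_STREAMING texture of the same size, refreshed via
 * SDL_UpdateTexture(tex, NULL, buf->pixels, buf->w * 4). */
typedef struct {
    int w, h;
    unsigned char *pixels; /* RGBA, 4 bytes per pixel */
} FieldBuffer;

static FieldBuffer *field_create(int w, int h) {
    FieldBuffer *b = malloc(sizeof *b);
    b->w = w;
    b->h = h;
    b->pixels = calloc((size_t)w * h, 4);
    return b;
}

/* Clear and re-render: the buffer (and the GPU texture it feeds) are
 * reused, not destroyed and recreated, on every input event. */
static void field_clear(FieldBuffer *b) {
    memset(b->pixels, 0, (size_t)b->w * b->h * 4);
}

static void field_destroy(FieldBuffer *b) {
    free(b->pixels);
    free(b);
}
```

The texture allocation then happens once (when the field is created or resized), and keystrokes only cost a CPU render plus one upload.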

One strategy could be to keep textures around until you reach a certain memory usage limit, then kick out any that haven’t been used lately (and also keep track of how often you have to do this at runtime, so if it turns out your limit is too low you can let the cache grow instead). Of course, SDL doesn’t give you any info about how much memory a texture actually uses (some underlying GPU APIs don’t expose this information) so your own count could be wildly off unless you access the underlying API and check for yourself (if possible).
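A minimal sketch of that strategy, with invented names and an assumed estimate of 4 bytes per pixel (since, as noted, SDL won’t tell you the real GPU-side size); the `destroy` callback stands in for SDL_DestroyTexture():

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache entry: `handle` stands in for an SDL_Texture*,
 * and `bytes` is only an estimate (w * h * 4), since SDL doesn't
 * expose the actual GPU-side memory usage. */
typedef struct {
    void *handle;
    int w, h;
    unsigned long last_used; /* logical clock, bumped on each use */
    size_t bytes;
} CacheEntry;

typedef struct {
    CacheEntry *entries;
    int count, capacity;
    size_t total_bytes, budget;
    unsigned long clock;
    void (*destroy)(void *handle); /* would call SDL_DestroyTexture */
} TextureCache;

/* No-op destroy for testing; a real program would destroy the texture. */
static void noop_destroy(void *handle) { (void)handle; }

/* Evict least-recently-used entries until we're back under budget. */
static void cache_trim(TextureCache *c) {
    while (c->total_bytes > c->budget && c->count > 0) {
        int lru = 0;
        for (int i = 1; i < c->count; i++)
            if (c->entries[i].last_used < c->entries[lru].last_used)
                lru = i;
        c->destroy(c->entries[lru].handle);
        c->total_bytes -= c->entries[lru].bytes;
        c->entries[lru] = c->entries[--c->count];
    }
}

static void cache_add(TextureCache *c, void *handle, int w, int h) {
    if (c->count == c->capacity) {
        c->capacity = c->capacity ? c->capacity * 2 : 8;
        c->entries = realloc(c->entries, c->capacity * sizeof *c->entries);
    }
    CacheEntry e = { handle, w, h, ++c->clock, (size_t)w * h * 4 };
    c->entries[c->count++] = e;
    c->total_bytes += e.bytes;
    cache_trim(c);
}
```

Tracking eviction frequency (a counter in `cache_trim`) would give you the "is my budget too low?" signal mentioned above.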

Another possibility, depending on how finicky you need to be about text rendering, would be to use something like stb_truetype. Instead of rendering your text and then uploading that to the GPU as a texture, it renders a whole font to a texture atlas; for a given glyph it will give you its texture coordinates, how far to advance in X and Y when drawing, etc., and you just do a bunch of SDL_RenderCopy() calls with that information. The advantage is that you have just one texture per font and font size, so memory usage is low and it’s very fast to draw. The downside is that the text rendering isn’t as nice as if you’d used something like FreeType.
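To illustrate the draw loop this implies, here is a sketch with a made-up `GlyphInfo` struct standing in for the metrics that stb_truetype’s packing API hands back (stbtt_PackFontRange and friends); the comment marks where each SDL_RenderCopy() would go:

```c
/* Hypothetical per-glyph metrics, of the kind stb_truetype's packing
 * API provides: a source rectangle in the atlas plus an x-advance. */
typedef struct {
    int src_x, src_y, src_w, src_h; /* location in the atlas texture */
    float xadvance;                 /* pen movement after this glyph */
} GlyphInfo;

/* Lay out one line of ASCII text; returns the final pen x position.
 * Where the comment sits, an SDL program would issue one
 * SDL_RenderCopy() from the atlas per glyph. */
static float draw_line(const GlyphInfo *glyphs, const char *text,
                       float pen_x, float pen_y) {
    for (const char *p = text; *p; p++) {
        const GlyphInfo *g = &glyphs[(unsigned char)*p];
        /* SDL_RenderCopy(renderer, atlas_tex, &src, &dst) would go
         * here, with dst positioned at (pen_x, pen_y). */
        (void)pen_y;
        pen_x += g->xadvance;
    }
    return pen_x;
}
```

This is the "one texture, many copies" shape of the approach; the real per-glyph metrics (including Y offsets for baseline alignment) come from the font library.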

I had considered this approach – repeatedly using SDL_UpdateTexture() to overwrite a texture’s contents – but my thought was that the vast majority of expense in texture creation was sending the data from RAM to the GPU, and that allocation of the texture itself was negligible. If that’s not always the case, then that’s good to know.

Yeah, I think this gets to the root of my confusion. I don’t really have the tools to gauge how good or bad of a resource hog I am. I suppose the solution here is to just do some ad hoc exploration manually in order to develop some rough instincts on the subject.

Oh cool – I’ve thought about that approach, but I didn’t know stb_truetype actually worked that way. Of course, in the case of an input field, I imagine this approach won’t work: unless you limit your input to (say) European character sets, good international support would make for some enormous textures.

The driver must also allocate memory. Depending on how badly fragmented GPU memory is, this may be slow. Of course, you wouldn’t be doing it every frame. But if you’re creating and destroying a lot of textures, fragmentation could occur quickly. The driver may also have an optimized path for replacing a texture’s contents vs creating a new one from a given image.

Visual Studio and Xcode will both tell you how much memory your app is using, and profiling tools like Valgrind (on Linux) or dtrace (on macOS and Solaris, with Linux ports) can also tell you. On Windows and Linux there’s also RenderDoc, which should (I have not checked this) be able to tell you how much VRAM a given texture uses (though I don’t know if it can tell you the overall usage).

I don’t think any one font supports all possible languages anyway, so you’d have to use different fonts for different languages, and they don’t all have to be on the same texture.

edit: And depending on exactly what the app you’re making actually is for, it may be better to build a regular UI app instead of trying to do it with SDL.

Thanks! I’m not familiar with this tool, and it sounds really helpful. (After decades of being familiar with how to balance things like CPU vs RAM, or SSD vs hard disk, it’s disconcerting to not understand GPUs to the same degree.)

Indeed! Part of the vagueness of my original question is that I didn’t have a particular program in mind. Rather, this uncertainty about tradeoffs has been a regular refrain across many programs I’ve written with SDL2, with varying amounts of text management in them.

Not all possible languages, no, but any Unicode font is going to support the common alphabets at least (including Greek, Hebrew, Cyrillic and probably Arabic) plus a large number of non-alphabetic symbols. As an example, the DejaVuSans font contains 5720 characters.

This is, in my opinion, the weakness of the texture atlas approach, because every character you might need to support (and in the case of a multilingual text editor that’s a lot) must be included, resulting in a potentially huge texture.

My experience is that using a separate texture for every glyph (dynamically allocated so only those characters actually needed are populated) works extremely well. It limits the extent to which batching can be applied, but performance still seems to be good.
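A sketch of what such a demand-populated per-glyph cache might look like (names invented; `void *` stands in for an SDL_Texture*), so only codepoints that actually appear ever get a texture created for them:

```c
#include <stdlib.h>

/* Hypothetical demand-populated glyph table: one texture per
 * codepoint actually used, created on first lookup rather than up
 * front for the whole font. */
#define GLYPH_BUCKETS 1024

typedef struct GlyphTex {
    unsigned codepoint;
    void *texture; /* would be an SDL_Texture* */
    struct GlyphTex *next;
} GlyphTex;

typedef struct {
    GlyphTex *buckets[GLYPH_BUCKETS];
    void *(*render_glyph)(unsigned codepoint); /* e.g. render + upload */
    int misses; /* how many glyphs we actually had to create */
} GlyphCache;

/* Stand-in renderer for testing; a real one would rasterize the glyph
 * (FreeType, SDL_ttf, ...) and upload it as a texture. */
static void *fake_render(unsigned cp) { return (void *)(size_t)(cp + 1); }

static void *glyph_get(GlyphCache *c, unsigned cp) {
    GlyphTex **slot = &c->buckets[cp % GLYPH_BUCKETS];
    for (GlyphTex *g = *slot; g; g = g->next)
        if (g->codepoint == cp)
            return g->texture;
    GlyphTex *g = malloc(sizeof *g); /* first use: create and remember */
    g->codepoint = cp;
    g->texture = c->render_glyph(cp);
    g->next = *slot;
    *slot = g;
    c->misses++;
    return g->texture;
}
```

After the first render of a given character, every later draw is just a lookup plus a copy, which matches the observation that performance stays good in practice.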

One texture per glyph means one draw call per glyph (and a state change in between, from binding a different texture), which is generally not efficient.
Having all the relevant glyphs (for the current language) in one texture allows you to create one “geometry” (triangle strip or whatever) for the whole text, typically one quad (two triangles) per glyph, with texture coordinates that select the right glyph from the texture.
Doing that might even be possible with SDL_Render, now that it has SDL_RenderGeometry().
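Assuming SDL_RenderGeometry(), the per-glyph quads could be built roughly like this; a plain struct stands in for SDL_Vertex here, and the layout is a sketch, not SDL’s actual vertex format:

```c
/* Minimal stand-in for SDL_Vertex-style data; a real program would
 * fill SDL_Vertex arrays and call
 * SDL_RenderGeometry(renderer, atlas, verts, nverts, indices, n). */
typedef struct { float x, y, u, v; } Vert;

/* Emit one quad (4 vertices, 6 indices) for a glyph at (x, y) of size
 * (w, h), whose image occupies [u0,v0]..[u1,v1] in the atlas.
 * `base` is the index of the quad's first vertex. Returns the number
 * of vertices written (always 4). */
static int emit_quad(Vert *verts, int *indices, int base,
                     float x, float y, float w, float h,
                     float u0, float v0, float u1, float v1) {
    verts[base + 0] = (Vert){ x,     y,     u0, v0 };
    verts[base + 1] = (Vert){ x + w, y,     u1, v0 };
    verts[base + 2] = (Vert){ x + w, y + h, u1, v1 };
    verts[base + 3] = (Vert){ x,     y + h, u0, v1 };
    int *ix = indices + (base / 4) * 6;
    ix[0] = base + 0; ix[1] = base + 1; ix[2] = base + 2; /* first tri */
    ix[3] = base + 0; ix[4] = base + 2; ix[5] = base + 3; /* second tri */
    return 4;
}
```

Calling this once per glyph while advancing the pen produces one vertex/index buffer for the whole string, drawable in a single call against the atlas texture.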

But depending on your use case it might be easier (and good enough) to create one texture (with stb_truetype or similar) per “text” (e.g. a page of a book) that you want to draw on screen, and then render a single quad with that texture instead of drawing each glyph separately.

I know the theoretical arguments, but in practice it works extremely well, and that’s good enough for me.

But in general there’s no such thing as “the current language”. For example my main application is a programming language, in which keywords and variable names are in English but literal strings and comments may well contain российского текста or ελληνικού κειμένου!

Code Pages were long ago abandoned in favour of Unicode. The expectation now is that (at least) all the characters in the currently-selected font will be available, and that’s likely to be many thousands.

Imagine trying to render a page listing the translations of a phrase entered by the user into several different languages with multiple alphabets. How would you approach that using a texture atlas?

Good for you :wink:
In Yamagi Quake2’s GL3 renderer I (lazily) implemented drawing characters (mainly in the console and from diagnostic messages) with one draw-call per char (even though the “font” is a single texture).
It works mostly well enough, but on Raspberry Pi the slowdown really is noticeable when opening the console or showing lots of debugging messages - so the problem isn’t just theoretical.

Imagine trying to render a page listing the translations of a phrase entered by the user into several different languages with multiple alphabets. How would you approach that using a texture atlas?

  1. This is a very special use case (uncommon in games and such)
  2. But, example: If a glyph fits in 24x18 pixels (which it should if the font size isn’t too big), a 1024x1024 pixel texture can hold more than 2300 glyphs - and a 2048x2048 texture can hold more than 9600 glyphs.
    Yes, those textures are kinda big, but well within what most hardware (including smartphones) supports…
  3. And you could still use the method of rendering the whole text (or just relevant parts with “unusual” chars) to a fresh texture with stb_truetype and rendering one quad with that texture
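Point 2’s arithmetic, as a one-liner (simple grid packing, no padding between cells):

```c
/* How many fixed-size glyph cells fit in a square atlas texture,
 * assuming a plain grid layout with no gaps. */
static int atlas_capacity(int atlas_size, int cell_w, int cell_h) {
    return (atlas_size / cell_w) * (atlas_size / cell_h);
}
```

For 24x18 cells, `atlas_capacity(1024, 24, 18)` comes out at 2352 and `atlas_capacity(2048, 24, 18)` at 9605, matching the figures above.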

Uncommon in games, no doubt. Common in any program handling arbitrary text, when you can’t predict ahead of time what characters or alphabets might be wanted (word processing, powerpoint, web browser, programming language etc.).

Sounds like you are still thinking 96 DPI. ‘Retina’ displays (300 DPI or so) are now commonplace, and in my application a 72-point font size isn’t unreasonable. So a character cell of perhaps 300 pixels high by 200 wide!

To get a feel for the performance you can easily get using one texture per glyph, try this, in which the entire frame is redrawn every frame in order to achieve the scrolling.

That’s about 300 characters (and nothing else!), uses half a core on my (high-end Desktop) CPU and when it’s running, my GPU-utilization increases from 0-10% to 20-25% (Geforce 3060Ti). That’s with both Firefox and Chromium on Linux (X11).

I don’t know how much of that is due to inefficiencies in the browser, X11 and so on, and how much is due to drawing each character separately. And keep in mind that ~300 chars isn’t that much, especially when nothing else is happening. With a higher resolution (and a smaller font) the overhead will be more noticeable: even a “standard” Linux terminal with 80x24 chars has more than 6x as many characters, terminal emulators on modern desktops are often used at bigger sizes, and my Yamagi Quake2 console has about 40x60 chars at 1680x1050 (yes, the width is limited to ~40).

Precisely, so you can’t draw any conclusions.

I don’t know of any practical way that I could use a texture atlas in my application since I can’t impose on my users any restriction on the number of glyphs, size of character, mixture of fonts and styles etc. In the seven years since it was first released, nobody has complained about the text rendering performance.

For what it’s worth, many engines/games/tools (including mine, LÖVE) dynamically build a texture atlas of glyphs at runtime per-font, on demand. This has worked very well and dramatically increased the performance of text rendering for me, although the code to build the atlas is a little more complicated than not having it of course. It avoids needing a separate draw call per glyph and it also avoids needing to know the necessary glyphs before running the app.
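LÖVE’s actual implementation isn’t shown here, but a common way to build such an atlas on demand is “shelf” packing: place glyphs left to right in rows, opening a new row when the current one is full. A minimal sketch (names invented):

```c
/* Minimal "shelf" packer for an on-demand glyph atlas: glyphs go left
 * to right; when one doesn't fit, a new row (shelf) opens below the
 * tallest glyph of the current row. Returns 0 on success (x/y set),
 * -1 when the atlas is full (time to grow it or evict). */
typedef struct {
    int width, height; /* atlas dimensions */
    int pen_x, pen_y;  /* next free position on the current shelf */
    int shelf_h;       /* tallest glyph on the current shelf so far */
} ShelfPacker;

static int shelf_pack(ShelfPacker *p, int w, int h, int *x, int *y) {
    if (p->pen_x + w > p->width) { /* row full: open a new shelf */
        p->pen_y += p->shelf_h;
        p->pen_x = 0;
        p->shelf_h = 0;
    }
    if (w > p->width || p->pen_y + h > p->height)
        return -1;
    *x = p->pen_x;
    *y = p->pen_y;
    p->pen_x += w;
    if (h > p->shelf_h)
        p->shelf_h = h;
    return 0;
}
```

On a cache miss you’d rasterize the glyph, call `shelf_pack` for a slot, and upload just that sub-rectangle of the atlas texture (SDL_UpdateTexture accepts a rect for exactly this kind of partial update).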

If text rendering isn’t any sort of performance or power usage issue at all in a particular app then it’s a moot point either way, of course.


And that’s exactly what motivated my asking this question. If the naive approach isn’t contributing significantly to performance or power usage, then I’d rather not build a complex approach. But finding the answer to that “if” is apparently not straightforward.

So how would that work in, say, a conventional word-processing application? The user is typing text at the keyboard, and you have no way of knowing in advance what alphabets, characters and symbols he may type. If he types a character not in the current atlas, do you dynamically build a new atlas at that point? Or what?

I would assume that’s what they’re doing.

I worry that, potentially, rebuilding the texture atlas with every new character typed could significantly affect responsiveness to the keyboard, especially if the atlas already contains a large number of glyphs.

In a word processing application responsiveness is probably more important than rendering speed (there is typically no requirement to re-render the content frequently, only when the page scrolls or is re-sized or something).

It’s unlikely that the user is going to type a completely different language at every keystroke, however.

For something like a word processing application it’d probably be better to use the system’s UI toolkit (or an abstraction meant for regular UI stuff, like Qt/wxWidgets/etc.) and its font rendering than something meant for games etc. like SDL.

Not a different language, no, but if he’s typing a document which is primarily English but which includes quotes from another language, it is entirely likely that several consecutively-typed characters won’t be in the existing atlas. Are you proposing that when he types the first ‘foreign’ character all the letters in that alphabet should be added to the atlas at once?

There is perhaps an argument that SDL1 was designed “for games etc.” but surely that’s not the case with SDL2 (or if it ever was, it isn’t any more). It’s pretty clear from the SDL2 API that it’s intended for a much wider spectrum of applications, and of course mine isn’t a game (it’s a programming language).

If you do that it becomes a single-platform app. The purpose of SDL is to act as an abstraction layer to make it easy to write cross-platform applications, so the same source code can be expected to run substantially identically on multiple platforms. It’s perfectly sensible to want to write a cross-platform word processor!