Renderer vs SDL_gpu for text rendering?

Any time you change state or change what kind of thing is being drawn, SDL can no longer batch it into one draw call and has to submit whatever has been queued up for rendering and then start a new batch. So: using a different texture with RenderCopy(), changing the draw color (maybe), drawing lines, drawing rects, etc.

Just to clarify, I presume what you mean is “different destination texture”. It must be able to batch multiple source textures because that’s often what’s needed for rendering text glyphs, sprites etc.

Simple batching in OpenGL (though I can’t confirm what SDL is doing) does not batch multiple source textures. The source texture is a separate part of the driver state in OpenGL and so multiple cannot be sent in a single draw call to execute the batch. For fonts and sprites, this is why we try to use packed font texture and sprite atlases. The source texture can remain the same through multiple sprites and just the source rect is specified differently per sprite, no state change required.

Changing the texture to none or changing the shader (e.g. rendering lines, then sprites, then lines) requires state changes. To optimize batching this for OpenGL, you’d probably want to unify the shader for both and skip texture changes when possible or try to use texture arrays.

Luckily SDL_fontcache uses texture atlases. It uses atlas textures 12 glyphs wide by 12 glyphs high, so there's room for 144 glyphs per texture; but each of the four fonts I'm using (normal, italic, bold, bold-italic) is cached separately, so switching between them will require state changes. It sounds like I most likely want to draw the text in four passes, one for each font variant.

Text in different colours is done using SDL_SetTextureColorMod() to change the colour modulation on the source texture (the atlas). I’m guessing that’s also a state change…

I’m kinda tempted to switch to pure OpenGL, for maximum overkill for an app which started life as a terminal-based word processor, but OpenGL/OpenGLES compatibility is a bind.

No, I mean source textures. It’s a limitation imposed by the underlying graphics APIs, and is why using texture atlases is important.

The only ways for SDL to use multiple source textures in one draw call (which is what batching does; try to cram as much into one draw call as possible to minimize draw call overhead) would be to either use an array texture, which isn’t available everywhere and has the pesky limitation of all textures inside having to be the same size and pixel format, or use bindless texture arrays, which is really only possible on newer OpenGL versions (and has limitations in Metal). In either case it would then have to somehow pass the index of each texture along with the texture vertices.

I bow to your superior knowledge, but it’s not what icculus said in this post. There the question was asked:

What i have heard is that the flushing happens when you render a new texture, so if you render 2 sprites multiple times it is way faster to render A A B B rather then A B A B, since it only has to flush 2 times rather then 4, is this correct?

to which the reply was:

This will not flush at all in 2.0.10, assuming those SDL_Textures haven’t changed in some way between SDL render calls.

He does go on to say:

That being said, when flushing does happen, it benefits from using the same texture twice in a row, as SDL is now smart enough to only bind the texture once (in GL, Direct3D, etc) and draw from it multiple times, so you’ll see performance increases in this scenario beyond the higher-level “flushing” that SDL does.

So I wonder if we are talking about two different kinds of batching. I was referring to SDL’s internal batching, first enabled in SDL 2.0.10, which is independent of the backend and seemingly can batch multiple source textures. Whereas you may be talking about a lower-level batching within OpenGL.

I was talking about SDL’s batching. It doesn’t batch different source textures into one draw call, at least as far as I know, given that it has to be implemented on top of GPU APIs that don’t support doing that*

I’m assuming @icculus was talking about using texture atlases.

The last paragraph you're quoting seems to be talking about how, even without batching (turned off or just not present), SDL is smart enough that if you call SDL_RenderCopy() twice with the same source texture it won't waste time (and incur a state change) by binding the same texture again for the second RenderCopy() call (it'll still do two draw calls, though).

*with the exception of something like instanced rendering with array textures or bindless texture arrays

That’s not how I read his post: he referred to SDL_Textures in the plural. We should perhaps wait for him to comment.

Yes, it won’t bind the same texture twice, because now it caches this state and knows not to bind it again, which wasn’t true in earlier versions of SDL.

When I talked about the "SDL_Textures changing," I meant that if you try to update a texture and there are batched draws waiting that need the current contents of the texture, it will force a flush so you get correct rendering before the texture contents change. Me using a plural here was just a stylistic choice…or an oversight, whichever applies.

But @sjr is right, at the current time, this will still be two separate draw calls from the same texture; the render API isn’t (currently) smart enough to notice that a string of SDL_RenderCopy() calls all use the same texture and collect them all into a single draw.

In theory, this can be done with some reworking of SDL’s internals and without a new API, but no one has tried to implement it yet. My attitude is that the worst fire of the render API’s performance is put out by the batching code, but that doesn’t mean there isn’t still a lot of low-hanging fruit out there.

Thanks for the explanation. I’m still not entirely clear, though, whether SDL’s batching offers any benefit if you are doing many SDL_RenderCopy() calls with different source textures, or whether it doesn’t help at all.

In my app I use SDL_gfx to draw text, and it creates (and caches) a separate texture for every glyph; it doesn't use a font atlas. I explicitly enable batching (in my case it would otherwise be disabled by default because of hints), thinking it would help, but perhaps it doesn't.

it creates (and caches) a separate texture for every glyph

In terms of performance, this is often considered a bad idea regardless of the API used to render it, fwiw.

In this case, though, SDL's batching still gets you some wins, as there is a lot of other state we can cache, but moving to a texture atlas will be a speedup in any case, as there are fewer texture binds, and maybe some day significantly fewer draw calls.

(but for a small amount of text, maybe this isn’t a big deal in practice.)

SDL2_gfx has been around for several years so it’s evidently not been considered something which needs to be addressed urgently. Certainly in my application the text rendering performance has never been an issue, even on relatively slow platforms like the Raspberry Pi.

Thanks for the confirmation.

but moving to a texture atlas will be a speedup in any case, as there are less texture binds, and maybe some day significantly less draw calls.

Wait, so does SDL’s batching not combine multiple calls to RenderCopy() with the same source texture into one draw call?

Wait, so does SDL’s batching not combine multiple calls to RenderCopy() with the same source texture into one draw call?

Not at the moment, but that’s planned future development (it won’t require any API changes, just improvements to the rendering backends).

I thought that was the whole point of the batching system, to reduce draw call overhead. Huh.

In this article Ryan describes in some detail how the batching works. It says “Everything that looks like a rendering operation (not just draws but setting the viewport, the cliprect, etc) goes into a linked list” (which AIUI is stored in a Vertex Buffer Object).

There’s a very simple way to implement batching under the hood that would coalesce all draw calls with the same source texture, using a multimap. Unfortunately, this very simple method is also likely to introduce errors, as it could conceivably reorder draw calls and cause sprites to be drawn on top of something they were supposed to be rendered underneath of.

A slightly more complicated method would be to introduce a “begin layer / end layer” API that uses a multimap and lets the developer do all the rendering they need one layer at a time, with the responsibility being on them to render the layers bottom-to-top.

Not that @icculus needs our advice on this, but simpler and easier would be to put all polygons with the same texture into an array, and when the array is full or something else happens that would necessitate flushing it, submit that array as a VBO/VAO for drawing. This preserves draw order, etc.

What I did was create my own content-manager system, but for glyphs.
Basically three structs: Glyph, GlyphSet and FontManager.
The Glyph struct stores an individual glyph.
GlyphSet keeps all the glyphs of a single font in one set. So if you had font X at a given size, bold or italic, a variant that differs even slightly can be stored in a different set.
The FontManager is responsible for loading and storing the sets and deleting them at the end of the program.

When it comes to performance, each glyph is an individual texture, so if you just display them in the order they appear in a line you will take a performance hit. You will get better performance by stepping through them alphabetically (batching).

There is another alternative: rather than storing the glyphs as textures you could store them as surfaces. This has a number of advantages. You can easily manipulate their colour, and you could render-copy them yourself to a single surface and then display that one surface to the screen. In my case I would do that on another thread.

OpenGL can give you an advantage when it comes to performance: you have access to instancing, not just batching. It is also more friendly to multithreading. SDL2 appears, from what I can tell, to have some threading involved in its renderer system; you can run into issues where your threads conflict with that thread and actually slow things down.

To give you an idea how much batching can make a difference: I sent 40,000 circles bouncing across my screen in five different colours. Using batching I got over 60 fps; swapping back and forth between circles instead of batching drops it to 25 fps. Even with a batch size of 1,000 circles it still gave me 60 fps.

Anyway hope that helps.

More or less, this is what I went with; when we hit a draw command, we just look ahead and mush all the draw commands using the same texture into a single glDrawArrays() call.

The work-in-progress patch is here:

This has a ways to go, but initial testing looks promising!
