SDL OpenGL ES2 autobatching patch

That’s right, but you only need the hint if you request a specific renderer. If you call SDL_CreateRenderer() with a -1 index (and haven’t forced a backend with the SDL_RENDER_DRIVER hint), it’ll do batching, under the assumption that you won’t call OpenGL/Direct3D/whatever directly, since you might not have gotten any specific GPU API anyhow. If you tell SDL you need OpenGL or Direct3D or whatever, we have to assume you might call it at any time, and thus turn off batching. That being said, in my testing most render backends are still faster without batching than 2.0.9 was, thanks to other optimizations, but YMMV for several reasons.

It’s good practice to set the hint to allow batching anyhow, though, since an end-user might force a renderer with an environment variable, even if your app doesn’t call any lower level APIs.
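A minimal sketch of that pattern (assuming SDL 2.0.10+; the window title, size, and lack of error checking are arbitrary choices for brevity):

```c
#include <SDL.h>

int main(int argc, char *argv[])
{
    SDL_Init(SDL_INIT_VIDEO);

    /* Opt in to batching explicitly, so it stays enabled even if the
       end user forces a particular backend via the SDL_RENDER_DRIVER
       environment variable. Must be set before creating the renderer. */
    SDL_SetHint(SDL_HINT_RENDER_BATCHING, "1");

    SDL_Window *window = SDL_CreateWindow("batching demo",
        SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED, 640, 480, 0);

    /* -1 index: let SDL pick the backend. */
    SDL_Renderer *renderer = SDL_CreateRenderer(window, -1, 0);

    /* ... draw ... */

    SDL_DestroyRenderer(renderer);
    SDL_DestroyWindow(window);
    SDL_Quit();
    return 0;
}
```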

(It’s also worth noting that Metal always does batching; because of the nature of its command queue, we can make guarantees that we can’t in the GL state machine. I expect a potential future Vulkan or Direct3D 12 renderer would be similar.)

In 2.1, we’ll probably remove this hint and make it so you must call SDL_RenderFlush() before touching a native API, and always do batching otherwise.
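The flush-before-native-calls pattern described here can be sketched as follows (assumes SDL 2.0.10+, where SDL_RenderFlush() was added; `renderer` and `texture` are placeholders supplied by the caller):

```c
#include <SDL.h>

static void draw_with_native_calls(SDL_Renderer *renderer, SDL_Texture *texture)
{
    SDL_Rect dst = { 0, 0, 64, 64 };

    SDL_RenderCopy(renderer, texture, NULL, &dst);  /* queued by the batcher */
    SDL_RenderFlush(renderer);                      /* submit queued work to the GPU now */

    /* ...at this point your own OpenGL/Direct3D calls are safe... */

    SDL_RenderPresent(renderer);
}
```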



I’d have to experiment; my thinking is that it reduces bandwidth to only send the color as a uniform if it changes, instead of having to send it 4+ times for each primitive, but maybe in practice changing uniforms causes more stalls than just sending a larger vertex buffer once.

Some of the render backends (Direct3D 9 and 11) already do this as per-vertex data, though, since they have more-strict requirements on vertex format, but I don’t know which is better in terms of performance.


Ryan, consider it somewhat tested with no obvious problems so far. Good job!

What needs to be added is an anisotropic filtering scale quality. Currently only “nearest” and “linear” have an effect, and anisotropic reverts to linear. I assume that’s what’s needed for textured quads to be antialiased when they fall between pixels? I only looked at the Direct3D renderer; I guess the equivalent applies to the OpenGL renderers.

While you’re there, could you swap the scale x & y and rotation calculations in the CopyEx methods, pretty please? Currently it applies the global scale before rotating, which I believe is undesired in all cases. It needs to rotate the quad first, then scale.




I could observe this too for the Direct3D renderer. Software, GL and GLES work just fine for me.
I am on Win10 x64; SDL2 is built with Visual Studio from the latest upstream (12421 (144400e4630d)).

Increasing the size of the window does not increase the size of the renderer/viewport (even if I set the viewport manually after the resize event). Decreasing the size of the window decreases the renderer viewport too, but it will stay at that smaller size no matter how big I resize the window afterwards. The window itself resizes; the rest just stays black.


I think someone opened this in Bugzilla; I’ll try to look at it this week.


Someone’s already posted a suggested patch for this on Bugzilla. Give it a try, Ryan:

I just had a quick test of the software renderer and noticed TextureBlendMode, TextureColorMod, and probably TextureAlphaMod are being ignored in the current hg 2.0 tip.

So how about moving color and alpha from uniform to the vertex attributes for opengles2?


@icculus Hi
I have a question about color and batching. Why would you prefer to pass color through a uniform instead of using vertices?
In your code I found

        /* currently this is sent with each vertex, but if we move to
           shaders, we can put this in a uniform here and reduce vertex
           buffer bandwidth */

I haven’t benchmarked it, but my thinking is that generally the color doesn’t change from draw call to draw call–and certainly not between vertices–so moving color to a uniform reduces color data sent to the GPU to 25% of current usage at a minimum. If you never change the draw color (which is a likely scenario if you are just uploading textured quads), it gets set once and never again, so closer to 0% of current usage in that case.

I’ve been reworking SDL 2.0.8 to pass color as a vertex attribute, and in a few years of use I haven’t noticed any performance issues. I also implemented my own batching system for the OpenGL ES2 backend and saw significant performance gains.
But now I have to upgrade to SDL 2.0.12, and the whole scheme is broken. Even if we ignore the color, the newly added batching only half works: the entire vertex buffer is now uploaded to the GPU in one go, but glDrawArrays is still called separately even for draws that share the same texture, which leads to significant FPS drops.
For example, we have implemented a particle system in our engine. Previously, all of its rendering took a single call to glDrawArrays, but now each particle (maybe several thousand per frame) triggers its own glDrawArrays call…
Is it possible in the future to change the logic to transfer the color to the vertex? Or would that break existing programs? And if I do it myself, how can I submit a patch?

Is it possible in the future to change the logic to transfer the color to the vertex?

Oh, wait, I am misremembering, we already do push vertex color into uniforms now in the gles2 renderer, my apologies.

We are unlikely to change this back, but we could probably batch sequential draw calls that use the same texture.

But: if I’m understanding you correctly: you’re trying to use OpenGL on top of an existing SDL 2D renderer, and honestly, I think you take your chances by doing this, as sometimes these renderers change their internal behavior (or sometimes we might suddenly decide the default is Metal instead of OpenGL on macOS or whatever).

I would probably go so far as to recommend you copy the renderer code out of SDL and build on top of it so the behavior doesn’t change, if you absolutely need to mix your own GL calls on top of it. The zlib license allows this.

(If I’m misunderstanding this situation further, I apologize again.)

All I want from the SDL (as I wrote earlier in other topics):

  1. Real batching for difficult situations (for example, the particle system above). To do this, move the color from a uniform to a vertex attribute, since changing uniforms breaks the batch. You also need to batch draws that share the same texture (again, in the particle-system example, one texture produces 1000 draw calls instead of one).
  2. A simple API for drawing a triangle. A 2D triangle.

All this is done quite simply and does not break the idea that SDL is Simple.
As an example, here is what I did (for myself, I only implemented GLES2 and D3D9, but the rest are not difficult to implement either):

typedef struct SDL_Vertex
{
    SDL_FPoint pos;
    SDL_FPoint tex;
    SDL_Color color;
} SDL_Vertex;

extern DECLSPEC int SDLCALL SDL_RenderDrawArray(SDL_Renderer *renderer, SDL_Texture *texture, SDL_Vertex *sdl_vertices, int offset, int count);

And this allows you to do a lot out of the box (without using complex libraries like bgfx or writing your own renderer): SDL_demo.wmv (3.8 MB) (sorry for the quality, but the site has a file size limit of 4096 KB).
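A usage sketch of that proposed call (note: SDL_RenderDrawArray and the struct above are the poster’s own extension, not part of stock SDL; `renderer` and `texture` are assumed to already exist):

```c
/* One textured triangle with per-vertex color, drawn in a single call. */
SDL_Vertex tri[3] = {
    /* pos                 tex             color (RGBA)        */
    { {   0.0f,   0.0f }, { 0.0f, 0.0f }, { 255,   0,   0, 255 } },
    { { 100.0f,   0.0f }, { 1.0f, 0.0f }, {   0, 255,   0, 255 } },
    { {  50.0f, 100.0f }, { 0.5f, 1.0f }, {   0,   0, 255, 255 } },
};
SDL_RenderDrawArray(renderer, texture, tri, 0, 3);
```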


Reading back through this thread, you said in November 2018:

I think it’s unfortunate if the experiment never took place, especially if sending the color as a uniform “breaks the pipeline” as the OP claims.

I guess it did, if I made the change? I honestly don’t remember.

I guess what I’m struggling with here is what “breaks the pipeline” means…I read this as “I call into OpenGL directly in addition to using the 2D render API and relied on how the 2D API interacted with OpenGL” but now I’m not really sure. Can I get some clarification, @rmg.nik?


I mean the rendering pipeline. When glUniform is called, the batch breaks, and thus for the GL/GLES2 backends it is impossible to draw the same texture with different colors many times in one call (unlike D3D9).
That is why I asked why it was implemented through uniforms… In addition, the current implementation of batching draws with GL_TRIANGLE_FAN, which also prevents batching (fans from separate quads can’t be concatenated into a single draw call).
As I understand it, no one is going to redo the current logic, so I have already started replacing SDL with SDL_gpu (or I will consider bgfx, as it will give advantages in the future). Although personally I only need 2D (on Win/Linux/iOS/Android), so I really don’t want to leave SDL…

I certainly hope that’s not the case. So long as the batching can be changed internally without affecting the published SDL2 API, anything which improves performance ought to be on the table. I can see that drawing the same texture multiple times in different colors might be seen as an edge case, but if it can be supported without adversely affecting more common operations I’m all for it.

I haven’t seen the performance improvement from batching that I was hoping for, so maybe the issues you have identified are partly responsible. If it’s the case (and my understanding isn’t good enough to know) that calling SDL_SetTextureColorMod() flushes the batching queue, that could explain it.

Ah, I understand the issue now.

I won’t be able to get to this for 2.0.14, but it’s a good idea for 2.0.16. I might be able to use vertices vs uniforms depending on how much the draw color changes, and do more than one draw per call if the texture hasn’t changed.

It seems to me that you’d need to consult someone who knows the limitations of modern (and not-so-modern) GPU platforms. Most likely, it makes no sense to save bandwidth by passing the color as a uniform.
But if you make all the backends use the same vertex format (XY_UV_RGBA), it will simplify the autobatching mechanism and make it work the same way for both the D3D and GL backends.

I have already changed my rendering to use SDL_gpu, and on my PC this reduced the GPU load by a factor of a little more than two.

Is it in 2.0.15? (I know I could check myself, but it’s easier to ask!).

It is not yet… this whole dev cycle I’ve been consumed with the GitHub migration and other tooling updates. Hoping that changes shortly.
