A couple of questions regarding batching in SDL 2.0.10


#1

I am kinda new to the concept of batching in computer rendering, so I don’t completely understand how it works yet.

From what I have gathered, with batching you can take multiple draw calls, send them as one batch, and then flush them to have them rendered by the GPU. However, since flushing happens automatically in SDL 2.0.10, when exactly does the flush happen?

What I have heard is that the flushing happens when you render a new texture, so if you render 2 sprites multiple times it is way faster to render A A B B rather than A B A B, since it only has to flush 2 times rather than 4. Is this correct?

Also, if that is how it works, does it also flush after each call when you render multiple clips of the same texture?

I am wondering because I have made a Doom-style 3D engine that calls SDL_RenderCopyEx() for every vertical wall on the screen using the same texture, but when I switched to SDL 2.0.10 I didn’t really seem to get any performance boost at all.


#2

It only flushes when forced to. Specifically, in 2.0.10:

  • When you call SDL_RenderPresent(), it flushes because it needs to get pixels to the screen.
  • When you call SDL_SetRenderTarget(), it flushes because it needs to get pixels to the current render target before we start using a new one.
  • When you call SDL_RenderReadPixels(), because we need it to have the right pixels rendered before we read them. :slight_smile:
  • When you Update/Lock/Bind/Unbind/Destroy a texture that’s been used by the renderer since the last flush, so we get the right texture data when drawing with stuff that’s been batched up.
  • When you call SDL_RenderGetMetal*, so things are in sync.
  • When you call SDL_RenderFlush(), because you’re telling it that you have to flush right now. Don’t call this unless you plan to use the underlying API directly and need things to be in sync (like if you plan to use the render API forced to OpenGL and also call OpenGL directly on that same rendering context).
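
As a rough illustration of the list above, here is a toy model in C of a command queue that only flushes when forced to. It is not SDL's code: ToyRenderer and every toy_* function are made-up names, and the model only counts flushes that actually had draw commands queued.

```c
#include <assert.h>

#define TOY_MAX_TEXTURES 8

/* Toy model only; none of these names exist in SDL. */
typedef struct {
    int queued;                       /* draw commands waiting to be sent */
    int flushes;                      /* flushes that actually sent something */
    int tex_queued[TOY_MAX_TEXTURES]; /* nonzero if a queued command uses texture t */
} ToyRenderer;

/* Send everything queued so far to the (imaginary) GPU. */
void toy_flush(ToyRenderer *r)
{
    if (r->queued == 0) {
        return; /* nothing queued, nothing to count */
    }
    r->flushes++;
    r->queued = 0;
    for (int t = 0; t < TOY_MAX_TEXTURES; t++) {
        r->tex_queued[t] = 0;
    }
}

/* Drawing only queues a command; it never flushes by itself. */
void toy_copy(ToyRenderer *r, int tex)
{
    assert(tex >= 0 && tex < TOY_MAX_TEXTURES);
    r->queued++;
    r->tex_queued[tex] = 1;
}

/* Presenting and switching render targets always force a flush. */
void toy_present(ToyRenderer *r)    { toy_flush(r); }
void toy_set_target(ToyRenderer *r) { toy_flush(r); }

/* Touching a texture forces a flush only if queued commands still use it. */
void toy_update_texture(ToyRenderer *r, int tex)
{
    assert(tex >= 0 && tex < TOY_MAX_TEXTURES);
    if (r->tex_queued[tex]) {
        toy_flush(r);
    }
}

/* One frame: two draws from texture 0, optionally updating it in between. */
int toy_frame_flushes(int update_between_draws)
{
    ToyRenderer r = {0};
    toy_copy(&r, 0);
    if (update_between_draws) {
        toy_update_texture(&r, 0);
    }
    toy_copy(&r, 0);
    toy_present(&r);
    return r.flushes;
}
```

In this model toy_frame_flushes(0) gives 1 flush and toy_frame_flushes(1) gives 2, because the update lands while texture 0 is still referenced by a queued draw.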

The idea with batching, though, is that it should generally work like it always has, just faster. Specifically, rendering A B A B as in your example will not flush at all in 2.0.10, assuming those SDL_Textures haven’t changed in some way between SDL render calls.

That being said, when flushing does happen, it benefits from using the same texture twice in a row, as SDL is now smart enough to only bind the texture once (in GL, Direct3D, etc) and draw from it multiple times, so you’ll see performance increases in this scenario beyond the higher-level “flushing” that SDL does.
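
To make the difference between flushes and texture binds concrete, here is a small sketch in C. It assumes a backend that rebinds only when the texture changes from one queued draw to the next, which is a simplification of the behaviour described above and not SDL's actual code. count_texture_binds is a hypothetical name, and each letter in the string stands for one draw from that texture.

```c
#include <stddef.h>

/* Counts the texture binds needed for one flushed batch, assuming the
 * backend rebinds only when the texture differs from the previous draw.
 * Each character in `draws` names the texture used by one draw call. */
int count_texture_binds(const char *draws)
{
    int binds = 0;
    char bound = '\0'; /* nothing bound yet */

    for (size_t i = 0; draws[i] != '\0'; i++) {
        if (draws[i] != bound) {
            bound = draws[i];
            binds++;
        }
    }
    return binds;
}
```

Under that assumption "AABB" needs 2 binds and "ABAB" needs 4, even though both sequences go out in a single flush.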

The batching code started in 2.0.9, so you’d have to measure against 2.0.8 to see if there was a performance boost. There are a lot of factors that will dictate if this improves performance (not the least of which is: if you have vsync turned on and you were already rendering faster than 60fps, your framerate will never go above 60).

Also, if you’re forcing a renderer, either with the SDL_HINT_RENDER_DRIVER hint or by giving a specific renderer’s index in SDL_CreateRenderer, batching is disabled by default because we don’t know if you’re going to use a specific lower-level API directly and need the pre-batching behaviour from SDL. You can force batching on in this case with SDL_HINT_RENDER_BATCHING.
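
For reference, a configuration fragment showing the hint in use might look like the following. The helper name and the choice of the "opengl" driver are only assumptions for illustration; the hints and SDL_CreateRenderer are the real SDL 2.0.10 API. It needs SDL 2.0.10 or newer to build.

```c
#include <SDL.h> /* some setups need <SDL2/SDL.h> instead */

/* Hypothetical helper: forces the "opengl" driver but keeps batching on. */
SDL_Renderer *create_forced_renderer_with_batching(SDL_Window *window)
{
    /* Forcing a driver is what turns batching off by default in 2.0.10. */
    SDL_SetHint(SDL_HINT_RENDER_DRIVER, "opengl");

    /* "1" asks SDL to batch anyway; set it before creating the renderer. */
    SDL_SetHint(SDL_HINT_RENDER_BATCHING, "1");

    /* Returns NULL on failure; SDL_GetError() has the reason. */
    return SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED);
}
```

If you are not forcing a driver at all, none of this should be needed, since batching is then on by default.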

–ryan.


#3

Wait, sorry, I’m wrong about that: it didn’t make it into a release until 2.0.10.

–ryan.


#4

Thanks for clearing all this up.

Is there any way for me to know exactly when it is flushing a batch, or maybe see how many flushes it does per frame?

Since I am already capping my FPS manually to 60 using a timer, I want to see if there is any other reason why it might be flushing more than it needs to.


#5

Is there any way for me to know exactly when it is flushing a batch,

If you’re willing to build your own SDL from source code, you can uncomment or change the DebugLogRenderCommands function in SDL/src/render/SDL_render.c, which is called once for each flush.

In a perfect world, it flushes once per render target (or once per frame if you don’t use render targets).

–ryan.


#6

If there is the time and the interest, I’d go a bit further and read the code and step through a few frames using a debugger. Look at what happens in the functions, etc. The new batching backend has been fun to read and understand, at least for me :smiley:


#7

I did what you suggested and built SDL with the log uncommented, which gave me this result:

1 joysticks were found.

PS4 Controller
INFO: Render commands to flush:
INFO:  1. set viewport (first=0, rect={(0, 0), 1024x576})
INFO: Render commands to flush:
INFO:  1. set viewport (first=0, rect={(0, 0), 640x360})
INFO:  2. set cliprect (enabled=false, rect={(0, 0), 0x0})
INFO:  3. clear (first=0, r=238, g=238, b=238, a=255)
INFO:  4. set draw color (first=0, r=255, g=255, b=255, a=255)
INFO:  5. copy (first=0, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  6. copy (first=96, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  7. copy (first=192, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  8. copy (first=288, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  9. copy (first=384, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  10. copy (first=480, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  11. copy (first=576, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  12. copy (first=672, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  13. copy (first=768, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  14. copy (first=864, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  15. copy (first=960, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  16. copy (first=1056, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  17. copy (first=1152, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  18. copy (first=1248, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  19. copy (first=1344, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  20. copy (first=1440, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  21. copy (first=1536, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  22. copy (first=1632, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  23. copy (first=1728, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  24. copy (first=1824, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  25. copy (first=1920, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  26. copy (first=2016, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  27. copy (first=2112, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  28. copy (first=2208, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  29. copy (first=2304, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  30. copy (first=2400, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  31. copy (first=2496, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  32. copy (first=2592, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  33. copy (first=2688, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  34. copy (first=2784, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  35. copy (first=2880, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  36. copy (first=2976, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  37. copy (first=3072, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  38. copy (first=3168, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  39. copy (first=3264, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  40. copy (first=3360, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  41. copy (first=3456, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  42. copy (first=3552, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  43. copy (first=3648, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  44. copy (first=3744, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  45. copy (first=3840, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  46. copy (first=3936, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  47. copy (first=4032, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  48. copy (first=4128, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  49. copy (first=4224, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  50. copy (first=4320, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  51. copy (first=4416, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  52. copy (first=4512, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  53. copy (first=4608, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  54. copy (first=4704, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  55. copy (first=4800, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  56. copy (first=4896, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  57. copy (first=4992, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  58. copy (first=5088, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  59. copy (first=5184, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  60. copy (first=5280, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  61. copy (first=5376, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  62. copy (first=5472, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  63. copy (first=5568, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  64. copy (first=5664, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  65. copy (first=5760, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  66. copy (first=5856, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  67. copy (first=5952, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  68. copy (first=6048, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  69. copy (first=6144, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  70. copy (first=6240, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  71. copy (first=6336, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  72. copy (first=6432, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  73. copy (first=6528, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  74. copy (first=6624, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  75. copy (first=6720, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  76. copy (first=6816, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  77. copy (first=6912, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
INFO:  78. copy (first=7008, count=1, r=255, g=255, b=255, a=255, blend=1, tex=2e6f5a0)
etc...
INFO: Render commands to flush:
INFO:  1. set viewport (first=0, rect={(0, 0), 1024x576})
INFO:  2. set cliprect (enabled=false, rect={(0, 0), 0x0})
INFO:  3. set draw color (first=0, r=255, g=255, b=255, a=255)
INFO:  4. copy (first=0, count=1, r=255, g=255, b=255, a=255, blend=0, tex=9e41c0)

It looks like it is only flushing about twice per frame, which makes it even weirder that I get no performance gain from it. I even tried to manually flush after each render call, and my GPU usage still hovered around 23-25% at 60 fps with a 640x360 resolution.
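
For anyone who wants numbers instead of eyeballing the output, here is a minimal sketch in C that counts flushes and copy commands in captured log text. It assumes the log looks like the output above ("Render commands to flush:" once per flush, ". copy (" once per copy command). LogStats and count_render_log are hypothetical names, not part of SDL.

```c
#include <string.h>

typedef struct {
    int flushes; /* number of "Render commands to flush:" headers */
    int copies;  /* number of copy commands across all flushes */
} LogStats;

/* Counts non-overlapping occurrences of `needle` in `text`. */
static int count_occurrences(const char *text, const char *needle)
{
    int n = 0;
    size_t step = strlen(needle);

    for (const char *p = strstr(text, needle); p != NULL;
         p = strstr(p + step, needle)) {
        n++;
    }
    return n;
}

/* Substring matching is used on purpose, because other program output
 * can end up fused onto the start of a log line. */
LogStats count_render_log(const char *log_text)
{
    LogStats s;
    s.flushes = count_occurrences(log_text, "Render commands to flush:");
    s.copies  = count_occurrences(log_text, ". copy (");
    return s;
}
```

Feeding it the text captured for one frame gives the flush count for that frame, which can then be compared against the "once per render target" ideal.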


#8

@icculus Why is color a uniform for OpenGLES2? It breaks batching if I want to get something like this: http://particle2dx.com/


#9

I don’t know what exactly you are doing, but in general batching doesn’t change much about the work the GPU has to do in the end: the same pixels need to be set, etc.
Considering you cap the framerate at 60 and don’t go anywhere near the limit of the hardware, I wouldn’t expect much of a drop or increase in usage. Try removing the framerate limiter and see whether the resulting framerate improves, rather than what effect it has on GPU usage.


#10

Thanks for the tip

I tried taking off the frame rate cap and checked again. At 640x360 resolution I was able to get about 180 fps without batching and 220 fps with batching, so the difference is definitely there.
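
Frame rates are easier to compare as time per frame. A small sketch in C of that conversion, using the two numbers above (the function names are made up for illustration):

```c
/* Average time spent on one frame, in milliseconds. */
double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}

/* Milliseconds saved per frame when going from fps_before to fps_after. */
double saved_ms_per_frame(double fps_before, double fps_after)
{
    return frame_time_ms(fps_before) - frame_time_ms(fps_after);
}
```

With 180 fps and 220 fps this works out to roughly 5.6 ms versus 4.5 ms per frame, so batching saves about 1 ms per frame here, against a budget of about 16.7 ms per frame at 60 fps.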

Though, when I was running it at my laptop’s native screen resolution, which is 1600x900, it wasn’t even able to keep a stable 60, which is why I am a little disappointed. My laptop’s GPU is an AMD Radeon R7 M260X. I really just wanted it to hit a stable 60 at my laptop’s native screen resolution, but that is probably just too much to ask for.
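
As a very rough sanity check on those numbers, here is a sketch in C that compares the pixel counts of the two resolutions. It rests on a big assumption that has not been measured here: that the frame time is dominated by per-pixel work and scales with the number of pixels. The function names are made up.

```c
/* Number of pixels in a w x h frame. */
long pixel_count(int w, int h)
{
    return (long)w * (long)h;
}

/* How many times more pixels the second resolution has than the first. */
double pixel_ratio(int w1, int h1, int w2, int h2)
{
    return (double)pixel_count(w2, h2) / (double)pixel_count(w1, h1);
}

/* Estimated fps at the second resolution if the cost were purely per-pixel. */
double scaled_fps_estimate(double fps, int w1, int h1, int w2, int h2)
{
    return fps / pixel_ratio(w1, h1, w2, h2);
}
```

1600x900 has 6.25 times the pixels of 640x360, so under that assumption 220 fps would drop to about 35 fps. The real number depends on where the time actually goes, so treat this only as a hint that the higher resolution, rather than batching, may be the limit.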