Someone mentioned I should render with SDL_RenderGeometry instead of SDL_RenderCopy. So I wrote a quick test this afternoon and saw that SDL_RenderCopy is (a little) faster. Did I implement something wrong?
geometry_test.cpp (2.8 KB)
On my system, a 2018 Mac Mini with Intel 630 GPU, I see an average of 608 FPS with SDL_RenderCopy() and an average of 726 FPS with SDL_RenderGeometry().
I also changed your code to measure time with SDL_GetPerformanceCounter() instead of event time (which AFAIK isn’t really meant for this and is much less precise), and compiled it with -O3 instead of -O2.
I also moved the SDL_Vertex stuff out of the main loop, since you probably won’t be recomputing the whole thing every frame, but that only gives a small boost.
geometry_test.cpp (3.1 KB)
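For anyone who wants to reproduce the timing change without opening the attachment, here is a minimal sketch of turning performance-counter ticks into an average FPS figure. The helper name average_fps is made up for illustration and is not taken from the attached file. The SDL calls appear only in comments, so the snippet compiles without SDL headers.

```cpp
#include <cstdint>

// Hypothetical helper (not from the attachment): converts a span of
// high-resolution counter ticks into an average frames-per-second value.
// `frequency` is the number of ticks per second, i.e. the value that
// SDL_GetPerformanceFrequency() reports.
double average_fps(std::uint64_t start_ticks, std::uint64_t end_ticks,
                   std::uint64_t frequency, std::uint64_t frames)
{
    // Guard against an empty or inverted interval and a zero frequency.
    if (end_ticks <= start_ticks || frequency == 0) {
        return 0.0;
    }
    const double seconds = static_cast<double>(end_ticks - start_ticks) /
                           static_cast<double>(frequency);
    return static_cast<double>(frames) / seconds;
}

// Intended use inside an SDL program (shown as comments only):
//   Uint64 start = SDL_GetPerformanceCounter();
//   ... render `frames` frames ...
//   Uint64 end = SDL_GetPerformanceCounter();
//   double fps = average_fps(start, end, SDL_GetPerformanceFrequency(), frames);
```

Measuring over many frames and dividing once, rather than timing each frame, keeps the counter overhead out of the result.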
useGeometry was faster for you? My hardware gets 6.5K FPS with geometry, 7.5K with textures. I’ll test again when SDL3 comes out. I think this test gives a good enough reason why I should stick to RenderCopy for now.
Have you tested with SDL_RenderGeometryRaw? SDL_RenderGeometry has some overhead converting the SDL_Vertex data I think, and it winds up calling SDL_RenderGeometryRaw anyway.
It just recasts it to bare floats etc, no conversions happen (at least in SDL3; I didn’t look at SDL2 but I assume it’s the same).
I’ll try it if you want to adjust my cpp file, but I’d rather just wait for SDL3 before looking into it again.
That’s a bad way to test performance, as you are running into irrelevant bottlenecks here. You should test how many sprites you can render at 30 or 60 FPS, not how many FPS you can pump out with a small set of triangles. Also make sure you are not bound by fill rate: just choose small sprites and don’t overdraw them. On a powerful desktop machine, you should get about 200 000+ sprites with ease with RenderGeometry(), while with RenderCopy() it may be only 30 000, a factor of 5-10 lower.
I should measure that too. At first glance (SDL2) it looked like RenderCopy was faster, but that was because there were sprites off screen that it culled. With everything visible, geometry is faster. I’ll try again when SDL3 becomes stable.
geometry_testv2.cpp (2.8 KB)
The reason is this: a single RenderCopy() call is much faster than a single RenderGeometry() call. What is expensive is the communication between the CPU and GPU. But with RenderGeometry(), this communication happens only once per batch of sprites, because you only call it once per batch. So you should really test how many sprites you can push per batch at 60 FPS. This number will be 5-10 times higher than with RenderCopy().
Conversely, if you call RenderGeometry() for each sprite (or just a few sprites), then RenderCopy() will be much faster. So there is this trade-off. For that reason, the RenderGeometry() sprite batch should be as big as possible, not just a few dozen sprites.
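To make the batching idea concrete, here is a rough sketch of collecting many sprite quads into one vertex array and one index array, so they can all go out in a single call. The types Vertex, FPoint and Color are stand-ins that mirror the field layout of SDL2's SDL_Vertex so the snippet builds without SDL. SpriteBatch and add_quad are hypothetical names. In a real program you would fill SDL_Vertex directly.

```cpp
#include <cstdint>
#include <vector>

// Stand-ins mirroring SDL2's SDL_FPoint / SDL_Color / SDL_Vertex field order
// (assumption for illustration; use the real SDL types in actual code).
struct FPoint { float x, y; };
struct Color  { std::uint8_t r, g, b, a; };
struct Vertex { FPoint position; Color color; FPoint tex_coord; };

// Hypothetical batch container: one growing vertex list plus an index list.
struct SpriteBatch {
    std::vector<Vertex> vertices;
    std::vector<int>    indices;

    // Append one axis-aligned quad (two triangles) covering the full texture.
    void add_quad(float x, float y, float w, float h, Color c) {
        const int base = static_cast<int>(vertices.size());
        vertices.push_back({{x,     y},     c, {0.0f, 0.0f}}); // top-left
        vertices.push_back({{x + w, y},     c, {1.0f, 0.0f}}); // top-right
        vertices.push_back({{x + w, y + h}, c, {1.0f, 1.0f}}); // bottom-right
        vertices.push_back({{x,     y + h}, c, {0.0f, 1.0f}}); // bottom-left
        const int quad[6] = {0, 1, 2, 0, 2, 3};
        for (int offset : quad) {
            indices.push_back(base + offset);
        }
    }

    void clear() {
        vertices.clear();
        indices.clear();
    }
};

// With real SDL_Vertex data, the whole batch would then be submitted once:
//   SDL_RenderGeometry(renderer, texture,
//                      verts.data(), (int)verts.size(),
//                      indices.data(), (int)indices.size());
```

Four vertices and six indices per sprite means 200 000 sprites is 800 000 vertices in one submission, which is the regime where the single call pays off.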
Well, you could just look at the source. You’ll see I only do one SDL_RenderGeometry call per frame.
In my real code I’ll need different colors for the text I render. I suppose I should reserve a chunk of memory and call SDL_RenderGeometry once per color?
If you set the right modulation mode on the texture, you can draw it all in one call. Make the letters in the texture white, and then change the color in SDL_Vertex.
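A small sketch of what that could look like for text: every glyph quad carries its own color in its vertices, so differently colored strings can share one batch and one draw call. All names here (TextVertex, TextBatch, GlyphRect, add_glyph) are hypothetical, the structs only mirror the SDL2 SDL_Vertex layout so the snippet compiles without SDL, and the texture coordinates are invented example values for a white glyph atlas.

```cpp
#include <cstdint>
#include <vector>

// Stand-ins mirroring the SDL2 SDL_Vertex layout (assumption for illustration).
struct TextPoint  { float x, y; };
struct TextColor  { std::uint8_t r, g, b, a; };
struct TextVertex { TextPoint position; TextColor color; TextPoint tex_coord; };

// Normalized texture rectangle of one glyph inside the (white) atlas.
struct GlyphRect { float u0, v0, u1, v1; };

struct TextBatch {
    std::vector<TextVertex> vertices;
    std::vector<int>        indices;

    // Append one glyph quad; the color is stored per vertex, so each glyph
    // can be tinted differently without starting a new batch.
    void add_glyph(float x, float y, float w, float h,
                   GlyphRect uv, TextColor color) {
        const int base = static_cast<int>(vertices.size());
        vertices.push_back({{x,     y},     color, {uv.u0, uv.v0}});
        vertices.push_back({{x + w, y},     color, {uv.u1, uv.v0}});
        vertices.push_back({{x + w, y + h}, color, {uv.u1, uv.v1}});
        vertices.push_back({{x,     y + h}, color, {uv.u0, uv.v1}});
        const int quad[6] = {0, 1, 2, 0, 2, 3};
        for (int offset : quad) {
            indices.push_back(base + offset);
        }
    }
};

// Usage idea: a red glyph followed by a blue one, still a single batch that
// would be passed to one SDL_RenderGeometry call with the atlas texture.
//   TextBatch text;
//   text.add_glyph(0, 0, 8, 16, GlyphRect{0.0f, 0.0f, 0.5f, 1.0f}, TextColor{255, 0, 0, 255});
//   text.add_glyph(8, 0, 8, 16, GlyphRect{0.5f, 0.0f, 1.0f, 1.0f}, TextColor{0, 0, 255, 255});
```

Because the glyphs in the texture are white, the per-vertex color is what ends up on screen, which avoids the one-call-per-color approach asked about above.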
Huh, I completely forgot about that when I wrote the test. It would be much more accurate if I tested multiple colors and used that
I’m going to switch to SDL_RenderGeometry (after SDL3 becomes more stable, window startup is still jarring)