SDL2 render batching -> PLEASE help ... I'm going crazy because I don't understand what I'm doing wrong

Hello SDL2 community!
Okay, first time posting anything at any forum in general, so please forgive me if I did something wrong…

I’m generally not the type to ask others for help, because I always try to solve my problems myself, BUT I ran out of ideas and the documentation isn’t helping anymore.
I already read through every post I could find on render batching / 2D drawing with SDL2 that had something to do with my problem.

My problem - short version: (if someone wants more info -> just ask :slight_smile: I’m really trying to keep this post short )
I’m loading a .png with SDL_image and drawing the png to a window (in a loop, and every frame I draw x|y + 1 more – just for testing right now).
I set SDL_SetHint(SDL_HINT_RENDER_BATCHING, "1"), but it’s still too slow?
Well I guess I’m doing something wrong. / My thought process is wrong.
(in my real code I’m drawing multiple layers of 2D pictures, but when I noticed it worked MUCH too slow, I started reading about render batching and such. Until then I didn’t know anything about render batching)

From what I read about render batching, I gather that it waits until it is told to transmit information to the GPU, and then it just tells the GPU what to draw where in one giant ‘list’. So I thought that, because I’m changing neither the source texture nor the destination window, it should work extremely fast.
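As a rough illustration of that mental model, here is a toy sketch of a renderer that only queues draw calls and submits them together when the frame is presented. It is a conceptual model under stated assumptions, not SDL's actual implementation: `ToyBatcher`, `DrawCmd`, `submissionsForGrid` and the integer texture IDs are hypothetical names, and counting one "submission" per run of consecutive draws that share a texture is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one queued draw: which texture, and where.
struct DrawCmd {
    int textureId;
    int x, y, w, h;
};

// Toy model of a batching renderer (NOT SDL's real implementation):
// draw calls only append to a queue; nothing is "sent" until present().
class ToyBatcher {
public:
    void copy(int textureId, int x, int y, int w, int h) {
        assert(w > 0 && h > 0);  // a destination rectangle needs a size
        queue_.push_back({textureId, x, y, w, h});
    }

    // Walks the queue once, counts one submission per run of consecutive
    // commands that share a texture (assumption of this sketch), then
    // clears the queue for the next frame.
    std::size_t present() {
        std::size_t submissions = 0;
        for (std::size_t i = 0; i < queue_.size(); ++i) {
            if (i == 0 || queue_[i].textureId != queue_[i - 1].textureId) {
                ++submissions;
            }
        }
        queue_.clear();
        return submissions;
    }

    std::size_t queued() const { return queue_.size(); }

private:
    std::vector<DrawCmd> queue_;
};

// Queues count*count copies of a single texture, like the test loop in
// the post, and returns how many submissions the toy model needs.
std::size_t submissionsForGrid(int count) {
    ToyBatcher batcher;
    for (int x = 0; x < count; ++x) {
        for (int y = 0; y < count; ++y) {
            batcher.copy(/*textureId=*/1, x, y, 50, 50);
        }
    }
    return batcher.present();
}
```

In this model, calling `submissionsForGrid(140)` queues 19600 copies but needs only one submission, because the texture never changes. The per-copy work of filling the queue still happens on the CPU for every sprite, which is one reason batching does not make the draw count unlimited.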
But it seems like I got something wrong there.

My test code: (to keep the test code short I just assume that initialization works)

#include <iostream>
#include <SDL.h>
#include <SDL_image.h>

int main(int argc, char* argv[])
{
	SDL_Window* window = NULL;
	SDL_Surface* screenSurface = NULL;
	SDL_Texture* graphic = NULL;
	SDL_Renderer* gRenderer = NULL;

	window = SDL_CreateWindow("test_render_batching", SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED, 1000, 500, SDL_WINDOW_SHOWN);
	screenSurface = SDL_GetWindowSurface(window);
	SDL_FillRect(screenSurface, NULL, SDL_MapRGB(screenSurface->format, 0xFF, 0xFF, 0xFF));

	gRenderer = SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED);
	SDL_SetRenderDrawColor(gRenderer, 0xFF, 0xFF, 0xFF, 0xFF);

	SDL_Surface* tempSurface = NULL;
	tempSurface = IMG_Load("stick.png");
	graphic = SDL_CreateTextureFromSurface(gRenderer, tempSurface);

	int count = 0;
	Uint32 start_time = 0;
	Uint32 stop_time = 0;
	float offset = 0.0f;
	while (true)
	{
		start_time = SDL_GetTicks();
		for (int x = 0; x < count; x++)
		{
			for (int y = 0; y < count; y++)
			{
				SDL_Rect dstrect;
				dstrect.w = 50;
				dstrect.h = 50;
				dstrect.x = x;
				dstrect.y = y;
				SDL_RenderCopyEx(gRenderer, graphic, NULL, &dstrect, 0, 0, SDL_FLIP_NONE);
			}
		}
		stop_time = SDL_GetTicks();
		offset = (1000.0f/60) - (stop_time - start_time);
		if (offset > 0)
		{
			char e;
			std::cout << "amount = " << count << "\n";
			std::cin >> e;
		}
	}

	return 0;
}

The .png to be loaded for this test: (did I do that right with copying the .png?)
(png should be transparent except for the stick)

I would REALLY appreciate it if someone could please sacrifice some time to help my tortured soul lol.
Please, I’m suffering. I really want to know what I got wrong, because I’ve been trying to fix it for at least 1.5 months now…

Best regards


Read the doc for SDL_GetWindowSurface:

You may not combine this with 3D or the rendering API on this window.

So maybe this is the problem. To clear the screen, simply use SDL_RenderClear

Hey Sanette,
first of all thank you for your quick reply! :slight_smile:
I will look into that as soon as I get home after work tomorrow

Okay, so I tested it again and it looks like that doesn’t make a difference (maybe because I use it before actually creating the renderer).
Code change:

window = SDL_CreateWindow("test_render_batching", SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED, 1000, 500, SDL_WINDOW_SHOWN);
screenSurface = SDL_GetWindowSurface(window);
SDL_FillRect(screenSurface, NULL, SDL_MapRGB(screenSurface->format, 0xFF, 0xFF, 0xFF));

gRenderer = SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED);
SDL_SetRenderDrawColor(gRenderer, 0xFF, 0xFF, 0xFF, 0xFF);

changed to this:

window = SDL_CreateWindow("test_render_batching", SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED, 1000, 500, SDL_WINDOW_SHOWN);
gRenderer = SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED);
SDL_SetRenderDrawColor(gRenderer, 0xFF, 0xFF, 0xFF, 0xFF);

Is a result count of ~140 normal?

What hardware and OS is this?

Have you tried profiling it? If you’re using VS, then run a profile session and it’ll tell you where it’s spending the time.

If I understand correctly, a count of 140 means that you render 140*140 = 19600 images per frame of 1/60 sec, this sounds good to me.
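The arithmetic behind that estimate can be written out as a small sketch. The helper names `spritesPerFrame` and `microsecondsPerSprite` are made up for illustration, and the per-sprite figure assumes the whole 1/60 s frame is spent on drawing.

```cpp
#include <cassert>

// Number of sprites drawn per frame when the test loop runs count x count copies.
long spritesPerFrame(long count) {
    return count * count;
}

// Average time available per sprite in microseconds at a given frame rate,
// assuming the full frame time is spent on drawing.
double microsecondsPerSprite(long count, double fps) {
    assert(count > 0 && fps > 0.0);
    const double frameMicroseconds = 1000000.0 / fps;
    return frameMicroseconds / static_cast<double>(spritesPerFrame(count));
}
```

For count = 140 this gives 19600 sprites per frame and a budget of roughly 0.85 microseconds per sprite at 60 FPS, which is about 1.18 million sprite copies per second.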

OS: Windows 10
AM3+ 4 GHz CPU (I think “AMD FX 8350 Octa-Core”?)
32 GB RAM
8 GB AMD Sapphire 390X graphics card

Do you mean setting break points in VS and stepping through with “F10”?
If you don’t mean that, then I have no idea what you mean haha

Okay good to know that that’s a good result haha

Would using SDL for the window and OpenGL for drawing be much faster? Or would that only make a small difference, if any at all?

No I mean in VS (I’m using 2019) choosing Debug -> Performance Profiler… -> CPU Usage and then Start, which will then run your program but profile it at the same time.

When you’ve run it enough, you stop the profile session and it will tell you which functions, including SDL ones, it has spent its time in. This may give you a hint or two as to what is causing your slowdown.

EDIT: but as stated by Sanette, maybe your results are reasonable in any case.

Thanks I will try that as soon as I get home :+1:

If I get a chance at the weekend, I’ll run your code and see what sort of performance I get.

Okay so right now I’m learning how to use the profiler so thanks for pointing that out to me!

So I thought about it the whole day, and I think I’m going to change the renderer to set up some 1000x1000 textures before I start the main loop, and I’m going to generate new ones when I need them.
I tested the program and it gives a count of ~50 -> that’s more than enough for now.
And generating new ones shouldn’t be a problem speed-wise.

So thanks for your help!!
See you again when I have another problem (hopefully not haha)

Before I go:
how much faster would OpenGL rendering combined with an SDL window be?

The only answer is to write your test!
My impression is that it should not improve a lot.

Yeah when I find the time I will test it, but for now that seems good enough.
Thanks again

First of all, what do you mean by Too Slow?

Next, don’t mix integer and floating point math. When you do 1000.0f/60, the compiler converts 60 to a float, but it’s better to write 1000.0f/60.0f explicitly.
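A small sketch of the difference, using hypothetical helper names (`frameBudgetIntMs`, `frameBudgetMixedMs`, `frameBudgetFloatMs`) that are not part of the posted code. The real trap is the all-integer case, where the fractional part is silently dropped.

```cpp
#include <cassert>

// Integer division: both operands are int, so the fractional part is dropped.
int frameBudgetIntMs(int fps) {
    assert(fps > 0);
    return 1000 / fps;
}

// Mixed operands: fps is converted to float before the division happens.
float frameBudgetMixedMs(int fps) {
    assert(fps > 0);
    return 1000.0f / fps;
}

// Both operands are float: same value as the mixed version, intent spelled out.
float frameBudgetFloatMs(float fps) {
    assert(fps > 0.0f);
    return 1000.0f / fps;
}
```

With fps = 60 the integer version yields 16, while both floating point versions yield about 16.667, so a limiter built on the integer result would run slightly faster than 60 FPS.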

Also, why are you doing your own vsync? Your method can still result in tearing. Create the renderer with vsync enabled:

gRenderer = SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED | SDL_RENDERER_PRESENTVSYNC);

Then it will be locked to the monitor’s refresh rate by the driver, and will work for monitors with refresh rates other than 60 FPS.

Furthermore, if you insist on rolling your own vsync, SDL_Delay() isn’t fine grained enough to get you a steady 60FPS. SDL guarantees it will delay for at least however long you ask, but can (and probably will) delay for slightly longer. Look up SDL’s high resolution counter docs (SDL_GetPerformanceCounter() and SDL_GetPerformanceFrequency() ).

That’s also a weird way to do vsync. You’re resetting offset every frame. Better IMHO to do an accumulator:

double offset = 0.0;
const double perfFrequency = (double)SDL_GetPerformanceFrequency();    // counter ticks per second
// at start of frame
uint64_t frameStartPerfCounter = SDL_GetPerformanceCounter();
double start_time = frameStartPerfCounter / perfFrequency;    // seconds
// at the end of the frame
uint64_t frameEndPerfCounter = SDL_GetPerformanceCounter();
double end_time = frameEndPerfCounter / perfFrequency;    // seconds
offset += (end_time - start_time);
if(offset < FRAME_LOCK_TIME) {    // assume FRAME_LOCK_TIME is a double set to 1.0/60.0
    // frame finished early: wait in SDL_Delay(1) steps, timed with the SDL performance counter
    uint64_t performanceAccumulator = 0;
    // FRAME_LOCK_TIME_PERF = SDL_GetPerformanceFrequency() / 60 to get 60 FPS max
    while(frameEndPerfCounter - frameStartPerfCounter + performanceAccumulator < FRAME_LOCK_TIME_PERF) {
        uint64_t delayStart = SDL_GetPerformanceCounter();
        SDL_Delay(1);    // asks for 1 ms, but may sleep longer
        uint64_t delayEnd = SDL_GetPerformanceCounter();
        performanceAccumulator += delayEnd - delayStart;    // increase by how long we actually waited
    }
}
offset = 0.0;

(untested, but similar to a framerate limiter that I used successfully)

Lastly, printing stuff to the console can be really, really slow. It’s entirely possible your performance problem is because you’re writing to the console every frame.

I had wrongly thought that count=140 would be a low result, because I’m still learning and I read everywhere that the GPU is much faster at rendering than the CPU (in hindsight I should have known better, because if I set it to software rendering it gives me count~20).

Thanks for the integer/float problem, I’ll fix that from now on

I read some SDL2 tutorials where that was done, so I kept it thinking that it should be this way, but can’t find the one where I saw that anymore.
Didn’t even think about using vsync in the render
From now on I think I will

Okay I think I understand what your code does.

Yeah I already noticed that console - slowdown stuff
I only write to console at the end of the program to easily read the result count
But yes I know that at least😂

I’m impressed by how fast the SDL community is at helping people (me) and their (my silly) problems haha

I’ll put all of that in the program once I’m on the PC.

Nearly 20k images per frame at 60 FPS is pretty good. And yep, the GPU is much faster than the CPU.

Re: writing to the console, I misread your code and thought you were writing to the console every frame.

The code I posted works by seeing how long it took to draw the frame, and then delaying by some amount until it exceeds a predetermined time limit. SDL_Delay(1) tries to delay for 1 millisecond, but will probably delay for a bit longer, so it uses the high resolution counter to find out the actual length of time we were delayed for, and keeps adding it up until it’s more than 16.6666… milliseconds.
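The same idea can be sketched without SDL so that it compiles on its own. This version swaps in std::chrono::steady_clock for the SDL performance counter and std::this_thread::sleep_for for SDL_Delay(1); that substitution and the names `Clock` and `limitFrame` are assumptions made for illustration, not the code that was posted.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Given how long the frame's work took, sleeps in 1 ms requests until the
// total reaches frameLength. Each sleep is timed, because a 1 ms request may
// take longer than 1 ms. Returns work time plus the measured waiting time.
Clock::duration limitFrame(Clock::duration workTime, Clock::duration frameLength) {
    assert(frameLength > Clock::duration::zero());
    Clock::duration total = workTime;
    while (total < frameLength) {
        const Clock::time_point delayStart = Clock::now();
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        total += Clock::now() - delayStart;  // add how long we actually waited
    }
    return total;
}
```

If the frame's work already took longer than the frame length, the loop never runs and the function returns the work time unchanged. Otherwise the returned total is at least the frame length and usually overshoots it slightly, which is the jitter that real vsync avoids.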

It would be better to just use the real vsync support built into SDL.

What actually is the difference between SDL_Delay() and std::this_thread::sleep_for()?
If I understand it right then both “stop” the current thread for an amount of time?

SDL_Delay() predates std::chrono by quite a bit. std::this_thread::sleep_for (which takes a std::chrono duration) essentially makes the same promises that SDL_Delay() does: your game will sleep for at least as long as requested, but will almost certainly sleep a little longer (and possibly a lot longer).

std::this_thread::sleep_for doesn’t specify any time granularity that I’m aware of, which is my main concern with using it.

As an aside, seeing a talk where a guy who worked on a specific implementation of std::chrono scoffed at the very existence of std::chrono::high_resolution_clock didn’t fill me with confidence that he understood why it exists and why an explicitly high resolution clock is needed (std::chrono::steady_clock and std::chrono::system_clock may use the same high res counter as ::high_resolution_clock, but aren’t guaranteed to do so).
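To see what a given standard library actually provides, the clocks can be queried directly. The sketch below is only a probe with hypothetical helper names (`tickNanoseconds`, `observedStep`, and the three `...IsSteady` constants). The declared tick period is what the clock's type can represent, which is not necessarily how often the clock really updates, and the standard allows std::chrono::high_resolution_clock to be an alias for one of the other two clocks.

```cpp
#include <cassert>
#include <chrono>

// Declared tick length of a clock in nanoseconds, taken from its period type.
template <typename ClockT>
constexpr double tickNanoseconds() {
    return 1e9 * static_cast<double>(ClockT::period::num)
               / static_cast<double>(ClockT::period::den);
}

// steady_clock is required to be monotonic; the other two may or may not be.
constexpr bool steadyIsSteady = std::chrono::steady_clock::is_steady;
constexpr bool highResIsSteady = std::chrono::high_resolution_clock::is_steady;
constexpr bool systemIsSteady = std::chrono::system_clock::is_steady;

// Observed step: spins until steady_clock reports a new value and returns
// the difference between the two readings.
std::chrono::nanoseconds observedStep() {
    using Clock = std::chrono::steady_clock;
    const Clock::time_point t0 = Clock::now();
    Clock::time_point t1 = Clock::now();
    while (t1 == t0) {
        t1 = Clock::now();
    }
    assert(t1 > t0);  // steady_clock never goes backwards
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0);
}
```

Comparing `tickNanoseconds<std::chrono::steady_clock>()` with `observedStep()` on the target platform gives a rough idea of whether the clock is fine-grained enough for frame timing.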

Ah okay thanks for clarifying that