RenderCopy Performance / SDL_Texture Speed Expectations

pierreofthefrench · June 12, 2017, 11:23pm

SDL 2.0.4
Hi all, I’ve been trying to gauge what performance to expect from SDL and SDL_RenderCopy. I’ve been working on a few engine systems that manipulate screen-sized textures using SDL_RenderCopy and SDL_TEXTUREACCESS_TARGET. This was working extremely well, but as I scaled up (in texture dimensions and count) the performance tanked. The following code simply creates a red SDL_Texture and copies it to the screen 100 times per frame. This results in under 60 fps, which seems extremely low. I would have thought these operations were relatively cheap, since the texture is stored on the GPU and drawing it is just one quad per copy?

I am a bit new to SDL 2.X, but am having a tough time finding similar threads or benchmarks to push me towards a better implementation. Any info would be greatly appreciated.

#include <SDL.h>
#include <stdio.h>

const int SCREEN_WIDTH = 1920 * .6;
const int SCREEN_HEIGHT = 1080 * .6;

SDL_Window* gWindow = NULL;
SDL_Renderer* gRenderer = NULL;

int main( int argc, char* args[] )
{
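	// Bail out early if SDL, the window, or the renderer cannot be created.
	if (SDL_Init(SDL_INIT_VIDEO) != 0) {
		printf("SDL_Init failed: %s\n", SDL_GetError());
		return 1;
	}
	SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, "1");
	gWindow = SDL_CreateWindow("Demo", SDL_WINDOWPOS_UNDEFINED, SDL_WINDOWPOS_UNDEFINED, SCREEN_WIDTH, SCREEN_HEIGHT, SDL_WINDOW_SHOWN);
	if (gWindow == NULL) {
		printf("SDL_CreateWindow failed: %s\n", SDL_GetError());
		return 1;
	}
	gRenderer = SDL_CreateRenderer(gWindow, -1, SDL_RENDERER_ACCELERATED);
	if (gRenderer == NULL) {
		printf("SDL_CreateRenderer failed: %s\n", SDL_GetError());
		return 1;
	}
	SDL_SetRenderDrawColor(gRenderer, 0xFF, 0x0, 0x0, 0xFF);

	// Create a screen-sized render-target texture and clear it to red once.
	SDL_Texture* example_texture = SDL_CreateTexture(gRenderer, SDL_PIXELFORMAT_RGBA8888,
		SDL_TEXTUREACCESS_TARGET, SCREEN_WIDTH, SCREEN_HEIGHT);
	SDL_SetRenderTarget(gRenderer, example_texture);
	SDL_RenderClear(gRenderer);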

	Uint32 startTick = SDL_GetTicks();
	Uint32 countedFrames = 0;

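	bool running = true;
	while (running) {
		// Drain the event queue so the window stays responsive and can be closed.
		SDL_Event event;
		while (SDL_PollEvent(&event)) {
			if (event.type == SDL_QUIT) {
				running = false;
			}
		}
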
		Uint32 ticks = SDL_GetTicks();

		countedFrames++;
		if (countedFrames % 30 == 0) {
			float avgFPS = (int)countedFrames / ((ticks - startTick) / 1000.f);
			printf("fps %f\n", avgFPS);
		}
		if (countedFrames % 150 == 0) {
			printf("resetting fps counter\n");
			startTick = ticks;
			countedFrames = 0;
		}

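		// Switch back to the default render target (the window) and clear it to white.
		SDL_SetRenderDrawColor(gRenderer, 0xFF, 0xFF, 0xFF, 0xFF);
		SDL_SetRenderTarget(gRenderer, NULL);
		SDL_RenderClear(gRenderer);

		// Draw the full texture over the full window 100 times.
		for (int i = 0; i < 100; i++) {
			SDL_RenderCopy(gRenderer, example_texture, NULL, NULL);
		}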

		SDL_RenderPresent(gRenderer);
	}

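	// Release SDL resources once the loop exits.
	SDL_DestroyTexture(example_texture);
	SDL_DestroyRenderer(gRenderer);
	SDL_DestroyWindow(gWindow);
	SDL_Quit();
	return 0;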
}

ChliHug · June 13, 2017, 7:42am

Your example is an efficient way to copy a target texture 100 times to the default render target; there’s no problem with that code. And yes, each copy is most likely drawn as one quad (or two triangles) if you have a hardware-accelerated renderer.

Graphics cards are indeed fairly fast, but you have to consider that you’re asking the GPU to move roughly 300 megabytes per frame (1152 × 648 pixels × 4 bytes × 100 copies). It’s very possible that your system just can’t copy the red pixels any faster because the memory bandwidth and/or texture units are saturated. It’s hard to say without knowing the hardware you’re using.

I tested your example with the screen size changed to 1920x1080 on two machines: a low-end laptop with an integrated Intel card that got around 4 fps (~3.3 GB/s) and a system with a GeForce of the 560 series that got 61 fps (~50 GB/s). I then wrote the same thing with direct OpenGL calls and got a very similar result. SDL has a very slight overhead, which should not really be visible here since the graphics card is the bottleneck.
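
For anyone who wants to check the bandwidth figures for the 1920x1080 test, here is a minimal sketch of the arithmetic. The helper names `bytesPerFrame` and `fillBandwidthGBps` are made up for illustration. It assumes 4 bytes per pixel (as with SDL_PIXELFORMAT_RGBA8888), counts only the pixels written to the render target, and ignores reads of the source texture, so it is a rough lower bound rather than a measurement.

```cpp
#include <cstdint>

// Bytes written to the render target in one frame, assuming every copy
// covers the whole target and each pixel takes bytesPerPixel bytes
// (4 for SDL_PIXELFORMAT_RGBA8888). Reads of the source texture are ignored.
std::uint64_t bytesPerFrame(std::uint64_t width, std::uint64_t height,
                            std::uint64_t copies, std::uint64_t bytesPerPixel = 4)
{
	return width * height * bytesPerPixel * copies;
}

// Effective fill bandwidth in GB/s (1 GB = 1e9 bytes) at a given frame rate.
double fillBandwidthGBps(std::uint64_t frameBytes, double fps)
{
	return static_cast<double>(frameBytes) * fps / 1e9;
}
```

With 100 full-screen copies at 1920x1080, `bytesPerFrame(1920, 1080, 100)` comes to 829,440,000 bytes, so 4 fps works out to about 3.3 GB/s and 61 fps to about 50.6 GB/s, in line with the numbers quoted above.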

pierreofthefrench · June 13, 2017, 1:33pm

Hi ChliHug. Bummer, I was hoping I’d done something non-optimal that could be adjusted. I guess I’d just been thinking about 3D rendering, and how many textures / pixels are mapped to the screen at a given point in time, and assumed it was wildly greater than the screen’s resolution. But since those are relative to the render plane’s distance, I guess it’s probably lower than I thought.

I had been considering trying to rewrite some of it with OpenGL, but it sounds like the results will be similar. I guess my only hope is scaling down the rendering of my lighting system, which certainly improves performance, but lowers edge sharpness (obviously). That, or perhaps limit the number of “lighting source” updates to a low number per frame.
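
One possible shape for the "limit the number of updates per frame" idea is a round-robin budget, sketched below. `LightUpdateScheduler` is a hypothetical helper, not part of SDL, and it assumes lights are identified by index and that it is acceptable for a light's texture to be a few frames stale.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: picks which light sources to re-render this frame.
// Lights are visited round-robin so that every light is eventually refreshed.
class LightUpdateScheduler {
public:
	LightUpdateScheduler(std::size_t lightCount, std::size_t budgetPerFrame)
		: lightCount_(lightCount), budget_(budgetPerFrame), next_(0) {}

	// Returns the indices of the lights to update in the current frame.
	std::vector<std::size_t> nextBatch()
	{
		std::vector<std::size_t> batch;
		// Never update the same light twice in one frame.
		std::size_t count = budget_ < lightCount_ ? budget_ : lightCount_;
		for (std::size_t i = 0; i < count; ++i) {
			batch.push_back(next_);
			next_ = (next_ + 1) % lightCount_;
		}
		return batch;
	}

private:
	std::size_t lightCount_;
	std::size_t budget_;
	std::size_t next_;
};
```

With 5 lights and a budget of 2, successive calls to `nextBatch()` yield lights 0 and 1, then 2 and 3, then 4 and 0, so each light is refreshed at least once every three frames. On the scaling-down option, halving the lighting texture in each dimension cuts the pixels to fill to a quarter, which is where most of the saving would come from.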

I appreciate the detailed response.