SDL_GPU Cycle Difficulties

I’ve been using SDL_GPU for some time now and have been reading Evan’s nice post on cycling. I’m hitting an issue where, if the application runs too quickly, cycling ends up allocating many extra gigabytes of memory. The cycling in question is on a transfer buffer map/unmap used for per-frame sprite batching, on a buffer of about 5 MB.

My understanding is that this can happen when GPU bound, and that a huge cycle chain gets constructed as a result, which is effectively a severe memory leak. Is that right? If so, what’s the recommended strategy for syncing when GPU bound? Is the simplest solution just to call SDL_WaitForGPUIdle each frame? Is there a recommended way to use fences for this situation?

In older APIs I would double- or triple-buffer streamed vertices (sprite batching) and sync on a fence before writing data, but it looks like the cycle design is supposed to remove that need. How should I go about detecting when a particular resource’s cycle chain is growing too long, so I know when to start waiting on the GPU? It seems like I want to specify a cycle-chain capacity limit and have the API wait for an available resource when it’s full. But I’m not sure how I should be thinking about this, and I’m looking for guidance beyond simply calling SDL_WaitForGPUIdle.
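For reference, the older-API pattern I mean, translated to SDL_GPU fences, would look roughly like this (just a sketch; MAX_FRAMES_IN_FLIGHT, frame_fences, and render_frame are names I made up, and the actual sprite-batch recording is elided):

#include <SDL3/SDL.h>

#define MAX_FRAMES_IN_FLIGHT 2

static SDL_GPUFence* frame_fences[MAX_FRAMES_IN_FLIGHT];
static Uint32 frame_index;

void render_frame(SDL_GPUDevice* device)
{
	// Throttle: if this slot's previous submission hasn't finished on the GPU,
	// wait for it before writing into its transfer buffer again.
	if (frame_fences[frame_index]) {
		SDL_WaitForGPUFences(device, true, &frame_fences[frame_index], 1);
		SDL_ReleaseGPUFence(device, frame_fences[frame_index]);
		frame_fences[frame_index] = NULL;
	}

	SDL_GPUCommandBuffer* cmd = SDL_AcquireGPUCommandBuffer(device);

	// ... map/unmap the transfer buffer, record the sprite batch, etc ...

	// Submit with a fence so this slot can be throttled next time around.
	frame_fences[frame_index] = SDL_SubmitGPUCommandBufferAndAcquireFence(cmd);
	frame_index = (frame_index + 1) % MAX_FRAMES_IN_FLIGHT;
}

With vsync off this would cap the CPU at MAX_FRAMES_IN_FLIGHT frames ahead of the GPU instead of letting cycle chains grow unbounded, which is the behavior I’m used to from double/triple buffering.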

Response from Evan so far:

Acquiring the swapchain should block in vsync mode or return NULL in non-vsync. If I had to guess, it sounds like your render loop is submitting commands regardless of the swapchain status, which would cause the CPU to get way ahead of the GPU and consume tons of memory. But we can discuss on the forum.

Indeed, yes, I am just continuing on to the next frame if the swapchain is NULL (vsync is off by default in my case). Would a better approach be to block, or to spin until the swapchain is not NULL? Or maybe some other approach?

For the moment I’m just doing this as a quick workaround:

Uint32 w, h;
SDL_GPUTexture* swapchain_tex;
// Only render when a swapchain texture is actually available this frame.
if (SDL_AcquireGPUSwapchainTexture(app->cmd, app->window, &swapchain_tex, &w, &h) && swapchain_tex) {
	SDL_GPUBlitRegion src = {
		.texture = (SDL_GPUTexture*)cf_texture_handle(cf_canvas_get_target(app->offscreen_canvas)),
		.w = (Uint32)app->canvas_w,
		.h = (Uint32)app->canvas_h,
	};
	SDL_GPUBlitRegion dst = {
		.texture = swapchain_tex,
		.w = w,
		.h = h,
	};
	SDL_GPUBlitInfo blit_info = {
		.source = src,
		.destination = dst,
		.flip_mode = SDL_FLIP_NONE,
		.filter = SDL_GPU_FILTER_NEAREST,
		.cycle = true,
	};
	SDL_BlitGPUTexture(app->cmd, &blit_info);
} else {
	// No swapchain texture: the GPU is way behind, so just drain it completely.
	SDL_WaitForGPUIdle(app->device);
}

The best practice is to acquire the swapchain texture as early as possible in the frame. If it is NULL, just skip your render process and submit immediately. This will minimize the amount of pressure you produce after the GPU is already overloaded with work to do.
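In outline that frame shape would look something like this (a sketch reusing the names from the snippet above; error handling omitted):

SDL_GPUCommandBuffer* cmd = SDL_AcquireGPUCommandBuffer(app->device);

Uint32 w, h;
SDL_GPUTexture* swapchain_tex = NULL;
SDL_AcquireGPUSwapchainTexture(cmd, app->window, &swapchain_tex, &w, &h);

if (!swapchain_tex) {
	// GPU is backed up (or the window can't present right now): skip all
	// rendering work and submit the empty command buffer immediately.
	SDL_SubmitGPUCommandBuffer(cmd);
	return;
}

// ... record copy passes, render passes, and the final blit, then submit ...
SDL_SubmitGPUCommandBuffer(cmd);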

Will that best practice conflict performance-wise with Apple’s best practices when Metal is being used?

Oh well, isn’t that just delightful. I wonder if we should add a DiscardGPUCommandBuffer function that can be called in this situation; that way you could set up all the commands and bail gracefully without giving the GPU pointless work to do if the swapchain isn’t available.

SDL_DiscardGPUCommandBuffer

That sounds pretty good. Something to throw into that else statement in my snippet would probably be easiest for most applications.
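For example, the else branch in my snippet above could become something like this (hypothetical, since the function is only a proposal at this point):

if (SDL_AcquireGPUSwapchainTexture(app->cmd, app->window, &swapchain_tex, &w, &h) && swapchain_tex) {
	// ... blit to the swapchain as before ...
} else {
	// Proposed API (does not exist yet): throw away the recorded commands
	// instead of stalling the whole GPU or submitting pointless work.
	SDL_DiscardGPUCommandBuffer(app->cmd);
}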

Would also be nice to mention this in the docs for SDL_AcquireGPUSwapchainTexture.

The reason Apple recommends not acquiring the drawable until absolutely necessary is that they also recommend using libdispatch semaphores to limit how many frames in flight you have, rather than just trying to grab an image/drawable off the swapchain and doing nothing if none are available.
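For anyone unfamiliar with that pattern, it amounts to roughly the following (a sketch using plain libdispatch; the names here are mine, and in real Metal code the signal lives in the command buffer’s completed handler):

#include <dispatch/dispatch.h>

#define MAX_FRAMES_IN_FLIGHT 3

static dispatch_semaphore_t frame_sem;

void init_frame_pacing(void)
{
	frame_sem = dispatch_semaphore_create(MAX_FRAMES_IN_FLIGHT);
}

void frame(void)
{
	// Block until fewer than MAX_FRAMES_IN_FLIGHT frames are queued on the GPU.
	dispatch_semaphore_wait(frame_sem, DISPATCH_TIME_FOREVER);

	// ... encode the frame, acquiring the drawable as late as possible ...

	// When the GPU finishes this frame, the command buffer's completed
	// handler calls dispatch_semaphore_signal(frame_sem) to free a slot.
}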