SDL3 GPU Cycling

VUNDER · April 4, 2025, 8:10am

Cycling just baffles me. I have written an intermediate layer between SDL3 GPU and my application that lets the code push render passes with targets attached and then end them and start new ones, like SDL. However, when on earth do I cycle? The cycling API seems like a good idea, but how does it actually help? I have read the docs but it still confuses me.

I have two passes: clear and draw test. The clear pass uses the back buffer and depth buffer, and clears depth to 1.0f and the back buffer to black.
The draw pass draws indexed with about 100 pipeline states and different materials.
Now there is a dependency on both the back buffer and depth buffer between these two passes. I’ve tried cycling targets on only the first use in the frame (no effect on the back buffer because it’s not cyclable; maybe that is the problem?). I’ve tried cycling on every single call on every target. The screen either shows the mesh for 1 frame and then it disappears, or is always black. With only 1 pass it works.

Can somebody please explain when to cycle? Furthermore, is there a way to just inject a GPU fence into the command buffer, like a barrier, so the GPU cannot reorder commands across it? A fence between the two passes should fix it, since Metal (current backend) and most drivers can rearrange commands on the GPU, and that’s very likely what’s happening. Sometimes the mesh stays in frame, but when I move the camera there’s loads of tearing in the mesh and it looks like memory access is not being done correctly.

The issue with cycling is, well, I NEED the updated cleared resource from that first pass in the second pass, so why on earth would I cycle and create a new one??? That just defeats the point of anything, right? That new one is going to have an empty depth buffer so I will get nothing. Have I understood something wrong? No combination of cycling has worked. I want explicit fences between the passes. Is that possible without needing multiple command buffers? With those I have to implement my own resource sync between passes, and most passes have dependencies. The issue is with the depth buffer, by the way. I know it, because the back buffer works and this works without depth testing. As stated, the back buffer works because it can’t be cycled anyway since it’s a swapchain texture. The only way to make this work at the moment is multiple command buffers with fences between executes, and me having to manually group passes which can be executed in parallel on the same command buffer, as they don’t write to any targets and only read, for example.

sjr · April 4, 2025, 8:58am

You definitely do not need a separate pass just for clearing the screen and depth buffer. For your rendering pass, set the color target’s clear color to whatever and set the load_op to SDL_GPU_LOADOP_CLEAR. Set the rendering pass’ depth target’s clear_depth to 1.0f, and set its load_op to SDL_GPU_LOADOP_CLEAR:

/* Color target: clear the swapchain texture to black when the pass begins. */
SDL_GPUColorTargetInfo colorTargetInfo = { 0 };
colorTargetInfo.texture = swapchainTexture;
colorTargetInfo.clear_color = (SDL_FColor){ 0.0f, 0.0f, 0.0f, 1.0f };
colorTargetInfo.load_op = SDL_GPU_LOADOP_CLEAR;
colorTargetInfo.store_op = SDL_GPU_STOREOP_STORE;   /* keep the result so it can be presented */

/* Depth target: clear to 1.0f; the contents are not needed once the pass ends. */
SDL_GPUDepthStencilTargetInfo depthTargetInfo = { 0 };
depthTargetInfo.clear_depth = 1.0f;
depthTargetInfo.load_op = SDL_GPU_LOADOP_CLEAR;
depthTargetInfo.store_op = SDL_GPU_STOREOP_DONT_CARE;
depthTargetInfo.stencil_load_op = SDL_GPU_LOADOP_DONT_CARE;
depthTargetInfo.stencil_store_op = SDL_GPU_STOREOP_DONT_CARE;
depthTargetInfo.texture = depthTex;

/* One color target plus the depth target, both cleared as the pass begins. */
SDL_GPURenderPass *renderPass = SDL_BeginGPURenderPass(cmdBuf, &colorTargetInfo, 1, &depthTargetInfo);

You don’t need to cycle the depth texture anyway (probably) because the GPU is only actually rendering one frame at a time.

You use cycling when you want to be able to update a buffer or whatever that may currently be in use by the GPU. Like, the CPU prepares a command buffer for frame 1 and then submits it. While the GPU works on frame 1, the CPU begins preparing frame 2. How do you handle a situation where, say, the CPU needs to update a vertex buffer while working on frame 2? You can either make the CPU sit and wait while the GPU finishes frame 1 and then change the buffer’s contents, you can carefully keep track of which parts of the buffer the GPU may be using and only update other parts, or you can keep a pool of buffers and cycle through them each frame.

The last option is what SDL GPU’s resource cycling does. Your application sees one vertex buffer, but behind the scenes SDL GPU may be keeping 2 or 3 (if you’re using cycling), and it moves to the next one when you upload new vertices with the cycle flag set while the current one is still in use by the GPU (the transfer buffer you map for the upload has its own cycle flag and works the same way). When you bind that buffer, SDL GPU binds the appropriate behind-the-scenes buffer instead. That way the GPU can be chewing on the vertex buffer for frame 1 while the CPU prepares frame 2, without the application having to explicitly maintain its own pool of buffers to cycle through every frame.
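To make the idea concrete, here is a toy model of cycling in plain C. It is only a sketch and not SDL's implementation: CyclableBuffer, Backing, MAX_BACKING and the buffer_* functions are hypothetical names invented for illustration, and the "GPU in use" flag stands in for whatever tracking the real library does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are SDL API. */
#define MAX_BACKING 4

typedef struct {
    int  data;        /* stands in for the buffer contents */
    bool gpu_in_use;  /* true while a submitted frame still references it */
} Backing;

typedef struct {
    Backing backing[MAX_BACKING];
    size_t  count;    /* how many backing buffers exist so far */
    size_t  active;   /* the backing the application's handle refers to */
} CyclableBuffer;

void buffer_init(CyclableBuffer *b)
{
    b->count = 1;
    b->active = 0;
    b->backing[0] = (Backing){ 0, false };
}

/* Write new contents. With cycle == true and the active backing busy,
 * switch to an idle backing (creating one if none is idle) instead of
 * touching memory the GPU may still be reading.
 * Returns the index of the backing that was written. */
size_t buffer_write(CyclableBuffer *b, int value, bool cycle)
{
    assert(b != NULL);
    if (cycle && b->backing[b->active].gpu_in_use) {
        size_t i;
        for (i = 0; i < b->count; i++) {
            if (!b->backing[i].gpu_in_use) break;
        }
        if (i == b->count) {              /* nothing idle: add a backing */
            assert(b->count < MAX_BACKING);
            b->backing[b->count] = (Backing){ 0, false };
            b->count++;
        }
        b->active = i;
    }
    b->backing[b->active].data = value;
    return b->active;
}

/* "Submit" a frame: the GPU now references the active backing. */
size_t buffer_submit(CyclableBuffer *b)
{
    b->backing[b->active].gpu_in_use = true;
    return b->active;
}

/* The GPU finished with one backing, so it can be reused. */
void buffer_retire(CyclableBuffer *b, size_t index)
{
    b->backing[index].gpu_in_use = false;
}
```

In SDL itself this switch is requested through the cycle argument of calls such as SDL_UploadToGPUBuffer and SDL_MapGPUTransferBuffer. Note that in the model a freshly cycled backing does not inherit the old contents, which matches the intuition that cycling a depth target between a clear pass and a draw pass would throw the clear away.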

edit: I wrote this at 3:30 AM, hopefully it makes sense

VUNDER · April 4, 2025, 10:34am

Ok thank you for your answer. So my passes don’t need explicit synchronisation if they are all done in one frame in one command buffer submit. I double buffer at a higher level. Weirdly, we use a job system, and a job each frame populates the next frame with render instructions, which aren’t SDL ones but specific ones I have implemented; we use IDs for targets and no explicit handles or SDL_GPUxxx pointers. The main thread then executes those instructions and waits for them to finish while the other jobs update user input (doing this since SDL window stuff for macOS needs to be on the main thread).

Your explanation is great, thank you. So cycling is used for double buffering, but I don’t do that at the SDL level: all updates are done in a command buffer submitted before any render calls, and I wait on it.

Ok another update. Doing everything in one pass works. But when I add a clear pass it still doesn’t work and I get a black screen. Switching my Z test to greater-than with 0.0f as the clear value does work (albeit now the further away vertices are rendered).
What could be going on? Yes I have two passes and clear one isn’t really needed. But what if I have more passes in the future, what do I do then?

Oh ok, so the clear pass not drawing anything is the problem then?

sjr · April 4, 2025, 11:53pm

So then the problem sounds like your load and store operations.

For each pass you can say what to do with anything that might already be in the render targets (load, clear, or don’t care), and then what to do with the result when rendering is done (store or don’t care).
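A toy model in plain C may help picture those two knobs. It is not SDL code: LoadOp, StoreOp, Target, begin_pass, end_pass and depth_survives are made-up names that only mimic the idea behind SDL_GPU_LOADOP_* and SDL_GPU_STOREOP_*, under the assumption that DONT_CARE leaves the contents undefined.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: these enums only mimic SDL's load/store ops, they are not SDL types. */
typedef enum { LOAD_LOAD, LOAD_CLEAR, LOAD_DONT_CARE } LoadOp;
typedef enum { STORE_STORE, STORE_DONT_CARE } StoreOp;

typedef struct {
    float value;    /* contents held in memory */
    bool  defined;  /* false means the contents are undefined */
} Target;

/* Returns the value visible inside the pass; *ok is false if it is undefined. */
float begin_pass(const Target *t, LoadOp op, float clear_value, bool *ok)
{
    switch (op) {
    case LOAD_CLEAR: *ok = true;       return clear_value;
    case LOAD_LOAD:  *ok = t->defined; return t->value;
    default:         *ok = false;      return 0.0f;  /* DONT_CARE */
    }
}

/* Writes the pass result back to memory, or drops it. */
void end_pass(Target *t, StoreOp op, float result, bool result_ok)
{
    if (op == STORE_STORE) {
        t->value = result;
        t->defined = result_ok;
    } else {
        t->defined = false;  /* DONT_CARE: nothing reliable is left behind */
    }
}

/* Run a clear-only pass (depth cleared to 1.0f) followed by a draw pass.
 * Returns true if the draw pass sees defined depth; *seen is what it sees. */
bool depth_survives(StoreOp clear_pass_store, LoadOp draw_pass_load, float *seen)
{
    Target depth = { 0.0f, false };
    bool ok;
    float v = begin_pass(&depth, LOAD_CLEAR, 1.0f, &ok);
    end_pass(&depth, clear_pass_store, v, ok);
    *seen = begin_pass(&depth, draw_pass_load, 0.0f, &ok);
    return ok;
}
```

In this model the cleared depth only reaches the second pass when the first pass stores and the second pass loads; any DONT_CARE in between leaves the draw pass testing against undefined depth, which would be consistent with a black screen.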

If you’re doing multiple render passes to the same render target(s), you probably don’t need to actually create separate pass objects.

If you really need to, I’d do something like the following. DONT_CARE leaves the contents undefined, so every pass has to STORE what the next one LOADs:
Pass 1:
load: CLEAR <--
store: STORE
Pass 2:
load: LOAD
store: STORE
…
Pass X:
load: LOAD
store: STORE <--

For rendering to a texture for use later:
Pass 1:
color target destination: yourtexture (may need cycling!)
load: CLEAR
store: STORE
Pass 2:
color target destination: swapchain
load: CLEAR (this is clearing the swapchain, not our texture)
store: STORE
[use your texture in your draw calls]

edit: if you’re never using the depth texture for anything but depth testing (like, you aren’t doing shadow mapping or something) you should set its store operation to DONT_CARE in the last pass that uses it