SDL3 GPU Compute-only binding issues on Metal

Hi, SDL forums. I’m trying to learn the new SDL3 GPU API by writing a simple cross-platform, compute-only example, but I’ve been struggling on macOS (Metal/metallib), while the same code works fine on Windows. Please keep in mind that I’m no expert in either Metal or D3D12, so there’s probably something obvious I’m missing. I’ve put the code in question here: https://github.com/MVittiS/Sdl3ComputeSample

I tried following TheSpydog’s samples to make a short app that does a “GPU compute hello world”: take two buffers, add them, and compare the results against the same operation done on the CPU. (I’m aware this is not a robust test, because GPUs can opt out of strict IEEE-754 compliance for better performance, but that’s for another day.) On macOS, the shader is cross-compiled from HLSL using the open-source dxc plus Apple’s metal-shaderconverter tool, which converts the DXIL to a metallib; on Windows, I use the dxc that ships with VS2022.
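A minimal sketch of that CPU-side check in C, assuming the GPU output has already been read back into a float array. The helper names and the tolerance-based comparison are illustrative assumptions, not code from the linked repo; comparing with a tolerance instead of exact equality sidesteps the IEEE-754 caveat. An output buffer that was never written (left at zero) is reported as every element mismatching.

```c
#include <stddef.h>

/* Absolute value for floats, written out to avoid depending on libm. */
static float abs_f(float x)
{
    return x < 0.0f ? -x : x;
}

/* CPU reference for the kernel in cs.hlsl: out[i] = a[i] + b[i]. */
static void add_on_cpu(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

/* Counts elements where the GPU result differs from the CPU reference by more
 * than `tolerance` relative to the expected value. The scale is clamped to at
 * least 1.0, so results near zero are compared with an absolute tolerance. */
static size_t count_mismatches(const float *gpu, const float *cpu, size_t n,
                               float tolerance)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i) {
        float scale = abs_f(cpu[i]);
        if (scale < 1.0f) {
            scale = 1.0f;
        }
        /* Written as !(diff <= limit) so a NaN result also counts as a mismatch. */
        if (!(abs_f(gpu[i] - cpu[i]) <= tolerance * scale)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

The tolerance value (for example 1e-6f) is a caller-chosen guess, not something derived from the GPU’s actual precision guarantees.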

Everything works on Windows, and the output buffer contains the expected results, but on macOS every element of the output buffer is zero. With the Metal validation layers enabled in Xcode, I also get the following error:

Invalid device load at offset 18446742974197834624, executing kernel function: "main" 
buffer: Out of bounds of user address space, length:1, resident:Read Write
pipeline: "(null)", UID: "4D570A263D1DD1674C87F1A891B1D20D3EEBB28437E1121537AEC7B0BFE1F6F0" encoder: "0", dispatch: 0
	* frame #0: main() - /(...)/Sdl3Compute/Src/cs.hlsl:7:21

And cs.hlsl:7:21 is the very first load operation in the compute shader.

 1: StructuredBuffer<float> inputA : register(t0, space0);
 2: StructuredBuffer<float> inputB : register(t1, space0);
 3: RWStructuredBuffer<float> output : register(u0, space1);
 4:
 5: [numthreads(512, 1, 1)]
 6: void main(uint3 tid : SV_DispatchThreadId) {
 7:    const float a = inputA[tid.x];
 8:    const float b = inputB[tid.x];
 9:
10:    output[tid.x] = a + b;
11: }
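Because main() never checks tid.x against the buffer length, the host-side dispatch size matters too: every thread in every group of 512 reads inputA and inputB and writes output. A sketch of the group-count arithmetic, assuming a 1D dispatch along X; the helper names are hypothetical, and the result is the value that would be passed as groupcount_x to SDL_DispatchGPUCompute().

```c
#include <stdint.h>

/* Must match [numthreads(512, 1, 1)] in cs.hlsl. */
#define THREADS_PER_GROUP 512u

/* Thread groups needed to cover element_count elements along X, rounded up.
 * Written as quotient plus remainder check to avoid overflow near UINT32_MAX. */
static uint32_t group_count_for(uint32_t element_count)
{
    return element_count / THREADS_PER_GROUP
         + (element_count % THREADS_PER_GROUP != 0u ? 1u : 0u);
}

/* cs.hlsl has no bounds check, so a rounded-up dispatch only stays in bounds
 * if element_count is a multiple of the group size (or the buffers are padded
 * up to one). */
static int count_is_group_multiple(uint32_t element_count)
{
    return element_count % THREADS_PER_GROUP == 0u;
}
```

If the element count is not a multiple of 512, either pad the buffers or add an early return for tid.x beyond the valid length in the shader.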

The D3D validation layers report no warnings or errors for this program and shader. Metal validation only warns me about the OOB load and doesn’t say anything about mismatched pipelines.
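One way to read the huge offset in the log: it is just below 2^64, so interpreted as a signed 64-bit value it is a negative offset of a little over 2^40 bytes (about 1 TiB). That is consistent with the “Out of bounds of user address space” message and, as an interpretation rather than something the log states, suggests the load goes through a bad base address instead of running slightly past the end of a valid buffer. The C below only reproduces that arithmetic; the helper names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Offset copied verbatim from the Metal validation message. */
static const char *const kReportedOffset = "18446742974197834624";

/* Parses the decimal offset into an unsigned 64-bit value. */
static uint64_t parse_offset(const char *text)
{
    return (uint64_t)strtoull(text, NULL, 10);
}

/* How far below 2^64 the value is. For a value with the top bit set, this is
 * the magnitude it would have if read as a two's-complement int64_t.
 * Unsigned subtraction wraps modulo 2^64, so this is well-defined C. */
static uint64_t distance_below_2_64(uint64_t value)
{
    return (uint64_t)0 - value;
}
```

For the logged value this gives 0xFFFFFEFFFFFEA380, or 1099511716992 bytes below 2^64.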

At first I thought the problem was the difference in binding layouts the two shader formats expect. According to the SDL_CreateGPUComputePipeline() documentation, Metal uses the [[buffer]] attribute for both read-only buffers (SRVs, i.e. register(t*)) and read-write buffers (UAVs, i.e. register(u*)), so the distinction HLSL makes between these two kinds of buffers may be lost on Metal. However, changing my sample to use only read-write buffers still gives me the same error on macOS, and it still works fine on Windows.
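To make that ordering concrete: as I read the SDL_CreateGPUComputePipeline() documentation (worth re-checking against SDL_gpu.h for your SDL version), MSL/metallib compute shaders are expected to number their [[buffer]] slots as uniform buffers first, then read-only storage buffers, then read-write storage buffers, whereas the HLSL above keeps them in separate register spaces (t* in space0, u* in space1). The enum and helper below are hypothetical and only compute that packed index. For cs.hlsl, with no uniform buffers, two read-only buffers and one read-write buffer, it maps inputA, inputB and output to 0, 1 and 2, which matches the [[buffer(0)]], [[buffer(1)]] and [[buffer(2)]] attributes in the SDL_shadercross output later in this thread.

```c
#include <stdint.h>

/* Kinds of buffer bindings a compute shader can declare (illustrative). */
typedef enum {
    BINDING_UNIFORM_BUFFER,
    BINDING_READONLY_STORAGE_BUFFER,
    BINDING_READWRITE_STORAGE_BUFFER
} BufferBindingKind;

/* Returns the [[buffer(n)]] index for the `index`-th binding of `kind`,
 * assuming the packed order: uniform buffers, then read-only storage
 * buffers, then read-write storage buffers. */
static uint32_t metal_buffer_index(BufferBindingKind kind, uint32_t index,
                                   uint32_t num_uniform_buffers,
                                   uint32_t num_readonly_storage_buffers)
{
    switch (kind) {
    case BINDING_UNIFORM_BUFFER:
        return index;
    case BINDING_READONLY_STORAGE_BUFFER:
        return num_uniform_buffers + index;
    case BINDING_READWRITE_STORAGE_BUFFER:
    default:
        return num_uniform_buffers + num_readonly_storage_buffers + index;
    }
}
```

Adding a uniform buffer to the shader would shift every storage buffer up by one slot, which is the kind of mismatch to look for when a Metal shader was not produced with this layout in mind.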

At this point, I don’t know whether the error lies in metal-shaderconverter, in how I’m invoking it, in SDL3 GPU’s Metal backend, or in some other configuration I may be missing. Any help and suggestions would be appreciated, even if it’s just pointing me to the right places to file bugs outside SDL.

What happens if you use SDL_shadercross?
(if you’re only using it for offline shader compilation, you can ignore everything in the README about needing to ship a bunch of other libraries)

edit: the link to your source code is broken

It seems to work with SDL_shadercross, though it renames my main function to main0 in the generated MSL source:

#include <metal_stdlib>
#include <simd/simd.h>

using namespace metal;

struct type_StructuredBuffer_float
{
    float _m0[1];
};

struct type_RWStructuredBuffer_float
{
    float _m0[1];
};

kernel void main0(const device type_StructuredBuffer_float& inputA [[buffer(0)]], const device type_StructuredBuffer_float& inputB [[buffer(1)]], device type_RWStructuredBuffer_float& _output [[buffer(2)]], uint3 gl_GlobalInvocationID [[thread_position_in_grid]])
{
    _output._m0[gl_GlobalInvocationID.x] = inputA._m0[gl_GlobalInvocationID.x] + inputB._m0[gl_GlobalInvocationID.x];
}

I wonder what I could do to triage this issue on the metal-shaderconverter side, but I’m happy to just have a working solution for now. Thanks for the tip!

AFAIK main is a reserved name in MSL (at least it is for vertex and fragment shaders), so SPIRV-Cross (which SDL_shadercross uses) outputs main0 instead.
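In practice this means the entry point name the app passes to SDL differs per backend. A minimal sketch, assuming the app chooses the name from the shader format it loaded; ShaderFormat and both helpers are hypothetical, and the "main0" rule only reflects what SPIRV-Cross produced for this shader in the MSL output above.

```c
#include <string.h>

/* Stand-in for the two formats this sample loads; not SDL's own enum. */
typedef enum {
    SHADER_DXIL, /* compiled by dxc; the entry point keeps its HLSL name */
    SHADER_MSL   /* MSL source generated by SDL_shadercross via SPIRV-Cross */
} ShaderFormat;

/* Entry point name to pass in SDL_GPUComputePipelineCreateInfo.entrypoint.
 * SPIRV-Cross emits the HLSL "main" as "main0" in MSL, so the Metal
 * pipeline has to ask for "main0". */
static const char *entrypoint_for(ShaderFormat format)
{
    switch (format) {
    case SHADER_MSL:
        return "main0";
    case SHADER_DXIL:
    default:
        return "main";
    }
}

/* Nonzero if `name` matches the entry point expected for `format`. */
static int is_expected_entrypoint(ShaderFormat format, const char *name)
{
    return strcmp(entrypoint_for(format), name) == 0;
}
```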

Also, SDL_shadercross was created specifically for SDL3 GPU, so it knows what things need to be renamed to (and so on) for SDL3 GPU to find and use them.

(Sorry, I didn’t realize the repo was private. I’ve made it public: the working solution is in the use_sdl_shadercross branch, and the original is in the main branch.)