Hi, SDL forums. I’m trying to learn the new SDL3 GPU API by writing a simple cross-platform compute-only example, but I’ve been struggling to get it working on macOS (MetalLib), while the same code works fine on Windows. Please bear in mind that I’m no expert in either Metal or D3D12, so there’s probably something obvious I’m missing here. I’ve put the code in question here: https://github.com/MVittiS/Sdl3ComputeSample
I tried following TheSpydog’s samples to make a short app that just does a “GPU compute hello world”: take two buffers, add them, and compare the results against the same operations done on the CPU. (I’m aware this isn’t a robust test, since GPUs can opt out of IEEE-754 compliance for better performance, but that’s a problem for another day.) On macOS the shader is cross-compiled from HLSL using the open-source `dxc`, plus Apple’s `metal-shaderconverter` tool to convert the DXIL to MetalLib; on Windows, I use the `dxc` that comes with VS2022.
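Concretely, the macOS shader build looks something like this (the exact flags here are from memory rather than copied from my build script, so double-check them against each tool’s help output):

```shell
# HLSL -> DXIL with the open-source dxc
dxc -T cs_6_0 -E main -Fo cs.dxil Src/cs.hlsl

# DXIL -> MetalLib with Apple's Metal shader converter
metal-shaderconverter cs.dxil -o cs.metallib
```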
It all seems to work on Windows and the output buffer contains the expected results, but on macOS the output buffer comes back all zeros. I also get the following error when enabling the Metal validation layers in Xcode:
```
Invalid device load at offset 18446742974197834624, executing kernel function: "main"
buffer: Out of bounds of user address space, length:1, resident:Read Write
pipeline: "(null)", UID: "4D570A263D1DD1674C87F1A891B1D20D3EEBB28437E1121537AEC7B0BFE1F6F0" encoder: "0", dispatch: 0
* frame #0: main() - /(...)/Sdl3Compute/Src/cs.hlsl:7:21
```
And `cs.hlsl:7:21` is the very first load operation in the compute shader:
```
1: StructuredBuffer<float> inputA : register(t0, space0);
2: StructuredBuffer<float> inputB : register(t1, space0);
3: RWStructuredBuffer<float> output : register(u0, space1);
4:
5: [numthreads(512, 1, 1)]
6: void main(uint3 tid : SV_DispatchThreadId) {
7:     const float a = inputA[tid.x];
8:     const float b = inputB[tid.x];
9:
10:    output[tid.x] = a + b;
11: }
```
The D3D validation layers give me no warnings or errors for this program and shader. Metal validation only warns me of the OOB load, but doesn’t tell me anything about mismatched pipelines.
At first I thought the difference in bind signatures expected between the two shader formats was the issue: per the `SDL_CreateGPUComputePipeline()` documentation, on Metal the `[[buffer]]` attribute is used for both read-only buffers (SRVs, or `register(t*)`) and read-write buffers (UAVs, or `register(u*)`), meaning the distinction HLSL makes between these two kinds of buffers may be lost on Metal. However, modifying my sample to use all read-write buffers still gives me the same error message, and still works fine on Windows.
At this point, I don’t know if the error lies in `metal-shaderconverter`, in how I’m invoking it, in SDL3 GPU’s Metal backend, or in some other configuration I may be missing. Any help or suggestions would be appreciated, even if it’s just pointers for filing bugs in the appropriate places beyond SDL.