SDL_UpdateTexture faster than SDL_LockTexture

Hello,

I’m working on a Game of Life implementation to try out SDL3. I used SDL2 a while back and wanted to give SDL3 a try.

I’m using the CPU to fill an array of pixels (up to 4096x4096) and update a texture from that array once per frame. The SDL3 wiki states that SDL_UpdateTexture is slow and that SDL_LockTexture/SDL_UnlockTexture should be preferred. I ran some benchmarks, and it seems (at least on my computer) that SDL_UpdateTexture is a lot faster than locking, memcpy, and unlocking (most of the time is spent in SDL_UnlockTexture).
Does anyone have any insight on this? Is there another reason to prefer SDL_LockTexture or did I use it wrong?

The source code is on GitHub (NoHitzz/ConwaysGameOfLife, under src), specifically in conwayApp.h (member function render()) and texture.h.

Both are acceptable methods since you are using SDL_TEXTUREACCESS_STREAMING.

SDL_UpdateTexture takes several branches depending on what stage of the update it is in, so I’m not 100% sure, but it looks like it ends up using SDL_LockTexture and SDL_UnlockTexture further down the line. That is why it surprises me that it might be faster.

I think the sheer size of the region that you are requesting to be updated is what slows down the Lock/Unlock version.

I would argue that the biggest speedup available right now is to set the rectangle argument (in either function) to the size of the screen rather than the entire 4096x4096 area.

At 3 bytes per pixel, 4096x4096 is about 50 megabytes. That means your program is likely going out to main memory many times per update rather than staying in the CPU’s cache, which drastically changes texture rendering time. 1920x1080 at 3 bytes per pixel is only about 6 megabytes by comparison.
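For reference, the arithmetic behind those figures can be checked with a small sketch. The helper name `bufferBytes` is hypothetical and only illustrates the calculation. It also shows the 4-bytes-per-pixel case, since a 32-bit format such as ARGB8888 would make the full buffer 64 MiB rather than roughly 50 MB.

```cpp
#include <cstddef>

// Hypothetical helper: bytes needed for a tightly packed pixel buffer.
constexpr std::size_t bufferBytes(std::size_t width, std::size_t height,
                                  std::size_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// 3 bytes per pixel, as in the estimate above.
constexpr std::size_t full24 = bufferBytes(4096, 4096, 3);    // 50,331,648 bytes
constexpr std::size_t screen24 = bufferBytes(1920, 1080, 3);  // 6,220,800 bytes

// 4 bytes per pixel, as for a 32-bit format such as ARGB8888.
constexpr std::size_t full32 = bufferBytes(4096, 4096, 4);    // 67,108,864 bytes
constexpr std::size_t screen32 = bufferBytes(1920, 1080, 4);  // 8,294,400 bytes
```

Either way, the full-size buffer is far larger than typical CPU caches, so the point about memory traffic holds for both pixel sizes.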

Thank you!
To clarify, I’m not unhappy with the performance, but I’m trying to see if I can squeeze out a little more (I’m also working on a version with multithreaded game state updates). Besides that, I’m writing SDL3 wrappers in the background of this project for future use and would like to gain a better understanding of when to use SDL_LockTexture in general. Most of the information I found online about that function is more than 10 years old, and I’m not sure if it is still relevant.

My Game of Life implementation relies on texture scaling. The texture is a 1:1 mapping of the game, and it is scaled up or down to the window size. I think this is more performant than writing my own CPU-bound scaler, though it prevents me from reducing the texture size. If I could use a smaller pixel format with low conversion overhead (something other than ARGB8888), that might be beneficial.
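As a rough sketch of the smaller-format idea: SDL does define a 16-bit SDL_PIXELFORMAT_RGB565, which would halve the bytes copied per frame compared with a 32-bit format. Whether a given renderer backend accepts it without converting internally is an assumption that would have to be measured. The helper `packRgb565` below is hypothetical and only shows how a 32-bit colour maps to 16 bits.

```cpp
#include <cstdint>

// Hypothetical helper: packs one 0xAARRGGBB pixel into 16-bit RGB565
// (5 bits red, 6 bits green, 5 bits blue), dropping alpha and the low bits.
constexpr std::uint16_t packRgb565(std::uint32_t argb) {
    return static_cast<std::uint16_t>(
        ((argb >> 8) & 0xF800u) |   // top 5 bits of red   -> bits 15..11
        ((argb >> 5) & 0x07E0u) |   // top 6 bits of green -> bits 10..5
        ((argb >> 3) & 0x001Fu));   // top 5 bits of blue  -> bits 4..0
}
```

For a two-colour Game of Life the "alive" and "dead" colours could be packed once up front and written directly as 16-bit values, so no per-pixel conversion would be needed during the simulation.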

The benchmark results at 4k resolution with a random initial configuration, on my 2017 MacBook Pro (2.3 GHz Intel Core i5, 8 GB of memory):

texture update with SDL_UpdateTexture:                    ~13.5 ms

texture update with SDL_LockTexture/SDL_UnlockTexture:
    SDL_LockTexture:                                      ~0.002 ms
    memcpy:                                               ~8.12 ms
    SDL_UnlockTexture:                                    ~12.38 ms

The code for the lock/unlock benchmark:

// Inside conwayApp::render(), replacing:
// gameTexture.update((pixelData + zoomIndexOffset.y * pixelPitch/4), pixelPitch, &zoomedUpdateClip);
SDL_Rect zoomedUpdateClip = {0, zoomIndexOffset.y, pixelPitch/4, zoomedSize};
void* pixels;
int pitch;
textureUpdateTimer.start();
gameTexture.lock(&zoomedUpdateClip, &pixels, &pitch);
// One copy of zoomedSize rows; this assumes the pitch returned by the lock equals pixelPitch.
memcpy(pixels, (void*)((Uint32*)pixelData + zoomIndexOffset.y*pixelPitch/4), zoomedSize * pixelPitch);
gameTexture.unlock();
textureUpdateTimer.stop();


// Method definitions from texture.h:
int lock(const SDL_Rect* rect, void** pixels, int* pitch) {
    return SDL_LockTexture(m_texture, rect, pixels, pitch);
}
void unlock() {
    SDL_UnlockTexture(m_texture);
}
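One caveat with the benchmark snippet: the single memcpy is only correct if the pitch returned by SDL_LockTexture matches pixelPitch. If the two ever differ, a row-by-row copy would be needed. The following is a minimal sketch of that; `copyRows` is a hypothetical helper, not part of SDL or of the project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies `rows` rows of `rowBytes` bytes each between
// two buffers whose row strides (pitches, in bytes) may differ.
inline void copyRows(void* dst, std::size_t dstPitch,
                     const void* src, std::size_t srcPitch,
                     std::size_t rowBytes, std::size_t rows) {
    assert(rowBytes <= dstPitch && rowBytes <= srcPitch);
    if (dstPitch == srcPitch && rowBytes == srcPitch) {
        // Rows are contiguous in both buffers: one large copy suffices.
        std::memcpy(dst, src, rowBytes * rows);
        return;
    }
    auto* d = static_cast<std::uint8_t*>(dst);
    const auto* s = static_cast<const std::uint8_t*>(src);
    for (std::size_t y = 0; y < rows; ++y) {
        // Copy only the visible bytes of each row and skip any padding.
        std::memcpy(d + y * dstPitch, s + y * srcPitch, rowBytes);
    }
}
```

In the benchmark this might be called with `pixels` and `pitch` from the lock as the destination, the offset `pixelData` pointer and `pixelPitch` as the source, `pixelPitch` as the row size, and `zoomedSize` as the row count. When the pitches match it falls back to the same single memcpy, so it should not change the timing in that case.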

This is irrelevant to your question, but I thought I’d mention that it is easy to write a traditional Game of Life simulation in shader code, running on the GPU itself. This will hugely outperform anything that involves CPU computation and textures. Here’s an example.

Thanks! I’m aware of that. I might also write a GPU-bound version at some point. The aim of this project is specifically to explore a few CPU optimizations and write SDL3 wrappers for future emulator projects. These projects will also use CPU rendering, or at least start out that way. This is kind of a testing ground for that.