Inexplicable race condition in SDL Renderer

I have a strange problem with what looks like a race condition in the SDL Renderer. It leads to content not being drawn under certain circumstances. The odd part:
The data seems to be copied to uiBuffer correctly. If I force a repaint by setting blitUI externally, the content is painted to the screen without having to run the copy loop from the main thread again. Does anyone have an idea what might be wrong here?

Full application is at http://previous.sourceforge.net. Relevant code is in trunk/src/fast_screen.c.

This is my code (simplified):

SDL_Surface*        sdlscrn;
SDL_Renderer*       sdlRenderer;
static void*        uiBuffer;
static void*        uiBufferTmp;
static SDL_atomic_t blitUI;
static SDL_SpinLock uiBufferLock;
static uint32_t     mask;
static SDL_Rect     screenRect;

Thread 0 (main):

SDL_LockSurface(sdlscrn);
int     count = sdlscrn->w * sdlscrn->h;
uint32_t* dst = (uint32_t*)uiBuffer;
uint32_t* src = (uint32_t*)sdlscrn->pixels;
SDL_AtomicLock(&uiBufferLock);
for(int i = count; --i >= 0; src++)
    *dst++ = *src == mask ? 0 : *src;
SDL_AtomicSet(&blitUI, 1);
SDL_AtomicUnlock(&uiBufferLock);
SDL_UnlockSurface(sdlscrn);

Thread 1 (draw):

SDL_Texture* uiTexture;
uint32_t r, g, b, a;
uint32_t format;
int      d;

SDL_RenderSetLogicalSize(sdlRenderer, width, height);
    
uiTexture = SDL_CreateTexture(sdlRenderer, SDL_PIXELFORMAT_UNKNOWN, SDL_TEXTUREACCESS_STREAMING, width, height);
SDL_SetTextureBlendMode(uiTexture, SDL_BLENDMODE_BLEND);
    
fbTexture = SDL_CreateTexture(sdlRenderer, SDL_PIXELFORMAT_UNKNOWN, SDL_TEXTUREACCESS_STREAMING, width, height);
SDL_SetTextureBlendMode(fbTexture, SDL_BLENDMODE_NONE);
    
SDL_QueryTexture(uiTexture, &format, &d, &d, &d);
SDL_PixelFormatEnumToMasks(format, &d, &r, &g, &b, &a);
mask = g | a;
sdlscrn     = SDL_CreateRGBSurface(SDL_SWSURFACE, width, height, 32, r, g, b, a);
uiBuffer    = malloc(sdlscrn->h * sdlscrn->pitch);
uiBufferTmp = malloc(sdlscrn->h * sdlscrn->pitch);

SDL_FillRect(sdlscrn, NULL, mask);

while (doRepaint) {
    bool updateUI = false;

    SDL_AtomicLock(&uiBufferLock);
    if (SDL_AtomicSet(&blitUI, 0)) {
        memcpy(uiBufferTmp, uiBuffer, sdlscrn->h * sdlscrn->pitch);
        updateUI = true;
    }
    SDL_AtomicUnlock(&uiBufferLock);

    if (updateUI) {
        SDL_UpdateTexture(uiTexture, NULL, uiBufferTmp, sdlscrn->pitch);
        SDL_RenderClear(sdlRenderer);
        SDL_RenderCopy(sdlRenderer, uiTexture, NULL, &screenRect);
        SDL_RenderPresent(sdlRenderer);
    } else {
        SDL_Delay(10);
    }
}

[insert usual “Here be dragons” warning when it comes to multi-threading, locking, and race conditions]

What thread is creating the renderer? Does it have to be on the main thread for your platform? The docs say that SDL_Renderer isn’t meant to be used from multiple threads, so create and use it from your draw thread.

Another thing to be aware of is that SDL_UpdateTexture() is relatively slow, and you should use SDL_LockTexture() / SDL_UnlockTexture() instead.

The renderer is created in the main thread. My platform does not support creating it in a secondary thread and things should work on all platforms anyway. Does this mean to fix this issue I would have to re-structure my code to do the rendering in the main thread? That would require a big effort.

I do not understand the second part of your message. How would I use SDL_LockTexture() and SDL_UnlockTexture() to replace SDL_UpdateTexture()?

All rendering must be done in the thread in which the renderer was created. You cannot call SDL_CreateRenderer() in one thread and (for example) SDL_RenderCopy() in a different thread.

Does this mean to fix this issue I would have to re-structure my code to do the rendering in the main thread?

Yes, you need to do window creation, rendering and also input handling (SDL_PollEvent() and friends) in the main thread. AFAIK this is a limitation of several operating systems (or their windowing systems) supported by SDL (and not SDL itself).
Modifying the sdlscrn SDL_Surface in another thread should be safe, as long as you do the SDL_UpdateTexture() in the main thread.

So you’d basically have to move the logic you currently have in the main thread to Thread 1, and vice versa - possibly (assuming you’re currently getting events in the main thread, before or while modifying sdlscrn) with an additional step of getting all currently available events in the main thread (probably after drawing?) and putting them in your own (thread-safe) queue that Thread 1 can consume.

Honest question: If SDL_LockTexture() is faster than SDL_UpdateTexture(), why does SDL_UpdateTexture() not just use SDL_LockTexture() (+ memcpy()), at least for SDL_TEXTUREACCESS_STREAMING textures?

That would require quite a rewrite. Is audio I/O also affected by these restrictions? Interestingly, apart from this minor race condition of unknown origin, rendering seems to work on macOS, Windows and at least some Linux distros.

What backend renderer are you using? If it’s the SDL2 software renderer I’m not surprised that you are finding it works, but if it’s OpenGL (which I would expect it to be in Linux, and possibly the other platforms you mention) that is surprising, because OpenGL is highly reliant on thread-local variables.

I’m creating it this way:
sdlRenderer = SDL_CreateRenderer(sdlWindow, -1, SDL_RENDERER_ACCELERATED | SDL_RENDERER_PRESENTVSYNC);

I guess in this case it will use the platform-specific default renderer? It seems to use Metal on my Mac.

Yes, I would expect it to be Direct3D on Windows, Metal on Mac and OpenGL on Linux. Because I’m using shaders and 3D rendering I use a hint to force it to use OpenGL everywhere, but that makes the thread-affinity particularly critical.

There should be a way to check if some platform supports rendering in a secondary thread. Then at least those platforms with support can benefit from multithreading.

Running the main tasks of my application in a secondary thread and passing all events to it is hardly feasible and would introduce lots of complexity and overhead. On the other hand doing those tasks plus rendering on the main thread will steal quite some CPU time from the main tasks.

In my app the ‘main’ thread does all the rendering and a secondary ‘worker’ thread does the computational heavy lifting. I use the SDL2 events mechanism to pass messages from the worker thread to the main thread (the worker thread calls SDL_PushEvent() and the main thread calls SDL_PeepEvents()). I don’t find it overly complex.

I haven’t looked at the implementation, so IDK.

However, the docs for SDL_UpdateTexture() say:

This is a fairly slow function, intended for use with static textures that do not change often.
If the texture is intended to be updated often, it is preferred to create the texture as streaming and use the locking functions referenced below. While this function will work with streaming textures, for optimization reasons you may not get the pixels back if you lock the texture afterward.

So instead of copying memory from source to destination and then doing SDL_UpdateTexture(), you’d call SDL_LockTexture(), then copy from the source to the pointer given to you by SDL_LockTexture(), then call SDL_UnlockTexture().
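
A minimal sketch of that copy step, assuming 32-bit pixels. The helper name copy_rows is hypothetical, and the SDL calls appear only in a comment so the snippet compiles without SDL. The pitch handed back by SDL_LockTexture() may differ from the surface pitch, so the rows are copied one at a time rather than with a single memcpy():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `rows` rows of `row_bytes` bytes each from src
 * to dst, where the two buffers may have different pitches (bytes per row). */
static void copy_rows(void *dst, int dst_pitch,
                      const void *src, int src_pitch,
                      size_t row_bytes, int rows)
{
    assert(dst_pitch >= 0 && (size_t)dst_pitch >= row_bytes);
    assert(src_pitch >= 0 && (size_t)src_pitch >= row_bytes);

    uint8_t *d = dst;
    const uint8_t *s = src;
    for (int y = 0; y < rows; y++) {
        memcpy(d, s, row_bytes);  /* copy only the visible part of the row */
        d += dst_pitch;
        s += src_pitch;
    }
}

/* Intended use in the draw loop (not compiled here; requires SDL2):
 *
 *   void *pixels;
 *   int   pitch;
 *   if (SDL_LockTexture(uiTexture, NULL, &pixels, &pitch) == 0) {
 *       copy_rows(pixels, pitch, uiBuffer, sdlscrn->pitch,
 *                 (size_t)sdlscrn->w * 4, sdlscrn->h);
 *       SDL_UnlockTexture(uiTexture);
 *   }
 */
```

In the code from the question, uiBuffer is shared between threads, so the copy would still need to happen while uiBufferLock is held (or from uiBufferTmp after the existing memcpy). The locked pixels should be treated as write-only, so the whole locked area has to be filled on every update.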

Thinking about it I see two main issues when swapping main and secondary thread:

  • Getting events at a frequency of 200 Hz while still using VSYNC for rendering (SDL_RenderPresent will sleep until the next VSYNC, which on my display means a 60 Hz frequency).
  • Forwarding events to the secondary thread (some events are meant for main thread and some for secondary thread). Filtering event types won’t help because for example some key combinations are meant for main thread while most are for secondary thread. Maybe I need to get all events on the main thread and create a thread safe queue to pass some events to the secondary thread.

The way I tackle that in my app is firstly to use (as recommended) a while loop to poll the events, so the event queue is always completely drained each frame. Secondly, having handled all the pending events I use SDL_GetTicks() to get an idea of how much time is left before the next VSYNC; if it’s a significant proportion of a frame I continue to poll for further events rather than waste time waiting in SDL_RenderPresent().
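
The "how much time is left" check could look roughly like this. It is only a sketch of the idea, not the code from my app: keep_polling, period_ms and min_slack_ms are made-up names, and the frame period is an assumed value rather than something measured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decide whether it is worth polling for more events
 * before calling SDL_RenderPresent().
 *   frame_start_ms  tick value taken right after the previous present
 *   now_ms          current tick value (e.g. from SDL_GetTicks())
 *   period_ms       assumed frame period, e.g. 16 for roughly 60 Hz
 *   min_slack_ms    smallest remaining time that still justifies polling */
static bool keep_polling(uint32_t frame_start_ms, uint32_t now_ms,
                         uint32_t period_ms, uint32_t min_slack_ms)
{
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    uint32_t elapsed = now_ms - frame_start_ms;
    if (elapsed >= period_ms)
        return false;  /* the assumed VSYNC time has already passed */
    return (period_ms - elapsed) >= min_slack_ms;
}

/* Intended use (not compiled here; requires SDL2):
 *
 *   Uint32 start = SDL_GetTicks();
 *   do {
 *       SDL_Event e;
 *       while (SDL_PollEvent(&e)) { handle the event }
 *       SDL_Delay(1);
 *   } while (keep_polling(start, SDL_GetTicks(), 16, 4));
 *   SDL_RenderPresent(sdlRenderer);
 */
```

With a 16 ms period and 4 ms of slack, polling continues for about the first 12 ms of the frame and then falls through to the present call.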

I’ve structured my code so all events are handled in the main thread.

How do you know when the next VSYNC will occur? Doesn’t the time between VSYNCs vary between systems? It is impossible for me to handle all events on the main thread. For example, key presses, mouse button presses and mouse motion events need to be forwarded to the secondary thread (my application is an emulator). Keyboard shortcuts, on the other hand, need to be processed on the main thread.

It seems like SDL has pretty much accepted this limitation …

https://wiki.libsdl.org/FAQDevelopment#can_i_call_sdl_video_functions_from_multiple_threads

Can I call SDL video functions from multiple threads?
No, most graphics back ends are not thread-safe, so you should only call SDL video functions from the main thread of your application.

https://wiki.libsdl.org/SDL_PollEvent

SDL_PollEvent …
As this function may implicitly call SDL_PumpEvents(), you can only call this function in the thread that set the video mode.

https://wiki.libsdl.org/SDL_PumpEvents

SDL_PumpEvents …
WARNING: This should only be run in the thread that initialized the video subsystem, and for extra safety, you should consider only doing those things on the main thread in any case.

Is it necessary to process events at 200 Hz? Would anyone notice if you processed them at 60 Hz (monitor refresh rate)? I’ve read that gamers often disable VSYNC to avoid “input lag” so perhaps you could do that too?

Of course. One could attempt to measure it first, not exactly difficult (there are a few systems with variable frame rate, but as somebody who specialised in video for most of my career that’s too stupid to allow for!). But I just make an assumption that it will probably be between 50 fps and 75 fps and choose a period which works acceptably well over that range.

My application is similar, but fortunately it’s acceptable to poll for events like keyboard and mouse input, so I just use simple (FIFO) queues which are written from the main thread and tested/read from the worker thread.
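
For illustration, a queue of that kind can be sketched as a single-producer, single-consumer ring buffer. This is not the code from my app: it assumes a compiler with C11 <stdatomic.h>, exactly one writing thread and one reading thread, and the InputEvent and InputQueue types are made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_CAPACITY 256u
static_assert((QUEUE_CAPACITY & (QUEUE_CAPACITY - 1u)) == 0,
              "capacity must be a power of two");

/* Hypothetical event record; a real emulator would carry more fields. */
typedef struct {
    uint32_t type;
    int32_t  value;
} InputEvent;

/* A zero-initialized InputQueue is an empty queue. */
typedef struct {
    InputEvent  items[QUEUE_CAPACITY];
    atomic_uint head;  /* next slot to read, advanced only by the consumer */
    atomic_uint tail;  /* next slot to write, advanced only by the producer */
} InputQueue;

/* Producer side (main thread). Returns false if the queue is full. */
static bool queue_push(InputQueue *q, InputEvent ev)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QUEUE_CAPACITY)
        return false;
    q->items[tail % QUEUE_CAPACITY] = ev;
    /* Release: the item is written before the new tail becomes visible. */
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
    return true;
}

/* Consumer side (worker thread). Returns false if the queue is empty. */
static bool queue_pop(InputQueue *q, InputEvent *out)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = q->items[head % QUEUE_CAPACITY];
    /* Release: the slot is read before it is handed back to the producer. */
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}
```

The main thread would call queue_push() for each event meant for the emulator and handle keyboard shortcuts itself; the worker calls queue_pop() in a loop until it returns false. With more than one producer or consumer this scheme is no longer safe and a mutex-protected queue would be the simpler choice.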

I just restructured my application so that the rendering is done from the main thread. I did it just for testing, and did not re-implement all features. But I can confirm that the problem still exists, so it is not about threads. So back to the start. Does anyone have an idea what might cause this?

Did you try SDL_LockTexture()?